Welcome back to part two of this series on using statistical arbitrage, also known as StatArb, to develop mean reversion trading strategies.
My name is Leo Smigel, and I enjoy the puzzle of creating algorithmic trading strategies, which I write about at analyzingalpha.com.
Previously, we discussed what cointegration is and how it can help us potentially profit in the market. Part 1 of the series can be found below:
In this post, we will learn how to create a statistically significant mean reversion strategy that uses more than two assets.
Start With Why
When developing algorithmic trading strategies, it's important to understand why a strategy should work, not just that it does. Knowing the rationale reduces overfitting and increases conviction.
Let's discuss an example put forth by Ernie Chan in his book Algorithmic Trading: Winning Strategies and Their Rationale.
The price and profitability of gold miners are highly dependent on the price of gold. If the price of gold goes up, gold miners are more profitable. When the price of gold goes down, they make less money. This relationship is easy to understand and makes intuitive sense -- and it's also backed up by the data.
Between May 23, 2006, and July 14, 2008, gold prices (GLD) and gold miners (GDX) cointegrated with 99% probability... until they didn't.
Gold and GDX lost their cointegration. But why?
The price of black gold, also known as oil, skyrocketed around that period. And since extracting gold uses a lot of oil, it hurt the gold miners' profitability.
Makes sense, right?
The good news is that we can adjust our strategy by adding USO to our pair, turning it into what is known as a triplet.
The Cointegrated Augmented Dickey-Fuller (CADF) test won't work for us here, since it only handles a pair of assets. We need a test that can use more than two assets -- enter the Johansen test.
The Johansen test allows us to test for cointegration among multiple time series. And while we won't get into how the underlying vector error correction model works, I suggest you gain a deep understanding of it if you start using Johansen tests to develop live trading strategies.
Let's start with grabbing our data from Alpaca and aligning the dates.
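As a minimal sketch of the alignment step only, assume the daily closes for each symbol have already been downloaded into pandas Series indexed by date. The `align_closes` helper and the sample prices below are hypothetical and are not part of Alpaca's API. An inner join keeps only the dates on which every asset has a price:

```python
import pandas as pd


def align_closes(closes: dict) -> pd.DataFrame:
    """Combine per-symbol close series, keeping only dates present in all of them."""
    # join="inner" drops any date that is missing for at least one symbol.
    aligned = pd.concat(closes, axis=1, join="inner")
    return aligned.sort_index()


# Made-up prices for illustration; USO is missing the first date.
dates = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"])
gld = pd.Series([143.0, 145.0, 147.0], index=dates)
gdx = pd.Series([29.0, 29.5, 29.2], index=dates)
uso = pd.Series([12.7, 12.6], index=dates[1:])

prices = align_closes({"GLD": gld, "GDX": gdx, "USO": uso})
print(prices)
```

Dropping unmatched dates is one design choice; forward-filling gaps is another, but it can introduce stale prices into a cointegration test.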
Let's visualize the data to see if everything looks correct.
Hmmm, it looks like there was a significant price change in oil in late April. More on this later.
Running the Johansen test on the three series, we get the following output:
We're looking for our trace statistic, currently 20.089, to be above the critical values at the 90%, 95%, and 99% thresholds. Unfortunately, it isn't. So while this triplet strategy used to work, it doesn't work now, or does it?
The Challenges of Live Data
The problem with live data is that it isn’t adjusted. When I look at the data adjusted for splits, dividends, and corporate actions, it looks vastly different.
Look at the jump between April 28th and 29th. It appears as though USO had a 1-for-8 reverse split on April 29th.
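To make the effect of such a jump concrete, here is a minimal sketch of back-adjusting a raw close series for a reverse split that merges 8 shares into 1. The `back_adjust_for_reverse_split` helper and the prices are hypothetical, and the sketch handles only the split, not dividends or other corporate actions:

```python
import pandas as pd


def back_adjust_for_reverse_split(close: pd.Series, split_date: str, ratio: float) -> pd.Series:
    """Scale prices before `split_date` so they are comparable with later prices.

    When `ratio` old shares are merged into one new share, each pre-split
    price is multiplied by `ratio`.
    """
    adjusted = close.copy()
    before_split = adjusted.index < pd.Timestamp(split_date)
    adjusted[before_split] = adjusted[before_split] * ratio
    return adjusted


# Made-up raw closes with an artificial jump on the split date.
dates = pd.to_datetime(["2020-04-27", "2020-04-28", "2020-04-29", "2020-04-30"])
raw = pd.Series([2.0, 2.1, 17.0, 17.2], index=dates)

adjusted = back_adjust_for_reverse_split(raw, "2020-04-29", ratio=8)
print(adjusted)
```

After the adjustment the series no longer shows an eightfold jump, which is the kind of discontinuity that can distort a cointegration test run on raw prices.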
With the adjusted data, it appears as though we can reject the null hypothesis that the assets are not cointegrated with over 95% confidence!
In other words, you have to develop and backtest on adjusted data, and trade on live data. So if we were trading this strategy, what would that look like?
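One common way to turn a cointegrating relationship into trades, and not necessarily the approach behind this post, is to combine the assets into a single spread and trade its z-score. The sketch below uses synthetic prices, and the weights, lookback, and entry threshold are hypothetical placeholders. In practice the weights might come from the first eigenvector of the Johansen test (`result.evec[:, 0]` in statsmodels):

```python
import numpy as np
import pandas as pd


def spread_zscore(prices: pd.DataFrame, weights, lookback: int) -> pd.Series:
    """Rolling z-score of the weighted portfolio (spread) of the price columns."""
    spread = pd.Series(prices.to_numpy() @ np.asarray(weights), index=prices.index)
    rolling = spread.rolling(lookback)
    return (spread - rolling.mean()) / rolling.std()


# Synthetic prices for illustration only.
rng = np.random.default_rng(0)
n = 60
trend = np.cumsum(rng.normal(size=n))
prices = pd.DataFrame({
    "A": 100 + trend + rng.normal(scale=0.5, size=n),
    "B": 50 + 0.5 * trend + rng.normal(scale=0.5, size=n),
    "C": 20 + rng.normal(scale=0.5, size=n),
})

weights = [1.0, -2.0, 0.5]  # hypothetical hedge ratios
z = spread_zscore(prices, weights, lookback=20)

# Short the spread when it is rich (z > 1), buy it when it is cheap (z < -1),
# and stay flat otherwise. NaN z-scores from the warm-up window map to flat.
position = pd.Series(np.where(z > 1, -1, np.where(z < -1, 1, 0)), index=z.index)
print(position.value_counts())
```

This ignores transaction costs, position sizing, and exits, so treat it as a prototype to be examined in a proper backtest.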
If you've made it this far, you've probably realized determining if assets are cointegrated using Python is the easy part -- the hard part is finding cointegrations and wrangling the data.
I will leave it up to you to develop a trading strategy around this cointegrating triplet. I suggest using Pandas to prototype, and then using Backtrader or QuantConnect to further analyze and live trade.
Also, if you enjoy algorithmic trading, please check out Analyzing Alpha for more information on mean reversion strategies -- there’s a lot more to them.
And as always, all of the code is hosted on GitHub.
Technology and services are offered by AlpacaDB, Inc. Brokerage services are provided by Alpaca Securities LLC (alpaca.markets), member FINRA/SIPC. Alpaca Securities LLC is a wholly-owned subsidiary of AlpacaDB, Inc.