Understanding What Market Data You Are Using
An important but often under-appreciated facet of algo trading is market data. After all, data is what an algorithm uses to make buying and selling decisions. And in many cases, that data will consist of stock prices and volumes. But when it comes to the U.S. stock market, are all data feeds the same? No! There are many variants of market data being distributed, each with its own cost and attributes, and it’s incredibly valuable to understand what market data you are using.
For that reason, I’d like to shed some light on the differences between U.S. stock market data feeds, which can broadly be broken down into consolidated feeds and direct exchange feeds.
Here is a link to the Market Data section on Alpaca Docs.
Consolidated Market Data Feeds
Consolidated stock market data is an aggregated report of price and volume data from all securities exchanges and alternative trading venues. It is the most relied upon type of market data, providing investors and traders globally with a unified view of U.S. stock market prices and volumes. It also underpins the National Best Bid and Offer (NBBO), which gives investors a continuous view of the best available displayed buy and sell prices. Together with Rule 611 of Regulation NMS, the NBBO ensures that investors receive the best available displayed prices on their trades, with a few exceptions.
Consolidated reporting has been standardized since the 1970s, when Congress directed the Securities and Exchange Commission (“the SEC”) to establish a “National Market System” for securities trading. Specifically, data collection, aggregation, and distribution are formalized in the Consolidated Tape Association (CTA) Plan and the Unlisted Trading Privileges (UTP) Plan. These plans are administered by the Consolidated Tape Association and UTP Plan market participants, and they require each exchange to send its trade and quote information to a Securities Information Processor, also known as a SIP (there are two: one for Tape A/B and one for Tape C). Additionally, dark pools are required to report their transactions to a FINRA Trade Reporting Facility, which in turn reports to the SIP. The SIPs then aggregate all of this information and disseminate it to investors, traders, and data distributors. Rather than focus on SIP details (you can read more about the CTA Plan here and the UTP Plan here), I’d like to share more practical insights regarding consolidated data:
- Consolidated data is not suitable for latency-sensitive trading. Although SIP technology has been upgraded over the years, it still takes time to collect, process, and distribute exchange and dark pool data coming from different data centers. This becomes especially noticeable during fast moving markets where a large number of messages are generated. In other words, if you need to react as fast as possible, you can’t count on consolidated data!
- Consolidated quote data only provides “top of book” quotes. You can only see each exchange’s best bid and ask prices and quantities along with the NBBO. Orders placed deeper in the order book at inferior prices to the best bid and ask on each exchange are not reported in consolidated feeds. These deep book orders may provide valuable information to make better trading decisions.
- Consolidated quote data currently only includes round lots. Although both SEC staff and industry participants have advocated for changing this rule, odd lots are currently not eligible to set the best bid or offer. This may result in an incomplete picture of the best-priced liquidity, particularly for higher-priced stocks like Amazon, where odd lots priced inside the best bid and offer are frequently displayed.
- Consolidated trade data includes all last sale and last size information. Regarding actual transactions, consolidated data does a good job of including every trade, including odd lots.
- Consolidated data is not necessarily raw SIP data. There are many data distributors, including but not limited to Bloomberg, Reuters, Xignite, and broker-dealers. The consolidated data they distribute may not actually be SIP data, but some derivative or variant thereof. Distributors might throttle and bunch trades and quotes in snapshots. Or they might have symbol or symbol count limitations. Or they might have frequent outages or significant delays in distribution. Or they might have bad “ticks”. Or they might exclude one or more exchanges and create their own version of a consolidated feed. These issues are typically either due to limitations in the data distributor’s infrastructure or due to source data distribution costs.
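To make the top-of-book and round-lot points concrete, here is a minimal sketch of how a best bid and offer could be derived from each exchange’s best quote. It is an illustration only, not how the SIPs are implemented: the `Quote` type, the `nbbo` function, the exchange labels, and the prices are all hypothetical, and it assumes a standard 100-share round lot and ignores the real rules for aggregating odd lots.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

ROUND_LOT = 100  # assumption: a standard 100-share round lot


@dataclass(frozen=True)
class Quote:
    """One exchange's top-of-book quote (hypothetical structure)."""
    exchange: str
    bid_price: float
    bid_size: int  # shares
    ask_price: float
    ask_size: int  # shares


Side = Optional[Tuple[float, str]]  # (price, exchange) or None


def nbbo(quotes: Iterable[Quote], round_lots_only: bool = True) -> Tuple[Side, Side]:
    """Return the best bid and best offer across all exchanges.

    With round_lots_only=True, quotes smaller than one round lot are
    ignored, which mimics the round-lot restriction described above.
    """
    quotes = list(quotes)
    min_size = ROUND_LOT if round_lots_only else 1
    bids = [(q.bid_price, q.exchange) for q in quotes if q.bid_size >= min_size]
    asks = [(q.ask_price, q.exchange) for q in quotes if q.ask_size >= min_size]
    best_bid = max(bids, key=lambda b: b[0]) if bids else None
    best_ask = min(asks, key=lambda a: a[0]) if asks else None
    return best_bid, best_ask


# Made-up quotes: exchange "C" shows better prices, but only in odd lots.
quotes = [
    Quote("A", 100.00, 200, 100.10, 300),
    Quote("B", 100.02, 100, 100.08, 100),
    Quote("C", 100.05, 40, 100.06, 25),
]

print(nbbo(quotes))                         # best bid and offer both come from B
print(nbbo(quotes, round_lots_only=False))  # best bid and offer both come from C
```

In this made-up example the round-lot spread is 100.02 to 100.08, while the odd-lot quotes on exchange C would tighten it to 100.05 to 100.06. That gap is the “incomplete picture” a round-lot-only feed can leave you with.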
In summary, consolidated market data feeds provide a useful aggregated reporting of trades and quotes. Although it isn’t perfect, the full SIP feed provides you with more information than other consolidated feeds. As a real money brokerage account holder at Alpaca (brokerage services are offered through Alpaca Securities LLC), you can easily consume and analyze SIP data using Alpaca’s API.
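As a rough illustration of why a throttled feed carries less information than a full tick-by-tick feed, the sketch below keeps only the last trade in each time bucket. The `throttle` function, the one-second interval, and the tick values are assumptions made up for this example; actual distributors may conflate data in different ways.

```python
from typing import List, Tuple

Tick = Tuple[float, float]  # (timestamp in seconds, trade price)


def throttle(ticks: List[Tick], interval: float) -> List[Tick]:
    """Keep only the last tick seen in each `interval`-second bucket.

    Assumes `ticks` is already sorted by timestamp.
    """
    snapshots = {}
    for ts, price in ticks:
        bucket = int(ts // interval)
        snapshots[bucket] = (ts, price)  # later ticks overwrite earlier ones
    return [snapshots[b] for b in sorted(snapshots)]


# Made-up trades over two seconds.
ticks = [(0.1, 10.00), (0.4, 10.05), (0.9, 9.98), (1.2, 10.01), (1.8, 10.02)]

snapshots = throttle(ticks, interval=1.0)
print(snapshots)  # two snapshots remain out of five ticks

full_high = max(price for _, price in ticks)
snapshot_high = max(price for _, price in snapshots)
print(full_high, snapshot_high)  # the 10.05 high never appears in the snapshots
```

A backtest that relies on intraday highs and lows would see different values on the throttled series than on the full one, which is one way slight feed differences can affect strategy research.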
Exchange Proprietary Market Data Feeds
While the SIP or another consolidated stream is entirely sufficient for most investors and traders as their source for stock quotes and trades, most exchanges sell their own proprietary data feeds, which provide additional quote and trade information. These proprietary feeds are typically used by professional day traders, who visually interpret the full order book for additional information. The feeds are also used in a non-display format by high-frequency traders and market makers, who may incorporate all order book and trade information into their trading strategies and may benefit from reacting as quickly as possible to this information. Although this may sound concerning, there’s nothing nefarious here, as these are all public feeds that any individual or company can purchase.
So why don’t all algo traders use direct exchange feeds?
Sadly, this is mainly due to cost. These feeds cost tens of thousands of dollars per month, and to optimally use them, one would likely want to subscribe to all of them, co-locate their servers in the same data centers, and use microwaves to transmit the information amongst the data centers. Needless to say, this is incredibly expensive to operate and can only be justified by companies operating at scale. Nevertheless, there has been a lot of discussion by the SEC and industry participants regarding stock market data costs as of late, and there appears to be some hope that data costs will be reined in.
Practically speaking, do direct exchange feeds give traders an advantage?
For short-term quantitative trading, I think most participants would agree direct exchange feeds provide material information about individual orders and are needed to reduce latency. However, I think participants would also agree that the feeds alone do not provide a trading edge, and particularly if you are trading longer-term strategies algorithmically, you likely aren’t missing out.
IEX Market Data
One proprietary exchange feed worth highlighting is IEX’s market data feed. Unlike other exchanges, IEX currently provides its market data for free. There’s no catch (at least for now), as IEX wants to promote activity on its exchange and has stated that “legacy stock exchanges obstruct transparency and create an uneven playing field by overcharging for market data on orders they did not create.” IEX’s market data includes both top of book and last sale information as well as aggregated deep book information (although not a full stream of every event, which IEX does not provide). You can read more about IEX’s market data here, but the important point to remember is that these feeds only include orders and executions on the IEX order book.
Considering that IEX’s market share based on total consolidated volume is around 2.5 to 3% as of December 2018, the IEX trade and quote feeds may be missing out on significant information contained in the other venues’ trades and quotes. Nevertheless, IEX data is a good starting point and is easy to access and use for paper trading with Alpaca’s API.
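If you have consolidated trade data on hand, a quick way to gauge how much a single-venue feed would miss is to compute that venue’s share of total volume. The sketch below is only an illustration: the `venue_volume_share` function, the venue labels, and the trade sizes are hypothetical and were chosen to land near the roughly 3% figure mentioned above.

```python
from collections import Counter
from typing import Iterable, Tuple


def venue_volume_share(trades: Iterable[Tuple[str, int]], venue: str) -> float:
    """Return the fraction of total traded shares executed on `venue`.

    `trades` is an iterable of (venue, size_in_shares) pairs.
    """
    volume = Counter()
    for trade_venue, size in trades:
        volume[trade_venue] += size
    total = sum(volume.values())
    return volume[venue] / total if total else 0.0


# Made-up consolidated trades for one symbol.
trades = [
    ("IEX", 300),
    ("NYSE", 5000),
    ("NASDAQ", 4000),
    ("TRF", 700),  # off-exchange trades reported through a Trade Reporting Facility
]

share = venue_volume_share(trades, "IEX")
print(f"IEX share of volume: {share:.1%}")
print(f"Volume not visible on an IEX-only feed: {1 - share:.1%}")
```

Running this per symbol and per day would show where a single-venue feed is a reasonable proxy and where it is thin.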
Market data is the lifeblood of automated trading, and whether you are new to the field or experienced, you should be aware of the details of the market data powering your algorithms. There are pros and cons to each data feed, and some will be more useful than others for specific types of trading. Slight differences in feeds can potentially have significant effects on your strategy research or trade execution. Alpaca’s API makes it easy to consume and analyze both IEX data and SIP data.
Technology and services are offered by AlpacaDB, Inc. Brokerage services are provided by Alpaca Securities LLC (alpaca.markets), member FINRA/SIPC. Alpaca Securities LLC is a wholly-owned subsidiary of AlpacaDB, Inc.
You can find us on Twitter @AlpacaHQ.