In our last post “How to Setup Bitcoin Historical Price Data for Algo Trading in Five Minutes”, we introduced how to set up bitcoin price data in five minutes and we got a lot of good feedback and contributions to the open source MarketStore.
The data speed is really important
Today, I wanted to tell you how fast MarketStore is using the same data so that you can see the performance benefit of using the awesome open source financial timeseries database.
Faster data means more backtesting and more training in machine learning
Faster data means more backtesting and more training in machine learning for our trading algorithm. We are seeing a number of successful machine learning-based trading algos in the space, but one of the key points we learned is the data speed is really important. It's important not just for backtesting, but also for training AI-style algorithms since it by nature requires an iterative process.
This is another post to walk you through step by step. But TL;DR, it is really fast.
Setup

Last time, we showed how to setup the historical daily bitcoin price data with MarketStore.
This time, we store all the minute-level historical prices using the same mechanism called background worker, but with a slightly different configuration.
Almost 2.5 years with more than 1 million bars
The difference from last time is that background worker is configured to fetch 1-minute bar data instead of 1-day bar data, starting from 2016-01-01. That is almost 2.5 years with more than 1 million bars. You will need to keep the server up and running for a day or so to fill all the data, since GDAX’s historical price API does not allow you to fetch that many data points quickly.
Again, the data fetch worker carefully controls the data fetch speed in case the API returns “Rate Limit” error. So you just need to sleep on it.
Additional configuration here is something called “on-disk aggregate” trigger. What it does is to aggregate 1-minute bar data for lower resolutions (here 5 minutes, 15 minutes, 1 hour, and 1 day).
Check the longer time horizon to verify the entry/exit signals
In a typical trading strategy, you will need to check the longer time horizon to verify the entry/exit signals even if you are working on the minute level. So it is a pretty important feature. You would need pretty complicated LEFT JOIN query to achieve the same time-windowed aggregate in SQL. But with MarketStore, all you need is this small section in the configuration file.
The machine we are using for this test is a typical Ubuntu virtual machine with 8 of Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz, 32GB RAM and SSD.
The Benchmark
Unfortunately lots of people in this space are using some sort of SQL database
We are going to have a DataFrame object in python which holds all the minute level historical price data of bitcoin since January of 2016 from the server. We compare MarketStore and PostgreSQL.
PostgreSQL is not really meant to be the data store for this type of data, but unfortunately lots of people in this space are using some sort of SQL database for this purpose since there is no other alternative. That’s why we built MarketStore.
The table definition of the bitcoin data in PostgreSQL side looks like this.
The code
The input and output is basically the same, in that one symbol name is given, query the remote server over the network, and get one DataFrame. One strong benefit of MarketStore is you don’t need much client code to get the DataFrame object since the wire protocol is designed to give an array of numbers efficiently.
The Result
First, PostgreSQL
And MarketStore
Both results of course look the same like below.
50 times difference
A bitcoin was about $430 back then… Anyway, you can see the difference between 0.3 vs 15 seconds which is about 50 times difference. Remember, you may need to get the same data again and again for different kinds of backtesting and optimization as well as ML training.
Also you may want to query not just bitcoins but also other coins, stocks and fiat currencies, since the entire database wouldn’t fit into your main memory usually.
Scalability advantage in MarketStore
MarketStore can serve multiple symbol/timeframe in one query pretty efficiently, whereas with PostgreSQL and other relational databases you will need to query one table at a time, so there is also scalability advantage in MarketStore when you need multiple instruments.
Querying 7.7K symbols for US stocks
To give some sense of this power, here is the result of querying 7.7K symbols for US stocks done as an
Again, the amount of data is the same, and in this case each DataFrame is not as large as the bitcoin case, yet the difference to expand to large number of instruments is significant (more than 10 times). You can imagine in real life these two (per instrument and multi-instruments) factors multiply the data cost.
Alpaca has been using MarkStore in our production
Alpaca has been using MarkStore in our production for algo trading use cases both in our proprietary customers and our own purposes. It is actually amazing that this software is available to everyone for free, and we leverage this technology to help our algo trading customers.
Thanks for reading and enjoy algo trading!