How to Set Up Bitcoin Historical Price Data for Algo Trading in Five Minutes

One of the great benefits of algorithmic trading is that you can test your trading strategy against historical data.

Especially with new strategies you developed on your own, you don’t really know how they will perform until you test them against reliable data. Algo trading is not that different from software development. Today, most good software is continuously tested. As the code evolves, developers keep testing it against real use cases to make sure that changes won’t cause failures later. You want to test your algo when you release the very first version, and you also want to test it again whenever you make changes or adjustments while it is running.

Here is one problem you face: data.

While it is always good to test as many angles as possible, running many backtests is a data-intensive workload, and it requires enough data to cover a very long history. This is why you cannot run these iterations against the GDAX API directly. You need to store the data somewhere of your own.

The Tutorial

So the first question that comes to mind is always how to get the data and prepare it for backtesting. Today I am going to show you how to use MarketStore to acquire a long history of Bitcoin price data so that you can run the most accurate backtest possible. And this setup tutorial is going to be quick. Don’t waste your time setting up this and that. All you need is:

  • Docker (on Windows, Mac or Linux),
  • a console terminal,
  • and a big cup of strong coffee! ☕

Got’em? Alright, let’s get started.

Architecture

Here is the high-level picture of today’s system. We will start a MarketStore instance in a Docker container and run a background worker that calls the GDAX price API, so that we can quickly pull the historical bitcoin prices from their endpoint and make them available for backtest clients to query over HTTP.

We will start another container for the client, using an Anaconda image with Python 3. We use the official client package, pymarketstore. You will get a DataFrame back from MarketStore.

Set Up the MarketStore Server

An official build of the MarketStore Docker image is publicly available on Docker Hub, but first, let’s write a config file for the server.

In the GitHub repository you can find an example config file in YAML format: https://github.com/alpacahq/marketstore/blob/master/mkts.yml. Here is the one we use in this tutorial.

root_directory: /project/data/mktsdb
listen_port: 5993
log_level: info
queryable: true
stop_grace_period: 0
wal_rotate_interval: 5
enable_add: true
enable_remove: false
enable_last_known: false
bgworkers:
  - module: gdaxfeeder.so
    name: GdaxFetcher
    config:
      symbols:
        - BTC
      base_timeframe: "1D"
      query_start: "2018-01-01 00:00"

This configures the server to fetch 1-day bars from the GDAX historical price API, starting from 2018-01-01. Save this config as $PWD/mkts.yml. The server listens on port 5993 by default. Now let’s bring up the server.
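If you want to generate variations of this config (a different start date, for example), a small script can render the file for you. This is only a minimal sketch using the Python standard library: the `render_config` helper and its parameters are hypothetical, and the keys are simply copied from the example config above.

```python
from string import Template

# Template mirroring the example config above; only a few values are variable.
CONFIG_TEMPLATE = Template("""\
root_directory: /project/data/mktsdb
listen_port: $port
log_level: info
queryable: true
stop_grace_period: 0
wal_rotate_interval: 5
enable_add: true
enable_remove: false
enable_last_known: false
bgworkers:
  - module: gdaxfeeder.so
    name: GdaxFetcher
    config:
      symbols:
$symbols
      base_timeframe: "$timeframe"
      query_start: "$query_start"
""")


def render_config(symbols, timeframe="1D", query_start="2018-01-01 00:00", port=5993):
    """Return mkts.yml text for the given symbols (hypothetical helper)."""
    # Each symbol becomes one YAML list item under `symbols:`.
    symbol_lines = "\n".join(f"        - {s}" for s in symbols)
    return CONFIG_TEMPLATE.substitute(
        port=port, symbols=symbol_lines, timeframe=timeframe, query_start=query_start
    )


if __name__ == "__main__":
    print(render_config(["BTC"]))
```

Redirect the output to $PWD/mkts.yml and the rest of the tutorial works unchanged.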

$ docker run -v $PWD/mktsdb:/project/data/mktsdb -v $PWD/mkts.yml:/tmp/mkts.yml --net host alpacamarkets/marketstore: marketstore -config /tmp/mkts.yml

Docker should automatically download the image from Docker Hub if you don’t have it yet, and then start the server process with the config. Hopefully, you will see something like this:

I0430 05:54:56.091770 1 log.go:14] Disabling "enable_last_known" feature until it is fixed...
I0430 05:54:56.092200 1 log.go:14] Initializing MarketStore...
I0430 05:54:56.092236 1 log.go:14] WAL Setup: initCatalog true, initWALCache true, backgroundSync true, WALBypass false:
I0430 05:54:56.092340 1 log.go:14] Root Directory: /project/data/mktsdb
I0430 05:54:56.097066 1 log.go:14] My WALFILE: WALFile.1525067696092950500.walfile
I0430 05:54:56.097104 1 log.go:14] Found a WALFILE: WALFile.1525067686432055600.walfile, entering replay...
I0430 05:54:56.100352 1 log.go:14] Beginning WAL Replay
I0430 05:54:56.100725 1 log.go:14] Partial Read
I0430 05:54:56.100746 1 log.go:14] Entering replay of TGData
I0430 05:54:56.100762 1 log.go:14] Replay of WAL file /project/data/mktsdb/WALFile.1525067686432055600.walfile finished
I0430 05:54:56.101506 1 log.go:14] Finished replay of TGData
I0430 05:54:56.109380 1 plugins.go:14] InitializeTriggers
I0430 05:54:56.110664 1 plugins.go:42] InitializeBgWorkers
I0430 05:54:56.110742 1 log.go:14] Launching rpc data server...
I0430 05:54:56.110800 1 log.go:14] Launching heartbeat service...
I0430 05:54:56.110822 1 log.go:14] Enabling Query Access...
I0430 05:54:56.110844 1 log.go:14] Launching tcp listener for all services...

If you see something like “Response error: Rate limit exceeded”, that’s a good sign, not a bad one: it means the background worker successfully fetched price data and reached the rate limit. The fetch worker will pause for a while and then resume automatically to catch up to the current price. You just need to keep it running.
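Pausing and resuming like this is a common pattern for rate-limited APIs. The sketch below illustrates the general idea only; it is not MarketStore’s actual worker code, and `RateLimitError`, `fetch_page`, and `fetch_all` are hypothetical names made up for this example.

```python
import time


class RateLimitError(Exception):
    """Raised by the (hypothetical) fetcher when the API says to slow down."""


def fetch_all(fetch_page, pages, wait_seconds=1.0, max_retries=5, sleep=time.sleep):
    """Fetch every page in order, pausing and retrying on rate-limit errors."""
    results = []
    for page in pages:
        for attempt in range(max_retries):
            try:
                results.append(fetch_page(page))
                break  # this page succeeded, move on to the next one
            except RateLimitError:
                # Wait a little longer after each consecutive failure.
                sleep(wait_seconds * (attempt + 1))
        else:
            # The inner loop never hit `break`: every attempt was rate limited.
            raise RuntimeError(f"gave up on page {page!r} after {max_retries} retries")
    return results
```

The `sleep` parameter is injectable so the retry logic can be exercised in a test without actually waiting.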

Client Side

MarketStore implements JSON-RPC and MessagePack-RPC for queries. MessagePack-RPC is particularly important for query performance on large datasets. Thankfully, there are already Python and Go client libraries, so you don’t have to implement the protocol yourself. In this article, we use Python. From another terminal, we start from the miniconda3 image.

$ docker run -it --rm -v $PWD/client.py:/tmp/client.py --net host continuumio/miniconda3 bash
# pip install ipython pymarketstore

We have installed ipython and pymarketstore, including their dependencies. From this terminal, let’s start an ipython shell and query MarketStore data.

(base) root@hq-dev-01:/# ipython
Python 3.6.4 |Anaconda, Inc.| (default, Jan  2018, 18:10:19)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.3.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pymarketstore as pymkts

In [2]: param = pymkts.Params('BTC', '1D', 'OHLCV', limit=100)

In [3]: df = pymkts.Client('http://localhost:5993/rpc').query(param).first().df()

In [4]: df[-10:]
Out[4]:
                              Open     High      Low    Close        Volume
Epoch
2018-04-14 00:00:00+00:00  7893.19  8150.00  7830.00  8003.11   9209.196953
2018-04-15 00:00:00+00:00  8003.12  8392.56  8003.11  8355.25   9739.103514
2018-04-16 00:00:00+00:00  8355.24  8398.98  7905.99  8048.93  13137.432715
2018-04-17 00:00:00+00:00  8048.92  8162.50  7822.00  7892.10  10537.460361
2018-04-18 00:00:00+00:00  7892.11  8243.99  7879.80  8152.05  10673.642535
2018-04-19 00:00:00+00:00  8152.05  8300.00  8101.47  8274.00  11788.032811
2018-04-20 00:00:00+00:00  8274.00  8932.57  8216.21  8866.27  16076.648797
2018-04-21 00:00:00+00:00  8866.27  9038.87  8610.70  8915.42  11944.464063
2018-04-22 00:00:00+00:00  8915.42  9015.00  8754.01  8795.01   7684.827002
2018-04-23 00:00:00+00:00  8795.00  8991.00  8775.10  8940.00   3685.109169

Voilà! You now have the daily bitcoin prices in hand as a DataFrame. Note that the second line (param = …) determines which symbol and timeframe to query, along with query predicates such as the number of rows or the date range. From here, you can do a number of things, including calculating indicators such as moving averages and Bollinger Bands, or finding statistical volume anomalies using a scipy package.
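To make the indicator idea concrete, here is a minimal sketch that computes a simple moving average and Bollinger Bands over the ten closing prices shown in the output above. It uses only the standard library so it runs anywhere; the `bollinger` helper is hypothetical, and the 5-day window is chosen only because we have ten rows to work with (20 periods is the more common setting).

```python
from statistics import mean, pstdev

# Daily closes copied from the query output above (2018-04-14 to 2018-04-23).
closes = [8003.11, 8355.25, 8048.93, 7892.10, 8152.05,
          8274.00, 8866.27, 8915.42, 8795.01, 8940.00]


def bollinger(prices, window=5, num_std=2.0):
    """Return (middle, upper, lower) band lists, one entry per full window."""
    middle, upper, lower = [], [], []
    for end in range(window, len(prices) + 1):
        chunk = prices[end - window:end]
        avg = mean(chunk)
        spread = num_std * pstdev(chunk)  # population std dev of the window
        middle.append(avg)
        upper.append(avg + spread)
        lower.append(avg - spread)
    return middle, upper, lower


middle, upper, lower = bollinger(closes)
for m, u, l in zip(middle, upper, lower):
    print(f"SMA={m:.2f} upper={u:.2f} lower={l:.2f}")
```

With the DataFrame from the session, `df['Close'].rolling(5).mean()` gives the same moving average. Note that pandas’ `rolling(...).std()` defaults to the sample standard deviation, so pass `ddof=0` if you want it to match the population version used here.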

Conclusion

I want to emphasize that a performant historical dataset is very important for studying and developing a trading algorithm, and you can build one quickly with MarketStore, as we have just walked through. This article demonstrated how to work with bitcoin prices from GDAX, but you can hook up other data sources pretty easily as well using pymarketstore’s write method. You can also write your own custom background data fetcher.
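As a rough illustration of preparing data from another source, the sketch below parses daily bars from CSV text into (Epoch, Open, High, Low, Close, Volume) records. The CSV layout and the `rows_from_csv` helper are assumptions made up for this example, and the sample rows are simply the last two days from the output above.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical CSV export from some other data source.
SAMPLE = """date,open,high,low,close,volume
2018-04-22,8915.42,9015.00,8754.01,8795.01,7684.827002
2018-04-23,8795.00,8991.00,8775.10,8940.00,3685.109169
"""


def rows_from_csv(text):
    """Parse CSV text into (epoch_seconds, open, high, low, close, volume) tuples."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        # Treat each date as midnight UTC, matching the index in the output above.
        day = datetime.strptime(rec["date"], "%Y-%m-%d").replace(tzinfo=timezone.utc)
        rows.append((
            int(day.timestamp()),
            float(rec["open"]), float(rec["high"]),
            float(rec["low"]), float(rec["close"]), float(rec["volume"]),
        ))
    return rows


for row in rows_from_csv(SAMPLE):
    print(row)
```

Records like these could then be packed into a NumPy structured array and handed to the client’s write method; check the pymarketstore README for the exact call and the dtypes it expects.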

Again, query performance is going to be critical when it comes to backtesting, since you want to iterate quickly to get results. Now you may wonder how fast MarketStore can be. I will show its lightning-fast query speed on a huge dataset in the next post.
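If you want to get a feel for query latency on your own machine in the meantime, a tiny timing helper is enough. This is a minimal sketch with a hypothetical `best_time` function; the workload below is a stand-in, not a real MarketStore query.

```python
import time


def best_time(func, repeats=5):
    """Return the fastest wall-clock time (in seconds) over several calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Stand-in workload; swap in a real query, for example
    # lambda: client.query(param), once a client is available.
    elapsed = best_time(lambda: sum(range(100_000)))
    print(f"best of 5: {elapsed * 1000:.2f} ms")
```

Taking the minimum over several runs filters out one-off noise such as a cold cache or a busy CPU.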

In the meantime, please leave any questions in the comments or ask @AlpacaHQ regarding this tutorial. Leave your email below so we can notify you when we can grant access to the full trading platform! You can also check us out at https://alpaca.markets.

Happy algo trading!

Co-founder & CEO of Alpaca