You've successfully subscribed to Alpaca Resources
Great! Next, complete checkout for full access to Alpaca Resources
Welcome back! You've successfully signed in.
Success! Your account is fully activated, you now have access to all content.
Success! Your billing info is updated.
Billing info update failed.
  • Home
  • We're hiring
  • Forum
  • Docs
  • Log in
  • Sign up
Market Data

Generating Alternative Bars in Alpaca (Part 1 - Introduction)

Harkishan Singh Baniya
Harkishan Singh Baniya

Bars (or candles) are the most common form of financial data consumed by finance practitioners. Now, when I say bars, most of you must be thinking about time bars (e.g. daily bars, hourly bars or minute bars). I must say they are quite popular but the concept of building/sampling time bars carry two major flaws  -

Bar Definition
A bar is a graphical representation of a financial instrument’s price movement that usually contains the open, high, low and closing prices for a set period.

1) Markets are not chronological: While sampling time bars we assume that markets will process information at a constant time interval (say every hour, for hourly bars). That means we are assuming trades occur at a constant time interval (like every hour), which is not true. For example, we see a high volume during the market opening and closing hours and steady during the rest of the day for most assets. Also, the trade activity increase near earnings call or similar market events. These events cannot be captured by simple time bars.

2) Poor statistical properties: By sampling time bars at a constant time interval also gives rise to serial correlations, non-normality, non-IID, and heteroscedasticity.

Non Normal Distribution - Statistics How To
Non normal distribution definition and examples. Dozens of articles and videos explaining non normal distributions. Statistics made simple!

These properties are against the assumptions of most statistical models, making it a difficult choice for analysis and predictions. These properties arise due to the prevailing market noise and incorrect sampling.

Regression Model Assumptions
The following linear regression assumptions are essentially the conditions that should be met before we draw inferences regarding the model estimates or before we use a model to make a prediction.

Alternative Bars

Are there any solutions that tackle both problems?

The answer is yes. We can solve both problems by sampling on the trading activities. And in this three-part article, I will discuss how to deal with problems using various “alternative” bars. I will talk about various sampling methods and how can you generate alternative bars in Alpaca using its Trade API. Also, we will discuss how you leverage “alternative” bars for building a better trading strategy with examples.

Part - I (Introduction): In this part, we will learn to generate various alternative bars especially event-driven bars like tick bars, volume bars and dollar bars.

Part - II (Analysis & Trading Strategy): In this part, we will analyse various event-driven bars that were generated in Part-I and will also develop a simple trading strategy using the same bars.

Part - III (Integration in Alpaca API and an ML Trading Strategy): In this part, we will learn to build a live machine learning based strategy that uses the Alpaca Trade API integration for alternative bars.

The book Advances in Financial Machine Learning discusses various ideas for generating alternative bars that can categorize as - 1) Event-driven bars and 2) Imbalance/ Information driven bars. In this article, we will only discuss Event-driven bars and the Information driven bars are kept for the future.

Advances in Financial Machine Learning: Lopez de Prado, Marcos: 9781119482086: Amazon.com: Books
Advances in Financial Machine Learning [Lopez de Prado, Marcos] on Amazon.com. *FREE* shipping on qualifying offers. Advances in Financial Machine Learning

Event-Driven Bars

The main objective behind event-driven bars is to overcome asymmetric information contained inside time bars i.e. the bars sampled at a constant time interval and ignore the prevailing market activities or events that are taking place. So, the idea is to sample frequently when there is more information or the market is aggressive and sample less when markets are steady or less informative. To explain it better I would like to consider using an example: A typical stock receives high amounts of volume or trades during the open and closing hours. So bars are generated more frequently during those hours and less during the rest of the day. And by sampling, I mean to aggregate the trades on a trading metric like a number of ticks or volume of trades. And by aggregating on different trade metrics, we can generate bars for that specific aggregating metric.

The below function is responsible for aggregating metrics like volume (cum_volume) and if it exceeds the given threshold a volume bar is sampled. Similarly,  we can pass different metrics to aggregate like tick, volume or dollar value and generate tick bars, volume bars or dollar bars respectively.

***Warning: The following code needs access to Polygon tick data therefore if you do not have a funded live trading account through Alpaca you will receive a Warning Code 1000 followed by an error message of “Invalid Polygon Credentials”.***

def aggregate_bar(self, data):
        """
        Aggregate with the arrival of new trades data

        :param data : A data object containing the ticks of a
                      single timestamp or tick.
        """
        self.cum_count['cum_tick'] += 1
        self.cum_count['cum_volume'] += data.size
        self.cum_count['cum_dollar_value'] += data.price*data.size
        self.price.append(data.price)
        self.volume.append(data.size)
        #check the side of the trade
        tick_sign = self._check_tick_sign(data.price)
        if tick_sign > 0:
            self.cum_count['cum_buy_tick'] += 1
            self.cum_count['cum_buy_volume'] += data.size
            self.cum_count['cum_buy_dollar_value'] += data.price*data.size

        if self.cum_count[self.stat] >= self.threshold:
            #getting the vwap
            vwap = np.multiply(self.price, self.volume).sum()/
                   sum(self.volume)
            bar = {'timestamp': str(data.timestamp),
                   'symbol' : data.symbol,'open':self.price[0], 
                   'high':max(self.price), 'low': min(self.price), 
                   'close': data.price, 'vwap' : vwap}
            #join the cumulative metrics to the bar
            bar.update(self.cum_count)
            #save the bar
            self.save_bar(list(bar.values()))
            print(bar)
            self._reset_cache()
            return bar
        return False

Tick Bars

The intuition behind tick bars is that we will sample a bar when a predefined number of transactions (or ticks/trades) takes place. E.g. - We set the algorithm to sample a bar when 10000 ticks (threshold) of a particular asset have been completed.  This way we can know the arrival of new information in the market because the bars will be formed frequently.

def get_tick_bars(symbols:Union[str, list], threshold:Union[int, dict], 
                  save_to:str):
    """
    Get RealTime Tick Bars.

    :param symbols :(str or list) a ticker symbol or a list of ticker 
                    symbols to generate the bars.
    :param threshold :(int or dict) threshold for bar formation or 
                      sampling. A dictionary must be given if bars 
                      to generate for multiple symbols. The dictionary 
                      keys are ticker symbols and values are the 
                      thresholds respectively.
    :param save_to :(str) the path to store the bars.
    :return :(None)
    """
    get_bars('tick_bar', symbols, threshold, save_to)

Volume Bars

It aims to solve the problem order fragmentation which is very common with electronic trading. Instead of sampling for number transactions, we can sample for actual volume that was traded i.e. for a pre-defined number of shares that was exchanged. E.g. - we can sample a bar every time 10000 shares (threshold)  of an asset was traded, without considering how many transactions it had involved.

Reading Into Fragmentation
Fragmentation is the use of different suppliers and component manufacturers in the production of a good.
def get_volume_bars(symbols:Union[str, list], threshold:Union[int, dict],
                    save_to:str):
        """
        Get RealTime Volume Bars.

        :param symbols :(str or list) a ticker symbol or a list of ticker
                        symbols to generate the bars.
        :param threshold :(int or dict) threshold for bar formation or 
                          sampling. A dictionary must be given if bars 
                          to generate for multiple symbols. The dictionary 
                          keys are ticker symbols and values are the 
                          thresholds respectively.
        :param save_to :(str) the path to store the bars.
        :return :(None)
        """
        get_bars('volume_bar', symbols, threshold, save_to)

Dollar Bars

The idea behind dollar bars is that we will sample a bar when a predefined amount of dollar value is exchanged in the market. This way we can try to analyse significant price movements empirically. E.g. - we can sample a bar for SPY ETF whenever 10 million dollars (threshold) worth of shares has been exchanged/traded. The dollar value of a trade is the share traded times the share price of that asset.

def get_dollar_bars(symbols:Union[str, list], threshold:Union[int, dict],
                    save_to:str):
        """
        Get RealTime Dollar Bars.

        :param symbols :(str or list) a ticker symbol or a list of ticker
                        symbols to generate the bars.
        :param threshold :(int or dict) threshold for bar formation or
                          sampling. A dictionary must be given if bars
                          to generate for multiple symbols. The dictionary
                          keys are ticker symbols and values are the
                          thresholds respectively.
        :param save_to :(str) the path to store the bars.
        :return :(None)
        """
        get_bars('dollar_bar', symbols, threshold, save_to)

Generation Bars

Currently, I am working on integrating the event-driven bars with Alpaca API, so that a user can generate the bars in real time directly from the API. But till the time a developer can clone the GitHub repo of this project and follow instructions.

Harkishan-99/Alternative-Bars
Generate various Alternative Bars both historically and at real-time. - Harkishan-99/Alternative-Bars

First of all to get the streaming data we need to connect to the Alpaca API. We will be using the WebSocket streaming API to receive data in real time and aggregate them. So, I created a convenient script called connection.py which contains a Client class that initializes the connection with the API using the API credentials from a config file.

import configparser as ConfigParser
import alpaca_trade_api as tradeapi

class Client:
    """
    A class to establish a connection with the brokers API.
    """

    def __init__(self):
        """
        Establish the API connection
        """
        configParser = ConfigParser.RawConfigParser()
        configFile = 'config.cfg'
        configParser.read(configFile)
        self.api_key = configParser.get('alpaca', 'api_key')
        self.api_secret = configParser.get('alpaca', 'api_secret')
        self.base_url = configParser.get('alpaca', 'base_url')

    def connect(self):

Bars are generated using the get_bar function assigning different aggregating metrics for different bar types. The function creates CSV file for saving the generate bars and assigns the column headers if the file doesn’t exist before. Else, it will append the bar to the same existing file. Then we create instances of the EventDrivenBar class for all the symbols and set an event handler to handle the incoming trades in an asyncio loop and aggregate it till the threshold is met. The bar is then appended to the CSV file. Sample datasets of bars can be seen in the Github repository of this project under sample datasets.

def get_bars(bar_type:str, symbols:Union[str, list], 
             threshold:Union[int, dict], save_to:str):
    """
    Get the realtime bar using the Streaming API.

    :param bar_type :(str) Type of bar to form. Either "tick_bar", 
                    "volume_bar" or "dollar_bar".
    :param symbols :(str or list) a ticker symbol or a list of ticker 
                    symbols to generate the bars.
    :param threshold :(int or dict) threshold for bar formation or 
                      sampling. A dictionary must be given if bars 
                      to generate for multiple symbols. The 
                      dictionary keys are ticker symbols and values 
                      are the thresholds respectively.
    :param save_to :(str) the path to store the bars.
    """
    #create a save file and a directory structure
    save_to = save_to + '/' + bar_type
    if not os.path.exists(save_to):
        os.makedirs(save_to)
    #the file path and name
    save_to = save_to + '/' + 'realtime.csv'
    #check if the file exist
    if not os.path.exists(save_to):
        #write the header of the CSV file
        with open(save_to, 'w', newline='') as f:
            #the header
            header_ = ['timestamp', 'symbol', 'open', 'high', 
                       'low', 'close', 'vwap', 'cum_tick', 
                       'cum_volume', 'cum_dollar_value','cum_buy_tick',
                       'cum_buy_volume', 'cum_buy_dollar_value']
            # Create a writer object from csv module
            csv_writer = csv.writer(f)
            # Add contents of list as last row in the csv file
            csv_writer.writerow(header_)
    #if the file exists then we will append the bars to the same file.
    #initiate instances of symbols
    instances = {}
    if isinstance(symbols, list):
        #multi-symbol
        channels = ['trade_updates']+['T.'+sym.upper() for sym in symbols]
        for symbol in symbols:
            #create a separate instance for each symbols
            instances[symbol] = EventDrivenBars(bar_type, 
                                                threshold[symbol], 
                                                save_to)
    else:
        #single symbols
        channels = ['trade_updates', f'T.{symbols.upper()}']
        if isinstance(threshold, int):
            #threshold is given as a int type
            instances[symbols] = EventDrivenBars(bar_type, 
                                                 threshold,
                                                 save_to)
        elif isinstance(threshold, dict):
            #threshold is given as a dict type
            instances[symbols] = EventDrivenBars(bar_type, 
                                                 threshold[symbol], 
                                                 save_to)
        else:
            raise TypeError(f'The given threshold is a {type(threshold)}\
                            expecting a int or a dict')

    @conn.on(r'T$')
    async def on_trade(conn, channel, data):
         if data.symbol in instances and data.price > 0 and data.size > 0:
             bar = instances[data.symbol].aggregate_bar(data)

    conn.run(channels)

Real-time Bars

To generate the bars in real time users need to set up the config file with their API credentials i.e. api_key and api_secret.

Alpaca - Commission-Free API First Stock Brokerage
Alpaca is a modern platform for trading. Alpaca’s API is the interface for your trading algorithms, bots, or applications to communicate with Alpaca’s brokerage and other services.
[alpaca]
api_key = "enter your api key here without quotes"
api_secret = "enter your api secret here without quotes"
base_url = https://paper-api.alpaca.markets

Now, the bars can be generated using the bars.py script. In the example code below I have how the setups for generating types of event-driven bars discussed in this article. I have defined the same threshold for all the individual assets but it is not recommended as different assets have unique trading properties like market capitalization, liquidity, share price etc. So, it is highly recommended to use a threshold specific to an asset. We will discuss more on thresholds and analyse how it affects the properties of the bars in the next part of this blog.

from bars import get_tick_bars, get_volume_bars, get_dollar_bars

#assets for receiving the event-driven bars
symbols = ['AAPL','TSLA','AMZN']
#setting static threshold for individual stocks for different bars
tick_bar_threshold = {'AAPL':1000,'TSLA':1000,'AMZN':1000}
volume_bar_threshold = {'AAPL':50000,'TSLA':5000,'AMZN':50000}
dollar_bar_threshold = {'AAPL':2000000,'TSLA':2000000,'AMZN':2000000}

#setup for receiving tick bars for given thresholds
get_tick_bars(symbols, tick_bar_threshold, 'sample_datasets')
#setup for receiving volume bars for given thresholds
get_volume_bars(symbols, volume_bar_threshold, 'sample_datasets')
#setup for receiving dollar bars for given thresholds
get_dollar_bars(symbols, dollar_bar_threshold, 'sample_datasets')

Historical Bars

Since alternative bars heavily depends on the quality of tick data and for generating the historical bars and a good amount of historical tick data is required. I haven’t included a code for generating historical bars because there is an excellent Python package called mlfinlab, which can be used if you have some cleaned and quality historical tick data. I will highly suggest using mlfinlab, it has a good set of documentation and IPython notebooks demonstrating the use.

hudson-and-thames/mlfinlab
MlFinLab helps portfolio managers and traders who want to leverage the power of machine learning by providing reproducible, interpretable, and easy to use tools. - hudson-and-thames/mlfinlab

Bonus

As a bonus, I would like to talk about the intra-bar factors/features that can be generated. In this article, we have generated cumulative factors like total buy volume,  total buy dollar value etc. These features can be used to determine how much volume of a candle/bar is buy initiated and vice-versa. Also, you can add your intra-bar factors like entropy, kyle-lambda or other market microstructure features, with minimal modification of the original code. You can read more about these factors in the book Advances in Financial Machine Learning (Chapter - 18 and 19). I will add some more features in the coming time, I also welcome contributions. Some of these factors will be explained in depth while we develop a trading strategy in the coming parts of this article.

Kyle’s Lambda - Mingze Gao
A Random Walk Away Finance.
Market Microstructure Explained
Markets microstructure deals with issues of market structure and design, market maker and investor behavior, and how exchange occurs in markets.

To understand the idea of Alternative Bars in depth I recommend you the following resources and also the book mentioned above:

Financial Machine Learning Part 0: Bars
Recently, I got my copy of Advances in Financial Machine Learning by Marcos Lopez de Prado. Lopez de Prado is a renowned quant researcher who has managed billions throughout his career. The book is…
Information-driven bars for financial machine learning: imbalance bars
In previous articles we talked about tick bars, volume bars and dollar bars, alternative types of bars which allow market activity-dependent sampling based on the number of ticks, volume or dollar…
The Volume Clock: Insights into the High Frequency Paradigm by David Easley, Marcos Lopez de Prado, Maureen O’Hara :: SSRN
Over the last two centuries, technological advantages have allowed some traders to be faster than others. We argue that, contrary to popular perception, speed i
hudson-and-thames/mlfinlab
MlFinLab helps portfolio managers and traders who want to leverage the power of machine learning by providing reproducible, interpretable, and easy to use tools. - hudson-and-thames/mlfinlab

***Once again, the following code discussed today needs access to Polygon tick data therefore if you do not have a funded live trading account through Alpaca you will receive a Warning Code 1000 followed by an error message of “Invalid Polygon Credentials”.***


**This project’s Github Repository: https://github.com/Harkishan-99/Alternative-Bars

Harkishan-99/Alternative-Bars
Generate various Alternative Bars both historically and at real-time. - Harkishan-99/Alternative-Bars

Technology and services are offered by AlpacaDB, Inc. Brokerage services are provided by Alpaca Securities LLC (alpaca.markets), member FINRA/SIPC. Alpaca Securities LLC is a wholly-owned subsidiary of AlpacaDB, Inc.

You can find us @AlpacaHQ, if you use twitter.

Market DataUse CasesCode Examples

Harkishan Singh Baniya

I am a Quantitative trader for the past 3 years, developing trading systems. I have been leveraging Machine Learning to improve risk management and generate better trading signals.