When working on a machine learning task, the network architecture and the training method are the two key factors in turning a set of data-points into a functional model.
But where should the different training methods be applied? How do they work? And which is “best”? In this post, we run through three types of training method and compare Supervised, Unsupervised and Reinforcement Learning.
1. Supervised learning
In “Intro into Machine Learning for Finance (Part 1)” we covered some high level theory on how a network is trained to improve the accuracy of its model, but we never discussed where the target values for comparison actually come from.
Classification Task
For a classification task, it's easy to see. You want the model to be able to identify a falling wedge pattern, for example, so you feed it sets of inputs, each labeled as either being a falling wedge or not. If the model mislabels an input, the error is back-propagated to adjust its weights so that future predictions improve.
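As a rough, hedged sketch of what that looks like in code (the data here is a random placeholder standing in for labeled pattern windows, and the layer sizes are arbitrary):

```python
import numpy as np
import tensorflow as tf

# Placeholder labeled dataset: each row is a fixed-length window of prices,
# each label is 1 if that window was marked as a falling wedge, else 0.
X = np.random.rand(1000, 30).astype("float32")
y = np.random.randint(0, 2, size=(1000,))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(30,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Cross-entropy compares each prediction against its supervised label;
# back-propagation then updates the weights to reduce that error.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32)
```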
Regression Task
Similarly, for a regression task, the label for each set of inputs will likely be the next value in the time series. For example, you might try to make a model which can learn to predict the price at the close of the next day based on a set of previous market movements.
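As a hedged sketch of how that labeling might be done (assuming nothing more than a pandas Series of daily closes; the window length is arbitrary):

```python
import numpy as np
import pandas as pd

def make_windows(closes: pd.Series, window: int = 20):
    """Pair each window of past closes with the close that immediately follows it."""
    values = closes.to_numpy(dtype="float32")
    X = np.array([values[i:i + window] for i in range(len(values) - window)])
    y = values[window:]  # the label is simply the next value in the series
    return X, y

# Example usage with a placeholder random-walk price series
prices = pd.Series(np.cumsum(np.random.randn(500)) + 100.0)
X, y = make_windows(prices, window=20)
```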
Both of these cases are examples of “supervised learning” where the model is trained against an already labeled set of data and the error function is calculated as the difference between the predicted output and the supervised labels for the dataset.
This is generally a very simple and efficient process, as the network weights are updated to minimise the error between the prediction and the target output for each batch of data-points. You already have the target function/decision process used to label the data; it's just a case of fitting the model to emulate it.
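In TensorFlow terms, that update is just gradient descent on the prediction error for each batch. A minimal sketch (reusing the windowed data above, with arbitrary layer sizes and learning rate):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

def train_step(x_batch, y_batch):
    # Compare the prediction to the supervised target, then nudge the weights
    # in the direction that reduces the error for this batch.
    with tf.GradientTape() as tape:
        preds = model(x_batch, training=True)
        loss = tf.reduce_mean(tf.square(preds[:, 0] - y_batch))  # mean squared error
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```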
In “Forecasting Market Movements Using Tensorflow — Intro into Machine Learning for Finance (Part 2)” we put supervised learning into practice with a simple neural network to make long/short calls.
2. Unsupervised learning
Meanwhile, unsupervised learning tasks revolve around the model learning complex relationships within the data that you haven't been able to determine yet. This can be through tasks such as clustering data-points, which helps to give insight into the structure of the data.
The application to real life and, indeed, trading is often harder to see, which is exactly the reason it can’t be a “supervised” task — it is trying to find relationships in the data that we haven’t found yet.
One good use may be in the analysis of portfolios. By clustering equities and financial instruments you can get a unique view of the distribution of exposure and risk, and either hedge accordingly or look to maximise the efficiency of exposure to one area of the market.
An interesting paper on the creation of diverse portfolios via clustered stocks can be found here.
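As a rough sketch of what such clustering might look like in practice (purely illustrative: hypothetical tickers, random placeholder returns, and k-means chosen only as a simple example of a clustering algorithm):

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Placeholder daily returns for a handful of hypothetical tickers
tickers = ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"]
returns = pd.DataFrame(np.random.randn(250, len(tickers)) * 0.01, columns=tickers)

# Describe each instrument by simple return statistics
features = pd.DataFrame({
    "mean_return": returns.mean(),
    "volatility": returns.std(),
    "avg_correlation": returns.corr().mean(),
})

# Group instruments with similar risk/return profiles
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
for ticker, cluster in zip(tickers, labels):
    print(ticker, "-> cluster", cluster)
```

Instruments that land in the same cluster behave similarly, so a portfolio concentrated in one cluster is less diversified than its holding count might suggest.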
3. Reinforcement learning
Reinforcement learning is an interesting mix of both supervised and unsupervised learning. While it does require a specific target function to be trained towards, the error that trains the network is deferred from the actual decision-making period. The network is instead trained against a “reward” and/or “punishment” function.
The model is attempting to learn a policy of actions to be able to maximise its reward function, such as learning the optimal time to hedge a portfolio in a turbulent market.
There is no immediate profit or loss at the point when the decision is made. Instead, the reward function is based on how successful the hedge was in protecting portfolio value over the coming time-steps. If the model hedges too early, it could miss out on market upside; whereas if it hedges too late, the portfolio will suffer a greater drawdown.
Since RL is set up to generate actions for an environment, rather than to output a simple prediction, it requires a simulated training environment for the agent to react to and interact with. This can prove challenging both in terms of basic implementation and especially in optimizing to train in any reasonable time frame.
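A hedged sketch of what such an environment might look like, using a simplified gym-style reset/step interface, a toy hedge-or-not action and a made-up reward (none of this is a production market simulator):

```python
import numpy as np

class HedgingEnv:
    """Toy simulated market for an RL hedging agent (illustrative only)."""

    def __init__(self, prices: np.ndarray):
        self.prices = prices
        self.t = 0
        self.hedged = False

    def reset(self):
        self.t = 0
        self.hedged = False
        return self._state()

    def _state(self):
        # Observation: the last few price changes plus the current hedge status
        window = self.prices[max(0, self.t - 10):self.t + 1]
        changes = np.diff(window, prepend=window[0])
        changes = np.pad(changes, (11 - len(changes), 0))  # fixed length for early steps
        return np.append(changes, float(self.hedged))

    def step(self, action: int):
        # action 0 = stay unhedged, 1 = hedge the portfolio
        self.hedged = bool(action)
        self.t += 1
        move = self.prices[self.t] - self.prices[self.t - 1]
        # Deferred reward: a hedged book avoids the drawdown but also forgoes the upside
        reward = -abs(move) * 0.1 if self.hedged else move
        done = self.t >= len(self.prices) - 1
        return self._state(), reward, done, {}
```

The agent then learns purely from repeated reset()/step() interaction, with the quality of a hedging decision only showing up in the rewards of later time-steps.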
Again, further reading on reinforcement learning for portfolio hedging can be found here.
Comparison and Drawbacks
Dataset generation:
A supervised learning task, such as classification, requires a set of labeled data to train against. For a simple price predictor this only requires a small pre-processing script which sets the target value as the next day's close price. However, more complex target functions will require a much more complex labeling algorithm, or even manual pattern identification and labeling. And, due to the large datasets needed for effective training (tens of thousands of examples at a minimum), this can be extremely time consuming, if not infeasible.
Unsupervised learning, on the other hand, has no such issue, as it's trying to find relationships in unlabeled data. However, you still need to gather the dataset, verify that the data is clean enough, and interpret the results of the model relative to the data.
Reinforcement learning is similar to supervised learning, in that you need a reward function for the model to train towards. However, instead of a labeled dataset per se, the model is trained against a simulated environment. This can allow for a simpler function to identify the behavior to be rewarded, but it adds complexity to the setup of the data feeds and how they interact with the model. RL also shares the data-requirements issue, needing huge datasets to train effectively.
Training times:
While both supervised and unsupervised models can require significant time and resources to train adequately, they pale in comparison to reinforcement learning due to the nature of its deferred rewards when training. So, not only will you need a larger dataset to train against for RL, but you will need more powerful machines to run the training process in a comparable time.
Meanwhile, it's hard to contrast the training process of supervised vs unsupervised methods, as both the time per training step and the number of training steps required will vary greatly depending on the size of the network used and on the optimizer and its settings.
In all cases, it's advisable to look at newer optimizers, such as Adam, as they can provide faster and less noisy training for the network, achieving a more accurate fit over fewer epochs.
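In Keras, for example, swapping in Adam is a one-line change when compiling the model (the layer sizes and learning rate below are placeholders):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])

# Adam adapts its step size per parameter, which typically converges faster
# and more smoothly than plain SGD on noisy financial data.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
```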
Overfitting:
Overfitting is a serious concern for all training types and, once again, varies more with the quality of the data and the architecture than with the type of training method.
A rule of thumb to avoid overfitting: once you've found an architecture that can learn to accurately predict the training data, if the training and validation accuracies diverge during training, slowly reduce the size of the network until you find the smallest network that still trains to a good accuracy on the training dataset.
If the accuracy against the validation dataset still fails to converge, then your issue likely lies in a lack of causation between the training features and the outcome, rather than in overfitting of the model.
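One hedged way to watch for that train/validation divergence in Keras is a held-out validation split plus early stopping; this is only a monitoring aid rather than the network-shrinking step itself, and the data and sizes below are placeholders:

```python
import numpy as np
import tensorflow as tf

# Placeholder data; in practice, reuse the windowed price data from earlier
X = np.random.rand(1000, 20).astype("float32")
y = np.random.rand(1000).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Stop once validation loss stops improving, and roll back to the best epoch
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True)

model.fit(X, y, validation_split=0.2, epochs=200, batch_size=32,
          callbacks=[early_stop], verbose=0)
```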
Conclusion
Each training method is suited to a specific type of machine learning task and data, with supervised learning the most likely candidate for creating simple trading signals and unsupervised learning better suited to analysing relationships in data to help refine strategies.
While reinforcement learning shows great promise in certain tasks, it is unlikely to be particularly feasible versus other machine learning techniques due to its huge data and computing requirements.
By Matthew Tweed