
For this kind of task, the quality of the model can be evaluated using different metrics. If the model were formulated as a continuing task, there would be just one trading session and the return achieved at its end would be the appropriate metric. However, our model is designed as an episodic task, and in each episode the agent starts with the same amount regardless of how it performed in the previous episode.

For our problem we will use two evaluation metrics. The first one is the cumulative return over all episodes, which is simply the sum of the returns of the 85 evaluation episodes (for evaluation, each episode is 672 steps long, which corresponds to one week). This metric is appropriate if we want to start every trading session with the same initial amount. However, real-world investors usually do not have unlimited capital to maintain a fixed initial balance across all trading sessions. If, for example, we lose the whole initial balance in one trading session, we might not have enough capital to start the next session with the same amount of money; we might not be able to continue trading at all. Conversely, if we consistently achieve a profit, we do not want to withdraw the capital but rather keep it while it appreciates. In this case, the compound return is more suitable, as it represents the cumulative effect that a series of gains or losses has on the original amount of capital over a period of time. If we use compound return, we assume that any gains will be reinvested in later trading sessions and any losses will not be covered by additional capital.
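To make the distinction concrete, the following short Python sketch computes both metrics for the same series of episode returns; the weekly values are hypothetical and chosen only for illustration, not taken from the actual evaluation data:

import numpy as np

# Hypothetical weekly episode returns in percent -- illustrative values only.
weekly_returns_pct = np.array([3.1, -1.2, 4.8, 0.5, 2.0])

# Cumulative return: the sum of individual episode returns, i.e. every
# trading session is assumed to start from the same initial balance.
cumulative_return = weekly_returns_pct.sum()

# Compound return: gains and losses are carried over to the next session,
# i.e. the capital is reinvested instead of being reset each week.
compound_return = (np.prod(1.0 + weekly_returns_pct / 100.0) - 1.0) * 100.0

print(f"cumulative return: {cumulative_return:.2f} %")
print(f"compound return:   {compound_return:.2f} %")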

Two versions of the A2C model, the “base” and the “retrained” one, will be compared to the baseline long strategy, which simply means keeping a long position all the time. The first set of plots displays the returns in chronological order for each model over the evaluation period, as well as the distribution of returns.

Figure 5.9 Returns (left) and return distribution (right) for retrained model. Source: own elaboration


Figure 5.10 Returns (left) and return distribution (right) for base model. Source: own elaboration

Figure 5.11 Returns (left) and return distribution (right) for long strategy. Source: own elaboration

At first glance, the returns appear highly correlated among all three models. For example, all the models suffered a large loss in the 29th week, and in the 71st week all of them achieved a substantial profit. However, the distributions of returns show that even though the returns are correlated, they differ in magnitude: the models have different mean values, with the smallest of 2.67 achieved by the long strategy and the largest of 4.63 achieved by the retrained A2C model. Moreover, the retrained model seems to have the highest variability of returns, which was anticipated, because in this case it is not a single model but a collection of models, each of which might have a slightly different distribution of returns.

From another perspective, we can investigate the difference in returns between the models, as shown in the following charts.

Figure 5.12 Return difference: retrained model vs long strategy (left) and retrained model vs base model (right). Source: own elaboration

It is evident that the retrained model consistently beat the simple long strategy; a difference exceeding 6% in favor of the long strategy occurred only once, in the 76th week, when the market went through a strong bull run. When comparing the retrained model with the base one, the difference in returns is not as large, but the retrained model’s returns still dominate.

Finally, the last set of charts illustrates the comparison between the models in terms of the metrics defined at the beginning of this section.

Figure 5.13 Cumulative return (left) and compound return (right). Source: own elaboration

As already indicated by the distribution charts, the retrained A2C model performs best on the evaluation data, while the simple long strategy has the lowest return. In terms of cumulative return, the retrained A2C model achieves about 393% over the 85 weeks of the evaluation period, whereas the long strategy achieves only 227%, and the base model lies in between with a cumulative return of 277%. During the first 25 weeks, when the market was flat without dramatic price movements, both A2C models were able to achieve consistent positive returns. After the substantial price drop at the beginning of the coronavirus pandemic, the market started a strong bull run and both A2C models continued to perform well without any dramatic changes in performance. The same conclusion can be drawn from the compound return chart. Here, we can clearly see the power of compounding: while the retrained A2C model’s cumulative return is less than twice that of the long strategy, its compound return is more than 7 times higher.
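The size of this gap follows from how the two metrics aggregate the weekly returns. As a rough illustration (assuming, only for the sake of the argument, that a model earned its average weekly return $\bar{r}$ in every one of the $n = 85$ weeks):

\[
R_{\text{cum}} = \sum_{i=1}^{n} r_i \approx n\,\bar{r},
\qquad
R_{\text{comp}} = \prod_{i=1}^{n}\left(1 + r_i\right) - 1 \approx \left(1 + \bar{r}\right)^{n} - 1 .
\]

The cumulative return therefore grows only linearly in the average weekly return, while the compound return grows geometrically, so even a moderate weekly edge is amplified into a multiple-fold gap over 85 weeks; the exact size of the gap also depends on the variability of the weekly returns.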

Conclusion

This work aimed to find out whether it is possible to use artificial intelligence to achieve robust results in cryptocurrency trading. Among the different types of machine learning methods, a reinforcement learning model was chosen because the problem of financial trading fits naturally into the RL framework. Rather than a predictive model, we created an advisory model which outputs the actions to be taken in the market in order to achieve the highest profit.

The reinforcement learning model was combined with an LSTM neural network for better value-function approximation and for proper handling of time-series data. It was trained using only historical BTC trading data such as price, traded volumes, and technical-analysis indicators. Two versions of the RL model were presented and compared to the baseline, which for this problem was a simple long strategy, formally known as “Buy & Hold”. The first “base” version of the model was trained once and used thereafter without any changes. The second “retrained” version was continuously retrained in order to better reflect current market conditions.

Summing up the results, we can conclude that both models achieved better results than the simple long strategy. The retrained version performed best during the evaluation period, with a cumulative return 73% higher than that of the long strategy, whereas the base model outperformed it by only 22%. Thus, as the numbers suggest, even in a strongly rising market, active trading using the RL model can yield impressive results.

Moreover, the idea of actively retraining the base model to capture the most recent market conditions proved to positively affect the final results.

However, before deploying the model into production, one should be aware of the limitations of the presented models, which affect their performance in a real market. Among the most important limitations are the following:

- The absence of transaction costs is the biggest limitation of the presented models. Because the RL agent trades actively, it makes a lot of trades, which incurs high costs in real trading. Including transaction costs in the simulated environment would require adjusting the reward function, thus affecting the behavior of the agent (a minimal sketch of such an adjusted reward follows after this list).

- The absence of price slippage, which is the difference between the expected price of a trade and the price at which the trade is actually executed. In the simulated environment, a trade is assumed to be executed at the close price of the last 15-minute time step. In practice, this cannot be assumed to hold all the time, as markets change rapidly; therefore, this assumption should be corrected for.

- The observation space of the presented models is limited to historical trading data from a single exchange. It might be beneficial to include other data, such as fundamental or sentiment data. Moreover, aggregating trading data from multiple exchanges might improve the model as well.

- The complexity of the neural network and the extent of hyperparameter tuning were limited by the available computing resources. Training and optimal hyperparameter search are time- and hardware-intensive tasks, and if more computing resources were utilized, potentially better results could be achieved.
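As a minimal sketch of the adjustment mentioned in the first limitation, the per-step reward could subtract a cost term whenever the agent changes its position. The function below is purely illustrative: the function name, parameter names, and the fee and slippage values are assumptions, not part of the implemented environment.

def adjusted_reward(position, position_change, price_return,
                    fee_rate=0.001, slippage=0.0005):
    """Hypothetical per-step reward that penalizes trading costs.

    position        -- position held during the step (-1 short, 0 flat, 1 long)
    position_change -- absolute change of the position caused by the action
    price_return    -- relative price change over the 15-minute step
    fee_rate        -- assumed exchange fee charged on the traded amount
    slippage        -- assumed deviation of the execution price from the close
    """
    gross = position * price_return                  # profit/loss of the held position
    costs = position_change * (fee_rate + slippage)  # incurred only when the position changes
    return gross - costs

With such a penalty, frequent position changes become expensive, which should push the agent toward trading less frequently than in the cost-free environment.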

The code used in this work can be found at https://github.com/ilidmytro/RL-trading

List of Figures

Figure 2.1 Artificial Intelligence, Machine Learning, Deep Learning

Figure 2.2 Comparison of ML vs classical programming

Figure 3.1 Neural network architecture

Figure 3.2 Information flow in a single node

Figure 3.3 The flow of a neural network training

Figure 3.4 The representation of a node's output

Figure 3.5 The information flow in RNN

Figure 3.6 The chain of RNN modules

Figure 3.7 The chain of LSTM modules

Figure 3.8 The forget gate in LSTM

Figure 3.9 The transformation of new inputs to be added

Figure 3.10 Update of a cell state

Figure 3.11 The output from an LSTM cell

Figure 4.1 The general overview of actor-critic models

Figure 5.1 The summary of an A2C common body

Figure 5.2 The summary of A2C actor and critic heads

Figure 5.3 The table of hyperparameters

Figure 5.4 The mean batch value (left) and average training reward (right)

Figure 5.5 Entropy loss (left) and policy loss (right)

Figure 5.6 Value loss (left) and total loss (right)

Figure 5.7 Average position (left) and total number of trades (right)

Figure 5.8 Advantage values (left) and L2 of gradients (right)

Figure 5.9 Returns (left) and return distribution (right) for retrained model

Figure 5.10 Returns (left) and return distribution (right) for base model

Figure 5.11 Returns (left) and return distribution (right) for long strategy

Figure 5.12 Return difference: retrained model vs long strategy (left) and retrained model vs base model (right)

Figure 5.13 Cumulative return (left) and compound return (right)
