
Assessment of Master Thesis – Opponent

Study programme: Quantitative Methods in Economics

Field of study: Quantitative Economic Analysis

Academic year: 2020/2021

Master Thesis Topic: A Comparative Study of Financial Time Series Forecasting Using Machine Learning and Traditional Statistical Methods – An Application To Stock Market Data

Author’s name: Mesut Yasar Ozturk

Ac. Consultant’s name: doc. Ing. Tomáš Formánek, Ph.D.

Opponent: Ing. Petra Tomanová, MSc

Criterion: mark (1–4)

1. Clarity and comprehensibility of the thesis topic and aims: 1

2. The extent and relevance of the description of the current state of knowledge: 3

3. The complexity of the thesis topic: 1

4. Adequacy of the method for solving the given issue; correctness of its choice and use: 3

5. The extent, quality and precision of the result description: 4

6. Relevance and correctness of the result discussion: 3

7. Factual contribution of the thesis result: 3

8. Information source relevance and citation correctness: 4

9. Logical structure and cohesion among individual parts: 3

10. Grammar, linguistic style, terminology and overall arrangement: 3

Comments and Questions:

This thesis aims to explore advanced machine learning models and to optimize machine learning algorithms while applying them to financial time series. It strives to answer three questions: whether machine learning algorithms can be optimized so as to provide robust prediction results on stock market data; whether machine learning regression models are more accurate than the conventional ARIMA model in terms of predictions; and whether Facebook’s Prophet model has higher predictive power than the conventional ARIMA.

The strengths of the thesis include clearly defined and structured research objectives and research questions. I would also like to highlight that the topic is rather challenging for students of Quantitative Methods in Economics. Before mentioning my concerns, I would like to emphasize that the thesis elaborates a nontrivial topic extensively (on 110 pages) and is properly motivated. It should be appreciated that the author accepted this challenge.

In terms of my concerns, the thesis lacks accurate definitions, statements, and mathematical formulas; in other words, clear academic writing. The vagueness and inaccuracy start in Section 3.2 (Bias-variance trade-off) and do not improve afterward. Many formulas contain mistakes, use undefined variables, or are inaccurately defined. The source of the problem might be that the thesis adopts the non-academic style of books written for machine learning practitioners. When I read Section 2.4 (From Statistics to Machine Learning), I found several statements to be misleading; for example, it tries to distinguish between statistical and machine learning models without giving any proper definition. In general, I disagree with the claim that statistical models are those that fit the best possible hyperplane, whereas machine learning models minimize the errors by solving an optimization problem, and not vice versa. Moreover, it is not true that statistical modeling requires a multicollinearity check while machine learning models do not. Many well-known models contradict this statement!

This led me to the source – the book of Dangeti (2017). Based on a very brief look, Dangeti (2017) seems to assume that statistical models start and end with linear models estimated by OLS. As a researcher focusing on dynamic nonlinear models, I have to say that this is definitely not true. Sadly, the thesis is written in this spirit and adopts some poor standards and old-fashioned or illogical opinions from the book. I also opened another key reference, the book by Tatsat et al. (2020), which academics with a statistical background should also find unacceptable (as several online reviews confirm). One “masterpiece” is, for example, the section devoted to OLS, especially Figure 4-2, which unfortunately appears in the thesis on page 42 as Figure 4.9. The author of the thesis should explain how this figure can possibly illustrate the classical linear model estimated by OLS. Another example is Table 4.4, which is misleading, questionable, and inaccurately defined. These books are definitely not a good choice of academic references.

This may also explain why I do not like the formatting and visual aspect of the thesis, which is probably inspired by these books. For example, the figures are of very poor quality; the thesis has a book-like structure, yet the text is left-aligned and some chapters start on a new page while others do not; paragraphs are unbalanced and poorly structured (in many cases a single sentence forms a paragraph without any logical reason); and the text contains typos. The book of Tatsat et al. (2020), and especially the book of Dangeti (2017), contain figures of extremely poor quality, and once they are copy-pasted into the thesis, they become even worse.

Regarding hyperparameter tuning for the neural networks and the random forest, the choices are sometimes rather questionable. For example, the best random forest model from the grid search has a maximum depth of 90 (!) and a minimum of 5 observations per leaf, which might indicate massive overfitting. However, no cross-validated train and test fit metrics are reported that could confirm or dispel this suspicion. This reflects one of my general concerns: the degree of overfitting of the machine learning algorithms is not considered in the comparison at all.
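A minimal sketch of the kind of check requested above, assuming a plain-Python setup rather than the thesis's actual code: average train and test RMSE are reported over walk-forward (expanding-window) splits, which keep each test block after its training block, as time-ordered stock data require. The data are simulated i.i.d. Gaussian returns, and a 1-nearest-neighbour regressor serves as a hypothetical stand-in for a highly flexible model such as a very deep tree (it is not a random forest). All function and class names are illustrative.

```python
import math
import random
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def walk_forward_splits(n, n_splits, min_train):
    """Yield (train_idx, test_idx) pairs with an expanding training window.

    Each test block lies strictly after its training block, so no future
    observations leak into training (unlike shuffled k-fold CV).
    """
    test_size = (n - min_train) // n_splits
    for k in range(n_splits):
        train_end = min_train + k * test_size
        yield list(range(train_end)), list(range(train_end, train_end + test_size))


class NearestNeighbour:
    """Hypothetical stand-in for a very flexible model: 1-NN regression.

    It reproduces its own training targets exactly, so its train RMSE is 0.
    """

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, X):
        preds = []
        for x in X:
            dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in self.X]
            preds.append(self.y[dists.index(min(dists))])
        return preds


class MeanModel:
    """Naive benchmark: always predicts the training mean."""

    def fit(self, X, y):
        self.m = mean(y)
        return self

    def predict(self, X):
        return [self.m] * len(X)


def overfitting_report(model_cls, X, y, n_splits=5, min_train=100):
    """Return (average train RMSE, average test RMSE) over walk-forward splits."""
    train_scores, test_scores = [], []
    for tr, te in walk_forward_splits(len(y), n_splits, min_train):
        X_tr, y_tr = [X[i] for i in tr], [y[i] for i in tr]
        X_te, y_te = [X[i] for i in te], [y[i] for i in te]
        model = model_cls().fit(X_tr, y_tr)
        train_scores.append(rmse(y_tr, model.predict(X_tr)))
        test_scores.append(rmse(y_te, model.predict(X_te)))
    return mean(train_scores), mean(test_scores)


# Simulated, unpredictable "returns" with two lags as features (illustrative data only).
rng = random.Random(0)
r = [rng.gauss(0.0, 1.0) for _ in range(400)]
X = [(r[t - 1], r[t - 2]) for t in range(2, len(r))]
y = [r[t] for t in range(2, len(r))]

nn_train, nn_test = overfitting_report(NearestNeighbour, X, y)
mean_train, mean_test = overfitting_report(MeanModel, X, y)
print(f"1-NN: train RMSE {nn_train:.3f}, test RMSE {nn_test:.3f}")
print(f"Mean: train RMSE {mean_train:.3f}, test RMSE {mean_test:.3f}")
```

On such unpredictable data the 1-NN model reaches a train RMSE of exactly zero but a much larger test RMSE, while the mean benchmark has similar train and test errors. A gap of this kind is what reporting both metrics would expose for the tuned random forest.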

Moreover, the author should answer the following questions:

1. The thesis states that “We concluded that by fine-tuning hyperparameters, suitable feature engineering, and by the aid of powerful computers, it is doable to utilize ML and deep learning algorithms to receive brilliant outcomes.” – On which specific piece of evidence did you base this conclusion? What does “brilliant outcome” mean? Do you expect a huge profit? How can you know that? Based on what I have seen in the thesis, I doubt it.

2. The thesis mentions GARCH models and states that “However, volatility clustering is a key phenomenon in financial time series and the ARMA model lacks in capturing this key phenomenon.” – So why are ARMA models adopted instead of ARMA-GARCH models, which could be adopted just as easily? Figure 6.6 clearly shows that volatility clustering is present and not captured.

3. The author should explain Figure 4.9: why are the red dots called residuals? Why are the orthogonal distances plotted?
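For reference, the standard textbook definitions behind questions 2 and 3 can be written as follows (a sketch, not taken from the thesis). In the classical linear model estimated by OLS, a residual is the vertical distance between an observation and its fitted value, and OLS minimizes the sum of these squared vertical distances; minimizing orthogonal distances would instead correspond to orthogonal (total) least squares. A common ARMA(1,1)-GARCH(1,1) specification adds a conditional variance equation to the ARMA mean equation, which is what allows volatility clustering to be captured. The orders (1,1) are chosen here only for illustration.

```latex
% OLS: coefficients minimize squared vertical distances; residuals are e_i
\hat{\boldsymbol\beta} = \arg\min_{\boldsymbol\beta} \sum_{i=1}^{n}
  \left(y_i - \mathbf{x}_i^\top \boldsymbol\beta\right)^2,
\qquad e_i = y_i - \mathbf{x}_i^\top \hat{\boldsymbol\beta}

% ARMA(1,1) mean equation with GARCH(1,1) conditional variance
\begin{aligned}
r_t &= \mu + \phi\, r_{t-1} + \theta\, \varepsilon_{t-1} + \varepsilon_t,
  \qquad \varepsilon_t = \sigma_t z_t, \quad z_t \sim \text{i.i.d.}(0,1),\\
\sigma_t^2 &= \omega + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2,
  \qquad \omega > 0,\ \alpha_1 \ge 0,\ \beta_1 \ge 0.
\end{aligned}
```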

Despite some poor choices of sources, many mistakes, and a rather shallow implementation and evaluation of the machine learning models, the thesis fulfills all requirements. I recommend it for the defence, with a suggested grade: Good.

Conclusion: The Master Thesis is recommended for the defence.

Suggested Grade: 3

Date: 24/05/2021

Ing. Petra Tomanová, MSc

Opponent
