
5.1.1 Data preparation and general assumptions

Before diving into modeling, the data were prepared. The dataset consists of 14 weekly time series, including gold futures contracts. Gold futures prices have an extensive history at various frequencies; however, not all of the chosen factors are available with the same history and frequency. Therefore, to allow a comparison of the univariate model (ARIMA) with the multivariate models (ARIMAX, VAR, VECM), weekly data were chosen. The joined dataset is split into a train set covering 2006 till 2016 (548 observations in total), which will be used for model estimation, and a test set with the remaining 248 observations up to October 2021, which will be used to evaluate model effectiveness.
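A minimal sketch of this split in R is given below; the data frame and the cut-off date are illustrative (the gold series is simulated), so the resulting counts only approximate the 548/248 split described above.

set.seed(12)
dates <- seq(as.Date("2006-01-06"), as.Date("2021-10-29"), by = "week")
weekly_data <- data.frame(
  date = dates,
  gold = 100 * exp(cumsum(rnorm(length(dates), sd = 0.02)))   # simulated weekly price
)

train <- subset(weekly_data, date <= as.Date("2016-12-31"))   # used for model estimation
test  <- subset(weekly_data, date >  as.Date("2016-12-31"))   # used for evaluation
c(train = nrow(train), test = nrow(test))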

The main requirement for the time series is stationarity, which means that the statistical properties of the series do not change over time. This can be tested with the Augmented Dickey-Fuller (ADF) test, which is aimed at detecting the presence of a unit root. The hypotheses can be formulated as:

H0: presence of a unit root (non-stationarity)
H1: absence of a unit root (stationarity)

It is common practice to choose a significance level of 5%, although for the analysis of financial series it can be increased up to 10%. It should be noted that a 5% significance level has been chosen for all the tests in this work unless stated otherwise. The null hypothesis is rejected and its alternative accepted if the p-value of the tested time series is lower than the chosen significance level (p ≤ α). In case the null hypothesis cannot be rejected, the time series should be transformed by differencing. The procedure is repeated until the time series is proved to be stationary.

The initial time series were transformed to logarithmic values. This is a commonly used technique that turns a time series with an exponential trend into one with a linear trend. Descriptive statistics of the resulting dataset can be found in appendix B. An ADF test was then performed to test for stationarity. High p-values do not allow us to reject the null hypothesis of non-stationarity; therefore, the data have to be transformed to achieve stationarity, the first step being to take first differences. For the sake of consistency in interpretation, the variables that were already stationary were also replaced with their first differences. Repeated ADF tests confirmed stationarity, meaning that the time series can be used for modeling. The results of the ADF test before and after the transformation for the time series used can be found in appendix C.
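The workflow above (logarithmic transformation, ADF test, first differencing, repeated ADF test) can be sketched in R roughly as follows, assuming the tseries package; the simulated price series stands in for the actual data.

library(tseries)   # adf.test()

set.seed(1)
price <- 100 * exp(cumsum(rnorm(796, sd = 0.02)))   # simulated weekly price level
ln_price <- log(price)                              # logarithmic transformation

adf.test(ln_price)            # high p-value: the unit-root null cannot be rejected
d_ln_price <- diff(ln_price)  # first difference = weekly logarithmic return
adf.test(d_ln_price)          # low p-value: stationarity after differencing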

Having prepared the data, we can proceed to model estimation. The models will be estimated in the following order:

- Regression approach: simple linear regression (weekly frequency), elastic net (weekly frequency)

- Time series approach: ARIMA, ARIMAX, VAR, VECM (weekly frequency).

The models will be evaluated indirectly by comparing the total profit from the applied strategy: if the model predicts a positive return, a long position is taken; if the model predicts a negative return, a short position is entered; if the prediction equals zero, no market action is performed and the payoff of that iteration is zero. The returns from the taken positions are added up cumulatively to obtain the total payoff of the strategy (a short sketch of this rule in R follows the list below). In addition, the following assumptions are made:

- returns are not reinvested,

- no limits on the positions to be taken,

- transaction costs and spreads are omitted.
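A minimal sketch of this evaluation rule in R is shown below; the function name strategy_payoff and the simulated returns are illustrative, not taken from the thesis code.

# Position rule: +1 (long) for a positive prediction, -1 (short) for a negative
# one, 0 (no action) otherwise; returns are summed, not reinvested.
strategy_payoff <- function(predicted, actual) {
  position <- sign(predicted)
  cumsum(position * actual)
}

# Illustrative usage with simulated weekly returns.
set.seed(2)
actual    <- rnorm(248, sd = 0.02)
predicted <- actual + rnorm(248, sd = 0.03)   # a noisy "forecast"
tail(strategy_payoff(predicted, actual), 1)   # total payoff of the strategy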

At first, the performance of each model will be back-tested on historical data (in-sample testing). For the out-of-sample testing, the prediction for period t will be made based on all data known up to period t-1. The achieved results will be compared with each other and with two customized benchmarks: a random and a perfect strategy. The random strategy assumes that positions are entered randomly, without considering expected returns. 10 000 such strategies have been generated and the cumulative return of each of them has been calculated separately. The estimated model will be considered effective if the cumulative return of the strategy performed on the model's predictions is higher than the 95% quantile of the cumulative returns of the random strategies. The perfect strategy, on the contrary, serves as an indicator of the maximum possible profitability: its returns are calculated under the assumption that every position taken is profitable. A comparison of all the estimated and tested models is performed at the end of the section, followed by a discussion of potential areas of improvement.
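The random benchmark can be sketched as follows; positions are drawn as ±1 with equal probability (one possible reading of "random positions"), and the simulated returns are only illustrative.

set.seed(3)
actual <- rnorm(248, sd = 0.02)   # stand-in for realised weekly returns

random_totals <- replicate(10000, {
  position <- sample(c(-1, 1), length(actual), replace = TRUE)  # random long/short
  sum(position * actual)                                        # total return of one random strategy
})

quantile(random_totals, 0.95)  # a model-based strategy is deemed effective above this level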

5.1.2 Regression analysis

To start with, the existence of a relationship between the returns of the chosen factors and gold futures will be checked by applying regression analysis. As mentioned in the theoretical part, this is an extensive group of approaches, hence we will start with the simplest one, linear regression. The linear dependency between gold futures returns (the target) and the returns of the chosen exogenous factors will be taken as a benchmark. As a reminder, those factors are expected inflation, futures on the S&P 500 equity index, the US Federal funds rate, the volatility index VIX, positions of market participants from the COT report, and the returns of other commodities: natural gas, crude oil, silver, and platinum.

Before diving into model estimation, a univariate analysis was performed in order to provide an overview of the possible relationships in the system. The graph of the linear regression between the logarithmic returns of gold futures and the logarithmic returns of S&P 500 futures does not show a clear linear relationship, although there might be a non-linear one. Hence, linear regression might not be able to capture enough information to produce results for profitable trading. The univariate analysis for the other variables has a similar shape and is therefore not presented here.

Figure 5. Univariate linear regression between logarithmic returns of gold futures and SP500 Source: own processing in R software


After the multivariate regression model had been estimated, we proceeded to model diagnostics. The estimated coefficients can be found in the table below. Most of the variables appear to have statistically insignificant coefficients (at the 5% significance level). For a regression, the R-squared should also be checked: it is the coefficient of determination, in other words the proportion of the variability of gold returns explained by the model. For our model it is relatively low, 0.21, which means that only 21% of the variability in the data was captured by the model.

LINEAR REGRESSION

Variable      Estimated coefficient   Std. error   p-value
(Intercept)    0.00                    0.00         0.23
dLnVIX         0.01                    0.01         0.48
dLnEI         -1.34                    0.78         0.08
dLncFFR        0.15                    1.13         0.90
dLnDE          0.78                    0.08         0.00
dLnSP500       0.01                    0.05         0.76
dLnSl          0.01                    0.03         0.62
dLnPl          0.00                    0.02         0.88
dLnCO         -0.04                    0.02         0.10
dLnDXY        -0.05                    0.11         0.66
dLnNG         -0.02                    0.02         0.19
dLnMPL         0.00                    0.02         0.81
dLnMPS         0.03                    0.02         0.17
dLnMML         0.04                    0.02         0.01
dLnMMS         0.01                    0.00         0.02

*5% significance level

Table 1. Estimated coefficients of the linear regression. Source: own processing
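A hedged sketch of how such a regression could be estimated in base R is given below; the data frame, its column names (a subset of the dLn* variables from Table 1), and the simulated values are illustrative.

set.seed(4)
n  <- 548
df <- data.frame(dLnGold = rnorm(n), dLnVIX = rnorm(n), dLnEI = rnorm(n),
                 dLnSP500 = rnorm(n), dLnSl = rnorm(n), dLnCO = rnorm(n))

fit <- lm(dLnGold ~ ., data = df)   # regress gold returns on all remaining columns
summary(fit)$coefficients           # estimates, standard errors, p-values
summary(fit)$r.squared              # coefficient of determination (about 0.21 in the thesis)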

The next step is the analysis of residuals. The residuals of the model should be normally distributed, replicating the properties of white noise. The graph below shows that this condition has not been met.


Figure 6. Residuals of the estimated model of linear regression Source: own processing in R software

In financial analysis the lack of normality is a known issue, and this condition is usually relaxed, although it may lead to biased parameter estimates and biased values of the model diagnostic tests. We will therefore proceed to test the profitability of the strategy based on the values predicted by the model.

Figure 7. Out-of-sample cumulative returns of the strategies based on the predictions from linear regression (LR) and linear regression without insignificant variables (LR, restricted), plotted together with the 5% and 95% quantiles of the random strategy. Source: own processing


The profit of the strategy is considered statistically significant if the cumulative returns over the tested period are higher than the 95% quantile of the returns achieved by the random strategies. Using the returns predicted by the estimated model, the applied trading strategy appeared to be loss-making, with a cumulative loss of -0.33 by the end of the tested period. It was decided not to use any regularization methods. The main reason is the nature of the relationship between the variables: there does not seem to be any linear relationship present.

Hence, regularization techniques would not bring any improvement. Overall, linear regression appeared unsuccessful and will be excluded from the final comparison. From here we proceed to time series analysis.

5.1.3 Time series analysis

5.1.3.1 ARIMA and ARIMAX

In the following sub-section, we will focus on time series analysis. We will start with the univariate ARIMA model, built on weekly data of gold futures. It will then be transformed into its extended version, ARIMAX, by adding exogenous variables. This model is included to test the existence of a linear relationship between gold futures returns and the returns of a number of factors that are believed to have an impact on the gold futures price. The reasoning behind the chosen factors has been given in the theoretical part. As a reminder, those factors are expected inflation, futures on the S&P 500 equity index, the US Federal funds rate, the volatility index VIX, positions of market participants from the COT report, and the returns of other commodities: natural gas, crude oil, silver, and platinum. It should be mentioned that all the factors are shifted backward in time by one period with respect to the gold futures returns, because at the time of prediction we will not possess the actual future values of the exogenous variables. Without this shift, either their forecasts would have to be used, which would affect the quality of the predictions, or the concept of the model would have to be changed; the shift instead defines a linear relationship between the exogenous variables at time t-1 and the endogenous variable at time t (a short sketch of this shift is given below). The process of model estimation and diagnostics is described in the following subsections. After that, the models will be evaluated by applying the simple trading strategy described earlier.
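The one-period shift of the regressors can be sketched as follows; the helper lag_one and the series names are hypothetical.

lag_one <- function(x) c(NA, head(x, -1))   # value observed one week earlier

set.seed(5)
gold  <- rnorm(300, sd = 0.02)   # gold futures log returns at time t
sp500 <- rnorm(300, sd = 0.02)
vix   <- rnorm(300, sd = 0.05)

xreg <- cbind(dLnSP500 = lag_one(sp500), dLnVIX = lag_one(vix))  # regressors at t-1
y    <- gold[-1]      # drop the first observation, where the lags are NA
xreg <- xreg[-1, ]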

Model estimation

The first step in estimating the parameters of the model is to check the graphs of the autocorrelation function (ACF) and the partial autocorrelation function (PACF).

The ACF and PACF show the correlation between lagged values of a variable, with the difference that the PACF removes the influence of the intermediate lags. These two graphs should be analyzed simultaneously: a gradually decreasing ACF with only a few significant lags of the PACF means that an AR model should be chosen; the opposite pattern suggests an MA model, with the order corresponding to the number of significant lags on the graph. The analysis of the graphs for gold futures is not clear-cut: the PACF and ACF replicate each other, with all lags staying inside the confidence intervals. Besides, the PACF shows a sign of cyclicity, with repetitive ups and downs. This might be caused by seasonality, which, however, was rejected by a seasonality test. The other hypothesis that may explain the repetitiveness on the graph is the rollover 49, i.e. the junction of two contracts, which requires an adjustment of the return on the day after the initial contract was closed and the new one was opened. However, adjusting the time series did not bring any significant change, so the adjusted series is not used in the further analysis.
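The ACF/PACF inspection can be reproduced with base R graphics, as sketched below on a simulated return series standing in for the gold futures returns.

set.seed(6)
returns <- rnorm(796, sd = 0.02)   # stand-in for gold futures weekly log returns

op <- par(mfrow = c(1, 2))
acf(returns,  main = "ACF")    # gradual decay would point to an AR component
pacf(returns, main = "PACF")   # a sharp cut-off would suggest the AR order
par(op)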


Figure 8. ACF and PACF, gold futures weekly time series Source: own processing in R software

49 https://www.investopedia.com/terms/r/rollover.asp


To estimate appropriate parameters for the ARIMA model, the built-in function auto.arima 50 was used, which selects the parameters with the minimal information criteria. The lowest AIC and BIC correspond to ARIMA (2,0,0) and ARIMAX (1,0,1). The suggested ARIMA (2,0,0) model assumes that the gold futures return might depend linearly on its two preceding lagged values. In the case of ARIMAX, the estimated model depends on one lagged value of its own time series, the exogenous variables for period t-1, and one preceding value of the error term. The estimated coefficients for both models can be found in the table below. A correctly specified model is supposed to have all estimated coefficients statistically significant. For the simple ARIMA (2,0,0), the p-values of the estimated coefficients are higher than our 5% significance level, indicating that the estimated relationship might not be statistically significant. Increasing the order of the ARIMA did not change the results. Despite being misspecified, the model can still be profitable in practice, therefore we will proceed to test the statistical significance of the model's out-of-sample performance. ARIMAX, on the contrary, performed much better, even though not all the variables appeared to have a statistically significant relationship. For the purpose of comparison, we will also try excluding the insignificant variables; the re-estimated model will be reported under the name ARIMAX (1,0,1), restricted.

           ARIMA (2,0,0)                                ARIMAX (1,0,1)
Variable   Estimate   Std. error   p-value    Variable   Estimate   Std. error   p-value
ar1        -0.032     0.043        0.46       ar1        0.328      0.186        0.08**

*5% significance level, **10% significance level

Table 2. Estimated coefficients for ARIMA (2,0,0) and ARIMAX (1,0,1). Source: own processing

50 https://www.rdocumentation.org/packages/forecast/versions/8.3/topics/auto.arima
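A hedged sketch of the order selection with forecast::auto.arima() is given below; the series and the two regressors are simulated, so the selected orders will generally differ from ARIMA (2,0,0) and ARIMAX (1,0,1).

library(forecast)   # auto.arima()

set.seed(7)
returns <- rnorm(548, sd = 0.02)
xreg    <- matrix(rnorm(548 * 2, sd = 0.02), ncol = 2,
                  dimnames = list(NULL, c("dLnSP500", "dLnVIX")))

fit_arima  <- auto.arima(returns, ic = "aic")               # univariate candidate
fit_arimax <- auto.arima(returns, xreg = xreg, ic = "aic")  # with exogenous regressors
summary(fit_arima)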


Model diagnostics

A statistically correct model is supposed to have normally distributed residuals, with constant variance and no autocorrelation. The distribution of the residuals is tested in two steps. First, the residuals are plotted on a histogram. At first glance, the distribution does not look normal, mostly because of the high peak, which indicates increased kurtosis. This is tested in the next step by applying the Jarque-Bera test (Jarque, Bera, 1987). The null hypothesis (H0) states that the distribution is normal; the alternative hypothesis (H1) rejects normality. The results of the test show that, with the reported p-values close to zero, the null hypotheses of normally distributed residuals for ARIMA (2,0,0) and ARIMAX (1,0,1) are rejected at the 5% significance level. It should be noted that financial time series are not normally distributed in most cases because of the high proportion of noise. This leads to a distribution of unknown shape with changing moments (variance, skewness, kurtosis), which is complicated to model.

Figure 8. Histograms of the ARIMA (2,0,0) residuals Source: own processing in R software

        ARIMA (2,0,0)                    ARIMAX (1,0,1)
        JB test                          JB test
Test statistic   p-value        Test statistic   p-value
56.374           0.000          329.11           0.000

Table 3. Jarque-Bera test results for ARIMA (2,0,0) and ARIMAX (1,0,1). Source: own processing in R software
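The normality check can be sketched with tseries::jarque.bera.test() applied to the model residuals; the fitted model below is illustrative.

library(tseries)   # jarque.bera.test()

set.seed(8)
returns <- rnorm(548, sd = 0.02)
fit <- arima(returns, order = c(2, 0, 0))   # illustrative ARIMA(2,0,0) fit

jarque.bera.test(residuals(fit))   # p-value below 0.05 rejects normality of residuals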


Constant variance, or homoscedasticity, is the next step in the model diagnostic procedure. The R package 51 provides two predefined tests: the Portmanteau-Q test (PQ), which checks whether the squared residuals form a white-noise sequence, and the Lagrange-Multiplier test (LM), which tests the significance of a linear regression model for the squared residuals. H0 states that the variance is constant (homoscedastic residuals), while H1 indicates heteroscedasticity. The output can be found in the table below.

Table 4. ARCH test (Portmanteau-Q and Lagrange-Multiplier) results for ARIMA (2,0,0). Source: own processing in R software
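A sketch of this check is given below, assuming that arch.test() from the aTSA package (footnote 51) is applied to a fitted arima object; the fitted model is illustrative.

library(aTSA)   # arch.test(), see footnote 51

set.seed(9)
returns <- rnorm(548, sd = 0.02)
fit <- arima(returns, order = c(2, 0, 0))

arch.test(fit)   # prints Portmanteau-Q and Lagrange-Multiplier statistics by lag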

In order not to reject the null hypothesis, the residuals on the left-hand graph of the Portmanteau-Q test below should be evenly distributed around the red line, replicating the behavior of white noise, and on the right-hand graph of the Lagrange-Multiplier test they should follow the linear model. At the 95% confidence level and for the 24 tested lags, we cannot reject the null hypothesis of homoscedastic residuals for ARIMA (2,0,0). The alternative hypothesis would mean that the volatility of the residuals changes as the values of the factors change over time. Consequently, the coefficients of the model can be less precise, meaning that they may lie further from the actual population values. Taking the results of the test into account, we will extend the estimated model with a GARCH component. GARCH, or generalized autoregressive conditional heteroskedasticity, models the volatility of a time series, which can change over time, a common issue for financial data (Christoffersen, 2012). The models will be compared in order to assess the effectiveness of the added component.

51 https://search.r-project.org/CRAN/refmans/aTSA/html/arch.test.html

Figure 9. Portmanteau-Q test (left) and Lagrange-Multiplier test (right) residual decomposition for ARIMA (2,0,0) Source: own processing in R software

For ARIMAX (1,0,1), on the contrary, the performed tests lead to rejecting the null hypothesis of homoscedasticity. Hence, similarly to ARIMA (2,0,0), we will proceed with both the estimated model and its extended version ARIMAX + GARCH.

The last test is aimed at identifying the presence of correlation between residuals at different times. Autocorrelation means that current values depend on preceding values. It can also mean that not all of the factors that have an impact on the target variable have been included. To start with, the ACF and PACF should be examined. Interestingly, the ACF and PACF of the residuals of both models replicate the corresponding graphs of the initial time series. This can indicate that the time series is "noisy" and that the predictive power of the model may be reduced.

Figure 10. ACF function of the ARIMA (2,0,0) residuals Source: own processing in R software


Figure 11. PACF function of the ARIMA (2,0,0) residuals Source: own processing in R software

The next step is the Ljung-Box test, which, in contrast to the Durbin-Watson statistic, tests not only the correlation at the first lag but examines the randomness of the whole system. The hypotheses are the following: H0 states that the residuals are not correlated, H1 indicates the presence of correlation. The achieved p-values of 0.974 and 0.917 for ARIMA (2,0,0) and ARIMAX (1,0,1) do not allow us to reject the null hypothesis, and we conclude that the residuals are not correlated up to the ten tested lags. The Durbin-Watson test produced the same results.

        ARIMA (2,0,0)                    ARIMAX (1,0,1)
        Box-Ljung test                   Box-Ljung test
Test statistic   p-value        Test statistic   p-value
0.001            0.974          0.0107           0.917

Table 5. Box-Ljung test results for ARIMA (2,0,0) and ARIMAX (1,0,1) residuals. Source: own processing in R software
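The Ljung-Box check can be sketched with the base-R Box.test(); the fitted model below is illustrative, and fitdf accounts for the estimated ARMA parameters.

set.seed(10)
returns <- rnorm(548, sd = 0.02)
fit <- arima(returns, order = c(2, 0, 0))   # illustrative fit

# fitdf = 2 accounts for the two estimated AR parameters
Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 2)
# A large p-value means the null of uncorrelated residuals is not rejected.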

The conclusion of the diagnostic procedure is that the models do not fulfill all the requirements. As mentioned, the absence of normality is a common issue for financial time series, whose distribution is of an unknown shape. The consequence of the heteroscedasticity is that the amount of error in the models is not stable, as the predictive power of the model keeps changing with the changes in the factors. Models with an added GARCH part are supposed to solve this issue.

Information about the estimated models ARIMA (2,0,0) + GARCH (1,1) and ARIMAX (1,0,1) + GARCH (1,1), including the performed tests, can be found in the table below. The GARCH (1,1) parameters mean that the volatility of the tested time series follows an ARMA(1,1)-type process in the squared residuals. It is usually sufficient to use this simplest form of GARCH to model financial time series, which is why it was used in our case.
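The thesis does not name the package used for the GARCH extension; the sketch below shows one possible way to estimate an ARMA(2,0) mean with a GARCH(1,1) variance using the rugarch package on a simulated series.

library(rugarch)

set.seed(11)
returns <- rnorm(548, sd = 0.02)

spec <- ugarchspec(
  mean.model         = list(armaOrder = c(2, 0), include.mean = TRUE),  # ARMA(2,0) mean, as in ARIMA(2,0,0)
  variance.model     = list(model = "sGARCH", garchOrder = c(1, 1)),    # GARCH(1,1) variance
  distribution.model = "norm"
)
fit <- ugarchfit(spec, data = returns)
fit   # shows AR and GARCH coefficients with their significance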

Analyzing the estimated coefficients, it can be seen that the added GARCH component did not improve the significance of the AR component: the p-values of the estimated coefficients are still higher than the 5% significance level. The parameters of the GARCH part, however, appeared to be statistically significant. For ARIMAX, on the other hand, all the added factors turned out to be statistically insignificant.

The ARCH test performed on the standardized residuals of ARIMA (2,0,0) and ARIMAX (1,0,1) with the GARCH component could not reject the hypothesis of constant variance for the first three tested lags. The p-value then keeps decreasing, and from the fifth lag onwards the null hypothesis has to be rejected, suggesting heteroscedasticity.

The test results therefore do not fulfill all the assumptions of a correctly specified model. Besides, we assume that the statistical insignificance of all the exogenous variables after the partial elimination of the impact of heteroscedasticity may suggest that the added factors are useful only during periods of increased volatility. Despite being misspecified and being unable to
