
Recurrent Neural Networks

One of the major shortcomings of FFNN applications in finance is their inability to deal with sequential data. Whenever the inputs are time series or any other kind of sequence data, a classical neural network cannot handle them properly. It is possible to feed the entire sequence to a NN at once, but then the notion of time is lost. For this kind of data there is a more suitable architecture called the recurrent neural network.

A recurrent neural network is a generalization of FFNN that has an internal state, called memory.

In simple terms, an RNN remembers the past, and its decisions are affected by what it has learned from the past. An RNN is recurrent in nature because it performs the same function in a loop for every input, while the output for each input depends on the past computation. After the output is produced, it is copied and sent back into the network. To make a decision, the network considers the current input together with the output learned from the previous input. In an RNN, the output is therefore influenced not just by the weights applied to the inputs, as in a regular NN, but also by a “hidden” state vector representing the context based on prior inputs and outputs.

Figure 3.5 displays how the RNN works. First, an input is sent to the network and an output is produced. When the next input arrives (the next time step in a financial time series), it is fed to the RNN together with the previous output.

Figure 3.5 The information flow in RNN. Source: Chollet, 2017

Unlike feedforward neural networks, RNNs can use their internal state to process sequences of inputs. This makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. In other neural networks all the inputs are independent of each other, but in an RNN they are related to each other (Chollet, 2017).

Each node in an RNN has two sets of weights: in addition to the input weights, as in an FFNN, there are weights for the outputs from the previous time step. The output $y$ of a single-layer RNN can be computed as follows:

$y_t = g(x_t \cdot w_x + y_{t-1} \cdot w_y + b)$ (3.30)

The part of an RNN that preserves the state is known as a memory cell. In general, a cell state at time $t$ is a function of some inputs at that time step and its state at the previous time step:

$h_t = f(h_{t-1}, x_t)$ (3.31)

An output at time step $t$, denoted $y_t$, is also a function of the previous state and the current inputs. In the case of the basic cell, the output is simply equal to the state, but in more complex cells, such as those in LSTM networks, this is not always the case.
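To make equations (3.30) and (3.31) concrete, the following is a minimal NumPy sketch of a single-layer RNN unrolled over a toy sequence. The choice of tanh as the activation $g$, the dimensions, and the variable names are assumptions made purely for illustration, not part of the models used later in this thesis.

    import numpy as np

    def rnn_step(x_t, y_prev, w_x, w_y, b):
        # One step of a single-layer RNN, following equation (3.30)
        # with g chosen as tanh (an illustrative assumption).
        return np.tanh(x_t @ w_x + y_prev @ w_y + b)

    rng = np.random.default_rng(0)
    w_x = rng.normal(size=(4, 3))   # input weights (4 features -> 3 units)
    w_y = rng.normal(size=(3, 3))   # recurrent weights on the previous output
    b = np.zeros(3)

    sequence = rng.normal(size=(10, 4))   # a toy time series of 10 steps
    y_t = np.zeros(3)                     # initial state
    for x_t in sequence:
        y_t = rnn_step(x_t, y_t, w_x, w_y, b)   # state carried over, eq. (3.31)

The loop makes explicit that the same weights are reused at every time step and that only the state vector carries information forward.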

For problems that require learning long-term temporal dependencies, standard RNNs can be difficult to train. The reason is that the gradient of the loss function decays exponentially with time, the so-called vanishing gradient problem. The long-term dependency problem can be addressed by an improved version of the plain RNN that focuses on long-term dependencies, called Long Short-Term Memory. All RNNs have the form of a chain of repeating modules, where the repeating module has a very simple structure, such as a single tanh layer.

Figure 3.6 The chain of RNN modules. Source: Olah, 2015

LSTMs also have this chain-like structure, but the repeating module is more complicated. Instead of a single layer, there are four, interacting in a very special way.

Figure 3.7 The chain of LSTM modules. Source: Olah, 2015

The key building block of an LSTM is the cell state, the horizontal line running along the top of the chain of cells. The cell state undergoes only basic linear interactions, allowing it to preserve information as much as possible. Structures called gates control the flow of information into the cell state. There are three such gates, each composed of a sigmoid layer and a pointwise multiplication operation.

The first gate, known as the forget gate, decides what information should be thrown away from the cell state. It receives the current input and the output from the last cell and applies a sigmoid function, which returns a number between 0 and 1; the higher the number, the more information is kept (Olah, 2015).
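In the notation of Olah (2015), with $[h_{t-1}, x_t]$ denoting the concatenation of the previous output and the current input, this step is usually written as

$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$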

Figure 3.8 The forget gate in LSTM. Source: Olah, 2015

The second gate controls what new information will be stored in the cell state, and it has two parts. First, a sigmoid function is applied again to the input and the previous output (but with different weights and biases) to decide which values will be updated. Next, a tanh function is applied to create a vector of new candidate values, $\tilde{C}_t$, that could be added to the state. In the next step, these two are combined to create an update to the cell state (Olah, 2015).
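The two parts of this gate correspond to the equations

$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i), \qquad \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$

again following the notation of Olah (2015).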

Figure 3.9 The transformation of new inputs to be added. Source: Olah, 2015

To update the cell state, the old state is multiplied by the output of the forget gate, throwing away the things we decided to forget. Then the new information is added in the form of the candidate values, each scaled by how much we decided to update that state value (Olah, 2015).
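In the same notation, the cell state update is

$C_t = f_t \ast C_{t-1} + i_t \ast \tilde{C}_t$

where $\ast$ denotes elementwise multiplication.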

Figure 3.10 Update of a cell state. Source: Olah, 2015

Finally, the output can be calculated using the updated cell state. Again, a sigmoid function is applied (with new weights and biases) to decide which parts of the cell state will be output. Then, the tanh function is applied to the cell state and the result is multiplied by the output of the sigmoid gate, so that only the selected parts are output (Olah, 2015).
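Written out, the output step reads

$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), \qquad h_t = o_t \ast \tanh(C_t)$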

Figure 3.11 The output from an LSTM cell. Source: Olah, 2015
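Putting the four steps together, a plain NumPy sketch of one LSTM cell might look as follows. The dictionary layout of the weights, the toy dimensions, and the random initialization are illustrative assumptions only; in practice such cells are provided by libraries such as Keras.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, C_prev, W, b):
        # One LSTM step combining the gate equations above.
        z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
        f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
        i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
        C_tilde = np.tanh(W["C"] @ z + b["C"])   # candidate values
        C_t = f_t * C_prev + i_t * C_tilde       # cell state update
        o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
        h_t = o_t * np.tanh(C_t)                 # new output (hidden state)
        return h_t, C_t

    n_in, n_hidden = 4, 3                        # toy dimensions
    rng = np.random.default_rng(0)
    W = {g: rng.normal(size=(n_hidden, n_hidden + n_in)) for g in "fiCo"}
    b = {g: np.zeros(n_hidden) for g in "fiCo"}

    h_t, C_t = np.zeros(n_hidden), np.zeros(n_hidden)
    for x_t in rng.normal(size=(10, n_in)):      # a toy sequence of 10 steps
        h_t, C_t = lstm_step(x_t, h_t, C_t, W, b)

Unlike the basic RNN cell, the output h_t and the cell state C_t are carried forward separately, which is what allows the cell state to preserve information over long sequences.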

4 Reinforcement Learning

This chapter is fully devoted to the concept of reinforcement learning, and the theoretical ideas described here will be applied to real data in the next chapter of this thesis. The chapter begins with an overview of the main building blocks of reinforcement learning, along with the theoretical framework widely used to formulate RL problems, the Markov decision process. The chapter then looks at value functions and policies, which are the two most important entities in the whole reinforcement learning field. Finally, one of the most popular RL methods, the actor-critic method, is described in more detail.

As was mentioned, reinforcement learning is a distinct category of machine learning algorithms. Reinforcement learning is learning what to do in specific situations to maximize a numerical objective, represented by a reward signal. The learner is not told explicitly which actions are best; instead, it must discover them on its own through the rewards received by trying different actions. In complex problems, each action affects not only the immediate reward but also future rewards and future actions. These two characteristics - trial-and-error search and delayed reward - are the two most important distinguishing features of reinforcement learning (Lapan, 2020). The term reinforcement comes from the fact that the reward obtained by an agent should reinforce its behavior in a positive or negative way (Sutton & Barto, 2012).

There are two major RL entities, the agent and the environment, and three communication channels between them: rewards, actions, and observations. Each is described below, and a minimal sketch of the resulting interaction loop follows the list.

- An agent is a piece of software that interacts with the environment by making observations about the environment’s state, executing certain actions, and receiving eventual rewards for this. The agent's goal is to maximize the cumulative reward it receives in the long run (Sutton & Barto, 2012).

- The environment is everything outside the agent; it has a certain state at any particular time.

- Observations are pieces of information that the environment provides to the agent, describing what is going on around it.

- Reward is a scalar value that the agent obtains periodically from the environment, and it is the main force that drives the agent's learning process. The purpose of the reward is to tell the agent how well it has behaved. Reward is local, meaning that it reflects the success of the agent's recent activity and not all the successes achieved by the agent so far (Lapan, 2020).

- Actions are things that an agent is able to do in the environment.
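The interaction between these entities can be summarized by a simple loop, sketched below in Python. Both env and agent are hypothetical placeholders with assumed methods (reset, step, act, learn); the sketch only illustrates the flow of observations, actions, and rewards described above.

    def run_episode(env, agent):
        # env and agent are hypothetical placeholders: env is assumed to expose
        # reset() -> observation and step(action) -> (observation, reward, done);
        # agent is assumed to expose act(observation) and learn(observation, reward).
        observation = env.reset()                        # initial observation
        total_reward = 0.0
        done = False
        while not done:
            action = agent.act(observation)              # agent chooses an action
            observation, reward, done = env.step(action) # environment responds
            agent.learn(observation, reward)             # reward drives learning
            total_reward += reward                       # cumulative reward to maximize
        return total_reward

The return value is the cumulative reward for one episode, which is exactly the quantity the agent tries to maximize in the long run.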