Neural Networks

The neural network, or artificial neural network, is one of the most popular machine learning algorithms at present. The concept of the artificial neural network was inspired by the neural architecture of the human brain. The functionality of an artificial neuron is similar to that of a human neuron: it takes in some inputs and creates an output [dee20].

A neuron in machine learning contains a mathematical function termed an activation function. The most popular activation functions are sigmoid, tanh, ReLU and softmax [Nig18]. The idea behind an artificial neuron is to imitate dendrites, cell bodies and axons using simplified mathematical models. Signals are received from the dendrites and sent down the axon once enough signals have been received. This outgoing signal can then be used as another input for other neurons, repeating the process [Nag18]. Neurons are in turn gathered into the layers of a neural network.
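As an illustration, the following is a minimal NumPy sketch of the four activation functions named above; the function names and the test vector are chosen for this example only.

```python
import numpy as np

def sigmoid(x):
    # Squashes inputs into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes inputs into the interval (-1, 1).
    return np.tanh(x)

def relu(x):
    # Passes positive values through, zeroes out negatives.
    return np.maximum(0.0, x)

def softmax(x):
    # Normalizes a vector into a probability distribution.
    e = np.exp(x - np.max(x))  # shift for numerical stability
    return e / e.sum()

z = np.array([-1.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z), softmax(z))
```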

Any neural network has one input layer, one output layer and a required number of hidden layers. The number of hidden layers depends upon the complexity of the problem to be solved. Moreover, each of the hidden layers can have a different activation function, which depends on the problem in question and the type of data being used.

Neurons learn certain weights at every layer to make a prediction. The algorithm through which they learn the weights is called backpropagation.
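To make this concrete, the following is a hypothetical single-neuron example of learning weights by backpropagating the error gradient; the data, learning rate and iteration count are illustrative and not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))               # inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # target labels

w, b = np.zeros(2), 0.0
lr = 0.5
for _ in range(200):
    z = X @ w + b
    p = 1.0 / (1.0 + np.exp(-z))    # sigmoid activation (forward pass)
    grad_z = (p - y) / len(y)       # cross-entropy gradient w.r.t. z
    w -= lr * (X.T @ grad_z)        # backpropagated weight update
    b -= lr * grad_z.sum()

print(((p > 0.5) == y).mean())      # training accuracy
```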

A neural network that has more than one hidden layer is generally called a Deep Neural Network [Nig18].

2.2.1 Convolutional neural networks - CNN

A Convolutional neural network is basically a neural-based approach applied to matrices. The word "convolution" indicates that the basic operation in this network is the convolution operation. A CNN contains one or more convolutional, pooling or fully connected layers. These layers (Figure 2.1) compute a mathematical operation between a part of the input and a kernel. The result of this convolution operation is called a feature map. The parts are defined by the kernel size, the step size (stride), the way the kernel moves over the input, and so on [IG16].
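The following is a minimal sketch of this sliding-kernel operation (as in most deep learning frameworks, it is implemented as cross-correlation); the input, kernel and stride are illustrative.

```python
import numpy as np

def conv2d(x, kernel, stride=1):
    # Slide the kernel over the input with the given stride, taking an
    # elementwise product-and-sum at each position; the collected
    # results form the feature map.
    kh, kw = kernel.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 input matrix
k = np.array([[1.0, 0.0], [0.0, -1.0]])        # toy 2x2 kernel
print(conv2d(x, k))                            # 3x3 feature map
```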

The Convolutional Layer (The Kernel/Filter)

The Convolutional Layer performs the convolution operation in the first part of the network. It contains a set of independent filters that are randomly initialized. These filters can detect low-level features [Sah18].

Pooling Layer

The Pooling Layer is responsible for reducing the matrix size of the Convolved Feature. This layer reduces the processing power needed to process the data by reducing its dimensionality. Many pooling types exist, but the most common are Max Pooling and Average Pooling [Sah18].

Max Pooling returns the maximum value. It can also be used as a noise suppressant.

Average Pooling returns the average of all the values.

Figure 2.1: An illustration of a basic CNN model for text classification, taken from [SR19]

The main goal of the Pooling layer is to reduce the amount of information in each feature map obtained at the convolutional level, leaving only the most essential information.
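The following sketch shows both pooling variants on a small assumed feature map; the window size and values are illustrative.

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    # Split the feature map into non-overlapping size x size windows
    # and keep one summary value (max or average) per window.
    oh, ow = x.shape[0] // size, x.shape[1] // size
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = x[i*size:(i+1)*size, j*size:(j+1)*size]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

fm = np.array([[1., 3., 2., 0.],
               [4., 6., 1., 1.],
               [0., 2., 5., 7.],
               [1., 1., 8., 3.]])
print(pool2d(fm, mode="max"))   # [[6. 2.] [2. 8.]]
print(pool2d(fm, mode="avg"))   # [[3.5  1.  ] [1.   5.75]]
```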

Classification (Fully Connected Layer)

The fully connected layer in Figure 2.1 takes the output of the previous layers and flattens it into a single vector, which serves as the input for the next stage. The softmax activation function on the last fully connected layer normalizes this output into the final class probabilities.
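A small sketch of this flatten-and-classify stage follows; the number of feature maps, their shape and the three-class output are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
feature_maps = rng.normal(size=(8, 4, 4))   # e.g. 8 pooled 4x4 maps

x = feature_maps.flatten()                  # flatten into one vector
W = rng.normal(size=(3, x.size)) * 0.01     # fully connected weights, 3 classes
b = np.zeros(3)

logits = W @ x + b
e = np.exp(logits - logits.max())
probs = e / e.sum()                         # softmax: class probabilities
print(probs, probs.sum())                   # probabilities sum to 1
```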

Natural Language Processing (NLP) can be implemented with a Deep Convolutional neural network (DCNN), in which the output of one layer is fed to the next layer [Mic17].

2.2.2 Recurrent neural network - RNN

A Recurrent neural network is one of the classes of artificial neural networks with a linear recursive structure. In contrast with the CNN, which is specialized in processing grid-like data, the RNN, owing to its linear recursive structure, is suited to processing sequential data. The basic idea of the RNN is to share parameters across a deep computational graph.

At each step, the RNN takes part of the input and produces an output, which is used in the next step to prepare the next output. This recurrent formulation makes it possible to use the same weights at completely different positions in time [IG16].
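The following is a minimal sketch of such a recurrent step: the same weight matrices are reused at every position, and the hidden state carries the memory of the inputs seen so far. All shapes and the initialization scale are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_h = 5, 8
W_xh = rng.normal(size=(d_h, d_in)) * 0.1   # input-to-hidden weights
W_hh = rng.normal(size=(d_h, d_h)) * 0.1    # hidden-to-hidden weights
b_h = np.zeros(d_h)

def rnn_step(x_t, h_prev):
    # The new state is a function of the current input and the
    # previous state, computed with the same shared weights each step.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

h = np.zeros(d_h)
for x_t in rng.normal(size=(10, d_in)):     # a sequence of 10 inputs
    h = rnn_step(x_t, h)
print(h.shape)                              # final state summarizes the sequence
```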

Two architectures of recurrent unit are often used: the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). LSTM is described in the next section.

In conclusion, when an RNN processes sequences of tokens, it keeps track of a state that represents the memory of the previous tokens. This ability makes RNNs very useful in language processing.

Although an RNN is a simple and powerful model, in practice it is difficult to train. There are many learning algorithms for this model, such as Backpropagation Through Time (BPTT), Real-time Recurrent Learning and others. Most of these approaches are gradient-based and have had little success in solving complex problems [Nab19]. The main reasons why this model is so unwieldy are the vanishing gradient and exploding gradient problems described by Bengio et al. (1994) [BSF94].

Long Short-Term Memory (LSTM)

The LSTM is an RNN architecture used in the deep learning field. This variant was introduced by the German researchers Sepp Hochreiter and Juergen Schmidhuber [SH97]. The main idea of LSTM is to partially solve the vanishing gradient problem, so that errors can be backpropagated through time and layers.

The goal of LSTM is to remember information for a long period of time. This makes it very suitable for processing, predicting and classifying data in time. Owing to this fact, LSTM can predict the next word even across long-distance dependencies in sequences.

A typical architecture consists of a cell and three neural gates (input, forget and output). The cell represents the memory part of the LSTM unit; it chooses what information to store, while the gates open and close to allow reading, writing and erasing.
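The following sketch shows one step of such a unit with the three gates and the memory cell; biases are omitted for brevity and all shapes and initializations are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
d_in, d_h = 4, 6
def init():
    return rng.normal(size=(d_h, d_in + d_h)) * 0.1
W_i, W_f, W_o, W_c = init(), init(), init(), init()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])
    i = sigmoid(W_i @ z)                    # input gate: what to write
    f = sigmoid(W_f @ z)                    # forget gate: what to erase
    o = sigmoid(W_o @ z)                    # output gate: what to read out
    c = f * c_prev + i * np.tanh(W_c @ z)   # update the memory cell
    h = o * np.tanh(c)                      # expose part of the cell as output
    return h, c

h, c = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(size=(5, d_in)):      # a sequence of 5 inputs
    h, c = lstm_step(x_t, h, c)
print(h.shape, c.shape)
```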
