

7.2 Define crucial components of the model

7.2.5 Object Detection

Object detection is a crucial part of the system: its precision determines how well and how safely the whole system behaves when other agents surround the vehicle.

In this case, a highly efficient algorithm is needed which outputs the most relevant and precise knowledge about the vehicle's surroundings.

The most efficient way to solve the object detection task is to use convolutional neural networks and algorithms based on their architecture.


 YOLO

One such algorithm based on convolutional neural networks which shows results of great precision is named YOLO.

The idea behind YOLO is comparatively simple: a single convolutional network simultaneously predicts multiple bounding boxes and the class probabilities for those boxes.

YOLO trains on full images and directly optimizes detection performance. (Redmon, Divvala et al., 2016)

This unified model has several benefits compared to traditional methods of object detection. To predict detections, one simply runs the neural network on a new picture at test time. (Redmon, Divvala et al., 2016) The base network runs at 45 frames per second with no batch processing, and a fast version runs at more than 150 fps. (Redmon, Divvala et al., 2016) That means it can process streaming video in real time with a latency of less than 25 milliseconds. (Redmon, Divvala et al., 2016) Moreover, YOLO achieves more than twice the mean average precision of other real-time systems. (Redmon, Divvala et al., 2016)

Second, when making predictions, YOLO reasons globally about the picture.

Unlike sliding window and region proposal based methods, YOLO sees the whole picture during training and testing, so it implicitly encodes contextual information about classes and their appearance. (Redmon, Divvala et al., 2016) Fast R-CNN, a top detection method, mistakes background patches for objects in an image because it cannot see the larger context. (Redmon, Divvala et al., 2016) YOLO makes less than half the number of background errors compared to Fast R-CNN. (Redmon, Divvala et al., 2016)


Figure 8 – Object Detection - YOLO algorithm

Thirdly, YOLO learns generalizable representations of objects. Trained on natural images and tested on artwork, YOLO outperforms top detection methods like DPM and R-CNN by a wide margin. (Redmon, Divvala et al., 2016) Since YOLO is highly generalizable, it is less likely to break down when applied to new contexts or unanticipated inputs. All of the listed benefits make YOLO highly suitable for implementing the object detection module. (Redmon, Divvala et al., 2016)
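To make the grid-based prediction concrete, the sketch below decodes a YOLO-style S × S × (B·5 + C) output tensor into boxes with class-specific confidence scores. The grid size, the number of boxes per cell, and the tensor layout are illustrative assumptions and do not reproduce the exact network configuration used in this work.

```python
import numpy as np

def decode_yolo_output(pred, img_size=448, S=7, B=2, C=20, conf_thresh=0.25):
    """Decode a YOLO-style S x S x (B*5 + C) tensor into boxes.

    Assumed layout (illustrative only): for each grid cell, B boxes with
    (x, y, w, h, objectness) followed by C class probabilities.
    """
    cell = img_size / S
    detections = []
    for row in range(S):
        for col in range(S):
            cell_pred = pred[row, col]
            class_probs = cell_pred[B * 5:]
            for b in range(B):
                x, y, w, h, obj = cell_pred[b * 5:(b + 1) * 5]
                # x, y are offsets within the cell; w, h are relative to the image
                cx = (col + x) * cell
                cy = (row + y) * cell
                bw, bh = w * img_size, h * img_size
                scores = obj * class_probs          # class-specific confidence
                cls = int(np.argmax(scores))
                if scores[cls] > conf_thresh:
                    detections.append((cx - bw / 2, cy - bh / 2,
                                       cx + bw / 2, cy + bh / 2,
                                       float(scores[cls]), cls))
    return detections

# Usage with a random tensor, just to show the expected shapes:
dummy = np.random.rand(7, 7, 2 * 5 + 20)
boxes = decode_yolo_output(dummy)
```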


 Sparse Scene Flow

Another object detection algorithm is called Sparse Scene Flow. It relies on a 3D representation of the object and uses a sparse scene flow field. It uses optical flow vectors which are associated with 3D points from the point cloud. (Hermes et al., 2010) To achieve that, it computes the average velocity in a neighborhood around each point. (Hermes et al., 2010)

As a result, there is a point cloud attributed with motion, also known as a scene flow field.

In order to separate moving groups of 3D points which behave differently from each other, a graph-based clustering algorithm is used.

To achieve this, moving objects are first separated from stationary ones. Only those points from the scene flow point set S = {s_1, …, s_M} are kept whose speed relative to the stationary world satisfies ||[s_{i,x}, s_{i,y}]^T|| > ε_v, which yields a reduced scene flow point set S_{ε_v} ⊆ S. (Hermes et al., 2010) Objects that are already standing still when entering the sensor's field of view are therefore treated as static objects. (Hermes et al., 2010) A cluster A ⊆ S_{ε_v} is defined as a point set with maximal intra-cluster similarity, while the similarity between different clusters is minimized. (Hermes et al., 2010)

For each point s_i ∈ S_{ε_v} \ A a δ-neighborhood exists such that s_i is added to A_n if ∃ s_j ∈ A : d(s_j, s_i) < δ, where d(·,·) denotes the Euclidean distance metric and δ is a distance threshold. (Hermes et al., 2010)
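A minimal sketch of this thresholding and δ-neighborhood clustering step is given below, assuming the scene flow points are already available as an array of (x, y, z, v_x, v_y) rows; the array layout and the threshold values are illustrative assumptions, not the exact implementation in the appendix.

```python
import numpy as np

def cluster_scene_flow(points, eps_v=0.5, delta=1.0):
    """Threshold scene flow points by speed and group them by delta-neighborhood.

    points: (M, 5) array with columns x, y, z, v_x, v_y (assumed layout).
    Returns a list of clusters, each a list of row indices into `points`.
    """
    speed = np.linalg.norm(points[:, 3:5], axis=1)
    moving = np.where(speed > eps_v)[0]            # reduced set S_{eps_v}
    unassigned = set(moving.tolist())
    clusters = []
    while unassigned:
        seed = unassigned.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            # add every unassigned point closer than delta to a cluster member
            close = [j for j in list(unassigned)
                     if np.linalg.norm(points[current, :3] - points[j, :3]) < delta]
            for j in close:
                unassigned.remove(j)
                cluster.append(j)
                frontier.append(j)
        clusters.append(cluster)
    return clusters
```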

Tracked vehicles may slow down or stop completely, particularly at intersections. Hence, a cluster will not always be detected, so it cannot be used as the only tracking cue.

This part of the model is implemented in Python and included in the appendix as the "YOLOLayer" class and other classes with the appropriate functionality.


Figure 9 – Sparse scene flow module

7.2.6 Object Tracking

For tracking, it is important to focus primarily on tracking-by-detection approaches. These are approaches which build on deep-learning-based object detection models. (Chou et al., 2019) This is because, given only the first video frame, tracking methods that do not perform detection require a manual, near-optimal initialization of the state information of each road agent. (Chou et al., 2019) Furthermore, techniques that do not use object detection need to know the number of road agents in each frame from the beginning, so they struggle with cases where new road agents join the scene during the video. (Chou et al., 2019) Tracking-by-detection approaches overcome these limitations by using a detection system to recognize road agents entering at any point during the video and to initialize their state-space information. (Chou et al., 2019)
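To illustrate the tracking-by-detection idea, the sketch below associates detections in consecutive frames by greedy IoU matching and starts a new track for every unmatched detection, so that agents entering the scene mid-video are picked up automatically. The box format and the IoU threshold are illustrative assumptions, not the exact implementation in the appendix.

```python
from itertools import count

_track_ids = count()

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def update_tracks(tracks, detections, iou_thresh=0.3):
    """Greedy IoU association; unmatched detections start new tracks."""
    updated = {}
    unmatched = list(detections)
    for track_id, box in tracks.items():
        if not unmatched:
            break
        best = max(unmatched, key=lambda d: iou(box, d))
        if iou(box, best) >= iou_thresh:
            updated[track_id] = best
            unmatched.remove(best)
    for det in unmatched:                 # new road agents entering the scene
        updated[next(_track_ids)] = det
    return updated

# Usage: tracks = {}; for each frame: tracks = update_tracks(tracks, frame_detections)
```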

Figure 10 – Object tracking module algorithm


This part of the model is implemented in Python and included in the appendix in the "generate_detections" method and the related functionality.

7.2.7 Planning

As this research involves the development of a vehicle control system and its deployment in various road conditions and driving use cases, there is a need to move to a more particular, scenario-specific, and holistic approach for planning its trajectory. In this approach, each driving use case is treated as a separate driving scenario. This is helpful because an issue found in a specific scenario can now be fixed without affecting the operation of other scenarios, unlike in previous versions, where a fix affected other driving use cases because they were all treated as a single driving scenario. The planning module focuses on curb-to-curb autonomous driving on urban roads and introduces two new planning scenarios.


Figure 11 – Planning module algorithm

 Planning Component

In this research, the planning module architecture is adjusted to incorporate new curb-to-curb driving scenarios on urban roads. As shown in Figure 11, there are two new complex scenarios, Emergency and Park-and-go. In order to plan these scenarios effectively, two new deciders are introduced, the Path Reuse Decider and the Speed Bound Decider, and the existing deciders have been updated, making the planning architecture robust and flexible enough to handle a wide range of urban road driving scenarios.


Each driving scenario has its own set of driving parameters that are unique to that scenario, which makes it safer, more efficient, easier to tune and debug, and more flexible.
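As a simple illustration of scenario-specific parameters, the hypothetical configuration below keeps a separate parameter set per scenario so that tuning one scenario does not affect the others. The scenario names follow the text, but the parameter names and values are purely illustrative assumptions.

```python
# Hypothetical per-scenario parameter sets; names and values are illustrative only.
SCENARIO_PARAMS = {
    "LANE_FOLLOW": {"max_speed_mps": 13.9, "lateral_buffer_m": 0.5},
    "EMERGENCY":   {"max_speed_mps": 5.0,  "lateral_buffer_m": 1.0},
    "PARK_AND_GO": {"max_speed_mps": 2.5,  "lateral_buffer_m": 0.3},
}

def params_for(scenario: str) -> dict:
    """Return the parameter set for a scenario, isolated from all other scenarios."""
    return SCENARIO_PARAMS[scenario]
```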

 Lattice Planner

The main trajectory planner used in this work is a Lattice Planner. Lattice planners constrain the search space by restricting the actions that the ego vehicle can take at any point in the workspace. This set of actions is known as the control set of the lattice planner. This control set, when combined with a discretization of the workspace, defines an implicit graph. This graph can then be searched using a graph search algorithm, for example Dijkstra's or A*, which results in fast path computation. Obstacles can set the edges that cross them to infinite cost, so the graph search allows collision checking to be performed as well. While the lattice planner is often very fast, the quality of the paths is sensitive to the chosen control set. A common variation on the lattice planner is known as the conformal lattice planner, where goal points are selected some distance ahead of the vehicle, laterally offset from each other with respect to the direction of the road, and a path is optimized to each of these points. The path that best satisfies some objective while also remaining collision-free is then chosen as the path to execute.
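The sketch below illustrates the idea on a toy grid lattice: edges crossing obstacles get infinite cost (they are simply skipped), and Dijkstra's algorithm returns the cheapest collision-free path. The grid discretization and the 4-connected control set are simplifying assumptions for illustration, not the actual control set used by the planner.

```python
import heapq

def dijkstra_on_lattice(grid, start, goal):
    """Shortest path on a grid lattice; grid[r][c] == 1 marks an obstacle.

    Edges into obstacle cells are treated as infinite cost (i.e. skipped),
    which doubles as collision checking during the search.
    """
    rows, cols = len(grid), len(grid[0])
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]     # toy 4-connected control set
    dist = {start: 0.0}
    parent = {}
    queue = [(0.0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == goal:
            path = [node]
            while node in parent:
                node = parent[node]
                path.append(node)
            return list(reversed(path))
        if d > dist.get(node, float("inf")):
            continue
        for dr, dc in moves:
            r, c = node[0] + dr, node[1] + dc
            if 0 <= r < rows and 0 <= c < cols and grid[r][c] == 0:
                nd = d + 1.0
                if nd < dist.get((r, c), float("inf")):
                    dist[(r, c)] = nd
                    parent[(r, c)] = node
                    heapq.heappush(queue, (nd, (r, c)))
    return None  # no collision-free path found

# Usage on a small grid with one obstacle cell:
grid = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(dijkstra_on_lattice(grid, (0, 0), (2, 2)))
```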


Figure 12 – Lattice Planner algorithm

This part of the model is implemented in Python and included in the appendix in methods such as "Proc" and "Plan" and the related functionality.

7.2.8 Prediction

The prediction module studies and predicts the behavior of all obstacles detected by the perception module. Prediction receives the obstacle data together with basic perception information, including positions, headings, velocities, and accelerations, and then produces predicted trajectories with probabilities for those obstacles.

Figure 13 – Prediction module algorithm


In addition to strongly emphasizing caution when approaching a junction, this model now scans all obstacles that have entered the junction, as far as computing resources permit.

 Evaluator

The evaluator predicts path and speed separately for any given obstacle. An evaluator evaluates a path by outputting a probability for it.

There are five basic types of evaluators; as the Cruise and Junction scenarios were incorporated, their corresponding evaluators (Cruise MLP and Junction MLP) were included as well. The list of available evaluators includes:

• Cost evaluator: the probability is calculated by a set of cost functions

• MLP evaluator: the probability is calculated using an MLP model

• RNN evaluator: the probability is calculated using an RNN model

• Cruise MLP + CNN-1d evaluator: the probability is calculated using a mix of MLP and CNN-1d models for the cruise scenario

• Junction MLP evaluator: the probability is calculated using an MLP model for the junction scenario

• Junction Map evaluator: the probability is calculated using a semantic-map-based CNN model for the junction scenario. This evaluator was created for caution-level obstacles

• Social Interaction evaluator: this model is used for pedestrians, for short-term trajectory prediction. It uses Social LSTM. This evaluator was created for caution-level obstacles

• Semantic LSTM evaluator: this evaluator is used in the new Caution Obstacle model to generate short-term trajectory points, which are computed using CNN and LSTM.


 Predictor

The predictor generates predicted trajectories for obstacles. Currently, the supported predictors include:

 Empty: obstacles have no predicted trajectories

 Single lane: obstacles move along a single lane in highway navigation mode. Obstacles not on a lane will be ignored

 Lane sequence: the obstacle moves along the lanes

 Move sequence: the obstacle moves along the lanes by following its kinematic pattern

 Free movement: the obstacle moves freely

 Regional movement: the obstacle moves within a possible region

 Junction: obstacles move toward junction exits with high probabilities

 Interaction predictor: computes the likelihood to create posterior prediction results after all evaluators have run. This predictor was created for caution-level obstacles

 Extrapolation predictor: extends the Semantic LSTM evaluator's results to create an 8-second trajectory.

The prediction module estimates the future motion trajectories for every perceived obstacle. The output prediction message wraps the perception information.

Prediction both subscribes to and is triggered by perception obstacle messages.

The prediction module also takes messages from both localization and planning as input.

 Flow-based Prediction

The flow-based prediction technique is based on the approach proposed by Hermes in his work Vehicle Tracking and Motion Prediction in Complex Urban Scenarios (Hermes et al., 2010). Flow-based prediction uses the velocity data from the scene flow field. (Hermes et al., 2010) Using a linear approach with the median velocity of all scene flow points within the box model and the time difference between two successive frames, a vehicle is predicted forward to the current time step. (Hermes et al., 2010) Zero values indicate the absence of motion in the computation of the model. (Hermes et al., 2010) The model uses a two-dimensional vector that corresponds to the image plane of the frame, because the optical flow does not yield any depth data. (Hermes et al., 2010) The mean shift of the point cloud is required to handle the change in depth adequately well.
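A minimal sketch of this linear, flow-based step is shown below: the median velocity of the scene flow points inside the vehicle's box model is scaled by the time difference between two consecutive frames to shift the box to the current time step. The data layout is an assumption for illustration.

```python
import numpy as np

def flow_based_predict(box_center, flow_velocities, dt):
    """Shift a vehicle box model forward in time using scene flow velocities.

    box_center: (2,) array, image-plane position of the box model.
    flow_velocities: (N, 2) array of per-point velocities inside the box (assumed layout).
    dt: time difference between two consecutive frames in seconds.
    """
    if len(flow_velocities) == 0:
        return box_center                      # zero flow: no movement assumed
    median_velocity = np.median(flow_velocities, axis=0)
    return box_center + median_velocity * dt

# Usage:
center = np.array([120.0, 64.0])
velocities = np.array([[1.2, 0.1], [1.0, 0.0], [1.1, 0.05]])
print(flow_based_predict(center, velocities, dt=0.1))
```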

 Trajectory Particle Filter

In this research, trajectories are compared in order to infer a prediction of the motion state. In order to find a suitable matching technique, let us define the following requirements for the metric:

1. Handling of various sampling rates: different sensors run at various frequencies. (Hermes et al., 2010)

2. Insensitivity to outliers: it is possible that noise may occur in the data.

3. Different trajectory lengths: different motion patterns are not confined to time windows of fixed length.

4. Translation invariance: similar motion patterns do not depend on the starting point.

5. Rotational invariance: similar motion patterns do not depend on direction, and their comparison should be independent of the observer's viewpoint.


 Longest Common Subsequence (LCS)

To apply this technique to trajectories, a similarity matching function between two states a_i and b_j from the given trajectory points must be defined. The minimum standard deviation in each dimension is used as a decision threshold, and a sigmoid function is applied to smooth the distance value in a range from zero to the minimum standard deviation. It is sufficient to use a linear function to obtain the distance between a_i and b_j.

The lengths of the trajectories correspond to the number of motion states they contain, and an LCS of each pair of trajectories is computed.
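The sketch below shows the standard dynamic-programming LCS over two trajectories, where two states are counted as matching when their Euclidean distance falls below a threshold derived from the minimum standard deviation; the fixed threshold and the plain distance check are simplifications of the sigmoid-smoothed similarity described above.

```python
import numpy as np

def trajectory_lcs(traj_a, traj_b, match_thresh):
    """Length of the longest common subsequence between two trajectories.

    traj_a, traj_b: (N, 2) and (M, 2) arrays of motion states (x, y).
    Two states match when their Euclidean distance is below match_thresh.
    """
    n, m = len(traj_a), len(traj_b)
    table = np.zeros((n + 1, m + 1), dtype=int)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if np.linalg.norm(traj_a[i - 1] - traj_b[j - 1]) < match_thresh:
                table[i, j] = table[i - 1, j - 1] + 1
            else:
                table[i, j] = max(table[i - 1, j], table[i, j - 1])
    return table[n, m]

# Usage: normalize by the shorter trajectory length to get a similarity in [0, 1].
a = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.2]])
b = np.array([[0.1, 0.0], [1.1, 0.0], [2.1, 0.1], [3.0, 0.2]])
similarity = trajectory_lcs(a, b, match_thresh=0.5) / min(len(a), len(b))
```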

Figure 14 – Longest Common Subsequence algorithm


 Trajectory Classification

The LCS metric is used as the distance measure within a radial basis function (RBF) classification network. It is stated by Hermes in his work that the RBFs are essentially radially symmetric Gaussians. (Hermes et al., 2010) If the number of classes is N_c, the number of prototypes, and therefore of RBFs, is N_P = c·N_c, so every class is represented by the same number of prototypes. (Hermes et al., 2010) The output vector δ(A) of the network for an input pattern A is a vector of length N_P composed of the RBF values. (Hermes et al., 2010) Each training sample has an associated desired output vector τ_m, as defined above for the classifier, and a corresponding RBF output vector. (Hermes et al., 2010) The training process consists of minimizing the error function of the RBF network with respect to the components of the weight matrix W, the width parameters η_p, and the prototypes C_p. (Hermes et al., 2010)
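A minimal sketch of such an RBF classification layer is given below: the output vector δ(A) collects one Gaussian RBF value per prototype, and the class scores are a weighted combination of those values through W. The prototype distances would be LCS-based in the actual model; plain Euclidean distances are used here as an illustrative stand-in.

```python
import numpy as np

def rbf_outputs(pattern, prototypes, widths):
    """delta(A): one radially symmetric Gaussian per prototype C_p with width eta_p."""
    dists = np.linalg.norm(prototypes - pattern, axis=1)   # stand-in for the LCS metric
    return np.exp(-(dists ** 2) / (2.0 * widths ** 2))

def classify(pattern, prototypes, widths, weight_matrix):
    """Class scores W · delta(A); the predicted class is the argmax."""
    delta = rbf_outputs(pattern, prototypes, widths)
    scores = weight_matrix @ delta
    return int(np.argmax(scores)), scores

# Toy usage: N_c = 2 classes, c = 2 prototypes per class, so N_P = 4.
prototypes = np.array([[0.0, 0.0], [0.5, 0.5], [5.0, 5.0], [5.5, 4.5]])
widths = np.ones(4)
W = np.array([[1.0, 1.0, 0.0, 0.0],      # class 0 weights
              [0.0, 0.0, 1.0, 1.0]])     # class 1 weights
print(classify(np.array([0.2, 0.1]), prototypes, widths, W))
```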

Figure 15 – Trajectory classification algorithm


 Motion Prediction

In this work, motion prediction is performed as a probabilistic tracking scheme. The aim is to predict the state of the object at a certain point in the future, given the history of measurements up to the present time. The uncertainty of this prediction can be represented as a distribution.

The result of the motion prediction algorithm is a distribution which shows how the predicted state differs from the present state of the object, which is given at the beginning of the prediction.

In terms of trajectories, the present state of the object is a set of trajectory points which includes the location together with the distance covered by the object.

It is also stated by Hermes that the distribution represents the probability that, given the model trajectory, the measurement trajectory can be observed. (Hermes et al., 2010) This can be obtained via the LCS metric, which is used to sample the particles from the distribution p(t) over a motion database, resulting in an effective implicit probabilistic motion model. (Hermes et al., 2010)

Principal component analysis is used to decrease dimensionality, following the procedure set out in the LCS approach. The particles are also transformed into this low-dimensional vector space.

It is also useful to follow the way Hermes organizes the database of samples into a binary tree, using the already determined coefficients.

The top node in the tree corresponds to the coefficient that captures the dimension of greatest variance in the database, while lower levels capture the finer motion structure. (Hermes et al., 2010) At each node a sub-trajectory is assigned to the left subtree if the value of the corresponding coefficient is lower than zero, and to the right subtree if it is greater than zero. (Hermes et al., 2010)

The described tree can be organized so that its terminal nodes, the leaves, contain an index into the motion database. A balanced tree is built, as indicated in the LCS approach.
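A small sketch of this coefficient-sign tree is shown below: each level of the tree splits trajectories on the sign of one PCA coefficient, ordered from highest to lowest variance, and the leaves store indices into the motion database. The tree depth and the input layout are illustrative assumptions.

```python
def build_sign_tree(coeffs, indices=None, level=0, depth=3):
    """Binary tree over PCA coefficients: the sign of coefficient `level` picks the subtree.

    coeffs: list of coefficient vectors (highest-variance coefficient first).
    Leaves hold the indices of the sub-trajectories in the motion database.
    """
    if indices is None:
        indices = list(range(len(coeffs)))
    if level == depth or len(indices) <= 1:
        return {"leaf": indices}
    left = [i for i in indices if coeffs[i][level] < 0.0]
    right = [i for i in indices if coeffs[i][level] >= 0.0]
    return {
        "coefficient": level,
        "left": build_sign_tree(coeffs, left, level + 1, depth),
        "right": build_sign_tree(coeffs, right, level + 1, depth),
    }

# Usage with three toy coefficient vectors:
tree = build_sign_tree([[-0.4, 0.2, 0.1], [0.3, -0.7, 0.0], [0.5, 0.1, -0.2]])
```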


Using the algorithm described above, the predicted states for the future trajectories are obtained up to a fixed period of time. A good approach in this case is to estimate the local densities of the forecasted states by placing a kernel over each state and then iteratively moving the states towards higher densities using the mean shift vector derived from the kernel function. In this work, a Gaussian kernel is used. (Hermes et al., 2010) The output of this approach are weighted predictions for different time horizons.
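A compact sketch of this Gaussian-kernel mean shift over the predicted states is given below; the bandwidth and the iteration count are illustrative assumptions.

```python
import numpy as np

def mean_shift(states, bandwidth=1.0, iterations=10):
    """Iteratively move predicted states towards regions of higher local density.

    states: (N, 2) array of predicted (x, y) states; a Gaussian kernel weights
    every state when computing the mean shift for each query point.
    """
    shifted = states.astype(float).copy()
    for _ in range(iterations):
        for i, s in enumerate(shifted):
            sq_dist = np.sum((states - s) ** 2, axis=1)
            weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
            shifted[i] = np.sum(weights[:, None] * states, axis=0) / np.sum(weights)
    return shifted

# Usage: the shifted states cluster around the modes of the predicted distribution.
predicted = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.1], [5.2, 4.9]])
print(mean_shift(predicted, bandwidth=0.5))
```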

The optimal translation from trajectory A onto trajectory B can be obtained using the mean values μ_A and μ_B. (Hermes et al., 2010) For the optimal rotation, a closed-form solution is given for the least-squares problem of absolute orientation with quaternions, where a quaternion with three distinct imaginary parts can be interpreted as a generalized complex number or as a combination of a scalar with a Cartesian 3D vector. (Hermes et al., 2010)

This part of the model is implemented in Python and included in the appendix in the "Predictions" and "EvaluateObstacle" methods and other methods for prediction and obstacle evaluation.


8 Testing and evaluation of the proposed model

8.1 Dataset and its description

For evaluation purposes, in order to obtain the most objective results for the developed model, the target dataset should contain data on heterogeneous agents in an urban environment.

It is not essential how the data was acquired; the main focus here is on the data itself and how accurate it is.

In order to meet the requirements mentioned above, a number of datasets were investigated, among which ApolloScape was chosen as the closest to the target requirements.

The dataset contains data acquired from a camera and a LIDAR, as well as manually labeled motion trajectories. (Ma et al., 2019)

The data itself is structured in three categories: trajectory samples, prediction training data, and prediction testing data.

The prediction data relates to five agent types: small vehicles, big vehicles, pedestrians, motorcyclists and bicyclists, and others. (Ma et al., 2019)

The data contains the following agent attributes: type, x coordinate, y coordinate, z coordinate, length, height, and heading.

A sample of the data used is depicted in Figure 16.


Figure 16 - Sample of dataset

Within the scope of this work, the focus is on the trajectories of the surrounding agents, their horizontal sizes, and their positions.

Therefore, the height and the z coordinate are not essential for this work and can be omitted.

The data processing flow is depicted in Figure 17.

Figure 17 - Data Processing Flow

In order to make the dataset compatible with Python libraries, it was converted to CSV format. After conversion it is ready for use in the evaluation process.

frame_id  object_id  object_type  position_x  position_y  object_length  object_width  heading
0         1          2            119.459     64.591      11.101         3.134         -3.116
0         2          3            111.692     82.858      0.348          0.594          0.03
0         7          4            117.838     79.801      1.101          0.593          0.022
0         8          1            174.83      78.838      3.949          1.749         -3.104
0         9          3            153.298     62.936      0.452          0.456         -3.106
0         10         4            158.785     86.926      1.909          0.714          0.001
0         12         1            93.832      66.304      2.799          1.481          0.011
0         13         1            168.183     82.375      4.472          2.171          0.042
0         14         2            108.069     72.644      1.648          2.594          0.011
0         18         4            106.277     77.869      0.83           0.554          0.022
0         19         4            106.789     75.883      0.84           0.588          0.022
1         1          2            125.377     65.911      10.953         3.029         -3.121
1         2          3            110.794     82.738      0.479          0.522          0.046
1         7          4            114.473     79.071      1.048          0.648          0.041
1         8          1            177.066     79.565      4.032          1.814         -3.041
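A minimal sketch of loading the converted CSV shown in the sample above is given below, assuming the converted file has a header row with the column names from the sample; the file name is a placeholder, and any height or z columns, if present, are dropped as described above.

```python
import pandas as pd

def load_trajectories(path="prediction_train.csv"):
    """Load the converted dataset and keep only the attributes used in this work."""
    df = pd.read_csv(path)
    keep = ["frame_id", "object_id", "object_type",
            "position_x", "position_y", "object_length", "object_width", "heading"]
    # Height and the z coordinate, if present, are not needed for the 2D evaluation.
    return df[[c for c in keep if c in df.columns]]

# Usage: trajectories grouped per agent, ordered by frame.
# agents = load_trajectories().sort_values("frame_id").groupby("object_id")
```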


8.2 Evaluation process

8.2.1 Prerequisites

The evaluation methodology in this work is mainly based on the methodologies described in the works "Long-term Vehicle Motion Prediction" (Hermes et al., 2009) and "How would surround vehicles move? A Unified Framework for Maneuver Classification and Motion Prediction" (Deo, Rangesh and Trivedi, 2018). The work "Long-term Vehicle Motion Prediction" (Hermes et al., 2009) is used as the foundation for the evaluation part of this thesis, while "How would surround vehicles move? A Unified Framework for Maneuver Classification and Motion Prediction" (Deo, Rangesh and Trivedi, 2018) serves more as a state-of-the-art reference for some of the crucial parts of the evaluation process.

As mentioned, the approach of Deo, Rangesh and Trivedi (2018) is used as a state-of-the-art reference for some of the crucial parts of the evaluation process, and it also provides a state-of-the-art algorithm which is used throughout the evaluation process for comparison with the model designed in this work.

In addition, for the evaluation process a model from the “Long-term Vehicle
