
4.1.1 Calculating a suitable value of the exploration constant

Table 4.2 compares the best exploration constant found, the difference of the maximum and minimum payoff achieved using the best exploration constant found (out of the same number of episodes as used in Section 4.1), and our default exploration constant, to help us adjust our current way of calculating this parameter.

Table 4.2: Comparison of exploration constant values

Maze                         Best Exp. Const. Found   MaxPayoff − MinPayoff   Our Default Exp. Const.
dense/16x16_1g               -                        0                       1,179
balanced/16x8_1g             100                      1,178                   1,178
sparse/8x8_1g                200; 500                 59                      1,212
dense/16x16_4g_sparsely      200                      2,107                   4,143
balanced/16x8_4g_sparsely    200; 500                 1,152                   4,155
sparse/8x8_4g_sparsely       500                      1,160                   4,165
dense/16x16_4g_closely       500                      4,124                   4,166
balanced/16x8_4g_closely     500                      155                     4,176
sparse/8x8_4g_closely        1,000                    79                      4,180
balanced/16x16_2g_sparsely   100                      498                     2,615
balanced/32x16_2g_sparsely   100                      2,864                   3,134
balanced/32x32_2g_sparsely   100                      2,872                   4,104

Even with the information presented in Section 4.1, it would be difficult to create a formula capable of calculating a perfect exploration constant for every maze. The main problem is estimating the number of steps the robot would have to perform to pick up each goal.

Using our default parameter values, the robot always plans for a maximum of h steps (the planning horizon). In a maze with one goal, when the robot finds a path to the goal, what matters is the number of steps s from the start of solving the maze after which the robot would reach it, and the lowest possible number of steps s_min needed to reach it.³

3. For the lowest possible number of steps, we assume that no movement error occurs.

If s_min is relatively high with respect to h, a very low exploration constant should be used to achieve the best results: in such a maze, a simulation that finds a path to the goal can be considered a very lucky one, and the resulting path should be exploited instead of wasting simulations on exploring other paths that are unlikely to be more viable. This is the case of the maze used in Figure 4.1. Conversely, the maze used in Figure 4.3 should have a high exploration constant, since s_min is low with respect to h, and the odds of finding many goal-reaching paths are therefore high.

The major flaw of setting this parameter to MaxPayoff − MinPayoff, the value we try to achieve with our default exploration constant, is that this value tends to be the highest for mazes where s_min is relatively high (i.e., the robot sometimes does, and sometimes does not, reach the goal). For mazes with relatively low s_min, on the other hand, the robot always reaches the goal, hence the payoffs are consistent and the exploration constant comes out low. That is, as we have just discussed, the opposite of what should be achieved.

The observed behaviour can be summarized as:

∙ The more goals a maze contains, the higher the exploration constant should be.

∙ The denser the maze, the harder it tends to be to find multiple goal-reaching paths, hence a lower exploration constant should be selected to achieve better results more frequently.

∙ The farther the goals are from each other (and from the starting position), the lower the exploration constant should be.

Using this knowledge, we adjust our way of calculating the exploration constant to:

\[
\mathit{ExplorationConstant} = \frac{\mathit{Goals} \cdot \mathit{GoalReward} \cdot (1 - \mathit{WallDensity})}{\mathit{AverageDistance}(\mathit{Start}, \mathit{Goals})},
\tag{4.1}
\]

where WallDensity is a number from [0; 1) (defined in Section 2.1.1), and AverageDistance(Start, Goals) is the average distance between the starting position and the maze goals. In our case, we calculate both of these values from the input maze: WallDensity as the ratio of the number of wall tiles to the number of all maze tiles, and AverageDistance as:

\[
\mathit{AverageDistance} = \frac{1}{\mathit{Goals}} \left( \mathit{Distance}(\mathit{Start}, \mathit{Goal}_0) + \sum_{n=1}^{\mathit{Goals} - 1} \mathit{Distance}(\mathit{Goal}_{n-1}, \mathit{Goal}_n) \right),
\tag{4.2}
\]

where Distance(Position, Position) is the length of the vector between the two positions, and Goal is the set of maze goal positions sorted left-to-right, top-to-bottom.
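For concreteness, the following sketch shows one way Equations 4.1 and 4.2 could be computed from an input maze. The grid encoding ('#' for wall tiles, 'S' for the starting position, 'G' for goals) and all identifiers are illustrative assumptions, not the implementation used in this work.

```python
from math import hypot

def exploration_constant(grid, goal_reward):
    """A sketch of Equation 4.1; `grid` is a list of equally long strings."""
    tiles = [(x, y) for y, row in enumerate(grid) for x in range(len(row))]
    start = next((x, y) for x, y in tiles if grid[y][x] == 'S')
    # Goal positions sorted left-to-right, top-to-bottom, as Equation 4.2 expects.
    goals = sorted(((x, y) for x, y in tiles if grid[y][x] == 'G'),
                   key=lambda p: (p[1], p[0]))

    # WallDensity: ratio of wall tiles to all maze tiles.
    wall_density = sum(grid[y][x] == '#' for x, y in tiles) / len(tiles)

    # Distance(a, b): length of the vector between the two positions.
    def distance(a, b):
        return hypot(a[0] - b[0], a[1] - b[1])

    # Equation 4.2: the start-to-first-goal leg plus the consecutive
    # goal-to-goal legs, averaged over the number of goals.
    legs = [distance(start, goals[0])]
    legs += [distance(goals[n - 1], goals[n]) for n in range(1, len(goals))]
    average_distance = sum(legs) / len(goals)

    # Equation 4.1.
    return len(goals) * goal_reward * (1 - wall_density) / average_distance
```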

However, it would also be possible to approximate both of these variables from the results of running a set of black-box model simulations (i.e., similarly to using MaxPayoff − MinPayoff [8]).

E.g., to set WallDensity, there could be a counter keeping track of how many times the robot performed the action move forward, and, out of this number, how many times this action led back to the state from which it was initiated (i.e., how many times the robot hit a wall). The ratio of these two numbers would be an approximation of WallDensity.
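A minimal sketch of such a counter follows; the class and the action name "move_forward" are hypothetical, and states are assumed to be comparable for equality.

```python
class WallDensityEstimator:
    """Approximates WallDensity from black-box simulation statistics."""

    def __init__(self):
        self.forward_moves = 0  # how many times "move forward" was performed
        self.wall_hits = 0      # how many of those led back to the same state

    def record(self, state_before, action, state_after):
        if action == "move_forward":
            self.forward_moves += 1
            if state_after == state_before:  # the move was blocked by a wall
                self.wall_hits += 1

    def estimate(self):
        # The ratio of the two counters approximates WallDensity.
        return self.wall_hits / self.forward_moves if self.forward_moves else 0.0
```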

For AverageDistance, the approximation could mirror our way of calculating this value, with the exception of the Distance implementation, the number of goals, and their ordering. More specifically, Distance(a, b) would be the lowest number of steps performed to reach position b from position a; the number of goals would be the highest number of reached goals; and the goals would be ordered, from the set of found goals, as follows:

\[
\begin{aligned}
\mathit{Goal}_0 &= \operatorname*{arg\,min} \{\, \mathit{Distance}(\mathit{Start}, g) \mid g \in \mathit{FoundGoals} \,\},\\
\mathit{Goal}_n &= \operatorname*{arg\,min} \{\, \mathit{Distance}(\mathit{Goal}_{n-1}, g) \mid g \in \mathit{FoundGoals} \setminus \{\mathit{Goal}_i : i < n\} \,\}.
\end{aligned}
\tag{4.3}
\]
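The ordering of Equation 4.3 is a greedy nearest-neighbour pass over the found goals, so the resulting approximation of AverageDistance could be sketched as follows; `step_distance(a, b)`, standing for the lowest observed number of steps from position a to position b, is an assumed helper supplied by the simulation statistics.

```python
def approximate_average_distance(start, found_goals, step_distance):
    """A sketch of Equation 4.3 plus the AverageDistance approximation.

    Assumes at least one goal was found during the simulations.
    """
    ordered = []
    current = start
    remaining = set(found_goals)
    while remaining:
        # Equation 4.3: the next goal is the closest not-yet-ordered one.
        nearest = min(remaining, key=lambda g: step_distance(current, g))
        ordered.append(nearest)
        remaining.remove(nearest)
        current = nearest

    # Average the legs of the ordered tour, as in Equation 4.2.
    legs = [step_distance(start, ordered[0])]
    legs += [step_distance(ordered[n - 1], ordered[n])
             for n in range(1, len(ordered))]
    return sum(legs) / len(ordered)
```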

Table 4.3 shows a comparison of the best exploration constant found and our new way of calculating this parameter.

As can be seen, the new way of calculating the exploration constant produces values similar to the most viable ones found. In cases where the values differ more, the solving results remain similar.

Table 4.3: Comparison of the best exploration constant found and the new exploration constant

                             The Best Exploration Constant Found    The New Exploration Constant
Maze                         Const.     Goals [%]   Avg. Steps      Const.    Goals [%]   Avg. Steps
dense/16x16_1g               -          0           208             28        0           208
balanced/16x8_1g             100        98.4        97              55        99          100
sparse/8x8_1g                200; 500   100         22              228       100         22
dense/16x16_4g_sparsely      200        78.47       188             126       77.9        191
balanced/16x8_4g_sparsely    200; 500   99.92       106             308       99.95       104
sparse/8x8_4g_sparsely       500        99.67       115             758       99.67       118
dense/16x16_4g_closely       500        42.6        196             194       42.48       197
balanced/16x8_4g_closely     500        100         79              502       100         78
sparse/8x8_4g_closely        1,000      100         36              1,153     100         36
balanced/16x16_2g_sparsely   100        100         186             185       100         185
balanced/32x16_2g_sparsely   100        82.33       908             107       82.83       907
balanced/32x32_2g_sparsely   100        98.67       919             88        98.83       965

[Plot: Avg. Goals Reached [%] and Avg. Steps vs. exploration constant. No goal was reached, and the average number of steps remained at 208 for every tested constant.]

Figure 4.1: Effect of changing the exploration constant for the maze dense/16x16_1g (out of 1,000 episodes for every exploration constant)

[Plot: Avg. Goals Reached [%] and Avg. Steps vs. exploration constant. The average number of steps fell from 164 at the lowest tested constants to a minimum of 97, rising slightly to 103 at the highest.]

Figure 4.2: Effect of changing the exploration constant for the maze balanced/16x8_1g (out of 1,000 episodes for every exploration constant)

[Plot: Avg. Goals Reached [%] and Avg. Steps vs. exploration constant. The goal was reached in 100% of episodes for every tested constant; the average number of steps fell from 57 to a minimum of 22, rising slightly to 25 at the highest tested constants.]

Figure 4.3: Effect of changing the exploration constant for the maze sparse/8x8_1g (out of 1,000 episodes for every exploration constant)

Exploration constant     0.05    0.5     10      25      50      75      100     200     500     1,000   5,000
Avg. Goals Reached [%]   39.08   39.7    54.98   69.6    75.88   77.45   78      78.47   77.55   76.58   74.53
Avg. Steps               208     208     207     205     198     194     191     188     189     193     196

Figure 4.4: Effect of changing the exploration constant for the maze dense/16x16_4g_sparsely (out of 1,000 episodes for every exploration constant)

Exploration constant     0.05    0.5     10      50      100     125     150     200     500     1,000   10,000
Avg. Goals Reached [%]   73.3    73.9    84.45   99.58   99.95   99.97   99.9    99.92   99.92   99.83   99.85
Avg. Steps               204     204     196     145     119     115     110     106     106     111     120

Figure 4.5: Effect of changing the exploration constant for the maze balanced/16x8_4g_sparsely (out of 1,000 episodes for every exploration constant)

Exploration constant     0.05    0.5     10      25      50      100     150     200     500     1,000   5,000
Avg. Goals Reached [%]   80.95   82.67   92.58   97.35   99.42   99.58   99.53   99.55   99.67   99.35   99.35
Avg. Steps               198     197     176     157     135     123     118     117     115     120     127

Figure 4.6: Effect of changing the exploration constant for the maze sparse/8x8_4g_sparsely (out of 1,000 episodes for every exploration constant)

Exploration constant     0.05    0.5     10      50      100     200     500     750     1,000   2,000   5,000   10,000
Avg. Goals Reached [%]   18.07   16.77   24.02   34.52   38.08   41.35   42.6    42.17   39.05   40.62   39.3    39.42
Avg. Steps               207     207     207     202     200     197     196     195     196     196     198     198

Figure 4.7: Effect of changing the exploration constant for the maze dense/16x16_4g_closely (out of 1,000 episodes for every exploration constant)

Exploration constant     0.05    0.5     10      25      50      100     200     500     750     1,000   2,000   5,000
Avg. Goals Reached [%]   80.47   84.4    98.65   99.92   100     100     100     100     99.9    99.92   100     100
Avg. Steps               177     174     147     126     110     95      84      79      80      80      83      88

Figure 4.8: Effect of changing the exploration constant for the maze balanced/16x8_4g_closely (out of 1,000 episodes for every exploration constant)

[Plot: Avg. Goals Reached [%] and Avg. Steps vs. exploration constant. Goals reached rose from 99.1% at the lowest tested constants to 100%; the average number of steps fell from 95 to a minimum of 36, rising slightly to 40 at the highest tested constants.]

Figure 4.9: Effect of changing the exploration constant for the maze sparse/8x8_4g_closely (out of 1,000 episodes for every exploration constant)

Exploration constant     0.05    0.5     10      50      100     200     1,000   2,000   5,000
Avg. Goals Reached [%]   81.17   92.5    100     100     100     100     100     100     100
Avg. Steps               530     467     228     186     183     185     198     198     199

Figure 4.10: Effect of changing the exploration constant for the maze balanced/16x16_2g_sparsely (out of 300 episodes for every exploration constant)

Exploration constant     0.5     10      50      100     500     1,000   2,500   5,000   10,000
Avg. Goals Reached [%]   71.83   81.67   81      82.33   80.67   80.83   80.67   79      78.5
Avg. Steps               1,026   942     931     908     912     909     912     934     961

Figure 4.11: Effect of changing the exploration constant for the maze balanced/32x16_2g_sparsely (out of 300 episodes for every exploration constant)

Exploration constant     0.5     10      50      100     500     1,000   5,000   10,000
Avg. Goals Reached [%]   98.33   99.33   98.67   98.67   98.67   98.83   98.17   98.67
Avg. Steps               1,121   954     982     919     939     941     984     952

Figure 4.12: Effect of changing the exploration constant for the maze balanced/32x32_2g_sparsely (out of 300 episodes for every exploration constant)