Computing the lower bound on the value of Stackelberg equilibrium

In this part, we present an heuristic algorithm, which provides a lower bound on the value of the Stackelberg game. The main idea is to use the basic dynamic algorithm as described above, but make the pruning phase more aggressive. Pruning more extreme points of course brings the risk of pruning some points that could be used in the parent nodes to force the follower to play some strategy, that would benefit the leader.

The pruning in this algorithm will only preserve fixed numberk (given to the algorithm as parameter) points with the best utility of the leader for each actiona_f of the follower. In the exact dynamic algorithm, number of facets and therefore also extreme points can grow exponentially in the higher levels of the game-tree (as the facet represents a strategy of the follower throughout the game, and number of those grows exponentially). The main goal of this approach is to prevent that exponential growth of the number of facets, as the memory complexity might be problematic to deal with for the exact dynamic algorithm.

Note that when we restrict the number of extreme points to k, we restrict also number of facets in each sub-game given by node n to k|A_f(n)|, as every facet must contain at least one extreme point, otherwise it’s pruned. This removes the exponential size of the problem. Therefore, even for larger k, the memory complexity should much lower for the approximative algorithm than for the exact one in large games.

3.3. COMPUTING THE LOWER BOUND ON THE VALUE OF STACKELBERG EQUILIBRIUM23

Figure 3.3: Example: visualization of pruning phase for follower’s action d

Also, note that the approximative algorithm prunes only the strategy space of the leader, as there are some points left for every action of the follower (and therefore, follower’s strat-egy space is not affected by the pruning). That means, that the algorithm will always be pessimistic. Hence, we have the algorithm that provides the lower bound on the value of the leader in extensive form games with perfect information and concurrent moves, that should address the biggest problem with the exact algorithm.

While the memory complexity of this algorithm is bound to be much better than that of it’s exact form, there is no reasonable estimate on how precise lower bound will the algorithm provide or how much faster will it be compared to the exact algorithm. However, the ability to selectk should provide some trade-off options between those criteria.

24CHAPTER 3. COMPUTING STRONG STACKELBERG EQUILIBRIUM IN SEQUENTIAL GAMES

Chapter 4

Experiments

In this section, we will first describe the game used for testing of described algorithms, present results of experiments and then compare those results to examine scalability and precision of algorithms. We will first focus on testing of the correctness of results provided by the exact algorithm as well as the performance of both exact and approximative algorithms, therefore we will compare it’s results to results of the baseline algorithm. In this setting, both utility values and all algorithms are deterministic, therefore every experiment was run just once, the difference in computation time between two runs of the same algorithm is marginal. We will then focus on the games with randomly generated utility to better estimate the tightness of lower-bound provided by the approximative algorithm and it’s comparison to the exact form in terms of memory complexity. The experiments were run on standard PC, CPU: 4x Intel(R) Core(TM) i7-4710MQ 2.50Ghz, 16GB RAM, C++ compiled with GCC on 64-bit Windows 8.1.

4.1 Pursuit-evasion Game

Pursuit-evasion game [6] (see Figure 4.1) is a game of two players (pursuer and evader, in this case the evader is the follower and the pursuer is the leader) moving in a graph, in our case represented as a square grid (possible actions in every node are: move 1 field in one of 4 directions, if possible). The game ends after specified amount of actions have been played by each player or if both players are at the same position (the evader is caught). Starting positions of players are known to both of them and they get complete information about current whereabouts after each tuple of steps (concurrent move) is played. The pursuer’s goal is to catch the evader, in which case he is awarded positive utility, though he gets a large penalty if he fails to catch the pursuer in the specified amount of steps. The pursuer’s goal is to escape capture. If the follower is caught, the leader is awarded a positive utility, while the follower gets negative utility. If the follower escapes the leader and is not caught in the specified amount of steps, he is awarded positive utility while the leader gets negative utility. The leader also gets small penalty to the utility for every step he makes, therefore he prefers to catch the follower in least amount of steps possible.

26 CHAPTER 4. EXPERIMENTS

Table 4.1: Dependence of the computation time on the game size for all algorithms Algorithm 2×2×5 2×2×6 2×3×5 2×3×6 2×3×7 3×3×5 3×3×6

Baseline 8.72 9.242 6.756 7.978 N/A N/A N/A

Dynamic 8.72 9.242 6.756 7.978 8.725 4 3.75

Dynamic 2 8.516 8.777 5.904 6.887 7.421 4 3.247

Dynamic 3 8.533 8.868 6.756 7.487 8.348 4 3.75

Dynamic 4 8.72 8.899 6.756 7.978 8.725 4 3.75

Table 4.2: Dependence of value of the utility value of the leader provided by the algorithms on the game size

In document ZADÁNÍ DIPLOMOVÉ PRÁCE (Stránka 37-41)