Petr will be defending his dissertation titled "Solving Undiscounted One-Sided Partially Observable Stochastic Games". Join us in person or online!
Supervisor: Branislav Bošanský
Specialist supervisor: Viliam Lisý
University repository: https://dspace.cvut.cz/handle/10467/122185
In case you want to join via MS Teams, please notify Petr Benda in advance via petr.benda@fel.cvut.cz.
Abstract
Real-world scenarios often involve dynamic interactions among competing agents, where decisions are made considering actions taken by others. These situations can be modeled as partially observable stochastic games (POSGs), with zero-sum variants capturing strictly competitive interactions (e.g., security scenarios). While such models address a wide range of problems, they commonly focus on infinite-horizon scenarios with discounted-sum objectives.Using the discounted-sum objective, however, can lead to suboptimal solutions in cases where the length of the interaction does not directly affect players' rewards.We focus on games with undiscounted objective where every realization of the game is guaranteed to terminate after some number of turns that is either known (finite horizon) or unknown (indefinite horizon).To manage the computational complexity of solving POSGs, we restrict ourselves to games with one-sided partial observability where only one player has imperfect information while their opponent is provided with complete information (i.e., one-sided partially observable stochastic games -- OS-POSGs).Primarily, we focus on a subclass of OS-POSGs with an indefinite horizon representing an extension of stochastic shortest path games (SSPGs) to imperfect information setting.Following the naming conventions for stochastic shortest path problems, we term this subclass Partially Observable Stochastic Shortest Path Games (POSSPGs). We introduce two novel algorithms for solving POSSPGs based on the state-of-the-art method for solving OS-POSGs -- the heuristic search value iteration (HSVI) algorithm.These algorithms iteratively solve sequences of easier-to-solve approximations of the game using fundamentally different approaches for constructing these sequences: (1) based on the limited number of turns in which players can change their actions while iteratively increasing the assumed number of turns, and (2) based on the discounted approximation of the original game while iteratively increasing the assumed discount factor.We provide theoretical qualitative guarantees for these algorithms and we also experimentally demonstrate that they are able to find near-optimal solutions on pursuit-evasion games and games modeling a privilege escalation problem from computer security.Problems with a very long (indefinite) horizon often tend to have extremely large state spaces as well (e.g., problems from the security domain).Extreme state-space sizes directly affect the dimensionality of problem representations, making many algorithms (especially those that use the concept of value function, like HSVI) unusable for solving problems of real-world sizes.For single-agent problems, the typical way is to reduce the state/belief spaces through a projection from a higher-dimensional space to a lower-dimensional one.In our work, we expand on a similar approach for OS-POSGs called compact representation that can be combined with HSVI (which is the core of our proposed algorithms for solving POSSPGs).We present the application of compact representation HSVI on two (finite horizon) security domain problems (security games with sequential attacks -- SGSAs, and lateral movement POSGs), accompanied with experimental evaluation showing that compact representation HSVI is capable of finding high-quality strategies while scaling to larger scenarios compared to the state-of-the-art approaches.