Decentralized Nash Equilibria Learning for Online Game With Bandit Feedback
Abstract
This article studies distributed online bandit learning of generalized Nash equilibria for online games, where the cost functions of all players and the coupled constraints are time-varying. Only the values of the cost and local constraint functions, rather than full information about them, are revealed to the local players, and these values arrive with time delays. The goal of each player is to selfishly minimize its own cost function with no future information, subject to a strategy set constraint and time-varying coupled inequality constraints. To this end, a distributed online algorithm based on mirror descent and one-point delayed bandit feedback is designed for seeking generalized Nash equilibria in the online game. It is shown that the devised online algorithm achieves sublinear expected regrets and sublinear accumulated constraint violation if the path variation of the generalized Nash equilibrium sequence is sublinear. Simulations are presented to illustrate the theoretical results.
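To make the two ingredients named above concrete, the following is a minimal single-player sketch of a one-point bandit gradient estimate combined with a mirror descent step. It is not the article's algorithm: it assumes a Euclidean mirror map (so the mirror step reduces to a projected gradient step), a ball-shaped strategy set, no feedback delay, no communication between players, and no coupled constraints. All names (`one_point_gradient`, `bandit_step`, `delta`, `step`, `radius`) are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_unit_sphere(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]


def one_point_gradient(cost, x, delta, rng):
    """One-point estimate (dim / delta) * cost(x + delta * u) * u.

    Only a single function value is queried, which is the bandit setting.
    """
    dim = len(x)
    u = sample_unit_sphere(dim, rng)
    value = cost([xi + delta * ui for xi, ui in zip(x, u)])
    scale = dim / delta * value
    return [scale * ui for ui in u]


def project_ball(x, radius):
    """Euclidean projection onto the ball of the given radius (assumed strategy set)."""
    norm = math.sqrt(sum(c * c for c in x))
    if norm <= radius:
        return list(x)
    return [c * radius / norm for c in x]


def bandit_step(cost, x, step, delta, radius, rng):
    """One projected step using the one-point estimate.

    The iterate is projected onto a ball shrunk by delta so that the next
    perturbed query point x + delta * u stays inside the strategy set.
    Requires radius > delta.
    """
    g = one_point_gradient(cost, x, delta, rng)
    moved = [xi - step * gi for xi, gi in zip(x, g)]
    return project_ball(moved, radius - delta)
```

For a linear cost the estimate is unbiased, so averaging many estimates at a fixed point recovers the true gradient. A single estimate is noisy, with magnitude growing as `delta` shrinks, which is why the smoothing parameter and step size have to be tuned together in bandit methods of this kind.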