Experimental Validation of Data-Driven Adaptive Optimal Control for Continuous-Time Systems Via Hybrid Iteration: An Application to Rotary Inverted Pendulum
Abstract
In this article, a successive-approximation learning framework for adaptive optimal control problems, named hybrid iteration (HI), is presented and validated experimentally. The HI strategy outperforms two well-known adaptive dynamic programming strategies, namely policy iteration (PI) and value iteration (VI). Using HI, an approximate optimal control policy is learned without the prior knowledge of an admissible control policy required by PI. At the same time, compared with VI, the HI algorithm converges to the optimal solution with far fewer learning iterations and much less CPU time. First, we present the data-driven HI settings for continuous-time nonlinear systems and continuous-time linear systems, which learn the optimal control policy without any knowledge of the system dynamics. The proposed HI method is then implemented on a nonlinear rotary inverted pendulum, and its online learning performance is validated by learning the optimal control policy from online data. The experimental results reveal the efficacy and practicality of the HI method, and demonstrate its superior online learning performance over the traditional PI and VI methods.
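The HI method described above is data-driven, i.e., it operates on measured trajectories rather than on the system matrices. As a point of reference only, the underlying VI-then-PI switching idea can be sketched in its model-based linear-quadratic form: a value-iteration phase with a small step size runs just long enough to produce a stabilizing gain, after which Kleinman's policy iteration refines it rapidly. The plant, weights, step size, and stopping rules below are illustrative assumptions for a minimal sketch, not the paper's experimental setup.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

def hybrid_iteration(A, B, Q, R, eps=0.01, tol=1e-9, max_vi=10000, max_pi=50):
    """Model-based sketch of hybrid iteration (HI) for CT-LQR:
    VI phase (no admissible policy needed) followed by a PI phase."""
    n = A.shape[0]
    Rinv = np.linalg.inv(R)
    P = np.zeros((n, n))
    K = Rinv @ B.T @ P

    # Phase 1 (VI): small-step updates along the Riccati residual,
    # stopped as soon as K = R^{-1} B^T P stabilizes the plant.
    for _ in range(max_vi):
        K = Rinv @ B.T @ P
        if np.max(np.linalg.eigvals(A - B @ K).real) < 0:
            break  # K is now a stabilizing (admissible) gain
        P = P + eps * (A.T @ P + P @ A + Q - P @ B @ Rinv @ B.T @ P)

    # Phase 2 (PI, Kleinman's algorithm): policy evaluation via a
    # Lyapunov equation, then policy improvement.
    for _ in range(max_pi):
        Ak = A - B @ K
        P_new = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
        K = Rinv @ B.T @ P_new
        if np.linalg.norm(P_new - P) < tol:
            P = P_new
            break
        P = P_new
    return P, K

# Illustrative double-integrator example (not the pendulum dynamics).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)
P, K = hybrid_iteration(A, B, Q, R)
P_are = solve_continuous_are(A, B, Q, R)
print(np.allclose(P, P_are, atol=1e-6))
```

In this sketch the VI phase exits after only a few cheap updates, since it only needs to reach a stabilizing gain rather than the optimum; the PI phase then inherits that gain as its admissible initial policy and converges quickly to the algebraic Riccati solution. The paper's contribution is the data-driven counterpart of this scheme, in which both phases are carried out with online measurements instead of `A` and `B`.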