This paper discusses direct heuristic dynamic programming (DHDP), an important class of approximate dynamic programming. DHDP performs well owing to its model-free online learning capability. Whereas classical DHDP is implemented with a gradient-based adaptation algorithm for its neural networks, this paper presents a DHDP design strategy that uses a novel hybrid estimation of distribution algorithm for online learning and control. The proposed design optimization method trains the neural network weights with a faster convergence rate, and the approach can be viewed as an improvement on DHDP. Simulations on a practical system plant test the online learning performance of the approach, and the results show its effectiveness.
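
For readers unfamiliar with estimation of distribution algorithms (EDAs), the following minimal sketch illustrates the general idea of gradient-free weight training: sample candidate weight vectors from a probability distribution, keep the best ones, and re-estimate the distribution from them. It is a plain univariate Gaussian EDA and is not the hybrid algorithm proposed in the paper, whose details are not given here. The names `gaussian_eda` and `toy_cost`, the one-neuron model, the sample data, and all parameter values are assumptions chosen purely for illustration.

```python
import math
import random
import statistics


def gaussian_eda(cost, dim, pop_size=60, elite_frac=0.25, generations=80, seed=0):
    """Minimise `cost` over R^dim with a univariate Gaussian EDA (illustrative only)."""
    rng = random.Random(seed)
    mean = [0.0] * dim          # assumed initial distribution: N(0, 1) per weight
    std = [1.0] * dim
    n_elite = max(2, int(pop_size * elite_frac))
    best, best_cost = list(mean), cost(mean)

    for _ in range(generations):
        # Sample candidate weight vectors from the current distribution.
        population = [
            [rng.gauss(m, s) for m, s in zip(mean, std)]
            for _ in range(pop_size)
        ]
        # Rank candidates by cost and keep the best fraction.
        population.sort(key=cost)
        elite = population[:n_elite]

        elite_cost = cost(elite[0])
        if elite_cost < best_cost:
            best, best_cost = list(elite[0]), elite_cost

        # Re-estimate the per-weight mean and standard deviation from the elite.
        for j in range(dim):
            column = [w[j] for w in elite]
            mean[j] = statistics.fmean(column)
            std[j] = statistics.pstdev(column) + 1e-6  # small floor keeps std positive

    return best, best_cost


def toy_cost(w):
    """Hypothetical stand-in for a training error: one tanh neuron on three samples."""
    data = [(-1.0, -0.5), (0.0, 0.2), (1.0, 0.7)]
    return sum((math.tanh(w[0] * x + w[1]) - y) ** 2 for x, y in data)


if __name__ == "__main__":
    weights, error = gaussian_eda(toy_cost, dim=2)
    print("weights:", weights, "squared error:", error)
```

In a DHDP setting the cost would instead be computed from the critic and action network errors during online interaction with the plant. The toy cost above only stands in for that quantity.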