Energy system optimization plays an important role in improving stability, reliability and economy, and is usually formulated as a static linear or nonlinear programming problem. In practice, however, the challenge is to carry out the optimization while taking the inherent dynamics of the energy system into account. To address this challenge, a multi-layer perceptron (MLP) based reinforcement learning control (RLC) method is proposed for the nonlinear dissipative system formed by coupling an arbitrary energy system with its local controllers. The method optimizes a given performance index dynamically and effectively without accurate knowledge of the system dynamics. The MLP-based RLC is composed of an MLP-based state-observer and an approximated optimal controller. The MLP-based state-observer is used for identification, and it converges asymptotically to a bounded neighborhood of the system dynamics. The approximated optimal controller is determined by solving an algebraic Riccati equation whose parameters are given by the MLP-based state-observer. Using the Lyapunov direct method, it is further proven that the closed-loop system is uniformly ultimately bounded. Finally, the newly built MLP-based RLC is applied to the supervisory optimization of the thermal power response of a nuclear steam supply system, and simulation results show both satisfactory performance and the influence of the controller parameters on the closed-loop responses.
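
As a rough illustration of the step in which the controller is obtained from an algebraic Riccati equation with identified parameters, the sketch below solves the scalar continuous-time Riccati equation in closed form and forms the resulting state-feedback gain. It is a minimal sketch under stated assumptions, not the method of the paper: the first-order linear model, the names `a_hat` and `b_hat` (standing in for parameters supplied by an observer), the weights `q` and `r`, and all numerical values are hypothetical.

```python
import math


def solve_scalar_are(a, b, q, r):
    """Return the stabilizing solution p > 0 of the scalar Riccati equation
    2*a*p - (b**2 / r) * p**2 + q = 0, for the model dx/dt = a*x + b*u
    and the cost integral of q*x**2 + r*u**2."""
    if b == 0 or q <= 0 or r <= 0:
        raise ValueError("requires b != 0, q > 0 and r > 0")
    # Positive root of the quadratic in p.
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


def feedback_gain(a_hat, b_hat, q, r):
    """Gain k of the control law u = -k*x built from the estimated parameters."""
    p = solve_scalar_are(a_hat, b_hat, q, r)
    return b_hat * p / r


# Hypothetical identified parameters (assumed to come from a state observer).
a_hat, b_hat = 0.5, 2.0
# Hypothetical weights of the performance index.
q, r = 1.0, 1.0

p = solve_scalar_are(a_hat, b_hat, q, r)
k = feedback_gain(a_hat, b_hat, q, r)

# Check that p satisfies the Riccati equation and that the loop is stable.
residual = 2 * a_hat * p - (b_hat ** 2 / r) * p ** 2 + q
closed_loop_pole = a_hat - b_hat * k
```

In this scalar case the closed-loop pole equals `-sqrt(a_hat**2 + b_hat**2 * q / r)`, so it is negative for any admissible parameters. For a matrix-valued model, a numerical solver such as `scipy.linalg.solve_continuous_are` would take the place of the closed-form root.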