In recent years, a gradient form of n-step temporal-difference learning [TD(λ)] has been developed into an advanced adaptive dynamic programming (ADP) algorithm called value-gradient learning [VGL(λ)]. In this paper, we improve the VGL(λ) architecture with a single-network variant [SNVGL(λ)], which uses only one approximator function network (a critic) instead of the dual networks (critic and actor) of VGL(λ); SNVGL(λ) therefore has lower computational requirements than VGL(λ). Moreover, a recurrent hybrid neuro-fuzzy (RNF) network and a first-order Takagi–Sugeno RNF (TSRNF) network are derived and implemented to build the critic and actor networks. Furthermore, we develop novel theoretical convergence proofs for both VGL(λ) and SNVGL(λ) under certain conditions. A model-based mobile-robot simulation is used to solve the optimal control problem for affine nonlinear discrete-time systems, and the robot is exposed to various noise levels to verify the performance and to validate the theoretical analysis.
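To make the single-network value-gradient idea concrete, the following is a minimal, hedged sketch of a VGL(λ)-style update with only a critic: the critic approximates the value gradient ∂V/∂x and is trained toward λ-weighted bootstrapped gradient targets, while the greedy action is derived from the critic itself, so no actor network is trained. The toy linear dynamics, quadratic cost, linear critic, and step sizes below are all illustrative assumptions, not the paper's mobile-robot model or its RNF/TSRNF networks.

```python
import numpy as np

# Toy affine discrete-time system x' = A x + B u with quadratic stage cost
# r = x^T Q x + u^T R u. All constants are assumptions for illustration.
np.random.seed(0)
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)                       # state-cost weight
R = np.array([[0.1]])               # action-cost weight
gamma, lam, alpha = 0.95, 0.7, 0.01

# Linear critic G(x) = W x approximating the value gradient dV/dx.
# Single-network setting: the action is computed from the critic, so no
# separate actor network is maintained or trained.
W = np.zeros((2, 2))

def greedy_action(x):
    # For affine dynamics and quadratic action cost, minimizing the
    # one-step Hamiltonian gives u = -(gamma/2) R^{-1} B^T G(x_next);
    # here G is crudely evaluated at the drift state A x (sketch only).
    return -0.5 * gamma * np.linalg.solve(R, B.T @ (W @ (A @ x)))

for episode in range(200):
    x = np.random.randn(2)
    traj = []
    for t in range(20):
        u = greedy_action(x)
        x_next = A @ x + B @ u
        traj.append((x, x_next))
        x = x_next

    # Backward pass: lambda-weighted bootstrapped value-gradient targets,
    # the gradient analogue of TD(lambda) returns. The action-dependence
    # term (du/dx of the full chain rule) is omitted for brevity.
    G_next_target = W @ x                        # bootstrap at final state
    for (x_t, x_tp1) in reversed(traj):
        dr_dx = 2.0 * Q @ x_t                    # derivative of stage cost
        blended = lam * G_next_target + (1 - lam) * (W @ x_tp1)
        G_target = dr_dx + gamma * A.T @ blended # chain rule through dynamics
        W += alpha * np.outer(G_target - W @ x_t, x_t)   # LMS-style step
        G_next_target = G_target

print("learned value-gradient weights W:\n", W)
```

In the dual-network VGL(λ) setting, a separate parameterized actor would be trained alongside the critic; dropping that second approximator, as in the sketch above, is the source of the reduced computational load that the abstract attributes to SNVGL(λ).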