An inherent difficulty in dynamic distributed constraint optimization problems (dynamic DCOPs) is the uncertainty of future events when making an assignment at the current time. This dependence of current assignments on uncertain future events has not been well addressed by the research community. This paper proposes a reinforcement-learning-based solver for dynamic distributed constraint optimization. We show that reinforcement learning techniques are an alternative approach for solving such problems over time and are computationally more efficient than sequential DCOP solvers. We also introduce a novel heuristic for obtaining correct assignments and describe a formalism adopted to model dynamic DCOPs with cooperative agents. We evaluate this approach on the dynamic weapon-target assignment (dynamic WTA) problem through experiments. We observe that the dynamic WTA system remains in a safe zone after convergence while satisfying the constraints. Moreover, in our experiments the implemented agents ultimately converge to the correct assignment.