Multi-Agent Reinforcement Learning (MARL) is a widely used technique for optimization in decentralised control problems, addressing the complex challenges that arise when several agents change their actions simultaneously and without collaboration. Such challenges are exacerbated when the environment in which the agents learn is inherently non-stationary, since the outcomes of agents’ actions are then non-deterministic. In this paper, we show that advance knowledge of environment behaviour, obtained through prediction, significantly improves agents’ performance in converging to near-optimal control solutions. We propose P-MARL, a MARL approach that employs a prediction mechanism to obtain such advance knowledge, which is then used to improve agents’ learning. The underlying non-stationary behaviour of the environment is modelled as a time series, and predictions are based on historical data and key environment variables. This provides information about potential upcoming changes in the environment, a key factor in agents’ decision-making. We evaluate P-MARL in a smart grid scenario and show that a 92% Pareto-efficient solution can be achieved in an electric vehicle charging problem, where energy demand across a community of households is inherently non-stationary. Finally, we analyse how the accuracy of environment prediction affects the performance of our approach.