In this work, we propose a multi-goal multi-agent learning (MGMA) framework for task-oriented dialogue generation, which aims to retrieve accurate entities from a knowledge base (KB) and generate human-like responses simultaneously. Specifically, MGMA consists of a KB-oriented teacher agent for querying the KB, a context-oriented teacher agent for extracting dialogue patterns, and a student agent that aims to both retrieve accurate entities from the KB and generate human-like responses. A “two-to-one” teacher–student learning method is proposed to coordinate these three networks, training the student network to integrate the expert knowledge of the two teacher networks and achieve comprehensive performance in task-oriented dialogue generation. In addition, we update the two teachers based on the output of the student network, since the space of possible responses is large and the teachers should adapt to the conversation style of the student. In this way, we obtain more empathetic teachers with better performance. Moreover, to build an effective task-oriented dialogue system, we employ a dialogue memory network that dynamically filters out irrelevant dialogue history and memorizes important incoming information. A KB memory network, which shares the structured KB tuples throughout the whole conversation, is further adopted to dynamically extract KB information with a memory pointer at each utterance. Extensive experiments on three benchmark datasets (i.e., CamRest, In-Car Assistant, and Multi-WOZ 2.1) demonstrate that MGMA significantly outperforms baseline methods under both automatic and human evaluation.
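
To make the “two-to-one” teacher–student objective concrete, below is a minimal PyTorch-style sketch, assuming the student distills from each teacher through a temperature-softened KL term combined with supervised cross-entropy on the gold responses. The function name, the weighting scheme (alpha, beta), and the temperature are illustrative assumptions, not the paper's verified specification.

```python
# A minimal sketch of a "two-to-one" teacher-student distillation loss.
# All names, weights, and the exact loss form are illustrative assumptions.
import torch
import torch.nn.functional as F

def two_to_one_distillation_loss(student_logits,      # (batch, vocab) student token scores
                                 kb_teacher_logits,   # (batch, vocab) KB-oriented teacher
                                 ctx_teacher_logits,  # (batch, vocab) context-oriented teacher
                                 targets,             # (batch,) gold token ids
                                 alpha=0.5,           # assumed balance between the two teachers
                                 beta=0.5,            # assumed balance of distillation vs. supervision
                                 temperature=2.0):    # assumed softening temperature
    """Student integrates knowledge from both teachers plus the gold signal."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_kb = F.softmax(kb_teacher_logits / temperature, dim=-1)
    p_ctx = F.softmax(ctx_teacher_logits / temperature, dim=-1)

    # KL terms pulling the student toward each teacher's output distribution.
    kl_kb = F.kl_div(log_p_student, p_kb, reduction="batchmean")
    kl_ctx = F.kl_div(log_p_student, p_ctx, reduction="batchmean")
    distill = alpha * kl_kb + (1.0 - alpha) * kl_ctx

    # Standard supervised cross-entropy on the gold response tokens.
    ce = F.cross_entropy(student_logits, targets)

    # Temperature**2 rescaling follows the usual distillation convention.
    return beta * distill * (temperature ** 2) + (1.0 - beta) * ce
```

Under this reading, setting alpha steers the student between KB accuracy and conversational fluency, which matches the abstract's goal of integrating both teachers' expertise in a single network.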