Cloud–edge–unmanned aerial vehicle (UAV) computing collaboration is crucial for smart, real-time applications. However, UAVs face several challenges in task execution, including limited endurance, computing, and communication capabilities, as well as complex external interference. This article proposes a task-driven multimodal communication technology that uses a multimodal confusion information reception model to improve the UAV’s ability to demodulate multimodal information. A cloud–edge–UAV collaborative computing model is then proposed to integrate cloud and edge computing capabilities into the UAV task execution system. Together, multimodal communication and cloud–edge–UAV collaboration improve the success probability and multimodal diversity of UAV task execution in complex environments. Numerical results show that: 1) for each single-modal reception, the success probability increases by 15%, and task execution is robust under severe environmental conditions, since the success probability of eavesdropping by malicious users is reduced by 88%; 2) after multimodal combination, additional multimodal receptions increase the success probability; and 3) the reliability is closely related to the number of independent modalities, which equals the multimodal diversity.
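The relationship between the number of independent receptions and the success probability can be illustrated with a toy calculation. The sketch below is not the article's reception model: it assumes that each reception succeeds independently with the same probability and that the task succeeds if at least one reception succeeds. The function name `combined_success` and the value 0.6 are hypothetical and chosen only for illustration.

```python
def combined_success(p_single: float, n_modalities: int) -> float:
    """Probability that at least one of n independent receptions succeeds.

    Assumption (not from the article): receptions are independent and
    identically reliable, and one successful reception is sufficient.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must lie in [0, 1]")
    if n_modalities < 1:
        raise ValueError("n_modalities must be at least 1")
    # All n receptions fail with probability (1 - p)^n.
    return 1.0 - (1.0 - p_single) ** n_modalities


if __name__ == "__main__":
    p = 0.6  # hypothetical single-reception success probability
    for n in range(1, 5):
        print(n, round(combined_success(p, n), 4))
```

Under these assumptions the failure probability is (1 - p)^n, so it shrinks geometrically with the number of independent receptions, which is one simple way to read the statement that reliability tracks the multimodal diversity.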