Coordinated multipoint joint transmission (JT) is a key technique in fifth-generation (5G) networks for increasing throughput. However, it imposes a significant burden on the network backhaul, because the same user data must be distributed to multiple base stations, called a JT cluster. Many studies have therefore explored how to reduce backhaul data traffic by caching user data at base stations. Millimeter wave (mmWave) mesh backhaul is a promising architecture that lets small cells exchange cached data in 5G networks. However, we observe that inappropriate JT clusters and routing paths cause unnecessary mmWave mesh backhaul bandwidth consumption, which reduces the performance gain of JT. In this paper, we therefore study the JT clustering problem while accounting for caches at base stations and the mmWave mesh backhaul. The objective is to minimize total backhaul traffic while satisfying each user's data rate requirement under a limited number of radio resource units. We propose a two-stage algorithm based on dynamic programming to solve the problem, and we prove that the dynamic programming algorithm finds an optimal solution to the subproblem it addresses. The simulation results provide several insights and show that, compared with two baseline algorithms, our proposed algorithm significantly reduces backhaul bandwidth consumption.