The brain constantly integrates incoming signals across the senses to form a coherent view of the world. Most studies of multisensory integration concern the roles of spatial and temporal parameters. However, recent findings suggest that cross-modal correspondences (e.g., high-pitched sounds being associated with bright, small objects located high up) also affect multisensory integration. Here, we focus on the association between auditory pitch and spatial location. Surprisingly little is known about the cognitive and perceptual roots of this phenomenon, despite its long use in ergonomic design. In a series of experiments, we use an attentional cuing paradigm to explore how this cross-modal mapping affects the allocation of attention. Our results demonstrate that high and low tones induce attention shifts to upper and lower locations, respectively. Furthermore, this pitch-induced cuing effect is susceptible to contextual manipulations and volitional control. These findings suggest that the cross-modal interaction between pitch and location originates at an attentional level rather than from response mapping alone. The flexible contextual mapping between pitch and location, as well as its susceptibility to top–down control, suggests that the pitch-induced cuing effect is primarily mediated by cognitive processes after initial sensory encoding and occurs at a relatively late stage of voluntary attention orienting.