Combining input from multiple senses is essential for mastering many real-world situations. While several studies have shown that presenting a simultaneous sound can enhance visual detection performance or increase the perceived luminance of a dim light, the origin of these effects remains disputed. Proposed explanations range from early multisensory integration to changes in response bias and cognitive influences, implying that these effects could result either from relatively low-level, hard-wired connections between early sensory areas or from associations formed higher in the processing stream. To address this question, we quantified the effect of a simultaneous sound in various contrast detection tasks. A completely redundant sound did not alter detection rates but only sped up reaction times. An informative sound, which reduced uncertainty about the timing of the visual display, significantly improved detection rates, reflected in a significant shift of the contrast detection curve. Surprisingly, this improvement occurred only in a paradigm where there was a consistent timing relation between sound and target, and it disappeared when subjects were not aware that the sound offered information about the visual stimulus. Altogether, our findings suggest that cross-modal influences in such simple detection tasks are not exclusively mediated by hard-wired sensory integration but rather point to a prominent role for cognitive and attention-like effects.