The Mismatch Negativity (MMN) component of the event-related potentials is generated when a detectable spectrotemporal feature of the incoming sound does not match the sensory model set up by preceding repeated stimuli. MMN is enhanced at frontocentral scalp sites for deviant words when compared to acoustically similar deviant pseudowords, suggesting that automatic access to long-term memory traces for spoken words contributes to MMN generation. Does spectrotemporal feature matching also drive automatic lexical access? To test this, we recorded human auditory event-related potentials (ERPs) to disyllabic spoken words and pseudowords within a passive oddball paradigm. We first aimed at replicating the word-related MMN enhancement effect for Spanish, thereby adding to the available cross-linguistic evidence (e.g., Finnish, English). We then probed its resilience to spectrotemporal perturbation by inserting short (20ms) and long (120ms) silent gaps between first and second syllables of deviant and standard stimuli. A significantly enhanced, frontocentrally distributed MMN to deviant words was found for stimuli with no gap. The long gap yielded no deviant word MMN, showing that prior expectations of word form limits in a given language influence deviance detection processes. Crucially, the insertion of a short gap suppressed deviant word MMN enhancement at frontocentral sites. We propose that spectrotemporal point-wise matching constitutes a core mechanism for fast serial computations in audition and language, bridging sensory and long-term memory systems.