Tongue positioning at specific locations in the oral cavity is an essential attribute of speech sound production. A scalable baseline for such target locations, referred to here as phoneme landmarks, would help both speech-language pathologists (SLPs) and their patients. Most previous attempts to identify positional landmarks associated with various phonemes have relied on electromagnetic articulography (EMA), which can track the tongue with high precision but is hardwired, time-consuming to set up, and cost-prohibitive as a tool in SLP practice. We have investigated the feasibility of generating such a baseline using the multimodal speech capture system (MSCS), a wireless tongue-tracking technology that is considerably more cost-effective and portable. A dataset of five repetitions of each of 23 phonemes was collected from four SLP-trained subjects. Analysis of lingual positional variability shows that the standard deviation of phoneme landmarks is, on average, 2.1 ± 1.3 mm in the subject-dependent case and 4.75 ± 2 mm in the subject-independent (universal baseline) case. We have also identified areas in which the MSCS could be improved to better identify and compare phoneme landmarks across subjects and thereby reduce this variability.
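The distinction between subject-dependent and subject-independent variability can be illustrated with a short sketch. The abstract does not specify how the landmark standard deviation is computed, so this example assumes it is the RMS Euclidean distance of the 3-D tongue positions from their centroid. The subject-dependent figures are computed per subject and phoneme, while the subject-independent figures pool all subjects' repetitions for each phoneme. The functions `landmark_spread` and `variability`, the `data[subject][phoneme]` layout, and the toy coordinates are hypothetical illustrations, not the authors' method or data.

```python
import numpy as np

def landmark_spread(points):
    """RMS Euclidean distance (mm) of 3-D samples from their centroid.
    Assumed spread measure; the study's exact definition may differ."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    return float(np.sqrt(((pts - centroid) ** 2).sum(axis=1).mean()))

def variability(data):
    """data[subject][phoneme] -> array of shape (repetitions, 3), in mm.

    Returns (subject_dependent, subject_independent):
      - subject_dependent: one spread per (subject, phoneme) pair
      - subject_independent: one spread per phoneme, pooling all subjects
    """
    dependent = [landmark_spread(reps)
                 for phonemes in data.values()
                 for reps in phonemes.values()]
    phoneme_ids = set().union(*(p.keys() for p in data.values()))
    independent = []
    for ph in sorted(phoneme_ids):
        # Stack every subject's repetitions of this phoneme into one cloud.
        pooled = np.vstack([data[s][ph] for s in data if ph in data[s]])
        independent.append(landmark_spread(pooled))
    return dependent, independent

# Synthetic toy data (not study measurements): two subjects, one phoneme,
# with each subject consistent internally but offset from the other.
toy = {
    "S1": {"t": np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])},
    "S2": {"t": np.array([[10.0, 0.0, 0.0], [12.0, 0.0, 0.0]])},
}
dep, ind = variability(toy)
print(f"subject-dependent:   {np.mean(dep):.2f} ± {np.std(dep):.2f} mm")
print(f"subject-independent: {np.mean(ind):.2f} ± {np.std(ind):.2f} mm")
```

In the toy case each subject's own spread is small, but pooling across subjects inflates the spread because the subjects' landmarks are offset from one another. This mirrors, qualitatively, why a universal baseline shows larger variability than a per-subject one.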