Andrea Bajcsy, Ya-Shian Li-Baboud, Mary Brady
Technical report, National Institute of Standards and Technology
The presence of marginal marks on voting ballots is a known problem in voting systems and has been a source of dispute during federal and state-level elections. At present, optical mark scanners cannot clearly count marginal marks as either votes or non-votes. We aim to establish quantitative measurements of marginal marks in order to provide an objective classification of ballot-mark types and ultimately improve the algorithms used in optical scanners. Using 800 publicly available scans of manually marked ballots from the 2009 Humboldt County, California election, we established a set of unique image features that distinguish between votes, non-votes, and five marginal mark types (check mark, cross, partially filled, overfilled, and lightly filled). The image features are related to semantic labels through both unsupervised and supervised machine-learning methods. We demonstrate the feasibility of developing an automated and quantifiable set of custom features that improves marginal mark classification accuracy by 4 to 8 percent, depending on the classification model.