A new class of image-level detectors is proposed that can be adapted by machine learning techniques to detect parts of objects from a given category. A classifier within the detector (e.g. a neural network or an AdaBoost-trained classifier) selects a relevant subset of extremal regions, i.e. connected components of a thresholded image. Because the set of extremal regions does not change under monotonic transformations of image intensity, the detector is very robust to illumination change. Robustness to viewpoint change is achieved by using invariant descriptors and/or by having the classifier model shape variations.
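To make the notion of an extremal region concrete, the following brute-force sketch enumerates every connected component of the thresholded image {I ≤ t} over all thresholds t, then keeps the regions that a classifier accepts. It is not the authors' implementation. It assumes integer-valued grayscale images stored as lists of rows, 4-connectivity, and "dark" regions only (bright regions follow by inverting the image). The classifier is a hypothetical stand-in: any predicate on a set of pixel coordinates.

```python
from collections import deque


def components_at_threshold(img, t):
    """Connected components (4-connectivity) of pixels with value <= t."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if img[y][x] <= t and not seen[y][x]:
                # Breadth-first flood fill of one component.
                comp = []
                queue = deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    comp.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and img[ny][nx] <= t:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                regions.append(frozenset(comp))
    return regions


def all_extremal_regions(img):
    """All distinct dark extremal regions over every intensity level in img."""
    levels = sorted({v for row in img for v in row})
    found = set()
    for t in levels:
        found.update(components_at_threshold(img, t))
    return found


def select_regions(img, classifier):
    """Keep only the regions the (hypothetical) classifier accepts."""
    return [r for r in all_extremal_regions(img) if classifier(r)]


img = [
    [9, 9, 9, 9],
    [9, 1, 9, 3],
    [9, 1, 9, 3],
    [9, 9, 9, 9],
]
# Hypothetical classifier: accept small regions of 2 to 4 pixels.
small = select_regions(img, lambda r: 2 <= len(r) <= 4)
```

Applying a strictly increasing intensity mapping, e.g. v → 2v + 10, to `img` yields exactly the same family of regions, which is the property behind the robustness to illumination change.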
The approach is brought to bear on three problems: text detection, face segmentation and leopard skin detection. For unconstrained text detection (i.e. invariant to brightness, affine transformation and font), a detection rate of 92% was obtained at a reasonable false positive rate.
The time complexity of detection is approximately linear in the number of pixels, and a non-optimized implementation runs at about 1 frame per second on a 640×480 image on a high-end PC.
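One standard way to reach near-linear time, assumed here rather than taken from the paper, is to avoid re-thresholding the image at every level. Pixels are sorted by intensity with a counting sort, which is O(n) for 8-bit images. They are then added to a union-find structure one intensity level at a time, merging each new pixel with its already-added neighbours. With union by size and path halving, the total cost is O(n·α(n)). Region area is updated as components merge, which illustrates a feature that can be computed incrementally. The function name and output format below are hypothetical.

```python
def extremal_region_sizes(img, levels=256):
    """Return (level, area) for each distinct dark extremal region.

    Assumes integer pixel values in [0, levels), 4-connectivity.
    """
    h, w = len(img), len(img[0])
    n = h * w
    # Counting sort of pixel indices by intensity: O(n + levels).
    buckets = [[] for _ in range(levels)]
    for y in range(h):
        for x in range(w):
            buckets[img[y][x]].append(y * w + x)

    parent = list(range(n))
    size = [1] * n
    active = [False] * n

    def find(i):
        # Path halving keeps the trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if size[ra] < size[rb]:  # union by size
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]  # area is merged incrementally

    regions = []
    for level, pixels in enumerate(buckets):
        if not pixels:
            continue
        for p in pixels:
            active[p] = True
            y, x = divmod(p, w)
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and active[ny * w + nx]:
                    union(p, ny * w + nx)
        # Only components that gained pixels at this level are new regions;
        # untouched components equal regions already reported.
        touched = {find(p) for p in pixels}
        regions.extend((level, size[r]) for r in sorted(touched))
    return regions


img = [
    [9, 9, 9, 9],
    [9, 1, 9, 3],
    [9, 1, 9, 3],
    [9, 9, 9, 9],
]
result = extremal_region_sizes(img)
```

On the 4×4 example this reports the two 2-pixel regions (at levels 1 and 3) and the full 16-pixel image (at level 9), matching an exhaustive per-threshold enumeration while visiting each pixel once.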