The recognition of human postures and gestures is considered to be highly relevant semantic information in videos and surveillance systems. We present a new three-step approach to classifying the posture or gesture of a person based on segmentation, classification, and aggregation. A background image is constructed from succeeding frames using motion compensation and shapes of people are segmented by comparing the background image with each frame. We use a modified curvature scale space (CSS) approach to classify a shape. But a major drawback to this approach is its poor representation of convex segments in shapes: Convex objects cannot be represented at all since there are no inflection points. We have extended the CSS approach to generate feature points for both the concave and convex segments of a shape. The key idea is to reflect each contour pixel and map the original shape to a second one whose curvature is the reverse: Strong convex segments in the original shape are mapped to concave segments in the second one and vice versa. For each shape a CSS image is generated whose feature points characterize the shape of a person very well. The last step aggregates the matching results. A transition matrix is defined that classifies possible transitions between adjacent frames, e.g. a person who is sitting on a chair in one frame cannot be walking in the next. A valid transition requires at least several frames where the posture is classified as "standing-up". We present promising results and compare the classification rates of postures and gestures for the standard CSS and our new approach.
SPIE Digital Library