Center for Research in Computer Vision

Scene Understanding by Statistical Modeling of Motion Patterns


We present a novel method for the discovery and statistical representation of motion patterns in a scene observed by a static camera. Related methods that learn patterns of activity rely on trajectories obtained from object detection and tracking systems, which are unreliable in complex scenes of crowded motion. We propose a mixture model representation of salient patterns of optical flow, and present an algorithm for learning these patterns from dense optical flow in a hierarchical, unsupervised fashion. Using low-level cues of noisy optical flow, K-means is employed to initialize a Gaussian mixture model for temporally segmented clips of video. The components of this mixture are then filtered, and instances of motion patterns are computed using a simple motion model by linking components across space and time. Motion patterns are then initialized, and membership of instances in different motion patterns is established using the KL divergence between mixture distributions of pattern instances. Finally, a pixel-level representation of motion patterns is proposed by deriving the conditional expectation of optical flow. Results of extensive experiments are presented for multiple surveillance sequences containing numerous patterns involving both pedestrian and vehicular traffic.


Figure: Examples of scenes to be analyzed and desirable patterns

Gaussian Mixture Formulation


  1. Gaussian Component Estimation

    • Temporal quantization
    • K-means clustering in 4D space
    • No optimization
    • Insensitive to choice of K
    • Numerous, low variance clusters
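
The estimation step above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each optical flow sample in a clip is a 4D tuple (x, y, magnitude, orientation), uses a hypothetical evenly spaced initialization for K-means, and summarizes each cluster as a diagonal-covariance Gaussian. The names `kmeans` and `gaussian_components` are invented for this example.

```python
import random

def kmeans(points, k, iters=20):
    """Lloyd's K-means on equal-length tuples; returns one label per point."""
    # Assumption: evenly spaced initial centers (the source does not specify one).
    centers = [points[(i * len(points)) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def gaussian_components(points, labels, k):
    """Turn each non-empty cluster into a weight, mean, and per-dimension variance."""
    comps = []
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if not members:
            continue
        n = len(members)
        mean = tuple(sum(col) / n for col in zip(*members))
        var = tuple(
            sum((v - m) ** 2 for v in col) / n for col, m in zip(zip(*members), mean)
        )
        comps.append({"weight": n / len(points), "mean": mean, "var": var})
    return comps

# Synthetic clip: two blobs of (x, y, magnitude, orientation) samples.
rng = random.Random(1)
clip = [(rng.gauss(10, 1), rng.gauss(10, 1), rng.gauss(2, 0.1), rng.gauss(0.0, 0.05))
        for _ in range(50)]
clip += [(rng.gauss(40, 1), rng.gauss(30, 1), rng.gauss(3, 0.1), rng.gauss(1.5, 0.05))
         for _ in range(50)]
labels = kmeans(clip, k=2)
components = gaussian_components(clip, labels, k=2)
```

No mixture optimization follows the clustering here: the cluster statistics are used directly as mixture components, in keeping with the "no optimization" point above.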

  2. Component Filtering

    • Optical flow is noisy
    • Filter high directional variance components
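
As a rough sketch of the filtering step, the snippet below drops components whose flow orientation varies too much. It assumes each component stores a per-dimension variance ordered as (x, y, magnitude, orientation); the threshold `max_theta_var` and the function name are hypothetical and not taken from the source.

```python
def filter_components(components, max_theta_var=0.1):
    """Keep components whose orientation variance is at most max_theta_var."""
    # Assumption: index 3 of "var" holds the variance of flow orientation.
    return [c for c in components if c["var"][3] <= max_theta_var]

candidates = [
    {"mean": (10.0, 10.0, 2.0, 0.0), "var": (1.0, 1.0, 0.01, 0.02)},  # coherent motion
    {"mean": (25.0, 12.0, 0.3, 1.0), "var": (1.0, 1.0, 0.05, 2.50)},  # noisy direction
]
kept = filter_components(candidates)
```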

  3. Pattern Instance Estimation

    • Sequences of components form spatiotemporal worms (instances)
    • Pattern instances are temporally bounded
    • A pattern itself is periodic

  4. Inter-component Transition

    • Pattern instance occurs over several clips
    • Two components i and j belong to the same instance if:
      • i and j are temporally proximal, and
      • j is 'reachable' from i
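
One way to make the two conditions concrete is sketched below. It is only an illustration under stated assumptions: "temporally proximal" is taken to mean that j lies at most `max_clip_gap` clips after i, and "reachable" is scored by moving the mean position of i along its mean flow at constant velocity and evaluating how well the predicted position fits the spatial Gaussian of j. The parameters `frames_per_clip` and `max_clip_gap`, the component layout, and the scoring formula are all hypothetical.

```python
import math

def transition_score(ci, cj, frames_per_clip=10, max_clip_gap=1):
    """Score in [0, 1] for moving from component ci to component cj."""
    gap = cj["clip"] - ci["clip"]
    # Temporal proximity: cj must come after ci, within max_clip_gap clips.
    if not 0 < gap <= max_clip_gap:
        return 0.0
    x, y, rho, theta = ci["mean"]
    # Constant-velocity prediction of where ci's mass ends up (an assumption).
    steps = gap * frames_per_clip
    px = x + steps * rho * math.cos(theta)
    py = y + steps * rho * math.sin(theta)
    jx, jy = cj["mean"][:2]
    vx, vy = cj["var"][:2]
    # Squared Mahalanobis distance of the prediction to cj's spatial Gaussian.
    d2 = (px - jx) ** 2 / vx + (py - jy) ** 2 / vy
    return math.exp(-0.5 * d2)

ci = {"clip": 0, "mean": (10.0, 10.0, 2.0, 0.0), "var": (4.0, 4.0, 0.1, 0.1)}
cj = {"clip": 1, "mean": (30.0, 10.0, 2.0, 0.0), "var": (4.0, 4.0, 0.1, 0.1)}
ck = {"clip": 5, "mean": (30.0, 10.0, 2.0, 0.0), "var": (4.0, 4.0, 0.1, 0.1)}

near = transition_score(ci, cj)  # prediction lands on cj's mean
far = transition_score(ci, ck)   # too many clips apart
```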

  5. Instance Learning

    • Define a planar graph G = (V, E)
      • V = {components from all video clips}
      • E = {edges between temporally proximal components, weighted by transition probability}
    • Weakly connected component analysis on G
    • Connected components are pattern instances

    • Figure: Left: One instance each from 4 patterns. Right: More instances for each of the 4 patterns.
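
The grouping of components into instances can be sketched with a union-find pass that ignores edge direction, which is what weak connectivity means. The edge threshold `min_prob` is an assumption added for illustration, since the source does not say how low-probability edges are handled.

```python
def weakly_connected_components(n, edges, min_prob=0.5):
    """Group vertices 0..n-1 linked by edges (i, j, prob) with prob >= min_prob."""
    parent = list(range(n))

    def find(a):
        # Walk up to the root, halving the path along the way.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i, j, prob in edges:
        if prob >= min_prob:  # hypothetical threshold on the edge weight
            parent[find(i)] = find(j)

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

# Five components; the weak edge (2, 3) does not merge the two instances.
edges = [(0, 1, 0.9), (1, 2, 0.8), (3, 4, 0.7), (2, 3, 0.1)]
instances = weakly_connected_components(5, edges)
```

Each returned group of component indices plays the role of one pattern instance.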

  6. Motion Patterns

    • Multiple instances per pattern
    • Each instance is a Gaussian mixture
    • KL divergence defines similarity between instances
    • KL divergence is approximated with Monte Carlo sampling
    • Graph connected component analysis
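
The KL divergence between two Gaussian mixtures has no closed form, which is why sampling is used. Below is a minimal sketch of the standard Monte Carlo estimator: draw samples from p and average log p(x) - log q(x). It assumes diagonal covariances and represents a mixture as a list of (weight, mean, variance) tuples; the function names and the sample count are choices made for this example.

```python
import math
import random

def gmm_logpdf(x, gmm):
    """Log density of x under a diagonal-covariance Gaussian mixture."""
    logs = []
    for w, mean, var in gmm:
        ll = math.log(w)
        for xi, m, v in zip(x, mean, var):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        logs.append(ll)
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))

def gmm_sample(gmm, rng):
    """Draw one point: pick a component by weight, then sample its Gaussian."""
    r, acc = rng.random(), 0.0
    for w, mean, var in gmm:
        acc += w
        if r <= acc:
            break
    return tuple(rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var))

def mc_kl(p, q, n=2000, seed=0):
    """Monte Carlo estimate of KL(p || q) from n samples drawn from p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = gmm_sample(p, rng)
        total += gmm_logpdf(x, p) - gmm_logpdf(x, q)
    return total / n

# Two instances following a similar path, and one elsewhere in the scene.
inst_a = [(0.5, (10.0, 10.0), (4.0, 4.0)), (0.5, (20.0, 10.0), (4.0, 4.0))]
inst_b = [(0.5, (11.0, 10.0), (4.0, 4.0)), (0.5, (21.0, 10.0), (4.0, 4.0))]
inst_c = [(0.5, (10.0, 60.0), (4.0, 4.0)), (0.5, (20.0, 60.0), (4.0, 4.0))]
kl_ab = mc_kl(inst_a, inst_b)
kl_ac = mc_kl(inst_a, inst_c)
```

Because KL divergence is asymmetric, a symmetric score such as the sum of both directions may be more convenient when linking instances in a graph; that choice is an assumption, not something the source specifies.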

  7. Conditional Expectation of Flow

    • Compute the conditional expectation of flow orientation and magnitude given a pixel location
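
A simplified sketch of this computation follows. It assumes diagonal covariances, so that the expected flow of each component given a pixel is simply that component's mean flow, and the overall expectation is a weighted average with weights proportional to each component's spatial density at the pixel. Flow vectors are averaged rather than raw angles to avoid wrap-around problems. The component layout (x, y, magnitude, orientation) and the function name are assumptions.

```python
import math

def expected_flow(px, py, components):
    """Expected (magnitude, orientation) of flow at pixel (px, py), or None."""
    weights = []
    for c in components:
        x, y = c["mean"][:2]
        vx, vy = c["var"][:2]
        # Spatial density of this component at the pixel, times its mixture weight.
        dens = math.exp(-0.5 * ((px - x) ** 2 / vx + (py - y) ** 2 / vy))
        dens /= 2 * math.pi * math.sqrt(vx * vy)
        weights.append(c["weight"] * dens)
    total = sum(weights)
    if total == 0.0:
        return None  # no component explains this pixel
    # Average flow as vectors, then convert back to magnitude and orientation.
    u = sum(w * c["mean"][2] * math.cos(c["mean"][3])
            for w, c in zip(weights, components)) / total
    v = sum(w * c["mean"][2] * math.sin(c["mean"][3])
            for w, c in zip(weights, components)) / total
    return math.hypot(u, v), math.atan2(v, u)

pattern = [
    {"weight": 0.5, "mean": (10.0, 10.0, 2.0, 0.0), "var": (4.0, 4.0, 0.1, 0.1)},
    {"weight": 0.5, "mean": (50.0, 50.0, 3.0, math.pi / 2), "var": (4.0, 4.0, 0.1, 0.1)},
]
magnitude, orientation = expected_flow(10.0, 10.0, pattern)
```

With full covariance matrices, each component's conditional mean would also depend on the pixel's offset from the component center; the diagonal case above omits that term.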


Related Publication

Imran Saleemi, Lance Hartung, and Mubarak Shah, "Scene Understanding by Statistical Modeling of Motion Patterns," IEEE Conference on Computer Vision and Pattern Recognition, 2010, San Francisco, CA.
