Center for Research in Computer Vision



MVA

Volume 25, Issue 5


This issue features the following special issue and original papers.



Background Modeling for Foreground Detection in Real-World Dynamic Scenes
Thierry Bouwmans, Jordi Gonzàlez, Caifeng Shan, Massimo Piccardi, Larry Davis

Editorial.



Special Issue Paper
Video background modeling: recent approaches, issues and our proposed techniques
Munir Shah, Jeremiah D. Deng, Brendon J. Woodford

Effective and efficient background subtraction is important to a number of computer vision tasks. We introduce several new techniques that address key challenges in background modeling with a Gaussian mixture model (GMM) for moving object detection in video acquired by a static camera. The novel features of our proposed model are that it automatically learns the dynamics of a scene and adapts its parameters accordingly, suppresses ghosts in the foreground mask using a SURF feature-matching algorithm, and introduces a new spatio-temporal filter to further refine the foreground detection results. Abrupt illumination changes in the scene are handled by a model-shifting scheme that reuses already learned models, and the spatio-temporal history of foreground blobs is used to detect and handle paused objects. The proposed model is rigorously tested and compared with several previous models, showing significant performance improvements.
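For context, the paper extends the standard per-pixel GMM baseline. Below is a minimal sketch of that baseline using OpenCV's MOG2 implementation; it is not the authors' model (no SURF-based ghost suppression, no spatio-temporal filtering, no model shifting), and the input filename is hypothetical.

```python
import cv2

cap = cv2.VideoCapture("static_camera.avi")  # hypothetical input video
# Per-pixel GMM background model; parameters are OpenCV defaults made explicit.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=True)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)  # GMM update + foreground classification
    # Small morphological opening to suppress isolated noise pixels.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    cv2.imshow("foreground", mask)
    if cv2.waitKey(1) == 27:  # Esc quits
        break
cap.release()
cv2.destroyAllWindows()
```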



Special Issue Paper
Case-based background modeling: associative background database towards low-cost and high-performance change detection
Atsushi Shimada, Yosuke Nonaka, Hajime Nagahara, Rin-ichiro Taniguchi

Background modeling and subtraction is an essential task in video surveillance applications. Many researchers have discussed improving the performance of background models and reducing their memory usage or computational cost. To adapt to background changes, background models have been enhanced by introducing various kinds of information, including spatial consistency and temporal tendency, at the price of a large memory allocation. Meanwhile, approaches that reduce memory cost cannot provide better background subtraction accuracy. To tackle this trade-off, this paper proposes a novel framework named “case-based background modeling”. Its characteristics are that (1) a background model is created, or removed, only when necessary, (2) models are shared case by case among groups of pixels, and (3) pixel features are divided into two groups, one for model selection and the other for modeling. These approaches realize a low-cost and highly accurate background model: memory usage and computational cost were reduced to half those of a traditional method, while accuracy was superior to it.



Special Issue Paper
Mixture of Merged Gaussian Algorithm using RTDENN
Manuel Alvar, Andrea Rodriguez-Calvo, Alvaro Sanchez-Miralles, Alvaro Arranz

Computer vision has been a widely developed research area in recent years and has been used for a broad range of applications, including surveillance systems. In the pursuit of an autonomous and smart motion detection system, a reliable segmentation algorithm is required. The main problems of present segmentation solutions are their high execution time and their lack of robustness against changes in the environment due to variations in lighting, shadows, occlusions, or the movement of secondary objects. This paper proposes a new algorithm named the Mixture of Merged Gaussian Algorithm (MMGA) that aims to achieve a substantial improvement in execution speed, enabling real-time implementation without compromising the reliability and accuracy of the segmentation. The MMGA combines a probabilistic model for the background, similar to the Mixture of Gaussian Model (MGM), with the learning processes of Real-Time Dynamic Ellipsoidal Neural Networks (RTDENN) for updating the model. The proposed algorithm has been tested on different videos and compared to the MGM and SDGM algorithms. Results show a reduction of 30 to 50 % in execution time. Furthermore, the segmentation is more robust against the effect of noise and adapts faster to lighting changes.



Special Issue Paper
Background subtraction using finite mixtures of asymmetric Gaussian distributions and shadow detection
Tarek Elguebaly, Nizar Bouguila

Foreground segmentation of moving regions in image sequences is a fundamental step in many vision systems, including automated video surveillance, human-machine interfaces, and optical motion capture. Many models have been introduced to deal with the problems of modeling the background and detecting the moving objects in the scene. One of the successful solutions to these problems is the well-known adaptive Gaussian mixture model. However, this method suffers from some drawbacks. Modeling the background with a Gaussian mixture assumes that the background and foreground distributions are Gaussian, which is not the case for many environments. In addition, it is unable to distinguish between moving shadows and moving objects. In this paper, we try to overcome these problems using a mixture of asymmetric Gaussians to enhance the robustness and flexibility of mixture modeling, and a shadow detection scheme to remove unwanted shadows from the scene. Furthermore, we apply this method to real image sequences of both indoor and outdoor scenes. Comparisons of our method with different state-of-the-art background subtraction methods show the efficiency of our model for real-time segmentation.
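For readers unfamiliar with the asymmetric Gaussian, the following is a minimal sketch of a two-sided density with separate standard deviations on each side of the mode; the parameter names are illustrative and not the authors' notation.

```python
import numpy as np

def asymmetric_gaussian_pdf(x, mu, sigma_l, sigma_r):
    """Density with std sigma_l to the left of mu and sigma_r to the right.

    The factor sqrt(2/pi) / (sigma_l + sigma_r) normalizes the two
    half-Gaussians so the density integrates to one.
    """
    norm = np.sqrt(2.0 / np.pi) / (sigma_l + sigma_r)
    sigma = np.where(x < mu, sigma_l, sigma_r)
    return norm * np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Example: a density with a heavier right tail.
xs = np.linspace(-3.0, 6.0, 10)
print(asymmetric_gaussian_pdf(xs, mu=0.0, sigma_l=0.5, sigma_r=2.0))
```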



Special Issue Paper
Background subtraction: separating the modeling and the inference
Manjunath Narayana, Allen Hanson, Erik G. Learned-Miller

In its early implementations, background modeling was a process of building a model for the background of a video with a stationary camera, and identifying pixels that did not conform well to this model. The pixels that were not well-described by the background model were assumed to be moving objects. Many systems today maintain models for the foreground as well as the background, and these models compete to explain the pixels in a video. If the foreground model explains the pixels better, they are considered foreground. Otherwise, they are considered background. In this paper, we argue that the logical endpoint of this evolution is to simply use Bayes’ rule to classify pixels. In particular, it is essential to have a background likelihood, a foreground likelihood, and a prior at each pixel. A simple application of Bayes’ rule then gives a posterior probability over the label. The only remaining question is the quality of the component models: the background likelihood, the foreground likelihood, and the prior. We describe a model for the likelihoods that is built using not only the past observations at a given pixel location but also observations in a spatial neighborhood around it. This enables us to model the influence between neighboring pixels and is an improvement over earlier pixelwise models that do not allow for such influence. Although similar in spirit to the joint domain-range model, we show that our model overcomes certain deficiencies in that model. We use a spatially dependent prior for the background and foreground. The background and foreground labels from the previous frame, after spatial smoothing to account for movement of objects, are used to build the prior for the current frame. These components are, by themselves, not novel aspects in background modeling. As we will show, many existing systems account for these aspects in different ways. We argue that separating these components as suggested in this paper yields a very simple and effective model. Our intuitive description also isolates the model components from the classification or inference step. Improvements to each model component can be carried out without any changes to the inference or other components. The various components can hence be modeled effectively and their impact on the overall system understood more easily.
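The classification step the abstract argues for reduces to a one-line application of Bayes' rule per pixel. The sketch below illustrates it, assuming the likelihood and prior arrays come from whatever component models are chosen; the names and the toy inputs are ours.

```python
import numpy as np

def foreground_posterior(bg_lik, fg_lik, fg_prior):
    """Per-pixel P(foreground | observation) via Bayes' rule.

    bg_lik, fg_lik : H x W arrays of p(observation | label)
    fg_prior       : H x W array of P(foreground), e.g. the smoothed labels
                     propagated from the previous frame
    """
    num = fg_lik * fg_prior
    den = num + bg_lik * (1.0 - fg_prior)
    return num / np.maximum(den, 1e-12)  # guard against zero evidence

# Toy example with random likelihoods and a weak foreground prior.
rng = np.random.default_rng(0)
H, W = 4, 4
posterior = foreground_posterior(rng.random((H, W)), rng.random((H, W)),
                                 np.full((H, W), 0.1))
labels = posterior > 0.5  # True = foreground
```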



Special Issue Paper
Change detection by probabilistic segmentation from monocular view
Francisco J. Hernandez-Lopez, Mariano Rivera

We present a real-time method for foreground/background video segmentation (change detection) that can be used in applications such as background subtraction or the analysis of surveillance cameras. Our approach implements a probabilistic segmentation based on Quadratic Markov Measure Field models. This framework regularizes the likelihood of each pixel belonging to each of the classes (background or foreground). We propose a new likelihood that takes into account two cases: the first is when the background is static and the foreground may be static or moving (Static Background Subtraction); the second is when the background is unstable and the foreground is moving (Unstable Background Subtraction). Moreover, our likelihood is robust to illumination changes, cast shadows, and camouflage situations. We implement a parallel version of our algorithm in CUDA on an NVIDIA Graphics Processing Unit in order to fulfill real-time execution requirements.



Special Issue Paper
Advanced background modeling with RGB-D sensors through classifiers combination and inter-frame foreground prediction
Massimo Camplani, Carlos Roberto del Blanco, Luis Salgado, Fernando Jaureguizar, Narciso García

An innovative background modeling technique that is able to accurately segment foreground regions in RGB-D imagery (RGB plus depth) is presented in this paper. The technique is based on a Bayesian framework that efficiently fuses different sources of information to segment the foreground. In particular, the final segmentation is obtained by considering a prediction of the foreground regions, carried out by a novel Bayesian network with a depth-based dynamic model, together with two independent depth- and color-based mixture-of-Gaussians background models. The efficient Bayesian combination of all these data reduces the noise and uncertainties introduced by the color and depth features and the corresponding models. As a result, more compact segmentations and refined foreground object silhouettes are obtained. Experimental results on different databases suggest that the proposed technique outperforms existing state-of-the-art algorithms.
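A hedged sketch of the kind of probabilistic fusion such approaches rest on: combining independent color-based and depth-based likelihoods with a prediction-derived prior under a naive-Bayes independence assumption. This illustrates the fusion principle only, not the paper's actual Bayesian network.

```python
import numpy as np

def fused_posterior(color_fg, color_bg, depth_fg, depth_bg, prior_fg):
    """P(foreground | color, depth), treating the two cues as independent.

    color_fg/color_bg : per-pixel likelihoods from a color background model
    depth_fg/depth_bg : per-pixel likelihoods from a depth background model
    prior_fg          : per-pixel prior, e.g. from a foreground prediction
    """
    num = color_fg * depth_fg * prior_fg
    den = num + color_bg * depth_bg * (1.0 - prior_fg)
    return num / np.maximum(den, 1e-12)
```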



Special Issue Paper
Background subtraction model based on color and depth cues
Enrique J. Fernandez-Sanchez, Leonardo Rubio, Javier Diaz, Eduardo Ros

Background subtraction consists of segmenting moving objects in a video captured by a static camera. This is typically performed using color information, which leads to incorrect estimates under perspective and illumination issues. We show that multimodal approaches based on the integrated use of color and depth cues produce more accurate and robust results than either data source alone. Depth is less affected by issues such as shadows or foreground objects similar to the background. However, objects close to the background may not be detected when using only range information, with color information being complementary in those cases. We propose an extension of a well-known background subtraction technique that fuses range and color information, as well as a post-processing mask fusion stage to get the best of each feature. We have evaluated the proposed method on a well-defined dataset and with different disparity estimation algorithms, showing the benefits of fusing color and depth cues.
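As a rough illustration of a post-processing mask fusion stage, the sketch below combines binary foreground masks from independent color and depth subtractors; the specific fusion rule and the validity-mask handling are our assumptions, not the paper's.

```python
import numpy as np

def fuse_masks(color_mask, depth_mask, depth_valid):
    """Combine binary foreground masks from color and depth cues.

    Where depth is valid, let either cue flag foreground: depth is robust
    to shadows and camouflage, while color catches objects that lie too
    close to the background in range. Where depth is invalid (sensor
    holes), fall back on color alone.
    """
    fused = np.where(depth_valid, color_mask | depth_mask, color_mask)
    return fused.astype(bool)
```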



Special Issue Paper
pROST: a smoothed ℓp-norm robust online subspace tracking method for background subtraction in video
Florian Seidel, Clemens Hage, Martin Kleinsteuber

An increasing number of methods for background subtraction use Robust PCA to identify sparse foreground objects. While many algorithms use the ℓ1-norm as a convex relaxation of the ideal sparsifying function, we approach the problem with a smoothed ℓp-quasi-norm and present pROST, a method for robust online subspace tracking. The algorithm is based on alternating minimization on manifolds. Implemented on a graphics processing unit, it achieves real-time performance at a resolution of 160×120. Experimental results on a state-of-the-art benchmark for background subtraction on real-world video data indicate that the method succeeds in a broad variety of background subtraction scenarios and outperforms competing approaches when video quality is deteriorated by camera jitter.
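To make the cost function concrete, here is a minimal sketch of a smoothed ℓp quasi-norm of the general form used in such methods, with a small constant keeping it differentiable at zero; the variable names and the exact smoothing are our assumptions, not necessarily the paper's.

```python
import numpy as np

def smoothed_lp(x, p=0.5, mu=1e-4):
    """Sum over entries of (x_i^2 + mu)^(p/2).

    For 0 < p < 1 this approaches the lp quasi-norm ||x||_p^p as mu -> 0,
    penalizing large residuals far less than the l2 norm would.
    """
    return np.sum((np.asarray(x) ** 2 + mu) ** (p / 2.0))

residuals = np.array([0.0, 0.01, 2.0, -3.0])
print(smoothed_lp(residuals, p=0.5))
```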



Special Issue Paper
Scene appearance model based on spatial prediction
Rami R. Hagege

The appearance of a static scene as sensed by a camera changes considerably as a result of changes in the illumination that falls upon it. Scene appearance modeling is thus necessary for understanding which changes in the appearance of a scene are the result of illumination changes. For any camera, the appearance of the scene is a function of the illumination sources in the scene, the three-dimensional configuration of the objects in the scene and the reflectance properties of all the surfaces in the scene. A scene appearance model is described here as a function of the behavior of static illumination sources, within or beyond the scene, and arbitrary three-dimensional configurations of patches and their reflectance distributions. Based on the suggested model, a spatial prediction technique was developed to predict the appearance of the scene, given a few measurements within it. The scene appearance model and the prediction technique were developed analytically and tested empirically. Two potential applications are briefly explored.



Special Issue Paper
Background modeling in the maritime domain
Domenico D. Bloisi, Andrea Pennisi, Luca Iocchi

The maritime environment represents a challenging scenario for automatic video surveillance due to the complexity of the observed scene: waves on the water surface, boat wakes, and weather conditions contribute to a highly dynamic background. Moreover, an appropriate background model has to deal with gradual and sudden illumination changes, camera jitter, shadows, and reflections that can provoke false detections. Using a predefined distribution (e.g., Gaussian) for generating the background model can prove ineffective due to the need to model non-regular patterns. In this paper, a method is described for creating a “discretization” of an unknown distribution that can model highly dynamic backgrounds such as water. A quantitative evaluation carried out on two publicly available datasets of videos and images, containing data recorded in different maritime scenarios with varying light and weather conditions, demonstrates the effectiveness of the approach.
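One generic way to “discretize” an unknown per-pixel distribution, sketched below under our own assumptions (this is not the paper's exact method), is to quantize observed intensities into bins and treat values falling into rarely seen bins as foreground.

```python
import numpy as np

class DiscretizedBackground:
    """Per-pixel histogram over intensity bins; no parametric assumption."""

    def __init__(self, shape, n_bins=32, min_support=5):
        self.n_bins = n_bins
        self.min_support = min_support
        self.counts = np.zeros(shape + (n_bins,), dtype=np.int32)

    def _bins(self, frame):
        # Map uint8 intensities 0..255 to bin indices 0..n_bins-1.
        return (frame.astype(np.int32) * self.n_bins) // 256

    def update(self, frame):
        """Accumulate evidence from a grayscale uint8 frame."""
        h, w = frame.shape
        self.counts[np.arange(h)[:, None], np.arange(w)[None, :],
                    self._bins(frame)] += 1

    def foreground(self, frame):
        """A pixel is foreground if its current bin has little support."""
        h, w = frame.shape
        support = self.counts[np.arange(h)[:, None], np.arange(w)[None, :],
                              self._bins(frame)]
        return support < self.min_support
```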



Special Issue Paper
Dynamic image mosaic via SIFT and dynamic programming
Lin Zeng, Shengping Zhang, Jun Zhang, Yunlu Zhang

Image mosaicking is a useful preprocessing step for background subtraction in videos recorded by a moving camera. To avoid ghosting and mosaic failure due to large exposure differences and large parallax between adjacent images, this paper proposes an effective mosaic algorithm named Combined SIFT and Dynamic Programming (CSDP). Based on SIFT matching and dynamic programming, CSDP uses an improved optimal-seam-searching criterion that provides “protection mechanisms” for moving objects via an edge-enhanced weighting intensity difference operator, ultimately solving the ghosting and incompleteness caused by moving objects. The proposed method was compared to three widely used mosaicking software packages (AutoStitch, Microsoft ICE, and Panorama Maker) and to Mills’ approach in multiple scenes. Experimental results show the feasibility and effectiveness of the proposed method.
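For context, optimal-seam search by dynamic programming is the classical building block here. The sketch below finds a minimum-cost vertical seam through a cost image (for example, an intensity-difference map over the overlap region of two aligned images); the paper's edge-enhanced weighting operator is not reproduced.

```python
import numpy as np

def optimal_seam(cost):
    """Return, for each row, the column of the minimum-cost vertical seam.

    Standard dynamic programming: each pixel may connect to the three
    nearest pixels in the row above.
    """
    h, w = cost.shape
    acc = cost.astype(np.float64)
    for y in range(1, h):
        left = np.r_[np.inf, acc[y - 1, :-1]]
        right = np.r_[acc[y - 1, 1:], np.inf]
        acc[y] += np.minimum(np.minimum(left, acc[y - 1]), right)
    seam = np.empty(h, dtype=np.int64)
    seam[-1] = int(np.argmin(acc[-1]))  # best endpoint, then backtrack
    for y in range(h - 2, -1, -1):
        x = seam[y + 1]
        lo, hi = max(x - 1, 0), min(x + 2, w)
        seam[y] = lo + int(np.argmin(acc[y, lo:hi]))
    return seam
```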



HMPMR strategy for real-time tracking in aerial images, using direct methods
Carol Martínez, Pascual Campoy, Iván F. Mondragón, José Luis Sánchez-Lopez, Miguel A. Olivares-Méndez

The vast majority of approaches make use of features to track objects. In this paper, we address the tracking problem with a tracking-by-registration strategy based on direct methods. We propose a strategy that is hierarchical in both image resolution and the number of parameters estimated at each resolution, which allows direct methods to be applied in demanding real-time visual-tracking applications. We call this the Hierarchical Multi-Parametric and Multi-Resolution (HMPMR) strategy. The Inverse Compositional Image Alignment (ICIA) algorithm is used as the image registration technique and is extended to HMPMR-ICIA. The proposed strategy is tested on different datasets and on image data from real flight tests with an Unmanned Aerial Vehicle, where the requirements of direct methods are easily violated (e.g., by vehicle vibrations). Results show that with the HMPMR approach it is possible to cope with the efficiency problem and with the small-motion constraint of direct methods, conducting the tracking task at real-time frame rates and obtaining performance comparable to, or even better than, that of the other algorithms analyzed.



A synthetic training framework for providing gesture scalability to 2.5D pose-based hand gesture recognition systems
Javier Molina, José M. Martínez

The use of hand gestures offers an alternative to commonly used human-computer interfaces (keyboard, mouse, gamepad), providing a more intuitive way of navigating menus and multimedia applications. One of the most difficult issues when designing a hand gesture recognition system is introducing new detectable gestures without high cost; this is known as gesture scalability. Commonly, the introduction of new gestures requires a recording session involving real subjects. This paper presents a training framework for hand posture detection systems based on a learning scheme fed with synthetically generated range images. Different configurations of a 3D hand model yield sets of synthetic subjects, which have shown good performance in separating the gestures of several state-of-the-art dictionaries. The proposed approach allows new dictionaries to be learned without recording real subjects, so it is fully scalable in terms of gestures. The accuracy rates obtained for the evaluated dictionaries are comparable to, and in some cases better than, those reported for training schemes based on real subjects.



Uncalibrated flatfielding and illumination vector estimation for photometric stereo face reconstruction
Maria E. Angelopoulou, Maria Petrou

Within the context of photometric stereo reconstruction, flatfielding may be used to compensate for the effect of the inverse-square law of light propagation on pixel brightness. This would require capturing a set of reference images in an off-line imaging session, using a calibration device captured under exactly the same conditions as the main session. Similarly, the illumination vectors on which photometric stereo relies are typically precomputed in another dedicated calibration session. In practice, implementing such off-line sessions is inconvenient and often infeasible. This work aims at enabling accurate photometric stereo reconstruction for the case of non-interactive on-line capturing of human faces. We propose unsupervised methodologies that extract all the information required for accurate face reconstruction from the images of interest themselves. Specifically, we propose an uncalibrated flatfielding methodology and an uncalibrated illumination vector estimation methodology, and we assess their effect on photometric stereo face reconstruction. Results demonstrate that incorporating our methodologies into the photometric stereo framework halves the reconstruction error while eliminating the need for off-line calibration.
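For reference, conventional calibrated flatfielding, the step whose off-line session this work seeks to eliminate, amounts to a per-pixel gain correction against a reference image of a uniform target; a minimal sketch follows, with function and variable names of our choosing.

```python
import numpy as np

def flatfield_correct(raw, reference):
    """Cancel spatial intensity falloff (e.g., inverse-square attenuation).

    raw       : image to correct, float array
    reference : image of a uniformly reflecting target, captured off-line
                under the same imaging conditions
    """
    gain = reference.mean() / np.maximum(reference, 1e-6)  # per-pixel gain
    return raw * gain
```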



Detection and localization of specular surfaces using image motion cues
Ozgur Yilmaz, Katja Doerschner

Successful identification of specularities in an image can be crucial for an artificial vision system when extracting the semantic content of an image or while interacting with the environment. We developed an algorithm that relies on scale and rotation invariant feature extraction techniques and uses motion cues to detect and localize specular surfaces. Appearance change in feature vectors is used to quantify the appearance distortion on specular surfaces, which has previously been shown to be a powerful indicator for specularity (Doerschner et al. in Curr Biol, 2011). The algorithm combines epipolar deviations (Swaminathan et al. in Lect Notes Comput Sci 2350:508–523, 2002) and appearance distortion, and succeeds in localizing specular objects in computer-rendered and real scenes, across a wide range of camera motions and speeds, object sizes and shapes, and performs well under image noise and blur conditions.



Abnormal behavior detection using dominant sets
Manuel Alvar, Andrea Torsello, Alvaro Sanchez-Miralles, José María Armingol

Smart surveillance systems are increasingly being used to detect potentially dangerous situations. A common and simple way to do so is to model normal human behaviors and to consider any new, unusual behavior in the scene as abnormal. In this article, the Dominant Sets framework is adapted to model the most frequent behaviors and to trigger an alarm on any unknown event. It is shown that, after unsupervised training, Dominant Sets can robustly detect abnormal behaviors. The method is tested in several different cases and compared to other common clustering methods, such as KNN, mixtures of Gaussians, and fuzzy k-means, to confirm its robustness and performance. The overall performance of abnormal behavior detection based on Dominant Sets is better, with an error ratio at least 1.5 points lower than the others.
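For readers unfamiliar with the clustering tool, a dominant set can be extracted from a pairwise-similarity matrix with replicator dynamics, following Pavan and Pelillo's formulation. The sketch below illustrates that extraction step only, not the full surveillance pipeline; the iteration limits and tolerances are our own choices.

```python
import numpy as np

def dominant_set(A, iters=1000, tol=1e-8):
    """Extract one dominant set from a similarity matrix A.

    A : symmetric, nonnegative similarity matrix with a zero diagonal.
    Returns a boolean membership vector: the support of the converged
    characteristic vector under replicator dynamics.
    """
    n = A.shape[0]
    x = np.full(n, 1.0 / n)  # start at the barycenter of the simplex
    for _ in range(iters):
        Ax = A @ x
        x_new = x * Ax / max(x @ Ax, 1e-12)  # replicator dynamics step
        if np.abs(x_new - x).sum() < tol:
            x = x_new
            break
        x = x_new
    return x > 1e-6
```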