Center for Research in Computer Vision



Machine Vision and Applications (MVA)

Volume 25, Issue 3


This issue features the following special issue papers and original papers.



Car Navigation and Vehicle Systems
Fatih Porikli, Luc Van Gool

Editorial.



Special Issue Paper
Event classification for vehicle navigation system by regional optical flow analysis
Min-Kook Choi, Joonseok Park, Sang-Chul Lee

We address the problem of event classification for intelligent vehicle navigation systems from video sequences acquired by a front-mounted camera in complex urban scenes. Although a large variety of events occur in normal driving conditions and would usefully feed an in-vehicle alerting system, research on driving-scene analysis has been comparatively narrow, focusing on local information such as lane, pedestrian, traffic sign, or traffic light detection. Moreover, such methods offer only limited performance owing to the many challenges of normal urban driving, i.e., complex backgrounds, inhomogeneous illumination, occlusion, etc. In this paper, we tackle the classification of various events by learning regional optical flow to detect important events (those occurring frequently and posing a driving risk) using low-cost front-mounted camera equipment. We approach the problem as follows. First, we present an optical flow-based event detection method built on regional significance analysis, introducing a novel significance map based on regional histograms of flow vectors. Second, we present a global and a local method to robustly detect ego-motion-based events and target-motion-based events. In our experiments, we achieved a classification accuracy of about 91% on average with two classifiers (Bayesian and SVM). We also report the computational performance of the method, achieving about 14.3 fps on a laptop with an Intel Pentium 1.2 GHz processor.
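The abstract's core step, a significance map built from regional histograms of flow vectors, can be sketched as follows. This is an illustrative numpy reconstruction, not the authors' implementation; the grid size, bin count, and chi-squared deviation measure are assumptions:

```python
import numpy as np

def regional_flow_histograms(flow, grid=(4, 4), bins=8):
    """Histogram the orientations of flow vectors (weighted by magnitude)
    inside each cell of a grid laid over the frame.
    flow has shape (H, W, 2) holding (dx, dy) per pixel."""
    h, w, _ = flow.shape
    angles = np.arctan2(flow[..., 1], flow[..., 0])   # in [-pi, pi]
    mags = np.linalg.norm(flow, axis=2)
    gh, gw = grid
    hists = np.zeros((gh, gw, bins))
    for i in range(gh):
        for j in range(gw):
            ys = slice(i * h // gh, (i + 1) * h // gh)
            xs = slice(j * w // gw, (j + 1) * w // gw)
            hist, _ = np.histogram(angles[ys, xs], bins=bins,
                                   range=(-np.pi, np.pi),
                                   weights=mags[ys, xs])
            hists[i, j] = hist
    return hists

def significance_map(hists, eps=1e-9):
    """Score each region by how far its normalized flow histogram
    deviates (chi-squared distance) from the frame-wide average."""
    norm = hists / (hists.sum(axis=2, keepdims=True) + eps)
    global_hist = norm.mean(axis=(0, 1))
    diff = (norm - global_hist) ** 2 / (norm + global_hist + eps)
    return diff.sum(axis=2)
```

A region whose flow distribution diverges from the frame-wide pattern (e.g., a crossing vehicle against the ego-motion flow) receives a high significance score.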



Special Issue Paper
Parking assistance using dense motion-stereo
Christian Unger, Eric Wahl, Slobodan Ilic

The ability to generate and interpret a three-dimensional representation of the environment in real time is one of the key technologies for autonomous vehicles. While active sensors like ultrasound have been used commercially, their cost and precision are not favorable. On the other hand, integrating passive sensors like video cameras into modern vehicles is quite appealing, especially because of their low cost. However, image processing requires reliable real-time algorithms to retrieve depth from visual information. In addition, the limited processing power in automobiles and other mobile platforms makes this problem even more challenging. In this paper we introduce a parking assistance system that relies on dense motion-stereo to compute depth maps of the observed environment in real time. The flexibility and robustness of our method are showcased with different applications: automatic parking slot detection, a collision warning for the pivoting ranges of the doors, and an image-based rendering technique to visualize the environment around the host vehicle. We evaluate the accuracy and reliability of our system and provide quantitative and qualitative results. A comparison to ultrasound and feature-based motion-stereo solutions shows that our approach is more reliable.



Special Issue Paper
Multi-modal object detection and localization for high integrity driving assistance
Sergio Alberto Rodríguez Flórez, Vincent Frémont, Philippe Bonnifait, Véronique Cherfaoui

Much work is currently devoted to increasing the reliability, completeness and precision of the data used by driving assistance systems, particularly in urban environments. Urban environments pose a particular challenge for perception, since they are complex, dynamic and highly variable. This article examines a multi-modal perception approach for enhancing vehicle localization and the tracking of dynamic objects in a world-centric map. 3D ego-localization is achieved by merging stereo vision perception data and proprioceptive information from vehicle sensors. Mobile objects are detected using a multi-layer lidar that is simultaneously used to identify a zone of interest, reducing the complexity of the perception process. Object localization and tracking are then performed in a fixed frame, which simplifies analysis and understanding of the scene. Finally, tracked objects are confirmed by vision using 3D dense reconstruction in focused regions of interest. Only confirmed objects can generate an alarm or an action on the vehicle, which is crucial to reduce false alarms that erode the trust the driver places in the driving assistance system. Synchronization issues between the sensing modalities are solved using predictive filtering. Real experimental results are reported so that the performance of the multi-modal system may be evaluated.



Special Issue Paper
Active learning for on-road vehicle detection: a comparative study
Sayanan Sivaraman, Mohan M. Trivedi

In recent years, active learning has emerged as a powerful tool in building robust systems for object detection using computer vision. Indeed, active learning approaches to on-road vehicle detection have achieved impressive results. While active learning approaches for object detection have been explored and presented in the literature, few studies have been performed to comparatively assess costs and merits. In this study, we provide a cost-sensitive analysis of three popular active learning methods for on-road vehicle detection. The generality of active learning findings is demonstrated via learning experiments performed with detectors based on histogram of oriented gradient features and SVM classification (HOG–SVM), and Haar-like features and Adaboost classification (Haar–Adaboost). Experimental evaluation has been performed on static images and real-world on-road vehicle datasets. Learning approaches are assessed in terms of the time spent annotating, data required, recall, and precision.
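Of the active learning strategies compared in such studies, margin-based (uncertainty) sampling for a linear SVM is the simplest to sketch. The function below is an illustrative numpy rendering, not code from the paper; `w` and `b` stand for the current SVM's weight vector and bias:

```python
import numpy as np

def query_by_margin(w, b, X_unlabeled, n_queries=5):
    """Margin-based (uncertainty) sampling for a linear classifier:
    return indices of the unlabeled points closest to the decision
    boundary |w.x + b| / ||w||, i.e., the samples the classifier is
    least certain about and hence the most informative to annotate."""
    scores = np.abs(X_unlabeled @ w + b) / np.linalg.norm(w)
    return np.argsort(scores)[:n_queries]
```

In an active learning loop, the returned samples would be sent to a human annotator, added to the training set, and the detector retrained, which is exactly the annotation-time cost such comparative studies measure.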



Special Issue Paper
Traffic event classification at intersections based on the severity of abnormality
Ömer Aköz, M. Elif Karsligil

This paper proposes a novel traffic event classification approach based on event severities at intersections. The proposed system learns normal, common traffic flow by clustering vehicle trajectories: common vehicle routes are generated by trajectory clustering with continuous Hidden Markov Models. Vehicle abnormality is detected by observing the maximum likelihoods of partial vehicle locations and velocities under the learned common-route models. The second part of the work extracts the severity of an abnormality through deviation measurement using the coefficient-of-variation method. Using abnormal event samples, two severity classes are built so that event severities can be recognized with support vector machines and k-nearest neighbor algorithms. Experimental results show that the proposed model has high precision, with satisfactory incident detection and event severity classification performance.
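The severity measure the abstract names, the coefficient of variation of deviation measurements, is essentially a one-liner; the sketch below (with an assumed input of per-frame deviation values) shows the idea:

```python
import numpy as np

def severity_score(deviations):
    """Coefficient of variation (std / mean) of a vehicle's deviation
    measurements from the learned common-route model. Assumes positive
    deviations (mean > 0); a larger value indicates a more erratic,
    hence more severe, abnormal event."""
    d = np.asarray(deviations, dtype=float)
    return d.std() / d.mean()
```

Scores like this, computed over abnormal event samples, would then feed the SVM / k-NN severity classifiers described above.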



Special Issue Paper
Multi-view traffic sign detection, recognition, and 3D localisation
Radu Timofte, Karel Zimmermann, Luc Van Gool

Several applications require information about street furniture. Part of the task is to survey all traffic signs. This has to be done for millions of kilometres of road, and the exercise needs to be repeated every so often. We used a van with eight roof-mounted cameras to drive through the streets, taking images every metre. The paper proposes a pipeline for the efficient detection and recognition of traffic signs from such images. The task is challenging, as illumination conditions change regularly, occlusions are frequent, sign positions and orientations vary substantially, and the actual signs are far less similar among equal types than one might expect. We combine 2D and 3D techniques to improve results beyond the state of the art, which is still very much preoccupied with single-view analysis. For the initial detection in single frames, we use a set of colour- and shape-based criteria that yield a set of candidate sign patterns. This candidate selection allows a significant speed-up over a sliding-window approach while keeping similar performance. A further speed-up is achieved through a proposed efficient bounded evaluation of AdaBoost detectors. The 2D detections in multiple views are subsequently combined to generate 3D hypotheses, and a Minimum Description Length formulation yields the set of 3D traffic signs that best explains the 2D detections. The paper comes with a publicly available database containing more than 13,000 traffic sign annotations.
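The "bounded evaluation of AdaBoost detectors" mentioned above can be read as an early-out on the weighted vote: stop once the remaining weak classifiers cannot lift the score over the acceptance threshold. A minimal sketch under that reading (a real detector would evaluate weak responses lazily rather than receive them as a list):

```python
import numpy as np

def bounded_adaboost_score(weak_responses, alphas, threshold):
    """Evaluate an AdaBoost detector with an early-out bound. Each weak
    response h is in {-1, +1} and alphas are the (positive) AdaBoost
    weights. After each weak classifier, if the running score plus the
    sum of all remaining weights cannot reach the acceptance threshold,
    reject immediately instead of evaluating the rest."""
    remaining = np.cumsum(alphas[::-1])[::-1]  # remaining[i] = sum(alphas[i:])
    score = 0.0
    for i, (h, a) in enumerate(zip(weak_responses, alphas)):
        score += a * h
        left = remaining[i + 1] if i + 1 < len(alphas) else 0.0
        if score + left < threshold:
            return score, False  # bound says the threshold is unreachable
    return score, score >= threshold
```

Because most image windows are background and fail the bound early, the expected number of weak classifiers evaluated per window drops sharply, which is where the claimed speed-up comes from.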



Special Issue Paper
Exploiting temporal and spatial constraints in traffic sign detection from a moving vehicle
Siniša Šegvić, Karla Brkić, Zoran Kalafatić, Axel Pinz

This paper addresses detection, tracking and recognition of traffic signs in video. Previous research has shown that very good detection recalls can be obtained by state-of-the-art detection algorithms. Unfortunately, satisfactory precision and localization accuracy are more difficult to achieve. We follow the intuitive notion that it should be easier to accurately detect an object from an image sequence than from a single image. We propose a novel two-stage technique that achieves improved detection results by applying temporal and spatial constraints to the occurrences of traffic signs in video. The first stage produces well-aligned, temporally consistent detection tracks by managing many competing track hypotheses at once. The second stage improves precision by filtering the detection tracks with a learned discriminative model. The two stages have been evaluated in extensive experiments performed on videos acquired from a moving vehicle, and the obtained results clearly confirm the advantages of the proposed technique.



Special Issue Paper
Enhanced fog detection and free-space segmentation for car navigation
Nicolas Hautière, Jean-Philippe Tarel, Houssam Halmaoui, Roland Brémond, Didier Aubert

Free-space detection is a primary task for car navigation. Unfortunately, classical approaches have difficulties in adverse weather conditions, in particular in daytime fog. In this paper, a solution is proposed based on a contrast restoration approach applied to images grabbed by an in-vehicle camera. The proposed method improves the state of the art in several ways. First, the fog region of interest is better segmented thanks to the computation of shortest-route maps. Second, the fog density and the position of the horizon line are jointly computed. The method then restores the contrast of the road by assuming only that the road is flat and, at the same time, detects vertical objects. Finally, a segmentation of the connected component in front of the vehicle gives the free-space area. An experimental validation was carried out to assess the effectiveness of the method, with results shown on sample images extracted from video sequences acquired by an in-vehicle camera. The proposed method is complementary to existing free-space detection methods relying on color segmentation and stereovision.



Special Issue Paper
Dynamic objects detection through visual odometry and stereo-vision: a study of inaccuracy and improvement sources
Adrien Bak, Samia Bouchafa, Didier Aubert

Road safety, whatever the environment considered, relies heavily on the ability to detect and track moving objects from a moving point of view. To achieve such detection, the vehicle's ego-motion must first be estimated and compensated. This issue is crucial for a fully autonomous vehicle, which is why several approaches have already been proposed. This study presents a method, based solely on visual information, that implements such a process. Information from stereo-vision and motion is combined to estimate the vehicle's ego-motion, and the ego-motion extraction algorithm is thoroughly evaluated in terms of precision and uncertainty. Given those statistical attributes, a method for dynamic object detection is presented, relying on 3D image registration and evaluation of the residual displacement field. The method is then evaluated on several real and synthetic data sequences and is shown to allow reliable and early detection, even in hard cases (e.g., occlusions). Given a few additional factors (detectable motion range), overall performance can be derived from visual odometry performance.
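The detection step, registering 3D points with the estimated ego-motion and thresholding the residual displacement field, might look like this in outline (numpy sketch; `R`, `t` and the threshold are assumed inputs, not values from the paper):

```python
import numpy as np

def dynamic_point_mask(points_t0, points_t1, R, t, thresh=0.1):
    """Register frame-t0 3D points into frame t1 using the estimated
    ego-motion (rotation R, translation t). Points whose residual
    displacement exceeds the threshold cannot be explained by the
    vehicle's own motion and are flagged as independently moving."""
    predicted = points_t0 @ R.T + t          # where static points should land
    residual = np.linalg.norm(points_t1 - predicted, axis=1)
    return residual > thresh
```

The threshold would in practice be tied to the uncertainty of the ego-motion estimate, which is exactly why the paper quantifies that uncertainty first.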



Special Issue Paper
Low-cost sensor to detect overtaking based on optical flow
Pablo Guzmán, Javier Díaz, Jarno Ralli, Rodrigo Agís, Eduardo Ros

The automotive industry invests substantial amounts of money in driver-security and driver-assistance systems. We propose an overtaking detection system based on visual motion cues that combines feature extraction, optical flow, solid-object segmentation and geometry filtering, running on a low-cost compact architecture based on one focal plane and an on-chip embedded processor. The processing is divided into two stages: first, analog processing on the focal-plane processor dedicated to image conditioning and relevant image-structure selection; and second, vehicle tracking and warning-signal generation by optical flow, using a simple digital microcontroller. Our model can detect an approaching vehicle (in multiple-lane overtaking scenarios) and warn the driver about the risk of changing lanes. Thanks to the use of tightly coupled analog and digital processors, the system is able to perform this complex task in real time with very constrained computing resources. The proposed method has been validated on a sequence of more than 15,000 frames (90 overtaking maneuvers) and is effective under different traffic situations as well as weather and illumination conditions.



Special Issue Paper
Creating robust high-throughput traffic sign detectors using centre-surround HOG statistics
Gary Overett, Lachlan Tychsen-Smith, Lars Petersson, Niklas Pettersson, Lars Andersson

In this paper, we detail a system for creating object detectors that meet the extreme demands of real-world traffic sign detection applications such as GPS map making and real-time in-car traffic sign detection. The resulting detectors are designed to detect and locate multiple traffic sign types in high-definition video (high throughput) from several cameras captured along thousands of kilometers of road, with minimal false positives and detection rates in excess of 99%. This allows for the accurate detection and location of traffic signs in geo-tagged video datasets of entire national road networks in reasonable time using only moderate computing infrastructure. A key to the success of the methods described in this paper is the use of extremely efficient classifier features. We identify two obstacles to achieving the desired performance for all target traffic sign types: feature memory bandwidth requirements and feature discriminance. We introduce our use of centre-surround histogram of oriented gradient (HOG) statistics, which greatly reduce the per-feature memory bandwidth requirements. Subsequently we extend centre-surround HOG statistics to the color domain, raising the discriminant power of the final classifiers for more challenging sign types.
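One plausible reading of a centre-surround HOG statistic is the centre cell's orientation histogram minus the mean of its neighbours, which cancels structure shared across the patch. The sketch below is an assumption-laden illustration, not the authors' feature:

```python
import numpy as np

def centre_surround_hog(cell_hists, centre):
    """Given per-cell gradient-orientation histograms of shape
    (grid_h, grid_w, bins) and a centre cell index, return the centre
    histogram minus the mean of its (up to 8) in-bounds neighbours:
    a centre-surround statistic that responds to local structure while
    cancelling gradients common to the whole patch."""
    gh, gw, _ = cell_hists.shape
    ci, cj = centre
    neigh = [cell_hists[i, j]
             for i in range(ci - 1, ci + 2)
             for j in range(cj - 1, cj + 2)
             if (i, j) != (ci, cj) and 0 <= i < gh and 0 <= j < gw]
    return cell_hists[ci, cj] - np.mean(neigh, axis=0)
```

Because the surround term reuses histograms already computed for neighbouring cells, a feature of this shape needs no extra image reads, consistent with the memory-bandwidth argument in the abstract.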



Special Issue Paper
Recent progress in road and lane detection: a survey
Aharon Bar Hillel, Ronen Lerner, Dan Levi, Guy Raz

The problem of road or lane perception is a crucial enabler for advanced driver assistance systems. As such, it has been an active field of research for the past two decades, with considerable progress made in the past few years. The problem has been confronted under various scenarios and task definitions, leading to the use of diverse sensing modalities and approaches. In this paper we survey the approaches and the algorithmic techniques devised for the various modalities over the last 5 years. We present a generic breakdown of the problem into its functional building blocks and elaborate on the wide range of proposed methods within this scheme. For each functional block, we describe the possible implementations suggested and analyze their underlying assumptions. While impressive advancements have been demonstrated in limited scenarios, inspection of the needs of next-generation systems reveals significant gaps. We identify these gaps and suggest research directions that may bridge them.



Features classification using geometrical deformation feature vector of support vector machine and active appearance algorithm for automatic facial expression recognition
Rajesh A. Patil, Vineet Sahula, A. S. Mandal

This paper proposes a method for facial expression recognition in image sequences. The face is detected in the scene, and facial features are then located using image normalization and thresholding techniques. Using an optimization algorithm, the Candide wire-frame model is fitted to the first frame of the face image sequence. In subsequent frames, facial features are tracked using the active appearance algorithm. Once the model fits the first frame, the animation parameters of the model are set to zero to obtain the model shape for the neutral facial expression of the same face. The last frame of the sequence corresponds to the greatest facial expression intensity. The geometric displacement of the Candide wire-frame nodes between the neutral-expression frame and the last frame is used as input to a multiclass support vector machine, which classifies the facial expression as happy, surprise, sadness, anger, disgust, fear, or neutral. The method is applicable to frontal as well as tilted faces at angles of ±30°, ±45°, and ±60° with respect to the y axis.
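The feature construction described, the geometric displacement of wire-frame nodes between the neutral and apex frames, reduces to a vector difference; a minimal numpy sketch (the node arrays are illustrative, not Candide data):

```python
import numpy as np

def expression_feature(neutral_nodes, apex_nodes):
    """Feature vector for expression classification: the per-node
    displacement of the wire-frame model between the neutral frame and
    the frame of greatest expression intensity, flattened into a single
    vector suitable as input to a multiclass SVM."""
    return (np.asarray(apex_nodes) - np.asarray(neutral_nodes)).ravel()
```

Each training sequence thus contributes one displacement vector labeled with its expression class; the SVM learns boundaries in this displacement space.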



ReigSAC: fast discrimination of spurious keypoint correspondences on planar surfaces
Hugo Proença

Various methods have been proposed to detect and match special interest points (keypoints) in images, and some of them (e.g., SIFT and SURF) are among the most cited techniques in computer vision research. This paper describes an algorithm to discriminate between genuine and spurious keypoint correspondences on planar surfaces. We draw random samples from the set of correspondences, from which homographies are obtained and their principal eigenvectors extracted. Density estimation in that feature space determines the most likely true transform. This homography feeds a cost function that gives the goodness of each keypoint correspondence. While the strategy is similar to the well-known RANSAC, the key finding is that the principal eigenvectors of most (genuine) homographies tend to point in a similar direction. Hence, density estimation in the eigenspace dramatically reduces the number of transforms that must actually be evaluated to obtain reliable estimations. Our experiments were performed on hard image data sets and showed that the proposed approach yields effectiveness similar to the RANSAC strategy at a significantly lower computational burden, measured as the proportion between the number of homographies generated and those actually evaluated.
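The two building blocks of this approach, a DLT homography from a random sample of correspondences and its principal eigenvector, can be sketched as follows. This is an illustrative reconstruction; the sampling loop, density estimation, and cost function are omitted:

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate a homography from >= 4 point correspondences with the
    Direct Linear Transform: stack two linear equations per
    correspondence and take the right singular vector belonging to the
    smallest singular value."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def principal_eigenvector(H):
    """Unit eigenvector of H with the largest-magnitude eigenvalue.
    For homographies drawn from genuine correspondences, these
    directions cluster, so density estimation over them identifies the
    true transform far more cheaply than scoring every hypothesis."""
    vals, vecs = np.linalg.eig(H)
    v = np.real(vecs[:, np.argmax(np.abs(vals))])
    return v / np.linalg.norm(v)
```

In a full pipeline, many 4-point samples would each yield an eigenvector; only homographies near the density mode in eigenspace would be evaluated against all correspondences.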



A feature selection method using improved regularized linear discriminant analysis
Alok Sharma, Kuldip K. Paliwal, Seiya Imoto, Satoru Miyano

Investigation of genes using data analysis and computer-based methods has gained widespread attention for solving the human cancer classification problem, and DNA microarray gene expression datasets are readily utilized for this purpose. In this paper, we propose a feature selection method using an improved regularized linear discriminant analysis technique to select important genes that are crucial for the human cancer classification problem. Experiments conducted on several DNA microarray gene expression datasets yield promising results when compared with several other existing feature selection methods.
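A simplified, per-feature version of a regularized Fisher/LDA criterion conveys the idea of scoring genes while guarding against near-zero within-class variance; this sketch is not the authors' exact method, and `alpha` is an assumed regularizer:

```python
import numpy as np

def regularized_lda_scores(X, y, alpha=1e-3):
    """Rank features by a regularized Fisher criterion: between-class
    scatter over (within-class scatter + alpha), evaluated per feature.
    alpha keeps the score finite when a gene's within-class variance is
    near zero, the small-sample situation typical of microarray data."""
    classes = np.unique(y)
    mean = X.mean(axis=0)
    sb = np.zeros(X.shape[1])
    sw = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        sb += len(Xc) * (Xc.mean(axis=0) - mean) ** 2
        sw += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return sb / (sw + alpha)
```

Genes with the highest scores, i.e., those separating the cancer classes well relative to their within-class spread, would be retained for classification.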



FPGA-based module for SURF extraction
Tomáš Krajník, Jan Šváb, Sol Pedre, Petr Čížek, Libor Přeučil

We present a complete hardware and software solution: an FPGA-based embedded computer vision module capable of carrying out the SURF image feature extraction algorithm. Aside from image analysis, the module runs a Linux distribution, allowing programs specifically tailored to particular applications to be executed. The module is based on a Virtex-5 FXT FPGA, which features powerful configurable logic and an embedded PowerPC processor. We describe the module hardware as well as the custom FPGA image processing cores that implement the algorithm's most computationally expensive stage, interest point detection. The module's overall performance is evaluated and compared to CPU- and GPU-based solutions. Results show that the embedded module achieves distinctiveness comparable to the SURF software implementation running on a standard CPU while being faster and consuming significantly less power and space. It thus makes the SURF algorithm usable in applications with power and space constraints, such as the autonomous navigation of small mobile robots.



Auto-calibrating photometric stereo using ring light constraints
Rakesh Shiradkar, Ping Tan, Sim Heng Ong




Seeing Eye Phone: a smart phone-based indoor localization and guidance system for the visually impaired
Dong Zhang, Dah-Jye Lee, Brandon Taylor

To help the visually impaired navigate unfamiliar environments such as public buildings, this paper presents a novel smart-phone-based, vision-based indoor localization and guidance system called Seeing Eye Phone. The system requires a smart phone carried by the user and a server. The smart phone captures forward-facing images and transmits them to the server. The server processes the phone images to detect and describe 2D features using SURF and then matches them to the 2D features of stored map images, which include the corresponding 3D information of the building. After features are matched, the Direct Linear Transform is run on a subset of correspondences to find a rough initial pose estimate, and the Levenberg–Marquardt algorithm further refines the pose estimate to find a more optimal solution. With the estimated pose and the camera's intrinsic parameters, the location and orientation of the user are calculated using the 3D location correspondence data stored for the features of each image. Positional information is then transmitted back to the smart phone and communicated to the user via text-to-speech. This indoor guidance system uses efficient algorithms such as SURF, homographies, multi-view geometry, and 3D-to-2D reprojection to solve a unique problem that will benefit the visually impaired. The experimental results demonstrate the feasibility of using a simple machine vision system design to accomplish a complex task, and the potential for building a commercial product based on this design.
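The rough initial pose step, a Direct Linear Transform on 3D-2D correspondences, can be sketched by solving for the 3x4 projection matrix; this is an illustrative numpy version of the standard DLT, not the system's code:

```python
import numpy as np

def dlt_projection(points3d, points2d):
    """Rough camera pose via the Direct Linear Transform: solve for the
    3x4 projection matrix P mapping homogeneous 3D map points to 2D
    image features (>= 6 non-degenerate correspondences). This is the
    kind of initial estimate a Levenberg-Marquardt refinement would
    then polish."""
    rows = []
    for (X, Y, Z), (u, v) in zip(points3d, points2d):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 4)

def reproject(P, points3d):
    """Project 3D points with P and dehomogenize to pixel coordinates;
    the reprojection error of this step is what LM would minimize."""
    ph = np.c_[points3d, np.ones(len(points3d))] @ P.T
    return ph[:, :2] / ph[:, 2:3]
```

With the camera intrinsics known, P can be decomposed into rotation and translation, giving the user's location and orientation in the building map.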