Center for Research in Computer Vision



MVA

Volume 26, Issue 1


This issue features the following papers.



Spatio-temporal features for the automatic control of driver drowsiness state and lack of concentration
Belhassen Akrout, Walid Mahdi

Driver fatigue is one of the leading causes of road accidents. It affects the mental vigilance of the driver and reduces his personal capacity to drive a vehicle in full safety. These factors increase the risk of human errors which could involve deaths and wounds. Consequently, the development of an automatic system, which controls the driver fatigue and prevents him from accidents in advance, has received a growing interest. In this work, we have proposed a fusion system for drowsiness detection based on blinking measurement and the 3D head pose estimation. We have studied the driver’s eye behaviors by analysing a non-stationary and non-linear signal and we estimate the head rotation in the three directions (Yaw, Pitch, and Roll) by exploiting only three interest points of the face. Our suggested system of fusion presents three levels of drowsiness: awake, tired, and very tired. This system is evaluated by both DEAP and MiraclHB databases. The evaluation shows many promising results and shows the effectiveness of the suggested approach.



A robust multilevel segment description for multi-class object recognition
Mohammadreza Mostajabi, Iman Gholampour

We present an attempt to improve the performance of multi-class image segmentation systems based on a multilevel description of segments. The multi-class image segmentation system used in this paper marks the segments in an image, describes the segments via multilevel feature vectors and passes the vectors to a multi-class object classifier. The focus of this paper is on the segment description section. We first propose a robust, scale-invariant texture feature set, named directional differences (DDs). This feature is designed by investigating the flaws of conventional texture features. The advantages of DDs are justified both analytically and experimentally. We have conducted several experiments on the performance of our multi-class image segmentation system to compare DDs with some well-known texture features. Experimental results show that DDs present about 8 % higher classification accuracy. Feature reduction experiments also show that in a combined feature space, DDs remain in the list of most effective features even for small feature vector sizes. To describe a segment fully, we introduce a multilevel strategy called different levels of feature extraction (DLFE) that enables the system to include the semantic relations and contextual information in the features. This information is very effective especially for highly occluded objects. DLFE concatenates the features related to different views of every segment. Experimental results show that more than 4 % improvement in multi-class image segmentation accuracy is achieved. Using the semantic information in the classifier section adds another 2 % improvement to the accuracy of the system.



Fast inspection for size-based analysis in aggregate processing
Gordon Christie, Kevin Kochersberger, A. Lynn Abbott

As rocks are transported along the conveyor belt of a quarry, the maximum dimension of the rocks exiting the crushers should not exceed a size threshold specific to each crusher. If the rocks are too large then they can pose a threat to equipment, and lead to a large cost in repair and loss of production. A 2D vision system is presented, which is capable of estimating the size distribution of the rocks, and also monitoring for oversize rocks. Image segmentation is performed, which is followed by a process that classifies the segments as valid or invalid using a support vector machine. A novel split algorithm is presented, which attempts to split segments that have resulted in undersegmentation. This allows the system to constantly monitor for oversize rocks without stopping the conveyor belt. For the experiments presented in this paper, a set of images was taken of rocks on a moving conveyor. In testing, it was found that 81.31 % of the segments output by the system correctly found the maximum dimension of the rock that it represented.
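The oversize check described in this abstract can be illustrated with a minimal sketch. It assumes each segment is available as a list of boundary points in pixels and takes the maximum dimension to be the largest distance between any two of those points. The function names, the brute-force search, and the sample threshold are illustrative assumptions, not the authors' implementation.

```python
import math
from itertools import combinations

def max_dimension(points):
    """Largest Euclidean distance between any two boundary points (in pixels)."""
    if len(points) < 2:
        return 0.0
    return max(math.dist(p, q) for p, q in combinations(points, 2))

def is_oversize(points, threshold_px):
    """True if the segment's maximum dimension exceeds the size threshold."""
    return max_dimension(points) > threshold_px

# Hypothetical segment outline: the corners of a 30 x 40 pixel rectangle.
outline = [(0, 0), (30, 0), (30, 40), (0, 40)]
print(max_dimension(outline))                 # diagonal of the rectangle
print(is_oversize(outline, threshold_px=45))  # threshold value is made up
```

The pairwise search is quadratic in the number of boundary points; a real-time system would more likely compute the convex hull first, which is outside the scope of this sketch.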



Tracking the articulated motion of the human body with two RGBD cameras
Damien Michel, Costas Panagiotakis, Antonis A. Argyros

We present a model-based, top-down solution to the problem of tracking the 3D position, orientation and full articulation of the human body from markerless visual observations obtained by two synchronized RGBD cameras. Inspired by recent advances in model-based hand tracking (Oikonomidis et al., Efficient Model-based 3D Tracking of Hand Articulations using Kinect, 2011), we treat human body tracking as an optimization problem that is solved using stochastic optimization techniques. We show that the proposed approach is more accurate than state-of-the-art methods that rely on a single RGBD camera. Thus, for applications that require increased accuracy and can afford the extra complexity introduced by the second sensor, the proposed approach constitutes a viable solution to the problem of markerless human motion tracking. Our findings are supported by an extensive quantitative evaluation of the method that has been performed on a publicly available data set that is annotated with ground truth.



A self-adaptive matched filter for retinal blood vessel detection
Tapabrata Chakraborti, Dhiraj K. Jha, Ananda S. Chowdhury, Xiaoyi Jiang

Retinal fundus images are widely studied in medicine for the detection of certain pathologies such as diabetes and glaucoma, the two major reasons for blindness. In this paper, a self-adaptive matched filter for the detection of blood vessels in retinal fundus images is proposed. In particular, a novel synergistic combination of the vesselness filter with high sensitivity and the matched filter with high specificity is obtained using an orientation histogram. Experiments on the publicly available DRIVE database clearly show that the proposed strategy outperforms several existing methods. Comparable performance with some of the state-of-the-art methods has also been obtained on the STARE and CHASE databases.



An image inpainting method using pLSA-based search space estimation
Mrinmoy Ghorai, Bhabatosh Chanda

In this paper, we present a novel exemplar-based image inpainting technique based on the local context measure of the target patch. The three main steps of the proposed method are determination of the patch priority, estimation of the search space for the candidate patches, and patch completion to fill in the unknown pixels of the target patch. For the patch priority, we emphasize structure through the spatial relationship of similar neighboring patches and a kernel regression-based local image structure. To find the candidate patches, we determine the search space, that is, the sub-regions of the entire source region that are similar to the region surrounding the target patch. This search space is estimated using probabilistic latent semantic analysis (pLSA). Last, we infer the unknown pixels of the target patch using a pLSA-based context and histogram similarity measure between the target patch and the candidate patches. Experimental results compare well with those of competing methods, and the method may be used for digital restoration of images of defective or damaged artifacts.

This work is partially supported by the Department of Science and Technology, Government of India (NRDMS/11/1586/09/Phase-I/Project No. 9).



Hierarchical classification with reject option for live fish recognition
Phoenix X. Huang, Bastiaan J. Boom, Robert B. Fisher

A live fish recognition system is needed in application scenarios where manual annotation is too expensive, i.e. when there are too many underwater videos. We present a novel balance-enforced optimized tree with reject option (BEOTR) for live fish recognition. It recognizes the top 15 common species of fish and detects new species in an unrestricted natural environment recorded by underwater cameras. The three main contributions of the paper are: (1) a novel hierarchical classification method suited for greatly unbalanced classes, (2) a novel classification-rejection method to clear up decisions and reject unknown classes, (3) an application of the classification method to free swimming fish. This system assists ecological surveillance research, e.g. fish population statistics in the open sea. BEOTR is automatically constructed based on inter-class similarities. Afterwards, trajectory voting is used to eliminate accumulated errors during hierarchical classification and, therefore, achieves better performance. We apply a Gaussian mixture model and Bayes rule as a reject option after the hierarchical classification to evaluate the posterior probability of being a certain species and to filter out less confident decisions. The proposed BEOTR-based hierarchical classification method achieves significant improvements compared to state-of-the-art techniques on a live fish image dataset of 24,150 manually labelled images from South Taiwan Sea.
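The reject option mentioned in this abstract can be sketched in a simplified form. The sketch below assumes a single one-dimensional Gaussian per class instead of a full Gaussian mixture model, applies Bayes rule to obtain the posterior of the most likely class, and rejects the decision when that posterior falls below a threshold. The class names, parameters, and the 0.9 threshold are hypothetical and are not taken from the paper.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def classify_with_reject(x, classes, min_posterior=0.9):
    """Return (label, posterior); label is None when the decision is rejected.

    classes maps a class name to (prior, mean, std). A single Gaussian per
    class stands in for the mixture model (simplifying assumption).
    """
    joint = {name: prior * gaussian_pdf(x, mean, std)
             for name, (prior, mean, std) in classes.items()}
    evidence = sum(joint.values())
    if evidence == 0.0:
        return None, 0.0
    best = max(joint, key=joint.get)
    posterior = joint[best] / evidence  # Bayes rule
    return (best if posterior >= min_posterior else None), posterior

# Hypothetical two-species model over one scalar feature.
model = {"species_a": (0.5, 0.0, 1.0), "species_b": (0.5, 4.0, 1.0)}
print(classify_with_reject(0.0, model))  # close to species_a: accepted
print(classify_with_reject(2.0, model))  # midway between classes: rejected
```

Thresholding the posterior only filters ambiguous decisions between known classes; detecting genuinely unknown species would also need a check on the likelihood itself, which this sketch leaves out.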



An automated gland segmentation and classification method in prostate biopsies: an image source-independent approach
Jouni Pääkkönen, Niina Päivinen, Matti Nykänen, Timo Paavonen

The aim of this paper is to introduce an image source-independent automated method for segmentation and classification of prostate glands. This research focuses on light microscopic images of samples from different laboratories using the same staining method. Color information in the image is highly dependent on the source and the conditions under which the image has been taken. The proposed method can be used to analyze images with color variations. Color information is used for the segmentation of tissue structures and Delaunay triangulation is used for gland segmentation. The proposed method uses triangulation to find the basic structure of a gland of any shape and size and to prevent misclassification of gland components. The proposed approach classifies the nuclei circumscribing the glands as single-layered or multilayered. Other features used in the classification are the number of nuclei and the area of the gland. The number of layers can be used for determining the malignancy of the tissue sample. In most cases, a single-layered gland is malignant and a multilayered gland is benign. This segmentation approach is different from what has been previously used in the literature. In this paper, the glands are classified into four different categories: single-layered, multilayered, rejected or nonclassified. This approach distinguishes the majority of single-layered and multilayered glands from each other.



A computer vision-based approach to grade simulated cataract surgeries
Junhuan Zhu, Jiebo Luo, Jonathan M. Soh, Yousuf M. Khalifa

To increase the timeliness, objectivity, and efficiency in evaluating ophthalmology residents’ learning of cataract surgery, an automatic analysis system for cataract surgery videos is developed to assess performance, particularly in the capsulorhexis step on the Kitaro simulator. We utilize computer vision technologies to measure performance of this critical step including duration, size, centrality, circularity, as well as motion stability during the capsulorhexis procedure. Consequently, a grading mechanism is established based on either linear regression or non-linear classification via Support Vector Machine of those computed measures. Comparisons of expert graders to the computer vision-based approach have demonstrated the accuracy and consistency of the computerized technique.
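One of the measures listed in this abstract, circularity, can be illustrated with a short sketch. The abstract does not say how circularity is computed, so the sketch assumes the common isoperimetric definition 4πA/P², where A is the area and P the perimeter of the outline, and assumes the outline is given as an ordered list of polygon vertices. The function names are illustrative.

```python
import math

def polygon_area(pts):
    """Area of a simple polygon from its ordered vertices (shoelace formula)."""
    n = len(pts)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

def polygon_perimeter(pts):
    """Sum of the edge lengths of the closed polygon."""
    n = len(pts)
    return sum(math.dist(pts[i], pts[(i + 1) % n]) for i in range(n))

def circularity(pts):
    """4*pi*A / P^2: equals 1 for a perfect circle, smaller for other shapes."""
    p = polygon_perimeter(pts)
    return 4.0 * math.pi * polygon_area(pts) / (p * p) if p > 0 else 0.0

# A square outline versus a near-circular outline with 360 vertices.
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
near_circle = [(math.cos(math.radians(a)), math.sin(math.radians(a)))
               for a in range(360)]
print(circularity(square))
print(circularity(near_circle))
```

A score like this is scale-invariant, so it can be compared across videos regardless of how large the capsulorhexis appears in the frame.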



Visual tracking based on group sparsity learning
Yong Wang, Shiqiang Hu, Shandong Wu

We propose a new tracking method based on a group sparsity learning model. Previous work on sparsity tracking relies on a single sparse model to characterize the templates of tracking targets, which makes it hard to express complex tracking scenes. In this work, we utilize a superposition of multiple simpler sparse models to capture the structural information across templates. More specifically, our tracking method is formulated within a particle filter framework and the particle representations are decomposed into two sparsity norms: an l1,∞ norm and an l1,2 norm, capturing the common and different information across the templates, respectively. To efficiently implement the proposed tracker, we adapt the alternating direction method of multipliers to solve the formulated two-norm optimization problem. The proposed tracking method is compared with seven state-of-the-art trackers using 16 publicly available video sequences that are challenging due to appearance changes, heavy occlusions, and pose variations. Experiment results show that our tracker outperforms the other tracking methods.
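The two mixed norms named in this abstract can be made concrete with a small sketch. The abstract does not define them, and conventions for mixed norms differ between papers, so the sketch assumes one common reading: the l1,∞ norm sums the largest absolute value of each row, and the l1,2 norm sums the Euclidean length of each row. The matrix layout (rows as groups) and the function names are assumptions.

```python
import math

def l1_inf_norm(rows):
    """Sum over rows of the largest absolute entry in each row (assumed convention)."""
    return sum(max(abs(v) for v in row) for row in rows)

def l1_2_norm(rows):
    """Sum over rows of the Euclidean norm of each row (assumed convention)."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in rows)

# Hypothetical coefficient matrix: one row per template, one column per particle.
coeffs = [
    [3.0, 4.0],
    [0.0, 0.0],
    [-1.0, 0.0],
]
print(l1_inf_norm(coeffs))
print(l1_2_norm(coeffs))
```

Under this reading, both norms are zero for an all-zero row, which is why penalizing them tends to switch whole rows off together and produces group sparsity.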