Center for Research in Computer Vision



MVA

Volume 24, Issue 6


This issue features the following papers.



Gaussian-weighted Jensen-Shannon divergence as a robust fitness function for multi-model fitting
Kai Zhou, Karthik Mahesh Varadarajan, Michael Zillich, Markus Vincze

Model fitting is a fundamental component in computer vision for salient data selection, feature extraction and data parameterization. Conventional approaches such as the RANSAC family show limitations when dealing with data containing multiple models, a high percentage of outliers or sample selection bias, commonly encountered in computer vision applications. In this paper, we present a novel model evaluation function based on Gaussian-weighted Jensen–Shannon divergence, and integrate it into a particle swarm optimization (PSO) framework with a ring topology. We avoid two problems from which most regression algorithms suffer, namely the requirements to specify the inlier noise scale and the number of models. The novel evaluation method is generic and does not require any estimation of inlier noise. The continuous and meta-heuristic exploration facilitates estimation of each individual model while delivering the number of models automatically. Tests on datasets comprising inlier noise and a large percentage of outliers (more than 90% of the data) demonstrate that the proposed framework can efficiently estimate multiple models without prior information. Superior performance in terms of processing time and robustness to inlier noise is also demonstrated with respect to state-of-the-art methods.
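
As an illustration of the fitness idea, here is a minimal NumPy sketch that scores a candidate model by the Jensen–Shannon divergence between a Gaussian-weighted histogram of its residuals and a uniform (pure-outlier) reference. Both the weighting and the reference distribution are our assumptions, not the paper's exact formulation; note in particular that the paper avoids an explicit inlier scale, whereas this sketch keeps a sigma parameter for simplicity.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    # Jensen-Shannon divergence between two (unnormalized) histograms.
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def model_fitness(residuals, sigma=1.0, bins=32):
    # Gaussian weights suppress far-away (likely outlier) points.
    r = np.abs(np.asarray(residuals, dtype=float))
    w = np.exp(-0.5 * (r / sigma) ** 2)
    hist, _ = np.histogram(r, bins=bins, range=(0.0, r.max() + 1e-9), weights=w)
    # A good model concentrates weighted residual mass near zero,
    # far from the flat profile produced by uniformly spread outliers.
    return js_divergence(hist, np.full(bins, 1.0 / bins))
```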



Shadow compensation and illumination normalization of face image
Haiyang Wang, Mao Ye, Shangming Yang

This study proposes a novel shadow compensation and illumination normalization method under uncontrolled lighting conditions. First, we decompose the face image into two images based on the Lambertian theory, which correspond to the large- and small-scale features, respectively. Then, a threshold minimum-and-maximum filter is applied to the small-scale features to smooth the shadow edges. After that, robust Principal Component Analysis and normalization methods are used to remove the shadow and normalize the face image on the large-scale features. In the end, the normalized face image is obtained by combining the results from the large- and small-scale features. Our main contribution is a more reliable shadow compensation approach, which yields a better-normalized face image. Experiments on the Extended Yale B, CMU-PIE and FRGC 2.0 (Face Recognition Grand Challenge) face datasets show that not only is the recognition performance significantly improved, but much better visual quality is also achieved.
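
A rough sketch of the first step, assuming an edge-preserving smoothing as the Lambertian large/small-scale split; the paper's actual decomposition and filter choices may differ:

```python
import cv2
import numpy as np

def decompose(face_gray, sigma_color=0.1, sigma_space=16):
    # Under the Lambertian model I = R * L, a smoothed image stands in
    # for the large-scale (illumination) layer; the quotient keeps the
    # small-scale facial features. Parameters are illustrative.
    f = face_gray.astype(np.float32) / 255.0 + 1e-3
    large = cv2.bilateralFilter(f, -1, sigma_color, sigma_space)
    small = f / large
    return large, small
```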



Estimating 3D human shapes from measurements
Stefanie Wuhrer, Chang Shu

Recent advances in 3D imaging technologies give rise to databases of human shapes, from which statistical shape models can be built. These statistical models represent prior knowledge of the human shape and enable us to solve shape reconstruction problems from partial information. Generating a human shape from traditional anthropometric measurements is such a problem, since these 1D measurements encode 3D shape information. Combined with a statistical shape model, these easy-to-obtain measurements can be leveraged to create 3D human shapes. However, existing methods limit the creation of the shapes to the space spanned by the database and thus require a large amount of training data. In this paper, we introduce a technique that extrapolates the statistically inferred shape to fit the measurement data using non-linear optimization. This method ensures that the generated shape is both human-like and satisfies the measurement conditions. We demonstrate the effectiveness of the method and compare it to existing approaches through extensive experiments, using both synthetic data and real human measurements.
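
A minimal sketch of the fitting idea, assuming a linear PCA shape model and a user-supplied measurement function; measure_fn is a hypothetical stand-in for, e.g., girth computations on the mesh, and the paper's optimization details differ:

```python
import numpy as np
from scipy.optimize import least_squares

def fit_shape(mean, basis, measure_fn, targets, reg=1e-2):
    # Find PCA coefficients b so that measurements taken on the shape
    # mean + basis @ b match the target 1D measurements; the small
    # ridge term keeps the result close to the statistical prior.
    def residual(b):
        verts = mean + basis @ b          # reconstruct vertex positions
        return np.concatenate([measure_fn(verts) - targets, reg * b])
    return least_squares(residual, x0=np.zeros(basis.shape[1])).x
```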



Analysis of object description methods in a video object tracking environment
Pedro Carvalho, Telmo Oliveira, Lucian Ciobanu, Filipe Gaspar, Luís F. Teixeira, Rafael Bastos, Jaime S. Cardoso, Miguel S. Dias, Luís Côrte-Real

A key issue in video object tracking is the representation of the objects and how effectively it discriminates between different objects. Several techniques have been proposed, but no method is generally accepted. While analyses and comparisons of these individual methods have been presented in the literature, their evaluation as part of a global solution has been overlooked. The appearance model for the objects is one component of a video object tracking framework, depending on previous processing stages and affecting those that succeed it. As a result, these interdependencies should be taken into account when analysing the performance of the object description techniques. We propose an integrated analysis of object descriptors and appearance models through their comparison in a common object tracking solution. The goal is to contribute to a better understanding of object description methods and their impact on the tracking process. Our contributions are threefold: we propose a novel descriptor evaluation and characterisation paradigm; we perform the first integrated analysis of state-of-the-art description methods in a people-tracking scenario; and we put forward some ideas for appearance models to use in this context. This work provides foundations for future tests, and the proposed assessment approach contributes to a more informed selection of techniques for a given tracking application context.
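
For concreteness, one classic object description method such a study would compare is a normalised colour histogram matched with the Bhattacharyya coefficient; this sketch is only illustrative and not one of the paper's specific descriptors:

```python
import numpy as np

def hs_histogram(patch_hsv, bins=(16, 16)):
    # Normalised hue-saturation histogram of an object patch
    # (OpenCV-style HSV ranges: H in [0, 180), S in [0, 256)).
    h, _ = np.histogramdd(patch_hsv[..., :2].reshape(-1, 2).astype(float),
                          bins=bins, range=((0, 180), (0, 256)))
    return (h / h.sum()).ravel()

def bhattacharyya(p, q):
    # Similarity of two descriptors; 1.0 means identical distributions.
    return float(np.sum(np.sqrt(p * q)))
```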



A new fusion scheme for multifocus images based on focused pixels detection
Huafeng Li, Yi Chai, Zhaofei Li

In this paper, a new multifocus image fusion scheme based on the technique of focused pixel detection is proposed. First, a new improved multiscale Top-Hat (MTH) transform, which is more effective than the traditional Top-Hat transform in extracting focus information, is introduced and utilized to detect the pixels of the focused regions. Second, the initial decision map of the source images is generated by comparing the improved MTH value of each pixel. Then, an isolated-region removal method is developed and employed to refine the initial decision map. In order to improve the quality of the fused image and avoid discontinuity in the transition zone, a dual sliding window technique and a fusion strategy based on a multiscale transform are developed to fuse the transition zones. Finally, the decision maps of the focused regions and the transition zones are both used to guide the fusion process, and the final fused image is formed. The experimental results show that the proposed method outperforms conventional multifocus image fusion methods in both subjective and objective quality.
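
A minimal sketch of the focus-detection step, using a plain multiscale white top-hat as a stand-in for the paper's improved MTH transform:

```python
import cv2
import numpy as np

def multiscale_tophat(gray, scales=(3, 7, 11, 15)):
    # Sum of white top-hat responses over growing structuring elements;
    # focused regions produce stronger small-structure responses.
    g = gray.astype(np.float32)
    acc = np.zeros_like(g)
    for s in scales:
        se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (s, s))
        acc += cv2.morphologyEx(g, cv2.MORPH_TOPHAT, se)
    return acc

def initial_decision_map(img_a, img_b):
    # True where img_a is judged more in focus than img_b.
    return multiscale_tophat(img_a) >= multiscale_tophat(img_b)
```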



Active contours methods with respect to Vickers indentations
Michael Gadermayr, Andreas Maier, Andreas Uhl

We investigate different Vickers indentation segmentation methods and concentrate especially on active contours approaches, as these techniques are known to be precise, state-of-the-art segmentation methods. In particular, different kinds of level set-based methods, which are improvements of the traditional active contours, are analyzed. In order to circumvent the initialization problem of active contours, we separate the segmentation process into two stages. For the first stage, we introduce an approach that approximately locates the indentations with high certainty. The results achieved with this method serve as initializations for the precise active contours (second stage). This two-stage approach delivers highly precise results for most real-world indentation images. However, there are images which are very difficult to segment. To handle even such images, our segmentation method incorporates the Shape from Focus approach to include 3D information. Moreover, in order to decrease the overall runtime, a gradual enhancement approach based on unfocused images is introduced. Using three different databases, we compare the proposed methods and show that their segmentation accuracy is highly competitive with other approaches in the literature.
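
The two-stage idea in miniature, with a morphological Chan-Vese level set from scikit-image standing in for the paper's level-set variants, and a given center/radius standing in for the output of the coarse localisation stage:

```python
import numpy as np
from skimage.segmentation import morphological_chan_vese

def segment_indentation(gray, center, radius, n_iter=100):
    # Stage 1 (approximate localisation) is assumed to have produced
    # center/radius; stage 2 refines it with an active contour.
    yy, xx = np.mgrid[:gray.shape[0], :gray.shape[1]]
    init = ((yy - center[0]) ** 2 + (xx - center[1]) ** 2) < radius ** 2
    return morphological_chan_vese(gray.astype(float), n_iter,
                                   init_level_set=init)
```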



Algorithmic methodologies for FPGA-based vision
Yoong Kang Lim, Lindsay Kleeman, Tom Drummond

This paper proposes a strategy for the design of computer vision algorithms on Field Programmable Gate Arrays (FPGAs). We show that there are certain advantages to designing an algorithm to specifically suit hardware rather than attempting to replicate the behavior of software implementations in hardware. We justify this approach through the analysis of two case studies. In the first case study, we present FPGA implementations of two corner detectors. We make a number of observations which point to the advantages of an FPGA-tailored algorithm. In the second case study, we investigate the feasibility of this approach by designing a proof-of-concept face detection algorithm specifically for an FPGA. We show that this design allows for high detection speed on a low-cost FPGA device, although the same algorithm would not be considered in software. Finally, we conclude that FPGAs offer special opportunities for specialized algorithms that are infeasible in software.



Learning class-specific dictionaries for digit recognition from spherical surface of a 3D ball
Donghui Wang, Shu Kong

In the literature, very little research has addressed the problem of recognizing digits placed on spherical surfaces, even though digit recognition has attracted extensive attention and been attacked from various directions. As a particular example of recognizing this kind of digits, in this paper we introduce a digit ball detection and recognition system to recognize the digit appearing on a 3D ball. A so-called digit ball is a ball carrying an Arabic numeral on its spherical surface. Our system works in a weakly controlled environment to detect and recognize digit balls for a practical application, which requires the system to keep working in real time without recognition errors. Two main challenges confront our system: one is how to accurately detect the balls, and the other is how to deal with the arbitrary rotation of the balls. For the first, we develop a novel method to detect the balls appearing in a single image and demonstrate its effectiveness even when the balls are densely placed. To circumvent the other challenge, we use spin images and polar images to represent the balls and achieve rotation invariance. Finally, we adopt a dictionary learning-based method for the recognition task. To evaluate our system, a series of experiments is performed on real-world digit ball images, and the results validate the effectiveness of our system, which achieves 100% accuracy in the experiments.
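
One way to obtain the rotation invariance such a system needs is a polar unwrapping, as in this sketch; the paper additionally uses spin images and a learned dictionary, which are not shown:

```python
import cv2
import numpy as np

def polar_signature(ball_patch, out_size=(64, 64)):
    # Unwrap the detected ball into polar coordinates: in-plane rotation
    # becomes a circular shift along the angle axis, so averaging over
    # angles yields a rotation-invariant radial profile.
    h, w = ball_patch.shape[:2]
    polar = cv2.warpPolar(ball_patch, out_size, (w / 2.0, h / 2.0),
                          min(h, w) / 2.0,
                          cv2.INTER_LINEAR + cv2.WARP_POLAR_LINEAR)
    return polar.mean(axis=0)          # collapse the angle axis
```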



A novel plane extraction approach using supervised learning
J. Rafid Siddiqui, Mohammad Havaei, Siamak Khatibi, Craig A. Lindley

This paper presents a novel approach for the classification of planar surfaces in unorganized point clouds. A feature-based planar surface detection method is proposed which classifies point cloud data into planar and non-planar points by learning a classification model from an example set of planes. The algorithm segments the scene by applying a graph partitioning approach with an improved representation of the association among graph nodes. The planarity of the points in a scene segment is then estimated by classifying as planar those input points which satisfy the planarity constraint imposed by the learned model. The resulting planes have potential application in solving the simultaneous localization and mapping problem for navigation of an unmanned aerial vehicle. The proposed method is validated on real and synthetic scenes. The real data consist of five datasets recorded by capturing three-dimensional (3D) point clouds while an RGBD camera was moved through five different indoor scenes. A set of synthetic 3D scenes is constructed containing planar and non-planar structures. The synthetic data are contaminated with Gaussian and random structural noise. The results of the empirical evaluation on both the real and the simulated data suggest that the method provides a generalized solution for plane detection, even in the presence of noise and non-planar objects in the scene. Furthermore, a comparative study has been performed between multiple plane extraction methods.
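
A sketch of one common planarity feature a learned classifier could consume: the eigenvalues of each point's local covariance, where a near-zero smallest eigenvalue indicates a locally planar neighbourhood. The paper's actual feature set and classifier are not specified here:

```python
import numpy as np
from scipy.spatial import cKDTree

def planarity_features(points, k=20):
    # Per-point eigenvalues (ascending) of the k-neighbourhood
    # covariance; lambda0 << lambda1, lambda2 suggests a plane.
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    feats = np.empty((len(points), 3))
    for i, nb in enumerate(idx):
        nbhd = points[nb] - points[nb].mean(axis=0)
        feats[i] = np.linalg.eigvalsh(nbhd.T @ nbhd / k)
    return feats
```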



A combined topological and statistical approach for interactive segmentation of 3D images
Ludovic Paulhac, Jean-Yves Ramel, Pascal Makris

This paper presents a new framework for the interactive segmentation of 3D images. The framework is based on a bimodal data structure defined by a region adjacency graph (RAG) that is associated with a hierarchical classification tree (HCT). The RAG provides information about the spatial and topological organisation of the extracted regions of the image. The HCT provides information about the similarities between the extracted regions based on a predefined set of features. The first contribution of our work is the combination of a RAG and an HCT. An incremental system was obtained by defining operators that work with and on the RAG and the HCT. If a static predefined processing chain has been defined, these operators can be used in batch mode. If a scheduler is available, they can be used in an adaptive manner. Finally, if a user chooses the operator to be used after each step, the operators can be used interactively. The second contribution of this paper is the formal description of these operators. To give the user the ability to incrementally improve the segmentation, powerful visualisations of the segmentation state and interfaces have been proposed, an important advantage of the proposed framework. To validate the proposed framework, a user study has been conducted on a concrete case of texture segmentation. Our system obtains very satisfactory results even for complex volumetric textures, and helps real users by providing high-quality segmentations. The system has been tested by specialists in sonography to segment 3D ultrasound images of the skin. Some examples of segmentation are presented to illustrate the benefit of the interactivity provided by our approach.
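
A minimal sketch of the RAG half of the data structure, for a 2D label image; the paper works on 3D images and pairs the graph with an HCT over region features, neither of which is shown:

```python
import numpy as np
import networkx as nx

def build_rag(labels):
    # Nodes are region labels; edges join regions that share a
    # 4-connected boundary in the label image.
    g = nx.Graph()
    g.add_nodes_from(int(v) for v in np.unique(labels))
    for a, b in ((labels[:, :-1], labels[:, 1:]),
                 (labels[:-1, :], labels[1:, :])):
        mask = a != b
        g.add_edges_from(zip(a[mask].tolist(), b[mask].tolist()))
    return g
```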



Geometric steerable medial maps
Sergio Vera, Debora Gil, Agnés Borràs, Marius George Linguraru, Miguel Angel González Ballester

To provide more intuitive and easily interpretable representations of complex shapes/organs, medial manifolds should reach a compromise between simplicity in geometry and capability of restoring the anatomy/shape of the organ/volume. Existing morphological methods show excellent results when applied to 2D objects, but their quality drops across dimensions. This paper contributes to the computation of medial manifolds from a theoretical and a practical point of view. First, we introduce a continuous operator for accurate and efficient computation of medial structures of arbitrary dimension. Second, we present a validation protocol for assessing the suitability of medial surfaces for anatomical representation in medical applications. We evaluate quantitatively the performance of our method with respect to existing approaches and show its higher performance for medical imaging applications in terms of medial simplicity and capability of reconstructing the anatomical volume.
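
For intuition, the 2D case that existing morphological methods handle well can be computed directly with scikit-image; the paper's contribution is an operator that retains this quality in higher dimensions, which this snippet does not attempt:

```python
from skimage.morphology import medial_axis

def medial_map(mask):
    # 2D medial axis plus the distance value along it: the radius
    # function needed to reconstruct the shape from its skeleton.
    skel, dist = medial_axis(mask, return_distance=True)
    return skel, dist * skel
```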



Towards a balanced trade-off between speed and accuracy in unsupervised data-driven image segmentation
Balázs Varga, Kristóf Karacs

When it comes to image segmentation in the megapixel domain, most state-of-the-art algorithms use sampling to reduce the amount of data to be processed and thereby reach a lower running time. Random patterns and equidistant sampling usually yield suboptimal results because, in general, the distribution of image content is not homogeneous. The segmentation framework we propose in this paper employs a content-adaptive technique that samples homogeneous and inhomogeneous regions sparsely and densely, respectively; it thus preserves information content in a computationally efficient way. Both the sampling procedure and the pixel-cluster assignment are guided by the same nonlinear confidence value, calculated for each image pixel with no overhead, which describes the strength of the pixel-cluster bond. Building on this confidence scheme, each pixel is associated with the most similar class with respect to its spatial position and color. We compare the performance of our framework to other segmentation algorithms on publicly available segmentation databases, and using a set of 10-megapixel images we show that it provides segmentation quality similar to a mean shift-based reference in an order of magnitude less time, the speedup being proportional to the amount of detail in the input image. Based on our findings, we also sketch novel design aspects to be taken into account when designing a high-resolution evaluation framework.
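
A hypothetical reading of the content-adaptive sampler: the stride grows with the confidence value, so homogeneous (confident) regions are sampled sparsely and detailed regions densely. Both the parameters and the traversal rule are our simplifications, not the paper's scheme:

```python
import numpy as np

def adaptive_sample(confidence, base=16, lo=2):
    # confidence: per-pixel values in [0, 1]; high = homogeneous region.
    h, w = confidence.shape
    pts = []
    y = 0
    while y < h:
        x = 0
        while x < w:
            pts.append((y, x))
            x += max(lo, int(base * confidence[y, x]))
        y += max(lo, int(base * confidence[y].mean()))
    return np.asarray(pts)
```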



Pi-Tag: a fast image-space marker design based on projective invariants
Filippo Bergamasco, Andrea Albarelli, Andrea Torsello

Visual marker systems have become a ubiquitous tool for supplying a reference frame to otherwise uncontrolled scenes. Throughout the last decades, a wide range of different approaches have emerged, each with different strengths and limitations. Some tags are optimized to reach a high accuracy in the recovered camera pose; others are based on designs that aim to maximize the detection speed or minimize the effect of occlusion on the detection process. Most of them, however, employ a two-step procedure in which an initial homography estimation is used to translate the marker from the image plane to an orthonormal world, where it is validated and recognized. In this paper, we present a general-purpose fiducial marker system that performs both steps directly in image space. Specifically, by exploiting projective invariants such as collinearity and cross-ratios, we introduce a detection and recognition algorithm that is fast, accurate and moderately robust to occlusion. The overall performance of the system is evaluated in an extensive experimental section, where a comparison with a well-known baseline technique is presented. Additionally, several real-world applications are proposed, ranging from camera calibration to projector-based augmented reality.
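
The projective invariant at the heart of such a design is easy to state: for four collinear points, the cross-ratio below is preserved by any perspective projection, so it can be verified directly in image space without rectifying the marker first:

```python
import numpy as np

def cross_ratio(p1, p2, p3, p4):
    # Cross-ratio (p1, p2; p3, p4) of four collinear 2D points,
    # invariant under projective transformations of the line.
    d = lambda a, b: np.linalg.norm(np.asarray(a, float) - np.asarray(b, float))
    return (d(p1, p3) * d(p2, p4)) / (d(p1, p4) * d(p2, p3))
```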



Mind reading with regularized multinomial logistic regression
Heikki Huttunen, Tapio Manninen, Jukka-Pekka Kauppi, Jussi Tohka

In this paper, we consider the problem of multinomial classification of magnetoencephalography (MEG) data. The proposed method participated in the MEG mind reading competition of the ICANN'11 conference, where the goal was to train a classifier for predicting which movie the test person was shown. Our approach was the best among ten submissions, reaching an accuracy of 68% correct classifications in this five-category problem. The method is based on a regularized logistic regression model, for which efficient feature selection is critical in cases with more measurements than samples. Moreover, special attention is paid to the estimation of the generalization error in order to avoid overfitting to the training data. Here, in addition to describing our competition entry in detail, we report selected additional experiments, which question the usefulness of complex feature extraction procedures and the basic frequency decomposition of the MEG signal for this application.
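
An illustrative scikit-learn counterpart of this kind of approach: L1-regularised multinomial logistic regression, whose sparsity performs the feature selection, with cross-validation as the guard against overfitting. The data and hyperparameters below are stand-ins, not the competition setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data: more features than samples, as in MEG decoding.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 500))
y = rng.integers(0, 5, 100)          # five movie categories

clf = LogisticRegression(penalty="l1", solver="saga", C=0.1, max_iter=5000)
print(cross_val_score(clf, X, y, cv=5).mean())
```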