Center for Research in Computer Vision



Machine Vision and Applications (MVA)

Volume 25, Issue 2


This issue features the following papers.



Rapid blockwise multi-resolution clustering of facial images for intelligent watermarking
Population-based evolutionary computation (EC) is widely used to optimize embedding parameters in intelligent watermarking systems. Candidate solutions generated with these techniques allow finding optimal embedding parameters for all blocks of a cover image. However, using EC techniques for full optimization of a stream of high-resolution grayscale face images is very costly. In this paper, a blockwise multi-resolution clustering (BMRC) framework is proposed to reduce this cost. During the training phase, solutions obtained from multi-objective optimization of reference face images are stored in an associative memory. During generalization operations, embedding parameters of an input image are determined by searching for previously stored solutions of similar sub-problems in memory, thereby eliminating the need for full optimization of the whole face image. Solutions for sub-problems correspond to the most common embedding parameters for a cluster of similar blocks in the texture feature space. BMRC identifies candidate block clusters used for embedding watermark bits using a robustness score metric, which measures the texture complexity of image block clusters and can thereby handle watermarks of different lengths. The proposed framework implements a multi-hypothesis approach by storing the optimization solutions according to different clustering resolutions and selecting the optimal resolution at the end of the watermarking process. Experimental results on the PUT face image database show a significant reduction in complexity, with up to a 95.5% reduction in fitness evaluations compared with reference methods for a stream of 198 face images.
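As an illustrative sketch (not the authors' implementation), the recall step of an associative memory of per-cluster embedding parameters might look like this; the `texture_feature` descriptor and the `BlockMemory` class are hypothetical stand-ins:

```python
# Illustrative sketch: nearest-cluster lookup in texture-feature space
# replaces full optimization for each new block. All names are hypothetical.
import numpy as np

def texture_feature(block):
    """Toy texture descriptor: mean and standard deviation of a block."""
    return np.array([block.mean(), block.std()])

class BlockMemory:
    """Associative memory mapping texture-space centroids to parameters."""
    def __init__(self):
        self.centroids = []   # cluster centers in texture-feature space
        self.params = []      # embedding parameters found by optimization

    def store(self, centroid, param):
        self.centroids.append(np.asarray(centroid))
        self.params.append(param)

    def recall(self, feature):
        # Retrieve the stored sub-problem solution of the nearest cluster.
        dists = [np.linalg.norm(feature - c) for c in self.centroids]
        return self.params[int(np.argmin(dists))]

# "Training": pretend optimization produced parameters for two clusters.
memory = BlockMemory()
memory.store([0.5, 0.05], "weak_embedding")    # smooth blocks
memory.store([0.5, 0.30], "strong_embedding")  # textured blocks

# "Generalization": recall parameters for a new, highly textured block.
rng = np.random.default_rng(0)
block = rng.uniform(0, 1, (8, 8))
param = memory.recall(texture_feature(block))
```

The point of the framework is that this lookup is far cheaper than re-running an evolutionary optimization per image.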



3D segmentation of abdominal CT imagery with graphical models, conditional random fields and learning
Chetan Bhole, Christopher Pal, David Rim, Axel Wismüller

Probabilistic graphical models have had a tremendous impact in machine learning, and approaches based on energy function minimization via techniques such as graph cuts are now widely used in image segmentation. However, the free parameters in energy function-based segmentation techniques are often set by hand or using heuristic techniques. In this paper, we explore parameter learning in detail. We show how probabilistic graphical models can be used for segmentation problems to illustrate Markov random fields (MRFs), their discriminative counterparts, conditional random fields (CRFs), as well as kernel CRFs. We discuss the relationships between energy function formulations, MRFs, CRFs, hybrids based on graphical models and their relationships to key techniques for inference and learning. We then explore a series of novel 3D graphical models and present a series of detailed experiments comparing and contrasting different approaches for the complete volumetric segmentation of multiple organs within computed tomography imagery of the abdominal region. Further, we show how these modeling techniques can be combined with state-of-the-art image features based on histograms of oriented gradients to increase segmentation performance. We explore a wide variety of modeling choices, discuss the importance of and relationships between inference and learning techniques and present experiments using different levels of user interaction. We go on to explore a novel approach to the challenging and important problem of adrenal gland segmentation. We present a 3D CRF formulation and compare with a novel 3D sparse kernel CRF approach we call a relevance vector random field. The method yields state-of-the-art performance and avoids the need to discretize or cluster input features.
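For readers unfamiliar with energy-function segmentation, the unary-plus-pairwise structure the abstract refers to can be sketched with iterated conditional modes (ICM) on a 1D toy signal; this is a generic illustration, not the paper's graph-cut or CRF machinery:

```python
# Toy MRF-style segmentation via ICM: each pixel takes the label minimizing
# a data (unary) cost plus a Potts smoothness (pairwise) penalty.
import numpy as np

def icm_segment(intensities, means, smoothness=1.0, iters=10):
    """Greedy coordinate-wise minimization of unary + pairwise energy."""
    labels = np.argmin(np.abs(intensities[:, None] - means[None, :]), axis=1)
    n, k = len(intensities), len(means)
    for _ in range(iters):
        for i in range(n):
            costs = np.abs(intensities[i] - means)  # unary (data) term
            for lab in range(k):                    # pairwise (Potts) term
                if i > 0 and labels[i - 1] != lab:
                    costs[lab] += smoothness
                if i < n - 1 and labels[i + 1] != lab:
                    costs[lab] += smoothness
            labels[i] = int(np.argmin(costs))
    return labels

# Noisy 1D "image": dark region then bright region, one flipped pixel.
img = np.array([0.1, 0.0, 0.9, 0.1, 0.2, 0.8, 0.9, 1.0])
seg = icm_segment(img, means=np.array([0.1, 0.9]), smoothness=0.5)
```

The smoothness term is what overrides the isolated bright pixel at index 2; the `smoothness` weight is exactly the kind of free parameter the paper proposes to learn rather than hand-tune.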
We believe our work is the first to provide quantitative comparisons between traditional MRFs with edge-modulated interaction potentials and CRFs for multi-organ abdominal segmentation and the first to explore the 3D adrenal gland segmentation problem. Finally, along with this paper we provide the labeled data used for our experiments to the community.



Hyperspectral imaging based on diffused laser light for prediction of astaxanthin coating concentration
Martin Georg Ljungqvist, Otto Højager Attermann Nielsen, Stina Frosch, Michael Engelbrecht Nielsen, Line Harder Clemmensen, Bjarne Kjær Ersbøll

We present a study on predicting the concentration level of synthetic astaxanthin in fish feed pellet coating using multi- and hyperspectral image analysis. This was done in parallel using two different vision systems. A new instrument for hyperspectral imaging, the SuperK setup, using a super-continuum laser as the light source, was introduced. Furthermore, a parallel study with the commercially available multispectral VideometerLab imaging system was performed. The SuperK setup used 113 spectral bands (455-1,015 nm), and the VideometerLab used 20 spectral bands (385-1,050 nm). To predict the astaxanthin concentration from the spectral image data, the synthetic astaxanthin content in the pellets was measured with the established standard technique: high-pressure liquid chromatography (HPLC). Regression analysis was done using partial least squares regression (PLSR) and the sparse regression method elastic net (EN). The ratio of standard error of prediction (RPD) is the ratio between the standard deviation of the reference values and the prediction error; for both PLSR and EN, both devices gave RPD values between 4 and 24, with a mean prediction error of 1.4-8.0 parts per million of astaxanthin concentration. The results show that it is possible to predict the synthetic astaxanthin concentration in the coating well enough for quality control using both multi- and hyperspectral image analysis, while the SuperK setup performs with higher accuracy than the VideometerLab device for this particular problem. The spectral resolution made it possible to identify the most significant spectral regions for the detection of astaxanthin. The results also imply that the presented methods can be used in general for quality inspection of various coating substances using similar coating methods.
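The RPD metric quoted above is straightforward to reproduce; the following sketch uses ordinary least squares on synthetic data as a stand-in for PLSR (the data and regression method are illustrative, not the paper's):

```python
# Sketch of the RPD quality metric: standard deviation of the reference
# values divided by the prediction RMSE. OLS stands in for PLSR here.
import numpy as np

def rpd(reference, predicted):
    """Ratio of std of reference values to root-mean-square prediction error."""
    rmsep = np.sqrt(np.mean((reference - predicted) ** 2))
    return reference.std() / rmsep

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))                   # toy "spectral band" features
true_w = np.array([2.0, -1.0, 0.5, 0.0, 0.3])
y = X @ true_w + 0.05 * rng.normal(size=50)    # toy concentration values

w, *_ = np.linalg.lstsq(X, y, rcond=None)      # fit the regression
score = rpd(y, X @ w)
```

An RPD well above 4 (the low end of the range reported above) indicates the model's errors are small relative to the natural spread of the reference concentrations.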



On hierarchical modelling of motion for workflow analysis from overhead view
Banafshe Arbab-Zavar, John N. Carter, Mark S. Nixon

Understanding human behaviour is a high-level perceptual problem, one which is often dominated by the contextual knowledge of the environment, and where concerns such as occlusion, scene clutter and high within-class variations are commonplace. Nonetheless, such understanding is highly desirable for automated visual surveillance. We consider this problem in the context of workflow analysis within an industrial environment. The hierarchical nature of the workflow is exploited to split the problem into 'activity' and 'task' recognition. In this, sequences of low-level activities are examined for instances of a task while the remainder are labelled as background. An initial prediction of activity is obtained using shape- and motion-based features of the moving blob of interest. A sequence of these activities is further adjusted by a probabilistic analysis of transitions between activities using hidden Markov models (HMMs). In task detection, HMMs are arranged to handle the activities within each task. Two separate HMMs, for task and background, compete for an incoming sequence of activities. Imagery derived from a camera mounted overhead of the target scene has been chosen over the more conventional oblique views (from the side), as this view does not suffer from as much occlusion, and it poses a manageable detection and tracking problem while still retaining powerful cues as to the workflow patterns. We evaluate our approach in both activity and task detection on a challenging dataset of surveillance of human operators in a car manufacturing plant. The experimental results show that our hierarchical approach can automatically segment the timeline and spatially localize a series of predefined tasks that are performed to complete a workflow.



New color GPHOG descriptors for object and scene image classification
Atreyee Sinha, Sugata Banerji, Chengjun Liu

This paper presents a novel set of image descriptors that encodes information from color, shape, spatial and local features of an image to improve upon the popular Pyramid of Histograms of Oriented Gradients (PHOG) descriptor for object and scene image classification. In particular, a new Gabor-PHOG (GPHOG) image descriptor created by enhancing the local features of an image using multiple Gabor filters is first introduced for feature extraction. Second, a comparative assessment of the classification performance of the GPHOG descriptor is made in grayscale and six different color spaces to further propose two novel color GPHOG descriptors that perform well on different object and scene image categories. Finally, an innovative Fused Color GPHOG (FC-GPHOG) descriptor is presented by integrating the Principal Component Analysis (PCA) features of the GPHOG descriptors in the six color spaces to combine color, shape and local feature information. Feature extraction for the proposed descriptors employs PCA and Enhanced Fisher Model (EFM), and the nearest neighbor rule is used for final classification. Experimental results using the MIT Scene dataset and the Caltech 256 object categories dataset show that the proposed new FC-GPHOG descriptor achieves a classification performance better than or comparable to other popular image descriptors, such as the Scale Invariant Feature Transform (SIFT) based Pyramid Histograms of visual Words descriptor, Color SIFT four Concentric Circles, Spatial Envelope, and Local Binary Patterns.
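The first step of the GPHOG construction, enhancing local features with a bank of Gabor filters, can be sketched as follows; the kernel parameters are illustrative, and the naive convolution loop is for clarity, not speed:

```python
# Sketch: a small bank of oriented Gabor filters applied to an image before
# gradient-histogram computation. Parameters (size, sigma, wavelength) are
# illustrative choices, not the paper's.
import numpy as np

def gabor_kernel(theta, size=7, sigma=2.0, wavelength=4.0):
    """Real Gabor kernel: Gaussian envelope times an oriented cosine wave."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return (np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
            * np.cos(2 * np.pi * xr / wavelength))

def convolve2d(img, k):
    """Naive same-size correlation with zero padding."""
    half = k.shape[0] // 2
    padded = np.pad(img, half)
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

img = np.zeros((16, 16))
img[:, 8:] = 1.0   # a vertical edge
bank = [gabor_kernel(t) for t in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)]
responses = [np.abs(convolve2d(img, k)).max() for k in bank]
```

The filter whose wave direction crosses the edge (theta = 0 here) responds most strongly; PHOG-style histograms are then computed on such enhanced maps.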



Defect identification on specular machined surfaces
Ken Sills, Gary M. Bone, David Capson

In many industrial applications, it is important to identify defects on specular surfaces. On machined surfaces, defect identification may be further complicated by the presence of marks from a machining process. These marks may dramatically and unpredictably change the appearance of the surface, while not altering its ability to function. To differentiate between surface characteristics that constitute a defect and those that do not, we propose a system that directly illuminates specular machined surfaces with a programmable array of high-power light-emitting diodes that allows the angle of the incident light to be varied over a series of images. A reflection model is used to predict the reflected intensity as a function of incident lighting angle for each point on the imaged surface. A surface defect causes the observed reflected intensity as a function of incident lighting angle to differ from that predicted by the reflection model. Such differences between the observations and the reflection model are shown to identify surface defects such as porosities, dents and scratches in the presence of marks from the machining process.
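A minimal sketch of the residual test described above, assuming a simple Lambertian reflection model (the paper's actual model and threshold will differ):

```python
# Sketch: flag pixels whose observed intensity across incident lighting
# angles deviates from a model prediction. Lambertian model and the
# threshold value are illustrative assumptions.
import numpy as np

def defect_mask(observed, angles, albedo, threshold=0.1):
    """observed: (n_angles, n_pixels) intensities; angles in radians."""
    predicted = albedo[None, :] * np.cos(angles)[:, None]  # Lambertian model
    residual = np.sqrt(np.mean((observed - predicted) ** 2, axis=0))
    return residual > threshold

angles = np.linspace(0, np.pi / 3, 5)        # varied incident lighting angles
albedo = np.array([0.8, 0.8, 0.8])           # three surface points
clean = albedo[None, :] * np.cos(angles)[:, None]
clean[:, 2] *= 0.4                           # a scratch darkens the third point
mask = defect_mask(clean, angles, albedo)
```

Only the point whose angular response departs from the model is flagged; machining marks that still follow the model's angular behaviour would not be.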



Intrinsic and extrinsic active self-calibration of multi-camera systems
Marcel Brückner, Ferid Bajramovic, Joachim Denzler

We present a method for active self-calibration of multi-camera systems consisting of pan-tilt zoom cameras. The main focus of this work is on extrinsic self-calibration using active camera control. Our novel probabilistic approach avoids multi-image point correspondences as far as possible. This allows an implicit treatment of ambiguities. The relative poses are optimized by actively rotating and zooming each camera pair in a way that significantly simplifies the problem of extracting correct point correspondences. In a final step we calibrate the entire system using a minimal number of relative poses. The selection of relative poses is based on their uncertainty. We exploit active camera control to estimate consistent translation scales for triplets of cameras. This allows us to estimate missing relative poses in the camera triplets. In addition to this active extrinsic self-calibration we present an extended method for the rotational intrinsic self-calibration of a camera that exploits the rotation knowledge provided by the camera's pan-tilt unit to robustly estimate the intrinsic camera parameters for different zoom steps as well as the rotation between pan-tilt unit and camera. Quantitative experiments on real data demonstrate the robustness and high accuracy of our approach. We achieve a median reprojection error of 0.95 pixel.
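The reprojection-error metric reported above can be computed as in this sketch, using a synthetic pan rotation and synthetic intrinsics (all numeric values here are invented for the example):

```python
# Sketch: median reprojection error for points rotated by a known pan
# rotation and projected through intrinsics K. Values are synthetic.
import numpy as np

def project(K, R, X):
    """Project a 3D point X through rotation R and intrinsics K."""
    x = K @ (R @ X)
    return x[:2] / x[2]

def median_reprojection_error(K, R, points3d, observed):
    errs = [np.linalg.norm(project(K, R, X) - x)
            for X, x in zip(points3d, observed)]
    return float(np.median(errs))

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
pan = np.deg2rad(5)                                  # pan-tilt unit rotation
R = np.array([[np.cos(pan), 0.0, np.sin(pan)],
              [0.0, 1.0, 0.0],
              [-np.sin(pan), 0.0, np.cos(pan)]])
pts = [np.array([0.1, 0.2, 4.0]),
       np.array([-0.3, 0.1, 5.0]),
       np.array([0.0, 0.0, 3.0])]
obs = [project(K, R, X) + 0.5 for X in pts]          # half-pixel offset in x, y
err = median_reprojection_error(K, R, pts, obs)
```

Pure rotation with no translation is used because a pan-tilt unit approximately rotates the camera about its optical center, which is what the intrinsic self-calibration exploits.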



Harmony search-based hybrid stable adaptive fuzzy tracking controllers for vision-based mobile robot navigation
Kaushik Das Sharma, Amitava Chatterjee, Anjan Rakshit

In this paper, the harmony search (HS) algorithm and Lyapunov theory are hybridized together to design a stable adaptive fuzzy tracking control strategy for vision-based navigation of autonomous mobile robots. The proposed variant of the HS algorithm, with complete dynamic harmony memory (named here the DyHS algorithm), is utilized to design two self-adaptive fuzzy controllers, for x-direction and y-direction movements of a mobile robot. These fuzzy controllers are optimized, both in their structures and free parameters, such that they can guarantee the desired stability while simultaneously providing satisfactory tracking performance for the vision-based navigation of mobile robots. In addition, the concurrent and preferential combinations of the global-search capability of the DyHS algorithm and a Lyapunov theory-based local search method are employed to provide a high degree of automation in the controller design process. The proposed schemes have been implemented in both simulation and real-life experiments. The results demonstrate the usefulness of the proposed design strategy and show overall comparable performance, when compared with two other competing stochastic optimization algorithms, namely, genetic algorithm and particle swarm optimization.
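A minimal harmony search loop (standard HS, not the authors' dynamic-memory DyHS variant) on a toy objective illustrates the global-search component:

```python
# Basic harmony search: improvise a new "harmony" from memory, pitch-adjust
# it, and replace the worst stored harmony if it improves the objective.
# HMCR/PAR values follow common HS practice; the objective is a toy.
import numpy as np

def harmony_search(f, dim, iters=2000, hm_size=10, hmcr=0.9, par=0.3,
                   bw=0.1, low=-5.0, high=5.0, seed=0):
    """Minimize f over [low, high]^dim with a basic harmony search loop."""
    rng = np.random.default_rng(seed)
    hm = rng.uniform(low, high, (hm_size, dim))   # harmony memory
    costs = np.array([f(h) for h in hm])
    for _ in range(iters):
        new = np.empty(dim)
        for d in range(dim):
            if rng.random() < hmcr:               # memory consideration
                new[d] = hm[rng.integers(hm_size), d]
                if rng.random() < par:            # pitch adjustment
                    new[d] += bw * rng.uniform(-1.0, 1.0)
            else:                                 # random re-initialization
                new[d] = rng.uniform(low, high)
        c = f(new)
        worst = int(np.argmax(costs))
        if c < costs[worst]:                      # replace the worst harmony
            hm[worst], costs[worst] = new, c
    best = int(np.argmin(costs))
    return hm[best], float(costs[best])

# Toy objective: 3-dimensional sphere function, minimum 0 at the origin.
sol, cost = harmony_search(lambda x: float(np.sum(x ** 2)), dim=3)
```

In the paper, such a global search is interleaved with a Lyapunov-based local step so that the evolved fuzzy controllers remain stable; that constraint is not represented in this toy.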



Improving three-dimensional point reconstruction from image correspondences using surface curvatures
Chin-Hung Teng

Recovering three-dimensional (3D) points from image correspondences is an important and fundamental task in computer vision. Traditionally, the task is completed by triangulation, whose accuracy has its limitations in some applications. In this paper, we present a framework that incorporates surface characteristics such as Gaussian and mean curvatures into 3D point reconstruction to enhance the reconstruction accuracy. A Gaussian and mean curvature estimation scheme suitable to the proposed framework is also introduced in this paper. Based on this estimation scheme and the proposed framework, the 3D point recovery from image correspondences is formulated as an optimization problem with the surface curvatures modeled as soft constraints. To analyze the performance of the proposed 3D reconstruction approach, we generated some synthetic data, including points on the surfaces of a plane, a cylinder and a sphere, to test the approach. The experimental results demonstrated that the proposed framework can indeed improve the accuracy of 3D point reconstruction. Some real-image data were also tested, and the results confirm this point.
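The triangulation baseline that the framework improves on can be sketched with the standard linear (DLT) method; the cameras and point here are synthetic:

```python
# Standard linear (DLT) triangulation: recover a 3D point from its
# projections in two views by solving a homogeneous least-squares system.
import numpy as np

def triangulate(P1, P2, x1, x2):
    """3D point from projections x1, x2 under camera matrices P1, P2."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                 # null vector of A (homogeneous solution)
    return X[:3] / X[3]

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two simple cameras: identity, and one translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])
X_rec = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
```

With noise-free correspondences this recovers the point exactly; the paper's contribution is to regularize the noisy case with curvature-based soft constraints.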



Hybrid model of clustering and kernel autoassociator for reliable vehicle type classification
Bailing Zhang, Yifan Zhou, Hao Pan, Tammam Tillo

Automatic vehicle classification is an important area of research for intelligent transportation, traffic surveillance and security. A working image-based vehicle classification system is proposed in this paper. The first component, vehicle detection, is implemented by applying histogram of oriented gradients features and an SVM classifier. The second component, vehicle classification, which is the emphasis of this paper, is accomplished by a hybrid model composed of clustering and a kernel autoassociator (KAA). The KAA model is a generalization of autoassociative networks trained to recall the inputs through a kernel subspace. As an effective one-class classification strategy, KAA has been proposed to implement classification with rejection, showing a balanced error-rejection trade-off. With a large number of training samples, however, the training of KAA becomes problematic due to the difficulties involved with directly creating the kernel matrix. As a solution, a hybrid model consisting of a self-organizing map (SOM) and KAA has been proposed to first acquire prototypes and then construct the KAA model, which has been proven efficient in Internet intrusion detection. The hybrid model is further studied in this paper, with several clustering algorithms compared, including k-means clustering, SOM and Neural Gas. Experimental results using more than 2,500 images from four types of vehicles (bus, light truck, car and van) demonstrated the effectiveness of the hybrid model. The proposed scheme offers an accuracy of over 95% with a rejection rate of 8%, and a reliability of over 98% with a rejection rate of 20%. This exhibits promising potential for real-world applications.
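The clustering stage of the hybrid model can be illustrated with k-means prototypes plus a distance-based rejection rule; this is a toy stand-in for the SOM/KAA pipeline, with made-up 2D "vehicle features":

```python
# Sketch: acquire prototypes by k-means, then classify with rejection when
# a sample is far from every prototype. Toy data, not the paper's features.
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's algorithm returning the cluster centers (prototypes)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def classify_with_reject(x, centers, threshold):
    """Nearest-prototype label, or -1 ("reject") if nothing is close enough."""
    d = np.sqrt(((centers - x) ** 2).sum(-1))
    j = int(np.argmin(d))
    return j if d[j] <= threshold else -1

rng = np.random.default_rng(2)
cars = rng.normal([0.0, 0.0], 0.1, (30, 2))    # toy class clusters
buses = rng.normal([3.0, 3.0], 0.1, (30, 2))
centers = kmeans(np.vstack([cars, buses]), k=2)
label = classify_with_reject(np.array([0.05, -0.02]), centers, threshold=0.5)
outlier = classify_with_reject(np.array([10.0, 10.0]), centers, threshold=0.5)
```

The rejection option is what yields the accuracy/rejection trade-off quoted above: unfamiliar samples are deferred rather than misclassified.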



Accurate and robust localization of duplicated region in copy-move image forgery
Maryam Jaberi, George Bebis, Muhammad Hussain, Ghulam Muhammad

Copy-move image forgery detection has recently become a very active research topic in blind image forensics. In copy-move image forgery, a region from some image location is copied and pasted to a different location of the same image. Typically, post-processing is applied to better hide the forgery. Using keypoint-based features, such as SIFT features, for detecting copy-move image forgeries has produced promising results. The main idea is detecting duplicated regions in an image by exploiting the similarity between keypoint-based features in these regions. In this paper, we have adopted keypoint-based features for copy-move image forgery detection; however, our emphasis is on accurate and robust localization of duplicated regions. In this context, we are interested in estimating the transformation (e.g., affine) between the copied and pasted regions more accurately, as well as extracting these regions more robustly by reducing the number of false positives and negatives. To address these issues, we propose using a more powerful set of keypoint-based features, called MIFT, which shares the properties of SIFT features but is also invariant to mirror reflection transformations. Moreover, we propose refining the affine transformation using an iterative scheme which improves the estimation of the affine transformation parameters by incrementally finding additional keypoint matches. To reduce false positives and negatives when extracting the copied and pasted regions, we propose using "dense" MIFT features, instead of standard pixel correlation, along with hysteresis thresholding and morphological operations. The proposed approach has been evaluated and compared with competitive approaches through a comprehensive set of experiments using a large dataset of real images (i.e., CASIA v2.0). Our results indicate that our method can detect duplicated regions in copy-move image forgery with higher accuracy, especially when the size of the duplicated region is small.
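The core geometric step, estimating the affine transformation between matched copied and pasted keypoints, can be sketched by least squares on synthetic matches (the paper additionally iterates this estimate to find further matches):

```python
# Sketch: least-squares 2D affine fit between copied and pasted keypoints.
# The keypoint coordinates and transformation below are synthetic.
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine (A, t) such that dst ≈ src @ A.T + t."""
    n = len(src)
    M = np.hstack([src, np.ones((n, 1))])             # rows [x, y, 1]
    params, *_ = np.linalg.lstsq(M, dst, rcond=None)  # shape (3, 2)
    A, t = params[:2].T, params[2]
    return A, t

# Synthetic copy-move: the pasted region is rotated 30 degrees and shifted.
theta = np.deg2rad(30)
A_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
t_true = np.array([12.0, -3.0])
src = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [7.0, 5.0]])
dst = src @ A_true.T + t_true

A_est, t_est = fit_affine(src, dst)
```

In practice the fit is wrapped in an outlier-robust loop (e.g., RANSAC-style), since raw keypoint matches contain mismatches.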



Generation of new points for training set and feature-level fusion in multimodal biometric identification
Dhiman Karmakar, C. A. Murthy

Multimodal biometrics has gained interest in the recent past due to its improved recognition rate over unibiometric and unimodal systems. Fusion at feature level is considered here for the purpose of recognition. The biometrics considered for fusion are face and iris. Here, new face images along with iris images are generated, and they are included in the training set. Feature-level fusion is incorporated. The recognition rates of the classification algorithm thus obtained are statistically found to be significantly better than the existing feature-level fusion and classification techniques.
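Generic feature-level fusion, normalizing each modality's feature vector and concatenating, can be sketched as follows; the feature vectors are toy values, and this is the baseline scheme, not the paper's full method:

```python
# Sketch: feature-level fusion of two biometric modalities by per-modality
# z-score normalization followed by concatenation. Toy feature values.
import numpy as np

def fuse(face_feat, iris_feat):
    """Normalize each modality, then concatenate into one feature vector."""
    def znorm(v):
        s = v.std()
        return (v - v.mean()) / s if s > 0 else v - v.mean()
    return np.concatenate([znorm(face_feat), znorm(iris_feat)])

face = np.array([10.0, 12.0, 14.0])    # toy face features
iris = np.array([0.1, 0.2, 0.3, 0.4])  # toy iris features (different scale)
fused = fuse(face, iris)
```

Normalization before concatenation keeps one modality's scale from dominating the fused representation.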



Structured light self-calibration with vanishing points
Radu Orghidan, Joaquim Salvi, Mihaela Gordan, Camelia Florea, Joan Batlle

This paper introduces the use of vanishing points to self-calibrate a structured light system. The vanishing points make it possible to automatically remove the projector's keystone effect and then to self-calibrate the projector-camera system. The calibration object is a simple planar surface such as a white paper. Complex patterns and 3D calibrated objects are not required anymore. The technique is compared to classic calibration and validated with experimental results.
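A vanishing point is simply the intersection of converging image lines, which is convenient to compute in homogeneous coordinates; the line endpoints below are illustrative:

```python
# Sketch: vanishing point as the intersection of two image lines, computed
# with cross products in homogeneous coordinates. Points are illustrative.
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points."""
    return np.cross(np.append(p, 1.0), np.append(q, 1.0))

def intersection(l1, l2):
    """Intersection of two homogeneous lines, back in pixel coordinates."""
    h = np.cross(l1, l2)
    return h[:2] / h[2]

# Two images of parallel scene lines converging under perspective.
l1 = line_through(np.array([0.0, 0.0]), np.array([2.0, 1.0]))
l2 = line_through(np.array([0.0, 2.0]), np.array([2.0, 2.0]))
vp = intersection(l1, l2)
```

Truly parallel image lines give a point at infinity (third homogeneous coordinate zero), which a robust implementation must handle before dividing.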



Automatic image segmentation and classification based on direction texton technique for hemolytic anemia in thin blood smears
Hung-Ming Chen, Ya-Ting Tsao, Shin-Ni Tsai

This paper proposes an automatic method for cell segmentation and classification of erythrocytes in thin blood smears with hemolytic anemia. First, to remove the background and noise in the blood images, the proposed method detects a series of changes on the edges and analyzes the edge changes by using the 8-connection chain codes technique to recognize isolated erythrocytes. For segmenting the overlapping erythrocytes, the 8-connection chain codes technique obtains the edge direction of the cells to effectively figure out the points of high concavity. Then, the high concavity information is used to separate overlapping erythrocytes and to extract features from each segmented erythrocyte. After segmentation, all the erythrocytes can be treated equally and the differences between adjacent chain codes of each erythrocyte can be calculated. Furthermore, the proposed method extracts the variation of eight directions from each individual erythrocyte as features for classification into four main hemolytic anemia types. Finally, the classification process identifies abnormal erythrocytes and the types of hemolytic anemia by using a trained bank of classifiers, utilizing the proposed method to calculate the quantity of erythrocytes and recognize the types of hemolytic anemia effectively.
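The 8-connection (Freeman) chain code that drives the segmentation can be sketched directly; large differences between adjacent codes are what expose corners and high-concavity points:

```python
# Sketch: Freeman 8-connection chain code for an ordered contour, and the
# adjacent-code differences used to locate direction changes.

# Direction code of each (dx, dy) step, counter-clockwise from east.
DIRS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
        (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(contour):
    """Freeman chain code for an ordered list of (x, y) contour points."""
    codes = []
    for (x0, y0), (x1, y1) in zip(contour, contour[1:]):
        codes.append(DIRS[(x1 - x0, y1 - y0)])
    return codes

# A tiny closed square contour traversed counter-clockwise.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
codes = chain_code(square)
# Differences between adjacent codes mark direction changes (corners).
diffs = [(b - a) % 8 for a, b in zip(codes, codes[1:])]
```

On a real erythrocyte boundary, the contour points come from edge tracing, and concavity points show up as code differences of the opposite sign from the overall traversal direction.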



Multiple human tracking system for unpredictable trajectories
B. Cancela, M. Ortega, M. G. Penedo

Tracking multiple objects in a scene is one of the most active research topics in computer vision. The art of identifying each target within the scene along a video sequence has multiple issues to be solved, with collision and occlusion events among the most challenging. Because of this, when dealing with human detection, it is often very difficult to obtain a full body image, which introduces complexity into the process. The task becomes even more difficult when dealing with unpredictable trajectories, as in sport environments. Thus, the head-shoulder omega shape becomes a powerful tool for performing human detection. Most of the contributions to this field involve a detection technique followed by a tracking system based on the omega-shape features. Based on these works, we present a novel methodology providing a full tracking system. Different techniques are combined to detect, track and recover target identifications under unpredictable trajectories, such as in sport events. Experimental results on challenging sport scenes show the performance and accuracy of this technique. Also, the system speed opens the door to obtaining a real-time system using GPU programming on standard desktop machines, able to be used in higher-level human behavioral systems, with multiple applications.



A nonlocal energy minimization approach to brain image segmentation with simultaneous bias field estimation and denoising
Zengsi Chen, Jinwei Wang, Dexing Kong, Fangfang Dong

Image segmentation plays an important role in medical image analysis. The most widely used image segmentation algorithms, region-based methods that typically rely on the homogeneity of image intensities in the regions of interest, often fail to provide accurate segmentation results due to the existence of bias field, heavy noise and rich structures. In this paper, we incorporate a nonlocal regularization mechanism in the coherent local intensity clustering formulation for brain image segmentation with simultaneous bias field estimation and denoising, while preserving fine structures. We define an energy functional with a local data fitting term, two nonlocal regularization terms for both the image and the membership functions, and an L2 image fidelity term. By minimizing this energy, we obtain good segmentation results with well-preserved structures. Meanwhile, bias estimation and noise reduction are also achieved. Experiments performed on synthetic and clinical brain magnetic resonance imaging data and comparisons with other methods demonstrate that, by introducing the nonlocal regularization mechanism, we obtain more regularized segmentation results.
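The data-fidelity-plus-regularization structure of such an energy can be illustrated on a 1D toy, minimized by gradient descent (the nonlocal terms and bias field of the paper are omitted; a simple local smoothness term stands in):

```python
# Sketch: minimize E(u) = sum (u - y)^2 + lam * sum (u[i+1] - u[i])^2 by
# gradient descent. A local quadratic regularizer stands in for the
# paper's nonlocal terms; lam, step and iters are illustrative.
import numpy as np

def denoise(y, lam=2.0, iters=200, step=0.1):
    """Gradient descent on a fidelity + smoothness energy."""
    u = y.copy()
    for _ in range(iters):
        grad = 2 * (u - y)                        # data fidelity term
        grad[:-1] += 2 * lam * (u[:-1] - u[1:])   # smoothness term, left
        grad[1:] += 2 * lam * (u[1:] - u[:-1])    # smoothness term, right
        u -= step * grad
    return u

rng = np.random.default_rng(3)
clean = np.concatenate([np.zeros(20), np.ones(20)])   # toy "tissue" signal
noisy = clean + 0.2 * rng.normal(size=40)
u = denoise(noisy)
```

The trade-off visible even in this toy (noise suppressed, edges slightly blurred) is what motivates nonlocal regularizers, which average over similar patches rather than only adjacent samples.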