Scene Labeling Using Sparse Precision Matrix
Semantic image segmentation, assigning a label from many classes to each pixel of an image, is a challenging task in computer vision, since it requires segmenting and recognizing the image simultaneously. Commonly used approaches fail to incorporate long-range connections and to model the contextual relationships among labels. In our method, we model the interactions between labels and segments as an energy minimization over a graph whose structure is captured by the inverse of the covariance matrix (the precision matrix) and encodes only the significant interactions. We use both local and global information of a scene to improve the results.
(a) shows a query image, (b) shows the human-annotated image (ground truth), (c) shows the labels obtained by the classifiers, (d) shows the labels obtained via spatial smoothing, and (e) shows our results.
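The role of the precision matrix can be made concrete: a zero off-diagonal entry in the inverse covariance means the two labels are conditionally independent given the rest, and the surviving entries can be normalized into partial correlations. The matrix below is a hypothetical toy example, not learned from any dataset:

```python
import numpy as np

# Hypothetical 3-label precision matrix (inverse covariance).
# A zero off-diagonal entry (here, labels 0 and 2) means those labels
# are conditionally independent given all other labels.
theta = np.array([
    [ 2.0, -0.8,  0.0],
    [-0.8,  2.5, -0.5],
    [ 0.0, -0.5,  1.5],
])

# Partial correlation: rho_ij = -theta_ij / sqrt(theta_ii * theta_jj)
d = np.sqrt(np.diag(theta))
partial_corr = -theta / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)

print(np.round(partial_corr, 3))
```

Only the nonzero entries of this matrix become edges of the label graph, with the sign of the partial correlation distinguishing positive from negative interactions.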
Our approach consists of two main steps. The first step comprises off-the-shelf parts, including feature extraction and classifier training based on the local features of the training images. In this phase we also use the training data to capture the structure of the semantic label-interaction graph, which is later employed in the pair-wise cost computation. The second step is the inference: for a given query image, using the scores computed by the classifiers for each possible label, together with the pair-wise costs obtained from the label correlations and the appearance features of the image, MAP inference in a CRF framework is applied and each super-pixel is assigned a label. An overview of the proposed approach is depicted in the following figure.
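The per-label classifier scores that feed the unary terms can be sketched with a random forest queried for class probabilities. The feature dimensions and data below are synthetic stand-ins, not the authors' actual features or training set:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy stand-ins for per-super-pixel local features (e.g. SIFT
# bag-of-words, color histogram); dimensions are illustrative.
n_train, n_feat, n_labels = 300, 16, 4
X_train = rng.normal(size=(n_train, n_feat))
y_train = rng.integers(0, n_labels, size=n_train)
# Make the labels weakly learnable from the first feature.
X_train[:, 0] += y_train

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Per-label scores for the super-pixels of a query image:
# one row per super-pixel, one column per class probability.
X_query = rng.normal(size=(5, n_feat))
scores = clf.predict_proba(X_query)
print(scores.shape)  # (5, 4)
```

Each row of `scores` is then turned into a unary cost (e.g. a negative log-probability) for the corresponding super-pixel.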
We begin by extracting the feature matrix and segmenting the image into super-pixels; then classifiers (random forest) are trained. We detect the relations between labels using the sparsely estimated partial correlation matrix of the training data. In the inference part, the label scores for a given image are obtained via the classifiers (random forest and nearest neighbor), and then the energy function of a sparse graphical model on super-pixels is optimized to label each super-pixel.
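To illustrate the kind of energy being minimized, the sketch below runs iterated conditional modes (ICM), a simple greedy MAP approximation, on a pairwise energy over a super-pixel adjacency graph. This is an illustration only; the actual inference method and the pairwise costs derived from the label correlations are as described in the paper, and the toy numbers here are invented:

```python
import numpy as np

def icm_labeling(unary, edges, pairwise, n_iters=10):
    """Greedy MAP approximation (ICM) for the energy
    E(y) = sum_i unary[i, y_i] + sum_{(i,j)} pairwise[y_i, y_j].

    unary    : (n_nodes, n_labels) costs from the local classifiers
    edges    : list of (i, j) super-pixel adjacencies
    pairwise : (n_labels, n_labels) label-compatibility costs
    """
    n_nodes, n_labels = unary.shape
    labels = unary.argmin(axis=1)          # start from unary-only labels
    neighbors = [[] for _ in range(n_nodes)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    for _ in range(n_iters):
        changed = False
        for i in range(n_nodes):
            # Cost of each candidate label for node i, given its neighbors.
            cost = unary[i].copy()
            for j in neighbors[i]:
                cost += pairwise[:, labels[j]]
            best = cost.argmin()
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels

# Toy instance: 3 super-pixels in a chain, 2 labels, and a
# Potts-style pairwise cost that penalizes disagreement.
unary = np.array([[0.0, 1.0], [0.6, 0.5], [0.0, 1.0]])
pairwise = np.array([[0.0, 1.0], [1.0, 0.0]])
print(icm_labeling(unary, [(0, 1), (1, 2)], pairwise))  # → [0 0 0]
```

The middle node's weak unary preference for label 1 is overridden by its two neighbors, which is exactly the smoothing effect the pairwise term is meant to provide.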
In training, we first segment the images using efficient graph-based segmentation. Next, for each super-pixel, local features, including SIFT, color histogram, mean and standard deviation of color, area, and texture, are extracted. Given these local features, classifiers (random forest) are trained to label super-pixels. Also, in the training phase, we build the sparse precision matrix from the sample data to highlight the important relations (positive or negative correlations) between labels. In testing, for a query image we compute the unary terms for its segments using the scores from the local classifiers, refined with the probabilities obtained from a retrieval set based on global features. Then, we use a fast implementation of the graphical lasso to find the structure of the dependency graph between super-pixels and assign weights to the edges based on the correlation values.
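The sparse structure estimation step can be sketched with scikit-learn's `GraphicalLasso`, which L1-penalizes the precision matrix so that weak dependencies are driven exactly to zero. The data below are synthetic stand-ins with two planted dependencies, not the statistics used in the paper:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Toy stand-in for the training statistics: rows are samples,
# columns are variables (e.g. labels). Two dependencies are planted.
n_samples, n_vars = 200, 5
X = rng.normal(size=(n_samples, n_vars))
X[:, 1] += 0.8 * X[:, 0]          # variables 0 and 1 co-occur
X[:, 3] -= 0.7 * X[:, 2]          # variables 2 and 3 repel

model = GraphicalLasso(alpha=0.2).fit(X)
theta = model.precision_

# Keep an edge (i, j) only where the precision entry survives the
# L1 penalty; its sign distinguishes positive from negative relations.
edges = [(i, j) for i in range(n_vars) for j in range(i + 1, n_vars)
         if abs(theta[i, j]) > 1e-6]
print(edges)
```

Raising `alpha` prunes more edges, which is the mechanism behind keeping only the significant interactions shown in the comparison below.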
The following figure depicts the label graph for the SIFTflow dataset: LEFT, obtained using the empirical inverse of the covariance matrix; RIGHT, obtained using the sparse partial correlation matrix. As the comparison shows, the more relevant relations are maintained while irrelevant edges are removed.
We evaluate our method on three benchmark datasets. The first is the Stanford Background dataset, which has 8 classes and 715 images. The second is the SIFTflow dataset, which consists of 2,488 training images and 200 testing images from 33 classes, collected from LabelMe. We also applied our method to a third dataset, MSRC-v2, which has 591 images of 23 classes; we use the provided split, with 276 images in training and 255 images in testing.
Some qualitative results are shown in the following figure:
From left to right: example images, ground truth, spatial smoothing results, and our results.
The following tables present the accuracy of our method and of state-of-the-art methods on the benchmark datasets.
Nasim Souly, Mubarak Shah, Scene Labeling Using Sparse Precision Matrix