Real-world Anomaly Detection in Surveillance Videos
Center for Research in Computer Vision (CRCV), University of Central Florida (UCF)
Waqas Sultani, Chen Chen, Mubarak Shah, "Real-world Anomaly Detection in Surveillance Videos," arXiv:1801.04264 [cs.CV]
Surveillance videos capture a variety of realistic anomalies. In this paper, we propose to learn anomalies by exploiting both normal and anomalous videos. To avoid annotating the anomalous segments or clips in training videos, which is very time-consuming, we propose to learn anomalies through a deep multiple instance ranking framework that leverages weakly labeled training videos, i.e., the training labels (anomalous or normal) are at the video level instead of the clip level. In our approach, we consider normal and anomalous videos as bags and video segments as instances in multiple instance learning (MIL), and automatically learn a deep anomaly ranking model that predicts high anomaly scores for anomalous video segments. Furthermore, we introduce sparsity and temporal smoothness constraints in the ranking loss function to better localize anomalies during training.
We also introduce a new, large-scale, first-of-its-kind dataset of 128 hours of video. It consists of 1900 long, untrimmed real-world surveillance videos covering 13 realistic anomalies such as fighting, road accidents, burglary, and robbery, as well as normal activities. This dataset can be used for two tasks: first, general anomaly detection, considering all anomalies in one group and all normal activities in another; second, recognition of each of the 13 anomalous activities. Our experimental results show that our MIL method for anomaly detection achieves a significant improvement in anomaly detection performance over state-of-the-art approaches. We provide the results of several recent deep learning baselines on anomalous activity recognition. The low recognition performance of these baselines reveals that our dataset is very challenging and opens more opportunities for future work.
1. Problem & Motivation
One critical task in video surveillance is detecting anomalous events such as traffic accidents, crimes, or illegal activities. Anomalous events occur rarely compared to normal activities. Therefore, to reduce the waste of labor and time, developing intelligent computer vision algorithms for automatic video anomaly detection is a pressing need. The goal of a practical anomaly detection system is to signal, in a timely manner, an activity that deviates from normal patterns and to identify the time window in which the anomaly occurs. Anomaly detection can thus be considered coarse-level video understanding, which filters out anomalies from normal patterns. Once an anomaly is detected, it can be further categorized into one of the specific activities using classification techniques.
In this work, we propose an anomaly detection algorithm using weakly labeled training videos. That is, we only know the video-level labels, i.e., a video is normal or contains an anomaly somewhere, but we do not know where. This is appealing because we can easily annotate a large number of videos by assigning only video-level labels. To formulate a weakly supervised learning approach, we resort to multiple instance learning (MIL). Specifically, we propose to learn anomalies through a deep MIL framework by treating normal and anomalous surveillance videos as bags and short segments/clips of each video as instances in a bag. Based on training videos, we automatically learn an anomaly ranking model that predicts high anomaly scores for anomalous segments in a video. During testing, a long untrimmed video is divided into segments and fed into our deep network, which assigns an anomaly score to each video segment so that an anomaly can be detected.
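The test-time procedure described above can be sketched roughly as follows. This is a minimal illustration, not the released implementation: the function names, the equal-length segmentation rule, and the fixed score threshold of 0.5 are all assumptions made for this example, and the per-segment scores are simply passed in as a list rather than produced by a network.

```python
from typing import List, Optional, Sequence, Tuple

Segment = Tuple[int, int]  # (start_frame, end_frame), end exclusive


def split_into_segments(num_frames: int, num_segments: int) -> List[Segment]:
    """Split frame indices [0, num_frames) into contiguous, near-equal segments."""
    if num_segments <= 0 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    # Integer arithmetic keeps boundaries strictly increasing.
    bounds = [i * num_frames // num_segments for i in range(num_segments + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(num_segments)]


def anomalous_windows(
    scores: Sequence[float],
    segments: Sequence[Segment],
    threshold: float = 0.5,  # hypothetical cut-off, chosen only for illustration
) -> List[Segment]:
    """Merge consecutive segments scoring above `threshold` into frame windows."""
    if len(scores) != len(segments):
        raise ValueError("one score per segment is required")
    windows: List[Segment] = []
    start: Optional[int] = None
    end: Optional[int] = None
    for score, (seg_start, seg_end) in zip(scores, segments):
        if score > threshold:
            if start is None:
                start = seg_start  # open a new window
            end = seg_end          # extend the current window
        elif start is not None:
            windows.append((start, end))  # close the window
            start = None
    if start is not None:
        windows.append((start, end))
    return windows


segments = split_into_segments(num_frames=10, num_segments=4)
print(anomalous_windows([0.1, 0.8, 0.9, 0.2], segments))
```

Merging adjacent high-scoring segments is one simple way to turn per-segment scores into the time window of an anomaly; other post-processing choices are equally possible.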
Our proposed approach (summarized in Figure 1) begins by dividing surveillance videos into a fixed number of segments during training. These segments form the instances in a bag. Using both positive (anomalous) and negative (normal) bags, we train the anomaly detection model with the proposed deep MIL ranking loss.
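To make the idea of a ranking loss with sparsity and temporal smoothness terms concrete, here is a small sketch for one positive and one negative bag. This page does not state the formula, so the form below is an assumption: a hinge loss between the highest-scoring instance of each bag, plus a squared-difference smoothness term and a sum-of-scores sparsity term on the anomalous bag. The function name, the margin of 1, and the default weights are illustrative placeholders, not values taken from this page.

```python
from typing import Sequence


def mil_ranking_loss(
    pos_scores: Sequence[float],   # segment scores of an anomalous (positive) bag
    neg_scores: Sequence[float],   # segment scores of a normal (negative) bag
    margin: float = 1.0,           # assumed hinge margin
    lambda_smooth: float = 8e-5,   # placeholder weight for temporal smoothness
    lambda_sparse: float = 8e-5,   # placeholder weight for sparsity
) -> float:
    """Hinge-style MIL ranking loss with smoothness and sparsity penalties."""
    if not pos_scores or not neg_scores:
        raise ValueError("both bags must contain at least one instance")
    # Rank the top-scoring anomalous instance above the top-scoring normal one.
    ranking = max(0.0, margin - max(pos_scores) + max(neg_scores))
    # Penalize abrupt score changes between temporally adjacent segments.
    smooth = sum((a - b) ** 2 for a, b in zip(pos_scores, pos_scores[1:]))
    # Encourage only a few segments of the anomalous video to score high.
    sparse = sum(pos_scores)
    return ranking + lambda_smooth * smooth + lambda_sparse * sparse


print(mil_ranking_loss([0.1, 0.9, 0.2], [0.3, 0.2, 0.1]))
```

Only the maximum of each bag enters the ranking term, so the loss needs video-level labels alone: it never asks which segment of the anomalous video contains the anomaly.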
3. UCF-Crime Dataset
We construct a new large-scale dataset, called UCF-Crime, to evaluate our method. It consists of long untrimmed surveillance videos which cover 13 real-world anomalies: Abuse, Arrest, Arson, Assault, Road Accident, Burglary, Explosion, Fighting, Robbery, Shooting, Stealing, Shoplifting, and Vandalism. These anomalies are selected because they have a significant impact on public safety. We compare our dataset with previous anomaly detection datasets in Table 1. For more details about the UCF-Crime dataset, please refer to our paper. A short description of each anomalous event is given below.
Abuse: This event contains videos which show bad, cruel or violent behavior against children, old people, animals, and women.
Burglary: This event contains videos that show people (thieves) entering a building or house with the intention to commit theft. It does not include use of force against people.
Robbery: This event contains videos showing thieves taking money unlawfully by force or threat of force. These videos do not include shootings.
Stealing: This event contains videos showing people taking property or money without permission. They do not include shoplifting.
Shooting: This event contains videos showing the act of shooting someone with a gun.
Shoplifting: This event contains videos showing people stealing goods from a shop while posing as a shopper.
Assault: This event contains videos showing a sudden or violent physical attack on someone. Note that in these videos the person who is assaulted does not fight back.
Fighting: This event contains videos displaying two or more people attacking one another.
Arson: This event contains videos showing people deliberately setting fire to property.
Explosion: This event contains videos showing the destructive event of something blowing apart. This event does not include videos where a person intentionally sets a fire or sets off an explosion.
Arrest: This event contains videos showing police arresting individuals.
Road Accident: This event contains videos showing traffic accidents involving vehicles, pedestrians or cyclists.
Vandalism: This event contains videos showing actions involving deliberate destruction of or damage to public or private property. The term includes property damage, such as graffiti and defacement, directed towards any property without permission of the owner.
Normal Event: This event contains videos where no crime occurred. These videos include both indoor (such as a shopping mall) and outdoor scenes as well as day and night-time scenes.
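The two uses of the dataset (binary anomaly detection and recognition of the 13 anomalous activities) amount to two label mappings over the event names listed above. The following sketch shows one way to express them; the constant and function names are hypothetical, and the ordering of class indices is an arbitrary choice made for this example.

```python
# Event names as listed on this page; the index order is an arbitrary assumption.
ANOMALY_CLASSES = (
    "Abuse", "Arrest", "Arson", "Assault", "Road Accident", "Burglary",
    "Explosion", "Fighting", "Robbery", "Shooting", "Stealing",
    "Shoplifting", "Vandalism",
)
NORMAL_CLASS = "Normal Event"


def detection_label(event: str) -> int:
    """Binary label for anomaly detection: 1 = any anomaly, 0 = normal."""
    if event == NORMAL_CLASS:
        return 0
    if event in ANOMALY_CLASSES:
        return 1
    raise ValueError(f"unknown event: {event!r}")


def recognition_label(event: str) -> int:
    """Class index (0..12) for recognizing one of the 13 anomalous activities."""
    if event not in ANOMALY_CLASSES:
        raise ValueError(f"not an anomalous activity: {event!r}")
    return ANOMALY_CLASSES.index(event)


print(detection_label("Robbery"), detection_label("Normal Event"))
print(recognition_label("Robbery"))
```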
Video examples of each anomalous event from the UCF-Crime dataset
This dataset is temporarily unavailable due to server issues. We will have it available again as soon as possible. We are sorry for the inconvenience.
- Temporal Annotation for Testing Videos (anomaly detection task)
- The UCF_Crimes folder contains the following 3 subfolders:
- This folder contains training and testing partitions for anomaly detection experiments, i.e., Anomaly_Test.txt and Anomaly_Train.txt
2. Action_Recognition _splits
- This folder contains four training and testing partitions for action recognition experiments.
- The experimental results reported in the paper are done using four-fold cross-validation. In order to report the results, please use all four partitions.
This folder contains the complete dataset. It has 16 subfolders. The names of folders are self-explanatory.
- 13 folders, one for each anomaly.
- Normal_Videos_event corresponds to normal videos for action detection experiments.
- Testing_Normal_Videos_Anomaly contains normal testing videos for anomaly detection experiments.
- Training_Normal_Videos_Anomaly contains normal training videos for training a network for anomaly detection experiments.
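As a rough illustration of how the folder layout above maps to video-level labels, and of how a result can be reported over the four action recognition partitions, consider the sketch below. It rests on assumptions not stated on this page: each split entry is taken to be a relative path whose first component is the subfolder name, the example file names are made up, and the helper names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Sequence

# Subfolder names for normal videos, as listed above.
NORMAL_FOLDERS = {
    "Normal_Videos_event",
    "Testing_Normal_Videos_Anomaly",
    "Training_Normal_Videos_Anomaly",
}


def weak_label(relative_path: str) -> int:
    """Video-level label from the top-level folder: 0 = normal, 1 = anomalous.

    Assumes the path looks like '<subfolder>/<video file>'.
    """
    parts = PurePosixPath(relative_path).parts
    if len(parts) < 2:
        raise ValueError(f"expected '<subfolder>/<file>', got {relative_path!r}")
    return 0 if parts[0] in NORMAL_FOLDERS else 1


def mean_over_folds(fold_scores: Sequence[float]) -> float:
    """Average a metric over the four partitions, as requested for reporting."""
    if len(fold_scores) != 4:
        raise ValueError("results should be reported over all four partitions")
    return sum(fold_scores) / len(fold_scores)


# Made-up file names, used only to show the call pattern.
print(weak_label("Abuse/example_video.mp4"))
print(weak_label("Training_Normal_Videos_Anomaly/example_video.mp4"))
print(mean_over_folds([0.20, 0.30, 0.25, 0.25]))
```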