Center for Research in Comptuer Vision
Center for Research in Comptuer Vision

Real-world Anomaly Detection in Surveillance Videos

Surveillance videos are able to capture a variety of realistic anomalies. In this paper, we propose to learn anomalies by exploiting both normal and anomalous videos. To avoid annotating the anomalous segments or clips in training videos, which is very time consuming, we propose to learn anomaly through the deep multiple instance ranking framework by leveraging weakly labeled training videos, i.e. the training labels (anomalous or normal) are at video-level instead of clip-level. In our approach, we consider normal and anomalous videos as bags and video segments as instances in multiple instance learning (MIL), and automatically learn a deep anomaly ranking model that predicts high anomaly scores for anomalous video segments. Furthermore, we introduce sparsity and temporal smoothness constraints in the ranking loss function to better localize anomaly during training.

We also introduce a new large-scale first of its kind dataset of 128 hours of videos. It consists of 1900 long and untrimmed real-world surveillance videos, with 13 realistic anomalies such as fighting, road accident, burglary, robbery, etc. as well as normal activities. This dataset can be used for two tasks. First, general anomaly detection considering all anomalies in one group and all normal activities in another group. Second, for recognizing each of 13 anomalous activities. Our experimental results show that our MIL method for anomaly detection achieves significant improvement on anomaly detection performance as compared to the state-of-the-art approaches. We provide the results of several recent deep learning baselines on anomalous activity recognition. The low recognition performance of these baselines reveals that our dataset is very challenging and opens more opportunities for future work.

Problem & Motivation

One critical task in video surveillance is detecting anomalous events such as traffic accidents, crimes or illegal activities. Generally, anomalous events rarely occur as compared to normal activities. Therefore, to alleviate the waste of labor and time, developing intelligent computer vision algorithms for automatic video anomaly detection is a pressing need. The goal of a practical anomaly detection system is to timely signal an activity that deviates normal patterns and identify the time window of the occurring anomaly. Therefore, anomaly detection can be considered as coarse level video understanding, which filters out anomalies from normal patterns. Once an anomaly is detected, it can further be categorized into one of the specific activities using classification techniques.

In this work, we propose an anomaly detection algorithm using weakly labeled training videos. That is we only know the video-level labels, i.e. a video is normal or contains anomaly somewhere, but we do not know where. This is intriguing because we can easily annotate a large number of videos by only assigning video-level labels. To formulate a weakly-supervised learning approach, we resort to multiple instance learning. Specifically, we propose to learn anomaly through a deep MIL framework by treating normal and anomalous surveillance videos as bags and short segments/clips of each video as instances in a bag. Based on training videos, we automatically learn an anomaly ranking model that predicts high anomaly scores for anomalous segments in a video. During testing, a longuntrimmed video is divided into segments and fed into our deep network which assigns anomaly score for each video segment such that an anomaly can be detected.


Our proposed approach (summarized in Figure 1) begins with dividing surveillance videos into a fixed number of segments during training. These segments make instances in a bag. Using both positive (anomalous) and negative (normal) bags, we train the anomaly detection model using the proposed deep MIL ranking loss.

UCF-Crime Dataset

We construct a new large-scale dataset, called UCF-Crime, to evaluate our method. It consists of long untrimmed surveillance videos which cover 13 realworld anomalies, including Abuse, Arrest, Arson, Assault, Road Accident, Burglary, Explosion, Fighting, Robbery, Shooting, Stealing, Shoplifting, and Vandalism. These anomalies are selected because they have a significant impact on public safety. We compare our dataset with previous anomaly detection datasets in Table 1. For more details about the UCF-Crime dataset, please refer to our paper. A short description of each anomalous event is given below.
Abuse: This event contains videos which show bad, cruel or violent behavior against children, old people, animals, and women.
Burglary: This event contains videos that show people (thieves) entering into a building or house with the intention to commit theft. It does not include use of force against people.
Robbery: This event contains videos showing thieves taking money unlawfully by force or threat of force. These videos do not include shootings.
Stealing: This event contains videos showing people taking property or money without permission. They do not include shoplifting.
Shooting: This event contains videos showing act of shooting someone with a gun.
Shoplifting: This event contains videos showing people stealing goods from a shop while posing as a shopper.
Assault: This event contains videos showing a sudden or violent physical attack on someone. Note that in these videos the person who is assaulted does not fight back.
Fighting: This event contains videos displaying two are more people attacking one another.
Arson: This event contains videos showing people deliberately setting fire to property.
Explosion: This event contains videos showing destructive event of something blowing apart. This event does not include videos where a person intentionally sets a fire or sets off an explosion.
Arrest: This event contains videos showing police arresting individuals.
Road Accident: This event contains videos showing traffic accidents involving vehicles, pedestrians or cyclists.
Vandalism: This event contains videos showing action involving deliberate destruction of or damage to public or private property. The term includes property damage, such as graffiti and defacement directed towards any property without permission of the owner.
Normal Event: This event contains videos where no crime occurred. These videos include both indoor (such as a shopping mall) and outdoor scenes as well as day and night-time scenes.

Video examples of each anomalous event from the UCF-Crime dataset
Abuse Arrest Arson
Assault Burglary Explosion
Fighting Road accident Robbery
Shooting Shoplifting Stealing
Vandalism Normal Event


[2] Unusual crowd activity dataset of university of minnesota.
[3] A. Adam, E. Rivlin, I. Shimshoni, and D. Reinitz. Robust real-time unusual event detection using multiple fixedlocation monitors. TPAMI, 2008.
[18] M. Hasan, J. Choi, J. Neumann, A. K. Roy-Chowdhury, and L. S. Davis. Learning temporal regularity in video sequences. In CVPR, June 2016.
[27] W. Li, V. Mahadevan, and N. Vasconcelos. Anomaly detection and localization in crowded scenes. TPAMI, 2014.
[28] C. Lu, J. Shi, and J. Jia. Abnormal event detection at 150 fps in matlab. In ICCV, 2013.
[36] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning spatiotemporal features with 3d convolutional networks. In ICCV, 2015.


Related Publication

Waqas Sultani, Chen Chen, Mubarak Shah, Real-world Anomaly Detection in Surveillance Videos, Cornell University Library, arXiv:1801.04264 [cs.CV], [v1] Fri, 12 Jan 2018.

Back to Human Action and Activity Recognition Projects