Center for Research in Computer Vision

Data Sets

UCF-QNRF - A Large Crowd Counting Data Set

Automatic counting and localization in dense crowd scenes is of significant importance from socio-political and safety perspectives. Crowds gather around the world in a variety of scenarios, and counting the number of participants is often an important concern for organizers and law enforcement agencies.

Figure 1: Six images from our dataset

We introduce the largest dataset to date (in terms of number of annotations) for training and evaluating crowd counting and localization methods. It contains 1535 images, divided into train and test sets of 1201 and 334 images, respectively. The dataset is well suited to training very deep Convolutional Neural Networks (CNNs), since it contains an order of magnitude more annotated humans in dense crowd scenes than any other available crowd counting dataset. Table 1 summarizes the dataset statistics and compares them with other datasets, while Figure 1 shows six images randomly selected from our dataset.

Table 1: Comparison of dataset statistics
Dataset              Number of Images   Number of Annotations   Average Count   Maximum Count   Average Resolution   Average Density
UCF_CC_50            50                 63,974                  1279            4633            2101 x 2888          2.02 x 10^-4
WorldExpo10          3980               225,216                 56              334             576 x 720            1.36 x 10^-4
ShanghaiTech_PartA   482                241,677                 501             3139            589 x 868            9.33 x 10^-4
UCF-QNRF             1535               1,251,642               815             12865           2013 x 2902          1.12 x 10^-4
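
As a quick consistency check on Table 1, the Average Count column can be approximately recovered by dividing the number of annotations by the number of images. The sketch below is illustrative only: the dictionary layout and the helper name `average_count` are assumptions made for this example, and the numbers are copied from Table 1 and from the train/test split stated above.

```python
# (number of images, number of annotations, reported average count), copied from Table 1
table1 = {
    "UCF_CC_50": (50, 63_974, 1279),
    "WorldExpo10": (3980, 225_216, 56),
    "ShanghaiTech_PartA": (482, 241_677, 501),
    "UCF-QNRF": (1535, 1_251_642, 815),
}


def average_count(num_images, num_annotations):
    """Mean number of annotated people per image (hypothetical helper)."""
    return num_annotations / num_images


for name, (images, annotations, reported) in table1.items():
    derived = average_count(images, annotations)
    print(f"{name}: derived {derived:.1f}, reported {reported}")

# The train and test sets (1201 and 334 images) should add up to the full dataset.
train_images, test_images = 1201, 334
split_total = train_images + test_images
print("split total:", split_total)
```

The derived values agree with the reported ones to within one person per image, which suggests the reported averages drop the fractional part.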

The UCF-QNRF dataset has the largest number of high-count crowd images and annotations, and a wider variety of scenes containing the most diverse set of viewpoints, densities and lighting variations. The resolution is large compared to WorldExpo10 and ShanghaiTech. The average density, i.e., the number of people per pixel over all images, is also the lowest, signifying high-quality large images. The lower per-pixel density is partly due to the inclusion of background regions, so the images contain many high-density regions as well as zero-density regions. Part A of the ShanghaiTech dataset has high-count crowd images as well; however, they are severely cropped to contain crowds only. The new UCF-QNRF dataset, on the other hand, contains buildings, vegetation, sky and roads as they appear in realistic scenarios captured in the wild. This makes the dataset more realistic as well as more difficult.
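
To make the per-pixel density concrete, the following minimal sketch computes it for a single image and averages it over a collection. It assumes that the average density is the mean of per-image ratios (count divided by pixel area); the function names are hypothetical and not part of any released tooling. Feeding in the average counts and average resolutions from Table 1 gives only a rough estimate, because a ratio of averages is not the same as an average of ratios.

```python
from statistics import mean


def per_pixel_density(count, height, width):
    """People per pixel for one image; the order of height and width does not matter."""
    if height <= 0 or width <= 0:
        raise ValueError("image dimensions must be positive")
    return count / (height * width)


def average_density(images):
    """Mean of per-image densities; `images` is an iterable of (count, height, width)."""
    return mean(per_pixel_density(c, h, w) for c, h, w in images)


# Rough estimate from Table 1: (average count, average resolution) per dataset.
# These approximate, but do not exactly reproduce, the Average Density column.
table1 = {
    "UCF_CC_50": (1279, 2101, 2888),
    "WorldExpo10": (56, 576, 720),
    "ShanghaiTech_PartA": (501, 589, 868),
    "UCF-QNRF": (815, 2013, 2902),
}
approx = {name: per_pixel_density(*row) for name, row in table1.items()}

for name, density in approx.items():
    print(f"{name}: about {density:.2e} people per pixel")
```

With per-image annotation counts and image sizes at hand, `average_density` could be applied directly to the list of images instead of the table averages.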

Moreover, since we collected our dataset from the web and not from surveillance camera videos or simulated crowd scenes, it is very diverse in terms of perspective, image resolution, crowd density and the scenarios in which crowds exist. We also took special care to ensure that images in the dataset come from all parts of the world. Figure 2 shows the geo-tags of images in our dataset, marked on the world map.

Figure 2: Locations of images in our dataset

Similarly, Figure 3(a) shows the diversity in counts among the datasets. The count distribution of the new dataset is similar to that of UCF_CC_50; however, the new dataset is roughly 30 and 20 times larger than UCF_CC_50 in terms of number of images and annotations, respectively. Furthermore, the resolution is large compared to WorldExpo10 and ShanghaiTech, as can be seen in Figure 3(b). We hope the new dataset will significantly increase research activity in visual crowd analysis and will pave the way for building deployable, practical counting and localization systems for dense crowds.
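
The size comparison with UCF_CC_50 follows directly from the image and annotation totals in Table 1. A short illustrative calculation (the variable names are chosen for this example only):

```python
# Image and annotation totals copied from Table 1
ucf_cc_50 = {"images": 50, "annotations": 63_974}
ucf_qnrf = {"images": 1535, "annotations": 1_251_642}

image_ratio = ucf_qnrf["images"] / ucf_cc_50["images"]
annotation_ratio = ucf_qnrf["annotations"] / ucf_cc_50["annotations"]

print(f"{image_ratio:.1f}x images, {annotation_ratio:.1f}x annotations")
# prints "30.7x images, 19.6x annotations"
```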

Figure 3: Count distribution in our dataset

The data set can be downloaded by clicking here.

If you use the data set, please cite the following paper:

H. Idrees, M. Tayyab, K. Athrey, D. Zhang, S. Al-Maadeed, N. Rajpoot, M. Shah, Composition Loss for Counting, Density Map Estimation and Localization in Dense Crowds, in Proceedings of IEEE European Conference on Computer Vision (ECCV 2018), Munich, Germany, September 8-14, 2018.