Third Workshop on Geo-Spatial Computer Vision

Visual Analysis and Geo-Localization of Large-Scale Imagery (ECCV’12, CVPR’13)
Computer Vision for Converging Perspectives (ICCV’13)
Vision from Satellite to Street (ICCV’15)

In conjunction with CVPR 2016, Las Vegas, Nevada, July 1, 2016




Call for Papers | Submission | Committees | Invited Talks | Awards | Program

Mission: Seeing the world from diverse perspectives provides a unique opportunity to understand it better. Today, devices ranging from first-person vision systems (such as smartphones) to space-borne imaging platforms (such as satellites) sense the world around us from wildly different perspectives and with diverse data modalities. In addition, advanced remote sensing technologies such as hyperspectral imaging and synthetic aperture radar can capture information beyond the visible spectrum. The images captured from these different perspectives are complementary, so analyzing them together provides novel solutions for understanding and describing the world. The key to integrating these different perspectives is location.

The problem of visual analysis of satellite-to-street-view imagery arises in a variety of real-world applications. Consumers may be interested in determining when and where an image was taken, who is in the image, what the different objects in the depicted scene are, and how they are related to each other. Local government agencies may be interested in using large-scale imagery to automatically obtain and index useful geographic and geological features and their distributions in a region of interest. Economic forecasters might be interested in how much business a particular retail store conducted, estimated by counting cars in its parking lot over the course of a year. The military may want to know the location of terrorist camps or of activities near restricted zones. Relief agencies may be interested in identifying the hardest-hit areas after a natural disaster. Similarly, local businesses may use content statistics to target their marketing based on the ‘where’, ‘what’, and ‘when’ that can be automatically extracted through visual analysis of satellite-to-street-view imagery.

Despite recent advances in computer vision and large-scale indexing techniques, fine-grained fusion of data capturing different views of the same geo-location remains a challenging task. The problem involves identifying, extracting, and indexing geo-informative features; discovering subtle overlapping geo-location cues in wildly diverse visual data; geometric modeling and reasoning; context-based reasoning; and exploitation and indexing of large-scale aerial and ground imagery. Theoretical foundations from computer graphics, vision, photogrammetry, and robotics can be useful assets in solving the problem. We feel that, due to the growing availability of geo-referenced images and videos, the time is right to investigate the research challenges and opportunities involved in jointly analyzing images and videos captured by different devices and from wildly varying perspectives, yet pointing to the same 3D point in space. Combining this heterogeneous visual data could lead to improved data organization strategies, event understanding systems, and transformative solutions for computer vision challenges. The focus of this workshop, therefore, is to explore techniques that can exploit the rich data provided by converging perspectives: images captured by first-person cameras and aerial images delivered by various air- and space-borne sensors.




The aim of this workshop is to bring together interested researchers from academia, government, and industry working in the fields of computer vision, machine learning, pattern recognition, robotics, remote sensing, and earth observation to address the challenges involved in developing vision systems capable of assimilating image and video data from heterogeneous, multiscale, and multi-perspective imaging platforms for actionable intelligence and scientific discovery. The workshop will provide an interactive forum to engage in discussions, shape potential research directions, and disseminate recent research results. This workshop invites contributions in the form of original papers in the following areas:

  • Complex Event Understanding through Visual Data Fusion
  • Spatiotemporal Integration of Visual Observations
  • First-Person Vision Meets Aerial Vision
  • Registration of Social Network Data with Street View Images
  • Integrating Remote and Proximate Sensing for Land Use/Cover Map Classification
  • Scene Reconstruction from Multi-Dimensional and Multi-View Imagery
  • Understanding and Modeling Uncertainties in Visual and Geospatial Data
  • Semantic Generalization of Visual and Geospatial Data
  • Representation, Indexing, Storage, and Analysis of City-to-Earth Scale Models
  • Automated 3D Modeling Pipelines for Complex Large-Scale Architectures
  • Integrated Processing of Point Clouds, Image, and Video Data
  • Multi-Modal Visual Sensor Data Fusion
  • Design and Development of Architectures that Support Real-Time and Parallel Execution of Algorithms for Earth-Scale Geo-Localization
  • Scene Change Detection and Segment Classification
  • Rendering, Overlay, and Visualization of Models, Semantic Labels, and Imagery
  • Applications of Visual Analysis and Geo-Localization of Large-Scale Imagery
  • Datasets, Model Validation, Algorithm Testing, and Annotation Techniques
  • Matching Information Derived from Ground-Level Images to Satellite/Aerial Images, GIS Data, and DEM Data



Please download the CVPR template from here and submit your paper to

Important Dates

Paper submission: April 7, 2016
Acceptance decision: April 17, 2016
Camera-ready submission: May 2, 2016



General Chairs

Luc Van Gool (ETH Zurich)
Mubarak Shah (University of Central Florida)
Richard Szeliski (Microsoft Research)


Workshop Organizers

Asaad Hakeem (Decisive Analytics)
Marc Pollefeys (ETH Zurich)
Amir Roshan Zamir (Stanford University)
Anil Cheriyadat (Oak Ridge National Laboratory)
Mei Han
Marco Körner (Technische Universität München)
Shawn Newsam (University of California, Merced)
Peter Reinartz (German Aerospace Center (DLR))
Jiangye Yuan (Oak Ridge National Laboratory)


Jana Kosecka (George Mason University)
Torsten Sattler (ETH Zurich)
Nathan Jacobs (University of Kentucky)
Martial Hebert (Carnegie Mellon University)
John Leonard (Massachusetts Institute of Technology)
Himaanshu Gupta (Nokia Here)
Raquel Urtasun (University of Toronto)



Google Best Paper Award
NVIDIA Sponsored Award



Start Time  Talk/Paper (Speaker or Authors, Affiliation)
8:00   Welcome
8:15   Invited Speaker: Torsten Sattler (ETH Zurich)
9:00   Invited Speaker: Nathan Jacobs (University of Kentucky)
10:00  Morning Break
10:30  Detection of small objects, land cover mapping and modelling of uncertainty in urban remote sensing images using deep convolutional neural networks (Michael Kampffmeyer, Arnt-Børre Salberg, Robert Jenssen)
10:55  Automatic Alignment of Indoor and Outdoor Building Models using 3D Line Segments (Tobias Koch, Marco Körner, Friedrich Fraundorfer)
11:20  The TUM-DLR Multimodal Earth Observation Evaluation Benchmark (Tobias Koch, Pablo d’Angelo, Franz Kurz, Friedrich Fraundorfer, Peter Reinartz, Marco Körner)
11:50  Invited Speaker: Raquel Urtasun (University of Toronto)
12:35  Lunch (on your own)
14:35  Invited Speaker: Himaanshu Gupta (Nokia Here)
15:10  Invited Speaker: John Leonard (Massachusetts Institute of Technology)
15:25  Afternoon Break
16:00  Panel Discussion
17:00  Closing Remarks and Awards
17:30  Dinner
