SpaceNet Turns Four

Aug. 10, 2020

Ryan Lewis

Preface: SpaceNet LLC is a nonprofit organization dedicated to accelerating open source, artificial intelligence applied research for geospatial applications, specifically foundational mapping (i.e. building footprint and road network detection). SpaceNet is solely managed by co-founder, In-Q-Tel CosmiQ Works, in collaboration with co-founder and co-chair, Maxar Technologies, and the other partners: Amazon Web Services (AWS), Capella Space, Topcoder, Institute of Electrical and Electronics Engineers (IEEE) Geoscience and Remote Sensing Society (GRSS), the National Geospatial-Intelligence Agency (NGA) and Planet.

Introduction: Crossing a Milestone

It can be difficult to recall the details of the past few days, much less the past few years, given our current socially distanced and quarantined lives. In April, which itself seems rather long ago, The Atlantic likened our new daily, work-from-home routines to a bad version of the 1993 cult classic movie Groundhog Day (1). Despite this challenge, it is worth recalling the summer of 2016 for the sake of this conversation. Netflix launched a new series called Stranger Things, the 2016 Summer Olympics were held in Rio de Janeiro (on schedule) and the world continued its battle against another virus: Zika. It was amidst this different but not necessarily simpler time that IQT’s CosmiQ Works (“CosmiQ”) and Maxar Technologies (then DigitalGlobe) launched SpaceNet, a collaborative initiative designed to encourage the development of open-source, machine learning (ML) techniques for geospatial applications. Although the initial project plan was to release a single, labeled, satellite imagery dataset and host a public data science competition featuring that data, the collaboration has grown significantly through the steady addition of new partner organizations, labeled datasets, public challenges, open source algorithms and detailed evaluations. As SpaceNet officially turns four years old this month, it seems only fitting to use this milestone as an opportunity to reflect upon our progress to date, discuss several lessons learned and consider the potential paths ahead.

Enabling Research Through Partnership

As discussed frequently throughout this blog publication and on our podcast series, Training_Data, we believe rapidly maturing ML technologies, specifically computer vision, will fundamentally disrupt current geospatial analytics products and services. Yet a majority of the initial advancements in open-source computer vision technologies have been built using labeled common photograph datasets such as ImageNet and Common Objects in Context (COCO). The differences inherent in remote sensing data types such as satellite and aerial imagery present unique challenges for leveraging the latest computer vision techniques. SpaceNet was founded to address this gap by accelerating the development of open-source ML capabilities for geospatial applications, specifically the foundational mapping mission. We have relied upon forming partnerships with leading commercial, nonprofit, academic and government organizations in order to gain access to datasets and expertise, as well as build awareness in both the geospatial and computer vision communities. In 2018, we formed SpaceNet into an official nonprofit LLC, solely managed by CosmiQ, to better manage the growth in partnerships and community engagement. There are currently eight 2019–2020 SpaceNet Partners: CosmiQ, Maxar Technologies, AWS, Capella Space, Topcoder, the IEEE GRSS, National Geospatial-Intelligence Agency (NGA), and Planet. Each partner serves on the SpaceNet Advisory Council, which is responsible for designing and implementing datasets and challenges. The Advisory Council is co-chaired by CosmiQ and Maxar.

A comprehensive timeline of SpaceNet’s activities over the years.

Four Pillar Strategy: Building an Ecosystem

SpaceNet’s strategy for developing and maintaining an open-source geospatial analytics ecosystem has four pillars: (1) develop and publish highly curated, labeled remote sensing datasets; (2) design and host public data science challenges targeting a specific foundational mapping problem; (3) release the leading algorithms from each challenge as open source; and (4) conduct detailed evaluations of ML approaches to difficult geospatial problems. This approach has remained largely unchanged since SpaceNet’s founding because the combined output provides practitioners and researchers alike a comprehensive resource spanning the entire supervised ML model lifecycle.

SpaceNet has focused on four strategic pillars since its founding in 2016.

Pillar 1: Data

The single most important SpaceNet pillar is the development and release of highly curated, labeled remote sensing datasets. The lack of high-quality, permissively licensed training data remains the largest single barrier to entry for end users interested in building and deploying a supervised ML model into their products or workflows. As a result, we prioritized the development of a novel dataset for each of our challenges. We limited the feature classes to foundational mapping applications, specifically building footprint and road network labels, in order to systematically expand the current data corpus with each new release. We have open sourced over 27,000 km² of high-resolution satellite imagery and synthetic aperture radar (SAR) data featuring ~900,000 building footprint labels and ~20,000 km of road network labels across 11 cities over five continents. At the time of this writing, parts of the dataset have been downloaded in over 80 countries, totaling over 733 TB of data downloaded through our AWS Open Data S3 repository. We also host several high-quality, third-party datasets from the Intelligence Advanced Research Projects Activity (IARPA) Multiview Stereo Graphic Dataset; the IARPA Functional Map of the World (fMoW) Challenge; and the U.S. Special Operations Command (SOCOM) Urban 3D Challenge.

SpaceNet has averaged over 60 TB in data downloads each quarter since 2019.

We have identified four key characteristics for building geospatial datasets:

(1) Quality and Diversity

(2) Permissive Licensing

(3) Accessibility

(4) Interoperability

First, the development of each dataset must adhere to the highest standards of quality. This includes having a detailed label taxonomy, topology rules, and a strategy for addressing label diversity (e.g., urban vs. rural settings) in the train and test sets. We have sought to systematically add more geographic diversity (e.g., new cities, varying terrains, seasonal effects) with each dataset release, but there is much more we could do in this area. Second, one of the most distinctive aspects of the SpaceNet dataset is its permissive licensing structure. The entire dataset is now licensed under the Creative Commons Attribution-ShareAlike 4.0 International License, which enables researchers and corporations alike to use the datasets in their product development lifecycle. Third, it is important that the datasets remain accessible after the conclusion of each challenge so they can remain an enduring resource for further research. Fourth, we have increasingly focused on making sure the entire dataset is compliant with leading data standards to enable efficient data access and processing. For example, we prioritized the adoption of the Cloud Optimized GeoTIFF (COG) and, more recently, the Spatiotemporal Asset Catalog (STAC) standards (special thanks to the Radiant Earth Foundation team for their support).

Pillar 2: Open Data Science Challenges

We design a public data science challenge around each new dataset. These challenges present an opportunity to crowdsource algorithmic solutions across some of the leading researchers and practitioners in a wide variety of technical domains. Our central strategy since the launch of SpaceNet 1 has been to incrementally increase the technical complexity posed in each challenge. As opposed to taking moonshots, this approach has enabled us to open source models that are useful to current applied research and product development projects. The trick with designing each challenge is finding the appropriate level of added complexity. To date, we have hosted six challenges with our partner Topcoder focused on either building footprint identification or road network extraction and routing estimation. These challenges have attracted over 3,000 technical submissions across the Topcoder community. (Learn more about the Topcoder community here.) We have awarded $300,000 in cash prizes (prizes are offered to the top five submissions, top graduate student submission, and top undergraduate submission) and offered $20,000 in AWS credits that participants can earn throughout the duration of the challenge.

Building Footprints

Our primary technical focus has been building footprint extraction. SpaceNet Challenges 1, 2, 4, and 6 (as well as SpaceNet 7, which is scheduled to launch on August 31, 2020) have asked participants to extract building footprints from remote sensing datasets. We have relied upon the “SpaceNet metric,” a modified F1 score with an intersection over union threshold of 0.5, to evaluate performance of the submissions for each of these challenges (NOTE: SpaceNet 7 will feature a new evaluation metric analyzing footprint detections over time). We have incrementally increased the technical complexity of our building footprint challenges by adding geographic diversity, changing the look angles of the imagery, and incorporating a multimodal dataset featuring both high-resolution satellite imagery and SAR data.
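To make the idea of an F1 score with an intersection over union (IoU) threshold concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the official SpaceNet scoring code: footprints are simplified to axis-aligned rectangles rather than polygons, each proposal is greedily matched to the unmatched ground-truth footprint with the highest IoU, and a match counts when the IoU is at or above the threshold. The names `iou` and `footprint_f1` are hypothetical.

```python
from typing import List, Tuple

# Assumption: a footprint is an axis-aligned box (xmin, ymin, xmax, ymax).
# Real building footprints are polygons; boxes keep the sketch dependency-free.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = overlap_w * overlap_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def footprint_f1(proposals: List[Box], truths: List[Box],
                 threshold: float = 0.5) -> float:
    """F1 score where a proposal is a true positive if it matches a
    not-yet-matched ground-truth footprint with IoU >= threshold."""
    unmatched = list(truths)
    true_positives = 0
    for proposal in proposals:
        best_index, best_iou = -1, 0.0
        for index, truth in enumerate(unmatched):
            score = iou(proposal, truth)
            if score > best_iou:
                best_index, best_iou = index, score
        if best_index >= 0 and best_iou >= threshold:
            true_positives += 1
            # Each ground-truth footprint can be matched at most once.
            unmatched.pop(best_index)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(proposals)
    recall = true_positives / len(truths)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
    proposals = [(0, 0, 10, 9), (50, 50, 60, 60)]
    print(footprint_f1(proposals, truths))
```

In this toy example one proposal overlaps a ground-truth footprint well and the other overlaps nothing, so precision and recall are both one half. Removing a matched footprint from the candidate pool is the design choice that prevents several proposals from claiming credit for the same building.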

Road Network Identification and Routing Estimation

Identification of road networks and routing estimation has been SpaceNet’s second major focus area. SpaceNet Challenges 3 and 5 asked participants to extract road networks from satellite imagery as well as estimate routes and travel time. The prediction of travel routes and timing from exclusively remote sensing data is a particularly novel technical challenge and also has direct applicability to certain disaster response scenarios. In order to appropriately evaluate model performance for these challenges, my colleague and IQT Labs Chief Data Scientist, Adam Van Etten, developed the Average Path Length Similarity (APLS) metric. We increased the technical complexity between SpaceNet 3 and 5 by asking participants to add a time estimation prediction to their estimated routes. We also included a mystery city (which turned out to be Dar es Salaam, Tanzania) in our test dataset that participants did not have access to during the challenge to test the generalizability of their models.
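The intuition behind a path-length-similarity score can be sketched in a few lines of Python. This is a simplified illustration, not the reference APLS implementation, and the details are assumptions rather than something stated in this post: both graphs are assumed to share node names (the real metric has to snap nodes between the ground-truth and proposed networks), edge weights are assumed to be positive, and each ground-truth route contributes a penalty of min(1, |L_truth - L_proposal| / L_truth), with a full penalty of 1 when the proposal has no route at all. The names `shortest_lengths` and `path_length_similarity` are hypothetical.

```python
import heapq
from typing import Dict

# Assumption: a road network is a weighted adjacency map, e.g.
# {"A": {"B": 1.0}, "B": {"A": 1.0}}, with strictly positive edge weights.
Graph = Dict[str, Dict[str, float]]


def shortest_lengths(graph: Graph, source: str) -> Dict[str, float]:
    """Dijkstra's algorithm: shortest path length from source to every
    node reachable from it."""
    distances = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > distances.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < distances.get(neighbor, float("inf")):
                distances[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return distances


def path_length_similarity(truth: Graph, proposal: Graph) -> float:
    """1 minus the mean penalty over all routes that exist in the
    ground-truth graph. A score of 1.0 means every route length matches."""
    penalties = []
    for start in truth:
        truth_lengths = shortest_lengths(truth, start)
        proposal_lengths = (
            shortest_lengths(proposal, start) if start in proposal else {}
        )
        for end, truth_length in truth_lengths.items():
            if end == start:
                continue
            if end not in proposal_lengths:
                penalties.append(1.0)  # route missing from the proposal
            else:
                difference = abs(truth_length - proposal_lengths[end])
                penalties.append(min(1.0, difference / truth_length))
    if not penalties:
        return 0.0
    return 1.0 - sum(penalties) / len(penalties)


if __name__ == "__main__":
    truth = {"A": {"B": 1.0}, "B": {"A": 1.0, "C": 1.0}, "C": {"B": 1.0}}
    missing_edge = {"A": {"B": 1.0}, "B": {"A": 1.0}, "C": {}}
    print(path_length_similarity(truth, truth))
    print(path_length_similarity(truth, missing_edge))
```

Because the score is built from routes rather than pixels, a single missing road segment is penalized in proportion to how many routes it breaks. Under the same assumptions, using travel time instead of length as the edge weight would give a time-based variant of the score.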

SpaceNet has prioritized an incremental approach to increasing analytic complexity per challenge.

Pillar 3: Algorithms

At the conclusion of each challenge, we evaluate and open source the top algorithms on our GitHub repository. In order to lower the barrier to entry and promote use in future work, each algorithm includes a Dockerfile and carries a permissive Apache 2.0 license. We have released 28 algorithms from the first six challenges: 18 for building footprint detection and 10 for road network detection and routing estimation. It has been particularly exciting to see the technical submissions evolve over time to leverage some of the cutting-edge deep learning methods. For instance, four out of five top submissions from the SpaceNet 6 challenge utilized the recently released EfficientNet. Beyond the winning algorithms from the challenges, we also release baseline algorithms developed by CosmiQ prior to the launch of each challenge. We started releasing baselines with the launch of the SpaceNet 4 Challenge as a way to help participants, particularly those unfamiliar with geospatial datasets, get started quickly.

Pillar 4: Evaluations and Research

If you have followed SpaceNet over the years, then you will have noticed that we have dramatically increased the level of post-challenge evaluation and analysis since the SpaceNet 2 and 3 challenges. These evaluations, either posted on this blog publication or through our other resources (e.g., video tutorials, podcasts or published papers), help explain the strengths and weaknesses in the open sourced winning models. In order to share our results and progress, we presented at industry conferences including State of the Map US, the GEOINT Symposium, FOSS4G, FedGeoDay and NVIDIA GTC. Recently, we have expanded our engagement with the research community through participation in some of the leading computer vision and artificial intelligence conferences. We recently presented papers at the Conference on Computer Vision and Pattern Recognition (CVPR) and the International Conference on Computer Vision (ICCV). The results and evaluations from the SpaceNet 6 challenge were featured at the CVPR EarthVision 2020 Workshop (special thanks to our Partners at IEEE GRSS). In keeping with this tradition, the results from the upcoming SpaceNet 7 Challenge will be featured in the Neural Information Processing Systems (NeurIPS) 2020 Competition track.

Looking Ahead

SpaceNet has truly exceeded our initial expectations set back in 2016. It has been particularly rewarding to see companies, research organizations, government entities, and academia utilize SpaceNet resources. Looking towards year five and beyond, we want to continue to build upon the current ecosystem while adding new projects. First, we will continue to develop novel datasets and data science challenges that feature exciting technical trends in the geospatial industry. The upcoming SpaceNet 7 Challenge featuring Planet’s deep time series data stack illustrates our commitment to creating novel datasets. Beyond SpaceNet 7, we are thinking about a variety of options including new data types, some non-remote sensing data, expanded multimodal datasets (similar to SpaceNet 6), additional labels for existing areas of interest, and new feature types. If you have any suggestions, then please do not hesitate to reach out. Second, the partners are considering new projects either supporting or developing geospatial data standards. Third, we are planning several new offerings to enhance our existing resources. For example, we recently launched a collaboration with Azavea allowing anyone to create their own features in their new offering, Groundwork (see Azavea’s recent blog post for additional information). Based on a lot of feedback from end users, we are also planning to offer a persistent scoreboard so anyone can compare their scores to past challenge leaderboards. Fourth and finally, SpaceNet would not be possible without the collaboration and support from each of the SpaceNet Partners. If your organization is interested in learning more, then please reach out at cosmiq@iqt.org.

Thanks to all of the SpaceNet Partners and Advisory Council members over the years including the current members: AWS’s Joe Flasher, Grace Kitzmiller, Maggie Carter, and Jed Sundwall; Capella Space’s Jason Brown and Scott Soenen; Topcoder’s Dan Reitz, Clinton Bonner, Michael Contreras, and Andy LaMora; the IEEE GRSS’s Ronny Hänsch and Fabio Pacifici; NGA’s Andy Spage, Keisha Roach and Derek Johnson; and Planet’s Jesus Martinez Manso, Giovanni Marchisio, Chris Holmes and Claire Bentley. Also, special thanks to SpaceNet’s co-founder and co-chair, Maxar Technologies, especially Tony Frazier, Walter Scott, Todd Bacastow (Advisory Council Co-Chair), Omar Mahmoud, and Kristin Carringer. Last but certainly not least, it has been a privilege to work with all of the current and past members and interns of In-Q-Tel CosmiQ Works who all helped make SpaceNet a reality: Adam Van Etten, Jake Shermeyer, Daniel Hogan, Christyn Zehnder, Jeremy Joseph, Nick Weir, David Lindenbaum, Lisa Porter, Todd Stavish, Lee Cohn, Patrick Hagerty, Rosham Ram and Arihant Chadda.

(1) Garber, Megan, “Groundhog Day Was a Horror Movie All Along,” The Atlantic, April 30, 2020.
