SpaceNet 6: Data Fusion and Colorization

Preface: SpaceNet LLC is a nonprofit organization dedicated to accelerating open source artificial intelligence applied research for geospatial applications, specifically foundational mapping (i.e., building footprint and road network detection). SpaceNet is run in collaboration by co-founder and managing partner CosmiQ Works, co-founder and co-chair Maxar Technologies, and our partners including Amazon Web Services (AWS), Capella Space, Topcoder, IEEE GRSS, the National Geospatial-Intelligence Agency, and Planet.


When we first created the SpaceNet 6 dataset, we were quite interested in testing the ability to transform SAR data to look more like optical imagery. We thought that this transformation would improve our ability to detect buildings and other objects in SAR data. Some research had been done in this field, with a few approaches successfully transforming SAR to look like optical imagery at coarser resolutions (1, 2, 3). We had tried some similar approaches with the SpaceNet 6 data but found the results underwhelming and the model outputs incoherent. We concluded that translating SAR data to look nearly identical to optical data is quite challenging at such a high resolution and may not be a viable solution.

A few weeks later, one of our colleagues, Jason Brown from Capella Space, sent us an interesting image he had created. He had merged the RGB optical imagery and SAR data using a Hue, Saturation, Value (HSV) data fusion approach. The approach was intriguing because it preserved the valuable structural information in SAR while introducing the color of optical imagery.

The inception of this workflow: can we teach a neural network to automatically fuse RGB and SAR data, as shown in the middle image above? Will this help us improve the quality of our segmentation networks for extracting building footprints?

Seeing the image above triggered an idea: instead of converting SAR directly to optical imagery, what if we worked somewhere in the middle, training a network to convert SAR to look like the fused product shown above? On the surface this task seems far easier than the direct conversion and could be beneficial in a workflow. We hypothesized that a segmentation network would find this color useful for building detection. Moreover, we wanted to test this approach in an area where we didn’t have concurrent SAR and optical collects, simulating a common real-world scenario in which inconsistent orbits or cloud cover (which renders optical sensors useless) prevent concurrent collection.

In this final blog in our post-SpaceNet 6 challenge analysis series (previous posts: [1,2,3]), we dive into a topic that has interested us for a long while: can colorizing synthetic aperture radar data with a deep learning network really improve your ability to detect buildings?

The Approach and Results

To achieve our analysis goals we constructed the following workflow:

I. SAR and Optical Data Fusion

To begin our workflow, we fuse our SAR and RGB optical imagery using a Hue, Saturation, Value (HSV) image fusion technique. We again use the larger 1200 × 1200 pixel images introduced in the previous blog. The process is as follows:

  1. We first convert our quad-polarized, 4-channel SAR data to a single channel by calculating the total polarimetric scattering power for each pixel. When working with polarimetric radar, this is known as calculating the ‘SPAN’, or creating a SPAN image.
  2. We then transform our RGB image to the HSV color space and replace the value channel with our single-channel SAR SPAN image.
  3. Finally, we convert the [Hue, Saturation, SAR SPAN] image back to RGB color space.
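
The three steps above can be sketched in NumPy as follows. This is a minimal sketch, not the downloadable code mentioned below: it assumes the four polarization channels are linear-power intensities stacked channel-last as an (H, W, 4) array, so that SPAN = |S_HH|² + |S_HV|² + |S_VH|² + |S_VV|² is simply their per-pixel sum, and that the RGB tile is a float array in [0, 1]. The decibel conversion and the 2nd/98th percentile stretch that map SPAN into the [0, 1] value range are our own illustrative assumptions, and the HSV conversions are vectorized versions of the formulas in Python's standard-library `colorsys` module.

```python
import numpy as np


def rgb_to_hsv(rgb):
    """Vectorized RGB -> HSV for floats in [0, 1], shape (..., 3).
    Uses the same formulas as the standard-library colorsys module."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    maxc = rgb.max(axis=-1)
    delta = maxc - rgb.min(axis=-1)
    v = maxc
    s = np.where(maxc > 0, delta / np.where(maxc > 0, maxc, 1.0), 0.0)
    safe = np.where(delta > 0, delta, 1.0)  # avoid divide-by-zero on gray pixels
    rc, gc, bc = (maxc - r) / safe, (maxc - g) / safe, (maxc - b) / safe
    h = np.where(r == maxc, bc - gc,
                 np.where(g == maxc, 2.0 + rc - bc, 4.0 + gc - rc))
    h = np.where(delta > 0, (h / 6.0) % 1.0, 0.0)  # gray pixels get hue 0
    return np.stack([h, s, v], axis=-1)


def hsv_to_rgb(hsv):
    """Vectorized HSV -> RGB, the inverse of rgb_to_hsv."""
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    i = np.floor(h * 6.0)
    f = h * 6.0 - i
    p, q, t = v * (1.0 - s), v * (1.0 - s * f), v * (1.0 - s * (1.0 - f))
    sector = [(i % 6) == k for k in range(6)]  # which sixth of the hue circle
    r = np.select(sector, [v, q, p, p, t, v])
    g = np.select(sector, [t, v, v, q, p, p])
    b = np.select(sector, [p, p, t, v, v, q])
    return np.stack([r, g, b], axis=-1)


def span_image(sar):
    """Total polarimetric power: per-pixel sum of the four linear intensity channels."""
    return sar.sum(axis=-1)


def stretch(x, low_pct=2.0, high_pct=98.0):
    """Percentile stretch to [0, 1] (an illustrative assumption, not from the post)."""
    lo, hi = np.percentile(x, [low_pct, high_pct])
    return np.clip((x - lo) / max(hi - lo, 1e-12), 0.0, 1.0)


def fuse_rgb_sar(rgb, sar, eps=1e-12):
    """Replace the HSV value channel of an RGB tile with the stretched SAR SPAN."""
    span_db = 10.0 * np.log10(span_image(sar) + eps)  # step 1: SPAN, in dB (assumed)
    hsv = rgb_to_hsv(rgb)                             # step 2: RGB -> HSV
    hsv[..., 2] = stretch(span_db)                    # step 2: swap V for SPAN
    return hsv_to_rgb(hsv)                            # step 3: [H, S, SPAN] -> RGB
```

Calling `fuse_rgb_sar(rgb_tile, sar_tile)` returns an (H, W, 3) float image. Because V = max(R, G, B) in HSV, each fused pixel's brightest channel equals the stretched SPAN value, while its hue and saturation come from the optical tile.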

The final result is an interesting fusion of optical and SAR imagery. We perform this technique for all three data splits. You can download the code to do this here:

II. Training a Colorization Network

We next train the Pix2Pix (Code) Generative Adversarial Network (GAN) using its colorization mode. We modify the base network slightly to work in the HSV colorspace rather than the LAB colorspace. The network takes a SAR SPAN image as input, learns the corresponding hue and saturation values, and then uses those values to create a colorized output. We actually train on the ‘test_public’ split of the SpaceNet 6 dataset for this process. This ensures that the model does not overfit to the training subset that we later use to train a segmentation network, or to our third, final testing split of the dataset.
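
To make the HSV modification concrete, here is a minimal sketch of how a fused tile could be split into a Pix2Pix-style training pair, by analogy with the LAB colorization mode (lightness channel in, two color channels out): the V channel (the SAR SPAN) becomes the one-channel input and the H and S channels become the two-channel target, both rescaled to [-1, 1] to match a tanh generator output. The function names are hypothetical and this is not our modified Pix2Pix code; it wraps the standard-library `colorsys` converters with `np.vectorize` for brevity, which is slow on full tiles.

```python
import colorsys

import numpy as np

# Element-wise wrappers around the standard-library converters (slow but simple).
_rgb_to_hsv = np.vectorize(colorsys.rgb_to_hsv, otypes=[float, float, float])
_hsv_to_rgb = np.vectorize(colorsys.hsv_to_rgb, otypes=[float, float, float])


def make_training_pair(fused_rgb):
    """Hypothetical helper: split a fused tile (floats in [0, 1], shape (H, W, 3))
    into input = V channel (the SAR SPAN), shape (1, H, W), and
    target = (H, S) channels, shape (2, H, W), both scaled to [-1, 1]."""
    h, s, v = _rgb_to_hsv(fused_rgb[..., 0], fused_rgb[..., 1], fused_rgb[..., 2])
    x = v[np.newaxis] * 2.0 - 1.0
    y = np.stack([h, s]) * 2.0 - 1.0
    return x.astype(np.float32), y.astype(np.float32)


def reassemble(x, y_pred):
    """Hypothetical helper: combine the input V channel with predicted (H, S)
    and convert back to an (H, W, 3) RGB image."""
    v = np.clip((x[0] + 1.0) / 2.0, 0.0, 1.0)
    h, s = np.clip((y_pred + 1.0) / 2.0, 0.0, 1.0)
    r, g, b = _hsv_to_rgb(h, s, v)
    return np.stack([r, g, b], axis=-1)
```

One design caveat of this layout: hue is circular, so a plain L1 loss on the H channel (Pix2Pix combines an adversarial loss with L1) treats hues near 0 and near 1 as far apart even though they look alike.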

Training is quite fast for this approach, taking ~5 hours on 4 NVIDIA Titan Xp GPUs with 12GB of memory each. Notably, no challenge participants attempted a colorization or domain adaptation process that used the optical data to pre-process the SAR in any fashion. This short training time shows that such an approach was feasible within the challenge, although it may not be worth doing.

III. Creating Colorized Outputs

We next run inference, creating our colorized outputs. In the figure below we can see the fused RGB and SAR (which we label as ground truth) in the leftmost column, our SAR SPAN image (the input into Pix2Pix) in the center column, and our Pix2Pix predicted output in the rightmost column. Simply put, we train Pix2Pix to read in the images in the center column and create the output images in the rightmost column. Inference is a bit slow at ~5.4 seconds per image tile. However, this implementation uses only 1 CPU for data loading, so it could likely be sped up many times over with some parallel processing.
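
If data loading is the bottleneck, one generic option is to read tiles on a pool of worker threads while the main thread runs the generator. The sketch below is an illustration under assumptions, not the pipeline used here: `load_tile` and `predict` are hypothetical placeholders for your own raster reader and trained model. Threads help when the reader releases the GIL during I/O; for CPU-heavy preprocessing, a process pool or a PyTorch `DataLoader` with `num_workers > 0` is the more usual choice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, Tuple

import numpy as np


def parallel_inference(
    paths: Iterable[str],
    load_tile: Callable[[str], np.ndarray],       # hypothetical: read + normalize one tile
    predict: Callable[[np.ndarray], np.ndarray],  # hypothetical: run the trained generator
    workers: int = 8,
) -> Iterator[Tuple[str, np.ndarray]]:
    """Yield (path, prediction) pairs in input order, loading tiles on worker threads."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map submits every load immediately and returns results in order,
        # so reads for later tiles proceed while earlier tiles go through the model.
        # For very large tile sets, feed `paths` in chunks to bound memory use.
        for path, tile in zip(paths, pool.map(load_tile, paths)):
            yield path, predict(tile)
```

Because results come back in input order, each prediction can be written out under the same name as its source tile.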

Visually these results are quite promising; however, what really matters is whether this preprocessing step helps you extract buildings. Ultimately, we’re interested in testing whether this is worth doing when you have some optical data from a different location and time, but only SAR data where you want to perform inference. Of note, Pix2Pix tends to apply green colors a bit too frequently, which could further complicate building extraction.

IV. Final Exams

The final step in our pipeline is to once again train the algorithm of SpaceNet 6 winner zbigniewwojna on the fused data (left column), the SAR SPAN (center column), and the colorized data (right column), and to test our results against the ground truth. This side-by-side comparison will enable us to evaluate:

  1. Is SAR/RGB fusion worth doing?
  2. Is colorizing SAR data with a deep learning network when you lack RGB imagery a plausible solution?

Performance of different model inputs into the SpaceNet 6 winning algorithm. The data types being scored are shown in the figure in the previous section. The overall score represents the SpaceNet Metric (× 100). We also report model precision (the fraction of predicted polygons that correctly match a ground truth polygon, which drops with false predictions) and recall (the fraction of ground truth polygons that are detected, which drops with missed polygons).
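
For reference, precision, recall, and the overall score can be computed from matched polygon counts as below. This sketch assumes the standard SpaceNet building-footprint matching, in which a prediction counts as a true positive if its IoU with an unmatched ground truth polygon is at least 0.5; the SpaceNet Metric is then the F1 score (the harmonic mean of precision and recall), reported above multiplied by 100. The IoU matching step itself is not shown, and the function name is hypothetical.

```python
def spacenet_scores(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall, and SpaceNet-style score (F1 x 100) from polygon counts.

    tp: predicted polygons matched to a ground truth polygon (IoU >= 0.5)
    fp: predicted polygons with no match (false predictions)
    fn: ground truth polygons with no match (missed buildings)
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "score": 100 * f1}
```

With hypothetical counts of 60 true positives, 40 false positives, and 20 misses, this gives precision 0.6, recall 0.75, and a score of about 66.7.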

On the whole, these results are fairly disappointing, but there are some good lessons to be learned from this workflow. The main takeaway is that SAR and RGB fusion is worth doing: it provides a 55% performance boost over using the SAR SPAN alone. Additionally, using the SAR SPAN is only slightly less performant than using all 4 polarizations for this task (39.5 vs. 42.4).
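
To put these numbers in relative terms, here is a back-of-the-envelope check, assuming the 55% boost is measured relative to the SAR SPAN score of 39.5:

$$
\frac{42.4 - 39.5}{39.5} \approx 0.073, \qquad 1.55 \times 39.5 \approx 61.2
$$

That is, the four-polarization input scores about 7% higher than the SPAN alone, and, if the 55% figure is exact, the fused input scores roughly 61.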

Unfortunately, recreating this data fusion process with a deep learning approach is quite difficult. We found that the GAN tends to colorize things inconsistently and to leave behind artifacts that confuse our segmentation model. We also tried using CycleGAN for a domain adaptation approach, as well as other SAR inputs; however, the results were similarly poor. Overall, we took the most direct approach and made few modifications to the existing Pix2Pix model architecture. Additional training data, different inputs or colorspaces, modified data loaders, and a customized network are likely baseline requirements to improve these results and move this research forward.

An output image from Pix2Pix in the HSV colorspace. Although this image is strangely beautiful, the artifacts left behind by the GAN are certainly not helpful as an input into a building segmentation model.


This blog marks the end of our post-SpaceNet 6 analysis series. We learned quite a bit along the way including:

  1. EfficientNet rises above the competition.
  2. What are the effects of building height and size on model performance? Tall, short, or small buildings can be quite challenging when working with SAR data.
  3. How well do models perform at city scales? 20% better than on individual tiles.
  4. How many SAR revisits are necessary to maximize model performance for extracting static objects like buildings? Four.
  5. Does RGB and SAR fusion help you detect buildings? Absolutely.
  6. Should I train a network to colorize SAR data and apply it to an area where I lack optical imagery? You should try, but our results are very preliminary and lack the rigor required to maximize performance.

What’s Next?

Over the coming weeks, we will close out the SpaceNet 6 challenge by open-sourcing the prize-winning algorithms and releasing an expanded SpaceNet 6 dataset, which will introduce both phase information and complex data.

Special thanks to Jason Brown at Capella Space for inspiring this blog and research.

IQT Blog