Airbus - Satellite Change Detection

1. Context and objectives

For teams that analyze satellite imagery to detect changes, the sheer temporal and spatial scale of the data makes it tedious to identify the most relevant changes (e.g., the appearance of a deforestation zone, a new building or road in an area of interest, or the flooding or drying of land). The proposed approach is as follows:

Extract local image representations using large visual or multimodal foundation models, both general and in-domain (SAM2, DinoV2, Clay, Anysat)
Detect changes in local representations from one image to another within a time series
Cluster the variations in local representations
Detect the most relevant atypical variations by ranking them based on rarity or magnitude of change
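
The last two steps (clustering the variations, then ranking them by rarity or magnitude) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the use of k-means, the function names `kmeans` and `rank_variations`, the definition of rarity as one minus the cluster's share of all variations, and the toy data are all hypothetical choices made for the example.

```python
import numpy as np

def kmeans(x, k, n_iter=20):
    """Plain k-means with a deterministic farthest-first initialisation (assumed choice)."""
    centroids = [x[0]]
    for _ in range(k - 1):
        # Pick the point farthest from all centroids chosen so far.
        d = np.min([np.linalg.norm(x - c, axis=1) for c in centroids], axis=0)
        centroids.append(x[d.argmax()])
    centroids = np.array(centroids, dtype=float)

    def assign(c):
        return np.linalg.norm(x[:, None, :] - c[None, :, :], axis=2).argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = x[labels == j].mean(axis=0)
    return assign(centroids)

def rank_variations(deltas, k=2):
    """Cluster variation vectors and rank clusters, rarest (then largest) first."""
    labels = kmeans(deltas, k)
    norms = np.linalg.norm(deltas, axis=1)
    clusters = []
    for j in np.unique(labels):
        members = labels == j
        clusters.append({
            "cluster": int(j),
            "size": int(members.sum()),
            "rarity": float(1.0 - members.mean()),      # hypothetical rarity score
            "magnitude": float(norms[members].mean()),  # mean norm of the variation
        })
    return sorted(clusters, key=lambda c: (c["rarity"], c["magnitude"]), reverse=True)

# Toy data: 95 small variations (noise) and 5 large ones (an atypical change).
rng = np.random.default_rng(0)
deltas = np.vstack([rng.normal(0.0, 0.1, (95, 4)), rng.normal(5.0, 0.1, (5, 4))])
ranked = rank_variations(deltas, k=2)
for c in ranked:
    print(c)
```

Here each row of `deltas` stands for the difference between a patch's embeddings at two dates. Ranking by rarity first surfaces small clusters of unusual variations; sorting by magnitude alone would be an equally valid reading of the step above.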

The analyst can then select a change from among the atypical variations detected by the model and focus their attention on this type of change at other points in the time series or in another area (another time series).

After exploring the DynamicEarthNet dataset, the study compared how well various foundation models detect the changes labeled in the ground truth. The most effective models were then used to extract time series of satellite image embeddings, on which unsupervised change detection methods were implemented.

2. Dataset and preprocessing

We used the DynamicEarthNet dataset for this project (see Figure 1). It can be downloaded here.

Figure 1 - Visualization of the DynamicEarthNet dataset

This dataset consists of daily, multi-spectral satellite observations of 75 selected areas of interest distributed over the globe with imagery from Planet Labs. These observations are paired with pixel-wise monthly semantic segmentation labels of seven land use and land cover (LULC) classes.

After carefully checking that the images were aligned, which our change detection task requires, we estimated ground truth changes from the LULC labels.
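
One plausible way to derive ground truth changes from the monthly labels is to flag every pixel whose LULC class differs from the previous month. The sketch below assumes the labels are available as an integer array of shape (months, height, width); the function name `label_change_masks` and the toy label maps are hypothetical, and the exact rule used in the study may differ.

```python
import numpy as np

def label_change_masks(labels):
    """Return a (T-1, H, W) boolean array, True where the class differs from the previous month.

    labels: (T, H, W) integer array of monthly LULC class maps (assumed layout).
    """
    labels = np.asarray(labels)
    return labels[1:] != labels[:-1]

# Toy example: three monthly 2x2 label maps with class ids chosen arbitrarily.
monthly_labels = np.array([
    [[0, 0], [1, 1]],
    [[0, 2], [1, 1]],   # one pixel switches from class 0 to class 2
    [[0, 2], [1, 1]],   # no change
])
change_masks = label_change_masks(monthly_labels)
print(change_masks.sum(axis=(1, 2)))  # number of changed pixels per month-to-month step
```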

3. Foundation models benchmark

Image embeddings were generated using four foundation models:

SAM2 (general)
DinoV2 (general)
Clay (trained on satellite images)
Anysat (trained on satellite images)

To evaluate each model, we compared how each patch's embedding evolved over time with the ground truth changes aggregated over the same patch.

Figure 2 - Comparison Methodology between extracted embeddings and ground truth
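
The comparison can be sketched as two per-patch quantities: the fraction of changed pixels inside each patch (aggregated ground truth) and a distance between the patch's embeddings at two dates. In this minimal sketch, cosine distance, non-overlapping square patches, and the names `patch_change_fraction` and `cosine_distance` are assumptions for illustration; the source does not state which distance or aggregation was used.

```python
import numpy as np

def patch_change_fraction(change_mask, patch):
    """Fraction of changed pixels in each non-overlapping patch x patch tile."""
    h, w = change_mask.shape
    # Crop to a multiple of the patch size, then split into tiles.
    tiles = change_mask[: h - h % patch, : w - w % patch].astype(float)
    tiles = tiles.reshape(h // patch, patch, w // patch, patch)
    return tiles.mean(axis=(1, 3))

def cosine_distance(a, b, eps=1e-12):
    """Cosine distance between embeddings a and b along the last axis."""
    num = (a * b).sum(axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return 1.0 - num / den

# Toy example: a 4x4 change mask whose top-left 2x2 block changed,
# and a 2x2 grid of 3-dimensional patch embeddings at two dates.
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
gt_per_patch = patch_change_fraction(mask, patch=2)

emb_t0 = np.tile([1.0, 0.0, 0.0], (2, 2, 1))
emb_t1 = emb_t0.copy()
emb_t1[0, 0] = [0.0, 1.0, 0.0]          # only the top-left patch embedding moves
emb_change = cosine_distance(emb_t0, emb_t1)

print(gt_per_patch)
print(emb_change.round(3))
```

A model scores well when patches with a high changed-pixel fraction also show a large embedding distance, which is what a ROC analysis over patches measures.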

Results were analyzed across different types of changes, including various temporal and semantic patterns. A detailed case study is presented in the next section.

4. Use case

The selected use case is an extended drought, which is clearly visible in the satellite images (see Figure 3) and is also reflected in the ground truth (see Figure 4).

Figure 3 - Extract of satellite images for the use case

Figure 4 - Ground truth evolution over time for the case study

The ROC curve for this case study shows better performance for the two in-domain foundation models: Clay and Anysat (see Figure 5).

Figure 5 - ROC Curve obtained for the case study
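
For readers who want to reproduce this kind of curve, the sketch below computes ROC points and the area under the curve from per-patch change scores and binary ground truth labels. It is a generic NumPy illustration rather than the study's evaluation code: the function names `roc_points` and `auc` and the four toy values are assumptions, and how the per-patch ground truth was binarized is not specified in the source.

```python
import numpy as np

def roc_points(y_true, scores):
    """False and true positive rates for every distinct score threshold.

    Requires at least one positive and one negative label.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    fpr, tpr = [0.0], [0.0]
    for t in np.unique(scores)[::-1]:          # thresholds from high to low
        pred = scores >= t
        tpr.append((pred & y_true).sum() / y_true.sum())
        fpr.append((pred & ~y_true).sum() / (~y_true).sum())
    return np.array(fpr), np.array(tpr)

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy example: 1 = patch changed in the ground truth, score = embedding distance.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, scores)
roc_auc = auc(fpr, tpr)
print(roc_auc)  # 0.75
```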

Patch-level analysis shows a strong tendency for SAM2 and DinoV2 to produce false positives. Their larger patch size also makes detection less precise (see Figure 6).

Figure 6 - Comparison of the performance of the four models on the drought use case

Figure 6b - Anysat vs Clay model comparison on drought detection

Figure 6c - Clay model detailed analysis on another region (Contributor: Martin LE CORRE)

5. Automatic change detection

We tested a local change detection method per patch by comparing new observations to reference observations, as illustrated in Figure 7.

Figure 7 - Local change detection method
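
A distance-based variant of this idea can be sketched as follows: summarize a patch's reference embeddings by their centroid, then score each new observation by its distance to that centroid. The centroid summary, the mean plus three standard deviations threshold, the function name `distance_scores`, and the toy embeddings are all assumptions made for the example; the method shown in Figure 7 may be defined differently.

```python
import numpy as np

def distance_scores(reference, new, n_sigma=3.0):
    """Score new observations of one patch by distance to the reference centroid.

    reference: (N_ref, D) embeddings of the patch at reference dates.
    new:       (N_new, D) embeddings of the same patch at later dates.
    Returns (scores, flags); flags are True above an assumed threshold of
    mean + n_sigma * std of the reference distances.
    """
    centroid = reference.mean(axis=0)
    ref_d = np.linalg.norm(reference - centroid, axis=1)
    new_d = np.linalg.norm(new - centroid, axis=1)
    threshold = ref_d.mean() + n_sigma * ref_d.std()
    return new_d, new_d > threshold

# Toy example: 30 reference embeddings, then 3 unchanged and 3 shifted observations.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 0.1, (30, 8))
new = np.vstack([rng.normal(0.0, 0.1, (3, 8)), rng.normal(2.0, 0.1, (3, 8))])
dist_scores, dist_flags = distance_scores(reference, new)
print(dist_scores.round(2), dist_flags)
```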

Figure 8 shows two clearly separated clusters, one before and one after the change. Both the distance-based method and the LOF (Local Outlier Factor) score identify the change, and the LOF score better separates noise from real changes.

Figure 8 - Results of the local change detection method on the case study
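
The LOF score compares the local density around a new observation with the density around its nearest reference neighbours; values near 1 indicate an observation that fits the reference set, and larger values indicate an outlier. The sketch below is a compact NumPy version of LOF in the novelty setting (new observations scored against a fixed reference set). The neighbourhood size `k=5`, the function name `lof_scores`, and the toy data are assumptions; it requires more than `k` reference observations and does not handle tied distances specially. In practice, scikit-learn's `LocalOutlierFactor` with `novelty=True` provides a maintained implementation.

```python
import numpy as np

def _pairwise(a, b):
    """Euclidean distances between every row of a and every row of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

def lof_scores(reference, new, k=5):
    """LOF of each new observation relative to the reference set."""
    # k nearest neighbours of each reference point among the other reference points.
    d_rr = _pairwise(reference, reference)
    np.fill_diagonal(d_rr, np.inf)
    nn_rr = np.argsort(d_rr, axis=1)[:, :k]
    d_nn_rr = np.take_along_axis(d_rr, nn_rr, axis=1)
    k_dist = d_nn_rr[:, -1]                        # distance to the k-th neighbour
    # Reachability distance: reach(p, o) = max(k_dist(o), d(p, o)).
    reach_rr = np.maximum(k_dist[nn_rr], d_nn_rr)
    lrd_ref = 1.0 / (reach_rr.mean(axis=1) + 1e-12)  # local reachability density

    # Same quantities for the new observations, using reference neighbours only.
    d_nr = _pairwise(new, reference)
    nn_nr = np.argsort(d_nr, axis=1)[:, :k]
    d_nn_nr = np.take_along_axis(d_nr, nn_nr, axis=1)
    reach_nr = np.maximum(k_dist[nn_nr], d_nn_nr)
    lrd_new = 1.0 / (reach_nr.mean(axis=1) + 1e-12)
    return lrd_ref[nn_nr].mean(axis=1) / lrd_new

# Toy example: 30 reference embeddings, then 3 unchanged and 3 shifted observations.
rng = np.random.default_rng(0)
ref_emb = rng.normal(0.0, 0.1, (30, 8))
new_emb = np.vstack([rng.normal(0.0, 0.1, (3, 8)), rng.normal(2.0, 0.1, (3, 8))])
lof = lof_scores(ref_emb, new_emb, k=5)
print(lof.round(2))
```

Because LOF normalizes by the local density of the reference set, a patch whose reference embeddings are naturally noisy needs a larger deviation to produce a high score, which is one reason it can separate noise from real changes better than a raw distance.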

6. Conclusion

Our study demonstrates that foundation models trained specifically on satellite imagery, such as Clay and Anysat, outperform general-purpose models in detecting changes over time.

By extracting local patch-based embeddings, we were able to generate time series that facilitate unsupervised change detection, with the LOF-based method proving effective at distinguishing real changes from noise on the case study.

These results highlight the potential of combining in-domain foundation models with unsupervised techniques to efficiently identify relevant changes in large-scale satellite datasets.

Future work should focus on testing the robustness of these methods on reconstructed images or images with varying lighting conditions, refining the distinction between semantic and non-semantic changes, and further evaluating the performance of Clay and Anysat across different types of changes.

Additionally, we aim to develop a hierarchical approach to change detection, clustering different types of changes and prioritizing them based on their atypicality.

References

Toker, Aysim, et al. "DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. https://arxiv.org/pdf/2203.12560