The HyperspectralViTs models and ecosystem
We introduce the HyperspectralViTs models and the OxHyper datasets of hyperspectral data from NASA's EMIT sensor. We show that the proposed models outperform prior deep learning models in both score and processing speed. We test our models on low-compute hardware. Finally, we release the full annotated training, validation and test datasets of real and synthetic methane events and mineral maps.
IEEE JSTARS Paper 2025 • Oxford Department of Computer Science news
TED AI in Vienna 2024 talk • Datasets on Hugging Face🤗
Abstract
On-board processing of hyperspectral data with machine learning models would enable an unprecedented amount of autonomy across a wide range of tasks, allowing new capabilities such as early warning systems and automated scheduling across constellations of satellites. However, current classical methods suffer from high false positive rates and therefore prevent easy automation, while previously published deep learning models exhibit prohibitive computational requirements. We propose fast and accurate machine learning architectures which support end-to-end processing of data with high spectral dimension without relying on hand-crafted products or spectral band compression techniques. We create three new large datasets of hyperspectral data containing all relevant spectral bands from the near-global sensor EMIT. We evaluate our models on two tasks related to hyperspectral data processing: methane detection and mineral identification. Our models reach a new state-of-the-art performance on the task of methane detection, where we improve the F1 score of previous deep learning models by 27% on a newly created synthetic dataset and by 13% on the previously released large benchmark dataset. Our models generalise from synthetic datasets to data with real methane leak events and boost performance by 6.9% in F1 score in contrast with training models from scratch on the real data. Finally, with our newly proposed architectures, one capture from the EMIT sensor can be processed within 30 seconds on a realistic proxy of the IONSCV 004 satellite and in less than 0.64 seconds on a GPU-powered Jetson AGX Xavier board.
Figure: Illustration of the motivation behind the proposed machine learning models.
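For readers less familiar with the metric, the F1 score quoted in the abstract is the harmonic mean of precision and recall. The following is a minimal sketch, assuming pixel-wise binary masks flattened into sequences; the function name is hypothetical and the paper's exact evaluation protocol may differ.

```python
from typing import Sequence


def f1_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """F1 score for two flattened binary masks (1 = plume, 0 = background)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Count true positives, false positives and false negatives.
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    denom = 2 * tp + fp + fn
    # Convention chosen for this sketch: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2 * tp / denom
```

With one true positive, one false positive and one false negative, precision and recall are both 0.5, so the score is 0.5 as well.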
In the paper we present three newly created OxHyper datasets with labels for 1.) real methane leak events, 2.) synthetic methane leak events and 3.) mineral identification. In these, we provide a wide range of hyperspectral bands from EMIT, computed methane enhancement products and manually checked labels. We also use a new version of the previously released STARCOP dataset of events from the AVIRIS-NG data. All of these datasets can be explored together here. For more details see the paper.
Downloading datasets
Please note that the datasets are very large. For example, to download the OxHyperSyntheticCH4 dataset (226 GB), run:
git lfs install
git clone git@hf.co:datasets/previtus/OxHyperSyntheticCH4
# it is recommended to delete the .git folder to save space:
rm -rf OxHyperSyntheticCH4/.git
We recommend first trying one of the miniaturized dataset versions, for example: https://huggingface.co/datasets/previtus/starcop_allbands_mini
For an updated list of datasets please check the main page: https://huggingface.co/previtus
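Because a full dataset such as OxHyperSyntheticCH4 takes 226 GB, it can help to check free disk space before cloning. The following is a minimal Python sketch and not part of this repository: the helper names and the `overhead` factor of 2 are assumptions (a Git LFS clone typically keeps a second copy of the files under .git until that folder is removed).

```python
import shutil

GB = 10 ** 9  # decimal gigabytes (assumed to match the quoted dataset size)


def free_gb(path: str = ".") -> float:
    """Return the free space, in GB, on the filesystem containing `path`."""
    return shutil.disk_usage(path).free / GB


def enough_space(dataset_gb: float, path: str = ".", overhead: float = 2.0) -> bool:
    """Check whether `path` can hold a dataset of `dataset_gb` GB.

    `overhead` is an assumed safety factor, not a measured value.
    """
    return free_gb(path) >= dataset_gb * overhead
```

For example, `enough_space(226, path="/data")` would report whether a hypothetical /data volume has room for the synthetic dataset under these assumptions.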
Install
conda create -c conda-forge -n hyper_env python=3.11.4 mamba
conda activate hyper_env
pip install git+https://github.com/previtus/HyperspectralViTs.git
Inference
Demo notebooks showing model inference on Google Colab are being prepared.
Training
To reproduce the training process reported in the paper, first download the appropriate OxHyper dataset and set up the environment as described under Install. Remember to adjust the paths in scripts/settings.yaml, as these are used as defaults for the runs. In the bash/ folder we provide a couple of example training and evaluation scripts (these contain local overrides of the settings, so remember to adjust them too).
# Check possible parameters with:
python3 -m scripts.train --help
# Or run one of the prepared training scripts used for the paper models (remember to download the training datasets and adjust their paths)
./bash/demos_train_hyper_segformer.sh
./bash/demos_train_hyper_efficientvit.sh
# You may also check data exploration demos listed in:
./bash/demos_data_explore.sh
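Since the runs read their default paths from scripts/settings.yaml, a quick check that those paths exist can save a failed run. The following is a minimal sketch, assuming the settings load into nested dictionaries and that path-like entries can be recognised by their key names; the key hints and function names are hypothetical and not taken from the repository.

```python
import os
from typing import Any, Iterator, List, Tuple

# Assumption: keys containing one of these substrings hold filesystem paths.
PATH_HINTS = ("path", "dir", "folder", "root")


def iter_path_settings(node: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_key, value) for string settings whose key looks path-like."""
    if isinstance(node, dict):
        for key, value in node.items():
            dotted = f"{prefix}.{key}" if prefix else str(key)
            if isinstance(value, dict):
                # Descend into nested sections of the settings.
                yield from iter_path_settings(value, dotted)
            elif isinstance(value, str) and any(
                hint in str(key).lower() for hint in PATH_HINTS
            ):
                yield dotted, value


def missing_paths(settings: dict) -> List[Tuple[str, str]]:
    """Return the path-like settings that do not exist on this machine."""
    return [
        (key, value)
        for key, value in iter_path_settings(settings)
        if not os.path.exists(os.path.expanduser(value))
    ]
```

With PyYAML installed, the dictionary could be obtained with `yaml.safe_load(open("scripts/settings.yaml"))` and passed to `missing_paths`; an empty result means every recognised path exists.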
If you find the HyperspectralViTs models or the OxHyper datasets useful in your research, please consider citing our work.
@article{Ruzicka2025HyperspectralViTs,
author={Růžička, Vít and Markham, Andrew},
journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
title={Hyperspectral{V}i{T}s: General Hyperspectral Models for On-Board Remote Sensing},
year={2025},
volume={18},
number={},
pages={10241-10253},
doi={10.1109/JSTARS.2025.3557527}
}
- The models presented here are directly linked with our prior research in spaceml-org/STARCOP.
- We use the newest researched models for methane leak detection in processing pipelines at the UN International Methane Emissions Observatory to speed up the work of analysts.
We would like to thank D-Orbit and Unibap for access to the SpaceCloud® Hardware when measuring our models' inference speeds on a realistic hardware proxy of a real satellite environment.