HEST-Library: Bringing Spatial Transcriptomics and Histopathology together

Designed for querying and assembling the HEST-1k dataset

[ arXiv | Data | Documentation | Tutorials | Cite ]

Welcome to the official GitHub repository of the HEST-Library introduced in "HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis", NeurIPS Spotlight, 2024. This project was developed by the Mahmood Lab at Harvard Medical School and Brigham and Women's Hospital.

<img src="figures/fig1.jpeg" /> <br/>

What does this repository provide?

HEST-1k, HEST-Library, and HEST-Benchmark are released under the Attribution-NonCommercial-ShareAlike 4.0 International license.

<br/>

Updates

Download/Query HEST-1k (>1TB)

To download/query HEST-1k, follow the tutorial 1-Downloading-HEST-1k.ipynb or follow instructions on Hugging Face.

NOTE: The entire dataset is larger than 1 TB, but you can easily download a subset by querying by id, organ, species, etc.
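
As a rough illustration of what querying a subset can look like, the sketch below filters a metadata table by organ and species and turns the matching sample ids into glob patterns for a selective download. The column names (`id`, `organ`, `species`), the pattern shape, the toy sample ids, and the `repo_id` argument are assumptions made for this example, not the confirmed HEST-1k schema; the tutorial notebook and the Hugging Face instructions remain the reference.

```python
from typing import Dict, Iterable, List, Optional


def select_ids(rows: Iterable[Dict[str, str]],
               organ: Optional[str] = None,
               species: Optional[str] = None) -> List[str]:
    """Return the ids of rows matching every filter that is not None."""
    selected = []
    for row in rows:
        if organ is not None and row["organ"] != organ:
            continue
        if species is not None and row["species"] != species:
            continue
        selected.append(row["id"])
    return selected


def to_patterns(ids: Iterable[str]) -> List[str]:
    """Build one glob per id (assumed layout: file names contain the id)."""
    return [f"*{sample_id}*" for sample_id in ids]


def download_subset(repo_id: str, patterns: List[str], local_dir: str) -> str:
    """Fetch only the files matching `patterns` from a Hugging Face dataset repo."""
    # Imported lazily so the filtering helpers work without the package or network.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, repo_type="dataset",
                             allow_patterns=patterns, local_dir=local_dir)


# Toy metadata standing in for the real table (all values are made up).
metadata = [
    {"id": "SAMPLE_A", "organ": "Breast", "species": "Homo sapiens"},
    {"id": "SAMPLE_B", "organ": "Kidney", "species": "Mus musculus"},
    {"id": "SAMPLE_C", "organ": "Breast", "species": "Mus musculus"},
]
ids = select_ids(metadata, organ="Breast")
patterns = to_patterns(ids)
print(patterns)
```

The resulting `patterns` list would then be handed to `download_subset` together with the dataset repository id given in the Hugging Face instructions. Note that a loose glob such as `*SAMPLE_A*` also matches any id that merely contains that string, so a stricter pattern may be needed in practice.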

HEST-Library installation

git clone https://github.com/mahmoodlab/HEST.git
cd HEST
conda create -n "hest" python=3.9
conda activate hest
pip install -e .

Additional dependencies (for WSI manipulation):

sudo apt install libvips libvips-dev openslide-tools

Additional dependencies (GPU acceleration):

If a GPU is available on your machine, we recommend installing cucim in your conda environment. (hest was tested with cucim-cu12==24.4.0 and CUDA 12.1)

pip install \
    --extra-index-url=https://pypi.nvidia.com \
    cudf-cu12==24.6.* dask-cudf-cu12==24.6.* cucim-cu12==24.6.* \
    raft-dask-cu12==24.6.*
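
To confirm that the optional GPU packages are visible from the active environment, a small check such as the following can help. It only tests whether Python can locate the modules, not whether a compatible GPU or CUDA driver is present; `cucim` and `cudf` are the import names these packages are assumed to expose.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


for name in ("cucim", "cudf"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```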

NOTE: HEST-Library was only tested on Linux/macOS machines. Please report any bugs in the GitHub issues.

Inspect HEST-1k with HEST-Library

You can then inspect the dataset as follows:

from hest import iter_hest

for st in iter_hest('../hest_data', id_list=['TENX95']):
    print(st)

HEST-Library API

The HEST-Library allows assembling new samples in the HEST format and interacting with HEST-1k. We provide two tutorials:

In addition, we provide complete documentation.

HEST-Benchmark

HEST-Benchmark is a new, diverse, and challenging benchmark designed to assess 11 foundation models for pathology. It includes nine tasks for gene expression prediction (50 highly variable genes) from morphology (112 × 112 µm regions at 0.5 µm/px) in nine different organs and eight cancer types. We provide a step-by-step tutorial to run HEST-Benchmark and reproduce our results in 4-Running-HEST-Benchmark.ipynb.
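
As a quick sanity check on the patch geometry, a 112 µm region sampled at 0.5 µm/px spans 112 / 0.5 = 224 pixels per side. The helper below (its name is illustrative and not part of HEST-Library) also shows how many pixels would cover the same region on a slide scanned at a different native resolution; the 0.25 µm/px value is only an example.

```python
def patch_side_px(region_um: float, um_per_px: float) -> int:
    """Number of pixels covering `region_um` micrometers at `um_per_px`."""
    return round(region_um / um_per_px)


target_px = patch_side_px(112, 0.5)    # side length at the benchmark resolution
native_px = patch_side_px(112, 0.25)   # same region on a 0.25 um/px scan (example value)
downsample = native_px / target_px     # resize factor from native to target resolution
print(target_px, native_px, downsample)
```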

HEST-Benchmark results (08.30.24)

HEST-Benchmark was used to assess 11 publicly available models. Reported results are based on ridge regression applied to PCA-reduced embeddings (256 factors). Because ridge regression unfairly penalizes models with larger embedding dimensions, we opted for PCA reduction to ensure a fair and objective comparison between models. Model performance is measured with Pearson correlation. Best is bold, second best is underlined. Additional results based on Random Forest and XGBoost regression are provided in the paper.

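| Model | IDC | PRAD | PAAD | SKCM | COAD | READ | ccRCC | LUAD | LYMPH IDC | Average |
|---|---|---|---|---|---|---|---|---|---|---|
| Resnet50 | 0.4741 | 0.3075 | 0.3889 | 0.4822 | 0.2528 | 0.0812 | 0.2231 | 0.4917 | 0.2322 | 0.326 |
| CTransPath | 0.511 | 0.3427 | 0.4378 | 0.5106 | 0.2285 | 0.11 | 0.2279 | 0.4985 | 0.2353 | 0.3447 |
| Phikon | 0.5327 | 0.342 | 0.4432 | 0.5355 | 0.2585 | 0.1517 | 0.2423 | 0.5468 | 0.2373 | 0.3656 |
| CONCH | 0.5363 | 0.3548 | 0.4475 | 0.5791 | 0.2533 | 0.1674 | 0.2179 | 0.5312 | 0.2507 | 0.3709 |
| Remedis | 0.529 | 0.3471 | 0.4644 | 0.5818 | 0.2856 | 0.1145 | 0.2647 | 0.5336 | 0.2473 | 0.3742 |
| Gigapath | 0.5508 | <u>0.3708</u> | 0.4768 | 0.5538 | <u>0.301</u> | 0.186 | 0.2391 | 0.5399 | 0.2493 | 0.3853 |
| UNI | 0.5702 | 0.314 | 0.4764 | 0.6254 | 0.263 | 0.1762 | 0.2427 | 0.5511 | 0.2565 | 0.3862 |
| Virchow | 0.5702 | 0.3309 | 0.4875 | 0.6088 | **0.311** | 0.2019 | 0.2637 | 0.5459 | 0.2594 | 0.3977 |
| Virchow2 | 0.5922 | 0.3465 | 0.4661 | 0.6174 | 0.2578 | 0.2084 | **0.2788** | **0.5605** | 0.2582 | 0.3984 |
| UNIv1.5 | **0.5989** | 0.3645 | <u>0.4902</u> | <u>0.6401</u> | 0.2925 | <u>0.2240</u> | 0.2522 | <u>0.5586</u> | **0.2597** | <u>0.4090</u> |
| Hoptimus0 | <u>0.5982</u> | **0.385** | **0.4932** | **0.6432** | 0.2991 | **0.2292** | <u>0.2654</u> | 0.5582 | <u>0.2595</u> | **0.4146** |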
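
The evaluation protocol described above (PCA reduction, ridge regression, Pearson correlation per gene) can be sketched in a few lines of NumPy. This is a minimal illustration and not the benchmark's own code: the regularization strength `alpha`, the single train/test split, and the synthetic data are assumptions, since this README specifies only the 256 PCA factors, the regressor, and the metric.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the feature mean and the top principal axes of X (features x k)."""
    mean = X.mean(axis=0)
    # Right singular vectors of the centered data are the principal axes.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T


def ridge_fit(Z, Y, alpha):
    """Closed-form ridge regression on centered data (intercept not penalized)."""
    z_mean, y_mean = Z.mean(axis=0), Y.mean(axis=0)
    Zc, Yc = Z - z_mean, Y - y_mean
    W = np.linalg.solve(Zc.T @ Zc + alpha * np.eye(Z.shape[1]), Zc.T @ Yc)
    return W, y_mean - z_mean @ W


def pearson_per_gene(Y_true, Y_pred):
    """Pearson correlation between true and predicted values, one per column."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return (yt * yp).sum(axis=0) / denom


def evaluate(X_train, Y_train, X_test, Y_test, n_components=256, alpha=1.0):
    """Fit PCA + ridge on the training split; return mean Pearson r on the test split."""
    mean, axes = pca_fit(X_train, n_components)
    W, b = ridge_fit((X_train - mean) @ axes, Y_train, alpha)
    Y_pred = (X_test - mean) @ axes @ W + b
    return float(pearson_per_gene(Y_test, Y_pred).mean())


# Synthetic stand-ins: 64-dim "embeddings" and 5 "genes" driven by 8 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(400, 8))
X = latent @ rng.normal(size=(8, 64)) + 0.01 * rng.normal(size=(400, 64))
Y = latent @ rng.normal(size=(8, 5)) + 0.1 * rng.normal(size=(400, 5))
score = evaluate(X[:300], Y[:300], X[300:], Y[300:], n_components=8)
print(f"mean Pearson r: {score:.3f}")
```

Reducing every model's embedding to the same number of PCA factors before fitting means the ridge penalty acts on an equal number of coefficients for each model, which is the fairness argument made above.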

Benchmarking your own model

Our tutorial in 4-Running-HEST-Benchmark.ipynb will guide users interested in benchmarking their own model on HEST-Benchmark.

Note: Spontaneous contributions are encouraged if researchers from the community want to include new models. To do so, simply create a Pull Request.

Issues

Citation

If you find our work useful in your research, please consider citing:

Jaume, G., Doucet, P., Song, A. H., Lu, M. Y., Almagro-Perez, C., Wagner, S. J., Vaidya, A. J., Chen, R. J., Williamson, D. F. K., Kim, A., & Mahmood, F. HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis. Advances in Neural Information Processing Systems, December 2024.

@inproceedings{jaume2024hest,
    author = {Guillaume Jaume and Paul Doucet and Andrew H. Song and Ming Y. Lu and Cristina Almagro-Perez and Sophia J. Wagner and Anurag J. Vaidya and Richard J. Chen and Drew F. K. Williamson and Ahrong Kim and Faisal Mahmood},
    title = {HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis},
    booktitle = {Advances in Neural Information Processing Systems},
    year = {2024},
    month = dec,
}

<img src=docs/joint_logo.png>