Home

Awesome

<img src="elmo dalle2.png" alt="image_description" width="40" height="40"/> scELMo: Embeddings from Language Models are Good Learners for Single-cell Data Analysis

News!

We have uploaded gene embeddings from gpt4-o and drug embeddings from GPT 3.5 in our website, please check them if you wanna have a try!

Installation

We rely on OpenAI API for query.

pip install openai

The descriptions and tutorials for OpenAI API can be found in this link.

We reply on these packages for zero-shot learning analysis.

pip install scib scib_metrics==0.3.3 pickle mygene scanpy==1.9.3 scikit-learn

Installing hnswlib from the original Github profile to avoid potential errors.

apt-get install -y python-setuptools python-pip #may not need it for HPC base
git clone https://github.com/nmslib/hnswlib.git
cd hnswlib
pip install .

All the packages above are enough for testing tasks absed on zero-shot learning.

We rely on PyTorch for fine-tuning.

conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install lightning -c conda-forge

For the perturbation analysis, please install related pacakges based on their website and use the modifeid version provided in the Perturbation Analysis folder: CINEMAOT, CPA and GEARS.

To generate gene embeddings from sequence models (as seq2emb), please refer seq2cells to install related packages.

For users who cannot access OpenAI API, we provide an alternative solution based on deepseekv2. Please refer the Get outputs from LLMs for more information.

Tutorials

Please use the example ipynb notebook in each folders as instructions. Evaluations are included in the notebooks. The demo tutorial can be finished in a normal computer within 10 minutes with a prepared environment.

Datasets

All of the datasets and their download information are included in the Supplementary file 3. A demo dataset for clustering can be found in this link.

Database for scELMo

We are maintaining a website containing embeddings of different information generated by LLM. We are happy to discuss if you have any requests or comments.

Acknowledgement

We refer the codes from the following packages to implement scELMo. Many thanks to these great developers:

GenePT, seq2cells, CINEMAOT, CPA and GEARS.

Open for contribution

We are happy to see if you have more exciting ideas about the extension of scELMo. Feel free to contact us for discussion:

Tianyu Liu (tianyu.liu@yale.edu)

Citation

@article{liu2023scelmo,
  title={scELMo: Embeddings from Language Models are Good Learners for Single-cell Data Analysis},
  author={Liu, Tianyu and Chen, Tianqi and Zheng, Wangjie and Luo, Xiao and Zhao, Hongyu},
  journal={bioRxiv},
  pages={2023--12},
  year={2023},
  publisher={Cold Spring Harbor Laboratory}
}

Related work