SpliceBERT-analysis

Additional analysis on SpliceBERT. The original repository is available at SpliceBERT.

Benchmark

On SpliceAI's GTEx dataset

We fine-tuned SpliceBERT on SpliceAI's GTEx dataset with R-Drop regularization five times, each with a different random seed (model weights: Google Drive). The average AP (average precision) scores of SpliceBERT (900 nt) are comparable to those of SpliceAI-10k for donor sites and slightly higher for acceptor sites. However, the SpliceBERT ensemble (averaging the predictions of the five models) underperforms the SpliceAI-10k ensemble, likely because the SpliceBERT models were all fine-tuned from the same pre-trained model and therefore lack sufficient diversity.

The source code is available in benchmark_spliceai-gtex.

| model | receptive field size (nt) | AP (donor) | AP (acceptor) |
| --- | --- | --- | --- |
| SpliceBERT | 900 | 0.8547 $\pm$ 0.0012 | 0.8458 $\pm$ 0.0009 |
| SpliceAI-10k | 10001 | 0.8547 $\pm$ 0.0027 | 0.8434 $\pm$ 0.0023 |
| SpliceAI-2k | 2001 | 0.8369 $\pm$ 0.0015 | 0.8270 $\pm$ 0.0017 |
| SpliceAI-400 | 401 | 0.7961 $\pm$ 0.0020 | 0.7873 $\pm$ 0.0026 |
| SpliceAI-80 | 81 | 0.5216 $\pm$ 0.0022 | 0.4449 $\pm$ 0.0020 |

| model (ensemble) | receptive field size (nt) | AP (donor) | AP (acceptor) |
| --- | --- | --- | --- |
| SpliceAI-10k (ensemble) | 10001 | 0.8735 | 0.8644 |
| SpliceBERT (ensemble) | 900 | 0.8608 | 0.8524 |
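
The two summaries reported above (mean $\pm$ standard deviation of per-model AP, and AP of the averaged predictions) can be sketched in plain Python. This is a minimal illustration, not the code in benchmark_spliceai-gtex: the function names, the toy labels and scores, the use of the sample standard deviation, and the simple tie handling are all assumptions made for the example.

```python
import statistics


def average_precision(labels, scores):
    """Step-wise average precision: mean of precision@rank over positive items.

    Assumption: scores have no ties (ties are broken by input order here).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank items from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


def ensemble_scores(per_model_scores):
    """Average the predictions of several models position by position."""
    n_models = len(per_model_scores)
    return [sum(column) / n_models for column in zip(*per_model_scores)]


def summarize_runs(labels, per_model_scores):
    """Return mean/std of per-model AP and the AP of the averaged predictions."""
    aps = [average_precision(labels, s) for s in per_model_scores]
    return {
        "mean_ap": statistics.mean(aps),
        "std_ap": statistics.stdev(aps),  # sample std; an assumption
        "ensemble_ap": average_precision(labels, ensemble_scores(per_model_scores)),
    }


if __name__ == "__main__":
    # Hypothetical toy data: 1 = splice site, 0 = non-splice site.
    toy_labels = [1, 0, 1, 0, 0, 1]
    toy_runs = [
        [0.9, 0.2, 0.4, 0.5, 0.1, 0.8],
        [0.7, 0.6, 0.8, 0.3, 0.2, 0.5],
        [0.6, 0.1, 0.9, 0.2, 0.7, 0.3],
    ]
    print(summarize_runs(toy_labels, toy_runs))
```

Averaging helps most when the individual models make different errors, which is consistent with the explanation given above for the smaller ensemble gain of SpliceBERT.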

On DeepSTARR's dataset

Although SpliceBERT was pre-trained on primary RNA sequences, it can also be applied to DNA sequences. We fine-tuned SpliceBERT on DeepSTARR's dataset (https://zenodo.org/records/5502060) to identify sequences with potential enhancer activity. SpliceBERT outperformed both DeepSTARR (a convolutional model) and Nucleotide Transformer (a DNA language model). The results are available in benchmark_deepstarr.

| model | Developmental | Housekeeping |
| --- | --- | --- |
| SpliceBERT | 0.70 | 0.78 |
| DeepSTARR | 0.68 | 0.74 |
| Nucleotide Transformer (multi-species) | 0.64 | 0.75 |

<img src="./benchmark_deepstarr/splicebert_on_deepstarr.png" alt="SpliceBERT on DeepSTARR">

SpliceBERT on DeepSTARR (only 20% of the points are shown)
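
The table above does not name its metric. The sketch below assumes it is the Pearson correlation between predicted and measured enhancer activity, computed separately for the developmental and housekeeping outputs; that is an assumption, not something stated in this README. It also shows one way to draw a random 20% subset of points for a scatter plot, as the figure caption suggests. The function names, the seed, and the toy values are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def subsample_pairs(x, y, fraction=0.2, seed=0):
    """Pick a random fraction of (x, y) pairs, e.g. to thin out a scatter plot."""
    rng = random.Random(seed)  # fixed seed so the plotted subset is reproducible
    k = max(1, round(len(x) * fraction))
    idx = sorted(rng.sample(range(len(x)), k))
    return [x[i] for i in idx], [y[i] for i in idx]


if __name__ == "__main__":
    # Hypothetical measured vs. predicted activities for one output head.
    measured = [0.1, 1.2, -0.4, 2.3, 0.8, -1.1, 1.9, 0.3, -0.2, 1.0]
    predicted = [0.3, 1.0, -0.1, 1.8, 1.1, -0.7, 1.5, 0.0, 0.2, 0.6]
    print("Pearson r (all points):", pearson_r(measured, predicted))
    plot_x, plot_y = subsample_pairs(measured, predicted, fraction=0.2)
    print("points kept for plotting:", len(plot_x))
```

Note that the correlation is computed on all points; only the plotted subset is reduced.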

Contact

For any questions, contact chenkenbio_[at]_gmail.com

Citation

@article{chen2024self_bbae163,
  title={Self-supervised learning on millions of primary RNA sequences from 72 vertebrates improves sequence-based RNA splicing prediction},
  author={Chen, Ken and Zhou, Yue and Ding, Maolin and Wang, Yu and Ren, Zhixiang and Yang, Yuedong},
  journal={Briefings in Bioinformatics},
  volume={25},
  number={3},
  pages={bbae163},
  year={2024},
  publisher={Oxford University Press}
}