
MedBench

A Russian medical language understanding benchmark: a set of NLP tasks on medical textual data for the Russian language.

This repository contains code and data to reproduce the results of the paper RuMedBench: A Russian Medical Language Understanding Benchmark.

Video from the AIME 2022 conference

Task Descriptions

*Both tasks are based on the RuMedPrime dataset.

Baselines & Results

We have implemented several baseline models; please see details in the paper.

Accuracy is the base metric for evaluating all tasks. For some tasks, an additional metric is reported after the slash in the results table below.

Test results:

| Model | RuMedTop3 | RuMedSymptomRec | RuMedDaNet | RuMedNLI | RuMedNER | ECG2Pathology | RuMedOverall |
|---|---|---|---|---|---|---|---|
| Naive | 10.58/22.02 | 1.93/5.30 | 50.00 | 33.33 | 93.66/51.96 | 1.15 | 29.53 |
| Feature-based | 49.76/72.75 | 32.05/49.40 | 51.95 | 59.70 | 94.40/62.89 | - | 58.46 |
| BiLSTM | 40.88/63.50 | 20.24/31.33 | 52.34 | 60.06 | 94.74/63.26 | - | 53.87 |
| RuBERT | 39.54/62.29 | 18.55/34.22 | 67.19 | 77.64 | 96.63/73.53 | - | 61.44 |
| RuPoolBERT | 47.45/70.44 | 34.94/52.05 | 71.48 | 77.29 | 96.47/73.15 | - | 67.20 |
| RuBioBERT* | 43.55/68.86 | 28.94/44.55 | 53.91 | 80.31 | 96.63/75.97 | - | 62.69 |
| RuBioRoBERTa* | 46.72/72.87 | 44.01/58.95 | 76.17 | 82.77 | 97.19/77.81 | - | 71.54 |
| Human | 25.06/48.54 | 7.23/12.53 | 93.36 | 83.26 | 96.09/76.18 | 39.34 | 58.13 |

We define the overall model score as the mean of the metric values over all tasks a model was evaluated on (for tasks reporting two metrics, the two values are averaged first).
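As a quick sanity check of this definition, here is a minimal sketch of the averaging (not the repository's evaluation code; the numbers are copied from the Naive row of the table above):

```python
# Sketch of the overall-score computation described above.
# Tasks a model did not run (marked "-" in the table) are simply omitted.
def overall_score(task_scores):
    """Mean over tasks; a task with two metrics contributes their average."""
    per_task = [sum(vals) / len(vals) for vals in task_scores.values()]
    return sum(per_task) / len(per_task)

naive = {
    "RuMedTop3": (10.58, 22.02),
    "RuMedSymptomRec": (1.93, 5.30),
    "RuMedDaNet": (50.00,),
    "RuMedNLI": (33.33,),
    "RuMedNER": (93.66, 51.96),
    "ECG2Pathology": (1.15,),
}

print(round(overall_score(naive), 2))  # 29.53, matching the RuMedOverall value for Naive
```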

* These models use the implementation from the paper RuBioRoBERTa: a pre-trained biomedical language model for Russian language biomedical text mining (repository).

You can find the extension of this benchmark (with closed test sets) on the MedBench platform.

How to Run

Please refer to the code/ directory (the code targets Python 3.7).

Contact

If you have any questions, please open a GitHub issue or email the authors.

Citation

@misc{blinov2022rumedbench,
    title={RuMedBench: A Russian Medical Language Understanding Benchmark},
    author={Pavel Blinov and Arina Reshetnikova and Aleksandr Nesterov and Galina Zubkova and Vladimir Kokh},
    year={2022},
    eprint={2201.06499},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}