bleuscore

bleuscore is a fast BLEU score calculator written in Rust.

Installation

The Python package is published on PyPI, so it can be installed in several ways:
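For example, with pip (any installer that resolves packages from PyPI should work the same way):

```shell
pip install bleuscore
```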

Quick Start

The usage is exactly the same as Hugging Face evaluate:

```diff
- import evaluate
+ import bleuscore

predictions = ["hello there general kenobi", "foo bar foobar"]
references = [
    ["hello there general kenobi", "hello there !"],
    ["foo bar foobar"]
]

- bleu = evaluate.load("bleu")
- results = bleu.compute(predictions=predictions, references=references)
+ results = bleuscore.compute(predictions=predictions, references=references)

print(results)
# {'bleu': 1.0, 'precisions': [1.0, 1.0, 1.0, 1.0], 'brevity_penalty': 1.0, 
# 'length_ratio': 1.1666666666666667, 'translation_length': 7, 'reference_length': 6}
```
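As a sanity check on the fields above: in the standard BLEU definition, the final score is the brevity penalty times the geometric mean of the n-gram precisions. A minimal recomputation from the reported values:

```python
import math

# Values reported by compute() above
precisions = [1.0, 1.0, 1.0, 1.0]
translation_length, reference_length = 7, 6

# Standard BLEU brevity penalty: 1 if the candidate corpus is at least as long
# as the reference corpus, otherwise exp(1 - reference_length / translation_length)
bp = 1.0 if translation_length >= reference_length else \
    math.exp(1 - reference_length / translation_length)

# BLEU = brevity_penalty * geometric mean of the n-gram precisions
bleu = bp * math.exp(sum(math.log(p) for p in precisions) / len(precisions))

print(bleu)                                   # 1.0
print(translation_length / reference_length)  # 1.1666..., the 'length_ratio' field
```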

Benchmark

TL;DR: we see more than a 10x speedup once the corpus size goes beyond 100K.

<p align="center"> <img src="./benchmark/bench.png" alt="Benchmark" width="400" height="300"> </p>

We use the demo data shown in the Quick Start for this simple benchmark. You can check benchmark/simple for the benchmark source code.

N is used to enlarge the predictions/references size by simply duplicating the demo data shown above, as sketched below. We can see that as N increases, bleuscore's performance advantage grows. You can navigate to benchmark for more benchmark details.
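A hypothetical sketch of what each benchmarked script does with its N argument (the actual scripts live under benchmark/simple and may differ in detail):

```python
import sys
import bleuscore

# N comes from the command line, e.g. "python simple/rs_bleuscore.py 100"
N = int(sys.argv[1])

# Duplicate the Quick Start demo data N times
predictions = ["hello there general kenobi", "foo bar foobar"] * N
references = [
    ["hello there general kenobi", "hello there !"],
    ["foo bar foobar"],
] * N

print(bleuscore.compute(predictions=predictions, references=references))
```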

N = 100

```shell
hyperfine --warmup 5 --runs 10   \
  "python simple/rs_bleuscore.py 100" \
  "python simple/local_hf_bleu.py 100" \
  "python simple/sacre_bleu.py 100"   \
  "python simple/hf_evaluate.py 100"
```

```
Benchmark 1: python simple/rs_bleuscore.py 100
  Time (mean ± σ):      19.0 ms ±   2.6 ms    [User: 17.8 ms, System: 5.3 ms]
  Range (min … max):    14.8 ms …  23.2 ms    10 runs

Benchmark 2: python simple/local_hf_bleu.py 100
  Time (mean ± σ):      21.5 ms ±   2.2 ms    [User: 19.0 ms, System: 2.5 ms]
  Range (min … max):    16.8 ms …  24.1 ms    10 runs

Benchmark 3: python simple/sacre_bleu.py 100
  Time (mean ± σ):      45.9 ms ±   2.2 ms    [User: 38.7 ms, System: 7.1 ms]
  Range (min … max):    43.5 ms …  50.9 ms    10 runs

Benchmark 4: python simple/hf_evaluate.py 100
  Time (mean ± σ):      4.504 s ±  0.429 s    [User: 0.762 s, System: 0.823 s]
  Range (min … max):    4.163 s …  5.446 s    10 runs

Summary
  python simple/rs_bleuscore.py 100 ran
    1.13 ± 0.20 times faster than python simple/local_hf_bleu.py 100
    2.42 ± 0.35 times faster than python simple/sacre_bleu.py 100
    237.68 ± 39.88 times faster than python simple/hf_evaluate.py 100
```

N = 1K ~ 1M

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| python simple/rs_bleuscore.py 1000 | 20.3 ± 1.3 | 18.2 | 21.4 | 1.00 |
| python simple/local_hf_bleu.py 1000 | 45.8 ± 1.2 | 44.2 | 47.5 | 2.26 ± 0.16 |
| python simple/rs_bleuscore.py 10000 | 37.8 ± 1.5 | 35.9 | 39.5 | 1.87 ± 0.14 |
| python simple/local_hf_bleu.py 10000 | 295.0 ± 5.9 | 288.6 | 304.2 | 14.55 ± 0.98 |
| python simple/rs_bleuscore.py 100000 | 219.6 ± 3.3 | 215.3 | 224.0 | 10.83 ± 0.72 |
| python simple/local_hf_bleu.py 100000 | 2781.4 ± 42.2 | 2723.1 | 2833.0 | 137.13 ± 9.10 |
| python simple/rs_bleuscore.py 1000000 | 2048.8 ± 31.4 | 2013.2 | 2090.3 | 101.01 ± 6.71 |
| python simple/local_hf_bleu.py 1000000 | 28285.3 ± 100.9 | 28182.1 | 28396.1 | 1394.51 ± 90.21 |
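The table above is in hyperfine's markdown export format. One way to reproduce a table like this (the exact command used is an assumption) is hyperfine's parameter list plus markdown export:

```shell
hyperfine --warmup 5 --runs 10 \
  --parameter-list n 1000,10000,100000,1000000 \
  --export-markdown bench.md \
  "python simple/rs_bleuscore.py {n}" \
  "python simple/local_hf_bleu.py {n}"
```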