MT@BigScience

Evaluation results for machine translation within the BigScience project. Evaluation is carried out using the BigScience fork of lm-evaluation-harness coupled with the eval-hackathon branch of PromptSource. N.B. Updates to the latest versions of these tools are ongoing and will be available shortly.
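
For orientation, the sketch below applies a PromptSource template to a WMT14-style example. The dataset/subset identifiers, the toy example and the choice of template are illustrative assumptions, not the exact configuration used by the harness.

```python
# Sketch only: dataset identifiers and template choice are assumptions.
from promptsource.templates import DatasetTemplates

# Load the prompt templates registered for the HF dataset wmt14, subset fr-en.
templates = DatasetTemplates("wmt14", "fr-en")
print(templates.all_template_names)  # inspect the available MT prompts

# Apply one template to a toy example in the HF translation format.
example = {"translation": {"en": "Hello.", "fr": "Bonjour."}}
template = templates[templates.all_template_names[0]]
prompt, target = template.apply(example)
print(prompt, "->", target)
```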

Citation

This repository contains the code and outputs accompanying the paper "Investigating the Translation Performance of a Large Multilingual Language Model: the Case of BLOOM". Please cite the following:

@inproceedings{bawden-yvon-bloom-mt-2023,
    author = {Bawden, Rachel and Yvon, François},
    title = {Investigating the Translation Performance of a Large Multilingual
Language Model: the Case of {BLOOM}},
    booktitle = {Proceedings of the 24th Annual Conference of the European Association for Machine Translation},
    url = {https://arxiv.org/abs/2303.01911},
    year = {2023},
    note = {To appear}
}

Outputs and evaluation

Extract all predictions and evaluate

```bash
python scripts/process_results_{flores,diabla,wmt}.py
```

This extracts all predictions from the .jsonl files into .tsv files and calculates BLEU and COMET scores, which are written out to the corresponding output folders.
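
As a rough illustration of the scoring step, here is a minimal sketch using sacrebleu and the unbabel-comet package (comet >= 2.0). The .tsv path, its column layout and the COMET checkpoint name are assumptions, not the exact choices made by the scripts.

```python
import csv

import sacrebleu
from comet import download_model, load_from_checkpoint

# Assumed .tsv layout: one source, reference and system output per row.
srcs, refs, hyps = [], [], []
with open("outputs/predictions.tsv", encoding="utf-8") as f:  # placeholder path
    for src, ref, hyp in csv.reader(f, delimiter="\t"):
        srcs.append(src)
        refs.append(ref)
        hyps.append(hyp)

# Corpus-level BLEU, as reported in the tables below.
bleu = sacrebleu.corpus_bleu(hyps, [refs])
print(f"BLEU = {bleu.score:.2f}")

# COMET: the checkpoint name is an assumption; any reference-based model works.
model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
data = [{"src": s, "mt": h, "ref": r} for s, h, r in zip(srcs, hyps, refs)]
print(f"COMET = {model.predict(data, batch_size=8, gpus=0).system_score:.4f}")
```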

Three versions of each output are generated:

  1. The original outputs
  2. The outputs truncated at the first newline (newline-cut)
  3. The outputs truncated at the first newline or before the first repetition of the 'xglm' prompt (newline-cut-custom-truncate). This corresponds to the truncated outputs from the paper; both truncation heuristics are sketched below.
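
A minimal sketch of the two truncation heuristics, assuming a marker string that signals a repeated prompt (the actual detection logic lives in scripts/process_results_*.py and may differ):

```python
def newline_cut(output: str) -> str:
    """Version 2: keep only the text before the first newline."""
    return output.split("\n", 1)[0].strip()

def newline_cut_custom_truncate(output: str, marker: str) -> str:
    """Version 3: additionally cut before the first re-occurrence of the
    prompt marker (e.g. the language label of the xglm-style prompt),
    which suggests the model has begun generating a new example."""
    text = newline_cut(output)
    idx = text.find(marker)
    return text[:idx].strip() if idx != -1 else text

# Hypothetical marker: a language label such as "English:".
print(newline_cut_custom_truncate("Bonjour le monde. English: Hello", " English:"))
# -> "Bonjour le monde."
```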

Generate LaTeX tables:

```bash
python scripts/make-tables-{flores,diabla,wmt}.py
```
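
Roughly, table generation boils down to pivoting the score files and exporting LaTeX; a sketch with pandas, where the input path and column names are hypothetical:

```python
import pandas as pd

# Hypothetical flat file of scores: one row per (lang_dir, n_shots, model).
df = pd.read_csv("results/wmt_scores.tsv", sep="\t")

table = df.pivot_table(index=["lang_dir", "n_shots"],
                       columns="model", values="bleu")
print(table.to_latex(float_format="%.2f"))
```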

Results (BLEU scores)

Cross-dataset and model comparison (focus on English-French and English-Hindi)

WMT14 results (Original outputs)

| Lang. dir. | #shots | BLOOM | T0 | mT0-xxl | OPT |
|------------|--------|-------|-------|---------|-------|
| en→fr | 0 | 14.91 | 1.21 | 29.27 | 12.95 |
| en→fr | 1 | 27.83 | 1.41 | 25.24 | 21.92 |
| fr→en | 0 | 15.52 | 25.79 | 32.88 | 15.54 |
| fr→en | 1 | 34.61 | 21.01 | 30.03 | 24.55 |
| en→hi | 0 | 6.80 | 0.16 | 11.20 | 0.14 |
| en→hi | 1 | 13.62 | 0.12 | 9.50 | 0.08 |
| hi→en | 0 | 12.05 | 0.00 | 26.13 | 0.42 |
| hi→en | 1 | 25.04 | 0.01 | 20.15 | 0.58 |

DiaBLa results (Original outputs)

| Lang. dir. | #shots | BLOOM | T0 | mT0-xxl | OPT |
|------------|--------|-------|-------|---------|-------|
| en→fr | 0 | 0.88 | 0.52 | 28.44 | 0.53 |
| en→fr | 1 | 5.70 | 0.61 | 21.03 | 15.52 |
| fr→en | 0 | 0.85 | 25.51 | 34.96 | 0.83 |
| fr→en | 1 | 12.05 | 20.57 | 26.88 | 12.05 |

Flores-101 results (Original outputs)

| Lang. dir. | #shots | BLOOM | T0 | mT0-xxl | OPT |
|------------|--------|-------|-------|---------|-------|
| en→fr | 0 | 2.77 | 1.86 | 55.45 | 2.76 |
| en→fr | 1 | 44.99 | 2.13 | 53.53 | 24.36 |
| fr→en | 0 | 2.73 | 31.90 | 60.10 | 2.59 |
| fr→en | 1 | 45.59 | 24.86 | 58.22 | 16.74 |
| en→hi | 0 | 1.29 | 0.15 | 67.69 | 0.07 |
| en→hi | 1 | 27.25 | 0.06 | 54.66 | 0.12 |
| hi→en | 0 | 3.40 | 0.00 | 59.55 | 0.10 |
| hi→en | 1 | 35.06 | 0.19 | 57.32 | 0.45 |

WMT14 results (Truncated outputs)

| Lang. dir. | #shots | BLOOM | T0 | mT0-xxl | OPT |
|------------|--------|-------|-------|---------|-------|
| en→fr | 0 | 32.25 | 1.21 | 29.24 | 18.86 |
| en→fr | 1 | 36.29 | 1.41 | 25.19 | 22.31 |
| fr→en | 0 | 37.16 | 25.80 | 32.87 | 33.18 |
| fr→en | 1 | 38.18 | 21.07 | 29.95 | 33.25 |
| en→hi | 0 | 12.10 | 0.16 | 11.20 | 0.11 |
| en→hi | 1 | 15.73 | 0.12 | 9.50 | 0.08 |
| hi→en | 0 | 24.29 | 0.00 | 26.06 | 0.51 |
| hi→en | 1 | 25.04 | 0.01 | 20.06 | 0.61 |

DiaBLa results (Truncated outputs)

| Lang. dir. | #shots | BLOOM | T0 | mT0-xxl | OPT |
|------------|--------|-------|-------|---------|-------|
| en→fr | 0 | 24.23 | 0.52 | 28.44 | 17.42 |
| en→fr | 1 | 37.57 | 0.61 | 21.89 | 20.71 |
| fr→en | 0 | 22.94 | 25.51 | 34.92 | 36.80 |
| fr→en | 1 | 41.36 | 21.09 | 27.20 | 37.63 |

Flores-101 results (Truncated outputs)

| Lang. dir. | #shots | BLOOM | T0 | mT0-xxl | OPT |
|------------|--------|-------|-------|---------|-------|
| en→fr | 0 | 26.91 | 1.85 | 55.34 | 21.40 |
| en→fr | 1 | 49.32 | 2.13 | 53.40 | 28.41 |
| fr→en | 0 | 40.28 | 31.90 | 60.01 | 39.41 |
| fr→en | 1 | 47.24 | 25.20 | 58.24 | 39.82 |
| en→hi | 0 | 7.74 | 0.15 | 67.69 | 0.12 |
| en→hi | 1 | 29.52 | 0.06 | 54.66 | 0.12 |
| hi→en | 0 | 30.19 | 0.00 | 59.55 | 0.23 |
| hi→en | 1 | 35.06 | 0.19 | 57.27 | 0.50 |

Flores-101: High-resource, 1-shot

(Original outputs with no postprocessing)

| Src↓ Trg→ | Model | ar | en | es | fr | zh |
|-----------|-------|-------|-------|-------|-------|-------|
| ar | Bloom | -- | 40.28 | 23.32 | 33.12 | 17.68 |
| ar | M2M | -- | 25.50 | 16.74 | 25.69 | 13.10 |
| en | Bloom | 28.21 | -- | 29.42 | 44.99 | 26.69 |
| en | M2M | 17.92 | -- | 25.57 | 41.99 | 19.33 |
| es | Bloom | 18.76 | 32.70 | -- | 24.80 | 20.92 |
| es | M2M | 12.11 | 25.09 | -- | 29.33 | 14.86 |
| fr | Bloom | 23.44 | 45.59 | 27.51 | -- | 23.15 |
| fr | M2M | 15.36 | 37.17 | 25.60 | -- | 17.61 |
| zh | Bloom | 15.05 | 30.50 | 20.54 | 26.01 | -- |
| zh | M2M | 11.55 | 20.91 | 16.92 | 24.32 | -- |

Flores-101: High-mid resource, 1-shot

(Original outputs with no postprocessing)

| Src↓ Trg→ | Model | en | fr | hi | id | vi |
|-----------|-------|-------|-------|-------|-------|-------|
| en | Bloom | -- | 44.99 | 27.25 | 39.00 | 28.54 |
| en | M2M | -- | 41.99 | 28.15 | 37.26 | 35.10 |
| fr | Bloom | 45.59 | -- | 18.47 | 31.44 | 32.76 |
| fr | M2M | 37.17 | -- | 22.91 | 29.14 | 30.26 |
| hi | Bloom | 35.06 | 27.62 | -- | -- | -- |
| hi | M2M | 27.89 | 25.88 | -- | -- | -- |
| id | Bloom | 43.25 | 30.35 | -- | -- | -- |
| id | M2M | 33.74 | 30.81 | -- | -- | -- |
| vi | Bloom | 38.71 | 26.85 | -- | -- | -- |
| vi | M2M | 29.51 | 25.82 | -- | -- | -- |

Flores-101: Low-resource, 1-shot

(Original outputs with no postprocessing)

| Src↓ Trg→ | Model | bn | en | hi | sw | yo |
|-----------|-------|-------|-------|-------|-------|-------|
| en | Bloom | 24.65 | -- | 27.25 | 20.51 | 2.60 |
| en | M2M | 23.04 | -- | 28.15 | 26.95 | 2.17 |
| bn | Bloom | -- | 29.91 | 16.34 | -- | -- |
| bn | M2M | -- | 22.86 | 21.76 | -- | -- |
| hi | Bloom | 23.77 | 35.06 | -- | -- | -- |
| hi | M2M | 21.77 | 27.89 | -- | -- | -- |
| sw | Bloom | -- | 37.40 | -- | -- | 1.31 |
| sw | M2M | -- | 30.43 | -- | -- | 1.29 |
| yo | Bloom | -- | 4.08 | -- | 0.89 | -- |
| yo | M2M | -- | 4.18 | -- | 1.93 | -- |

Flores-101: Romance languages, 1-shot

(Original outputs with no postprocessing)

| Src↓ Trg→ | Model | ca | es | fr | gl | it | pt |
|-----------|-------|-------|-------|-------|-------|-------|-------|
| ca | Bloom | -- | 28.92 | 33.79 | 19.24 | 19.85 | 33.05 |
| ca | M2M | -- | 25.17 | 35.08 | 33.42 | 25.50 | 35.17 |
| es | Bloom | 31.16 | -- | 24.80 | 23.28 | 16.49 | 29.11 |
| es | M2M | 23.12 | -- | 29.33 | 27.54 | 23.87 | 28.10 |
| fr | Bloom | 37.16 | 27.51 | -- | 24.92 | 23.97 | 38.94 |
| fr | M2M | 28.74 | 25.60 | -- | 32.82 | 28.56 | 37.84 |
| gl | Bloom | 37.49 | 27.09 | 33.77 | -- | 18.26 | 32.16 |
| gl | M2M | 30.07 | 27.65 | 37.06 | -- | 26.87 | 34.81 |
| it | Bloom | 31.00 | 25.40 | 31.36 | 20.16 | -- | 29.15 |
| it | M2M | 25.20 | 29.23 | 34.39 | 29.23 | -- | 31.47 |
| pt | Bloom | 39.56 | 28.07 | 40.34 | 27.10 | 20.06 | -- |
| pt | M2M | 30.69 | 26.88 | 40.17 | 33.77 | 28.09 | -- |

Flores-101: Bengali→English MT, transfer using a 1-shot example from a different language direction

(Original outputs with no postprocessing)

| 1-shot example direction type | 1-shot example direction | spBLEU orig. | COMET orig. | spBLEU trunc. | COMET trunc. |
|-------------------------------|--------------------------|--------------|-------------|---------------|--------------|
| Same | bn→en | 29.91 | 0.4440 | 29.91 | 0.4440 |
| Opposite | en→bn | 21.81 | 0.3132 | 29.42 | 0.4143 |
| Related source | hi→en | 30.14 | 0.4492 | 30.54 | 0.4603 |
| Related source (from WMT) | hi→en | 29.06 | 0.4216 | 29.07 | 0.4274 |
| HR unrelated source | fr→en | 17.19 | 0.3147 | 29.68 | 0.3960 |
| HR unrelated source | fr→ar | 8.44 | -0.1025 | 27.99 | 0.3218 |

DiaBLa context results (1-shot with differing source of context)

The 1-shot example is either a random sentence pair (Rand.) or the immediately preceding sentence pair in the dialogue (Prev.), and its language direction is either random (rand.), the same as that of the current sentence, or the opposite. Each configuration is reported with and without truncation of outputs.

| Origin | Direction | Truncate | en→fr BLEU | en→fr COMET | fr→en BLEU | fr→en COMET |
|--------|-----------|----------|------------|-------------|------------|-------------|
| Rand. | rand. | no | 5.70 | 0.3421 | 12.05 | 0.6138 |
| Rand. | rand. | yes | 37.57 | 0.6343 | 41.36 | 0.7576 |
| Prev. | rand. | no | 6.10 | 0.3280 | 12.34 | 0.6166 |
| Prev. | rand. | yes | 38.51 | 0.6139 | 41.57 | 0.7513 |
| Prev. | same | no | 19.32 | 0.5965 | 20.71 | 0.7190 |
| Prev. | same | yes | 38.95 | 0.6325 | 42.10 | 0.7607 |
| Prev. | opposite | no | 3.64 | 0.0635 | 8.56 | 0.5184 |
| Prev. | opposite | yes | 37.76 | 0.5898 | 41.20 | 0.7423 |