

Rebiber: A tool for normalizing bibtex with official info.

<p> <a href="https://huggingface.co/spaces/yuchenlin/Rebiber"> <img src="https://img.shields.io/badge/🤗 Web%20Demo--red?style=flat_square"> </a> <a href="https://colab.research.google.com/drive/12oQcLs25CFjI4evsFlWfKD1DfTEiqyCN?usp=sharing"> <img src="https://img.shields.io/badge/Colab%20Notebook--green?style=flat_square&logo=googlecolab"> <!-- <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/, width=150, height=150/></a> --> </a> <a href="https://twitter.com/billyuchenlin/status/1353850378438070272?s=20"> <img src="https://img.shields.io/badge/Tweet--blue?style=flat_square&logo=twitter"> </a> </p>

We often cite papers by their arXiv versions without noticing that they have already been PUBLISHED at conferences. These unofficial bib entries may violate the submission or camera-ready rules of some conferences. We introduce Rebiber, a simple Python tool that fixes them automatically. It is based on the official conference information from DBLP or the ACL Anthology (for NLP conferences). The supported conferences are listed in the Supported Conferences section below. Apart from handling outdated arXiv citations, Rebiber also normalizes citations in a unified way (DBLP-style), supporting abbreviation and value selection.


Web demo on Hugging Face Spaces: https://huggingface.co/spaces/yuchenlin/Rebiber (recommended)

Colab notebook: https://colab.research.google.com/drive/12oQcLs25CFjI4evsFlWfKD1DfTEiqyCN?usp=sharing


Installation

```bash
# pip install rebiber -U # for the stable version
pip install -e git+https://github.com/yuchenlin/rebiber.git#egg=rebiber -U
# rebiber --update  # (optional) update the bib data and the abbr. info  (using wget)
```

OR

```bash
git clone https://github.com/yuchenlin/rebiber.git
cd rebiber/
pip install -e .
```

If you would like the latest GitHub version with more bug fixes, install from GitHub as shown above (the `git+` install or `git clone`) rather than the stable `pip install rebiber` release.

Usage (v1.1.3 and v1.2.0)

Normalize your bibtex file with the official conference information:

```bash
rebiber -i /path/to/input.bib -o /path/to/output.bib
```

You can find a pair of example input and output files in rebiber/example_input.bib and rebiber/example_output.bib.

| Argument | Usage |
|---|---|
| `-i` or `--input_bib` | The path to the input bib file that you want to update. |
| `-o` or `--output_bib` | The path where the updated bib file is saved. If you don't specify `-o`, it defaults to the same path as `-i`. |
| `-r` or `--remove` | A comma-separated list of field names to remove, such as `-r pages,editor,volume,month,url,biburl,address,publisher,bibsource,timestamp,doi`. Empty by default. |
| `-s` or `--shorten` | A bool argument, `False` by default. When enabled, the booktitle is replaced with its abbreviation from the `-a` data. Used as `-s True`. |
| `-d` or `--deduplicate` | A bool argument, `True` by default, that removes duplicate bib entries sharing the same key. Used as `-d True`. |
| `-l` or `--bib_list` | The path to the list of bib json files to be loaded. See `rebiber/bib_list.txt` for the default file. Usually you don't need to set this argument. |
| `-a` or `--abbr_tsv` | The conference abbreviation data. See `rebiber/abbr.tsv` for the default file. Usually you don't need to set this argument. |
| `-u` or `--update` | Update the local bib-related data with the latest GitHub version. |
| `-v` or `--version` | Print the version of the current Rebiber installation. |
| `-st` or `--sort` | A bool argument, `False` by default, which keeps the original order of the bib entries from the input file. Set it to `True` (`-st True`) to order the bib entries alphabetically in the output file. |
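If you call Rebiber from a Python script, a small helper can assemble the command from the flags in the table above. `build_rebiber_cmd` and `run_rebiber` are hypothetical helpers and not part of Rebiber. They use only the documented `-i`, `-o`, `-r`, `-s`, and `-st` flags, and running the command assumes the `rebiber` executable is on your PATH.

```python
import shlex
import subprocess
from typing import Optional, Sequence


def build_rebiber_cmd(input_bib: str, output_bib: Optional[str] = None,
                      remove: Sequence[str] = (), shorten: bool = False,
                      sort: bool = False) -> list:
    """Assemble a rebiber argv list from the documented -i/-o/-r/-s/-st flags (hypothetical helper)."""
    cmd = ["rebiber", "-i", input_bib]
    if output_bib is not None:
        cmd += ["-o", output_bib]  # without -o, the output path defaults to the -i path
    if remove:
        cmd += ["-r", ",".join(remove)]  # field names to drop, comma-separated
    if shorten:
        cmd += ["-s", "True"]  # abbreviate booktitle using the -a data
    if sort:
        cmd += ["-st", "True"]  # order entries alphabetically in the output
    return cmd


def run_rebiber(cmd: list) -> None:
    """Execute the command; assumes the rebiber executable is installed and on PATH."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_rebiber_cmd("input.bib", "output.bib", remove=["pages", "editor"], shorten=True)
    print(shlex.join(cmd))  # rebiber -i input.bib -o output.bib -r pages,editor -s True
```

Building a list instead of a single shell string avoids quoting problems when paths contain spaces.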

Example Input and Output

An example input entry with arXiv information (e.g., copied from Google Scholar):

```bibtex
@article{lin2020birds,
	title={Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models},
	author={Lin, Bill Yuchen and Lee, Seyeon and Khanna, Rahul and Ren, Xiang},
	journal={arXiv preprint arXiv:2005.00683},
	year={2020}
}
```

An example normalized output entry with the official information:

```bibtex
@inproceedings{lin2020birds,
    title = "{B}irds have four legs?! {N}umer{S}ense: {P}robing {N}umerical {C}ommonsense {K}nowledge of {P}re-{T}rained {L}anguage {M}odels",
    author = "Lin, Bill Yuchen  and
      Lee, Seyeon  and
      Khanna, Rahul  and
      Ren, Xiang",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.557",
    doi = "10.18653/v1/2020.emnlp-main.557",
    pages = "6862--6868",
}
```
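The example above swaps an arXiv entry for the official record with the same title while keeping the citation key `lin2020birds`. The idea can be sketched in plain Python. This is not Rebiber's implementation. Entries are assumed to be dicts with `ID` and `ENTRYTYPE` fields, the same convention bibtexparser uses. Matching on a normalized title is an illustrative assumption, and `official_key` is a placeholder.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase and keep only letters/digits, so "{B}irds" and "Birds" compare equal."""
    return re.sub(r"[^a-z0-9]", "", title.lower())


def replace_with_official(entry: dict, official_by_title: dict) -> dict:
    """Return the official entry whose normalized title matches, keeping the original key."""
    official = official_by_title.get(normalize_title(entry["title"]))
    if official is None:
        return entry  # no official version found: keep the entry unchanged
    updated = dict(official)  # copy so the lookup table is not mutated
    updated["ID"] = entry["ID"]  # keep the key so existing \cite{...} commands still work
    return updated


arxiv_entry = {
    "ID": "lin2020birds",
    "ENTRYTYPE": "article",
    "title": "Birds have four legs?! NumerSense: Probing Numerical Commonsense "
             "Knowledge of Pre-trained Language Models",
    "journal": "arXiv preprint arXiv:2005.00683",
    "year": "2020",
}
official_entry = {
    "ID": "official_key",  # placeholder key for the official record
    "ENTRYTYPE": "inproceedings",
    "title": "{B}irds have four legs?! {N}umer{S}ense: {P}robing {N}umerical "
             "{C}ommonsense {K}nowledge of {P}re-{T}rained {L}anguage {M}odels",
    "booktitle": "Proceedings of the 2020 Conference on Empirical Methods "
                 "in Natural Language Processing (EMNLP)",
    "year": "2020",
}
official_by_title = {normalize_title(official_entry["title"]): official_entry}

result = replace_with_official(arxiv_entry, official_by_title)
print(result["ID"], result["ENTRYTYPE"])  # lin2020birds inproceedings
```

Rebiber's own matching and field handling may differ from this sketch.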

Supported Conferences

The bib_list.txt contains a list of converted json files of the official bib data. In this repo, we now support the full ACL anthology, i.e., all papers that are published at *CL conferences (ACL, EMNLP, NAACL, etc.) as well as workshops. Also, we support any conference proceedings that can be downloaded from DBLP, for example, ICLR2020.

Note that DBLP only allows you to download results in batches of 1000 using `&h=1000&f=0`, where `f=0|1000|2000` specifies the starting index. So we have to download the bib files of each conference batch by batch and concatenate them together; add_conf.sh takes care of that, too.
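The batching arithmetic can be sketched as follows. `dblp_batch_urls` is a hypothetical helper. The base query URL passed to it is an assumption, since you would copy it from DBLP's export link for the conference. The helper only sets the `h` and `f` parameters described above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def dblp_batch_urls(base_url: str, total: int, batch_size: int = 1000) -> list:
    """Return one URL per batch, setting h (batch size) and f (starting index)."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))  # keep the existing query parameters
    urls = []
    for start in range(0, total, batch_size):  # f = 0, 1000, 2000, ...
        query["h"] = str(batch_size)
        query["f"] = str(start)
        urls.append(urlunsplit(parts._replace(query=urlencode(query))))
    return urls


if __name__ == "__main__":
    # Hypothetical base URL; replace it with the export link for your conference.
    for url in dblp_batch_urls("https://example.org/search?q=iclr2020", total=2500):
        print(url)
```

For 2500 papers this yields three URLs, with `f=0`, `f=1000`, and `f=2000`.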

The following conferences are supported, and their bib/json files are in our data folder. You can turn each item on or off in bib_list.txt. Feel free to create a PR to add new conferences following the steps in the Adding a new conference section!

| Name | Years |
|---|---|
| ACL Anthology | until 2024-07 |
| AAAI | 2010-2024 |
| AISTATS | 2013-2024 |
| ALENEX | 2010-2020 |
| ASONAM | 2010-2019 |
| BigDataConf | 2013-2019 |
| BMVC | 2010-2023 |
| CHI | 2010-2024 |
| CIDR | 2009-2020 |
| CIKM | 2010-2020 |
| COLT | 2000-2020 |
| CVPR | 2000-2023 |
| ICASSP | 2015-2023 |
| ICCV | 2003-2023 |
| ICLR | 2013-2023 |
| ICML | 2000-2023 |
| IJCAI | 2011-2023 |
| INTERSPEECH | 2016-2023 |
| KDD | 2010-2023 |
| MLSys | 2019-2020 |
| MM | 2016-2020 |
| NeurIPS | 2000-2023 |
| RECSYS | 2010-2020 |
| SDM | 2010-2020 |
| SIGIR | 2010-2023 |
| SIGMOD | 2010-2022 (from 2023 on, published as a journal) |
| SODA | 2010-2020 |
| STOC | 2010-2020 |
| UAI | 2010-2023 |
| WSDM | 2008-2020 |
| WWW (The Web Conf) | 2001-2024 |

Thanks to Anton Tsitsulin for the great work on collecting such a complete set of bib files!


Adding a new conference

You can download the bib files of recent conferences from DBLP by running:

```bash
python download_dblp.py
```

Note that the automatic download does not work for ECCV and ECML.

Alternatively, you can manually add any conference from DBLP by downloading its bib files into our raw_data folder and running the prepared script add_conf.sh. Taking ICLR2019 and ICLR2020 as an example:

```bash
bash add_conf.sh iclr 2019 2020
```
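The step of concatenating DBLP batch downloads into one bib file per conference year can be sketched in Python. This is an illustration, not the contents of add_conf.sh. The `iclr2020_part*.bib` batch file names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Union


def concat_bib_files(parts: Iterable[Union[str, Path]], out_path: str) -> Path:
    """Concatenate batch bib files, in the given order, into a single bib file."""
    out = Path(out_path)
    with out.open("w", encoding="utf-8") as fh:
        for part in parts:
            text = Path(part).read_text(encoding="utf-8")
            fh.write(text.rstrip("\n") + "\n\n")  # one blank line between batches
    return out


if __name__ == "__main__":
    raw = Path("raw_data")
    if raw.is_dir():
        # Hypothetical batch file names for one conference year.
        batches = sorted(raw.glob("iclr2020_part*.bib"))
        if batches:
            concat_bib_files(batches, str(raw / "iclr2020.bib"))
```

Note that `sorted` orders file names lexicographically, so zero-padded batch numbers keep the batches in download order.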

In particular, to update the *CL conferences, run:

```bash
python bib2json.py -i raw_data/anthology.bib -o data/acl.json
```


Contact

Please email yuchen.lin@usc.edu or open a GitHub issue if you have any questions or suggestions.