RePro: A Benchmark Dataset for Opinion Mining in Brazilian Portuguese
RePro, which stands for "REview of PROducts," is a benchmark dataset for opinion mining in Brazilian Portuguese. It consists of 10,000 human-annotated e-commerce product reviews, each labeled with sentiment and topic information. The dataset is built on data from B2W-Reviews01 (https://github.com/americanas-tech/b2w-reviews01), a dataset produced by one of the largest Brazilian e-commerce platforms. RePro aims to provide a resource for sentiment analysis and topic modeling on Brazilian Portuguese e-commerce product reviews, and to serve as a benchmark for future research in natural language processing and related fields.
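As a minimal sketch of how the dataset might be loaded, the snippet below uses the Hugging Face `datasets` library with the repository ID taken from the Hugging Face URL listed under Licensing. The helper name `load_repro` is hypothetical, and the sketch assumes the repository loads with its default configuration; split and column names are not documented here, so inspect the returned object before relying on any of them.

```python
# Repository ID taken from the Hugging Face URL in this README.
REPO_ID = "lucasnil/repro"


def load_repro(split=None):
    """Download RePro from the Hugging Face Hub (hypothetical helper).

    Requires `pip install datasets` and network access. Assumes the
    repository can be loaded with its default configuration.
    """
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    # With split=None, load_dataset returns every available split.
    return load_dataset(REPO_ID, split=split)
```

Calling `print(load_repro())` should list the available splits with their column names and row counts, which is the safest way to discover the actual schema.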
Licensing
RePro is available at https://github.com/lucasnil/repro and https://huggingface.co/datasets/lucasnil/repro under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0, https://creativecommons.org/licenses/by-nc-sa/4.0/). Under this license, licensees may copy, distribute, display, and work on the dataset, and make derivative works and remixes based on it, only if they give credit to B2W Digital in the manner specified in https://github.com/americanas-tech/b2w-reviews01/blob/main/b2wreviews01_stil2019.pdf. Licensees may distribute derivative works only under a license identical to ("not more restrictive" than) the license that governs the original work. Finally, all of these uses are permitted for non-commercial purposes only. We emphasize that commercial use of models, AI systems, or any other content derived from this corpus, including fine-tuned models, is strictly prohibited.
Citation
When using or referencing this dataset, please cite the following publication:
@inproceedings{dos2024repro,
title={RePro: a benchmark for Opinion Mining for Brazilian Portuguese},
author={dos Santos Silva, Lucas Nildaimon and Real, Livy and Zandavalle, Ana Claudia Bianchini and Rodrigues, Carolina Francisco Gadelha and da Silva Gama, Tatiana and Souza, Fernando Guedes and Zaidan, Phillipe Derwich Silva},
booktitle={Proceedings of the 16th International Conference on Computational Processing of Portuguese},
pages={432--440},
year={2024}
}