EVALITA

EVALITA is a periodic evaluation campaign of Natural Language Processing (NLP) and speech tools for the Italian language.

The general objective of EVALITA is to promote the development of language and speech technologies for the Italian language, providing a shared framework where different systems and approaches can be evaluated in a consistent manner.

The diffusion of shared tasks and shared evaluation practices is a crucial step towards the development of resources and tools for NLP and speech sciences. The strong response to EVALITA, in both the number of participants and the quality of results, shows that these goals are worth pursuing for the Italian language.

As a side effect of the evaluation campaign, both training and test data are available to the scientific community as benchmarks for future improvements.

EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC) and it is endorsed by the Italian Association for Artificial Intelligence (AI*IA) and the Italian Association for Speech Sciences (AISV).

http://www.evalita.it/

EVALITA 2016

The 5th evaluation campaign, EVALITA 2016, was organized around six selected tasks: ArtiPhon, FactA, NEEL-IT, PoSTWITA, QA4FAQ, and SENTIPOLC.

EVALITA 2016 is an initiative of AILC (Associazione Italiana di Linguistica Computazionale).

Proceedings are available on the CEUR open access platform and on the aAccademia University Press website.

Read the Storify story on the final workshop.

Follow EVALITA 2016 on Twitter and Facebook, and use the hashtag #EVALITA2016 to spread the word about the initiative!

DATA

Data repository structure:

|-artiphone: ArtiPhon data
|----cnz_1.0.0
|----test_artiphone_lables.zip
|-facta: FactA data
|----evalita2016_facta_gold_pilot.tsv
|----evalita2016_facta_tweet_id_pilot
|-neelit: NEEL-IT data
|----neel-it16_dev-set_v4: training set folder
|----neel-it_evalita2016-nil.gold.idfix: gold standard annotations
|----neel-it_evalita2016_v3.data.gold.test: test set
|-postwita: PoSTWITA data
|----goldDEVset-2016_09_05.txt: training set data
|----goldTESTset-2016_09_12.txt: test set data
|-qa4faq: QA4FAQ data
|----qa4faq_dev_v3: training data folder
|----qa4faq_qrel: relevance judgments
|----qa4faq_qrel.trec: relevance judgments (TREC format)
|----qa4faq_question: questions for testing
|-sentipolc: SENTIPOLC data
|----sentipolc16_gold2000.csv: test set
|----sentipolc16_officialdistrib_train.csv: training set
|-shared: files in this directory contain tweet IDs shared between tasks
|-EVALITA 2016 overview: Overview of the 5th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian

Each task folder contains a PDF document that describes the task and data format. You can find further information on the website of each task.
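As a concrete starting point, the sketch below reads the QA4FAQ relevance judgments in qa4faq_qrel.trec, assuming the standard four-column TREC qrels layout (query ID, iteration, document ID, relevance); the exact format of each file should always be checked against the corresponding task PDF.

```python
# Minimal sketch: read TREC-format relevance judgments (qa4faq/qa4faq_qrel.trec).
# Assumes the standard whitespace-separated qrels columns:
# query_id, iteration, doc_id, relevance. Verify against the QA4FAQ task PDF.
from collections import defaultdict

def read_qrels(path):
    """Return {query_id: {doc_id: relevance}} from a TREC qrels file."""
    qrels = defaultdict(dict)
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if len(fields) != 4:
                continue  # skip blank or malformed lines
            query_id, _iteration, doc_id, relevance = fields
            qrels[query_id][doc_id] = int(relevance)
    return qrels

if __name__ == "__main__":
    qrels = read_qrels("qa4faq/qa4faq_qrel.trec")
    print(f"{len(qrels)} queries with relevance judgments")
```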

If you use these data in scientific papers, or in any other medium serving scientists or students (e.g., websites, CD-ROMs), please include the following citation:

@CONFERENCE{Evalita2016,
  author  = {Basile, P. and Cutugno, F. and Nissim, M. and Patti, V. and Sprugnoli, R.},
  title   = {EVALITA 2016: Overview of the 5th evaluation campaign of natural language processing and speech tools for Italian},
  journal = {CEUR Workshop Proceedings},
  year    = {2016},
  volume  = {1749},
}

LICENSE

Attribution-NonCommercial-ShareAlike 3.0 Italy (CC BY-NC-SA 3.0 IT)

You are free to:

Share: copy and redistribute the material in any medium or format.
Adapt: remix, transform, and build upon the material.

The licensor cannot revoke these freedoms as long as you follow the license terms.

Under the following terms:

Attribution: you must give appropriate credit, provide a link to the license, and indicate if changes were made.
NonCommercial: you may not use the material for commercial purposes.
ShareAlike: if you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

Notices:

You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
No warranties are given. The license may not give you all of the permissions necessary for your intended use.

https://creativecommons.org/licenses/by-nc-sa/3.0/it/deed.en

Regarding the Twitter datasets: any Content provided to third parties remains subject to Twitter's Developer Agreement & Policy, and those third parties must agree to the Twitter Terms of Service, Privacy Policy, Developer Agreement, and Developer Policy before receiving such downloads.
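For data distributed as tweet IDs (e.g., the files in the shared directory and evalita2016_facta_tweet_id_pilot), the tweet content has to be retrieved from Twitter directly, under the terms above. The sketch below is one possible approach using the Twitter API v2 tweet lookup endpoint; it assumes you have your own bearer token (read here from the hypothetical TWITTER_BEARER_TOKEN environment variable) and an access level that permits tweet lookup.

```python
# Hedged sketch: fetch tweet texts for distributed tweet IDs via the Twitter
# API v2 tweet lookup endpoint (up to 100 IDs per request). Requires your own
# credentials and compliance with the Twitter policies cited above.
import os
import requests

BEARER_TOKEN = os.environ["TWITTER_BEARER_TOKEN"]  # assumed environment variable

def fetch_tweets(tweet_ids):
    """Return a {tweet_id: text} dict for the IDs that are still available."""
    url = "https://api.twitter.com/2/tweets"
    headers = {"Authorization": f"Bearer {BEARER_TOKEN}"}
    texts = {}
    for start in range(0, len(tweet_ids), 100):
        batch = tweet_ids[start:start + 100]
        resp = requests.get(url, headers=headers, params={"ids": ",".join(batch)})
        resp.raise_for_status()
        for tweet in resp.json().get("data", []):
            texts[tweet["id"]] = tweet["text"]
    return texts
```

Deleted or protected tweets are simply missing from the response, so the recovered datasets may be slightly smaller than the original ID lists.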