Awesome
PT Lexical Semantics
This repository collects several computational resources for lexical-semantic knowledge in Portuguese. It can be seen as a follow-up to the Onto.PT and CONTO.PT projects (http://ontopt.dei.uc.pt/).
The following resources are included:
- Large Portuguese Lexical-Semantic Knowledge Base (PT-LKB), with instances of lexical-semantic relations acquired from ten computational lexical resources: PAPEL (http://www.linguateca.pt/PAPEL/), Dicionário Aberto (dicionario-aberto.net), Wikcionário.PT (https://pt.wiktionary.org), TeP (http://www.nilc.icmc.usp.br/tep2/), OpenThesaurus.PT (http://paginas.fe.up.pt/~arocha/AED1/0607/trabalhos/thesaurus.txt), OpenWordNet-PT (https://github.com/own-pt/openWordnet-PT), PULO (http://wordnet.pt/), Port4Nooj (http://www.linguateca.pt/Repositorio/Port4Nooj/), WordNet.Br (http://www.nilc.icmc.usp.br/wordnetbr/), ConceptNet (http://conceptnet.io/).
- PT-LKB embeddings, word embeddings learned from the structure of the large Portuguese LKB with node2vec.
- TALES, an analogy-like test with lexical-semantic relations for assessing Portuguese word embeddings, with relations acquired from the large Portuguese LKB.
- Analogies (TAP), an adaptation of the <a href="https://github.com/nathanshartmann/portuguese_word_embeddings/blob/master/analogies/testset/LX-4WAnalogies.txt">LX-4WAnalogiesPT</a> analogy test to the BATS format, also adopted by TALES.
- BATS-PT, a manual translation of the lexicographic portion of the <a href="https://aclanthology.org/N16-2002.pdf">Bigger Analogy Test Set (BATS)</a> to Portuguese, covering ten types of lexico-semantic analogies, which can be used for assessing word embeddings and language models.
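
As a rough illustration of how a BATS-style test such as TALES, TAP or BATS-PT might be used to assess word embeddings, the sketch below scores a toy embedding table with the common 3CosAdd analogy method. It rests on several assumptions that are not confirmed by this repository: that each test line has the form `source<TAB>target1/target2/...`, that 3CosAdd is a suitable scoring method, and that embeddings are available as a plain word-to-vector mapping. The function names, the example pairs and the toy vectors are all hypothetical.

```python
import math
from itertools import permutations


def parse_bats_lines(lines):
    """Parse lines assumed to look like 'source<TAB>target1/target2/...'."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        source, targets = line.split("\t")
        pairs.append((source, targets.split("/")))
    return pairs


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def three_cos_add_accuracy(pairs, embeddings):
    """Build questions 'a is to a* as b is to ?' from every ordered pair of
    entries and count a hit when the nearest word to (a* - a + b) is one of
    the accepted targets of b. Only the first target of a is used as a*."""
    correct = total = 0
    for (a, a_targets), (b, b_targets) in permutations(pairs, 2):
        a_star = a_targets[0]
        # Skip questions with out-of-vocabulary words.
        if not all(w in embeddings for w in (a, a_star, b)):
            continue
        query = [
            y - x + z
            for x, y, z in zip(embeddings[a], embeddings[a_star], embeddings[b])
        ]
        # The three question words are excluded from the candidates.
        candidates = (w for w in embeddings if w not in (a, a_star, b))
        prediction = max(
            candidates, key=lambda w: cosine(embeddings[w], query), default=None
        )
        correct += prediction in b_targets
        total += 1
    return correct / total if total else 0.0


# Hypothetical hypernymy entries in the assumed format (not taken from the test files).
toy_lines = ["cão\tanimal/mamífero", "rosa\tflor/planta"]

# Hypothetical 3-dimensional vectors, chosen only to make the example run.
toy_embeddings = {
    "cão": [1.0, 0.0, 0.0],
    "animal": [1.0, 0.0, 1.0],
    "rosa": [0.0, 1.0, 0.0],
    "flor": [0.0, 1.0, 1.0],
    "mesa": [1.0, 1.0, 0.0],
}

pairs = parse_bats_lines(toy_lines)
accuracy = three_cos_add_accuracy(pairs, toy_embeddings)
print(f"3CosAdd accuracy on the toy data: {accuracy:.2f}")
```

With real data, `toy_lines` would be replaced by the lines of one test file and `toy_embeddings` by vectors loaded from an embedding model. The actual file layout should be checked against the files in this repository before relying on the parser.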