lexica-lists-words
Dictionaries and lists of names, acronyms and their expansions, stop-words, etc., which I gathered for different experiments. Acronyms were automatically extracted with the algorithm described in "A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text" by A.S. Schwartz and M.A. Hearst. A Java implementation is available here.
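The core idea of the Schwartz and Hearst algorithm can be sketched as follows. This is a minimal Python re-implementation written for illustration, not the Java implementation mentioned above, and it handles only the "long form (short form)" pattern. A parenthesized candidate is accepted as a short form if it has 2 to 10 characters, at most two words, starts with a letter or digit and contains at least one letter. The long form is then searched for in the min(|SF| + 5, 2·|SF|) words before the parenthesis (|SF| counted in characters) by matching the short form's letters and digits right to left, with the first one required to start a word. The function names and the final length check are hypothetical choices for this sketch.

```python
import re
from typing import List, Optional, Tuple

# A parenthesized expression without nested parentheses, e.g. "(HMM)".
PAREN = re.compile(r"\(([^()]+)\)")


def find_best_long_form(short_form: str, long_form: str) -> Optional[str]:
    """Match the short form's letters/digits right to left against long_form.

    Returns the matched tail of long_form (starting at a word boundary),
    or None when some character of the short form cannot be matched.
    """
    s = len(short_form) - 1
    l = len(long_form) - 1
    while s >= 0:
        c = short_form[s].lower()
        if not c.isalnum():  # ignore '-', '.', etc. inside the short form
            s -= 1
            continue
        # Move left until c is found; the first short-form character must
        # additionally match at the beginning of a word.
        while (l >= 0 and long_form[l].lower() != c) or (
            s == 0 and l > 0 and long_form[l - 1].isalnum()
        ):
            l -= 1
        if l < 0:
            return None
        l -= 1
        s -= 1
    # Start the long form at the beginning of the word holding the last match.
    start = long_form.rfind(" ", 0, l + 1) + 1
    return long_form[start:]


def is_valid_short_form(candidate: str) -> bool:
    """2-10 characters, at most two words, starts with a letter or digit,
    and contains at least one letter."""
    return (
        2 <= len(candidate) <= 10
        and len(candidate.split()) <= 2
        and candidate[0].isalnum()
        and any(ch.isalpha() for ch in candidate)
    )


def extract_pairs(sentence: str) -> List[Tuple[str, str]]:
    """Return (short form, long form) pairs for 'long form (SF)' patterns."""
    pairs = []
    for match in PAREN.finditer(sentence):
        short_form = match.group(1).strip()
        if not is_valid_short_form(short_form):
            continue
        # Look at most min(|SF| + 5, 2 * |SF|) words back, |SF| in characters.
        window = min(len(short_form) + 5, 2 * len(short_form))
        words = sentence[: match.start()].split()
        candidate = " ".join(words[-window:])
        long_form = find_best_long_form(short_form, candidate)
        # Simple sanity check: the long form should be longer than the acronym.
        if long_form is not None and len(long_form) > len(short_form):
            pairs.append((short_form, long_form))
    return pairs
```

For example, `extract_pairs("We trained a hidden Markov model (HMM) on the corpus.")` returns `[('HMM', 'hidden Markov model')]`. Matching right to left lets the search stop at the nearest plausible word start, which keeps the extracted long form short.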
- NomesLex-PT: a lexicon of person names made up of 2,027 first names and 8,019 surnames; more information here.
- PT-stopwords.txt: a collection of stop-words for Portuguese.
- geo-net-pt02_terms_frequency_wpt05.zip: the frequency of occurrence of the toponyms (place names) from Geo-Net-PT_02 in WPT05, a crawl of the Portuguese Web.
- names-surnames-NL-UK-IT-PT-ES.zip: a list of names and surnames for Dutch, English, Italian, Portuguese and Spanish.
- publico-cargos.txt: a list of Portuguese noun quantifiers, i.e., words that occur before a proper noun, gathered from the online newspaper publico.pt.
- publico-acronyms.txt: a list of acronyms and their possible expansions, extracted from a collection of Portuguese news articles gathered from the online newspaper publico.pt.
- wikipedia-acronyms.txt: a list of acronyms and their possible expansions, extracted from the English Wikipedia.
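The lists are plain-text resources. As a minimal sketch, assuming PT-stopwords.txt stores one stop-word per line and is UTF-8 encoded (neither is documented here, so both are assumptions; pass `encoding="latin-1"` if the file turns out to use it), the stop-word list could be loaded and used to filter a tokenized Portuguese sentence. The function names are hypothetical.

```python
from typing import Iterable, List, Set


def load_stopwords(path: str, encoding: str = "utf-8") -> Set[str]:
    """Read a stop-word list.

    ASSUMPTION: one stop-word per line; blank lines are skipped and
    entries are lowercased for case-insensitive lookup.
    """
    with open(path, encoding=encoding) as f:
        return {line.strip().lower() for line in f if line.strip()}


def remove_stopwords(tokens: Iterable[str], stopwords: Set[str]) -> List[str]:
    """Keep only tokens whose lowercased form is not in the stop-word set."""
    return [tok for tok in tokens if tok.lower() not in stopwords]
```

Typical use would be `remove_stopwords("O presidente falou com os jornalistas".split(), load_stopwords("PT-stopwords.txt"))`. A set is used so that each lookup is a constant-time membership test.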