PhoW2V: Pre-trained Word2Vec syllable and word embeddings for Vietnamese

PhoW2V provides pre-trained Word2Vec syllable-level and word-level embeddings for Vietnamese. The embeddings were pre-trained on a 20GB corpus of Vietnamese texts and were used in our EMNLP-2020 Findings paper "A Pilot Study of Text-to-SQL Semantic Parsing for Vietnamese":

@inproceedings{phow2v_vitext2sql,
    title     = {{A Pilot Study of Text-to-SQL Semantic Parsing for Vietnamese}},
    author    = {Anh Tuan Nguyen and Mai Hoang Dao and Dat Quoc Nguyen},
    booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2020},
    year      = {2020},
    pages     = {4079--4085}
}
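
| Pre-trained embeddings   | Syllable/Word  | Embedding size | Download mirror |
|--------------------------|----------------|----------------|-----------------|
| PhoW2V_syllables_100dims | Syllable-level | 100            | Mirror          |
| PhoW2V_syllables_300dims | Syllable-level | 300            | Mirror          |
| PhoW2V_words_100dims     | Word-level     | 100            | Mirror          |
| PhoW2V_words_300dims     | Word-level     | 300            | Mirror          |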
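
As a rough illustration of how one of the embedding sets listed above might be read, the sketch below parses a file in the standard word2vec text format (a `vocab_size dim` header line, then one `token v1 ... vN` line per entry). That the downloaded files use this format is an assumption, not something stated here. The helper names `load_word2vec_text` and `cosine`, the file name `toy_vectors.txt`, and the 3-dimensional vectors are all made up for the demonstration and are not real PhoW2V values.

```python
import math
import os
import tempfile


def load_word2vec_text(path, encoding="utf-8"):
    """Read a word2vec text file into a {token: [floats]} dict.

    Assumes a 'vocab_size dim' header followed by 'token v1 ... vN' lines.
    """
    vectors = {}
    with open(path, encoding=encoding) as handle:
        header = handle.readline().split()
        dim = int(header[1])
        for line in handle:
            parts = line.split()
            if len(parts) < dim + 1:
                continue  # skip blank or malformed lines
            # The last `dim` fields are the vector; what precedes them is the token.
            token = " ".join(parts[:-dim])
            vectors[token] = [float(x) for x in parts[-dim:]]
    return vectors, dim


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy stand-in for a downloaded file (hypothetical values, not PhoW2V vectors).
toy = "3 3\nhọc 1.0 0.0 0.0\nsinh 0.9 0.1 0.0\nxe 0.0 0.0 1.0\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_vectors.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(toy)
    vectors, dim = load_word2vec_text(path)

print(dim, len(vectors))
print(cosine(vectors["học"], vectors["sinh"]) > cosine(vectors["học"], vectors["xe"]))
```

For the full-size files a dedicated library is likely a better fit than a plain dict. If gensim is installed and the files are indeed in word2vec text format, `gensim.models.KeyedVectors.load_word2vec_format(path, binary=False)` should load them directly.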

By downloading the PhoW2V embeddings, USER agrees:

Note