FrequencyDictionaries

This repository contains frequency dictionaries in the form of text files, with one word per line.

The repository is organized into two folders:

freq_dicts_dirty

The files in this folder were derived from the LuminosoInsight/wordfreq project. These dictionaries were converted into .txt files with one word per line, ordered by frequency (most frequent words come first). Only words longer than two characters were retained.

The conversion process involved:

  1. Using the jakm/msgpack-cli tool to convert .msgpack files to .json format.
  2. Transforming the .json files into .txt files with one word per line using sed and grep.
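
As a rough illustration of step 2, the sketch below does the same job in Python rather than with sed and grep, so it is a stand-in and not the commands actually used here. It assumes the .json file is a list whose first element is a header object and whose remaining elements are lists of words, ordered from most to least frequent. That layout is an assumption about the wordfreq data and should be checked against the real files. The function name words_from_json and the min_length parameter are hypothetical.

```python
import json


def words_from_json(json_text, min_length=3):
    """Flatten a JSON frequency structure into a list of words.

    Assumed layout (not confirmed by this README): a JSON list whose
    first element is a header object and whose other elements are lists
    of words, with earlier lists holding more frequent words.
    """
    data = json.loads(json_text)
    words = []
    for entry in data:
        # Skip anything that is not a list of words, such as the header.
        if not isinstance(entry, list):
            continue
        for word in entry:
            # Keep only words longer than two characters.
            if isinstance(word, str) and len(word) >= min_length:
                words.append(word)
    return words


def to_text(words):
    """Join words into file content with one word per line."""
    return "\n".join(words) + "\n"
```

Calling to_text(words_from_json(raw)) on the contents of a converted .json file yields text that can be written directly to a .txt file. The order of the input lists is preserved, so the most frequent words stay at the top.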

freq_dicts_clean

The files in this folder were created by cleaning the dictionaries in the freq_dicts_dirty folder: any word not found in the corresponding dictionary from titoBouzout/Dictionaries was removed.
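
The cleaning step can be pictured as a membership filter. The sketch below is a minimal version under stated assumptions, not the procedure actually used for this repository: it assumes the reference dictionaries are Hunspell-style .dic files (an optional word count on the first line, then one entry per line, optionally followed by "/" and affix flags), and it compares words exactly, without expanding affix rules or normalizing case. The function names load_reference_words and clean_frequency_list are hypothetical.

```python
def load_reference_words(dic_lines):
    """Collect base words from the lines of a Hunspell-style .dic file.

    Assumption: entries look like "word" or "word/FLAGS", and a purely
    numeric line is the entry count rather than a word.
    """
    words = set()
    for line in dic_lines:
        entry = line.strip()
        if not entry or entry.isdigit():
            continue
        # Drop affix flags after the first slash, keeping the base word.
        words.add(entry.split("/", 1)[0])
    return words


def clean_frequency_list(freq_words, reference_words):
    """Keep only words present in the reference set, preserving order."""
    return [word for word in freq_words if word in reference_words]
```

Because the filter keeps the input order, a cleaned list remains sorted by frequency. Exact matching is the simplest choice; a version that expands affix rules would accept more inflected forms.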

File Naming Conventions

Licensing

This repository is licensed under the Apache License, Version 2.0. See the LICENSE file for details.

Attribution and Data Licensing

This repository is based on two primary sources:

  1. The rspeer/wordfreq project by Robyn Speer.
  2. Dictionaries from the titoBouzout/Dictionaries repository, originally derived from the OpenOffice dictionary list.

Wordfreq

Dictionaries

Summary of Licensing

The combined content of this repository complies with the terms of the Apache License 2.0 and respects the attribution requirements of the original sources. See NOTICE.md for further details.