Conway Stroke Data
A data set compiled manually by Conway (@yawnoc), used in the Android keyboard app Stroke Input Method (筆畫輸入法).
Stroke input method (generic, not the app)
The (generic) stroke input method is found on all dumbphones in HK and surrounds.
It is the simplest Chinese input method in existence. All strokes are classified into 5 types, entered via keypad:
# | Stroke | Type | Comment |
---|---|---|---|
1 | ㇐ | 橫 Horizontal | Includes rises (提) etc. |
2 | ㇑ | 豎 Vertical | |
3 | ㇒ | 撇 Throw | |
4 | ㇔ | 點 Dot | Includes presses (捺) |
5 | ㇖ | 折 Break | Basically everything else |
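The classification above can be sketched as a small lookup table. This is only an illustration: the names `STROKE_KEYS` and `strokes_to_keys` are hypothetical and do not appear in this repository, and the stroke order used for 十 is general knowledge rather than something read from the data files.

```python
# Keypad digit for each of the five stroke types (taken from the table above).
STROKE_KEYS = {
    "㇐": "1",  # 橫 horizontal (includes rises 提)
    "㇑": "2",  # 豎 vertical
    "㇒": "3",  # 撇 throw
    "㇔": "4",  # 點 dot (includes presses 捺)
    "㇖": "5",  # 折 break (basically everything else)
}


def strokes_to_keys(strokes: str) -> str:
    """Convert a string of representative strokes into keypad digits."""
    return "".join(STROKE_KEYS[stroke] for stroke in strokes)


# 十 is written as a horizontal stroke followed by a vertical stroke.
print(strokes_to_keys("㇐㇑"))  # 12
```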
Contents of this repository
A. Manually compiled data
The following files contain data manually compiled by Conway (@yawnoc):
codepoint-character-sequence.txt
- Tab-separated (code point, character, stroke sequence regex) triplets.
- There are 28k+ entries. Because Conway (@yawnoc) is human, it is highly likely that there are some mistakes; please report these.
- Licensed under CC-BY-4.0.
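As a rough sketch of how the triplets might be consumed, the snippet below splits each line on tabs and matches a typed stroke sequence against each regex. It makes several assumptions that should be checked against the real file: that code points are written like `U+5341`, that strokes appear as the digits 1 to 5, and that the regexes are compatible with Python's `re` module. The sample lines are invented for illustration and are not copied from codepoint-character-sequence.txt.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple


class Entry(NamedTuple):
    code_point: str
    character: str
    sequence_regex: str


def parse_entries(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield an Entry for every line with exactly three tab-separated fields."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 3:  # anything else (blank lines, comments) is skipped
            yield Entry(*fields)


def characters_matching(entries: Iterable[Entry], typed: str) -> List[str]:
    """Return the characters whose stroke sequence regex fully matches `typed`."""
    return [e.character for e in entries if re.fullmatch(e.sequence_regex, typed)]


# Hypothetical sample lines (assumed format, NOT taken from the real file).
# The alternation in the last regex is made up to show why a regex is useful.
sample = [
    "U+5341\t十\t12\n",
    "U+4EBA\t人\t34\n",
    "U+516B\t八\t3(4|5)\n",
]
entries = list(parse_entries(sample))
print(characters_matching(entries, "34"))  # ['人', '八']
```

Using `re.fullmatch` rather than `re.match` means a typed sequence must match the whole regex, so `3` alone does not match `34`.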
phrases-traditional.txt, phrases-simplified.txt
- Lists of common phrases.
- To be sorted by running sort.py.
- Released into the public domain.
ranking-traditional.txt, ranking-simplified.txt
- Rankings of commonly used characters.
- Released into the public domain.
B. Automatically generated data
The following files contain data automatically generated by running generate.py, which parses codepoint-character-sequence.txt:
characters-traditional.txt, characters-simplified.txt
- Lists of traditional-only and simplified-only characters.
- Released into the public domain.
sequence-characters.txt
- Tab-separated (stroke sequence, characters) pairs.
- Licensed under CC-BY-4.0.
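A minimal sketch of looking up candidates from the (stroke sequence, characters) pairs follows. It assumes that the second field is simply a string of characters and that sequences are written as digits. Neither assumption is confirmed here, and the sample data is hypothetical rather than copied from sequence-characters.txt. The function names are likewise invented for illustration.

```python
from typing import Dict, Iterable


def load_sequence_map(lines: Iterable[str]) -> Dict[str, str]:
    """Build a {stroke sequence: characters} mapping from tab-separated lines."""
    mapping = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 2:  # ignore lines that are not (sequence, characters) pairs
            sequence, characters = fields
            mapping[sequence] = characters
    return mapping


def candidates(mapping: Dict[str, str], typed: str) -> str:
    """Return exact matches first, then characters whose sequence extends `typed`."""
    exact = mapping.get(typed, "")
    longer = "".join(
        characters
        for sequence, characters in sorted(mapping.items())
        if sequence.startswith(typed) and sequence != typed
    )
    return exact + longer


# Hypothetical sample lines (assumed format, NOT taken from the real file).
sample = ["12\t十\n", "121\t土士\n", "34\t人入八\n"]
mapping = load_sequence_map(sample)
print(candidates(mapping, "12"))  # 十土士
```

Listing exact matches before prefix matches is one possible ordering; the ranking files in this repository suggest that a real keyboard would order candidates by how commonly they are used.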
C. Scripts
.bash_aliases
- Defines shell functions s (search), sp (search prefix), ss (search suffix).
generate.py
- Script used to generate sequence-characters.txt and characters-*.txt (by parsing codepoint-character-sequence.txt).
- Licensed under MIT-0.
sort.py
- Script used to sort certain sections of phrases-*.txt.
- Licensed under MIT-0.
D. Tests
test_generate.py
- Unit tests for generate.py.
- Licensed under MIT-0.
test_sort.py
- Unit tests for sort.py.
- Licensed under MIT-0.
Miscellanea for convenient reference (in comments)
Unicode strokes
CJK Strokes (Unicode block) (U+31C0 to U+31E3):
㇀㇁㇂㇃㇄㇅㇆㇇㇈㇉㇊㇋㇌㇍㇎㇏
㇐㇑㇒㇓㇔㇕㇖㇗㇘㇙㇚㇛㇜㇝㇞㇟
㇠㇡㇢㇣
Unicode composition
Ideographic Description Characters (Unicode block) (U+2FF0 to U+2FFB):
⿰⿱⿲⿳⿴⿵⿶⿷⿸⿹⿺⿻
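Both character listings in this section can be regenerated from their code point ranges. The helper name `block_characters` is made up for this sketch, and the end points are the ones quoted in this section.

```python
def block_characters(first: int, last: int) -> str:
    """Return every character from code point `first` to `last`, inclusive."""
    return "".join(chr(code_point) for code_point in range(first, last + 1))


cjk_strokes = block_characters(0x31C0, 0x31E3)
description_characters = block_characters(0x2FF0, 0x2FFB)

print(cjk_strokes)
print(description_characters)
print(len(cjk_strokes), len(description_characters))  # 36 12
```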