Awesome
Installation
1. Download 全宋體 - FSung font.zip.<br> 2a. On Windows 10 or later: open the zip file, then right-click each font and choose "Install".<br> 2b. On Linux: extract the fonts to /usr/share/fonts (this requires root), then refresh the font cache with fc-cache -f.<br> 2c. On macOS 10.3 or later: open the zip file, click each font to open it, then click "Install" in the bottom right-hand corner.<br><br>
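On Linux, the extract-and-refresh step can also be scripted. The Python sketch below is a minimal illustration and is not part of the original instructions. It makes these assumptions:
- The download is a zip whose fonts are .ttf files, possibly inside a subfolder.
- The archive name and target folders in the final comment are examples only.

The script copies the fonts flat into a folder of your choice. It then calls fontconfig's fc-cache, but only if that program is on the PATH.

```python
"""Minimal sketch: install the FSung .ttf files from the downloaded zip on Linux.

Assumptions (not from the original page): the archive holds .ttf files,
possibly inside a subfolder; paths in the example comment are illustrative.
"""
import shutil
import subprocess
import zipfile
from pathlib import Path
from typing import List


def install_fonts(archive: Path, target: Path, refresh_cache: bool = True) -> List[Path]:
    """Extract every .ttf member of `archive` directly into `target`.

    Folder prefixes inside the zip are dropped. If `refresh_cache` is True and
    fontconfig's `fc-cache` is on PATH, the cache for `target` is rebuilt.
    Returns the paths of the installed font files.
    """
    target.mkdir(parents=True, exist_ok=True)
    installed = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            name = Path(info.filename).name  # e.g. "FSung/FSung-F.ttf" -> "FSung-F.ttf"
            if info.is_dir() or not name.lower().endswith(".ttf"):
                continue  # skip folders, readme files, etc.
            dest = target / name
            with zf.open(info) as src, open(dest, "wb") as out:
                shutil.copyfileobj(src, out)
            installed.append(dest)
    if refresh_cache and shutil.which("fc-cache"):
        subprocess.run(["fc-cache", "-f", str(target)], check=True)
    return installed


# Example (system-wide, run as root, matching step 2b above):
#   install_fonts(Path("FSung font.zip"), Path("/usr/share/fonts/FSung"))
# A per-user alternative that needs no root on most current distributions:
#   install_fonts(Path("FSung font.zip"), Path.home() / ".local/share/fonts/FSung")
```

The sketch extracts only the .ttf members rather than the whole archive. This keeps any readme files or nested folders out of the fonts directory.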
More Information
Information from the original webpage by WFG, accessed 2022-02-03. The English translation (by Google) is followed by the original in Chinese.
<div class="post hentry uncustomized-post-template" itemprop="blogPost" itemscope="itemscope" itemtype="http://schema.org/BlogPosting"> <meta content="https://blogger.googleusercontent.com/img/a/AVvXsEgaSImM9aLNaKNqSIm7p5pyeMl3AgMp9qGqaijLaYigm6udK91sMLhjRzkXRK8JaeeGWR9cXXgCe0JojchDMIIUNgIpqVioF2RG-3zntNfnxxlZPBmvZgrQ1z51S_gUr6lVU5lzVHnUM-6fxQGZAxXXpMobqwkyU5PAzdLj05bNb7mWa3Ho5B3e_RNy=w640-h326" itemprop="image_url"> <meta content="8083418832420346104" itemprop="blogId"> <meta content="8841634836748254500" itemprop="postId"> <a name="8841634836748254500"></a> <h3 class="post-title entry-title" itemprop="name"> Building an Environment for Using Chinese Characters: The First Draft of 170,000 Characters Is Released </h3> <div class="post-header"> <div class="post-header-line-1"></div> </div> <div class="post-body entry-content" id="post-body-8841634836748254500" itemprop="description articleBody"> Tuesday, December 28, 2021<br><br>In early July, after my friend suns99 and I finished cleaning up the headwords of the Zhonghua Zihai (《中華字海》), we took a short break. We then threw ourselves into the headwords of the Ministry of Education's Dictionary of Chinese Character Variants (《教育部異體字字典》). That dictionary's headwords are considerably harder to clean up, and I estimated that the two of us could not finish the job quickly on our own. So on July 19 I wrote to its maintainer, the National Academy for Educational Research (NAER, 國教院), to request the glyph-structure data (構形數據) for the Variants Dictionary headwords. On July 30 I received a reply agreeing to provide the data for my work, although the data itself did not arrive until October 15. Meanwhile, I extracted the Variants Dictionary headwords that had not yet been restored into a working file. ("Restored" means matched to a character already encoded in the font library.) I had already restored more than 60,000 of them on and off. After setting aside 13,830 handwritten glyphs, 35,046 remained to be cleaned up. I split these into seven packages of about 5,000 characters each. suns99 checked and cleaned each package character by character using the Cangjie input method. A package took about two weeks on average, and the average restoration rate was slightly below 50%. He sent each package back to me when it was done, and I double-checked every restorable headword to make sure it had been restored correctly. After three months, the initial cleanup was finished in mid-October. By my count, 73,803 headwords can now be retrieved with the existing FSung (全宋體) font library, and 18,366 headwords not yet included need to be added to it. I then tidied up the glyph-structure data from NAER and matched it against my character list. Finally, I extracted the data for those 18,366 headwords and added it to Component Search (「部件檢索」). With that, the large-scale FSung library officially exceeds 170,000 Chinese characters, which should be enough for most uses of Chinese characters.<span><a name="more"></a></span><div class="separator" style="clear: both; text-align: center;"><br></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEgaSImM9aLNaKNqSIm7p5pyeMl3AgMp9qGqaijLaYigm6udK91sMLhjRzkXRK8JaeeGWR9cXXgCe0JojchDMIIUNgIpqVioF2RG-3zntNfnxxlZPBmvZgrQ1z51S_gUr6lVU5lzVHnUM-6fxQGZAxXXpMobqwkyU5PAzdLj05bNb7mWa3Ho5B3e_RNy=s1367" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="697" data-original-width="1367" height="326" src="https://blogger.googleusercontent.com/img/a/AVvXsEgaSImM9aLNaKNqSIm7p5pyeMl3AgMp9qGqaijLaYigm6udK91sMLhjRzkXRK8JaeeGWR9cXXgCe0JojchDMIIUNgIpqVioF2RG-3zntNfnxxlZPBmvZgrQ1z51S_gUr6lVU5lzVHnUM-6fxQGZAxXXpMobqwkyU5PAzdLj05bNb7mWa3Ho5B3e_RNy=w640-h326" width="640"></a></div><div><br></div>
<div>Many friends have wondered why I spend so much time on these rare characters that are "almost never used". Many even refuse to use these privately created characters that are "not officially encoded in Unicode". A major reason I have been so determined to clean up the Variants Dictionary headwords is that the work actually has a high marginal benefit:</div><div><ul style="text-align: left;"><li>It turns the Variants Dictionary's image headwords into searchable plain-text headwords. This greatly improves how efficiently these variant characters can be found and used.</li><li>It links some 100,000 characters to one another. Once the image headwords become searchable plain text, the dictionary's variant lists can connect these 100,000 characters, so a dictionary lookup can expand sideways to related variants.</li><li>Restorable headwords (those the existing font library already contains) can be used to improve the library's existing glyphs. The library's glyphs currently come from many sources, and some are of poor quality. The Variants Dictionary's glyphs are of higher quality and can replace them.</li><li>Non-restorable headwords (those the existing font library does not yet contain) can be added to it, increasing the number of supplementary characters. Most Variants Dictionary headwords come from historical character dictionaries and come with full documentary citations. With these supplementary characters, historical dictionaries and texts can be digitized more accurately.</li></ul></div>
<div>Waiting for Unicode to encode large numbers of characters used in ancient texts is too slow to meet present needs. Besides, if nobody collects and submits them, Unicode has nothing to encode. That is why I have worked through several of the Chinese dictionaries with the largest character counts over the past year. I hope to absorb our predecessors' work as quickly as possible and consolidate it into reusable resources. The goal is a large, free, easy-to-search Chinese character platform that the general public, hobbyists, and academic researchers can all put to use quickly.</div><div><br></div>
<div>In the past, AINet in Japan developed a commercial East Asian character retrieval program called Konjaku Mojikyō (「今昔文字鏡」). It ran from 1985 to 2019, and its final version contained more than 170,000 characters. According to Wikipedia, the president, Furuya Tokio (古家時雄), died of illness in 2018 and Ishikawa Tadahisa (石川忠久) took over. The organization was dissolved the following year, bringing Konjaku Mojikyō to an end. However, its repertoire includes oracle-bone script, seal script, regular script, Chữ Nôm, Shui script, Siddham, Tangut, hentaigana, and more, so it is not made up purely of Chinese characters. Counting only distinct Chinese characters, it should come to fewer than 170,000. My completely free and open FSung library should now exceed Konjaku Mojikyō in the number of Chinese characters it covers. I hope it will become an even more useful Chinese character platform for everyone.</div><div><br></div>
<div>The Variants Dictionary glyph-structure data provided by NAER is incomplete. Many components that cannot be typed or displayed were simply left out, so most of the records are "missing an arm or a leg" (I wrote to NAER to confirm, and this is indeed the case). This also explains why searches with the structure-based character lookup on the official website often failed to find anything. To make the data usable quickly, I first patched the most serious gaps and then loaded the flawed decomposition data as-is. That way the 18,366 new characters at least have a chance of being found, although for now some search results may be wrong, just as on the official site. I will then check and correct the data character by character, fixing it as I go. Since mid-October I have spent two months checking and correcting, on my own, the decomposition data of more than 2,000 new characters. About 16,000 remain, and I estimate it will take at least another year to perfect the decomposition data for all the new characters. Once again I can only summon the spirit of "the Foolish Old Man who moved the mountains" and settle in for a long campaign.</div><div><br></div>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEhwJnf5xF_d_GLYrZPDk2hMjZFeNgASAOLr-uJIqW8HqtJd7M_7dv3AxUaVjVc6io6y0kjqjeu_lhRK2XKsiJLAk0DsJ9a2iaYdPmV2wuzhEztzTByEBr2KjvmUQG5N47igfVzDNz1GSlwuKZ4tVYl_X-_FF0Y5VdTnlgfysmCpK3vr8BF0s1Y6llBp=s631" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="542" data-original-width="631" height="344" src="https://blogger.googleusercontent.com/img/a/AVvXsEhwJnf5xF_d_GLYrZPDk2hMjZFeNgASAOLr-uJIqW8HqtJd7M_7dv3AxUaVjVc6io6y0kjqjeu_lhRK2XKsiJLAk0DsJ9a2iaYdPmV2wuzhEztzTByEBr2KjvmUQG5N47igfVzDNz1GSlwuKZ4tVYl_X-_FF0Y5VdTnlgfysmCpK3vr8BF0s1Y6llBp=w400-h344" width="400"></a></div><div class="separator" style="clear: both; text-align: center;">(The glyph-structure data provided by NAER is often "missing an arm or a leg")</div><div><br></div>
<div>In the meantime, a reader asked about 「⿱艹吐」 (艹 above 吐), a character missing from the Taiwanese edition of a hymnal. On the advice of Brother Jian (簡兄), I went to the <a href="https://cb.fhl.net/" target="_blank">Taiwan Bible Society Bible website</a> and downloaded the "<a href="https://cb.fhl.net/openhan31.zip" target="_blank">Taiwanese and Hakka Han Character Font, Version 3.1</a>" (「臺客語漢字字型3.1版」) as a reference. The font has 123 custom (gaiji) characters. I went through the 106 that remain after excluding 17 Taiwanese phonetic symbols. Together with 「⿱艹吐」, that came to 75 missing characters. I redrew all of them in Song style and added them to the library, so it now also covers some characters specific to Taiwanese and Hakka. I have also attached a cross-reference table so that anyone who uses these characters can quickly convert between the two fonts.</div><div><br></div>
<div>As the year draws to a close, I am releasing this "imperfect" first draft of the FSung library so that everyone can start using it. It also marks the ending of a year in which suns99 and I took on, one after another, the cleanup of the three dictionaries with the largest character counts: the Hanzi Hai (《漢字海》), the Zhonghua Zihai (《中華字海》), and the Ministry of Education's Dictionary of Chinese Character Variants. We worked almost nonstop this year and cleaned up a total of 96,175 headwords from these three dictionaries. We also added 54,620 characters to the library. The library now covers every headword of the Hanzi Hai and the Zhonghua Zihai and 90% of the Variants Dictionary headwords. The remaining 10% are handwritten-glyph headwords that have not yet been processed. It has been a fruitful year. I would also like to thank suns99 once again for standing by me this past year without a word of complaint. He charged into battle alongside this "fool" and completed one "mission impossible" after another. Hehe!</div><div><br></div>
<div>In the coming year I expect to focus on continuing to improve the library's decomposition data. That leaves the 13,830 handwritten glyphs of the Variants Dictionary that have not been processed yet. I have already made a temporary working font for them, but I will probably set them aside for now. After this year, suns99 and I are both "exhausted, men and horses alike". Even if suns99 is willing to keep helping, I could not cope with new debts piling up before the old ones are paid off. So next year will be mainly about "paying off debts". As for the "unfinished work", we will "take it as it comes".</div><div><br></div>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEgn-Bc-IsPCJs6HCW4VUPOhL0YEaOOMb0IkMdCypf502iW3kd3SIlClN_qBcYflXKmu01LEkIm4D-vXJLZs_yXrfGTx4c38kNjIMqVC2z94dCLXvYGwoaWYKQ2XcA10WoiF-STYS4hA61KJrZJaxFgX8DHXV3f7-tlyRGj7hUlEGlookkgoz91q684j=s818" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="818" data-original-width="472" src="https://blogger.googleusercontent.com/img/a/AVvXsEgn-Bc-IsPCJs6HCW4VUPOhL0YEaOOMb0IkMdCypf502iW3kd3SIlClN_qBcYflXKmu01LEkIm4D-vXJLZs_yXrfGTx4c38kNjIMqVC2z94dCLXvYGwoaWYKQ2XcA10WoiF-STYS4hA61KJrZJaxFgX8DHXV3f7-tlyRGj7hUlEGlookkgoz91q684j=s16000"></a></div><div class="separator" style="clear: both; text-align: center;">(The Variants Dictionary's handwritten glyphs, temporarily set aside)</div><br>
<div>The Variants Dictionary has not been openly licensed the way the Mandarin Chinese Dictionary (《國語辭典》) has, so I cannot turn it into an offline dictionary for public use; that would infringe copyright. As a compromise, I dropped all the definitions and kept only the headwords to make a Ministry of Education Variants Index Dictionary (《教育部異體字索引字典》). After finding a character with Component Search, you can look it up in this index dictionary. Clicking the character-number link then jumps straight to the official page. I asked NAER about this, and linking to the official pages should not raise infringement issues. It is not perfect, but it still offers a better, more convenient lookup experience than the official website. I hope this index dictionary helps everyone make easier use of the Variants Dictionary, a professional-grade Chinese character resource. The <a href="https://fgwang.blogspot.com/2021/12/blog-post_29.html">Ministry of Education Variants Index Dictionary</a> will be released in a separate post.</div><div><br></div>
<div>I welcome its use for academic research, education, and personal reading, but <span style="color: red;">please do not use it for commercial profit in any form</span>. I hope the large FSung library and the Component Search lookup tool can be of some small help in organizing and researching Chinese character culture.</div><div><br></div>
<div>Download link: <a href="https://drive.google.com/file/d/1m0-WYAXbEz3lxJrti25ZvWv6LkHjMp2X/view?usp=sharing">全宋體.zip</a> (FSung)</div>
<div>Download link: <a href="https://drive.google.com/file/d/1kCSZzPBndZNKyhTrsqLo58ZEChpFya5B/view?usp=sharing">部件檢索(測試版).7z</a> (Component Search, beta)</div>
<div>Download link: <a href="https://drive.google.com/file/d/1y74W62N-mIcl9r6H63oXkzV3aC4YDQRP/view?usp=sharing">倉頡碼表.7z</a> (Cangjie code table. Because everyone's habits differ, it contains only the Chinese-character entries; please merge them into the code table you normally use.)</div>
<div>Download link: <a href="https://drive.google.com/file/d/1Na8R0kp1mYatdcnEkHl1SpTUwGcmaij0/view?usp=sharing">臺客語漢字外字對照表.7z</a> (Taiwanese and Hakka custom-character cross-reference table)</div><div><br></div>
<div>Finally, here are some entries from the work log I kept along the way, as a record and a keepsake:</div><div><ul style="text-align: left;"><li>2021/07/19 Made a temporary working font file of the 35,046 characters to be cleaned up. Extracted the unrestored headwords into a working file, split it into seven packages of about 5,000 characters, and sent them to suns99; work officially began. In the evening, wrote to NAER requesting the glyph-structure data for the Variants Dictionary headwords.</li><li>2021/07/20 Received a form reply from NAER saying the request had been received and would be handled after internal review.</li><li>2021/07/30 Received NAER's reply agreeing to provide the data.</li><li>2021/08/01 Package 1 checked; 5,000 characters cleaned up so far.</li><li>2021/08/13 Package 2 checked; 10,000 characters cleaned up so far, with a restoration rate of about 48%.</li><li>2021/08/24 Package 3 checked; 15,000 characters cleaned up so far.</li><li>2021/09/05 Package 4 checked; 20,000 characters cleaned up so far.</li><li>2021/09/17 Package 5 checked; 25,000 characters cleaned up so far.</li><li>2021/09/29 Package 6 checked; 30,000 characters cleaned up so far.</li><li>2021/10/04 Still had not received NAER's glyph-structure data, so wrote again to ask.</li><li>2021/10/11 Package 7 checked; 35,000 characters cleaned up so far.</li><li>2021/10/15 Finally received NAER's glyph-structure data. Wrote back to thank them.</li><li>2021/10/18 Finished a first pass at encoding the glyphs of the newly added characters and adding them to Component Search; 90% of the Variants Dictionary headwords are now restored.</li><li>2021/11/12 Finished cleaning up the decomposition data of more than 1,000 characters. Wrote to NAER to ask about the data flaws and licensing.</li><li>2021/11/20 A reader asked about 「⿱艹吐」, a character missing from the Taiwanese hymnal; wrote to Brother Jian for advice.</li><li>2021/11/22 NAER replied about the data flaws and licensing.</li><li>2021/12/13 Finished creating the 75 characters specific to Taiwanese and Hakka.</li><li>2021/12/24 Finished cleaning up the decomposition data of more than 2,000 characters.</li></ul><br><br></div>
<div>P.S. Because of the sheer number of characters, the space in Plane 15 (FSung-F.ttf) is now completely used up. Starting with this release, the fonts therefore also use Plane 16 (FSung-X.ttf) to hold the remaining supplementary-character glyphs.</div>
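Planes 15 and 16 are Unicode's Supplementary Private Use Areas A and B: U+F0000 to U+FFFFD and U+100000 to U+10FFFD. Each therefore offers at most 65,534 code points, which is why Plane 15 could fill up completely. The Python sketch below is a hypothetical helper that does not ship with FSung. It uses only the mapping stated in the P.S. (Plane 15 to FSung-F.ttf, Plane 16 to FSung-X.ttf) to report which of these two files a text depends on. Characters in any other plane are outside this sketch and map to None.

```python
"""Minimal sketch: which Plane 15/16 FSung file does a character or text need?

Only the two mappings stated in the P.S. are used; every other plane maps to
None, because the file layout for those planes is not described there.
"""
from typing import Optional, Set

# Plane number -> FSung file, as stated in the post's P.S.
FSUNG_FILE_BY_PLANE = {15: "FSung-F.ttf", 16: "FSung-X.ttf"}

# Usable private-use code points per plane: U+xx0000..U+xxFFFD, because the
# last two code points of every plane (U+xxFFFE, U+xxFFFF) are noncharacters.
PUA_CODE_POINTS_PER_PLANE = 0x10000 - 2  # 65,534


def plane_of(ch: str) -> int:
    """Return the Unicode plane (0-16) of a single character."""
    return ord(ch) >> 16


def fsung_file_for(ch: str) -> Optional[str]:
    """Return the FSung file for a Plane 15/16 character, else None."""
    if (ord(ch) & 0xFFFF) >= 0xFFFE:  # noncharacter: never carries a glyph
        return None
    return FSUNG_FILE_BY_PLANE.get(plane_of(ch))


def files_needed(text: str) -> Set[str]:
    """Collect the Plane 15/16 FSung files that a piece of text depends on."""
    return {f for f in map(fsung_file_for, text) if f is not None}
```

Take a string that mixes an ordinary character with one Plane 15 and one Plane 16 private-use character. Calling files_needed on it returns both file names. This gives a quick check of whether a text relies on FSung-X.ttf, which an installation from an earlier release would presumably lack.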
<div style="clear: both;"></div> </div> <div class="post-footer"> <div class="post-footer-line post-footer-line-1"> <span class="post-author vcard"> Posted by <span class="fn" itemprop="author" itemscope="itemscope" itemtype="http://schema.org/Person"> <meta content="https://www.blogger.com/profile/14004240365298046569" itemprop="url"> <a class="g-profile" href="https://www.blogger.com/profile/14004240365298046569" rel="author" title="author profile" data-gapiscan="true" data-onload="true" data-gapiattached="true"> <span itemprop="name">WFG</span> </a> </span> </span> <span class="post-timestamp"> at <meta content="http://fgwang.blogspot.com/2021/12/blog-post.html" itemprop="url"> <a class="timestamp-link" href="http://fgwang.blogspot.com/2021/12/blog-post.html" rel="bookmark" title="permanent link"><abbr class="published" itemprop="datePublished" title="2021-12-28T14:45:00+08:00">2:45 PM</abbr></a> </span> </div> <div class="post-footer-line post-footer-line-2"> <span class="post-labels"> Labels: <a href="http://fgwang.blogspot.com/search/label/%E6%BC%A2%E5%AD%97%E7%92%B0%E5%A2%83" rel="tag">漢字環境</a> (Chinese-character environment) </span> </div> <div class="post-footer-line post-footer-line-3"> <span class="post-location"> </span> </div> </div> </div> <div class="post hentry uncustomized-post-template" itemprop="blogPost" itemscope="itemscope" itemtype="http://schema.org/BlogPosting"> <meta
content="https://blogger.googleusercontent.com/img/a/AVvXsEgaSImM9aLNaKNqSIm7p5pyeMl3AgMp9qGqaijLaYigm6udK91sMLhjRzkXRK8JaeeGWR9cXXgCe0JojchDMIIUNgIpqVioF2RG-3zntNfnxxlZPBmvZgrQ1z51S_gUr6lVU5lzVHnUM-6fxQGZAxXXpMobqwkyU5PAzdLj05bNb7mWa3Ho5B3e_RNy=w640-h326" itemprop="image_url"> <meta content="8083418832420346104" itemprop="blogId"> <meta content="8841634836748254500" itemprop="postId"> <a name="8841634836748254500"></a> <h3 class="post-title entry-title" itemprop="name"> 漢字使用環境的建置 ——十七萬漢字初稿登場 </h3> <div class="post-header"> <div class="post-header-line-1"></div> </div> <div class="post-body entry-content" id="post-body-8841634836748254500" itemprop="description articleBody"> 2021年12月28日 星期二<br><br>七月初我與 suns99 兄完成了《中華字海》字頭的清理工作後,略事休息,接著又投入了《教育部異體字字典》的字頭清理工作。由於《教育部異體字字典》的字頭清理工作難度更高,我估計憑我二人之力難以在短時間完成,於是便在7月19日去函《教育部異體字字典》的維護單位——國教院,申請《異體字字典》字頭的構形數據。7月30日收到回函,國教院同意提供《異體字字典》字頭的構形數據供我整理之用,實際收到數據已是在10月15日。與此同時,我將尚未還原的《異體字字典》字頭摘錄出來做成工作檔(先前已斷續整理還原了六萬多字,再扣除掉 13830 個手寫字形後,還有 35046 字待清理),每五千字一包,切分成七個包,suns99 兄用倉頡輸入法逐字核對清理(每包平均約花兩週時間,平均還原率略低於 50%),每完成一包發回給我,我再針對可還原的字頭覆核一遍,確保還原的正確性。花了三個月的時間,十月中完成了初步的清理工作。經過統計,利用既有全宋體字庫可檢索的字頭計有 73803 字,需新增至字庫的未收字頭共有 18366 字。然後我將國教院提供的構形數據略事整理,吻合進我的字表,最後提取 18366 個未收字頭數據,加進「部件檢索」裡,至此「全宋體」這個大型字庫,收字正式突破了十七萬漢字,應該足供大多數的漢字應用。<span><a name="more"></a></span><div class="separator" style="clear: both; text-align: center;"><br></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEgaSImM9aLNaKNqSIm7p5pyeMl3AgMp9qGqaijLaYigm6udK91sMLhjRzkXRK8JaeeGWR9cXXgCe0JojchDMIIUNgIpqVioF2RG-3zntNfnxxlZPBmvZgrQ1z51S_gUr6lVU5lzVHnUM-6fxQGZAxXXpMobqwkyU5PAzdLj05bNb7mWa3Ho5B3e_RNy=s1367" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="697" data-original-width="1367" height="326" 
src="https://blogger.googleusercontent.com/img/a/AVvXsEgaSImM9aLNaKNqSIm7p5pyeMl3AgMp9qGqaijLaYigm6udK91sMLhjRzkXRK8JaeeGWR9cXXgCe0JojchDMIIUNgIpqVioF2RG-3zntNfnxxlZPBmvZgrQ1z51S_gUr6lVU5lzVHnUM-6fxQGZAxXXpMobqwkyU5PAzdLj05bNb7mWa3Ho5B3e_RNy=w640-h326" width="640"></a></div><div><br></div><div>有不少朋友一直奇怪我為何要花那麼多時間去整理這些「幾乎用不到的」生僻漢字,甚至很多朋友很排斥使用這些「Unicode 官方沒有收錄」的私造字。我一直執著地要清理《異體字字典》的字頭,一個很主要的原因便是這項工作的邊際效益其實很高:</div><div><ul style="text-align: left;"><li>可以還原《異體字字典》的圖片字頭,成為純文字的可檢索字頭。這可以大大提昇這些異體字的檢索、利用效率。</li><li>可以建立起十萬漢字的橫向聯繫關係。一旦圖片字頭還原成可檢索的純文字字頭,利用《異體字字典》的異體表列,便可以將這十萬漢字的橫向關係聯繫起來,有助於字典查詢時的橫向擴展。</li><li>可以利用可還原的字頭(表示既有字庫有收)來優化既有字庫的字形。目前字庫的字形來源多元,有些質量很差,《異體字字典》的字形質量較高,可以進行替代優化。</li><li>可以將不可還原的字頭(表示既有字庫沒收)補進字庫,擴增補充字的數量。《異體字字典》的字頭多半來自歷代字書,有完整書證,有了這些補充字,便能更精確地數位化歷代字書、文獻。</li></ul></div><div>要等待 Unicode 官方收錄大量古籍用字,緩不濟急,況且若是沒人整理提交,Unicode 官方也無從收錄起。所以這一年來我大量整理幾本收字量最大的漢字字典,就是希望在最短時間內,吸納這些前人的成果,將它們匯總轉化成可再利用的資源,建立一個方便使用、容易檢索的大型免費漢字平台,方便讓一般大眾、業餘愛好者、學術研究者都能快速地加以利用。</div><div><br></div><div>昔日日本的AINet開發了一款商業販售的東亞文字檢索軟體名為「今昔文字鏡」,從 1985 年至 2019 為止,最後的版本收錄文字達十七萬以上(據維基百科的記錄,2018年社長古家時雄病逝,改由石川忠久接手,於次年散會,「今昔文字鏡」正式落幕)。不過它的收字包含了甲骨文、篆體字、楷體字、喃字、水族文字、悉曇文字、西夏文字、變體假名等等,不純粹都是漢字,若是以不重複的純粹漢字而言,應該不到十七萬之數。現下我這個完全免費開放的「全宋體」字庫,漢字的收字規模實際上應該已經超越了「今昔文字鏡」,希望能成為對大家更為有用的漢字平台。</div><div><br></div><div>由於國教院提供的《異體字字典》構形數據並不完整,很多無法輸入、顯示的部件都被直接略去,導致大部分的構形數據都是「缺了胳膊,少了腿」(我去函國教院確認,確實如此)。這也印證了為什麼我利用官網的構形檢字來查字,經常會有查不到的情形。為求快速可用,我只能大致先修補一些較嚴重的缺失,然後就硬套入這些帶有瑕疵的拆分數據,先求讓這 18366 個新增字有被檢索的機會(檢索結果可能暫時跟官網一樣會有不正確的情形),之後再慢慢逐字檢查、修正數據,邊用邊改。從十月中到現在,花了兩個月的時間,我獨力檢查、修正了兩千多個新增字的拆分數據,後續尚有一萬六千字待檢,估計要完善全部的新增字拆分數據,起碼還要一年多的時間,只能再一次地發揮「愚公移山」的精神,長期抗戰了。</div><div><br></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEhwJnf5xF_d_GLYrZPDk2hMjZFeNgASAOLr-uJIqW8HqtJd7M_7dv3AxUaVjVc6io6y0kjqjeu_lhRK2XKsiJLAk0DsJ9a2iaYdPmV2wuzhEztzTByEBr2KjvmUQG5N47igfVzDNz1GSlwuKZ4tVYl_X-_FF0Y5VdTnlgfysmCpK3vr8BF0s1Y6llBp=s631" style="margin-left: 1em; margin-right: 1em;"><img 
border="0" data-original-height="542" data-original-width="631" height="344" src="https://blogger.googleusercontent.com/img/a/AVvXsEhwJnf5xF_d_GLYrZPDk2hMjZFeNgASAOLr-uJIqW8HqtJd7M_7dv3AxUaVjVc6io6y0kjqjeu_lhRK2XKsiJLAk0DsJ9a2iaYdPmV2wuzhEztzTByEBr2KjvmUQG5N47igfVzDNz1GSlwuKZ4tVYl_X-_FF0Y5VdTnlgfysmCpK3vr8BF0s1Y6llBp=w400-h344" width="400"></a></div><div class="separator" style="clear: both; text-align: center;">(<span style="text-align: left;">國教院提供的</span><span style="text-align: left;">構形數據經常</span><span style="text-align: left;">「缺了胳膊,少了腿」</span>)</div><div><br></div><div>期間有網友問起了台語版聖詩的一個缺字「⿱艹吐」,經過簡兄的指點,我到<a href="https://cb.fhl.net/" target="_blank">台灣聖經公會聖經網站</a>找來了「<a href="https://cb.fhl.net/openhan31.zip" target="_blank">臺客語漢字字型3.1版</a>」作為參考,將它的 123 個外字,扣除 17 個台語注音字符外的 106 字清理一遍,加上「⿱艹吐」計有 75 個缺字,全部以宋體風格重新造字補入字庫,讓字庫也能涵蓋臺、客語的一些特用漢字。特別附上對照表,讓有使用這些字的朋友可以在兩種字庫之間快速地轉換。</div><div><br></div><div>值此歲末年終,我先將這「並不完善」的「全宋體」字庫初稿發布出來,讓大家能先行使用,也為今年一年我與 suns99 兄連續挑戰了《漢字海》、《中華字海》、《教育部異體字字典》三部收字最多字典的清理工作做一個 Ending。這一年,幾乎馬不停蹄,總計清理了三大字典的 96175 個字頭,為字庫新增了 54620 字,涵蓋了《漢字海》、《中華字海》所有字頭,以及《教育部異體字字典》的九成字頭(還有一成是手寫字形字頭,尚未處理),成績可謂豐碩。在此也要再次向 suns99 兄致謝,感謝他這一年來沒有二話的義氣相挺,陪著我這個「傻子」衝鋒陷陣,完成了一項項的「不可能任務」,呵呵!</div><div><br></div><div>未來一年,我應該會將重點放在繼續完善字庫的拆分數據上,至於尚未處理的 13830 個《異體字字典》手寫字形,雖然我已經做好了工作用的臨時字型,可能還是會暫時予以擱置。畢竟經此一年,我與 suns99 兄已經「兵困馬疲」,就算 suns99 兄還願意再繼續幫忙,若是前債未清後債又疊加上來,我也負荷不了。所以未來一年先以「還債」為主,至於「未竟之功」只好「且看且走」了。</div><div><br></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEgn-Bc-IsPCJs6HCW4VUPOhL0YEaOOMb0IkMdCypf502iW3kd3SIlClN_qBcYflXKmu01LEkIm4D-vXJLZs_yXrfGTx4c38kNjIMqVC2z94dCLXvYGwoaWYKQ2XcA10WoiF-STYS4hA61KJrZJaxFgX8DHXV3f7-tlyRGj7hUlEGlookkgoz91q684j=s818" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="818" data-original-width="472" 
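src="https://blogger.googleusercontent.com/img/a/AVvXsEgn-Bc-IsPCJs6HCW4VUPOhL0YEaOOMb0IkMdCypf502iW3kd3SIlClN_qBcYflXKmu01LEkIm4D-vXJLZs_yXrfGTx4c38kNjIMqVC2z94dCLXvYGwoaWYKQ2XcA10WoiF-STYS4hA61KJrZJaxFgX8DHXV3f7-tlyRGj7hUlEGlookkgoz91q684j=s16000"></a></div><div class="separator" style="clear: both; text-align: center;">(Handwritten forms from the "Dictionary of Variant Characters", set aside for now)</div><br><div>Because the Ministry of Education "Dictionary of Variant Characters" has not been openly licensed the way the "Mandarin Dictionary" (國語辭典) has, I cannot turn it into an offline dictionary for public use (that would infringe copyright). As a compromise, I dropped all the definitions and kept only the headwords, producing a "Ministry of Education Variant Character Index Dictionary" (教育部異體字索引字典). After finding a character with the Component Search (部件檢索) tool, you can look it up in this index dictionary and click its entry-number link to jump straight to the official page (I asked NAER, and linking to the official pages this way should not raise any infringement issue). It is not perfect, but it still gives a better and more convenient experience than searching on the official website, and I hope this index dictionary helps everyone make easier use of the "Dictionary of Variant Characters", a professional-grade Chinese character resource (the <a href="https://fgwang.blogspot.com/2021/12/blog-post_29.html">Ministry of Education Variant Character Index Dictionary</a> will be released in a separate post).</div><div><br></div><div>I am glad to see it used for academic research, education, and personal reading, but <span style="color: red;">please do not use it for any form of commercial profit</span>. I hope the large FSung font and the Component Search lookup tool can be of some small help in organizing and studying Chinese character culture.</div><div><br></div><div>Download: <a href="https://drive.google.com/file/d/1m0-WYAXbEz3lxJrti25ZvWv6LkHjMp2X/view?usp=sharing">全宋體.zip</a> (FSung font)</div><div>Download: <a href="https://drive.google.com/file/d/1kCSZzPBndZNKyhTrsqLo58ZEChpFya5B/view?usp=sharing">部件檢索(測試版).7z</a> (Component Search, beta)</div><div>Download: <a href="https://drive.google.com/file/d/1y74W62N-mIcl9r6H63oXkzV3aC4YDQRP/view?usp=sharing">倉頡碼表.7z</a> (Cangjie code table; since everyone's setup differs, only the Chinese-character entries are included, so please merge them into the code table you normally use)</div><div>Download: <a href="https://drive.google.com/file/d/1Na8R0kp1mYatdcnEkHl1SpTUwGcmaij0/view?usp=sharing">臺客語漢字外字對照表.7z</a> (Taiwanese/Hakka extra-character conversion table)</div><div><br></div><div>Finally, here are some entries from the work log I kept along the way, as a memento:</div><div><ul style="text-align: left;"><li>2021/07/19 Built a temporary working font of the 35,046 characters to be cleaned up, extracted the not-yet-restored headwords into working files, split them into seven batches of 5,000 characters each, and sent them to suns99: work officially began. In the evening, wrote to NAER requesting the structural data for the "Dictionary of Variant Characters" headwords.</li><li>2021/07/20 Received a form reply from NAER acknowledging the request and saying it would be handled after internal review.</li><li>2021/07/30 NAER replied, agreeing to provide the data.</li><li>2021/08/01 Batch 1 checked; 5,000 characters cleaned up so far.</li><li>2021/08/13 Batch 2 checked; 10,000 characters cleaned up so far, with a restoration rate of about 48%.</li><li>2021/08/24 Batch 3 checked; 15,000 characters cleaned up so far.</li><li>2021/09/05 Batch 4 checked; 20,000 characters cleaned up so far.</li><li>2021/09/17 Batch 5 checked; 25,000 characters cleaned up so far.</li><li>2021/09/29 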
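Batch 6 checked; 30,000 characters cleaned up so far.</li><li>2021/10/04 Still no structural data from NAER; wrote again to ask.</li><li>2021/10/11 Batch 7 checked; 35,000 characters cleaned up so far.</li><li>2021/10/15 Finally received the structural data from NAER and wrote back to thank them.</li><li>2021/10/18 Finished a first pass of glyph encoding and component search for the new characters; 90% of the "Dictionary of Variant Characters" headwords restored.</li><li>2021/11/12 Finished cleaning up the decomposition of more than 1,000 characters. Wrote to NAER about the data flaws and licensing.</li><li>2021/11/20 A reader asked about the missing character "⿱艹吐" in the Taiwanese hymnal; wrote to Brother Jian for advice.</li><li>2021/11/22 NAER replied about the data flaws and licensing.</li><li>2021/12/13 Finished creating glyphs for 75 characters specific to Taiwanese and Hakka.</li><li>2021/12/24 Finished cleaning up the decomposition of more than 2,000 characters.</li></ul><br><br></div><div>P.S. Because of the sheer number of characters, Plane 15 (FSung-F.ttf) is now completely full, so this version of the font starts using Plane 16 (FSung-X.ttf) to hold further supplementary-character glyphs.</div>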
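The P.S. relies on how Unicode lays out its supplementary private-use planes. The minimal Python sketch below assumes, from the P.S. alone, that FSung-F.ttf holds the Plane 15 private-use glyphs and FSung-X.ttf the Plane 16 ones; `PLANE_FILES` and `fsung_file_for` are hypothetical names for illustration, not part of the FSung download. It shows why a plane can fill up: each Supplementary Private Use Area has only 65,534 usable code points, because the last two code points of every plane are noncharacters.

```python
# Sketch: which supplementary private-use plane a character falls in, and
# which FSung file would hold it. The plane-to-file mapping is an ASSUMPTION
# read from the P.S., not something queried from the font files.

# Supplementary Private Use Areas defined by the Unicode Standard. The last two
# code points of every plane (U+xFFFE, U+xFFFF) are noncharacters, so each
# area ends at U+xFFFD and holds 65,534 usable code points.
SPUA_A = range(0xF0000, 0xFFFFD + 1)    # Plane 15
SPUA_B = range(0x100000, 0x10FFFD + 1)  # Plane 16

# Hypothetical mapping (assumed): Plane 15 in FSung-F.ttf, Plane 16 in FSung-X.ttf.
PLANE_FILES = {15: "FSung-F.ttf", 16: "FSung-X.ttf"}


def plane_of(ch):
    """Return the Unicode plane number (0 to 16) of a single character."""
    return ord(ch) // 0x10000


def fsung_file_for(ch):
    """Return the assumed FSung file for a supplementary private-use character, else None."""
    cp = ord(ch)
    if cp in SPUA_A or cp in SPUA_B:
        return PLANE_FILES[plane_of(ch)]
    return None  # outside both supplementary private-use areas


if __name__ == "__main__":
    print(len(SPUA_A), len(SPUA_B))       # 65534 65534
    print(fsung_file_for(chr(0xF0000)))   # FSung-F.ttf
    print(fsung_file_for(chr(0x100000)))  # FSung-X.ttf
    print(fsung_file_for("宋"))           # None (a standard Plane 0 character)
```

Using `range` objects keeps each membership test constant-time without materializing the 131,068 code points of the two areas.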
<div style="clear: both;"></div> </div> <div class="post-footer"> <div class="post-footer-line post-footer-line-1"> <span class="post-author vcard"> Posted by <span class="fn" itemprop="author" itemscope="itemscope" itemtype="http://schema.org/Person"> <meta content="https://www.blogger.com/profile/14004240365298046569" itemprop="url"> <a class="g-profile" href="https://www.blogger.com/profile/14004240365298046569" rel="author" title="author profile" data-gapiscan="true" data-onload="true" data-gapiattached="true"> <span itemprop="name">WFG</span> </a> </span> </span> <span class="post-timestamp"> at <meta content="http://fgwang.blogspot.com/2021/12/blog-post.html" itemprop="url"> <a class="timestamp-link" href="http://fgwang.blogspot.com/2021/12/blog-post.html" rel="bookmark" title="permanent link"><abbr class="published" itemprop="datePublished" title="2021-12-28T14:45:00+08:00">2:45 PM</abbr></a> </span> </div> <div class="post-footer-line post-footer-line-2"> <span class="post-labels"> Labels: <a href="http://fgwang.blogspot.com/search/label/%E6%BC%A2%E5%AD%97%E7%92%B0%E5%A2%83" rel="tag">Chinese Character Environment</a> </span> </div> </div> </div>
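The post says NAER's structural data often has components dropped ("missing arms and legs"), which is why structure-based lookups fail. The post does not say what format that data uses; assuming decompositions are written as Ideographic Description Sequences (IDS), like the "⿱艹吐" cited in the post, a truncated entry can be caught mechanically, because every description operator takes a fixed number of operands. This Python sketch handles only the twelve original description characters U+2FF0 to U+2FFB (newer Unicode versions add more), treats every other code point as a single component, and uses the hypothetical helper names `parse_ids` and `is_complete`.

```python
# Sketch: detect incomplete Ideographic Description Sequences (IDS).
# ASSUMPTION: decomposition data is stored as IDS strings; the post does not
# specify NAER's actual format.

# Operand counts for the twelve original description characters U+2FF0..U+2FFB.
# Every one is binary except the two three-part layouts set below.
ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
ARITY["\u2FF2"] = 3  # LEFT TO MIDDLE AND RIGHT
ARITY["\u2FF3"] = 3  # ABOVE TO MIDDLE AND BELOW


def parse_ids(ids, pos=0):
    """Parse one IDS subtree starting at index `pos`.

    Returns (tree, next_pos), where tree is a component character or a tuple
    (operator, operand, ...). Raises ValueError if the string runs out
    before every operator has all of its operands.
    """
    if pos == len(ids):
        raise ValueError("sequence ends before all operands are present")
    ch = ids[pos]
    if ch not in ARITY:
        return ch, pos + 1  # any other code point counts as one component
    operands = []
    pos += 1
    for _ in range(ARITY[ch]):
        operand, pos = parse_ids(ids, pos)
        operands.append(operand)
    return (ch, *operands), pos


def is_complete(ids):
    """True if `ids` is exactly one well-formed description, with nothing left over."""
    try:
        _, end = parse_ids(ids)
    except ValueError:
        return False
    return end == len(ids)


if __name__ == "__main__":
    print(parse_ids("\u2FF1艹吐")[0])  # ('⿱', '艹', '吐')
    print(is_complete("\u2FF1艹吐"))   # True
    print(is_complete("\u2FF0口"))     # False: a dropped component leaves the operator short
```

A check like this could flag entries that need manual repair, but it cannot tell whether a sequence that parses cleanly actually uses the right components.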