Title: | A Study on Chinese Spelling Check Using Confusion Sets and N-gram Statistics |
Authors: | Chuan-Jie Lin; Wei-Cheng Chu |
Keywords: | Chinese Spelling Check; Confusion Set Expansion; Google Ngram Scoring Function |
Issue Date: | 1-Jun-2015 |
Journal Volume: | 20 |
Journal Issue: | 1 |
Start page/Pages: | 23-48 |
Abstract: | This paper proposes an automatic method for building a Chinese spelling check system. Confusion sets were expanded using two language resources, Shuowen Jiezi and the Four-Corner codes, which improved the coverage of the confusion sets. Nine scoring functions that use the frequency data in the Google Ngram Datasets were proposed, and the idea of smoothing was also adopted. Thresholds were also determined automatically. The final system performed far better than our baseline system in the CSC 2013 Evaluation Task. |
URI: | http://scholars.ntou.edu.tw/handle/123456789/17887 |
Appears in Collections: | Department of Computer Science and Engineering (資訊工程學系) |