Title: | Strategies of Processing Japanese Names and Character Variants in Traditional Chinese Text |
Authors: | Chuan-Jie Lin; Jia-Cheng Zhan; Yen-Heng Chen; Chien-Wei Pao |
Keywords: | Semantic Chinese Word Segmentation; Japanese Name Identification; Character Variants |
Issue Date: | Sep-2012 |
Publisher: | Computational Linguistics |
Journal Volume: | 17 |
Journal Issue: | 3 |
Start page/Pages: | 87-108 |
Source: | Computational Linguistics and Chinese Language Processing |
Abstract: | This paper proposes an approach to identify word candidates that are not Traditional Chinese, including Japanese names (written in Japanese Kanji or Traditional Chinese characters) and word variants, when doing word segmentation on Traditional Chinese text. When handling personal names, a probability model concerning formats of names is introduced. We also propose a method to map Japanese Kanji into the corresponding Traditional Chinese characters. The same method can also be used to detect words written in character variants. After integrating generation rules for various types of special words, as well as their probability models, the F-measure of our word segmentation system rises from 94.16% to 96.06%. Another experiment shows that 83.18% of the 862 Japanese names in a set of 109 human-annotated documents can be successfully detected. |
URI: | http://scholars.ntou.edu.tw/handle/123456789/17906 |
Appears in Collections: | Department of Computer Science and Engineering (資訊工程學系) |
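
The abstract mentions mapping Japanese Kanji to the corresponding Traditional Chinese characters so that words written in variant characters can still be matched. The record does not describe how the paper builds or applies that mapping, so the following is only a minimal sketch of the general idea under stated assumptions: the table `VARIANT_TO_TRADITIONAL`, the functions `normalize_variants` and `find_candidates`, the `max_len` parameter, and the sample lexicon are all hypothetical and are not taken from the paper.

```python
# Hypothetical, tiny variant table: Japanese Kanji form -> Traditional Chinese form.
# Illustrative only; a usable table would be far larger and is NOT the paper's.
VARIANT_TO_TRADITIONAL = {
    "黒": "黑",
    "沢": "澤",
    "広": "廣",
    "桜": "櫻",
    "辺": "邊",
    "斎": "齋",
}


def normalize_variants(text: str) -> str:
    """Replace each known variant character with its Traditional Chinese form."""
    return "".join(VARIANT_TO_TRADITIONAL.get(ch, ch) for ch in text)


def find_candidates(text: str, lexicon: set, max_len: int = 4) -> list:
    """Return (start, end, surface, normalized) for each substring of `text`
    whose normalized form appears in `lexicon`.

    The mapping is character-for-character, so offsets in the normalized
    string are also valid offsets in the original text.
    """
    normalized = normalize_variants(text)
    candidates = []
    for start in range(len(text)):
        longest = min(len(text), start + max_len)
        for end in range(start + 1, longest + 1):
            if normalized[start:end] in lexicon:
                candidates.append(
                    (start, end, text[start:end], normalized[start:end])
                )
    return candidates


if __name__ == "__main__":
    # Assumed lexicon entry written in Traditional Chinese characters.
    lexicon = {"黑澤"}
    print(find_candidates("黒沢明", lexicon))
```

In this sketch the surname written in Japanese Kanji is matched against a lexicon entry stored in Traditional Chinese, and the original surface form is kept alongside the normalized one. How such candidates would be scored against other segmentations, for example with the name-format probability model the abstract refers to, is outside what this record describes.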