  1. National Taiwan Ocean University Research Hub

Automatic Analyzing and Tagging of the Morphological Structures in Seediq Words

Basic Information

Project title
Automatic Analyzing and Tagging of the Morphological Structures in Seediq Words
Project Code
MOST109-2221-E019-053
Translated Name (Chinese)
賽德克語構詞結構之自動解析及標記工作
 
Project Coordinator
Chuan-Jie Lin
Funding Organization
National Science and Technology Council
 
Co-Investigator(s)
宋麗梅
 
Department/Unit
Department of Computer Science and Engineering
Website
https://www.grb.gov.tw/search/planDetail?id=13540042
Year
2020
 
Start date
2020-08-01
Expected Completion
2021-07-31
 

Description

Abstract
Taiwanese indigenous languages have been listed as endangered by UNESCO (United Nations Educational, Scientific and Cultural Organization), and their preservation and revitalization have recently been gaining public attention. In the era of the web and AI, applying NLP techniques to preserve and develop these languages is likely to become a mainstream approach.

Because machine-readable data in the indigenous languages is scarce, this project begins with morphological analysis of Seediq words as a first step toward natural language processing research on all Taiwanese indigenous languages.

Seediq has rich word inflection, mainly marking focus or aspect. The project plans to use the morphological structures described in a grammar book to compile a complete set of morphological rules: the full affix inventory, the rules for the position and combination of affixes, the reduplication rules, and the vowel and consonant transformation rules in affixed words. The project also plans to develop an annotation interface for collecting Seediq sentences and to integrate automatic analysis of Seediq morphological structures into that interface.
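The results achieved in this project are as follows.
(1) Dataset for morphological analysis in Seediq
(2) Rules of affixes
(3) Rules of reduplication
(4) Transformation rules of vowels and consonants in affixed words
(5) Results of preliminary experiments in Chinese abusive language classification
(6) A large-scale training set for Chinese abusive language detection and classification

Abstract (translated from the Chinese)
The languages of all of Taiwan's indigenous peoples have been listed by UNESCO as endangered, and the preservation and revitalization of these languages has drawn attention. In today's era of the internet and artificial intelligence, using natural language processing techniques to support the use and promotion of indigenous languages will be a future trend. Electronic resources for indigenous languages are especially scarce, however, so this project starts with the automatic analysis of morphological structures in Seediq, in preparation for future research on natural language processing techniques for indigenous languages.

Word forms in Seediq vary considerably, mainly to mark verb focus or tense and aspect. The project plans to use the morphological structures given in a grammar book to compile all affix-related rules, including the complete affix set, the positions where affixes occur, combination rules, reduplication rules, and vowel and consonant change rules. It also plans to develop a morphological structure annotation system that incorporates automatic analysis of Seediq morphological structures.

The expected results are as follows.
(1) Experimental dataset of Seediq morphological structures
(2) Seediq affix set, with rules for affix position, combination, and alternation
(3) Rules for Seediq reduplicative prefixes
(4) Rules for vowel or consonant changes after affixation in Seediq
(5) Manual annotation system for Seediq morphological structures
(6) Automatic analysis system for Seediq morphological structures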
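The rule components named in the abstract (affix inventory, affix position and combination, reduplication, and sound changes in affixed words) can be illustrated with a small rule-based analyzer. The following is only a sketch under stated assumptions: the affixes, the stem-final sound change, the CV reduplication pattern, and the two roots are made-up placeholders, not the project's Seediq rules or data, and `analyze` and `Analysis` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple

# Placeholder inventories for illustration only; NOT the project's Seediq rules.
PREFIXES = ["m", "p"]
INFIXES = ["m", "n"]                 # assumed to sit after the stem-initial consonant
SUFFIXES = ["an", "un"]
# Hypothetical sound change before a suffix: (surface ending, underlying ending).
STEM_FINAL_CHANGES = [("w", "u")]
LEXICON = {"kata", "sulu"}           # made-up roots
VOWELS = set("aeiou")


@dataclass(frozen=True)
class Analysis:
    root: str
    prefix: Optional[str] = None
    reduplicated: bool = False
    infix: Optional[str] = None
    suffix: Optional[str] = None


def _undo_final_change(stem: str, suffix: Optional[str]) -> Iterator[str]:
    """Yield the stem as-is, plus variants with a stem-final change reversed."""
    yield stem
    if suffix:
        for surface, underlying in STEM_FINAL_CHANGES:
            if stem.endswith(surface):
                yield stem[: -len(surface)] + underlying


def _undo_reduplication(stem: str) -> Iterator[Tuple[bool, str]]:
    """Yield the stem as-is, plus the base if it starts with a copied CV syllable."""
    yield False, stem
    if (len(stem) >= 4 and stem[:2] == stem[2:4]
            and stem[0] not in VOWELS and stem[1] in VOWELS):
        yield True, stem[2:]


def _undo_infix(stem: str) -> Iterator[Tuple[Optional[str], str]]:
    """Yield the stem as-is, plus variants with an infix removed after the first consonant."""
    yield None, stem
    for infix in INFIXES:
        end = 1 + len(infix)
        if len(stem) > end and stem[0] not in VOWELS and stem[1:end] == infix:
            yield infix, stem[0] + stem[end:]


def analyze(word: str) -> List[Analysis]:
    """Return every affix/root segmentation of `word` whose root is in LEXICON."""
    results = []
    for prefix in [None] + PREFIXES:
        if prefix and not word.startswith(prefix):
            continue
        rest = word[len(prefix):] if prefix else word
        for suffix in [None] + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[: -len(suffix)] if suffix else rest
            for variant in _undo_final_change(stem, suffix):
                for redup, base in _undo_reduplication(variant):
                    for infix, root in _undo_infix(base):
                        if root in LEXICON:
                            results.append(Analysis(root, prefix, redup, infix, suffix))
    return results


if __name__ == "__main__":
    for form in ["kata", "kmataan", "sulwan", "mkakata", "xyz"]:
        print(form, analyze(form))
```

The analyzer works by peeling candidate affixes off the surface form and checking the remainder against a root lexicon, so an ambiguous form returns several analyses and an unknown form returns an empty list. A real rule set would also need to constrain which affixes may co-occur, which this sketch does not attempt.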
 
Keyword(s)
Seediq
automatic analysis of morphological structures
natural language processing for Taiwanese indigenous languages
 