Detecting and Rephrasing Profanity in Chinese

Basic Information

Project Title

Detecting and Rephrasing Profanity in Chinese

Project Code

MOST107-2221-E019-038

Chinese Title

自動偵測並改寫中文不雅文字

Project Coordinator

Chuan-Jie Lin

Funding Organization

National Science and Technology Council

Department/Unit

Department of Computer Science and Engineering

Website

https://www.grb.gov.tw/search/planDetail?id=12669884

Year

2018

Start Date

01-08-2018

Expected Completion

31-07-2019

Budget

NT$670,000

Research Field

Information Science: Software

In modern society, people often communicate with one another over the Internet and mobile apps. Because of anonymity and the sense of distance online, people tend to post without much consideration, and websites and social media see a great deal of irrational verbal abuse, discrimination, personal attacks, bullying, and harassment. Moreover, offensive words that friends use jokingly among themselves may make other users uncomfortable, and even lead to disputes, when used in replies to them.

Abusive language detection has been a popular research topic in recent years, but most studies focus on detecting offensive text in English. This project aims to build a large Chinese dataset for training and experiments. Rather than only detecting and blocking offensive posts, it also proposes suggesting to users how to rephrase their posts in a more polite, inoffensive way without losing the original meaning. Doing so preserves the intent of the post, protects users' freedom of speech, and may help users gradually become aware of carelessness in their own wording.

This one-year project plans to collect a large set of sentences containing Chinese profanity, define the types of Chinese profanity and their keywords more clearly, study additional rules for profanity detection and rephrasing, experiment with new detection techniques, and evaluate the performance of different system designs.

The expected outcomes of this project are as follows:
(1) Definitions of the categories of Chinese profanity
(2) Keywords or characters commonly used in each category
(3) Methods for collecting offensive sentences from different websites and social media
(4) Classifiers (a detection module) for offensive expressions
(5) Rules (a rephrasing module) for rewriting offensive expressions politely
(6) A large-scale dataset for training and experiments on Chinese profanity detection and rephrasing
(7) A model for integrating the proposed system with websites or social media
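
As a rough illustration of how outcomes (2), (4), and (5) could fit together, the following minimal Python sketch pairs a per-category keyword lexicon with simple substitution rules. It is not the project's actual method: the lexicon entries, the category names, and the function names `detect` and `rephrase` are all hypothetical, and plain keyword matching stands in for whatever classifiers the project develops.

```python
# Hypothetical lexicon: category -> {offensive keyword: milder replacement}.
# These entries are illustrative placeholders, not data from the project.
LEXICON = {
    "insult": {"白痴": "不夠細心的人", "笨蛋": "迷糊的人"},
    "dismissal": {"閉嘴": "請先聽我說", "滾開": "請離開"},
}


def detect(text):
    """Return (category, keyword, position) for every lexicon keyword in text."""
    hits = []
    for category, rules in LEXICON.items():
        for keyword in rules:
            start = text.find(keyword)
            while start != -1:
                hits.append((category, keyword, start))
                start = text.find(keyword, start + len(keyword))
    # Report hits in the order they appear in the sentence.
    return sorted(hits, key=lambda hit: hit[2])


def rephrase(text):
    """Replace each lexicon keyword with its milder counterpart."""
    for rules in LEXICON.values():
        for keyword, replacement in rules.items():
            text = text.replace(keyword, replacement)
    return text


if __name__ == "__main__":
    sentence = "你這個笨蛋，閉嘴"
    print(detect(sentence))
    print(rephrase(sentence))
```

Exact string matching like this misses variant spellings and ignores context, such as friendly banter between friends, which is presumably why the project also plans to study additional detection rules and new detection techniques.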

Keyword(s)

offensive language detection
profanity rephrasing
social networking websites
cyberbullying
