http://scholars.ntou.edu.tw/handle/123456789/25515
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Wang, Jung-Hua | en_US |
dc.contributor.author | Lee, Shih-Kai | en_US |
dc.contributor.author | Wang, Ting-Yuan | en_US |
dc.contributor.author | Chen, Ming-Jer | en_US |
dc.contributor.author | Hsu, Shu-Wei | en_US |
dc.date.accessioned | 2024-11-01T09:18:05Z | - |
dc.date.available | 2024-11-01T09:18:05Z | - |
dc.date.issued | 2024-01-01 | - |
dc.identifier.issn | 2169-3536 | - |
dc.identifier.uri | http://scholars.ntou.edu.tw/handle/123456789/25515 | - |
dc.description.abstract | We present an ensemble learning-based data cleaning approach (termed ELDC) capable of identifying and pruning anomalous data. ELDC is characterized in that an ensemble of base models can be trained directly on noisy in-sample data while dynamically providing clean data during iterative training. Each base model uses a random subset of the target dataset, which may initially contain up to 40% label errors. After each training iteration, anomalous data are discriminated from clean data by a majority voting scheme, and three different types of anomaly (mislabeled, confusing, and outliers) can be identified using a statistical pattern jointly determined by the prediction outputs of the base models. By iterating this train-vote-remove cycle, noisy in-sample data are progressively removed until a prespecified condition is reached. Comprehensive experiments, including out-of-sample tests, are conducted to verify the effectiveness of ELDC in simultaneously suppressing bias and variance of the prediction output. The ELDC framework is highly flexible, as it is not bound to a specific model and allows different transfer-learning configurations. The neural networks AlexNet, ResNet50, and GoogLeNet are used as base models and trained on various benchmark datasets; the results show that ELDC outperforms state-of-the-art cleaning methods. | en_US |
dc.language.iso | English | en_US |
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | en_US |
dc.relation.ispartof | IEEE ACCESS | en_US |
dc.subject | Training | en_US |
dc.subject | Data models | en_US |
dc.subject | Cleaning | en_US |
dc.subject | Noise measurement | en_US |
dc.subject | Image classification | en_US |
dc.subject | Complexity theory | en_US |
dc.subject | Training data | en_US |
dc.subject | Ensemble learning | en_US |
dc.subject | Data integrity | en_US |
dc.subject | Transfer learning | en_US |
dc.subject | Convolutional neural networks | en_US |
dc.subject | Noisy data | en_US |
dc.subject | ensemble learning | en_US |
dc.subject | data cleaning | en_US |
dc.title | Progressive Ensemble Learning for in-Sample Data Cleaning | en_US |
dc.type | journal article | en_US |
dc.identifier.doi | 10.1109/ACCESS.2024.3468035 | - |
dc.identifier.isi | WOS:001329024200001 | - |
dc.relation.journalvolume | 12 | en_US |
dc.relation.pages | 140643-140659 | en_US |
item.cerifentitytype | Publications | - |
item.openairetype | journal article | - |
item.openairecristype | http://purl.org/coar/resource_type/c_6501 | - |
item.fulltext | no fulltext | - |
item.grantfulltext | none | - |
item.languageiso639-1 | English | - |
crisitem.author.dept | College of Electrical Engineering and Computer Science | - |
crisitem.author.dept | Department of Electrical Engineering | - |
crisitem.author.dept | National Taiwan Ocean University,NTOU | - |
crisitem.author.parentorg | National Taiwan Ocean University,NTOU | - |
crisitem.author.parentorg | College of Electrical Engineering and Computer Science | - |
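The abstract's train-vote-remove cycle can be illustrated with a minimal sketch. This is not the authors' implementation: it substitutes a toy nearest-centroid classifier for the CNN base models (AlexNet, ResNet50, GoogLeNet), uses synthetic 2-D data with injected label noise, and fixes hypothetical values for the ensemble size and number of cycles. It shows only the core idea of the paper's framework as described in the abstract: each base model trains on a random subset, the ensemble votes on every sample's current label, and samples rejected by the majority are pruned before the next cycle.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data; flip 20% of labels to simulate noisy in-sample data.
X = np.vstack([rng.normal([2, 2], 1.0, (300, 2)),
               rng.normal([-2, -2], 1.0, (300, 2))])
y_true = np.repeat([0, 1], 300)
y = y_true.copy()
flip = rng.choice(600, size=120, replace=False)
y[flip] ^= 1

def fit_predict(X_train, y_train, X_all):
    """Toy nearest-centroid base model (stand-in for a CNN base model)."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = np.linalg.norm(X_all - c0, axis=1)
    d1 = np.linalg.norm(X_all - c1, axis=1)
    return (d1 < d0).astype(int)

keep = np.ones(600, dtype=bool)   # samples currently regarded as clean
n_models, n_cycles = 7, 5         # hypothetical ensemble size / cycle count

for _ in range(n_cycles):                         # train-vote-remove cycle
    idx = np.flatnonzero(keep)
    agree = np.zeros(600)
    for _ in range(n_models):
        # Each base model trains on a random subset of the retained data.
        sub = rng.choice(idx, size=len(idx) // 2, replace=False)
        agree += fit_predict(X[sub], y[sub], X) == y
    # Majority vote: prune samples whose label most base models reject.
    keep &= agree >= n_models / 2

print(f"kept {keep.sum()} samples, "
      f"{(y[keep] == y_true[keep]).mean():.1%} correctly labeled")
```

Under this setup most of the flipped labels receive near-zero agreement from the ensemble and are pruned within the first few cycles, so the retained subset is dominated by correctly labeled samples. The abstract's finer-grained distinction among anomaly types (mislabeled vs. confusing vs. outliers, via a statistical pattern over the base-model outputs) is not reproduced here.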
Appears in Collections: | Department of Electrical Engineering
Items in the IR system are protected by copyright, with all rights reserved, unless otherwise indicated.