http://scholars.ntou.edu.tw/handle/123456789/23604
Title: Gradient Boosting over Linguistic-Pattern-Structured Trees for Learning Protein-Protein Interaction in the Biomedical Literature
Authors: Warikoo, Neha; Chang, Yung-Chun; Ma, Shang-Pin
Keywords: protein-protein interaction; natural language processing; gradient-tree boosting; linguistic patterns; bioinformatics
Issue Date: 1-Oct-2022
Publisher: MDPI
Journal Volume: 12
Journal Issue: 20
Source: APPLIED SCIENCES-BASEL
Abstract: Protein-based studies contribute significantly to gathering functional information about biological systems; therefore, the protein-protein interaction detection task is one of the most researched topics in the biomedical literature. To this end, many state-of-the-art systems using syntactic tree kernels (TK) and deep learning have been developed. However, these models are computationally complex and have limited learning interpretability. In this paper, we introduce a linguistic-pattern-representation-based Gradient-Tree Boosting model, i.e., LpGBoost. It uses linguistic patterns to optimize and generate semantically relevant representation vectors for learning over the gradient-tree boosting. The patterns are learned via unsupervised modeling by clustering invariant semantic features. These linguistic representations are semi-interpretable with rich semantic knowledge, and owing to their shallow representation, they are also computationally less expensive. Our experiments with six protein-protein interaction (PPI) corpora demonstrate that LpGBoost outperforms the SOTA tree-kernel models, as well as the CNN-based interaction detection studies for BioInfer and AIMed corpora.
URI: http://scholars.ntou.edu.tw/handle/123456789/23604
DOI: 10.3390/app122010199
Appears in Collections: Department of Computer Science and Engineering (資訊工程學系)
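
The abstract describes turning sentences into pattern-based representation vectors and learning over them with gradient-tree boosting. The following is only a minimal illustrative sketch of that general idea, not the LpGBoost implementation. Everything in it is an assumption made for illustration: the three regular-expression patterns are hand-written (the paper learns its patterns by unsupervised clustering, which is not reproduced here), the `PROT1`/`PROT2` placeholders assume protein mentions have already been masked, the six training sentences are invented toy data, and the booster uses depth-1 trees (stumps) with log loss for brevity.

```python
import math
import re

# Hypothetical hand-written patterns standing in for learned linguistic patterns.
PATTERNS = [
    r"PROT1 .*\b(binds|interacts with|activates)\b.* PROT2",
    r"\b(complex|interaction) (of|between) PROT1 and PROT2",
    r"PROT1 .*\b(not|no)\b.* PROT2",
]

def featurize(sentence):
    """Map a sentence to a binary vector: 1 where a pattern matches."""
    return [1 if re.search(p, sentence) else 0 for p in PATTERNS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Choose the binary feature whose split fits the residuals best (least squares)."""
    best = None
    for j in range(len(X[0])):
        on = [r for x, r in zip(X, residuals) if x[j] == 1]
        off = [r for x, r in zip(X, residuals) if x[j] == 0]
        mean_on = sum(on) / len(on) if on else 0.0
        mean_off = sum(off) / len(off) if off else 0.0
        sse = sum((r - mean_on) ** 2 for r in on) + sum((r - mean_off) ** 2 for r in off)
        if best is None or sse < best[0]:
            best = (sse, j, mean_on, mean_off)
    return best[1:]

def fit_boosting(X, y, rounds=30, lr=0.5):
    """Gradient boosting for log loss using depth-1 trees over binary features."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(rounds):
        # The negative gradient of log loss with respect to the score is y - p.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, v_on, v_off = fit_stump(X, residuals)
        stumps.append((j, lr * v_on, lr * v_off))  # store shrunken leaf values
        scores = [s + lr * (v_on if x[j] else v_off) for x, s in zip(X, scores)]
    return stumps

def predict_proba(stumps, x):
    """Probability of an interaction: sigmoid of the summed leaf values."""
    return sigmoid(sum(v_on if x[j] else v_off for j, v_on, v_off in stumps))

# Invented toy data: (masked sentence, 1 = interaction stated, 0 = not stated).
train = [
    ("PROT1 binds PROT2 in vitro", 1),
    ("the interaction between PROT1 and PROT2 was confirmed", 1),
    ("PROT1 activates PROT2", 1),
    ("PROT1 does not bind PROT2", 0),
    ("PROT1 and PROT2 were measured", 0),
    ("PROT1 showed no effect on PROT2", 0),
]
X = [featurize(s) for s, _ in train]
y = [label for _, label in train]
stumps = fit_boosting(X, y)

for sentence, label in train:
    print(label, round(predict_proba(stumps, featurize(sentence)), 3), sentence)
```

Because each stump splits on a single pattern, the fitted model can be read as a list of per-pattern score adjustments, which loosely mirrors the interpretability argument made in the abstract. A practical system would more likely rely on an established gradient-boosting library and on patterns derived from data.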