National Taiwan Ocean University Research Hub
Active Learning Based on Locally Linear Propagation Reconstruction and Selection of Kernel Parameters for Support Vector Machines with General RBF Kernels: Gradient Descent-Based Approaches


Basic Information

Project title
Active Learning Based on Locally Linear Propagation Reconstruction and Selection of Kernel Parameters for Support Vector Machines with General RBF Kernels: Gradient Descent-Based Approaches
Code
MOST103-2221-E019-032-MY2
Translated Name (Chinese)
使用梯度下降法於基於區域線性傳遞重建技巧的主動式學習演算法與挑選一般性徑向基底支撐向量機的核函數參數
 
Project Coordinator
Chin-Chun Chang
Funding Organization
National Science and Technology Council
 
Department/Unit
Department of Computer Science and Engineering
Website
https://www.grb.gov.tw/search/planDetail?id=11277873
Year
2015
 
Start date
01-08-2015
Expected Completion
31-07-2016
 
Budget
NT$573,000
 
Research Field
Information Science -- Software
 

Description

Abstract
"使用人工對一個大資料集的每個樣本給定類別標籤是一項非常耗時的工作。主動式 學習則是可從大量資料集中挑選重要樣本,經監督者給定類別標籤後來學習。其學習效 果與用整體標籤後資料來學習差別不多。基於區域線性嵌入(locally linear embedding)的 流形學習法(manifold learning),我們已發展出一個稱作區域線性傳遞重建技巧 (LLPR) 來找出對於資料群聚結構重要的樣本。第一年,我們將應用LLPR 至批次式的主動式學 習。 支撐向量機是最有效的監督式學習法之一。當線性支撐向量機不足以應付時,則常 使用非線性核函數的支撐向量機。其中一種最常選用的非線性核函數是徑向基底(RBF) 核函數􀜭(􀜠, 􀜠􁇱) = exp (−􀟛‖􀜠 − 􀜠􁇱‖􀬶 􀬶 )。徑向基底核函數定義在樣本向量間的歐幾里德距 離。但是,當樣本向量含有許多與類別不相干或高度相互關聯的成分時,歐幾里德距離 可能不是一個恰當的距離度量。一個更一般性的歐幾里德距離可以定義為‖􀜠 − 􀜠􁇱‖􀛻 = 􀶥(􀜠 − 􀜠􁇱)􀯍􀛻(􀜠 − 􀜠􁇱) , 其 中 M 是半正定性矩陣。對使用一般性RBF 核函數 􀜭􀛻(􀜠, 􀜠􁇱) = exp (−‖􀜠 − 􀜠􁇱‖􀛻 􀬶 )的支撐向量機,決定合適的矩陣M 是第二年的主題。 這兩個研究主題有個共同點:他們皆要解一個有半正定性條件限制的凸函數最佳化 問題。在這個計畫,我們將證明這個最佳化問題可以直接使用梯度下降法解出,不需要 用到其他複雜的數值計算技巧。因此,簡單與可延展性是我們將發展的方法的特色。這 兩年的主題分述如下:  第一年: 基於區域傳遞式線性重建技巧的主動式學習演算法 我們將使用 LLPR 來挑選詢問樣本。在計算LLPR 時,我們需要解一個條件限制的 凸函數最佳化問題。我們將證明它可以用梯度下降法解出。並將用指數式梯度下降法 (exponentiated gradient algorithm)來處理大資料集。  第二年: 選擇一般性RBF 支撐向量機的核函數參數 我們將使用定點遞迴法(fixed-point recursion method)來解M。在每一個疊代,首先 基於先前估計到的矩陣M,應用核函數為􀜭􀛻(􀜠, 􀜠􁇱)的支撐向量機來學習決策函數 (decision function)。然後再藉由最佳化一個由決策函數所構成的凸損失函數(convex loss function)來估計新的矩陣M。這個疊代一直重複到收斂為止。最佳化那個凸損失函數同 時滿足矩陣M 為半正定性條件是整個演算法最具挑戰性的地方。雖然這個問題可以用 一般最佳化凸函數的工具來解,我們將證明它可以用梯度下降法解出,並提出簡單且具 延展性的演算法。 由於這兩個研究主題都有廣泛的應用,非常值得發展此計畫擬研究的技術。""Labeling every sample in a large-scale data set is laborious work. Active learning provides means of selecting informative samples from a large-scale data set to label. Based on locally linear embedding, we have developed a novel technique named locally linear propagation reconstruction (LLPR) to identify samples important to cluster structures previously. We will apply LLPR to batch-mode active learning in the first year. Support vector machines (SVMs) are one of the most powerful techniques of supervised learning. SVMs with the radial basis function (RBF) kernel 􀜭(􀜠, 􀜠􁇱) = exp (−􀟛‖􀜠 − 􀜠􁇱‖􀬶 􀬶 ) are often used when the linear SVM is not complex enough. When there exist many irrelevant or highly correlated components in sample vectors, the Euclidean distance ‖􀜠 − 􀜠􁇱‖􀬶 does not always define proper distances among samples. A more general form of the Euclidean distance ‖􀜠 − 􀜠􁇱‖􀛻 = 􀶥(􀜠 − 􀜠􁇱)􀯍􀛻(􀜠 − 􀜠􁇱) will be considered in this two-year project, where the metric matrix M is a positive semi-definite matrix. Choosing the kernel parameter M for SVMs with the general RBF kernel 􀜭􀛻(􀜠, 􀜠􁇱) = exp (−‖􀜠 − 􀜠􁇱‖􀛻 􀬶 ) is the aim of this project in the second year. The two main topics have a common point in their core problems, which are constrained convex optimization problems. In this two-year project, gradient-descent-based methods will be shown to be capable of solving these two core problems. Hence, simplicity and scalability will be two appealing features of the proposed approaches. The topics of this two-year project are as follows.  The first year: Active learning based on locally linear propagation reconstruction. We will apply LLPR to select query samples. To use LLPR, we need to solve a constrained quadratic minimization problem. In the first year, we will solve this optimization problem by an exponentiated gradient algorithm to handle large-scale data sets.  The second year: Selection of kernel parameters for support vector machines with general RBF kernels. We will use a fixed-point recursion method to solve for M. In each iteration, SVMs with the kernel function 􀜭􀛻(􀜠, 􀜠􁇱) are first applied to learn decision functions with respect to the previous estimate of M, which is then updated by solving a convex exponential loss function in terms of the learned decision functions. The iterations are carried out until convergence. 
Second year: selection of kernel parameters for SVMs with general RBF kernels. We will solve for $M$ by a fixed-point recursion method. In each iteration, an SVM with the kernel $k_M(\mathbf{x}, \mathbf{x}')$ is first trained to learn a decision function with respect to the previous estimate of $M$; a new estimate of $M$ is then obtained by minimizing a convex loss function built from the learned decision function. The iterations are repeated until convergence. Minimizing this convex loss while keeping $M$ positive semi-definite is the most challenging part of the algorithm and is usually handled with a general-purpose convex solver; we will show that it, too, can be solved by gradient descent, and will propose a simple and scalable algorithm. Since both topics have broad applications, the techniques to be developed in this project are well worth investigating.
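To make the second-year objects concrete, here is a short NumPy sketch of the general RBF kernel $k_M$ together with the standard projection of a symmetric matrix onto the positive semi-definite cone (eigenvalue clipping). Projecting after each gradient step is one generic way to keep $M$ PSD; the proposal argues that the constraint can be handled by gradient descent directly, so treat this as an illustration rather than the proposed algorithm.

```python
import numpy as np

def general_rbf_kernel(X, Y, M):
    """General RBF kernel k_M(x, x') = exp(-||x - x'||_M^2), where
    ||x - x'||_M^2 = (x - x')^T M (x - x') and M is PSD.
    X: (n, d), Y: (m, d)  ->  (n, m) kernel matrix.
    """
    diff = X[:, None, :] - Y[None, :, :]                  # (n, m, d) pairwise differences
    sq_dist = np.einsum('abd,de,abe->ab', diff, M, diff)  # (x - x')^T M (x - x')
    return np.exp(-sq_dist)

def project_psd(A):
    """Frobenius-norm projection of a symmetric matrix onto the PSD
    cone: clip negative eigenvalues at zero."""
    A = 0.5 * (A + A.T)                       # symmetrise first
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.clip(vals, 0.0, None)) @ vecs.T

# One projected gradient step on M, given a gradient G of the loss:
#   M = project_psd(M - step_size * G)
```

A useful sanity check: with $M = \gamma I$, `general_rbf_kernel` reduces to the ordinary RBF kernel.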
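Finally, a sketch of the fixed-point recursion itself, reusing `general_rbf_kernel` and `project_psd` from the block above. The abstract does not give the exact convex loss, so `metric_loss_grad` below is a hypothetical callback standing in for its gradient with respect to $M$; likewise, scikit-learn's precomputed-kernel SVM is used only for illustration.

```python
from sklearn.svm import SVC
import numpy as np

def fit_metric_svm(X, y, metric_loss_grad, step=1e-2, n_outer=20):
    """Alternate between (1) training an SVM with the general RBF
    kernel k_M for the current estimate of M and (2) updating M by a
    projected gradient step on a convex loss built from the learned
    decision values.  `metric_loss_grad(f, X, y, M)` is a hypothetical
    placeholder returning dLoss/dM; binary labels y are assumed.
    """
    M = np.eye(X.shape[1])                    # start from the plain RBF metric
    for _ in range(n_outer):
        K = general_rbf_kernel(X, X, M)       # train-vs-train kernel matrix
        svm = SVC(kernel='precomputed').fit(K, y)
        f = svm.decision_function(K)          # decision values on the training set
        G = metric_loss_grad(f, X, y, M)      # gradient of the (unspecified) loss
        M = project_psd(M - step * G)         # gradient step, then keep M PSD
    return svm, M
```

In practice one would also monitor the change in $M$ across outer iterations and stop once it falls below a tolerance, which is the convergence test the abstract alludes to.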
 
Keyword(s)
Active learning
Locally linear propagation reconstruction
RBF support vector machines
Distance metric learning
 