National Taiwan Ocean University Research Hub
Please use this identifier to cite or link to this item: http://scholars.ntou.edu.tw/handle/123456789/24747
Title: Reordering video shots for event classification using bag-of-words models and string kernels
Authors: Yung-Lun Chen
Shyi-Chyi Cheng 
Yi-Ping Phoebe Chen
Issue Date: Nov-2012
Abstract: 
This paper presents a novel approach to reordering video shots based on the bag-of-words (BoW) model. Shot reordering eliminates the temporal ambiguity that is likely to degrade conventional video event recognition algorithms that use support vector machine (SVM) classifiers with string kernels. A traditional BoW model represents each video frame as a histogram of visual words, discarding both the spatial arrangement of those words in the 2D image plane and any temporal information. Our approach first segments the input video clip into a set of shots, each of which is further divided into multiple three-dimensional (3D) video cubes. We introduce spatio-temporal information into the BoW model by extracting space-time features from the individual 3D cubes, and the system learns the BoW codebook from these cubes. Every shot in the input video sequence is then represented as a BoW histogram, so the corresponding event is modelled as a sequence of BoW histograms, which is reordered by the proposed normalization scheme. Finally, string kernels are used to train SVM classifiers from a set of training samples, and the trained classifiers recognize the event type of a test video clip. The framework offers a simple and effective way to incorporate both temporal and spatial configurations of video events. Experiments on several publicly available datasets show that the proposed method performs well in terms of robustness and recognition rate.
URI: http://scholars.ntou.edu.tw/handle/123456789/24747
DOI: 10.1145/2425836.2425876
Appears in Collections: Department of Computer Science and Engineering (資訊工程學系)
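
The pipeline described in the abstract lends itself to a short sketch: quantize per-cube space-time descriptors against a learned codebook, build one BoW histogram per shot, map each event's histogram sequence to a string of shot symbols, reorder the string, and classify with a string-kernel SVM. The sketch below is a minimal toy version, not the paper's implementation: random vectors stand in for the 3D-cube space-time descriptors, a lexicographically smallest rotation stands in for the paper's (unspecified here) normalization scheme, a spectrum (n-gram) kernel stands in for its string kernel, and all sizes (descriptor dimension, codebook sizes, shot and cube counts) are arbitrary.

```python
# A minimal, self-contained sketch of the shot-reordering pipeline, on toy
# data. Everything marked as an assumption is illustrative, NOT from the paper.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
DIM, N_VW, N_SW = 16, 32, 6  # descriptor dim, visual words, shot symbols (assumptions)

def make_event(label, n_shots=5, cubes_per_shot=20):
    """Toy stand-in for a video event: each shot yields a bag of 3D-cube
    descriptors whose mean shifts with the class label."""
    return [rng.normal(loc=label, scale=1.0, size=(cubes_per_shot, DIM))
            for _ in range(n_shots)]

events = [make_event(l) for l in (0, 1) for _ in range(20)]
labels = np.array([l for l in (0, 1) for _ in range(20)])

# 1. Learn the visual-word codebook from all 3D-cube descriptors.
all_cubes = np.vstack([cubes for ev in events for cubes in ev])
vw = KMeans(n_clusters=N_VW, n_init=4, random_state=0).fit(all_cubes)

def bow_hist(cubes):
    """One L1-normalized BoW histogram for a single shot."""
    h = np.bincount(vw.predict(cubes), minlength=N_VW).astype(float)
    return h / max(h.sum(), 1.0)

# 2. Represent each shot as a histogram; an event is a histogram sequence.
event_hists = [[bow_hist(cubes) for cubes in ev] for ev in events]

# 3. Quantize shot histograms into symbols so each event becomes a string,
#    then apply a placeholder canonical reordering (lexicographically
#    smallest rotation) standing in for the paper's normalization scheme.
sw = KMeans(n_clusters=N_SW, n_init=4, random_state=0).fit(
    np.vstack([h for ev in event_hists for h in ev]))

def to_string(ev):
    s = tuple(sw.predict(np.vstack(ev)))
    return min(s[i:] + s[:i] for i in range(len(s)))  # hypothetical reordering

strings = [to_string(ev) for ev in event_hists]

# 4. Spectrum (n-gram) string kernel + SVM on a precomputed Gram matrix.
def spectrum_kernel(A, B, n=2):
    def grams(s):
        c = {}
        for i in range(len(s) - n + 1):
            c[s[i:i + n]] = c.get(s[i:i + n], 0) + 1
        return c
    ga, gb = [grams(s) for s in A], [grams(s) for s in B]
    return np.array([[sum(v * b.get(g, 0) for g, v in a.items()) for b in gb]
                     for a in ga], dtype=float)

K = spectrum_kernel(strings, strings)
clf = SVC(kernel="precomputed").fit(K, labels)
print("training accuracy:", clf.score(K, labels))
```

Passing kernel="precomputed" lets SVC consume an arbitrary Gram matrix, which is the standard way to plug a custom string kernel into scikit-learn; at test time the kernel is evaluated between test strings and the training strings.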

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
