Add/Update Thesis
Title*
Author's Name*
Supervisor's Name
Abstract
With the advancement of information and communication technologies, sensing devices have become pervasive. The pervasiveness of camera devices has enabled the recording of video data at any time and anywhere, giving rise to a massive amount of untrimmed video data, which consists of several human-related activities and actions as well as background activities. It is important to detect the actions of interest in such long, untrimmed videos so that they can be used in numerous applications, e.g., video analysis, video summarization, surveillance, retrieval, and captioning. This thesis targets temporal human action detection in long, untrimmed videos. Given a long, untrimmed video, the task of temporal action detection is to detect the starting and ending times of all occurrences of the actions of interest and to predict the action label of each detected interval. Detecting human actions in long, untrimmed videos is important but challenging because of the unconstrained nature of such videos in both space and time. In this work, we solve the temporal action detection problem using two different paradigms: "proposal + classification" and "end-to-end temporal action detection". In the proposal + classification approach, the regions likely to contain human actions, known as proposals, are first generated from untrimmed videos and then classified into the targeted actions. To this end, we propose two different methods to generate action proposals: (1) an unsupervised and (2) a supervised temporal action proposal method. In the first method, we propose an unsupervised proposal generation method named Proposals from Motion History Images (PMHI). PMHI discriminates actions from non-action regions by clustering the MHIs into action and non-action segments, detecting minima in the energy of the MHIs. The strength of PMHI is that it is unsupervised, which alleviates the requirement for any training data.
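The segmentation step described above — splitting an untrimmed video at low-energy minima of the Motion History Images — can be illustrated with a minimal sketch. This is not the thesis implementation; the energy definition and threshold logic are assumptions chosen only to make the idea concrete.

```python
# Hypothetical sketch of the PMHI segmentation idea: an untrimmed video is
# split into candidate segments at local minima of the per-frame Motion
# History Image (MHI) energy. All names and thresholds are illustrative.

def mhi_energy(mhi):
    """Energy of one MHI: sum of its non-negative pixel intensities."""
    return sum(sum(row) for row in mhi)

def segment_by_energy_minima(energies, low_thresh):
    """Place segment boundaries wherever the energy sequence dips to a
    local minimum below low_thresh (little motion => likely boundary
    between an action and a non-action region)."""
    boundaries = [0]
    for t in range(1, len(energies) - 1):
        is_min = energies[t] <= energies[t - 1] and energies[t] <= energies[t + 1]
        if is_min and energies[t] < low_thresh:
            boundaries.append(t)
    boundaries.append(len(energies) - 1)
    # Consecutive boundaries delimit candidate action proposals.
    return list(zip(boundaries[:-1], boundaries[1:]))
```

For example, an energy trace `[5, 9, 8, 1, 7, 10, 2, 6, 5]` with `low_thresh=3` has low minima at frames 3 and 6, yielding three candidate segments.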
PMHI outperforms the existing proposal methods on the Multi-view Human Action video (MuHAVi)-uncut and Computer Vision and Pattern Recognition (CVPR) 2012 Change Detection datasets. PMHI depends upon precise silhouette extraction, which is challenging for realistic videos and for moving cameras. To solve this problem, we propose a supervised temporal action proposal method named Temporally Aggregated Bag-of-Discriminant-Words (TAB), which works directly on RGB videos. TAB is based on the observation that there are many overlapping frames in the action and background temporal regions of untrimmed videos, which cause difficulties in segmenting actions from non-action regions. TAB solves this issue by extracting class-specific codewords from the action and background videos and computing the discriminative weights of these codewords based on their ability to discriminate between the two classes. We integrate these discriminative weights with Bag-of-Words encoding, which we then call Bag-of-Discriminant-Words (BoDW). We sample the untrimmed videos into non-overlapping snippets and temporally aggregate the BoDW representations of multiple snippets into action proposals. We demonstrate the effectiveness of the TAB proposal method on two challenging temporal action detection datasets, MSR-II and Thumos14, where it improves upon state-of-the-art methods. "Proposal + classification" requires multiple passes through the testing data for the two stages; therefore, it is difficult to use such methods in an end-to-end manner. To solve this problem, we propose an end-to-end temporal action detection method known as Bag of Discriminant Snippets (BoDS). BoDS is based on the observation that multiple action classes and the background class share similar snippets, which cause incorrect classification of action regions and imprecise boundaries. We solve this issue by finding the key snippets in the training data of each class and computing their discriminative power, which is used in the BoDS encoding.
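The BoDW re-weighting idea above can be sketched briefly: each codeword gets a weight reflecting how well it separates action from background snippets, and the plain Bag-of-Words histogram is scaled by those weights. The particular weighting formula below (a normalised count ratio) is an assumption for illustration, not the scheme from the thesis.

```python
# Illustrative sketch of Bag-of-Discriminant-Words (BoDW) encoding.
# The count-ratio weighting is a hypothetical stand-in for the thesis'
# discriminative weight computation.

def discriminative_weights(action_counts, background_counts, eps=1e-6):
    """Per-codeword weight in [0, 1]: the fraction of a codeword's
    occurrences that come from action (vs. background) snippets."""
    return [a / (a + b + eps)
            for a, b in zip(action_counts, background_counts)]

def bodw(bow_histogram, weights):
    """BoDW: the plain BoW histogram scaled by discriminative weights,
    suppressing codewords that occur equally often in both classes."""
    return [h * w for h, w in zip(bow_histogram, weights)]
```

A codeword seen 9 times in actions and once in background thus keeps most of its mass, while the reverse pattern is strongly down-weighted; snippet-level BoDW vectors can then be summed over consecutive snippets to form a proposal-level representation.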
During testing of an untrimmed video, we find the BoDS representation for multiple candidate regions and determine their class labels based on a majority voting scheme. We test BoDS on the Thumos14 and ActivityNet datasets and obtain state-of-the-art results.
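The majority-voting step at test time reduces to a small routine: each candidate region is classified independently, and the final label is the most frequent one among them. A minimal sketch, assuming labels are plain strings:

```python
# Hedged sketch of the majority-voting step in BoDS testing: each
# candidate region votes with its predicted class, and the most common
# vote wins. Function and label names are illustrative.

from collections import Counter

def majority_vote(region_labels):
    """Return the most frequent class label among candidate regions."""
    counts = Counter(region_labels)
    label, _ = counts.most_common(1)[0]
    return label
```

For instance, if three overlapping candidate regions predict "run", "run", and "walk", the detected interval is labelled "run".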
Subject/Specialization
Language
Program
Faculty/Department's Name
Institute Name
University Type
Public
Private
Campus (if any)
Institute Affiliation Information (if any)
City where institute is located
Province
Country
Degree Starting Year
Degree Completion Year
Year of Viva Voce Exam
Thesis Completion Year
Thesis Status
Completed
Incomplete
Number of Pages
Urdu Keywords
English Keywords
Link
Select Category
Religious Studies
Social Sciences & Humanities
Science
Technology
Any other information you want to share, such as Table of Contents or Conclusion.
Your email address*