PhD Defense | Understanding Human Activities at Large Scale

Mar 05 2019 04:00 PM - Mar 05 2019 05:00 PM


Abstract

With the growth of online media, surveillance and mobile cameras, the number and size of video databases are increasing at an incredible pace. For example, YouTube reported that over 400 hours of video are uploaded to its servers every minute. Arguably, people are the most important and interesting subjects of such videos. The computer vision community has embraced this observation, recognizing the crucial role that human action recognition plays in building smarter surveillance systems, semantically aware video indexes and more natural human-computer interfaces. However, despite the explosion of video data, the ability to automatically recognize and understand human activities remains limited. This dissertation defense will discuss three contributions to scaling up human activity understanding. First, I will introduce ActivityNet, a large-scale database for video analysis that eases the limitations of contemporary datasets. Second, I will explain a novel methodology called action proposal generation, which quickly sifts through large video collections to find moments of interest. Third, I will detail an innovative model that uses context information such as objects and places to temporally localize human activities at 30 frames per second while achieving state-of-the-art performance. By creating a large-scale video benchmark, designing efficient action scanning methods, and enriching temporal activity localization approaches with high-level semantics, I aim to take a step closer to general video understanding.

Biography

Fabian is a computer vision scientist and Ph.D. candidate in the Electrical Engineering program at KAUST. He is part of the multicultural Image and Visual Understanding Lab (IVUL), where he is advised by Prof. Bernard Ghanem. Fabian is interested in empowering video analytics systems with artificial intelligence. Specifically, his research focuses on human activity understanding, a growing and important area of computer vision.