F.C. Heilbron, A. Thabet, J.C. Niebles, B. Ghanem
Asian Conference on Computer Vision, volume 9006, pp. 583-597, (2014)
This paper describes a framework for recognizing human actions in videos by incorporating a new set of visual cues that represent the context of the action. We develop a weak foreground-background segmentation approach in order to robustly extract not only foreground features that are focused on the actors, but also global camera motion and contextual scene information. Using dense point trajectories, our approach separates and describes the foreground motion from the background, represents the appearance of the extracted static background, and encodes the global camera motion that interestingly is shown to be discriminative for certain action classes. Our experiments on four challenging benchmarks (HMDB51, Hollywood2, Olympic Sports, and UCF50) show that our contextual features enable a significant performance improvement over state-of-the-art algorithms.