
Marginalized average attentional network for weakly-supervised learning

Authors :
Yuan, Yuan
Lyu, Yueming
Shen, Xi
Tsang, Ivor
Yeung, Dit-Yan
Integrated Optimization with Complex Structure (INOCS)
Inria Lille - Nord Europe
Institut National de Recherche en Informatique et en Automatique (Inria)-Université libre de Bruxelles (ULB)-Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL)
Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)
University of Technology Sydney (UTS)
imagine [Marne-la-Vallée]
Laboratoire d'Informatique Gaspard-Monge (LIGM)
Centre National de la Recherche Scientifique (CNRS)-Fédération de Recherche Bézout-ESIEE Paris-École des Ponts ParisTech (ENPC)-Université Paris-Est Marne-la-Vallée (UPEM)
Alcatel-Lucent Bell - Belgique
Alcatel-Lucent
Hong Kong University of Science and Technology (HKUST)
Université Paris-Est Marne-la-Vallée (UPEM)-École des Ponts ParisTech (ENPC)-ESIEE Paris-Fédération de Recherche Bézout-Centre National de la Recherche Scientifique (CNRS)
Source :
ICLR 2019 - Seventh International Conference on Learning Representations, May 2019, New Orleans, United States
Publication Year :
2019

Abstract

© 7th International Conference on Learning Representations, ICLR 2019. All Rights Reserved. In weakly-supervised temporal action localization, previous works have failed to locate dense and integral regions for each entire action due to the overestimation of the most salient regions. To alleviate this issue, we propose a marginalized average attentional network (MAAN) to suppress the dominant response of the most salient regions in a principled manner. The MAAN employs a novel marginalized average aggregation (MAA) module and learns a set of latent discriminative probabilities in an end-to-end fashion. MAA samples multiple subsets from the video snippet features according to a set of latent discriminative probabilities and takes the expectation over all the averaged subset features. Theoretically, we prove that the MAA module with learned latent discriminative probabilities successfully reduces the difference in responses between the most salient regions and the others. Therefore, MAAN is able to generate better class activation sequences and identify dense and integral action regions in the videos. Moreover, we propose a fast algorithm to reduce the complexity of constructing MAA from O(2^T) to O(T^2), where T is the number of video snippets. Extensive experiments on two large-scale video datasets show that our MAAN achieves superior performance on weakly-supervised temporal action localization.
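
The aggregation described in the abstract can be illustrated with a small sketch. It is not the authors' code, and the recurrence below is one possible dynamic program rather than necessarily the one given in the paper. The assumptions are: each snippet feature is a scalar, snippet t is included independently with probability p[t], and the empty subset contributes zero. The names maa_brute_force and maa_dp are hypothetical. The first function enumerates all 2^T subsets; the second obtains the same expectation in O(T^2) steps.

```python
from itertools import product


def maa_brute_force(x, p):
    """Expected subset average, enumerating all 2^T inclusion patterns."""
    total = 0.0
    for z in product([0, 1], repeat=len(x)):
        # Probability of this pattern under independent Bernoulli(p[t]) draws.
        prob = 1.0
        for zi, pi in zip(z, p):
            prob *= pi if zi else 1.0 - pi
        count = sum(z)
        if count:  # assumption: the empty subset contributes 0
            total += prob * sum(zi * xi for zi, xi in zip(z, x)) / count
    return total


def maa_dp(x, p):
    """Same expectation in O(T^2) via a recurrence over the subset size."""
    T = len(x)
    # q[i] = P(exactly i of the snippets seen so far are selected)
    q = [1.0] + [0.0] * T
    # h[i] = E[(sum of selected features) * 1(exactly i selected)]
    h = [0.0] * (T + 1)
    for xt, pt in zip(x, p):
        # Descend over i so h[i-1] and q[i-1] still hold the previous step.
        for i in range(T, 0, -1):
            h[i] = (1.0 - pt) * h[i] + pt * (h[i - 1] + q[i - 1] * xt)
            q[i] = (1.0 - pt) * q[i] + pt * q[i - 1]
        q[0] *= 1.0 - pt  # h[0] stays 0: no snippet selected, empty sum
    return sum(h[i] / i for i in range(1, T + 1))


features = [1.0, 5.0, 2.0]        # illustrative values only
probabilities = [0.9, 0.2, 0.5]   # illustrative values only
slow = maa_brute_force(features, probabilities)
fast = maa_dp(features, probabilities)
```

When every probability equals 1, both functions reduce to the ordinary mean of the features, which is a convenient sanity check for the sketch.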

Details

Language :
English
Database :
OpenAIRE
Journal :
ICLR 2019 - Seventh International Conference on Learning Representations, May 2019, New Orleans, United States
Accession number :
edsair.doi.dedup.....a452c373da434cc0e4f6be42da12aab1