5 results for "Bagheri Garakani, Alireza"
Search Results
2. Multi-objective Ranking via Constrained Optimization
- Author: Momma, Michinari; Bagheri Garakani, Alireza; Ma, Nanxun; Sun, Yi
- Published: 2020
- Full Text: View/download PDF
3. Kernel Approximation Methods for Speech Recognition
- Author: May, Avner; Bagheri Garakani, Alireza; Lu, Zhiyun; Guo, Dong; Liu, Kuan; Bellet, Aurélien; Fan, Linxi; Collins, Michael; Hsu, Daniel; Kingsbury, Brian; Picheny, Michael; Sha, Fei
- Affiliations: Columbia University [New York]; University of Southern California (USC); Machine Learning in Information Networks (MAGNET), Inria Lille - Nord Europe; Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille, Université de Lille, CNRS; Stanford University; IBM Thomas J. Watson Research Center
- Funding: This research is supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Defense U.S. Army Research Laboratory (DoD/ARL) contract number W911NF-12-C-0012. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: the views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoD/ARL, or the U.S. Government. F.S. is grateful to Lawrence K. Saul (UCSD), Léon Bottou (Facebook), Alex Smola (Amazon), and Chris J.C. Burges (Microsoft Research) for many fruitful discussions and pointers to relevant work. Additionally, A.B.G. is partially supported by a USC Provost Graduate Fellowship. F.S. is partially supported by NSF awards IIS-1065243, 1451412, and 1139148, a Google Research Award, an Alfred P. Sloan Research Fellowship, an ARO YIP Award (W911NF-12-1-0241), and ARO Award W911NF-15-1-0484. A.B. is partially supported by a grant from CPER Nord-Pas de Calais/FEDER DATA Advanced data science and technologies 2015-2020. Computation for the work described in this paper was partially supported by the University of Southern California's Center for High-Performance Computing (http://hpc.usc.edu).
- Subjects: FOS: Computer and information sciences; Computer Science - Machine Learning; acoustic modeling; Computer Science - Computation and Language; Computer Science - Artificial Intelligence; automatic speech recognition; Machine Learning (stat.ML); Machine Learning (cs.LG); kernel methods; Artificial Intelligence (cs.AI); feature selection; deep neural networks; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; [STAT.ML]Statistics [stat]/Machine Learning [stat.ML]; Statistics - Machine Learning; Computation and Language (cs.CL)
- Abstract
We study large-scale kernel methods for acoustic modeling in speech recognition and compare their performance to deep neural networks (DNNs). We perform experiments on four speech recognition datasets, including the TIMIT and Broadcast News benchmark tasks, and compare these two types of models on frame-level performance metrics (accuracy, cross-entropy), as well as on recognition metrics (word/character error rate). In order to scale kernel methods to these large datasets, we use the random Fourier feature method of Rahimi and Recht (2007). We propose two novel techniques for improving the performance of kernel acoustic models. First, in order to reduce the number of random features required by kernel models, we propose a simple but effective method for feature selection. The method is able to explore a large number of non-linear features while maintaining a compact model more efficiently than existing approaches. Second, we present a number of frame-level metrics which correlate very strongly with recognition performance when computed on the heldout set; we take advantage of these correlations by monitoring these metrics during training in order to decide when to stop learning. This technique can noticeably improve the recognition performance of both DNN and kernel models, while narrowing the gap between them. Additionally, we show that the linear bottleneck method of Sainath et al. (2013) improves the performance of our kernel models significantly, in addition to speeding up training and making the models more compact. Together, these three methods dramatically improve the performance of kernel acoustic models, making their performance comparable to DNNs on the tasks we explored.
- Published: 2019
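The abstract above scales kernel acoustic models with the random Fourier feature method of Rahimi and Recht (2007). As a rough illustration only, not code from the paper, the following Python sketch builds random Fourier features for a Gaussian kernel; the function name and parameters are illustrative assumptions.

```python
import numpy as np

def random_fourier_features(X, num_features=1024, gamma=1.0, seed=0):
    """Approximate a Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)
    with random Fourier features (Rahimi & Recht, 2007):
    z(x) = sqrt(2/D) * cos(x @ W + b), with the entries of W drawn from
    N(0, 2*gamma) and b uniform on [0, 2*pi], so that z(x) . z(y) ~ k(x, y)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

# A linear (e.g., softmax) classifier trained on these features then plays
# the role of the kernel acoustic model compared against DNNs in the paper.
```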
4. Kernel Approximation Methods for Speech Recognition
- Author: May, Avner; Bagheri Garakani, Alireza; Lu, Zhiyun; Guo, Dong; Liu, Kuan; Bellet, Aurélien; Fan, Linxi; Collins, Michael; Hsu, Daniel; Kingsbury, Brian; Picheny, Michael; Sha, Fei
- Affiliations: Columbia University [New York]; University of Southern California (USC); Machine Learning in Information Networks (MAGNET), Inria Lille - Nord Europe; Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille, Université de Lille, CNRS; Stanford University; IBM Thomas J. Watson Research Center; INRIA Lille
- Subjects: Deep Neural Networks; Kernel Methods; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; [STAT.ML]Statistics [stat]/Machine Learning [stat.ML]; Acoustic Modeling; Automatic Speech Recognition; Logistic Regression; Feature Selection
- Abstract
We study large-scale kernel methods for acoustic modeling in speech recognition and compare their performance to deep neural networks (DNNs). We perform experiments on four speech recognition datasets, including the TIMIT and Broadcast News benchmark tasks, and compare these two types of models on frame-level performance metrics (accuracy, cross-entropy), as well as on recognition metrics (word/character error rate). In order to scale kernel methods to these large datasets, we use the random Fourier feature method of Rahimi and Recht [2007]. We propose two novel techniques for improving the performance of kernel acoustic models. First, in order to reduce the number of random features required by kernel models, we propose a simple but effective method for feature selection. The method is able to explore a large number of non-linear features while maintaining a compact model more efficiently than existing approaches. Second, we present a number of frame-level metrics which correlate very strongly with recognition performance when computed on the heldout set; we take advantage of these correlations by monitoring these metrics during training in order to decide when to stop learning. This technique can noticeably improve the recognition performance of both DNN and kernel models, while narrowing the gap between them. Additionally, we show that the linear bottleneck method of Sainath et al. [2013a] improves the performance of our kernel models significantly, in addition to speeding up training and making the models more compact. Together, these three methods dramatically improve the performance of kernel acoustic models, making their performance comparable to DNNs on the tasks we explored.
- Published: 2017
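This record's abstract also mentions the linear bottleneck of Sainath et al. [2013a], which replaces a large output-layer weight matrix with a product of two low-rank factors. A minimal sketch of that low-rank idea follows; it factors a trained matrix after the fact, whereas the cited approach trains the factors jointly, so treat it only as an illustration.

```python
import numpy as np

def low_rank_bottleneck(W, rank):
    """Approximate an output-layer weight matrix W (num_features x num_classes)
    by U @ V with U: (num_features x rank) and V: (rank x num_classes),
    via a truncated SVD. This cuts the parameter count from
    num_features * num_classes to rank * (num_features + num_classes)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank, :]

# Example: a 100,000-feature, 5,000-class softmax layer at rank 250 keeps
# roughly 5% of the original parameters while speeding up training.
```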
5. How to Scale Up Kernel Methods to Be As Good As Deep Neural Nets
- Author: Lu, Zhiyun; May, Avner; Liu, Kuan; Bagheri Garakani, Alireza; Guo, Dong; Bellet, Aurélien; Fan, Linxi; Collins, Michael; Kingsbury, Brian; Picheny, Michael; Sha, Fei
- Affiliations: University of Southern California (USC); Columbia University [New York]; Laboratoire Traitement et Communication de l'Information (LTCI), Institut Mines-Télécom [Paris] (IMT)-Télécom Paris; IBM Thomas J. Watson Research Center
- Subjects: [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; [STAT.ML]Statistics [stat]/Machine Learning [stat.ML]
- Abstract
The computational complexity of kernel methods has often been a major barrier for applying them to large-scale learning problems. We argue that this barrier can be effectively overcome. In particular, we develop methods to scale up kernel models to successfully tackle large-scale learning problems that are so far only approachable by deep learning architectures. Based on the seminal work by Rahimi and Recht on approximating kernel functions with features derived from random projections, we advance the state-of-the-art by proposing methods that can efficiently train models with hundreds of millions of parameters, and learn optimal representations from multiple kernels. We conduct extensive empirical studies on problems from image recognition and automatic speech recognition, and show that the performance of our kernel models matches that of well-engineered deep neural nets (DNNs). To the best of our knowledge, this is the first time that a direct comparison between these two methods on large-scale problems is reported. Our kernel methods have several appealing properties: training with convex optimization, cost for training a single model comparable to DNNs, and significantly reduced total cost due to fewer hyperparameters to tune for model selection. Our contrastive study between these two very different but equally competitive models sheds light on fundamental questions such as how to learn good representations.
- Published: 2014
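The abstract above also mentions learning representations from multiple kernels on top of random-projection features. As a loose stand-in rather than the authors' implementation, one can concatenate random Fourier feature blocks drawn for several Gaussian bandwidths and let a downstream linear model weight them; the bandwidth values and function name below are illustrative assumptions.

```python
import numpy as np

def multi_kernel_random_features(X, gammas=(0.1, 1.0, 10.0),
                                 features_per_kernel=512, seed=0):
    """Stack random Fourier feature blocks, one per Gaussian-kernel bandwidth
    gamma. A linear model trained on the stacked features can weight the
    kernels implicitly; the paper learns the kernel combination more directly,
    so this is only a rough approximation of that idea."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    blocks = []
    for gamma in gammas:
        W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, features_per_kernel))
        b = rng.uniform(0.0, 2.0 * np.pi, size=features_per_kernel)
        blocks.append(np.sqrt(2.0 / features_per_kernel) * np.cos(X @ W + b))
    return np.concatenate(blocks, axis=1)
```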