7,360 results for "Cox, David"
Search Results
2. Notes on Three Formulas of Abel
- Author
Cox, David A.
- Subjects
Mathematics - Algebraic Geometry, Mathematics - Number Theory, Primary 01A55, 14K20, Secondary 14Q05, 68W30
- Abstract
These notes explore three amazing formulas proved by Abel in his 1826 Paris memoir on what we now call Abelian integrals. We discuss the first two formulas from the point of view of symbolic computation and explain their connection to residues and partial fractions. The third formula arises from the first two and is related to the genus and lattice points in the Newton polygon., Comment: 49 pages, 4 figures
- Published
- 2024
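Background note on entry 2 (this is classical context, not quoted from the notes): the "genus and lattice points in the Newton polygon" connection is the standard one. For a plane curve that is nondegenerate with respect to its Newton polygon P, the genus equals the number of interior lattice points of P,

    g = \#\bigl(\operatorname{int}(P) \cap \mathbb{Z}^2\bigr),

and in general Baker's bound gives g \le \#\bigl(\operatorname{int}(P) \cap \mathbb{Z}^2\bigr).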
3. Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler
- Author
Shen, Yikang, Stallone, Matthew, Mishra, Mayank, Zhang, Gaoyuan, Tan, Shawn, Prasad, Aditya, Soria, Adriana Meza, Cox, David D., and Panda, Rameswar
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Finding the optimal learning rate for language model pretraining is a challenging task. This is not only because there is a complicated correlation between learning rate, batch size, number of training tokens, model size, and other hyperparameters but also because it is prohibitively expensive to perform a hyperparameter search for large language models with Billions or Trillions of parameters. Recent studies propose using small proxy models and small corpus to perform hyperparameter searches and transposing the optimal parameters to large models and large corpus. While the zero-shot transferability is theoretically and empirically proven for model size related hyperparameters, like depth and width, the zero-shot transfer from small corpus to large corpus is underexplored. In this paper, we study the correlation between optimal learning rate, batch size, and number of training tokens for the recently proposed WSD scheduler. After thousands of small experiments, we found a power-law relationship between variables and demonstrated its transferability across model sizes. Based on the observation, we propose a new learning rate scheduler, Power scheduler, that is agnostic about the number of training tokens and batch size. The experiment shows that combining the Power scheduler with Maximum Update Parameterization (muP) can consistently achieve impressive performance with one set of hyperparameters regardless of the number of training tokens, batch size, model size, and even model architecture. Our 3B dense and MoE models trained with the Power scheduler achieve comparable performance as state-of-the-art small language models. We open-source these pretrained models at https://ibm.biz/BdKhLa.
- Published
- 2024
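As a rough illustration of the kind of schedule entry 3 describes, the sketch below implements a generic warmup-then-power-law-decay learning rate whose decayed value depends only on tokens seen so far, so no total token budget has to be fixed in advance. The constants lr_max, a, b, warmup_steps and the exact functional form are placeholders for illustration, not the values or formula from the paper.

    def power_lr(step, batch_size, seq_len=4096, lr_max=2e-2,
                 a=4.6, b=-0.51, warmup_steps=1000):
        """Toy warmup + power-law decay schedule (illustrative constants only)."""
        if step < warmup_steps:
            return lr_max * step / warmup_steps        # linear warmup
        tokens_seen = step * batch_size * seq_len      # decay keyed to tokens seen, not to a preset budget
        return min(lr_max, a * tokens_seen ** b)       # power-law decay, capped at lr_max

    # The value at a given step does not depend on how long training will eventually run.
    print(power_lr(step=50_000, batch_size=1_024))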
4. Scaling Granite Code Models to 128K Context
- Author
Stallone, Matt, Saxena, Vaibhav, Karlinsky, Leonid, McGinn, Bridget, Bula, Tim, Mishra, Mayank, Soria, Adriana Meza, Zhang, Gaoyuan, Prasad, Aditya, Shen, Yikang, Surendran, Saptha, Guttula, Shanmukha, Patel, Hima, Selvam, Parameswaran, Dang, Xuan-Hong, Koyfman, Yan, Sood, Atin, Feris, Rogerio, Desai, Nirmit, Cox, David D., Puri, Ruchir, and Panda, Rameswar
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Software Engineering
- Abstract
This paper introduces long-context Granite code models that support effective context windows of up to 128K tokens. Our solution for scaling context length of Granite 3B/8B code models from 2K/4K to 128K consists of a light-weight continual pretraining by gradually increasing its RoPE base frequency with repository-level file packing and length-upsampled long-context data. Additionally, we also release instruction-tuned models with long-context support which are derived by further finetuning the long context base models on a mix of permissively licensed short and long-context instruction-response pairs. While comparing to the original short-context Granite code models, our long-context models achieve significant improvements on long-context tasks without any noticeable performance degradation on regular code completion benchmarks (e.g., HumanEval). We release all our long-context Granite code models under an Apache 2.0 license for both research and commercial use.
- Published
- 2024
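For context on the mechanism entry 4 relies on, the snippet below shows the standard RoPE inverse-frequency computation and how raising the base (theta) slows every rotary dimension, which is the usual lever for stretching a model's usable context window. The specific base values and the gradual schedule used for Granite are not reproduced here; the numbers are illustrative only.

    import numpy as np

    def rope_inv_freq(head_dim, base=10_000.0):
        """Standard rotary-embedding inverse frequencies for one attention head."""
        return base ** (-np.arange(0, head_dim, 2) / head_dim)

    # A larger base shrinks the slowest frequencies, so distant positions stay distinguishable;
    # 10_000_000.0 is an arbitrary example, not Granite's setting.
    print(rope_inv_freq(128, base=10_000.0)[-1], rope_inv_freq(128, base=10_000_000.0)[-1])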
5. The infrastructure powering IBM's Gen AI model development
- Author
Gershon, Talia, Seelam, Seetharami, Belgodere, Brian, Bonilla, Milton, Hoang, Lan, Barnett, Danny, Chung, I-Hsin, Mohan, Apoorve, Chen, Ming-Hung, Luo, Lixiang, Walkup, Robert, Evangelinos, Constantinos, Salaria, Shweta, Dombrowa, Marc, Park, Yoonho, Kayi, Apo, Schour, Liran, Alim, Alim, Sydney, Ali, Maniotis, Pavlos, Schares, Laurent, Metzler, Bernard, Karacali-Akyamac, Bengi, Wen, Sophia, Chiba, Tatsuhiro, Choochotkaew, Sunyanan, Yoshimura, Takeshi, Misale, Claudia, Elengikal, Tonia, Connor, Kevin O, Liu, Zhuoran, Molina, Richard, Schneidenbach, Lars, Caden, James, Laibinis, Christopher, Fonseca, Carlos, Tarasov, Vasily, Sundararaman, Swaminathan, Schmuck, Frank, Guthridge, Scott, Cohn, Jeremy, Eshel, Marc, Muench, Paul, Liu, Runyu, Pointer, William, Wyskida, Drew, Krull, Bob, Rose, Ray, Wolfe, Brent, Cornejo, William, Walter, John, Malone, Colm, Perucci, Clifford, Franco, Frank, Hinds, Nigel, Calio, Bob, Druyan, Pavel, Kilduff, Robert, Kienle, John, McStay, Connor, Figueroa, Andrew, Connolly, Matthew, Fost, Edie, Roma, Gina, Fonseca, Jake, Levy, Ido, Payne, Michele, Schenkel, Ryan, Malki, Amir, Schneider, Lion, Narkhede, Aniruddha, Moshref, Shekeba, Kisin, Alexandra, Dodin, Olga, Rippon, Bill, Wrieth, Henry, Ganci, John, Colino, Johnny, Habeger-Rose, Donna, Pandey, Rakesh, Gidh, Aditya, Gaur, Aditya, Patterson, Dennis, Salmani, Samsuddin, Varma, Rambilas, Rumana, Rumana, Sharma, Shubham, Mishra, Mayank, Panda, Rameswar, Prasad, Aditya, Stallone, Matt, Zhang, Gaoyuan, Shen, Yikang, Cox, David, Puri, Ruchir, Agrawal, Dakshi, Thorstensen, Drew, Belog, Joel, Tang, Brent, Gupta, Saurabh Kumar, Biswas, Amitabha, Maheshwari, Anup, Gampel, Eran, Van Patten, Jason, Runion, Matthew, Kaki, Sai, Bogin, Yigal, Reitz, Brian, Pritko, Steve, Najam, Shahan, Nambala, Surya, Chirra, Radhika, Welp, Rick, DiMitri, Frank, Telles, Felipe, Arvelo, Amilcar, Chu, King, Seminaro, Ed, Schram, Andrew, Eickhoff, Felix, Hanson, William, Mckeever, Eric, Joseph, Dinakaran, Chaudhary, Piyush, Shivam, Piyush, Chaudhary, Puneet, Jones, Wesley, Guthrie, Robert, Bostic, Chris, Islam, Rezaul, Duersch, Steve, Sawdon, Wayne, Lewars, John, Klos, Matthew, Spriggs, Michael, McMillan, Bill, Gao, George, Kamra, Ashish, Singh, Gaurav, Curry, Marc, Katarki, Tushar, Talerico, Joe, Shi, Zenghui, Malleni, Sai Sindhur, and Gallen, Erwan
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Artificial Intelligence
- Abstract
AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering efficient and high-performing AI training requires an end-to-end solution that combines hardware, software and holistic telemetry to cater for multiple types of AI workloads. In this report, we describe IBM's hybrid cloud infrastructure that powers our generative AI model development. This infrastructure includes (1) Vela: an AI-optimized supercomputing capability directly integrated into the IBM Cloud, delivering scalable, dynamic, multi-tenant and geographically distributed infrastructure for large-scale model training and other AI workflow steps and (2) Blue Vela: a large-scale, purpose-built, on-premises hosting environment that is optimized to support our largest and most ambitious AI model training tasks. Vela provides IBM with the dual benefit of high performance for internal use along with the flexibility to adapt to an evolving commercial landscape. Blue Vela provides us with the benefits of rapid development of our largest and most ambitious models, as well as future-proofing against the evolving model landscape in the industry. Taken together, they provide IBM with the ability to rapidly innovate in the development of both AI models and commercial offerings., Comment: Corresponding Authors: Talia Gershon, Seetharami Seelam,Brian Belgodere, Milton Bonilla
- Published
- 2024
6. Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks
- Author
Abdelaziz, Ibrahim, Basu, Kinjal, Agarwal, Mayank, Kumaravel, Sadhana, Stallone, Matthew, Panda, Rameswar, Rizk, Yara, Bhargav, GP, Crouse, Maxwell, Gunasekara, Chulaka, Ikbal, Shajith, Joshi, Sachin, Karanam, Hima, Kumar, Vineet, Munawar, Asim, Neelam, Sumit, Raghu, Dinesh, Sharma, Udit, Soria, Adriana Meza, Sreedhar, Dheeraj, Venkateswaran, Praveen, Unuvar, Merve, Cox, David, Roukos, Salim, Lastras, Luis, and Kapanipathi, Pavan
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
- Abstract
Large language models (LLMs) have recently shown tremendous promise in serving as the backbone to agentic systems, as demonstrated by their performance in multi-faceted, challenging benchmarks like SWE-Bench and Agent-Bench. However, to realize the true potential of LLMs as autonomous agents, they must learn to identify, call, and interact with external tools and application program interfaces (APIs) to complete complex tasks. These tasks together are termed function calling. Endowing LLMs with function calling abilities leads to a myriad of advantages, such as access to current and domain-specific information in databases and knowledge sources, and the ability to outsource tasks that can be reliably performed by tools, e.g., a Python interpreter or calculator. While there has been significant progress in function calling with LLMs, there is still a dearth of open models that perform on par with proprietary LLMs like GPT, Claude, and Gemini. Therefore, in this work, we introduce the GRANITE-20B-FUNCTIONCALLING model under an Apache 2.0 license. The model is trained using a multi-task training approach on seven fundamental tasks encompassed in function calling, those being Nested Function Calling, Function Chaining, Parallel Functions, Function Name Detection, Parameter-Value Pair Detection, Next-Best Function, and Response Generation. We present a comprehensive evaluation on multiple out-of-domain datasets comparing GRANITE-20B-FUNCTIONCALLING to more than 15 other best proprietary and open models. GRANITE-20B-FUNCTIONCALLING provides the best performance among all open models on the Berkeley Function Calling Leaderboard and fourth overall. As a result of the diverse tasks and datasets used for training our model, we show that GRANITE-20B-FUNCTIONCALLING has better generalizability on multiple tasks in seven different evaluation datasets.
- Published
- 2024
7. Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
- Author
Kang, Junmo, Karlinsky, Leonid, Luo, Hongyin, Wang, Zhen, Hansen, Jacob, Glass, James, Cox, David, Panda, Rameswar, Feris, Rogerio, and Ritter, Alan
- Subjects
Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
We present Self-MoE, an approach that transforms a monolithic LLM into a compositional, modular system of self-specialized experts, named MiXSE (MiXture of Self-specialized Experts). Our approach leverages self-specialization, which constructs expert modules using self-generated synthetic data, each equipping a shared base LLM with distinct domain-specific capabilities, activated via self-optimized routing. This allows for dynamic and capability-specific handling of various target tasks, enhancing overall capabilities, without extensive human-labeled data and added parameters. Our empirical results reveal that specializing LLMs may exhibit potential trade-offs in performances on non-specialized tasks. On the other hand, our Self-MoE demonstrates substantial improvements (6.5%p on average) over the base LLM across diverse benchmarks such as knowledge, reasoning, math, and coding. It also consistently outperforms other methods, including instance merging and weight merging, while offering better flexibility and interpretability by design with semantic experts and routing. Our findings highlight the critical role of modularity, the applicability of Self-MoE to multiple base LLMs, and the potential of self-improvement in achieving efficient, scalable, and adaptable systems.
- Published
- 2024
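A loose illustration of the "shared base plus routed expert modules" structure entry 7 describes. This is not the paper's architecture: the sizes, the softmax router, and the dense mixing are simplifications chosen only to make the routing idea concrete.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_experts = 16, 4

    W_base = rng.normal(size=(d, d))                                       # shared base layer
    experts = [0.1 * rng.normal(size=(d, d)) for _ in range(n_experts)]    # stand-ins for self-specialized expert deltas
    W_router = 0.1 * rng.normal(size=(n_experts, d))                       # routing weights (learned in practice)

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    def forward(x):
        gate = softmax(W_router @ x)                        # routing over experts
        delta = sum(g * E for g, E in zip(gate, experts))   # dense mix for simplicity (real MoE routing is sparse)
        return (W_base + delta) @ x

    print(forward(rng.normal(size=d)).shape)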
8. Wavefront Threading Enables Effective High-Level Synthesis
- Author
Pelton, Blake, Sapek, Adam, Eguro, Ken, Lo, Daniel, Forin, Alessandro, Humphrey, Matt, Xi, Jinwen, Cox, David, Karandikar, Rajas, Licht, Johannes de Fine, Babin, Evgeny, Caulfield, Adrian, and Burger, Doug
- Subjects
Computer Science - Programming Languages
- Abstract
Digital systems are growing in importance and computing hardware is growing more heterogeneous. Hardware design, however, remains laborious and expensive, in part due to the limitations of conventional hardware description languages (HDLs) like VHDL and Verilog. A longstanding research goal has been programming hardware like software, with high-level languages that can generate efficient hardware designs. This paper describes Kanagawa, a language that takes a new approach to combine the programmer productivity benefits of traditional High-Level Synthesis (HLS) approaches with the expressibility and hardware efficiency of Register-Transfer Level (RTL) design. The language's concise syntax, matched with a hardware design-friendly execution model, permits a relatively simple toolchain to map high-level code into efficient hardware implementations., Comment: Accepted to PLDI'24
- Published
- 2024
- Full Text
- View/download PDF
9. $\textit{Trans-LoRA}$: towards data-free Transferable Parameter Efficient Finetuning
- Author
Wang, Runqian, Ghosh, Soumya, Cox, David, Antognini, Diego, Oliva, Aude, Feris, Rogerio, and Karlinsky, Leonid
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
Low-rank adapters (LoRA) and their variants are popular parameter-efficient fine-tuning (PEFT) techniques that closely match full model fine-tune performance while requiring only a small number of additional parameters. These additional LoRA parameters are specific to the base model being adapted. When the base model needs to be deprecated and replaced with a new one, all the associated LoRA modules need to be re-trained. Such re-training requires access to the data used to train the LoRA for the original base model. This is especially problematic for commercial cloud applications where the LoRA modules and the base models are hosted by service providers who may not be allowed to host proprietary client task data. To address this challenge, we propose $\textit{Trans-LoRA}$ -- a novel method for lossless, nearly data-free transfer of LoRAs across base models. Our approach relies on synthetic data to transfer LoRA modules. Using large language models, we design a synthetic data generator to approximate the data-generating process of the $\textit{observed}$ task data subset. Training on the resulting synthetic dataset transfers LoRA modules to new models. We show the effectiveness of our approach using both LLama and Gemma model families. Our approach achieves lossless (mostly improved) LoRA transfer between models within and across different base model families, and even between different PEFT methods, on a wide variety of tasks.
- Published
- 2024
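To make the transfer idea in entry 9 concrete, here is a minimal numpy sketch: a low-rank adapter (delta W = BA) trained for an old base model is imitated by a fresh adapter on a new base model using only synthetic inputs, via a crude distillation-style update. Everything here (shapes, learning rate, the plain squared-error objective, random vectors standing in for the synthetic task data) is a simplification for illustration, not the paper's actual procedure.

    import numpy as np

    rng = np.random.default_rng(0)
    d, r = 16, 4                                           # toy hidden size and LoRA rank

    W_old = rng.normal(size=(d, d))                        # frozen old base weight
    A_old, B_old = rng.normal(size=(r, d)), 0.1 * rng.normal(size=(d, r))   # its trained adapter
    W_new = rng.normal(size=(d, d))                        # new base model
    A_new, B_new = rng.normal(size=(r, d)), np.zeros((d, r))               # fresh adapter (standard LoRA init)

    def adapted(W, A, B, x):
        return (W + B @ A) @ x                             # LoRA forward: y = (W + BA) x

    probes = rng.normal(size=(64, d))
    def gap():                                             # average teacher/student mismatch
        return np.mean([np.linalg.norm(adapted(W_new, A_new, B_new, x)
                                       - adapted(W_old, A_old, B_old, x)) for x in probes])

    print("before:", gap())
    lr = 5e-4
    for _ in range(500):                                   # random "synthetic" inputs
        x = rng.normal(size=d)
        err = adapted(W_new, A_new, B_new, x) - adapted(W_old, A_old, B_old, x)
        B_new -= lr * np.outer(err, A_new @ x)             # dL/dB for L = 0.5 * ||err||^2
        A_new -= lr * np.outer(B_new.T @ err, x)           # dL/dA
    print("after:", gap())                                 # shrinks, though a rank-4 adapter cannot close it fully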
10. Granite Code Models: A Family of Open Foundation Models for Code Intelligence
- Author
Mishra, Mayank, Stallone, Matt, Zhang, Gaoyuan, Shen, Yikang, Prasad, Aditya, Soria, Adriana Meza, Merler, Michele, Selvam, Parameswaran, Surendran, Saptha, Singh, Shivdeep, Sethi, Manish, Dang, Xuan-Hong, Li, Pengyuan, Wu, Kun-Lung, Zawad, Syed, Coleman, Andrew, White, Matthew, Lewis, Mark, Pavuluri, Raju, Koyfman, Yan, Lublinsky, Boris, de Bayser, Maximilien, Abdelaziz, Ibrahim, Basu, Kinjal, Agarwal, Mayank, Zhou, Yi, Johnson, Chris, Goyal, Aanchal, Patel, Hima, Shah, Yousaf, Zerfos, Petros, Ludwig, Heiko, Munawar, Asim, Crouse, Maxwell, Kapanipathi, Pavan, Salaria, Shweta, Calio, Bob, Wen, Sophia, Seelam, Seetharami, Belgodere, Brian, Fonseca, Carlos, Singhee, Amith, Desai, Nirmit, Cox, David D., Puri, Ruchir, and Panda, Rameswar
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Software Engineering
- Abstract
Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabilities, including code generation, fixing bugs, explaining and documenting code, maintaining repositories, and more. In this work, we introduce the Granite series of decoder-only code models for code generative tasks, trained with code written in 116 programming languages. The Granite Code models family consists of models ranging in size from 3 to 34 billion parameters, suitable for applications ranging from complex application modernization tasks to on-device memory-constrained use cases. Evaluation on a comprehensive set of tasks demonstrates that Granite Code models consistently reaches state-of-the-art performance among available open-source code LLMs. The Granite Code model family was optimized for enterprise software development workflows and performs well across a range of coding tasks (e.g. code generation, fixing and explanation), making it a versatile all around code model. We release all our Granite Code models under an Apache 2.0 license for both research and commercial use., Comment: Corresponding Authors: Rameswar Panda, Ruchir Puri; Equal Contributors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang
- Published
- 2024
11. LAB: Large-Scale Alignment for ChatBots
- Author
Sudalairaj, Shivchander, Bhandwaldar, Abhishek, Pareja, Aldo, Xu, Kai, Cox, David D., and Srivastava, Akash
- Subjects
Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
This work introduces LAB (Large-scale Alignment for chatBots), a novel methodology designed to overcome the scalability challenges in the instruction-tuning phase of large language model (LLM) training. Leveraging a taxonomy-guided synthetic data generation process and a multi-phase tuning framework, LAB significantly reduces reliance on expensive human annotations and proprietary models like GPT-4. We demonstrate that LAB-trained models can achieve competitive performance across several benchmarks compared to models trained with traditional human-annotated or GPT-4 generated synthetic data. Thus offering a scalable, cost-effective solution for enhancing LLM capabilities and instruction-following behaviors without the drawbacks of catastrophic forgetting, marking a step forward in the efficient training of LLMs for a wide range of applications., Comment: Corresponding Author: Akash Srivastava. Equal Contribution: Shivchander Sudalairaj, Abhishek Bhandwaldar, Aldo Pareja, Akash Srivastava, Code: https://github.com/instructlab
- Published
- 2024
12. United States house dust Pb concentrations are influenced by soil, paint, and house age: insights from a national survey
- Author
Sowers, Tyler D., Nelson, Clay M., Blackmon, Matthew D., Li, Kevin, Jerden, Marissa L., Kirby, Alicia M., Kovalcik, Kasey, Cox, David, Dewalt, Gary, Friedman, Warren, Pinzer, Eugene A., Ashley, Peter J., and Bradham, Karen D.
- Published
- 2024
- Full Text
- View/download PDF
13. Audio-Visual Neural Syntax Acquisition
- Author
Lai, Cheng-I Jeff, Shi, Freda, Peng, Puyuan, Kim, Yoon, Gimpel, Kevin, Chang, Shiyu, Chuang, Yung-Sung, Bhati, Saurabhchand, Cox, David, Harwath, David, Zhang, Yang, Livescu, Karen, and Glass, James
- Subjects
Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
We study phrase structure induction from visually-grounded speech. The core idea is to first segment the speech waveform into sequences of word segments, and subsequently induce phrase structure using the inferred segment-level continuous representations. We present the Audio-Visual Neural Syntax Learner (AV-NSL) that learns phrase structure by listening to audio and looking at images, without ever being exposed to text. By training on paired images and spoken captions, AV-NSL exhibits the capability to infer meaningful phrase structures that are comparable to those derived by naturally-supervised text parsers, for both English and German. Our findings extend prior work in unsupervised language acquisition from speech and grounded grammar induction, and present one approach to bridge the gap between the two topics.
- Published
- 2023
14. SALMON: Self-Alignment with Instructable Reward Models
- Author
Sun, Zhiqing, Shen, Yikang, Zhang, Hongxin, Zhou, Qinhong, Chen, Zhenfang, Cox, David, Yang, Yiming, and Gan, Chuang
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Supervised Fine-Tuning (SFT) on response demonstrations combined with Reinforcement Learning from Human Feedback (RLHF) constitutes a powerful paradigm for aligning LLM-based AI agents. However, a significant limitation of such an approach is its dependency on high-quality human annotations, making its application to intricate tasks challenging due to difficulties in obtaining consistent response demonstrations and in-distribution response preferences. This paper presents a novel approach, namely SALMON, to align base language models with minimal human supervision, using only a small set of human-defined principles, yet achieving superior performance. Central to our approach is an instructable reward model. Trained on synthetic preference data, this model can generate reward scores based on arbitrary human-defined principles. By merely adjusting these principles during the RL training phase, we gain full control over the preferences with the instructable reward model, subsequently influencing the behavior of the RL-trained policy models, and reducing the reliance on the collection of online human preferences. Applying our method to the LLaMA-2-70b base language model, we developed an AI assistant named Dromedary-2. With only 6 exemplars for in-context learning and 31 human-defined principles, Dromedary-2 significantly surpasses the performance of several state-of-the-art AI systems, including LLaMA-2-Chat-70b, on various benchmark datasets. We have open-sourced the code and model weights to encourage further research into aligning LLM-based AI agents with enhanced supervision efficiency, improved controllability, and scalable oversight., Comment: Previous Title: SALMON: Self-Alignment with Principle-Following Reward Models. Accepted to ICLR 2024. Project page: https://github.com/IBM/SALMON
- Published
- 2023
15. Detection Sensitivity Limit of Hundreds of Atoms with X-Ray Fluorescence Microscopy
- Author
Masteghin, Mateus G., Gervais, Toussaint, Clowes, Steven K., Cox, David C., Zelyk, Veronika, Pattammattel, Ajith, Chu, Yong S., Kolev, Nikola, Stock, Taylor Z., Curson, Neil, Evans, Paul G., Stuckelberger, Michael, and Murdin, Benedict N.
- Subjects
Condensed Matter - Materials Science, Physics - Accelerator Physics, Physics - Applied Physics
- Abstract
We report X-ray fluorescence (XRF) imaging of nanoscale inclusions of impurities for quantum technology. A very bright diffraction-limited focus of the X-ray beam produces very high sensitivity and resolution. We investigated gallium (Ga) dopants in silicon (Si) produced by a focused ion beam (FIB). These dopants might provide 3/2-spin qubits or p-type electrical contacts and quantum dots. We find that the ion beam spot is somewhat larger than expected, and the technique provides a useful calibration for the resolution of FIBs. Enticingly, we demonstrate that with a single shot detection of 1 second integration time, the sensitivity of the XRF would be sufficient to find amongst background a single isolated inclusion of unknown location comprising only 3000 Ga impurities (a mass of just 350 zg) without any need for specialized nm-thickness lamellae, and down from >10^5 atoms in previous reports of similar work. With increased integration we were able to detect 650 impurities. The results show that planned facility upgrades might achieve single atom sensitivity with a generally applicable, non-destructive technique in the near future., Comment: 8 pages, 5 figures
- Published
- 2023
16. Self-Specialization: Uncovering Latent Expertise within Large Language Models
- Author
Kang, Junmo, Luo, Hongyin, Zhu, Yada, Hansen, Jacob, Glass, James, Cox, David, Ritter, Alan, Feris, Rogerio, and Karlinsky, Leonid
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Recent works have demonstrated the effectiveness of self-alignment in which a large language model is aligned to follow general instructions using instructional data generated from the model itself starting from a handful of human-written seeds. Instead of general alignment, in this work, we focus on self-alignment for expert domain specialization (e.g., biomedicine, finance). As a preliminary, we quantitively show the marginal effect that generic instruction-following training has on downstream expert domains' performance. To remedy this, we propose self-specialization - allowing for effective model specialization while achieving cross-task generalization by leveraging only a few labeled seeds. Self-specialization offers a data- and parameter-efficient way of "carving out" an expert model out of a generalist pre-trained LLM. Exploring a variety of popular open large models as a base for specialization, our experimental results in both biomedical and financial domains show that our self-specialized models outperform their base models by a large margin, and even larger models that are generally instruction-tuned or that have been adapted to the target domain by other means., Comment: ACL 2024 (Findings; Long Paper)
- Published
- 2023
17. The Challenges Ahead: Concepts, Analytics, and Ethics of Value-Based Care in Applied Behavior Analysis
- Author
Cox, David J.
- Published
- 2024
- Full Text
- View/download PDF
18. Memory Trade: A Prehistory of Cyberculture by Darren Tofts (review)
- Author
Cox, David
- Published
- 2017
19. Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
- Author
Sun, Zhiqing, Shen, Yikang, Zhou, Qinhong, Zhang, Hongxin, Chen, Zhenfang, Cox, David, Yang, Yiming, and Gan, Chuang
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computers and Society
- Abstract
Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback (RLHF) to align the output of large language models (LLMs) with human intentions, ensuring they are helpful, ethical, and reliable. However, this dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision and the related issues on quality, reliability, diversity, self-consistency, and undesirable biases. To address these challenges, we propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision. Our approach encompasses four stages: first, we use an LLM to generate synthetic prompts, and a topic-guided method to augment the prompt diversity; second, we use a small set of human-written principles for AI models to follow, and guide the LLM through in-context learning from demonstrations (of principles application) to produce helpful, ethical, and reliable responses to user's queries; third, we fine-tune the original LLM with the high-quality self-aligned responses so that the resulting model can generate desirable responses for each query directly without the principle set and the demonstrations anymore; and finally, we offer a refinement step to address the issues of overly-brief or indirect responses. Applying SELF-ALIGN to the LLaMA-65b base language model, we develop an AI assistant named Dromedary. With fewer than 300 lines of human annotations (including < 200 seed prompts, 16 generic principles, and 5 exemplars for in-context learning). Dromedary significantly surpasses the performance of several state-of-the-art AI systems, including Text-Davinci-003 and Alpaca, on benchmark datasets with various settings., Comment: Accepted at NeurIPS 2023 (Spotlight). Project page: https://github.com/IBM/Dromedary
- Published
- 2023
20. Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following
- Author
Ding, Mingyu, Xu, Yan, Chen, Zhenfang, Cox, David Daniel, Luo, Ping, Tenenbaum, Joshua B., and Gan, Chuang
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Humans, even at a very early age, can learn visual concepts and understand geometry and layout through active interaction with the environment, and generalize their compositions to complete tasks described by natural languages in novel scenes. To mimic such capability, we propose Embodied Concept Learner (ECL) in an interactive 3D environment. Specifically, a robot agent can ground visual concepts, build semantic maps and plan actions to complete tasks by learning purely from human demonstrations and language instructions, without access to ground-truth semantic and depth supervisions from simulations. ECL consists of: (i) an instruction parser that translates the natural languages into executable programs; (ii) an embodied concept learner that grounds visual concepts based on language descriptions; (iii) a map constructor that estimates depth and constructs semantic maps by leveraging the learned concepts; and (iv) a program executor with deterministic policies to execute each program. ECL has several appealing benefits thanks to its modularized design. Firstly, it enables the robotic agent to learn semantics and depth unsupervisedly acting like babies, e.g., ground concepts through active interaction and perceive depth by disparities when moving forward. Secondly, ECL is fully transparent and step-by-step interpretable in long-term planning. Thirdly, ECL could be beneficial for the embodied instruction following (EIF), outperforming previous works on the ALFRED benchmark when the semantic label is not provided. Also, the learned concept can be reused for other downstream tasks, such as reasoning of object states. Project page: http://ecl.csail.mit.edu/, Comment: CoRL 2022
- Published
- 2023
21. Learning to Grow Pretrained Models for Efficient Transformer Training
- Author
Wang, Peihao, Panda, Rameswar, Hennigen, Lucas Torroba, Greengard, Philip, Karlinsky, Leonid, Feris, Rogerio, Cox, David Daniel, Wang, Zhangyang, and Kim, Yoon
- Subjects
Computer Science - Machine Learning
- Abstract
Scaling transformers has led to significant breakthroughs in many domains, leading to a paradigm in which larger versions of existing models are trained and released on a periodic basis. New instances of such models are typically trained completely from scratch, despite the fact that they are often just scaled-up versions of their smaller counterparts. How can we use the implicit knowledge in the parameters of smaller, extant models to enable faster training of newer, larger models? This paper describes an approach for accelerating transformer training by learning to grow pretrained transformers, where we learn to linearly map the parameters of the smaller model to initialize the larger model. For tractable learning, we factorize the linear transformation as a composition of (linear) width- and depth-growth operators, and further employ a Kronecker factorization of these growth operators to encode architectural knowledge. Extensive experiments across both language and vision transformers demonstrate that our learned Linear Growth Operator (LiGO) can save up to 50% computational cost of training from scratch, while also consistently outperforming strong baselines that also reuse smaller pretrained models to initialize larger models., Comment: International Conference on Learning Representations (ICLR), 2023
- Published
- 2023
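A small aside on the Kronecker-factored growth operators mentioned in entry 21: the identity below is what "width growth as a factored linear map" amounts to for a single weight matrix. The expansion matrices here are padded identities purely to make the algebra concrete; in LiGO they are learned, and depth growth is handled by an additional operator not shown.

    import numpy as np

    rng = np.random.default_rng(0)
    d_small, d_large = 8, 12

    W_small = rng.normal(size=(d_small, d_small))      # one layer of the small pretrained model

    def expand(d_from, d_to):                          # illustrative width-growth operator
        E = np.zeros((d_to, d_from))
        E[:d_from, :] = np.eye(d_from)                 # copy the existing dimensions
        E[d_from:, :] = 0.01 * rng.normal(size=(d_to - d_from, d_from))   # lightly initialize new ones
        return E

    E_out, E_in = expand(d_small, d_large), expand(d_small, d_large)

    # On the matrix: W_large = E_out @ W_small @ E_in.T.  On flattened weights this is exactly
    # the Kronecker-factored linear map (E_in ⊗ E_out) applied to vec(W_small).
    W_large = E_out @ W_small @ E_in.T
    vec_form = np.kron(E_in, E_out) @ W_small.reshape(-1, order="F")
    print(np.allclose(W_large.reshape(-1, order="F"), vec_form))   # True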
22. Rapid Development of Compositional AI
- Author
Martie, Lee, Rosenberg, Jessie, Demers, Veronique, Zhang, Gaoyuan, Bhardwaj, Onkar, Henning, John, Prasad, Aditya, Stallone, Matt, Lee, Ja Young, Yip, Lucy, Adesina, Damilola, Paikari, Elahe, Resendiz, Oscar, Shaw, Sarah, and Cox, David
- Subjects
Computer Science - Software Engineering, Computer Science - Artificial Intelligence
- Abstract
Compositional AI systems, which combine multiple artificial intelligence components together with other application components to solve a larger problem, have no known pattern of development and are often approached in a bespoke and ad hoc style. This makes development slower and harder to reuse for future applications. To support the full rapid development cycle of compositional AI applications, we have developed a novel framework called (Bee)* (written as a regular expression and pronounced as "beestar"). We illustrate how (Bee)* supports building integrated, scalable, and interactive compositional AI applications with a simplified developer experience., Comment: Accepted to ICSE 2023, NIER track
- Published
- 2023
23. Canada in World Affairs. XI. 1959 to 1961 by Richard A. Preston (review)
- Author
Cox, David
- Published
- 2016
24. The Decolonization of Arctic Library and Archives Metadata (DALAM) Thematic Network at the University of the Arctic
- Author
Farnel, Sharon, Campbell, Sandra M., Cox, David, II, Iselid, Lars, Lund, Peter, Parikka, Susanna, Rankin, Sharon, Stokkeland, Ivar, Wendelius, Päivi, Ford, James D., Series Editor, Desjardins, Sean, Editorial Board Member, Eicken, Hajo, Editorial Board Member, Falardeau-Cote, Marianne, Editorial Board Member, Jackson, Jen, Editorial Board Member, Mustonen, Tero, Editorial Board Member, Nenasheva, Marina, Editorial Board Member, Olsen, Julia, Editorial Board Member, and Acadia, Spencer, editor
- Published
- 2024
- Full Text
- View/download PDF
25. ConStruct-VL: Data-Free Continual Structured VL Concepts Learning
- Author
Smith, James Seale, Cascante-Bonilla, Paola, Arbelle, Assaf, Kim, Donghyun, Panda, Rameswar, Cox, David, Yang, Diyi, Kira, Zsolt, Feris, Rogerio, and Karlinsky, Leonid
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Recently, large-scale pre-trained Vision-and-Language (VL) foundation models have demonstrated remarkable capabilities in many zero-shot downstream tasks, achieving competitive results for recognizing objects defined by as little as short text prompts. However, it has also been shown that VL models are still brittle in Structured VL Concept (SVLC) reasoning, such as the ability to recognize object attributes, states, and inter-object relations. This leads to reasoning mistakes, which need to be corrected as they occur by teaching VL models the missing SVLC skills; often this must be done using private data where the issue was found, which naturally leads to a data-free continual (no task-id) VL learning setting. In this work, we introduce the first Continual Data-Free Structured VL Concepts Learning (ConStruct-VL) benchmark and show it is challenging for many existing data-free CL strategies. We, therefore, propose a data-free method comprised of a new approach of Adversarial Pseudo-Replay (APR) which generates adversarial reminders of past tasks from past task models. To use this method efficiently, we also propose a continual parameter-efficient Layered-LoRA (LaLo) neural architecture allowing no-memory-cost access to all past models at train time. We show this approach outperforms all data-free methods by as much as ~7% while even matching some levels of experience-replay (prohibitive for applications where data-privacy must be preserved). Our code is publicly available at https://github.com/jamessealesmith/ConStruct-VL, Comment: Accepted by the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023)
- Published
- 2022
26. We Live in Interesting Times: Introduction to the Special Section on Big Data & Behavior Science
- Author
Cox, David J. and Young, Michael E.
- Published
- 2024
- Full Text
- View/download PDF
27. Starting the Conversation Around the Ethical Use of Artificial Intelligence in Applied Behavior Analysis
- Author
Jennings, Adrienne M. and Cox, David J.
- Published
- 2024
- Full Text
- View/download PDF
28. The Promises and Possibilities of Artificial Intelligence in the Delivery of Behavior Analytic Services
- Author
Cox, David J. and Jennings, Adrienne M.
- Published
- 2024
- Full Text
- View/download PDF
29. Identifying Trends in the Open-Access Behavior Analytic Literature via Computational Analyses (I): Simple Descriptions of Text
- Author
Sosine, Jacob and Cox, David J.
- Abstract
Published research in scientific journals are critical resources for researchers as primary sources about: what is important in the field, the direction the field is headed, how the field relates to other sciences, and as a historical record for each of these. In this exploratory study, we analyzed the articles of five behavior analytic journals to identify trends in these areas. To do this, we downloaded all available articles (N = 10,405) since the inception of five behavior analytic journals and one control journal. We then used computational techniques to turn the collection of raw text into a structured dataset for descriptive, exploratory analyses. We found consistent differences in the length and variability of published research across behavior analytic journals compared to a control journal. We also found increasing article lengths over time which, combined with the previous finding, may highlight changing editorial contingencies that influence the writing behavior of researchers. Further, we found evidence suggesting distinct (though still connected) verbal communities between the experimental analysis of behavior and applied behavior analysis. Lastly, keyword trends suggest that increased focus on "functional analyses," "problem behavior," and "autism spectrum disorder" currently dominates the research being published in these journals similar to the practitioner arm of behavior analysis. Researchers interested in studying published behavior analytic textual stimuli will find the corresponding open dataset useful. And, for those interested in computational analyses of these data, this first pass at simple descriptions provides a launching point for much fruitful future research.
- Published
- 2023
- Full Text
- View/download PDF
30. VALHALLA: Visual Hallucination for Machine Translation
- Author
Li, Yi, Panda, Rameswar, Kim, Yoon, Chen, Chun-Fu, Feris, Rogerio, Cox, David, and Vasconcelos, Nuno
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
- Abstract
Designing better machine translation systems by considering auxiliary inputs such as images has attracted much attention in recent years. While existing methods show promising performance over the conventional text-only translation systems, they typically require paired text and image as input during inference, which limits their applicability to real-world scenarios. In this paper, we introduce a visual hallucination framework, called VALHALLA, which requires only source sentences at inference time and instead uses hallucinated visual representations for multimodal machine translation. In particular, given a source sentence an autoregressive hallucination transformer is used to predict a discrete visual representation from the input text, and the combined text and hallucinated representations are utilized to obtain the target translation. We train the hallucination transformer jointly with the translation transformer using standard backpropagation with cross-entropy losses while being guided by an additional loss that encourages consistency between predictions using either ground-truth or hallucinated visual representations. Extensive experiments on three standard translation datasets with a diverse set of language pairs demonstrate the effectiveness of our approach over both text-only baselines and state-of-the-art methods. Project page: http://www.svcl.ucsd.edu/projects/valhalla., Comment: CVPR 2022
- Published
- 2022
31. ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers
- Author
Qian, Kaizhi, Zhang, Yang, Gao, Heting, Ni, Junrui, Lai, Cheng-I, Cox, David, Hasegawa-Johnson, Mark, and Chang, Shiyu
- Subjects
Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Self-supervised learning in speech involves training a speech representation network on a large-scale unannotated speech corpus, and then applying the learned representations to downstream tasks. Since the majority of the downstream tasks of SSL learning in speech largely focus on the content information in speech, the most desirable speech representations should be able to disentangle unwanted variations, such as speaker variations, from the content. However, disentangling speakers is very challenging, because removing the speaker information could easily result in a loss of content as well, and the damage of the latter usually far outweighs the benefit of the former. In this paper, we propose a new SSL method that can achieve speaker disentanglement without severe loss of content. Our approach is adapted from the HuBERT framework, and incorporates disentangling mechanisms to regularize both the teacher labels and the learned representations. We evaluate the benefit of speaker disentanglement on a set of content-related downstream tasks, and observe a consistent and notable performance advantage of our speaker-disentangled representations.
- Published
- 2022
32. Influence of cocaine use reduction on markers of immune function
- Author
Stoops, William W., Shellenberg, Thomas P., Regnier, Sean D., Cox, David H., Adatorwovor, Reuben, Hays, Lon R., Anderson, Danielle M., Lile, Joshua A., Schmitz, Joy M., Havens, Jennifer R., and Segerstrom, Suzanne C.
- Published
- 2024
- Full Text
- View/download PDF
33. Per- and polyfluoroalkyl substances (PFAS) in paired tap water and house dust from United States homes
- Author
DeLuca, Nicole M., Boettger, Jason, Miller, Kelsey E., Fuller, Christopher, Minucci, Jeffrey M., Ashley, Peter J., Cox, David, DeWalt, Gary, Friedman, Warren, Pinzer, Eugene A., Bradham, Karen D., McCord, James, and Cohen Hubal, Elaine A.
- Published
- 2024
- Full Text
- View/download PDF
34. Subresultants and the Shape Lemma
- Author
Cox, David A. and D'Andrea, Carlos
- Subjects
Mathematics - Commutative Algebra, 13P10, 13P15, 14M25
- Abstract
In nice cases, a zero-dimensional complete intersection ideal over a field of characteristic zero has a Shape Lemma. There are also cases where the ideal is generated by the resultant and first subresultant polynomials of the generators. This paper explores the relation between these representations and studies when the resultant generates the elimination ideal. We also prove a Poisson formula for resultants arising from the hidden variable method., Comment: 25 pages, revised version with several changes in sections 2, 3, and 5. Accepted for publication at Mathematics of Computation
- Published
- 2021
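Background for the terminology in entry 34 (a standard statement, not quoted from the paper): in the bivariate setting, a zero-dimensional radical ideal I in k[x, y] whose solutions have distinct y-coordinates (guaranteed after a generic change of coordinates) admits a Shape Lemma representation

    I = \bigl\langle\, h(y),\; x - g(y) \,\bigr\rangle, \qquad \deg g < \deg h,

so each solution is recovered as a root y of h together with x = g(y); this is the representation the abstract compares against generation by the resultant and first subresultant.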
35. Neural Population Geometry Reveals the Role of Stochasticity in Robust Perception
- Author
Dapello, Joel, Feather, Jenelle, Le, Hang, Marques, Tiago, Cox, David D., McDermott, Josh H., DiCarlo, James J., and Chung, SueYeon
- Subjects
Quantitative Biology - Neurons and Cognition, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing
- Abstract
Adversarial examples are often cited by neuroscientists and machine learning researchers as an example of how computational models diverge from biological sensory systems. Recent work has proposed adding biologically-inspired components to visual neural networks as a way to improve their adversarial robustness. One surprisingly effective component for reducing adversarial vulnerability is response stochasticity, like that exhibited by biological neurons. Here, using recently developed geometrical techniques from computational neuroscience, we investigate how adversarial perturbations influence the internal representations of standard, adversarially trained, and biologically-inspired stochastic networks. We find distinct geometric signatures for each type of network, revealing different mechanisms for achieving robust representations. Next, we generalize these results to the auditory domain, showing that neural stochasticity also makes auditory models more robust to adversarial perturbations. Geometric analysis of the stochastic networks reveals overlap between representations of clean and adversarially perturbed stimuli, and quantitatively demonstrates that competing geometric effects of stochasticity mediate a tradeoff between adversarial and clean performance. Our results shed light on the strategies of robust perception utilized by adversarially trained and stochastic networks, and help explain how stochasticity may be beneficial to machine and biological computation., Comment: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)
- Published
- 2021
36. Drawing Robust Scratch Tickets: Subnetworks with Inborn Robustness Are Found within Randomly Initialized Networks
- Author
Fu, Yonggan, Yu, Qixuan, Zhang, Yang, Wu, Shang, Ouyang, Xu, Cox, David, and Lin, Yingyan
- Subjects
Computer Science - Machine Learning
- Abstract
Deep Neural Networks (DNNs) are known to be vulnerable to adversarial attacks, i.e., an imperceptible perturbation to the input can mislead DNNs trained on clean images into making erroneous predictions. To tackle this, adversarial training is currently the most effective defense method, by augmenting the training set with adversarial samples generated on the fly. Interestingly, we discover for the first time that there exist subnetworks with inborn robustness, matching or surpassing the robust accuracy of the adversarially trained networks with comparable model sizes, within randomly initialized networks without any model training, indicating that adversarial training on model weights is not indispensable towards adversarial robustness. We name such subnetworks Robust Scratch Tickets (RSTs), which are also by nature efficient. Distinct from the popular lottery ticket hypothesis, neither the original dense networks nor the identified RSTs need to be trained. To validate and understand this fascinating finding, we further conduct extensive experiments to study the existence and properties of RSTs under different models, datasets, sparsity patterns, and attacks, drawing insights regarding the relationship between DNNs' robustness and their initialization/overparameterization. Furthermore, we identify the poor adversarial transferability between RSTs of different sparsity ratios drawn from the same randomly initialized dense network, and propose a Random RST Switch (R2S) technique, which randomly switches between different RSTs, as a novel defense method built on top of RSTs. We believe our findings about RSTs have opened up a new perspective to study model robustness and extend the lottery ticket hypothesis., Comment: Accepted at NeurIPS 2021
- Published
- 2021
37. On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis
- Author
Lai, Cheng-I Jeff, Cooper, Erica, Zhang, Yang, Chang, Shiyu, Qian, Kaizhi, Liao, Yi-Lun, Chuang, Yung-Sung, Liu, Alexander H., Yamagishi, Junichi, Cox, David, and Glass, James
- Subjects
Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Are end-to-end text-to-speech (TTS) models over-parametrized? To what extent can these models be pruned, and what happens to their synthesis capabilities? This work serves as a starting point to explore pruning both spectrogram prediction networks and vocoders. We thoroughly investigate the tradeoffs between sparsity and its subsequent effects on synthetic speech. Additionally, we explored several aspects of TTS pruning: amount of finetuning data versus sparsity, TTS-Augmentation to utilize unspoken text, and combining knowledge distillation and pruning. Our findings suggest that not only are end-to-end TTS models highly prunable, but also, perhaps surprisingly, pruned TTS models can produce synthetic speech with equal or higher naturalness and intelligibility, with similar prosody. All of our experiments are conducted on publicly available models, and findings in this work are backed by large-scale subjective tests and objective measures. Code and 200 pruned models are made available to facilitate future research on efficiency in TTS.
- Published
- 2021
38. Increasing Children's Vegetable Consumption: Translating a Review of the Evidence Base to Develop Best Practice Guidelines
- Author
Hendrie, Gilly A., Anastasiou, Kim, Brindal, Emily, Wiggins, Bonnie, Baird, Danielle L., Johnson, Brittany J., Bell, Lucinda K., Gardner, Claire, Arguelles, Jennifer C., Kelaart, Amber, Cox, David N., and Golley, Rebecca K.
- Published
- 2024
- Full Text
- View/download PDF
39. NOT SO SWEET
- Author
Cox, David
- Subjects
Taste, Business, Science and technology
- Abstract
Low-calorie sugar substitutes are so ubiquitous that you probably consume them without realising. But with controversies over their impact on our health, is there a better way to get a [...]
- Published
- 2023
40. Evaluation of Statement Accuracy on Ethical Decision-Making
- Author
Brodhead, Matthew T., Cascarilla, Allison N., and Cox, David J.
- Published
- 2023
- Full Text
- View/download PDF
41. Examination of Ethical Decision-Making Models Across Disciplines: Common Elements and Application to the Field of Behavior Analysis
- Author
Suarez, Victoria D., Marya, Videsha, Weiss, Mary Jane, and Cox, David
- Published
- 2023
- Full Text
- View/download PDF
42. NZ wine still hot brand in UK
- Author
Cox, David and Rushworth, Charlotte
- Published
- 2011
43. Global Rhythm Style Transfer Without Text Transcriptions
- Author
Qian, Kaizhi, Zhang, Yang, Chang, Shiyu, Xiong, Jinjun, Gan, Chuang, Cox, David, and Hasegawa-Johnson, Mark
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Machine Learning, Computer Science - Sound
- Abstract
Prosody plays an important role in characterizing the style of a speaker or an emotion, but most non-parallel voice or emotion style transfer algorithms do not convert any prosody information. Two major components of prosody are pitch and rhythm. Disentangling the prosody information, particularly the rhythm component, from the speech is challenging because it involves breaking the synchrony between the input speech and the disentangled speech representation. As a result, most existing prosody style transfer algorithms would need to rely on some form of text transcriptions to identify the content information, which confines their application to high-resource languages only. Recently, SpeechSplit has made sizeable progress towards unsupervised prosody style transfer, but it is unable to extract high-level global prosody style in an unsupervised manner. In this paper, we propose AutoPST, which can disentangle global prosody style from speech without relying on any text transcriptions. AutoPST is an Autoencoder-based Prosody Style Transfer framework with a thorough rhythm removal module guided by the self-expressive representation learning. Experiments on different style transfer tasks show that AutoPST can effectively convert prosody that correctly reflects the styles of the target domains.
- Published
- 2021
44. Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators
- Author
Fu, Yonggan, Zhang, Yongan, Zhang, Yang, Cox, David, and Lin, Yingyan
- Subjects
Computer Science - Machine Learning
- Abstract
While maximizing deep neural networks' (DNNs') acceleration efficiency requires a joint search/design of three different yet highly coupled aspects, including the networks, bitwidths, and accelerators, the challenges associated with such a joint search have not yet been fully understood and addressed. The key challenges include (1) the dilemma of whether to explode the memory consumption due to the huge joint space or achieve sub-optimal designs, (2) the discrete nature of the accelerator design space that is coupled yet different from that of the networks and bitwidths, and (3) the chicken and egg problem associated with network-accelerator co-search, i.e., co-search requires operation-wise hardware cost, which is lacking during search as the optimal accelerator depending on the whole network is still unknown during search. To tackle these daunting challenges towards optimal and fast development of DNN accelerators, we propose a framework dubbed Auto-NBA to enable jointly searching for the Networks, Bitwidths, and Accelerators, by efficiently localizing the optimal design within the huge joint design space for each target dataset and acceleration specification. Our Auto-NBA integrates a heterogeneous sampling strategy to achieve unbiased search with constant memory consumption, and a novel joint-search pipeline equipped with a generic differentiable accelerator search engine. Extensive experiments and ablation studies validate that both Auto-NBA generated networks and accelerators consistently outperform state-of-the-art designs (including co-search/exploration techniques, hardware-aware NAS methods, and DNN accelerators), in terms of search time, task accuracy, and accelerator efficiency. Our codes are available at: https://github.com/RICE-EIC/Auto-NBA., Comment: Accepted at ICML 2021
- Published
- 2021
45. PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition
- Author
Lai, Cheng-I Jeff, Zhang, Yang, Liu, Alexander H., Chang, Shiyu, Liao, Yi-Lun, Chuang, Yung-Sung, Qian, Kaizhi, Khurana, Sameer, Cox, David, and Glass, James
- Subjects
Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Self-supervised speech representation learning (speech SSL) has demonstrated the benefit of scale in learning rich representations for Automatic Speech Recognition (ASR) with limited paired data, such as wav2vec 2.0. We investigate the existence of sparse subnetworks in pre-trained speech SSL models that achieve even better low-resource ASR results. However, directly applying widely adopted pruning methods such as the Lottery Ticket Hypothesis (LTH) is suboptimal in the computational cost needed. Moreover, we show that the discovered subnetworks yield minimal performance gain compared to the original dense network. We present Prune-Adjust-Re-Prune (PARP), which discovers and finetunes subnetworks for much better performance, while only requiring a single downstream ASR finetuning run. PARP is inspired by our surprising observation that subnetworks pruned for pre-training tasks need merely a slight adjustment to achieve a sizeable performance boost in downstream ASR tasks. Extensive experiments on low-resource ASR verify (1) sparse subnetworks exist in mono-lingual/multi-lingual pre-trained speech SSL, and (2) the computational advantage and performance gain of PARP over baseline pruning methods. In particular, on the 10min Librispeech split without LM decoding, PARP discovers subnetworks from wav2vec 2.0 with an absolute 10.9%/12.6% WER decrease compared to the full model. We further demonstrate the effectiveness of PARP via: cross-lingual pruning without any phone recognition degradation, the discovery of a multi-lingual subnetwork for 10 spoken languages in 1 finetuning run, and its applicability to pre-trained BERT/XLNet for natural language tasks.
- Published
- 2021
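A toy rendition of the prune-adjust-re-prune loop from entry 45, on plain linear regression rather than a speech SSL model. The point it tries to illustrate is that during the adjust phase the forward pass uses the pruned weights while the update is applied densely, so pruned weights can re-enter the subnetwork at the re-prune step; the sparsity level, learning rate, and problem setup are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 32))
    w_true = rng.normal(size=32) * (rng.random(32) < 0.3)      # sparse ground-truth weights
    y = X @ w_true

    def magnitude_mask(w, sparsity):
        thresh = np.sort(np.abs(w))[int(sparsity * w.size)]
        return (np.abs(w) >= thresh).astype(float)

    w = 0.1 * rng.normal(size=32)
    mask = magnitude_mask(w, sparsity=0.7)                     # 1) prune by magnitude

    lr = 0.1
    for _ in range(300):                                       # 2) adjust: masked forward pass,
        grad = X.T @ (X @ (w * mask) - y) / len(X)             #    but the dense weight vector is updated,
        w -= lr * grad                                         #    so zeroed weights can revive

    mask = magnitude_mask(w, sparsity=0.7)                     # 3) re-prune the adjusted weights
    print(np.mean((X @ (w * mask) - y) ** 2))                  # final subnetwork's squared error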
46. NZ should flaunt logo
- Author
Cox, David
- Published
- 2010
47. Improving sea container clearance
- Author
Cox, David
- Published
- 2008
48. Airway inflammation accelerates pulmonary exacerbations in cystic fibrosis
- Author
Liou, Theodore G., Argel, Natalia, Asfour, Fadi, Brown, Perry S., Chatfield, Barbara A., Cox, David R., Daines, Cori L., Durham, Dixie, Francis, Jessica A., Glover, Barbara, Helms, My, Heynekamp, Theresa, Hoidal, John R., Jensen, Judy L., Kartsonaki, Christiana, Keogh, Ruth, Kopecky, Carol M., Lechtzin, Noah, Li, Yanping, Lysinger, Jerimiah, Molina, Osmara, Nakamura, Craig, Packer, Kristyn A., Paine, Robert, III, Poch, Katie R., Quittner, Alexandra L., Radford, Peggy, Redway, Abby J., Sagel, Scott D., Szczesniak, Rhonda D., Sprandel, Shawna, Taylor-Cousar, Jennifer L., Vroom, Jane B., Yoshikawa, Ryan, Clancy, John P., Elborn, J. Stuart, Olivier, Kenneth N., and Adler, Frederick R.
- Published
- 2024
- Full Text
- View/download PDF
49. Retreating from the Cold War
- Author
Cox, David
- Subjects
BOOK REVIEWS
- Published
- 1997
50. Influence of Televisibility and Harm Probability on Clinical-Ethical Decision Making
- Author
Cox, David J. and Javed, Asim
- Published
- 2023
- Full Text
- View/download PDF