Author: "Grubisic, Dejan" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Grubisic, Dejan"' showing total 9 results

1. Meta Large Language Model Compiler: Foundation Models of Compiler Optimization

Author: Cummins, Chris, Seeker, Volker, Grubisic, Dejan, Roziere, Baptiste, Gehring, Jonas, Synnaeve, Gabriel, and Leather, Hugh
Subjects: Computer Science - Programming Languages, Computer Science - Artificial Intelligence
Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across a variety of software engineering and coding tasks. However, their application in the domain of code and compiler optimization remains underexplored. Training LLMs is resource-intensive, requiring substantial GPU hours and extensive data collection, which can be prohibitive. To address this gap, we introduce Meta Large Language Model Compiler (LLM Compiler), a suite of robust, openly available, pre-trained models specifically designed for code optimization tasks. Built on the foundation of Code Llama, LLM Compiler enhances the understanding of compiler intermediate representations (IRs), assembly language, and optimization techniques. The model has been trained on a vast corpus of 546 billion tokens of LLVM-IR and assembly code and has undergone instruction fine-tuning to interpret compiler behavior. LLM Compiler is released under a bespoke commercial license to allow wide reuse and is available in two sizes: 7 billion and 13 billion parameters. We also present fine-tuned versions of the model, demonstrating its enhanced capabilities in optimizing code size and disassembling from x86_64 and ARM assembly back into LLVM-IR. These achieve 77% of the optimising potential of an autotuning search, and 45% disassembly round trip (14% exact match). This release aims to provide a scalable, cost-effective foundation for further research and development in compiler optimization by both academic researchers and industry practitioners.
Published: 2024

2. Compiler generated feedback for Large Language Models

Author: Grubisic, Dejan, Cummins, Chris, Seeker, Volker, and Leather, Hugh
Subjects: Computer Science - Programming Languages, Computer Science - Machine Learning
Abstract: We introduce a novel paradigm in compiler optimization powered by Large Language Models with compiler feedback to optimize the code size of LLVM assembly. The model takes unoptimized LLVM IR as input and produces optimized IR, the best optimization passes, and instruction counts of both unoptimized and optimized IRs. Then we compile the input with the generated optimization passes and evaluate whether the predicted instruction count is correct and whether the generated IR is compilable and corresponds to the compiled code. We return this feedback to the LLM and give it another chance to optimize the code. This approach adds an extra 0.53% improvement over -Oz to the original model. Even though adding more information with feedback seems intuitive, simple sampling techniques achieve much higher performance given 10 or more samples.
Published: 2024

3. Priority Sampling of Large Language Models for Compilers

Author: Grubisic, Dejan, Cummins, Chris, Seeker, Volker, and Leather, Hugh
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Performance
Abstract: Large language models show great potential in generating and optimizing code. Widely used sampling methods such as Nucleus Sampling increase the diversity of generation but often produce repeated samples at low temperatures and incoherent samples at high temperatures. Furthermore, the temperature coefficient has to be tuned for each task, limiting its usability. We present Priority Sampling, a simple and deterministic sampling technique that produces unique samples ordered by the model's confidence. Each new sample expands the unexpanded token with the highest probability in the augmented search tree. Additionally, Priority Sampling supports generation based on a regular expression, which provides a controllable and structured exploration process. Priority Sampling outperforms Nucleus Sampling for any number of samples, boosting the performance of the original model from 2.87% to 5% improvement over -Oz. Moreover, it outperforms the autotuner used for the generation of labels for the training of the original model in just 30 samples.
Published: 2024
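
Illustrative note on record 3: the expansion rule described in the abstract (each new sample expands the unexpanded token with the highest probability in the search tree) can be sketched roughly as below. This is a minimal sketch and not the authors' implementation. The names `priority_sampling`, `toy_next_probs`, and `EOS` are hypothetical, the toy distribution stands in for a real language model, and ranking branches by the token's conditional probability (rather than by the probability of the whole prefix) is an assumption.

```python
import heapq
from typing import Callable, Dict, List, Tuple

EOS = "<eos>"  # hypothetical end-of-sequence token

def priority_sampling(
    next_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    num_samples: int,
    max_len: int = 16,
) -> List[List[str]]:
    """Return up to num_samples unique sequences, deterministically."""
    # Heap of unexpanded branches: (-token probability, prefix ending in that token).
    # The empty prefix seeds the first, purely greedy sample.
    heap: List[Tuple[float, Tuple[str, ...]]] = [(-1.0, ())]
    samples: List[List[str]] = []
    while heap and len(samples) < num_samples:
        _, prefix = heapq.heappop(heap)  # most probable unexpanded token
        seq = list(prefix)
        # Continue greedily, remembering every alternative token on the way.
        while (not seq or seq[-1] != EOS) and len(seq) < max_len:
            probs = next_probs(tuple(seq))
            ranked = sorted(probs.items(), key=lambda kv: (-kv[1], kv[0]))
            best_token = ranked[0][0]
            for token, p in ranked[1:]:
                heapq.heappush(heap, (-p, tuple(seq) + (token,)))
            seq.append(best_token)
        samples.append(seq)
    return samples

def toy_next_probs(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical stand-in for a model's next-token distribution."""
    if len(prefix) == 0:
        return {"a": 0.6, "b": 0.3, EOS: 0.1}
    if len(prefix) == 1:
        return {"a": 0.5, EOS: 0.3, "b": 0.2}
    return {EOS: 1.0}

if __name__ == "__main__":
    for sample in priority_sampling(toy_next_probs, num_samples=5):
        print(" ".join(sample))
```

Because every queued branch diverges from all previously generated paths, no sequence is produced twice, and no temperature parameter is involved. The regular-expression-guided generation mentioned in the abstract is not covered by this sketch.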

4. Large Language Models for Compiler Optimization

Author: Cummins, Chris, Seeker, Volker, Grubisic, Dejan, Elhoushi, Mostafa, Liang, Youwei, Roziere, Baptiste, Gehring, Jonas, Gloeckle, Fabian, Hazelwood, Kim, Synnaeve, Gabriel, and Leather, Hugh
Subjects: Computer Science - Programming Languages, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: We explore the novel application of Large Language Models to code optimization. We present a 7B-parameter transformer model trained from scratch to optimize LLVM assembly for code size. The model takes as input unoptimized assembly and outputs a list of compiler options to best optimize the program. Crucially, during training, we ask the model to predict the instruction counts before and after optimization, and the optimized code itself. These auxiliary learning tasks significantly improve the optimization performance of the model and improve the model's depth of understanding. We evaluate on a large suite of test programs. Our approach achieves a 3.0% improvement in reducing instruction counts over the compiler, outperforming two state-of-the-art baselines that require thousands of compilations. Furthermore, the model shows surprisingly strong code reasoning abilities, generating compilable code 91% of the time and perfectly emulating the output of the compiler 70% of the time.
Published: 2023

5. LoopTune: Optimizing Tensor Computations with Reinforcement Learning

Author: Grubisic, Dejan, Wasti, Bram, Cummins, Chris, Mellor-Crummey, John, and Zlateski, Aleksandar
Subjects: Computer Science - Machine Learning, Computer Science - Programming Languages
Abstract: Advanced compiler technology is crucial for enabling machine learning applications to run on novel hardware, but traditional compilers fail to deliver performance, popular auto-tuners have long search times, and expert-optimized libraries introduce unsustainable costs. To address this, we developed LoopTune, a deep reinforcement learning compiler that optimizes tensor computations in deep learning models for the CPU. LoopTune optimizes tensor traversal order while using the ultra-fast lightweight code generator LoopNest to perform hardware-specific optimizations. With a novel graph-based representation and action space, LoopTune speeds up LoopNest by 3.2x, generating an order of magnitude faster code than TVM, 2.8x faster than MetaSchedule, and 1.08x faster than AutoTVM, consistently performing at the level of the hand-tuned library NumPy. Moreover, LoopTune tunes code on the order of seconds.
Published: 2023

6. Measurement and Analysis of GPU-accelerated Applications with HPCToolkit

Author: Zhou, Keren, Adhianto, Laksono, Anderson, Jonathon, Cherian, Aaron, Grubisic, Dejan, Krentel, Mark, Liu, Yumeng, Meng, Xiaozhu, and Mellor-Crummey, John
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: To address the challenge of performance analysis on the US DOE's forthcoming exascale supercomputers, Rice University has been extending its HPCToolkit performance tools to support measurement and analysis of GPU-accelerated applications. To help developers understand the performance of accelerated applications as a whole, HPCToolkit's measurement and analysis tools attribute metrics to calling contexts that span both CPUs and GPUs. To measure GPU-accelerated applications efficiently, HPCToolkit employs a novel wait-free data structure to coordinate monitoring and attribution of GPU performance. To help developers understand the performance of complex GPU code generated from high-level programming models, HPCToolkit constructs sophisticated approximations of call path profiles for GPU computations. To support fine-grained analysis and tuning, HPCToolkit uses PC sampling and instrumentation to measure and attribute GPU performance metrics to source lines, loops, and inlined code. To supplement fine-grained measurements, HPCToolkit can measure GPU kernel executions using hardware performance counters. To provide a view of how an execution evolves over time, HPCToolkit can collect, analyze, and visualize call path traces within and across nodes. Finally, on NVIDIA GPUs, HPCToolkit can derive and attribute a collection of useful performance metrics based on measurements using GPU PC samples. We illustrate HPCToolkit's new capabilities for analyzing GPU-accelerated applications with several codes developed as part of the Exascale Computing Project.
Published: 2021

7. An Automated Tool for Analysis and Tuning of GPU-Accelerated Code in HPC Applications

Author: Zhou, Keren, Meng, Xiaozhu, Sai, Ryuichi, Grubisic, Dejan, and Mellor-Crummey, John
Published: 2022

8. Measurement and analysis of GPU-accelerated applications with HPCToolkit

Author: Zhou, Keren, Adhianto, Laksono, Anderson, Jonathon, Cherian, Aaron, Grubisic, Dejan, Krentel, Mark, Liu, Yumeng, Meng, Xiaozhu, and Mellor-Crummey, John
Published: 2021

9. Measurement and Analysis of GPU-Accelerated OpenCL Computations on Intel GPUs

Author: Cherian, Aaron Thomas, Zhou, Keren, Grubisic, Dejan, Meng, Xiaozhu, and Mellor-Crummey, John
Published: 2021
