Your search for "Jia, Zhihao" returned 35 results.

Search Constraints

You searched for: Author: "Jia, Zhihao"; Publication Type: Reports

Search Results

1. Communication Bounds for the Distributed Experts Problem

2. A System for Microserving of LLMs

3. SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference

4. MagicPIG: LSH Sampling for Efficient LLM Generation

5. TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention

6. Atlas: Hierarchical Partitioning for Quantum Circuit Simulation on GPUs (Extended Version)

7. GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

8. SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices

9. Helix: Distributed Serving of Large Language Models via Max-Flow on Heterogeneous GPUs

10. Mirage: A Multi-Level Superoptimizer for Tensor Programs

11. Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances

12. FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning

13. Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding

14. Accelerating Retrieval-Augmented Language Model Serving with Speculation

15. Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

16. Drone-NeRF: Efficient NeRF Based 3D Scene Reconstruction for Large-Scale Drone Survey

17. Quarl: A Learning-Based Quantum Circuit Optimizer

18. SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification

19. Quark: A Gradient-Free Quantum Learning Framework for Classification Tasks

20. OLLIE: Derivation-based Tensor Program Optimizer

21. BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs

22. Optimizing Mixture of Experts using Dynamic Recompilations

23. Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs

24. Quartz: Superoptimization of Quantum Circuits (Extended Version)

25. TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs

26. Quanto: Optimizing Quantum Circuits with Automatic Generation of Circuit Identities

27. Collage: Seamless Integration of Deep Learning Backends with Automatic Placement

28. TOD: GPU-accelerated Outlier Detection via Tensor Operations

29. GradSign: Model Performance Inference with Theoretical Insights

30. Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads

31. Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models

32. IOS: Inter-Operator Scheduler for CNN Acceleration

33. Redundancy-Free Computation Graphs for Graph Neural Networks

34. Beyond Data and Model Parallelism for Deep Neural Networks

35. Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks
