759 results for "Ion Stoica"
Search Results
2. Fairness in Serving Large Language Models.
3. ZKML: An Optimizing System for ML Inference in Zero-Knowledge Proofs.
4. Starburst: A Cost-aware Scheduler for Hybrid Cloud.
5. Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks.
6. Can't Be Late: Optimizing Spot Instance Savings under Deadlines.
7. Cloudcast: High-Throughput, Cost-Aware Overlay Multicast in the Cloud.
8. Towards Optimal Transaction Scheduling.
9. Composing MPC With LQR and Neural Network for Amortized Efficiency and Stable Control.
10. R2E: Turning any Github Repository into a Programming Agent Environment.
11. MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving.
12. Break the Sequential Dependency of LLM Inference Using Lookahead Decoding.
13. Online Speculative Decoding.
14. Trustless Audits without Revealing Data or Models.
15. Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference.
16. LLM-Assisted Code Cleaning For Training Accurate Code Generators.
17. LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset.
18. SLoRA: Scalable Serving of Thousands of LoRA Adapters.
19. Managing Bandwidth: The Key to Cloud-Assisted Autonomous Driving.
20. JudgeBench: A Benchmark for Evaluating LLM-based Judges.
21. How to Evaluate Reward Models for RLHF.
22. Efficient LLM Scheduling by Learning to Rank.
23. Post-Training Sparse Attention with Double Sparsity.
24. MPC-Minimized Secure LLM Inference.
25. Networks of Networks: Complexity Class Principles Applied to Compound AI Systems Design.
26. GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications.
27. Optimizing Speculative Decoding for Serving Large Language Models Using Goodput.
28. Crafting Interpretable Embeddings by Asking LLMs Questions.
29. From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline.
30. OR-Bench: An Over-Refusal Benchmark for Large Language Models.
31. depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers.
32. Stylus: Automatic Adapter Selection for Diffusion Models.
33. Optimizing LLM Queries in Relational Workloads.
34. LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code.
35. MuxServe: Flexible Multiplexing for Efficient Multiple LLM Serving.
36. RAFT: Adapting Language Model to Domain Specific RAG.
37. RouteLLM: Learning to Route LLMs with Preference Data.
38. Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems.
39. Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity.
40. AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving.
41. Take Out the TraChe: Maximizing (Tra)nsactional Ca(che) Hit Rate.
42. ExoFlow: A Universal Workflow System for Exactly-Once DAGs.
43. Cilantro: Performance-Aware Resource Allocation for General Objectives via Online Feedback.
44. Leveraging Cloud Computing to Make Autonomous Vehicles Safer.
45. Efficient Memory Management for Large Language Model Serving with PagedAttention.
46. SkyPilot: An Intercloud Broker for Sky Computing.
47. Skyplane: Optimizing Transfer Cost and Throughput Using Cloud-Aware Overlays.
48. SHEPHERD: Serving DNNs in the Wild.
49. Exoshuffle: An Extensible Shuffle Architecture.
50. FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU.
Discovery Service for Jio Institute Digital Library