781 results for "Ion Stoica"
Search Results
2. Locality-aware Fair Scheduling in LLM Serving.
3. Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards.
4. The Streaming Batch Model for Efficient and Fault-Tolerant Heterogeneous Execution.
5. FogROS2-FT: Fault Tolerant Cloud Robotics.
6. Revisiting Cache Freshness for Emerging Real-Time Applications.
7. Fairness in Serving Large Language Models.
8. ZKML: An Optimizing System for ML Inference in Zero-Knowledge Proofs.
9. Starburst: A Cost-aware Scheduler for Hybrid Cloud.
10. Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks.
11. Can't Be Late: Optimizing Spot Instance Savings under Deadlines.
12. Cloudcast: High-Throughput, Cost-Aware Overlay Multicast in the Cloud.
13. Towards Optimal Transaction Scheduling.
14. Composing MPC With LQR and Neural Network for Amortized Efficiency and Stable Control.
15. Are More LLM Calls All You Need? Towards the Scaling Properties of Compound AI Systems.
16. Crafting Interpretable Embeddings for Language Neuroscience by Asking LLMs Questions.
17. SGLang: Efficient Execution of Structured Language Model Programs.
18. Efficient LLM Scheduling by Learning to Rank.
19. Stylus: Automatic Adapter Selection for Diffusion Models.
20. R2E: Turning any Github Repository into a Programming Agent Environment.
21. MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving.
22. Break the Sequential Dependency of LLM Inference Using Lookahead Decoding.
23. Online Speculative Decoding.
24. Trustless Audits without Revealing Data or Models.
25. Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference.
26. LLM-Assisted Code Cleaning For Training Accurate Code Generators.
27. LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset.
28. SLoRA: Scalable Serving of Thousands of LoRA Adapters.
29. AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving.
30. Take Out the TraChe: Maximizing (Tra)nsactional Ca(che) Hit Rate.
31. ExoFlow: A Universal Workflow System for Exactly-Once DAGs.
32. Cilantro: Performance-Aware Resource Allocation for General Objectives via Online Feedback.
33. Leveraging Cloud Computing to Make Autonomous Vehicles Safer.
34. Efficient Memory Management for Large Language Model Serving with PagedAttention.
35. SkyPilot: An Intercloud Broker for Sky Computing.
36. Skyplane: Optimizing Transfer Cost and Throughput Using Cloud-Aware Overlays.
37. SHEPHERD: Serving DNNs in the Wild.
38. Exoshuffle: An Extensible Shuffle Architecture.
39. FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU.
40. CLUTR: Curriculum Learning via Unsupervised Task Representation Learning.
41. FogROS2: An Adaptive Platform for Cloud and Fog Robotics Using ROS 2.
42. Pie: Pooling CPU Memory for LLM Inference.
43. Specifications: The missing link to making the development of LLM systems an engineering discipline.
44. A Statistical Framework for Ranking LLM-Based Chatbots.
45. VisionArena: 230K Real World User-VLM Conversations with Preference Labels.
46. GameArena: Evaluating LLM Reasoning through Live Computer Games.
47. BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching.
48. MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs.
49. HashAttention: Semantic Sparsity for Faster Inference.
50. NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference.
Discovery Service for Jio Institute Digital Library