109 results for "Tushar Krishna"
Search Results
2. Understanding Performance Implications of LLM Inference on CPUs.
3. Special Session: Neuro-Symbolic Architecture Meets Large Language Models: A Memory-Centric Perspective.
4. CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-training Quantization of ViTs.
5. Algorithm-Hardware Co-Design of Distribution-Aware Logarithmic-Posit Encodings for Efficient DNN Inference.
6. Towards a Standardized Representation for Deep Learning Collective Algorithms.
7. FEATHER: A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching.
8. H3DFact: Heterogeneous 3D Integrated CIM for Factorization with Holographic Perceptual Representations.
9. Towards Cognitive AI Systems: Workload and Characterization of Neuro-Symbolic AI.
10. LIBRA: Enabling Workload-Aware Multi-Dimensional Network Topology Optimization for Distributed Training of Large AI Models.
11. Leveraging Memory Expansion to Accelerate Large-Scale DL Training.
12. Accurate Low-Degree Polynomial Approximation of Non-Polynomial Operators for Fast Private Inference in Homomorphic Encryption.
13. Flexagon: A Multi-dataflow Sparse-Sparse Matrix Multiplication Accelerator for Efficient DNN Processing.
14. FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks.
15. SNATCH: Stealing Neural Network Architecture from ML Accelerator in Intelligent Sensors.
16. VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs.
17. Proteus: HLS-based NoC Generator and Simulator.
18. AIrchitect: Automating Hardware Architecture and Mapping Optimization.
19. Efficient Distributed Inference of Deep Neural Networks via Restructuring and Pruning.
20. Characterization of Data Compression in Datacenters.
21. ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale.
22. Understanding the Design-Space of Sparse/Dense Multiphase GNN dataflows on Spatial Accelerators.
23. Impact of RoCE Congestion Control Policies on Distributed Training of DNNs.
24. Themis: a network bandwidth-aware collective scheduling policy for distributed training of DL models.
25. Stay in your Lane: A NoC with Low-overhead Multi-packet Bypassing.
26. MAGMA: An Optimization Framework for Mapping Multiple DNNs on Multiple Accelerator Cores.
27. Demystifying Map Space Exploration for NPUs.
28. DiGamma: Domain-aware Genetic Algorithm for HW-Mapping Co-optimization for DNN Accelerators.
29. MicroEdge: a multi-tenant edge cluster system architecture for scalable camera processing.
30. Self-adaptive reconfigurable arrays (SARA): learning flexible GEMM accelerator configuration and mapping-space using ML.
31. XRBench: An Extended Reality (XR) Machine Learning Benchmark Suite for the Metaverse.
32. Subgraph Stationary Hardware-Software Inference Co-Design.
33. Understanding Data Compression in Warehouse-Scale Datacenter Services.
34. Technology-aware Router Architectures for On-Chip-Networks in Heterogeneous Technologies.
35. Extending Sparse Tensor Accelerators to Support Multiple Compression Formats.
36. DUB: Dynamic Underclocking and Bypassing in NoCs for Heterogeneous GPU Workloads.
37. A novel network fabric for efficient spatio-temporal reduction in flexible DNN accelerators.
38. Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms.
39. Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators.
40. Pitstop: Enabling a Virtual Network Free Network-on-Chip.
41. Heterogeneous Dataflow Accelerators for Multi-DNN Workloads.
42. STONNE: Enabling Cycle-Level Microarchitectural Simulation for DNN Inference Accelerators.
43. Dataflow-Architecture Co-Design for 2.5D DNN Accelerators using Wireless Network-on-Package.
44. Bridging the Frequency Gap in Heterogeneous 3D SoCs through Technology-Specific NoC Router Architectures.
45. Architecture, Dataflow and Physical Design Implications of 3D-ICs for DNN-Accelerators.
46. E3: A HW/SW Co-design Neuroevolution Platform for Autonomous Learning in Edge Device.
47. RASA: Efficient Register-Aware Systolic Array Matrix Engine for CPU.
48. Scalable Distributed Training of Recommendation Models: An ASTRA-SIM + NS3 case-study with TCP/IP transport.
49. Breaking Barriers: Maximizing Array Utilization for Compute in-Memory Fabrics.
50. Statistical Array Allocation and Partitioning for Compute In-Memory Fabrics.
Discovery Service for Jio Institute Digital Library