17 results on '"A. K. Kapoor"'
Search Results
2. CAPMIG: Coherence-Aware Block Placement and Migration in Multiretention STT-RAM Caches
- Author
- Sheel Sindhu Manohar and Hemangee K. Kapoor
- Subjects
Electrical and Electronic Engineering ,Computer Graphics and Computer-Aided Design ,Software
- Published
- 2023
3. Pop-Crypt: Identification and Management of Popular Words for Enhancing Lifetime of EnCrypted Nonvolatile Main Memories
- Author
- Arijit Nath and Hemangee K. Kapoor
- Subjects
Hardware and Architecture ,Electrical and Electronic Engineering ,Software
- Published
- 2022
4. nZESPA: A Near-3D-Memory Zero Skipping Parallel Accelerator for CNNs
- Author
- Palash Das and Hemangee K. Kapoor
- Subjects
Hybrid Memory Cube ,Exploit ,Computer science ,Dataflow ,Feature extraction ,02 engineering and technology ,Energy consumption ,Parallel computing ,Computer Graphics and Computer-Aided Design ,Convolutional neural network ,020202 computer hardware & architecture ,Parallel processing (DSP implementation) ,Encoding (memory) ,0202 electrical engineering, electronic engineering, information engineering ,Electrical and Electronic Engineering ,Software
- Abstract
Convolutional neural networks (CNNs) are one of the most popular machine learning tools for computer vision. Their ubiquitous use across applications, together with their high computation cost, has made them attractive targets for optimization through accelerator architectures. State-of-the-art designs have either exploited the parallelism of CNNs, eliminated computations through sparsity, or used near-memory processing (NMP) to accelerate CNNs. We introduce an NMP-fully sparse architecture, which combines all three capabilities. The proposed architecture is parallel and hence processes the independent CNN tasks concurrently. To exploit the sparsity, the proposed system employs a dataflow called the Near-3D-Memory Zero Skipping Parallel dataflow, or nZESPA dataflow. This dataflow maintains the compressed-sparse encoding of data and skips all ineffectual zero-valued computations of CNNs. We design a custom accelerator which employs the nZESPA dataflow. The grids of nZESPA modules are integrated into the logic layer of the hybrid memory cube. This integration saves a significant amount of off-chip communication while implementing the concept of NMP. We compare the proposed architecture with three other architectures which either do not exploit sparsity (NMP-dense), do not employ NMP (traditional-fully sparse), or do neither (traditional-dense). The proposed system outperforms the baselines in terms of performance and energy consumption while executing CNN inference.
- Published
- 2021
5. Investigating Frequency Scaling, Nonvolatile, and Hybrid Memory Technologies for On-Chip Routers to Support the Era of Dark Silicon
- Author
- Hemangee K. Kapoor and Khushboo Rani
- Subjects
Router ,Network packet ,business.industry ,Computer science ,02 engineering and technology ,Computer Graphics and Computer-Aided Design ,Power budget ,020202 computer hardware & architecture ,Network on a chip ,Dark silicon ,Hardware_INTEGRATEDCIRCUITS ,0202 electrical engineering, electronic engineering, information engineering ,System on a chip ,Static random-access memory ,Electrical and Electronic Engineering ,Frequency scaling ,business ,Software ,Computer network
- Abstract
In the era of dark silicon, several components on the chip [i.e., cores, memory, and network on chip (NoC)] need to be powered off or run in low-power mode. This is mainly due to the increased leakage power consumption at smaller technology nodes. Other than the power consumed by cores and caches, the power and performance of the interconnects are a significant factor, as the communication network consumes a considerable share of the power budget. In particular, the buffers used at every port of the NoC router consume considerable dynamic as well as static power. To support dark silicon and save energy, a popular approach is to power off the routers and wake them up when needed. However, this affects the packet latency, and the traffic through the nodes must be observed to decide when to turn the routers ON or OFF. In this article, we propose to keep the routers always powered ON to maintain constant connectivity and investigate various approaches. One proposal is to frequency scale the routers connected to powered OFF nodes, and the other proposals are to use a combination of SRAM and nonvolatile spin-transfer torque random access memory-based virtual channels (VCs) in the routers. By managing which VCs are active at a given time, we achieve energy savings. The proposals are evaluated by varying the percentage of dark nodes on the chip. The experimental results show that all proposals yield significant energy savings while maintaining connectivity.
- Published
- 2021
6. Improving the Lifetime of Non-Volatile Cache by Write Restriction
- Author
- Hemangee K. Kapoor and Sukarn Agarwal
- Subjects
Hardware_MEMORYSTRUCTURES ,Memory hierarchy ,Computer science ,Parallel computing ,Partition (database) ,Theoretical Computer Science ,Reduction (complexity) ,Set (abstract data type) ,Non-volatile memory ,Computational Theory and Mathematics ,Hardware and Architecture ,Microsoft Windows ,Cache ,Static random-access memory ,Software
- Abstract
The attractive features of Non-Volatile Memory (NVM) technologies, such as low static power and high density, make them promising candidates in the memory hierarchy, including caches. However, the limited write endurance, together with the write variations governed by the access patterns and the applied replacement policies, reduces the chance of NVMs succeeding SRAM. These write variations are of concern as they not only break down the NVM cells but also reduce the effective lifetime. This paper proposes efficient techniques to mitigate the intra-set write variation to improve the lifetime of the NVM cache. Our first two techniques partition the cache into windows of equal size and distribute the writes uniformly across the cache set by employing the window as write-restricted or read-only. The selection of the window in these techniques is by rotation or with the help of counters. In our third technique, different cache ways are employed as write-restricted over the period of execution to distribute the writes uniformly. Experimental results using full system simulation show a significant reduction in intra-set write variation along with improvement in the cache lifetime.
- Published
- 2019
7. Write Variation Aware Buffer Assignment for Improved Lifetime of Non-Volatile Buffers in On-Chip Interconnects
- Author
- Khushboo Rani and Hemangee K. Kapoor
- Subjects
Router ,Interconnection ,Random access memory ,Hardware_MEMORYSTRUCTURES ,Computer science ,business.industry ,02 engineering and technology ,Telecommunications network ,Power budget ,020202 computer hardware & architecture ,Power (physics) ,Non-volatile memory ,Hardware and Architecture ,0202 electrical engineering, electronic engineering, information engineering ,System on a chip ,Electrical and Electronic Engineering ,Resource management (computing) ,business ,Software ,Computer network
- Abstract
With multiple cores integrated on the same die, communication across cores is managed by an on-chip interconnect called the network-on-chip (NoC). The power and performance of these interconnects are a significant factor, as the communication network consumes a considerable share of the power budget. In particular, the buffers used at every port of the NoC router consume considerable dynamic as well as static power. This paper attempts to reduce static power consumption by using non-volatile memory technology-based spin-transfer torque random access memory (STT-RAM) buffers. STT-RAM technology has the advantage of high density and low leakage but suffers from weaker write endurance. This impacts the lifetime of the router as a whole. The buffers in a router are allocated to virtual networks (VNets) and in turn to virtual channels (VCs) within each VNet. To reduce uneven writes across the buffers, we propose policies to reduce intra-VNet write variation and inter-VNet write variation. The former performs write variation aware VC allocation in each VNet, and the latter does write variation aware buffer assignments to each VNet. Experimental evaluation on a full system simulator shows that the proposed policies reduce write variation to almost 0% and improve lifetime by 3.3 and 19.9 times for intra-VNet and inter-VNet, respectively. We also get significant gains in the energy delay product.
- Published
- 2019
8. Reuse-Distance-Aware Write-Intensity Prediction of Dataless Entries for Energy-Efficient Hybrid Caches
- Author
- Hemangee K. Kapoor and Sukarn Agarwal
- Subjects
010302 applied physics ,Hardware_MEMORYSTRUCTURES ,Computer science ,CPU cache ,Cache-only memory architecture ,02 engineering and technology ,Energy consumption ,Parallel computing ,01 natural sciences ,020202 computer hardware & architecture ,Non-volatile memory ,Hardware and Architecture ,0103 physical sciences ,Scalability ,0202 electrical engineering, electronic engineering, information engineering ,Static random-access memory ,Cache ,Electrical and Electronic Engineering ,Software ,Efficient energy use ,Block (data storage)
- Abstract
Emerging nonvolatile memory technologies act as a prominent choice for the larger on-chip caches on account of high density, good scalability, and low static power consumption. However, costly write operations reduce their possibility as a successor of SRAM. To mitigate this problem, a spin-transfer torque random-access memory (STT-RAM)-SRAM hybrid cache architecture is proposed. In such cache architectures, allocation of a write-intensive block is the key challenge for energy efficiency. This paper presents a data allocation policy that reduces the number of writes and energy consumption of the STT-RAM region in the last-level cache by considering the existence of private blocks. Dataless entries are allocated in STT region for such private blocks, and actual data is written only when the block is written back from L1. Heavily written blocks are subsequently migrated to SRAM region. We also present a predictor that helps to redirect the write backs from L1 of dataless entries directly to SRAM region, depending on the predicted reuse-distance-aware write intensity. Experimental evaluation shows that this technique reduces the energy consumption by 34.3% (19.6%) and 23.3% (14.1%), respectively, over two existing techniques in the case of dual (quad) core system.
- Published
- 2018
9. Analysing the Role of Last Level Caches in Controlling Chip Temperature
- Author
- Hemangee K. Kapoor and Shounak Chakraborty
- Subjects
Hardware_MEMORYSTRUCTURES ,Control and Optimization ,Temperature control ,Renewable Energy, Sustainability and the Environment ,CPU cache ,Computer science ,business.industry ,Hardware_PERFORMANCEANDRELIABILITY ,Chip ,Power (physics) ,Reduction (complexity) ,Task (computing) ,Computational Theory and Mathematics ,Hardware and Architecture ,Embedded system ,Hardware_INTEGRATEDCIRCUITS ,System on a chip ,Cache ,business ,Software
- Abstract
Dynamic Thermal Management (DTM) has become a major concern for chip designers, as it is a challenging task in recent power-dense, high-performance Chip Multi-Processors (CMPs), due to the integration of more on-chip components to meet the ever-increasing demand for processing power. The increased chip temperature causes severe circuit errors along with a significant increase in leakage power consumption. Traditional DTM techniques apply DVFS or task migration to reduce core temperature, as cores are considered the hottest on-chip components. Additionally, to meet the high data demand of these high-performance cores, large on-chip Last Level Caches (LLCs) are attached, which are the principal contributors to the on-chip leakage power consumption and occupy the largest on-chip area. As power consumption reduction plays a pivotal role in temperature reduction, this work dynamically shrinks the cache size not only to reduce leakage power consumption but also to create on-chip thermal buffers that reduce the average chip temperature by exploiting the physics of heat transfer. Cache resizing decisions are taken based upon the generated cache hotspots and/or the access patterns during process execution. Simulation results of the proposed thermal management method are compared with an existing DVFS-based method (at cores) and a prior drowsy-cache-based technique to show its effectiveness.
- Published
- 2018
10. ALAMNI: Adaptive LookAside Memory based Near-Memory Inference Engine for Eliminating Multiplications in Real-Time
- Author
- Palash Das, Shashank Sharma, and Hemangee K. Kapoor
- Subjects
Computational Theory and Mathematics ,Hardware and Architecture ,Software ,Theoretical Computer Science
- Published
- 2022
11. Dynamic Associativity Management in Tiled CMPs by Runtime Adaptation of Fellow Sets
- Author
- Shirshendu Das and Hemangee K. Kapoor
- Subjects
010302 applied physics ,Computer science ,Cycles per instruction ,02 engineering and technology ,Parallel computing ,01 natural sciences ,020202 computer hardware & architecture ,Set (abstract data type) ,Computational Theory and Mathematics ,Hardware and Architecture ,0103 physical sciences ,Signal Processing ,0202 electrical engineering, electronic engineering, information engineering ,Cache ,Adaptation (computer science) ,Associative property
- Abstract
The non-uniform distribution of memory accesses among the cache sets results in some sets being used heavily while others remain underutilized. Dynamic associativity management (DAM) is a technique that allows the heavily used sets to distribute their load among the lightly used sets, thus improving the overall utilization of the cache. CMP-SVR is a previously proposed DAM-based technique, where each set is divided into two sections: normal storage (NT) and reserve storage (RT). Some number of ways (25 to 50 percent) from each set are reserved for RT, and the remaining ways belong to NT. The sets are divided into groups called fellow-groups, and a set can use the reserve ways of its fellow sets to increase its associativity during execution. Though CMP-SVR improves performance, the formation of its fellow-groups is static: once created, a group never changes. It has been observed that some fellow-groups have more heavily used sets than other fellow-groups. As a result, the cache loads are not uniformly distributed among the fellow-groups. Also, the behavior of sets changes dynamically: a lightly used set may become heavily used after a number of execution cycles. This paper studies the behavior of each set in detail and proposes a DAM-based technique which improves performance compared to other DAM-based techniques. The proposed technique, called FS-DAM, dynamically creates fellow-groups based on the current set loads, ensuring that the heavily used sets are evenly distributed among all the fellow-groups. Such distribution increases the utilization of the cache and hence improves performance. Full system simulation shows average improvements of 6.62 and 16.74 percent for FS-DAM as compared to CMP-SVR, in terms of CPI (Cycles Per Instruction) and MPKI (Misses Per Thousand Instructions), respectively. Compared with Z-Cache, the improvements are 6.21 percent (CPI) and 14.65 percent (MPKI). The proposed policy also shows better performance than V-Way and SBC.
- Published
- 2017
12. SWEL-COFAE: Wear Leveling and Adaptive Encoding Assisted Compression of Frequent Words in Non-Volatile Main Memories
- Author
- Hemangee K. Kapoor and Arijit Nath
- Subjects
Adaptive encoding ,Computational Theory and Mathematics ,Hardware and Architecture ,Computer science ,business.industry ,Computer vision ,Artificial intelligence ,Compression (physics) ,business ,Software ,Wear leveling ,Theoretical Computer Science
- Published
- 2021
13. Formal Approach for DVS-Based Power Management for Multiple Server System in Presence of Server Failure and Repair
- Author
- Hemangee K. Kapoor and L. Chandnani
- Subjects
Power management ,Queueing theory ,Control and Systems Engineering ,Computer science ,Robustness (computer science) ,Server ,Real-time computing ,Multiprocessing ,Electrical and Electronic Engineering ,Computer Science Applications ,Information Systems ,Reliability engineering ,Power optimization
- Abstract
The paper presents a DVS-based power management policy for multiprocessor systems. The aim is to optimize power consumption by keeping the job loss probability as a system-wide constraint. Optimal values for service rate are computed using an ideal setting where speed can change continuously. As real processors have discrete speed levels, we switch between two nearby speeds to achieve the optimal rate. We develop a formal model of such a system using the probabilistic model checker PRISM and prove properties satisfied by the system. We demonstrated the applicability of the policy on multiple servers and under both kinds of deadlines: DBS and DES. For a constraint value of 25%, the DBS model achieved power savings of 29.46% in theoretical, 8.75% in actual, and 7.23% in leakage power. The DES model achieved power savings of 30% in theoretical, 11.9% in actual, and 8.7% in leakage power. For robustness, a repair facility was used which can have repairmen varying from one to the total number of servers.
- Published
- 2013
14. BCl3/Cl2-Based Inductively Coupled Plasma Etching of GaN/AlGaN Using Photoresist Mask
- Author
- Vanita R. Agarwal, Rangarajan Muralidharan, Hitendra K. Malik, Dipendra Singh Rawal, B. K. Sehgal, and A. K. Kapoor
- Subjects
Nuclear and High Energy Physics ,Materials science ,business.industry ,Wide-bandgap semiconductor ,Gallium nitride ,High-electron-mobility transistor ,Photoresist ,Condensed Matter Physics ,chemistry.chemical_compound ,chemistry ,Etching (microfabrication) ,Aluminium gallium nitride ,Optoelectronics ,Wafer ,Inductively coupled plasma ,business
- Abstract
Gallium nitride/aluminium gallium nitride (GaN/AlGaN) etching in BCl3/Cl2-based inductively coupled plasma (ICP) is investigated for high electron mobility transistor (HEMT) mesa etching using a rarely preferred mask material, photoresist. The critical issues related to photoresist burning/deforming, resist removal, selectivity, mesa edge roughening, and nonuniform etching of GaN and AlGaN layers are discussed in detail using a plasma of BCl3/Cl2 gases. The effect of ICP process parameters such as ICP power, RF power, pressure, and BCl3/Cl2 flow rate ratio on the etch rate of the GaN/AlGaN layers and the mask is studied systematically to optimize a HEMT mesa etching process that results in a smooth etched surface with sharp and highly anisotropic mesa edges. The photoresist mask selectivity is found to depend strongly on pressure and RF power, whereas the etched surface morphology changes significantly with the gas flow rate ratio and chamber pressure. The AlGaN etch rate and selectivity with respect to GaN are also characterized for different Al concentrations varying up to 33%. The etch process is finally applied to GaN/AlGaN HEMT mesa etching, where mesa features with a depth of ~1500 Å are etched successfully. The resultant process etch uniformity is found to be better than 5% over a 2-in wafer.
- Published
- 2012
15. A Process Algebraic View of Latency-Insensitive Systems
- Author
- Hemangee K. Kapoor
- Subjects
High-level verification ,Finite-state machine ,Computer science ,Process calculus ,Liveness ,Communicating sequential processes ,Deadlock ,Theoretical Computer Science ,Computational Theory and Mathematics ,Hardware and Architecture ,Formal specification ,Formal language ,Theory of computation ,Algorithm ,Formal verification ,Equivalence (measure theory) ,computer ,Software ,computer.programming_language
- Abstract
Latency-insensitive (LI) systems are those which can function correctly in spite of delays along their connecting wires. This delay is assumed to be a multiple of the clock period. The paper presents a single-clock process algebraic model for such systems. It gives the definitions for LI computational blocks and LI connectors. Important properties for these are shown to be satisfied. Composition of such modules can be done by the parallel composition operator of the process algebra. Conditions are given to check for liveness and deadlock freedom of LI systems. Comparison of latency equivalence between streams of events can be done using the model, and this leads to a method of proving that modules are latency-equivalent. The paper is a step toward high-level specification and verification of such systems. The work can be extended to address more complex interconnections by modeling the underlying finite-state machines.
- Published
- 2009
16. Tantalum nitride-p-silicon high-voltage Schottky diodes
- Author
- Ashok K. Kapoor, M.E. Thomas, M.P. Hartnett, and J.F. Ciacchella
- Subjects
Materials science ,Silicon ,Annealing (metallurgy) ,Contact resistance ,Analytical chemistry ,Schottky diode ,chemistry.chemical_element ,High voltage ,Electronic, Optical and Magnetic Materials ,chemistry.chemical_compound ,Tantalum nitride ,chemistry ,Electronic engineering ,Breakdown voltage ,Electrical and Electronic Engineering ,Diode
- Abstract
Rectifying contacts between TaN and p-silicon with very high reverse breakdown voltage (V_BR > 700 V) without using any guard ring have been realized. Barrier heights of TaN to both p-type silicon and n-type silicon have been measured at 0.68 and 0.48 eV, respectively. The breakdown voltage V_BR of TaN to p-silicon diodes, as deposited, is approximately 400 V and decreases to less than 200 V after annealing in hydrogen at 450 °C for 30 min. On the other hand, annealing in a nitrogen ambient at 450 °C for 30 min increases the V_BR of these diodes to more than 700 V. An explanation for the difference in V_BR is sought in terms of the structural/chemical changes introduced at the interface by the annealing process. The high forward drop of TaN to p-silicon diodes (>1 V at 10 mA) results from the high substrate resistance and the probe contact resistance, and it is being optimized.
- Published
- 1988
17. A low-barrier Schottky process using MoSi2
- Author
- M.B. Vora, Ashok K. Kapoor, and M.E. Thomas
- Subjects
Auger electron spectroscopy ,Materials science ,Silicon ,Schottky barrier ,Molybdenum disilicide ,Analytical chemistry ,chemistry.chemical_element ,Infrared spectroscopy ,Schottky diode ,Electronic, Optical and Magnetic Materials ,chemistry.chemical_compound ,chemistry ,Electrical and Electronic Engineering ,Absorption (electromagnetic radiation) ,Diode
- Abstract
A technology to produce low-barrier MoSi2 Schottky diodes for use in LSI bipolar circuits has been developed. Molybdenum disilicide is formed on single-crystal silicon by a self-aligned process under extremely clean conditions. Auger electron spectroscopy (AES) and infrared (IR) absorption techniques are used extensively to monitor the formation and thickness of MoSi2 films. Capacitance-voltage and current-voltage measurements at varying temperatures are employed to characterize the Schottky barrier, which has a measured potential of 0.66 eV.
- Published
- 1986
Discovery Service for Jio Institute Digital Library