Author: "Todd Austin" / Database: OpenAIRE - Searchworks@Jio Institute Digital Library Search Results

1. These Aren’t The Caches You’re Looking For: Stochastic Channels on Randomized Caches

Author: Tarunesh Verma, Achilleas Anastasopoulos, and Todd Austin
Published: 2022
Full Text: View/download PDF

2. Sequestered Encryption: A Hardware Technique for Comprehensive Data Privacy

Author: Lauren Biernacki, Meron Zerihun Demissie, Kidus Birkayehu Workneh, Fitsum Assamnew Andargie, and Todd Austin
Published: 2022
Full Text: View/download PDF

3. Software-driven Security Attacks: From Vulnerability Sources to Durable Hardware Defenses

Author: Lauren Biernacki, Shijia Wei, Baris Kasikci, Mark Gallagher, Zhixing Xu, Sharad Malik, Mohit Tiwari, Misiker Tadesse Aga, Austin Harris, and Todd Austin
Subjects: 010302 applied physics, business.industry, Computer science, Vulnerability, 020207 software engineering, 02 engineering and technology, Space (commercial competition), 01 natural sciences, Software, Work (electrical), Hardware and Architecture, Taxonomy (general), 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Electrical and Electronic Engineering, business, Computer hardware
Abstract: There is an increasing body of work in the area of hardware defenses for software-driven security attacks. A significant challenge in developing these defenses is that the space of security vulnerabilities and exploits is large and not fully understood. This results in specific point defenses that aim to patch particular vulnerabilities. While these defenses are valuable, they are often blindsided by fresh attacks that exploit new vulnerabilities. This article aims to address this issue by suggesting ways to make future defenses more durable based on an organization of security vulnerabilities as they arise throughout the program life cycle. We classify these vulnerability sources through programming, compilation, and hardware realization, and we show how each source introduces unintended states and transitions into the implementation. Further, we show how security exploits gain control by moving the implementation to an unintended state using knowledge of these sources and how defenses work to prevent these transitions. This framework of analyzing vulnerability sources, exploits, and defenses provides insights into developing durable defenses that could defend against broader categories of exploits. We present illustrative case studies of four important attack genealogies, showing how they fit into the presented framework and how the sophistication of the exploits and defenses has evolved over time, providing us insights for the future.
Published: 2021
Full Text: View/download PDF

4. PriMax

Author: Nicholas Wendt, Todd Austin, and Valeria Bertacco
Published: 2022
Full Text: View/download PDF

5. Author response for 'An independent analysis of bias sources and variability in wind plant pre-construction energy yield estimation methods'

Author: Todd, Austin C., Optis, Mike, Bodini, Nicola, Fields, Michael Jason, Perr-Sauer, Jordan, Lee, Joseph C. Y., Simley, Eric, and Hammond, Robert
Published: 2022
Full Text: View/download PDF

6. Twine: A Chisel Extension for Component-Level Heterogeneous Design

Author: Shibo Chen, Yonathan Fisseha, Jean-Baptiste Jeannin, and Todd Austin
Published: 2022
Full Text: View/download PDF

7. Morpheus II: A RISC-V Security Extension for Protecting Vulnerable Software and Hardware

Author: Austin Harris, Tarunesh Verma, Shijia Wei, Lauren Biernacki, Alex Kisil, Misiker Tadesse Aga, Valeria Bertacco, Baris Kasikci, Mohit Tiwari, and Todd Austin
Published: 2021
Full Text: View/download PDF

8. Chopin: Composing Cost-Effective Custom Chips with Algorithmic Chiplets

Author: Pete Ehrett, Todd Austin, and Valeria Bertacco
Published: 2021
Full Text: View/download PDF

9. VIP-Bench: A Benchmark Suite for Evaluating Privacy-Enhanced Computation Frameworks

Author: Kidus Birkayehu Workneh, Meron Zerihun Demissie, Todd Austin, Lauren Biernacki, Fitsum Assamnew Andargie, Brandon Reagen, Plato Gebremedhin, and Galane Basha Namomsa
Subjects: Information privacy, Information sensitivity, business.industry, Computer science, Distributed computing, Computation, Suite, Benchmark (computing), Cryptography, Encryption, business, Software architecture
Abstract: Privacy-enhanced computation enables the processing of encrypted data without exposing underlying sensitive information. Such technologies are extremely promising for the advancement of data privacy, as they remove plaintexts from the attackers’ reach. However, each privacy technology provides varying degrees of computational capabilities and performance overheads, creating challenges for adoption. For example, some publicly available homomorphic encryption schemes are limited in expressiveness or cannot support deep computation without incurring significant overheads. This diversity warrants a benchmark suite that can adequately assess capability and performance while supporting a variety of privacy-enhanced software architectures. We propose VIP-Bench, a benchmark suite designed with varying operational complexity and computational depth to evaluate competing privacy frameworks fairly and directly. VIP-Bench defines a forward-looking privacy-enhanced computation model and then develops under that model an array of privacy-focused benchmarks. The benchmark set is designed to flexibly cover the whole range of expected computational power and capability, enabling VIP-Bench to evaluate the privacy-enhanced computation capabilities of both today and tomorrow.
Published: 2021
Full Text: View/download PDF

10. Morpheus II: A RISC-V Security Extension for Protecting Vulnerable Software and Hardware

Author: Austin Harris, Misiker Tadesse Aga, Valeria Bertacco, Baris Kasikci, Alex Kisil, Tarunesh Verma, Shijia Wei, Todd Austin, and Mohit Tiwari
Subjects: Software, Computer science, business.industry, RISC-V, Operating system, Code injection, Resource (Windows), Extension (predicate logic), business, computer.software_genre, computer
Abstract: Software protects data; all software is (eventually) hackable; finding/fixing vulnerabilities doesn’t scale (e.g., the Malicious 7: buffer errors, code injection, numeric errors, permissions, resource management).
Published: 2021
Full Text: View/download PDF

11. ChipAdvisor: A Machine Learning Approach for Mapping Applications to Heterogeneous Systems

Author: Valeria Bertacco, Todd Austin, Hiwot Tadese Kassa, and Tarunesh Verma
Subjects: Data access, Computer engineering, Computer science, Computation, Leverage (statistics), Field-programmable gate array, Oracle, Microarchitecture, Efficient energy use, Reusability
Abstract: While hardware accelerators provide significant performance and energy improvements over general-purpose processors, their limited reusability incurs high design costs. It is thus impractical to have a unique accelerator for each application. Hence, it is critical to develop solutions that can leverage the accelerators available to the best of their capabilities for a wide range of applications. In this paper, we note the common computation, data access, and communication patterns of applications, and based on these patterns, we identify significant intrinsic properties across applications. We then correlate these properties with the unique microarchitectural properties of the compute platforms available and develop a framework, ChipAdvisor, to predict the platform that provides the best performance and energy efficiency for an application. We evaluate ChipAdvisor for applications from several domains, targeting CPUs, GPUs, and FPGAs as example compute platforms. ChipAdvisor achieves an accuracy of up to 98% in predicting the best performing platform, and 94% in predicting the most energy-efficient one, compared to an oracle analysis, that is, one which always selects the best platform for all applications.
Published: 2021
Full Text: View/download PDF

12. A Defense-Inspired Benchmark Suite

Author: Valeria Bertacco, Adrian Berding, Pranav Srinivasan, Pete Ehrett, John Paul Koenig, Todd Austin, Bing Schaefer, and Nathan Block
Subjects: 010302 applied physics, Signal processing, Process (engineering), Computer science, business.industry, Suite, Image processing, 02 engineering and technology, 01 natural sciences, 020202 computer hardware & architecture, Software, Computer architecture, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Benchmark (computing), Systems design, Enhanced Data Rates for GSM Evolution, business
Abstract: This work previews the MilSpec suite, a diverse collection of benchmarks targeting heterogeneous embedded/edge systems. These end-to-end and kernel-level benchmarks are inspired by defense-related applications and applicable to the broader system design space. They exercise a wide range of computational capabilities, such as signal processing, image processing/computer vision, and matrix-based computation, which are particularly relevant for modern embedded applications that must process large amounts of data at the edge.
Published: 2021
Full Text: View/download PDF

13. Thwarting Control Plane Attacks with Displaced and Dilated Address Spaces

Author: Valeria Bertacco, Mark Gallagher, Lauren Biernacki, and Todd Austin
Subjects: 010302 applied physics, Exploit, Computer science, Address space, business.industry, 02 engineering and technology, Computer security, computer.software_genre, 01 natural sciences, 020202 computer hardware & architecture, Software, Pointer (computer programming), Gadget, 0103 physical sciences, Obfuscation, 0202 electrical engineering, electronic engineering, information engineering, Code (cryptography), Entropy (information theory), business, computer
Abstract: To maintain the control-flow integrity of today’s machines, code pointers must be protected. Exploits forge and manipulate code pointers to execute arbitrary, malicious code on a host machine. A corrupted code pointer can effectively redirect program execution to attacker-injected code or existing code gadgets, giving attackers the necessary foothold to circumvent system protections. To combat this class of exploits, we employ a Displaced and Dilated Address Space (DDAS), which uses a novel address space inflation mechanism to obfuscate code pointers, code locations, and the relative distance between code objects. By leveraging runtime re-randomization and custom hardware, we are able to achieve a high-entropy control-flow defense with performance overheads well below 5% and similarly low power and silicon area overheads. With DDAS in force, attackers come up against 63 bits of entropy when forging absolute addresses and 18 to 55 bits of entropy for relative addresses, depending on the distance to the desired code gadget. Moreover, an incorrectly forged code address will result in a security exception with a probability greater than 99.996%. Using hardware-based address obfuscation, we provide significantly higher entropy at lower performance overheads than previous software techniques, and our re-randomization mechanism offers additional protections against possible pointer disclosures.
Published: 2020
Full Text: View/download PDF

14. Slide-free MUSE Microscopy to H&E Histology Modality Conversion via Unpaired Image-to-Image Translation GAN Models

Author: Abraham, Tanishq, Shaw, Andrew, O'Connor, Daniel, Todd, Austin, and Levenson, Richard
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: MUSE is a novel slide-free imaging technique for histological examination of tissues that can serve as an alternative to traditional histology. In order to bridge the gap between MUSE and traditional histology, we aim to convert MUSE images to resemble authentic hematoxylin- and eosin-stained (H&E) images. We evaluated four models: a non-machine-learning-based color-mapping unmixing-based tool, CycleGAN, DualGAN, and GANILLA. CycleGAN and GANILLA provided visually compelling results that appropriately transferred H&E style and preserved MUSE content. Based on training an automated critic on real and generated H&E images, we determined that CycleGAN demonstrated the best performance. We have also found that MUSE color inversion may be a necessary step for accurate modality conversion to H&E. We believe that our MUSE-to-H&E model can help improve adoption of novel slide-free methods by bridging a perceptual gap between MUSE imaging and traditional histology. Comment: 4 pages plus 1 page references. Presented at the ICML Computational Biology Workshop 2020.
Published: 2020

15. Exploiting the analog properties of digital circuits for malicious hardware

Author: Matthew Hicks, Qing Dong, Kaiyuan Yang, Dennis Sylvester, and Todd Austin
Subjects: Digital electronics, General Computer Science, Analogue electronics, Computer science, business.industry, Transistor, Hardware_PERFORMANCEANDRELIABILITY, 02 engineering and technology, Integrated circuit design, 020202 computer hardware & architecture, law.invention, Capacitor, law, Embedded system, Hardware_INTEGRATEDCIRCUITS, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, business, Computer hardware
Abstract: While the move to smaller transistors has been a boon for performance, it has dramatically increased the cost to fabricate chips using those smaller transistors. This forces the vast majority of chip design companies to trust a third party, often overseas, to fabricate their design. To guard against shipping chips with errors (intentional or otherwise), chip design companies rely on post-fabrication testing. Unfortunately, this type of testing leaves the door open to malicious modifications since attackers can craft attack triggers requiring a sequence of unlikely events, which will never be encountered by even the most diligent tester. In this paper, we show how a fabrication-time attacker can leverage analog circuits to create a hardware attack that is small (i.e., requires as little as one gate) and stealthy (i.e., requires an unlikely trigger sequence before affecting a chip's functionality). In the open spaces of an already placed and routed design, we construct a circuit that uses capacitors to siphon charge from nearby wires as they transition between digital values. When the capacitors are fully charged, they deploy an attack that forces a victim flip-flop to a desired value. We weaponize this attack into a remotely controllable privilege escalation by attaching the capacitor to a controllable wire and by selecting a victim flip-flop that holds the privilege bit for our processor. We implement this attack in an OR1200 processor and fabricate a chip. Experimental results show that the proposed attack works. It eludes activation by a diverse set of benchmarks and evades known defenses.
Published: 2017
Full Text: View/download PDF

16. Slide-free MUSE Microscopy to H&E Histology Modality Conversion via Unpaired Image-to-Image Translation GAN Models

Author: Abraham, Tanishq, Shaw, Andrew, O'Connor, Daniel, Todd, Austin, and Levenson, Richard
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Image and Video Processing (eess.IV), FOS: Electrical engineering, electronic engineering, information engineering, Machine Learning (cs.LG)
Abstract: MUSE is a novel slide-free imaging technique for histological examination of tissues that can serve as an alternative to traditional histology. In order to bridge the gap between MUSE and traditional histology, we aim to convert MUSE images to resemble authentic hematoxylin- and eosin-stained (H&E) images. We evaluated four models: a non-machine-learning-based color-mapping unmixing-based tool, CycleGAN, DualGAN, and GANILLA. CycleGAN and GANILLA provided visually compelling results that appropriately transferred H&E style and preserved MUSE content. Based on training an automated critic on real and generated H&E images, we determined that CycleGAN demonstrated the best performance. We have also found that MUSE color inversion may be a necessary step for accurate modality conversion to H&E. We believe that our MUSE-to-H&E model can help improve adoption of novel slide-free methods by bridging a perceptual gap between MUSE imaging and traditional histology. Comment: 4 pages plus 1 page references. Presented at the ICML Computational Biology Workshop 2020.
Published: 2020
Full Text: View/download PDF

17. Cyclone

Author: Pranav Kumar, Mohit Tiwari, Todd Austin, Shijia Wei, Austin Harris, and Prateek Sahu
Subjects: 010302 applied physics, business.industry, Address space, Computer science, Bandwidth (signal processing), Speculative execution, 02 engineering and technology, Interference (wave propagation), 01 natural sciences, 020202 computer hardware & architecture, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, State (computer science), Isolation (database systems), Cache, business, Computer network, Communication channel
Abstract: Micro-architecture units like caches are notorious for leaking secrets across security domains. An attacker program can contend for on-chip state or bandwidth and can even use speculative execution in processors to drive this contention; and protecting against all contention-driven attacks is exceptionally challenging. Prior works can mitigate contention channels through caches by partitioning the larger, lower-level caches or by looking for anomalous performance or contention behavior. Neither scales to a large number of fine-grained domains as required by browsers and web-services that place many domains within the same address space. We observe that cache contention channels have a unique property: contention leaks information only when it is cyclic, i.e., domain A interferes with domain B, followed by interference from B to A. We propose to use this cyclic interference property to detect micro-architectural attacks as anomalous cyclic interference. Unlike partitioning, our detection approach scales to many concurrent domains in a single address space; and unlike prior anomaly detectors, cyclic interference is robust to noise from benign interference. We track cyclic interference using non-intrusive detectors in an out-of-order core and stress test our prototype, Cyclone, with fine-grained isolation in browsers (against speculation-driven attacks) and coarse-grained isolation of cores (against covert-channels embedded in database and machine learning workloads). Full-system simulations on an ARM micro-architecture show close to perfect detection rates and 260-1000× lower false positives than using (state-of-the-art) contention alone, with slowdowns of only ~3.6%.
Published: 2019
Full Text: View/download PDF

18. When two worldviews meet: promoting mutual understanding between ‘secular’ and religious students of Islamic studies in Russia and the United States

Author: Alexander Knysh, Anna Matochkina, Daria Ulanova, Philomena Meechan, and Todd Austin
Published: 2019
Full Text: View/download PDF

19. Morpheus

Author: Salessawi Ferede Yitbarek, Baris Kasikci, Mohit Tiwari, Misiker Tadesse Aga, Austin Harris, Valeria Bertacco, Zhixing Xu, Todd Austin, Sharad Malik, Zelalem Birhanu Aweke, Mark Gallagher, Shibo Chen, and Lauren Biernacki
Subjects: Hardware architecture, business.industry, Computer science, Offensive, 02 engineering and technology, Encryption, Computer security, computer.software_genre, Security testing, 020202 computer hardware & architecture, 020204 information systems, Pointer (computer programming), 0202 electrical engineering, electronic engineering, information engineering, Systems design, Moving target defense, Architecture, business, computer
Abstract: Attacks often succeed by abusing the gap between program and machine-level semantics, for example, by locating a sensitive pointer, exploiting a bug to overwrite this sensitive data, and hijacking the victim program's execution. In this work, we take secure system design on the offensive by continuously obfuscating information that attackers need but normal programs do not use, such as representation of code and pointers or the exact location of code and data. Our secure hardware architecture, Morpheus, combines two powerful protections: ensembles of moving target defenses and churn. Ensembles of moving target defenses randomize key program values (e.g., relocating pointers and encrypting code and pointers), which forces attackers to extensively probe the system prior to an attack. To ensure attack probes fail, the architecture incorporates churn to transparently re-randomize program values underneath the running system. With frequent churn, systems quickly become impractically difficult to penetrate. We demonstrate Morpheus through a RISC-V-based prototype designed to stop control-flow attacks. Each moving target defense in Morpheus uses hardware support to individually offer more randomness at a lower cost than previous techniques. When ensembled with churn, Morpheus defenses offer strong protection against control-flow attacks, with our security testing and performance studies revealing: i) high-coverage protection for a broad array of control-flow attacks, including protections for advanced attacks and an attack disclosed after the design of Morpheus, and ii) negligible performance impacts (1%) with churn periods up to 50 ms, which our study estimates to be at least 5000x faster than the time necessary to possibly penetrate Morpheus.
Published: 2019
Full Text: View/download PDF

20. SiPterposer: A Fault-Tolerant Substrate for Flexible System-in-Package Design

Author: Valeria Bertacco, Todd Austin, and Pete Ehrett
Subjects: 010302 applied physics, Flexibility (engineering), business.industry, Computer science, Overhead (engineering), Fault tolerance, 02 engineering and technology, 01 natural sciences, 020202 computer hardware & architecture, Power (physics), System in package, Embedded system, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, business
Abstract: As Moore’s Law scaling slows down, specialized heterogeneous designs are needed to sustain computing performance improvements. Unfortunately, the non-recurring engineering (NRE) costs of chip design—designing interconnects, creating masks, etc.—are often prohibitive. Chiplet-based disintegrated design solutions could address these economic issues, but current technologies lack the flexibility to express a rich variety of designs without redesigning the communication substrate. Moreover, as the number of chiplets increases, yield suffers due to 2.5D assembly defects. This work addresses these problems by presenting a flexible communication fabric that supports construction of arbitrary network topologies and provides robust fault-tolerance, demonstrating near-100% chip assembly yield at typical bonding defect rates. We achieve these goals with less than 3% additional power and zero exposed latency overhead for various real-world applications running on an example SiP.
Published: 2019
Full Text: View/download PDF

21. ANVIL

Author: Reetuparna Das, Zelalem Birhanu Aweke, Yossi Oren, Matthew Hicks, Todd Austin, Salessawi Ferede Yitbarek, and Rui Qiao
Subjects: 010302 applied physics, Hardware_MEMORYSTRUCTURES, Computer science, business.industry, Locality, General Medicine, 02 engineering and technology, Computer security, computer.software_genre, Computer Graphics and Computer-Aided Design, 01 natural sciences, Refresh rate, 020202 computer hardware & architecture, Software, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, General Earth and Planetary Sciences, Code injection, Cache, business, computer, Row, Dram, General Environmental Science
Abstract: Ensuring the integrity and security of the memory system is critical. Recent studies have shown serious security concerns due to "rowhammer" attacks, where repeated accesses to a row of memory cause bit flips in adjacent rows. Recent work by Google's Project Zero has shown how to leverage rowhammer-induced bit-flips as the basis for security exploits that include malicious code injection and memory privilege escalation. Because this is an important security concern, industry has attempted to defend against rowhammer attacks. Deployed defenses employ two strategies: (1) doubling the system DRAM refresh rate and (2) restricting access to the CLFLUSH instruction that attackers use to bypass the cache to increase memory access frequency (i.e., the rate of rowhammering). We demonstrate that such defenses are inadequate: we implement rowhammer attacks that both avoid using the CLFLUSH instruction and cause bit flips with a doubled refresh rate. Our next-generation CLFLUSH-free rowhammer attack bypasses the cache by manipulating cache replacement state to allow frequent misses out of the last-level cache to DRAM rows of our choosing. To protect existing systems from more advanced rowhammer attacks, we develop a software-based defense, ANVIL, which thwarts all known rowhammer attacks on existing systems. ANVIL detects rowhammer attacks by tracking the locality of DRAM accesses using existing hardware performance counters. Our detector identifies the rows being frequently accessed (i.e., the aggressors), then selectively refreshes the nearby victim rows to prevent hammering. Experiments running on real hardware with the SPEC2006 benchmarks show that ANVIL has less than a 1% false positive rate and an average slowdown of 1%. ANVIL is low-cost and robust, and our experiments indicate that it is an effective approach for protecting existing and future systems from even advanced rowhammer attacks.
Published: 2016
Full Text: View/download PDF

22. Smokestack: Thwarting DOP Attacks with Runtime Stack Layout Randomization

Author: Misiker Tadesse Aga and Todd Austin
Published: 2019
Full Text: View/download PDF

23. Wrangling in the Power of Code Pointers with ProxyCFI

Author: Todd Austin, Misiker Tadesse Aga, and Colton Holoday
Subjects: Set (abstract data type), Forcing (recursion theory), Control flow, Computer engineering, Computer science, Path (graph theory), Indirect branch, Code (cryptography), Control flow graph, Power (physics)
Abstract: Despite being a more than 40-year-old dark art, control flow attacks remain a significant and attractive means of penetrating applications. Control Flow Integrity (CFI) prevents control flow attacks by forcing the execution path of a program to follow the control flow graph (CFG). This is performed by inserting checks before indirect jumps to ensure that the target is within a statically determined valid target set. However, recent advanced control flow attacks have been shown to undermine prior CFI techniques by swapping targets of an indirect jump with another one from the valid set.
Published: 2019
Full Text: View/download PDF

24. VLSI-SoC: Design and Engineering of Electronics Systems Based on New Computing Paradigms

Author: Masahiro Fujita, Nicola Bombieri, Todd Austin, Ricardo Reis, and Graziano Pravadelli
Subjects: Very-large-scale integration, Computer architecture, Computer science, Electronics
Published: 2019
Full Text: View/download PDF

25. Vulnerability-tolerant secure architectures

Author: Todd Austin, Sharad Malik, Mohit Tiwari, Valeria Bertacco, and Baris Kasikci
Subjects: Computer science, Offensive, Vulnerability, 020207 software engineering, 02 engineering and technology, Computer security, computer.software_genre, 020202 computer hardware & architecture, Immune system, Software bug, 0202 electrical engineering, electronic engineering, information engineering, Systems design, State (computer science), Speculation, computer
Abstract: Today, secure systems are built by identifying potential vulnerabilities and then adding protections to thwart the associated attacks. Unfortunately, the complexity of today's systems makes it impossible to prove that all attacks are stopped, so clever attackers find a way around even the most carefully designed protections. In this article, we take a sobering look at the state of secure system design, and ask ourselves why the "security arms race" never ends. The answer lies in our inability to develop adequate security verification technologies. We then examine an advanced defensive system in nature - the human immune system - and we discover that it does not remove vulnerabilities; rather, it adds offensive measures to protect the body when its vulnerabilities are penetrated. We close the article with brief speculation on how the human immune system could inspire more capable secure system designs.
Published: 2018
Full Text: View/download PDF

26. SWAN

Author: Pete Ehrett, Timothy Linscott, Todd Austin, and Valeria Bertacco
Subjects: Computer science, business.industry, media_common.quotation_subject, Overhead (engineering), 0211 other engineering and technologies, 02 engineering and technology, Ambiguity, 020202 computer hardware & architecture, Trojan, 0202 electrical engineering, electronic engineering, information engineering, Decoy, business, Computer hardware, 021106 design practice & management, media_common
Abstract: For the past decade, security experts have warned that malicious engineers could modify hardware designs to include hardware back-doors (trojans), which, in turn, could grant attackers full control over a system. Proposed defenses to detect these attacks have been outpaced by the development of increasingly small, but equally dangerous, trojans. To thwart trojan-based attacks, we propose a novel architecture that maps the security-critical portions of a processor design to a one-time programmable, LUT-free fabric. The programmable fabric is automatically generated by analyzing the HDL of targeted modules. We present our tools to generate the fabric and map functionally equivalent designs onto the fabric. By having a trusted party randomly select a mapping and configure each chip, we prevent an attacker from knowing the physical location of targeted signals at manufacturing time. In addition, we provide decoy options (canaries) for the mapping of security-critical signals, such that hardware trojans hitting a decoy are thwarted and exposed. Using this defense approach, any trojan capable of analyzing the entire configurable fabric must employ complex logic functions with a large silicon footprint, thus exposing it to detection by inspection. We evaluated our solution on a RISC-V BOOM processor and demonstrated that, by providing the ability to map each critical signal to 6 distinct locations on the chip, we can reduce the chance of attack success by an undetectable trojan by 99%, incurring only a 27% area overhead.
Published: 2018
Full Text: View/download PDF

27. Reducing the overhead of authenticated memory encryption using delta encoding and ECC memory

Author: Salessawi Ferede Yitbarek and Todd Austin
Subjects: 010302 applied physics, Random access memory, Hardware_MEMORYSTRUCTURES, Delta encoding, business.industry, Computer science, 02 engineering and technology, Encryption, 01 natural sciences, 020202 computer hardware & architecture, law.invention, ECC memory, Memory management, law, Embedded system, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Error detection and correction, business, Dram, Parity bit
Abstract: Data stored in an off-chip memory, such as DRAM or non-volatile main memory, can potentially be extracted or tampered with by an attacker with physical access to a device. Protecting against such attacks requires storing message authentication codes and counters, which incur a 22% storage overhead. In this work, we propose techniques for reducing these overheads. We first present a scheme that leverages ECC DRAMs to reduce MAC verification & storage overheads. We replace the parity bits in standard ECC by a combination of MAC and parity bits to provide both authentication and error correction. This eliminates the extra MAC storage and minimizes the verification overhead as MACs can be read in parallel with data through the ECC bus. Next, we use efficient integer encodings to reduce counter storage overhead by 6× while enhancing application performance.
Published: 2018
Full Text: View/download PDF

28. uSFI: Ultra-lightweight software fault isolation for IoT-class devices

Author: Zelalem Birhanu Aweke and Todd Austin
Subjects: 010302 applied physics, Source lines of code, business.industry, Computer science, Write protection, 02 engineering and technology, 01 natural sciences, 020202 computer hardware & architecture, Microcontroller, Software, Software bug, Sandbox (computer security), Embedded system, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Code (cryptography), Instrumentation (computer programming), Isolation (database systems), business, Implementation, Memory protection
Abstract: Embedded device security is a particularly difficult challenge, as the quantity of devices makes them attractive targets, while their cost-sensitive design leads to less-than-desirable security implementations. Most current low-end embedded devices do not include any form of security or only include simple memory protection support. One line of research in crafting low-cost security for low-end embedded devices has focused on sandboxing trusted code from untrusted code using both hardware and software techniques. These previous attempts suffer from large trusted code bases (e.g., including the entire kernel), high runtime overheads (e.g., due to code instrumentation), partial protection (e.g., only providing write protection), or heavyweight hardware modifications. In this work, we leverage the rudimentary memory protection support found in modern IoT-class microcontrollers to build a low-profile, low-overhead, flexible sandboxing mechanism that can provide isolation between tightly-coupled software modules. With our approach, named uSFI, only the trust management code must be trusted. Through the use of a static verifier and monitored inter-module transitions, module code at all privilege levels (including the kernel) is able to run uninstrumented and untrusted code. We implemented uSFI on an ARMv7-M based processor, both bare metal and running the FreeRTOS kernel, and analyzed the performance using the MiBench embedded benchmark suite and two additional highly detailed applications. We found that performance overheads were minimal, with at most 1.1% slowdown, and code size overheads were also low, at a maximum of 10%. In addition, our trusted code base is trivially small at only 150 lines of code.
Published: 2018
Full Text: View/download PDF

29. Energy efficient object detection on the mobile GP-GPU

Author: Valeria Bertacco, Todd Austin, Jonathan Rose, and Fitsum Assamnew Andargie
Subjects: Speedup, business.product_category, Computer science, business.industry, 0211 other engineering and technologies, 021107 urban & regional planning, 02 engineering and technology, computer.software_genre, Object detection, Instruction set, 03 medical and health sciences, 0302 clinical medicine, Embedded system, Laptop, Operating system, 030212 general & internal medicine, Mobile telephony, Central processing unit, Graphics, business, computer, Efficient energy use
Abstract: Smartphones and tablets now include General Purpose Graphics Processing Units (GP-GPUs) that can be used for computation beyond driving the high-resolution screens. In this paper we present a mobile GP-GPU-based object detection algorithm and system, based on the work by Viola and Jones (which is also used in the OpenCV library). This implementation achieved a twofold speedup compared to OpenCV running on the CPU of the same smartphone, and up to 84% energy saving. Interestingly, the new implementation saves energy vs. the CPU even when it executes slower than the OpenCV implementation, because the GPU consumes less power than the CPU, something that is not typical in desktop or laptop systems.
Published: 2017
Full Text: View/download PDF

30. SNIFFER: A high-accuracy malware detector for enterprise-based systems

Author: Valeria Bertacco, Harrison Davis, Evan Chavis, Salessawi Ferede Yitbarek, Matthew Hicks, Todd Austin, and Yijun Hou
Subjects: Engineering, business.industry, Feature extraction, computer.software_genre, Computer security, Cryptovirology, Software, Software deployment, Server, Code (cryptography), Malware, Software system, business, computer
Abstract: In the continual battle between malware attacks and antivirus technologies, both sides strive to deploy their techniques at ever lower layers in the software system stack. The goal is to monitor and control the software executing in the levels above their own deployment, to detect attacks or to defeat defenses. Recent antivirus solutions have gone even below the software, by enlisting hardware support. However, so far, they have only mimicked classic software techniques by monitoring software clues of an attack. As a result, malware can easily defeat them by employing metamorphic manifestation patterns. With this work, we propose a hardware-monitoring solution, SNIFFER, which tracks malware manifestations in system-level behavior, rather than code patterns, and thus cannot be circumvented unless malware renounces its very nature, that is, to attack. SNIFFER leverages in-hardware feature monitoring, and uses machine learning to assess whether a system shows signs of an attack. Experiments with a virtual SNIFFER implementation, which supports 13 features and tests against five common network-based malicious behaviors, show that SNIFFER detects malware nearly 100% of the time, unless the malware aggressively throttles its attack. Our experiments also highlight the need for machine-learning classifiers employing a range of diverse system features, as many of the tested malware samples require multiple, seemingly disconnected, features for accurate detection.
Published: 2017
Full Text: View/download PDF

31. Keynote: Peering into the post Moore's Law world

Author: Todd Austin
Subjects: Value (ethics), Moore's law, Computer science, business.industry, media_common.quotation_subject, Electrical engineering, Work (electrical), Scale (social sciences), Scalability, Peering, Telecommunications, business, Scaling, Scaling problem, media_common
Abstract: For decades, Moore's Law dimensional scaling has been the fuel that propelled the computing industry forward, by delivering performance, power and cost advantages with each new generation of silicon. Today, these scaling benefits are slowing to a crawl. If the computing industry wants to continue to make scalability the primary source of value in tomorrow's computing systems, we will have to quickly find new and productive ways to scale future systems. In this talk, I will highlight my work and the work of others that is rejuvenating scaling through the application of heterogeneous parallel designs. Leveraging these technologies to solve the scaling problem will be a significant challenge, as future scalability success will ultimately be less about “how” to do it and more about “how much” will it cost.
Published: 2017
Full Text: View/download PDF

32. Regaining Lost Cycles with HotCalls

Author: Valeria Bertacco, Todd Austin, and Ofir Weisse
Subjects: 010302 applied physics, Hardware security module, Speedup, business.industry, Computer science, 020206 networking & telecommunications, Cryptography, Cloud computing, 02 engineering and technology, computer.software_genre, Encryption, 01 natural sciences, System call, Server, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Operating system, Leverage (statistics), business, computer
Abstract: Intel's SGX secure execution technology allows running computations on secret data using untrusted servers. While recent work showed how to port applications and large-scale computations to run under SGX, the performance implications of using the technology remain an open question. We present the first comprehensive quantitative study to evaluate the performance of SGX. We show that straightforward use of SGX library primitives for calling functions adds between 8,200 and 17,000 cycles of overhead, compared to 150 cycles for a typical system call. We quantify the performance impact of these library calls and show that in applications with high system call frequency, such as memcached, openVPN, and lighttpd, which all have high bandwidth network requirements, the performance degradation may be as high as 79%. We investigate the sources of this performance degradation by leveraging a new set of microbenchmarks for SGX-specific operations such as enclave entry-calls and out-calls, and encrypted memory I/O accesses. We leverage the insights we gain from these analyses to design a new SGX interface framework, HotCalls. HotCalls are based on a synchronization spin-lock mechanism and provide a 13-27x speedup over the default interface. They can easily be integrated into existing code, making them a practical solution. Compared to a baseline SGX implementation of memcached, openVPN, and lighttpd, we show that using the new interface boosts the throughput by 2.6-3.7x, and reduces application latency by 62-74%.
Published: 2017
Full Text: View/download PDF

33. When good protections go bad: Exploiting anti-DoS measures to accelerate rowhammer attacks

Author: Misiker Tadesse Aga, Todd Austin, and Zelalem Birhanu Aweke
Subjects: 010302 applied physics, Engineering, business.industry, Subtext, 02 engineering and technology, computer.software_genre, Computer security, 01 natural sciences, 020202 computer hardware & architecture, Refresh rate, Software, Virtual machine, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Data Protection Act 1998, Cache, business, computer, Dram, Vulnerability (computing), computer.programming_language
Abstract: The rowhammer vulnerability, where repeated accesses to a DRAM row can speed the discharge of neighboring bits, has emerged as a significant security concern in the computing industry. To address the problem, computer and software vendors have: i) doubled DRAM refresh rates, ii) restricted access to virtual-to-physical page mappings, and iii) disabled access to cache-flush operations in sandboxed environments. While recent efforts have shown how to overcome each of these protections individually, machines today are protected from rowhammer attacks if they employ all three of these protections simultaneously. In this paper, we demonstrate the first rowhammer attack that overcomes all three of these protections when used in tandem. Our attack is a virtual-memory based cache-flush free attack that is sufficiently fast to rowhammer with double rate refresh. The most astonishing aspect of our attack is that it is enabled by the recently introduced Cache Allocation Technology, a mechanism designed in part to protect virtual machines from inter-VM denial-of-service attacks. The subtext of this paper asks the question: “Is there any hope for system security, when the protections for one attack enable yet another?” We claim that the solution to this conundrum lies in the approach taken to protecting systems. Adopting a subtractive approach to secure systems, in contrast to additive measures, could go a long way toward building provably secure systems.
Published: 2017
Full Text: View/download PDF

34. Cold Boot Attacks are Still Hot: Security Analysis of Memory Scramblers in Modern Processors

Author: Misiker Tadesse Aga, Reetuparna Das, Salessawi Ferede Yitbarek, and Todd Austin
Subjects: Hardware_MEMORYSTRUCTURES, business.industry, Computer science, Cold boot attack, Cryptography, 02 engineering and technology, DIMM, Encryption, 020202 computer hardware & architecture, Disk encryption, 020204 information systems, Embedded system, 0202 electrical engineering, electronic engineering, information engineering, business, Replay attack, Dram, Computer network, Reboot
Abstract: Previous work has demonstrated that systems with unencrypted DRAM interfaces are susceptible to cold boot attacks – where the DRAM in a system is frozen to give it sufficient retention time and is then re-read after reboot, or is transferred to an attacker's machine for extracting sensitive data. This method has been shown to be an effective attack vector for extracting disk encryption keys out of locked devices. However, most modern systems incorporate some form of data scrambling into their DRAM interfaces, making cold boot attacks challenging. While first added as a measure to improve signal integrity and reduce power supply noise, these scramblers today serve the added purpose of obscuring the DRAM contents. It has previously been shown that scrambled DDR3 systems do not provide meaningful protection against cold boot attacks. In this paper, we investigate the enhancements that have been introduced in DDR4 memory scramblers in the 6th generation Intel Core (Skylake) processors. We then present an attack that demonstrates these enhanced DDR4 scramblers still do not provide sufficient protection against cold boot attacks. We detail a proof-of-concept attack that extracts memory resident AES keys, including disk encryption keys. The limitations of memory scramblers we point out in this paper motivate the need for strong yet low-overhead full-memory encryption schemes. Existing schemes such as Intel's SGX can effectively prevent such attacks, but have overheads that may not be acceptable for performance-sensitive applications. However, it is possible to deploy a memory encryption scheme that has zero performance overhead by forgoing integrity checking and replay attack protections afforded by Intel SGX. To that end, we present analyses that confirm modern stream ciphers such as ChaCha8 are sufficiently fast that it is now possible to completely overlap keystream generation with DRAM row buffer access latency, thereby enabling the creation of strongly encrypted DRAMs with zero exposed latency. Adopting such low-overhead measures in future generations of products can effectively shut down cold boot attacks in systems where the overhead of existing memory encryption schemes is unacceptable. Furthermore, the emergence of non-volatile DIMMs that fit into DDR4 buses is going to exacerbate the risk of cold boot attacks. Hence, strong full memory encryption is going to be even more crucial on such systems.
Published: 2017
Full Text: View/download PDF

35. Ozone: Efficient Execution with Zero Timing Leakage for Modern Microarchitectures

Author: Zelalem Birhanu Aweke and Todd Austin
Subjects: 010302 applied physics, FOS: Computer and information sciences, Computer Science - Cryptography and Security, Computer science, business.industry, 02 engineering and technology, Thread (computing), Encryption, 01 natural sciences, 020202 computer hardware & architecture, Microarchitecture, Instruction set, Cipher, Embedded system, 0103 physical sciences, Hardware Architecture (cs.AR), 0202 electrical engineering, electronic engineering, information engineering, Side channel attack, Cache, Computer Science - Hardware Architecture, business, Cryptography and Security (cs.CR), Scratchpad memory
Abstract: Time variation during program execution can leak sensitive information. Time variations due to program control flow and hardware resource contention have been used to steal encryption keys in cipher implementations such as AES and RSA. A number of approaches to mitigate timing-based side-channel attacks have been proposed including cache partitioning, control-flow obfuscation and injecting timing noise into the outputs of code. While these techniques make timing-based side-channel attacks more difficult, they do not eliminate the risks. Prior techniques are either too specific or too expensive, and all leave remnants of the original timing side channel for later attackers to attempt to exploit. In this work, we show that the state-of-the-art techniques in timing side-channel protection, which limit timing leakage but do not eliminate it, still have significant vulnerabilities to timing-based side-channel attacks. To provide a means for total protection from timing-based side-channel attacks, we develop Ozone, the first zero timing leakage execution resource for a modern microarchitecture. Code in Ozone executes under a special hardware thread that gains exclusive access to a single core's resources for a fixed (and limited) number of cycles during which it cannot be interrupted. Memory access under Ozone thread execution is limited to pre-allocated cache lines that can not be evicted, and all Ozone threads begin execution with a known fixed microarchitectural state. We evaluate Ozone using a number of security sensitive kernels that have previously been targets of timing side-channel attacks, and show that Ozone eliminates timing leakage with minimal performance overhead.
Published: 2017
Full Text: View/download PDF

36. A2: Analog Malicious Hardware

Author: Matthew Hicks, Todd Austin, Qing Dong, Dennis Sylvester, and Kaiyuan Yang
Subjects: Guard (information security), Analogue electronics, business.industry, Computer science, Transistor, 020206 networking & telecommunications, Hardware_PERFORMANCEANDRELIABILITY, 02 engineering and technology, Integrated circuit design, Computer security, computer.software_genre, 020202 computer hardware & architecture, law.invention, Capacitor, law, Trojan, Embedded system, Hardware_INTEGRATEDCIRCUITS, 0202 electrical engineering, electronic engineering, information engineering, business, computer, Computer hardware
Abstract: While the move to smaller transistors has been a boon for performance, it has dramatically increased the cost to fabricate chips using those smaller transistors. This forces the vast majority of chip design companies to trust a third party -- often overseas -- to fabricate their design. To guard against shipping chips with errors (intentional or otherwise), chip design companies rely on post-fabrication testing. Unfortunately, this type of testing leaves the door open to malicious modifications, since attackers can craft attack triggers requiring a sequence of unlikely events, which will never be encountered by even the most diligent tester. In this paper, we show how a fabrication-time attacker can leverage analog circuits to create a hardware attack that is small (i.e., requires as little as one gate) and stealthy (i.e., requires an unlikely trigger sequence before affecting a chip's functionality). In the open spaces of an already placed and routed design, we construct a circuit that uses capacitors to siphon charge from nearby wires as they transition between digital values. When the capacitors fully charge, they deploy an attack that forces a victim flip-flop to a desired value. We weaponize this attack into a remotely-controllable privilege escalation by attaching the capacitor to a controllable wire and by selecting a victim flip-flop that holds the privilege bit for our processor. We implement this attack in an OR1200 processor and fabricate a chip. Experimental results show that our attacks work, show that our attacks elude activation by a diverse set of benchmarks, and suggest that our attacks evade known defenses.
Published: 2016
Full Text: View/download PDF

37. A case for unlimited watchpoints

Author: Hongyi Xin, Todd Austin, Joseph L. Greathouse, and Yixin Luo
Subjects: Computer science, business.industry, General Medicine, computer.file_format, computer.software_genre, Computer Graphics and Computer-Aided Design, Set (abstract data type), Range (mathematics), Taint checking, Software, Operating system, Bitmap, Cache, Software analysis pattern, business, computer
Abstract: Numerous tools have been proposed to help developers fix software errors and inefficiencies. Widely-used techniques such as memory checking suffer from overheads that limit their use to pre-deployment testing, while more advanced systems have such severe performance impacts that they may require special-purpose hardware. Previous works have described hardware that can accelerate individual analyses, but such specialization stymies adoption; generalized mechanisms are more likely to be added to commercial processors. This paper demonstrates that the ability to set an unlimited number of fine-grain data watchpoints can reduce the runtime overheads of numerous dynamic software analysis techniques. We detail the watchpoint capabilities required to accelerate these analyses while remaining general enough to be useful in the future. We describe a hardware design that stores watchpoints in main memory and utilizes two different on-chip caches to accelerate performance. The first is a bitmap lookaside buffer that stores fine-grained watchpoints, while the second is a range cache that can efficiently hold large contiguous regions of watchpoints. As an example of the power of such a system, it is possible to use watchpoints to accelerate read/write set checks in a software data race detector by nearly 9x.
Published: 2012
Full Text: View/download PDF

38. Demand-driven software race detection using hardware performance counters

Author: Ramesh Peri, Zhiqiang Ma, Todd Austin, Matthew I. Frank, and Joseph L. Greathouse
Subjects: Computer science, business.industry, Concurrency, Thread (computing), General Medicine, computer.software_genre, Instruction set, Software, Embedded system, Operating system, Benchmark (computing), Cache, business, computer, Computer hardware, Cache coherence
Abstract: Dynamic data race detectors are an important mechanism for creating robust parallel programs. Software race detectors instrument the program under test, observe each memory access, and watch for inter-thread data sharing that could lead to concurrency errors. While this method of bug hunting can find races that are normally difficult to observe, it also suffers from high runtime overheads. It is not uncommon for commercial race detectors to experience 300x slowdowns, limiting their usage. This paper presents a hardware-assisted demand-driven race detector. We are able to observe cache events that are indicative of data sharing between threads by taking advantage of hardware available on modern commercial microprocessors. We use these to build a race detector that is only enabled when it is likely that inter-thread data sharing is occurring. When little sharing takes place, this demand-driven analysis is much faster than contemporary continuous-analysis tools without a large loss of detection accuracy. We modified the race detector in Intel(R) Inspector XE to utilize our hardware-based sharing indicator and were able to achieve performance increases of 3x and 10x in two parallel benchmark suites and 51x for one particular program.
Published: 2011
Full Text: View/download PDF

39. Energy-Efficient Subthreshold Processor Design

Author: Sanjay Pant, M. Minuth, Leyla Nazhandali, Scott Hanson, A. Reeves, Dennis Sylvester, R. Helfand, Bo Zhai, David Blaauw, Todd Austin, and J. Olson
Subjects: Engineering, business.industry, Subthreshold conduction, Processor design, Clock rate, Electrical engineering, Propagation delay, Microarchitecture, Process variation, Hardware and Architecture, Low-power electronics, Electronic engineering, Electrical and Electronic Engineering, business, Software, Efficient energy use
Abstract: Subthreshold circuits have drawn strong interest in recent ultralow power research. In this paper, we present a highly efficient subthreshold microprocessor targeting sensor applications. It is optimized across different design stages including ISA definition, microarchitecture evaluation, and circuit and implementation optimization. Our investigation concludes that microarchitectural decisions in the subthreshold regime differ significantly from those in conventional superthreshold mode. We propose a new general-purpose sensor processor architecture, which we call the Subliminal Processor. On the circuit side, subthreshold operation is known to exhibit an optimal energy point (Vmin). However, propagation delay also becomes more sensitive to process variation and can reduce the energy scaling gain. We conduct a thorough analysis of how supply voltage and operating frequency impact energy efficiency in a statistical context. With careful library cell selection and robust static RAM design, the Subliminal Processor operates correctly down to 200 mV in a 0.13-µm technology, which is sufficiently low to operate at Vmin. Silicon measurements of the Subliminal Processor show a maximum energy efficiency of 2.6 pJ/instruction at 360 mV supply voltage and 833 kHz operating frequency. Finally, we examine the variation in frequency and Vmin across die to verify our analysis of adaptive tuning of the clock frequency and Vmin for optimal energy efficiency.
Published: 2009
Full Text: View/download PDF

40. A Flexible Software-Based Framework for Online Detection of Hardware Defects

Author: Valeria Bertacco, Onur Mutlu, Todd Austin, and Kypros Constantinides
Subjects: business.industry, Computer science, Firmware, media_common.quotation_subject, Reliability (computer networking), Control reconfiguration, Hardware_PERFORMANCEANDRELIABILITY, computer.software_genre, Theoretical Computer Science, law.invention, Microprocessor, Software, Computational Theory and Mathematics, Debugging, Hardware and Architecture, law, Embedded system, business, computer, Computer hardware, media_common, Register-transfer level
Abstract: This work proposes a new, software-based, defect detection and diagnosis technique. We introduce a novel set of instructions, called access-control extensions (ACE), that can access and control the microprocessor's internal state. Special firmware periodically suspends microprocessor execution and uses the ACE instructions to run directed tests on the hardware. When a hardware defect is present, these tests can diagnose and locate it, and then activate system repair through resource reconfiguration. The software nature of our framework makes it flexible: testing techniques can be modified/upgraded in the field to trade-off performance with reliability without requiring any change to the hardware. We describe and evaluate different execution models for using the ACE framework. We also describe how the proposed ACE framework can be extended and utilized to improve the quality of post-silicon debugging and manufacturing testing of modern processors. We evaluated our technique on a commercial chip-multiprocessor based on Sun's Niagara and found that it can provide very high coverage, with 99.22 percent of all silicon defects detected. Moreover, our results show that the average performance overhead of software-based testing is only 5.5 percent. Based on a detailed register transfer level (RTL) implementation of our technique, we find its area and power consumption overheads to be modest, with a 5.8 percent increase in total chip area and a 4 percent increase in the chip's overall power consumption.
Published: 2009
Full Text: View/download PDF

41. Reliable Systems on Unreliable Fabrics

Author: Yu Cao, Scott Mahlke, Valeria Bertacco, and Todd Austin
Subjects: Engineering, Dominant design, Correctness, business.industry, media_common.quotation_subject, Design team, Reliability engineering, Hardware and Architecture, Systems research, Key (cryptography), Quality (business), Electrical and Electronic Engineering, Stress reduction techniques, business, Software, Reliability (statistics), media_common
Abstract: The continued scaling of silicon fabrication technology has led to significant reliability concerns, which are quickly becoming a dominant design challenge. Design integrity is threatened by complexity challenges in the form of immense designs defying complete verification, and physical challenges such as silicon aging and soft errors, which impair correct system operation. The Gigascale Systems Research Center Resilient-System Design Team is addressing these key challenges through synergistic research thrusts, ranging from near-term reliability stress reduction techniques to methods for improving the quality of today's silicon, to longer-term technologies that can detect, recover, and repair faulty systems. These efforts are supported and complemented by an active fault-modeling research effort and a strong focus on functional-verification methodologies. The team's goal is to provide highly effective, low-cost solutions to ensure both correctness and reliability in future designs and technology nodes, thereby extending the lifetime of silicon fabrication technologies beyond what can be currently foreseen as profitable.
Published: 2008
Full Text: View/download PDF

42. Exploring Variability and Performance in a Sub-200-mV Processor

Author: M. Minuth, Leyla Nazhandali, M. Singhal, J. Olson, David Blaauw, Bo Zhai, Todd Austin, Kevin Zhou, Dennis Sylvester, Scott Hanson, Mingoo Seok, and Brian Cline
Subjects: Engineering, business.industry, Subthreshold conduction, Electrical engineering, Biasing, Hardware_PERFORMANCEANDRELIABILITY, Energy consumption, Process variation, Logic gate, Low-power electronics, Hardware_INTEGRATEDCIRCUITS, Electronic engineering, Electrical and Electronic Engineering, business, Low voltage, Hardware_LOGICDESIGN, Efficient energy use
Abstract: In this study, we explore the design of a subthreshold processor for use in ultra-low-energy sensor systems. We describe an 8-bit subthreshold processor that has been designed with energy efficiency as the primary constraint. The processor, which is functional below Vdd=200 mV, consumes only 3.5 pJ/inst at Vdd=350 mV and, under a reverse body bias, draws only 11 nW at Vdd=160 mV. Process and temperature variations in subthreshold circuits can cause dramatic fluctuations in performance and energy consumption and can lead to robustness problems. We investigate the use of body biasing to adapt to process and temperature variations. Test-chip measurements show that body biasing is particularly effective in subthreshold circuits and can eliminate performance variations with minimal energy penalties. Reduced performance is also problematic at low voltages, so we investigate global and local techniques for improving performance while maintaining energy efficiency.
Published: 2008
Full Text: View/download PDF

43. Using Field-Repairable Control Logic to Correct Design Errors in Microprocessors

Author: Valeria Bertacco, Ilya Wagner, and Todd Austin
Subjects: Correctness, business.industry, Computer science, Integrated circuit design, Computer Graphics and Computer-Aided Design, law.invention, Microprocessor, Logic synthesis, Computer engineering, Software bug, law, Embedded system, Process control, State (computer science), Electrical and Electronic Engineering, Control logic, business, Software
Abstract: Functional correctness is a vital attribute of any hardware design. Unfortunately, due to extremely complex architectures, widespread components, such as microprocessors, are often released with latent bugs. The inability of modern verification tools to handle the fast growth of design complexity exacerbates the problem even further. In this paper, we propose a novel hardware-patching mechanism, called the field-repairable control logic (FRCL), that is designed for in-the-field correction of errors in the design's control logic, the most common type of defect, as our analysis demonstrates. Our solution introduces an additional component in the processor's hardware, a state matcher, that can be programmed to identify erroneous configurations using signals in the critical control state of the processor. Once a flawed configuration is “matched,” the processor switches into a degraded mode, a mode of operation which excludes most features of the system and is simple enough to be formally verified, yet still capable of executing the full instruction-set architecture one instruction at a time. Once the program segment exposing the design flaw has been executed in a degraded mode, we can switch the processor back to its full-performance mode. In this paper, we analyze a range of approaches to selecting signals comprising the processor's critical control state and evaluate their effectiveness in representing a variety of design errors. We also introduce a new metric (average specificity per signal) that encodes the bug-detection capability and amount of control state of a particular critical signal set. We demonstrate that the FRCL can support the detection and correction of multiple design errors with a performance impact of less than 5% as long as the incidence of the flawed configurations is below 1% of dynamic instructions. In addition, the area impact of our solution is less than 2% for the two microprocessor designs that we investigated in our experiments.
Published: 2008
Full Text: View/download PDF

44. Exploring Specialized Near-Memory Processing for Data Intensive Operations

Author: Todd Austin, Tao Yang, Salessawi Ferede Yitbarek, and Reetuparna Das
Subjects: Stack-based memory allocation, Flat memory model, Speedup, Computer science, Registered memory, 02 engineering and technology, Overlay, 01 natural sciences, Non-uniform memory access, Memory address, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Interleaved memory, Computing with Memory, Memory refresh, Computer memory, Conventional memory, 010302 applied physics, Random access memory, Xeon, business.industry, Uniform memory access, Semiconductor memory, Memory bandwidth, Memory map, 020202 computer hardware & architecture, Extended memory, Memory management, Shared memory, Embedded system, Computer data storage, Distributed memory, business, Computer hardware
Abstract: Emerging 3D stacked memory systems provide significantly more bandwidth than current DDR modules. However, general purpose processors do not take full advantage of these resources offered by the memory modules. Taking advantage of the increased bandwidth requires the use of specialized processing units. In this paper, we evaluate the benefits of placing hardware accelerators at the bottom layer of a 3D stacked memory system compared to accelerators that are placed external to the memory stack. Our evaluation of the design using cycle-accurate simulation and RTL synthesis shows that, for important data intensive kernels, near-memory accelerators inside a single 3D memory package provide 3x-13x speedup over a Quad-core Xeon processor. Most of the benefits are from the application of accelerators, as the near-memory configurations provide marginal benefits compared to the same number of accelerators placed on a die external to the memory package. This comparable performance for external accelerators is due to the high bandwidth afforded by the high-speed off-chip links. On the other hand, near-memory accelerators consume 7%–39% less energy than the external accelerators.
Published: 2016
Full Text: View/download PDF

45. Architectural implications of brick and mortar silicon manufacturing

Author: Mojtaba Mehrara, Martha Mercaldi Kim, Todd Austin, and Mark Oskin
Subjects: Brick, Application-specific integrated circuit, business.industry, Computer science, Embedded system, Hardware_INTEGRATEDCIRCUITS, Integrated circuit design, General Medicine, Brick and mortar, business
Abstract: We introduce a novel chip fabrication technique called "brick and mortar", in which chips are made from small, pre-fabricated ASIC bricks and bonded in a designer-specified arrangement to an inter-brick communication backbone chip. The goal of brick and mortar assembly is to provide a low-overhead method to produce custom chips, yet with performance that tracks an ASIC more closely than an FPGA. This paper examines the architectural design choices in this chip-design system. These choices include the definition of reasonable bricks, both in functionality and size, as well as the communication interconnect that the I/O cap provides. To do this we synthesize candidate bricks, analyze their area and bandwidth demands, and present an architectural design for the inter-brick communication network. We discuss a sample chip design, a 16-way CMP, and analyze the costs and benefits of designing chips with brick and mortar. We find that this method of producing chips incurs only a small performance loss (8%) compared to a fully custom ASIC, which is significantly less than the degradation seen from other low-overhead chip options, such as FPGAs. Finally, we measure the effect that architectural design decisions have on the behavior of the proposed physical brick assembly technique, fluidic self-assembly.
Published: 2007
Full Text: View/download PDF

46. Microprocessor Verification via Feedback-Adjusted Markov Models

Author: Ilya Wagner, Valeria Bertacco, and Todd Austin
Subjects: Correctness, Computer science, business.industry, Random number generation, Markov model, Computer Graphics and Computer-Aided Design, law.invention, Microprocessor, law, Embedded system, Electrical and Electronic Engineering, Hidden Markov model, business, Software
Abstract: The challenge of verifying a modern microprocessor design is an overwhelming one: Increasingly complex microarchitectures combined with heavy time-to-market pressure have forced microprocessor vendors to employ immense verification teams in the hope of finding the most critical bugs in a timely manner. Unfortunately, too often, size does not seem to matter in verification, as design schedules continue to slip and microprocessors find their way to the marketplace with design errors. In this paper, we describe a novel closed-loop simulation-based approach to hardware verification and present a tool called StressTest that uses our methods to locate hard-to-find corner-case design bugs and performance problems. StressTest is based on a Markov-model-driven random instruction generator with activity monitors. The model is generated from the user-specified template files and is used to generate the instructions sent to the design under test (DUT). In addition, the user specifies key activity nodes within the design that should be stressed and monitored throughout the simulation. The StressTest engine then uses closed-loop feedback techniques to transform the Markov model into one that effectively stresses the user-selected points of interest. In parallel, StressTest monitors the correctness of the DUT response and, if the design behaves against expectation, it reports a bug and a trace leading to it. Using two microarchitectures as example testbeds, we demonstrate that StressTest finds more bugs with less effort than open-loop random instruction test generation techniques.
Published: 2007
Full Text: View/download PDF

47. Locking down insecure indirection with hardware-based control-data isolation

Author: Reetuparna Das, William Arthur, Todd Austin, and Sahil Madeka
Subjects: Indirection, Exploit, business.industry, Memoization, Computer science, Program transformation, Parallel computing, computer.software_genre, Control flow, Embedded system, Code injection, Compiler, Cache, Isolation (database systems), Programmer, business, computer
Abstract: Arbitrary code injection pervades as a central issue in computer security where attackers seek to exploit the software attack surface. A key component in many exploits today is the successful execution of a control-flow attack. Control-Data Isolation (CDI) has emerged as a work which eliminates the root cause of contemporary control-flow attacks: indirect control flow instructions. These instructions are replaced by direct control flow edges dictated by the programmer and encoded into the application by the compiler. By subtracting the root cause of control-flow attack, Control-Data Isolation sidesteps the vulnerabilities and restrictive threat models adopted by other solutions in this space (e.g., Control-Flow Integrity). The CDI approach, while eliminating contemporary control-flow attacks, introduces non-trivial overheads to validate indirect targets at runtime. In this work we introduce novel architectural support to accelerate the execution of CDI-compliant code. Through the addition of an edge cache, we are able to cache legal indirect target edges and eliminate nearly all execution overhead for indirection-free applications. We demonstrate that through memoization of compiler-confirmed control flow transitions, overheads are reduced from 19% to 0.5% on average for Control-Data Isolated applications. Additionally, we show that the edge cache can efficiently provide the double-duty of predicting multi-way branch targets, thus providing even speedups for some CDI-compliant executions, compared to an architecture with unsophisticated indirect control prediction (e.g., BTB).
Published: 2015
Full Text: View/download PDF

48. Keynote talk I: Ending the Tyranny of Amdahl's Law

Author: Todd Austin
Subjects: Value (ethics), symbols.namesake, Amdahl's law, Work (electrical), Computer science, Scale (chemistry), Scalability, symbols, Operating system, Computer security, computer.software_genre, computer
Abstract: If the computing industry wants to continue to make scalability the primary source of value in tomorrow's computing systems, we will have to quickly find new and productive ways to scale the serial portions of important applications. In this talk, I will highlight my work and the work of others to do just this through the application of heterogeneous parallel designs. Of course, we will want to address the scalability of sequential codes, but future scalability success will ultimately hinge on how we address the scalability of future applications through more affordable design and manufacturing techniques.
Published: 2015
Full Text: View/download PDF

49. Ultra low-cost defect protection for microprocessor pipelines

Author: Sujay Phadke, Valeria Bertacco, Kypros Constantinides, Smitha Shyam, and Todd Austin
Subjects: Silicon, business.industry, Computer science, Transistor, chemistry.chemical_element, law.invention, Pipeline transport, Microprocessor, chemistry, law, Embedded system, Redundancy (engineering), General Earth and Planetary Sciences, System on a chip, business, TO-18, General Environmental Science
Abstract: The sustained push toward smaller and smaller technology sizes has reached a point where device reliability has moved to the forefront of concerns for next-generation designs. Silicon failure mechanisms, such as transistor wearout and manufacturing defects, are a growing challenge that threatens the yield and product lifetime of future systems. In this paper we introduce the BulletProof pipeline, the first ultra low-cost mechanism to protect a microprocessor pipeline and on-chip memory system from silicon defects. To achieve this goal we combine area-frugal on-line testing techniques and system-level checkpointing to provide the same guarantees of reliability found in traditional solutions, but at much lower cost. Our approach utilizes a microarchitectural checkpointing mechanism which creates coarse-grained epochs of execution, during which distributed on-line built-in self-test (BIST) mechanisms validate the integrity of the underlying hardware. In case a failure is detected, we rely on the natural redundancy of instruction-level parallel processors to repair the system so that it can still operate in a degraded performance mode. Using detailed circuit-level and architectural simulation, we find that our approach provides very high coverage of silicon defects (89%) with little area cost (5.8%). In addition, when a defect occurs, the subsequent degraded mode of operation was found to have only moderate performance impacts (from 4% to 18% slowdown).
Published: 2006
Full Text: View/download PDF

50. Razor: circuit-level correction of timing errors for low-power operation

Author: David Blaauw, Shidhartha Das, Seokwoo Lee, Trevor Mudge, Todd Austin, Nam Sung Kim, Daniel J. Ernst, and Krisztian Flautner
Subjects: business.industry, Computer science, Electrical engineering, Hardware_PERFORMANCEANDRELIABILITY, Condensed Matter::Mesoscopic Systems and Quantum Hall Effect, Power (physics), Dynamic voltage scaling, Computer Science::Hardware Architecture, Hardware and Architecture, Embedded system, Low-power electronics, Hardware_INTEGRATEDCIRCUITS, Electrical and Electronic Engineering, business, Computer Science::Operating Systems, Software, Voltage
Abstract: Dynamic voltage scaling is one of the more effective and widely used methods for power-aware computing. We present a DVS approach that uses dynamic detection and correction of circuit timing errors to tune processor supply voltage and eliminate the need for voltage margins.
Published: 2004
Full Text: View/download PDF

139 results on '"Todd Austin"'
