38 results on '"Todd Austin"'
Search Results
2. Thwarting Control Plane Attacks with Displaced and Dilated Address Spaces
- Author
-
Valeria Bertacco, Mark Gallagher, Lauren Biernacki, and Todd Austin
- Subjects
010302 applied physics ,Exploit ,Computer science ,Address space ,business.industry ,02 engineering and technology ,Computer security ,computer.software_genre ,01 natural sciences ,020202 computer hardware & architecture ,Software ,Pointer (computer programming) ,Gadget ,0103 physical sciences ,Obfuscation ,0202 electrical engineering, electronic engineering, information engineering ,Code (cryptography) ,Entropy (information theory) ,business ,computer - Abstract
To maintain the control-flow integrity of today’s machines, code pointers must be protected. Exploits forge and manipulate code pointers to execute arbitrary, malicious code on a host machine. A corrupted code pointer can effectively redirect program execution to attacker-injected code or existing code gadgets, giving attackers the necessary foothold to circumvent system protections. To combat this class of exploits, we employ a Displaced and Dilated Address Space (DDAS), which uses a novel address space inflation mechanism to obfuscate code pointers, code locations, and the relative distance between code objects. By leveraging runtime re-randomization and custom hardware, we are able to achieve a high-entropy control-flow defense with performance overheads well below 5% and similarly low power and silicon area overheads. With DDAS in force, attackers come up against 63 bits of entropy when forging absolute addresses and 18 to 55 bits of entropy for relative addresses, depending on the distance to the desired code gadget. Moreover, an incorrectly forged code address will result in a security exception with a probability greater than 99.996%. Using hardware-based address obfuscation, we provide significantly higher entropy at lower performance overheads than previous software techniques, and our re-randomization mechanism offers additional protections against possible pointer disclosures.
- Published
- 2020
- Full Text
- View/download PDF
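To make the displacement-and-dilation idea above concrete, here is a minimal software sketch of how a code pointer could be stretched by a dilation factor, shifted by a secret displacement, and validated on use. The constants, function names, and the printed "security exception" are all illustrative assumptions; the actual DDAS mechanism performs these steps in custom hardware with runtime re-randomization.

```c
/* Minimal sketch of displaced-and-dilated code pointers (illustrative only).
 * DILATION and DISPLACEMENT are made-up constants; the real design keeps
 * them secret and re-randomizes them at runtime in hardware. */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

static const uint64_t DILATION     = 64;                   /* address-space stretch   */
static const uint64_t DISPLACEMENT = 0x0000123456780000u;  /* secret linear shift     */

static uint64_t obfuscate(uint64_t native) {
    return native * DILATION + DISPLACEMENT;               /* what the program stores */
}

static uint64_t deobfuscate(uint64_t dilated) {
    /* A forged pointer almost never lands on a valid dilated slot. */
    if (dilated < DISPLACEMENT || (dilated - DISPLACEMENT) % DILATION != 0) {
        fprintf(stderr, "security exception: implausible code pointer\n");
        exit(1);
    }
    return (dilated - DISPLACEMENT) / DILATION;
}

int main(void) {
    uint64_t target  = 0x400123;                  /* a native code address           */
    uint64_t visible = obfuscate(target);         /* pointer value visible to code   */
    printf("round trip ok: %#llx\n", (unsigned long long)deobfuscate(visible));
    deobfuscate(visible + 7);                     /* an attacker-forged offset trips */
    return 0;
}
```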
3. Morpheus
- Author
-
Salessawi Ferede Yitbarek, Baris Kasikci, Mohit Tiwari, Misiker Tadesse Aga, Austin Harris, Valeria Bertacco, Zhixing Xu, Todd Austin, Sharad Malik, Zelalem Birhanu Aweke, Mark Gallagher, Shibo Chen, and Lauren Biernacki
- Subjects
Hardware architecture ,business.industry ,Computer science ,Offensive ,02 engineering and technology ,Encryption ,Computer security ,computer.software_genre ,Security testing ,020202 computer hardware & architecture ,020204 information systems ,Pointer (computer programming) ,0202 electrical engineering, electronic engineering, information engineering ,Systems design ,Moving target defense ,Architecture ,business ,computer - Abstract
Attacks often succeed by abusing the gap between program and machine-level semantics -- for example, by locating a sensitive pointer, exploiting a bug to overwrite this sensitive data, and hijacking the victim program's execution. In this work, we take secure system design on the offensive by continuously obfuscating information that attackers need but normal programs do not use, such as the representation of code and pointers or the exact location of code and data. Our secure hardware architecture, Morpheus, combines two powerful protections: ensembles of moving target defenses and churn. Ensembles of moving target defenses randomize key program values (e.g., relocating pointers and encrypting code and pointers), which forces attackers to extensively probe the system prior to an attack. To ensure attack probes fail, the architecture incorporates churn to transparently re-randomize program values underneath the running system. With frequent churn, systems quickly become impractically difficult to penetrate. We demonstrate Morpheus through a RISC-V-based prototype designed to stop control-flow attacks. Each moving target defense in Morpheus uses hardware support to individually offer more randomness at a lower cost than previous techniques. When ensembled with churn, Morpheus defenses offer strong protection against control-flow attacks, with our security testing and performance studies revealing: i) high-coverage protection for a broad array of control-flow attacks, including protections for advanced attacks and an attack disclosed after the design of Morpheus, and ii) negligible performance impacts (1%) with churn periods up to 50 ms, which our study estimates to be at least 5000x faster than the time necessary to possibly penetrate Morpheus.
- Published
- 2019
- Full Text
- View/download PDF
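The churn mechanism described above can be pictured as periodic re-keying of every randomized value while the program keeps running. The toy sketch below uses a trivial XOR "cipher" and a single table of encrypted pointers purely to illustrate the flow; Morpheus itself uses hardware tagging and stronger transformations, and none of these function names come from the paper.

```c
/* Hedged sketch: XOR-based pointer "encryption" with periodic re-keying
 * ("churn"). Illustrative only; the real defense uses hardware support and
 * strong ciphers, not this toy scheme. */
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

static uint64_t domain_key;              /* current encryption key (hypothetical) */

static uint64_t encrypt_ptr(uint64_t p)  { return p ^ domain_key; }
static uint64_t decrypt_ptr(uint64_t ep) { return ep ^ domain_key; }

/* Churn: pick a fresh key and re-encrypt every live encrypted pointer so
 * that values an attacker probed earlier become stale. */
static void churn(uint64_t *enc_ptrs, size_t n) {
    uint64_t old_key = domain_key;
    domain_key = ((uint64_t)rand() << 32) | (uint64_t)rand();
    for (size_t i = 0; i < n; i++)
        enc_ptrs[i] = (enc_ptrs[i] ^ old_key) ^ domain_key;
}

int main(void) {
    int target = 42;
    uint64_t table[1];
    domain_key = 0x1234abcd5678ef00ULL;
    table[0] = encrypt_ptr((uint64_t)(uintptr_t)&target);

    churn(table, 1);                                  /* re-randomize under the program */
    int *p = (int *)(uintptr_t)decrypt_ptr(table[0]); /* legitimate use still decodes   */
    printf("%d\n", *p);
    return 0;
}
```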
4. ANVIL
- Author
-
Reetuparna Das, Zelalem Birhanu Aweke, Yossi Oren, Matthew Hicks, Todd Austin, Salessawi Ferede Yitbarek, and Rui Qiao
- Subjects
010302 applied physics ,Hardware_MEMORYSTRUCTURES ,Computer science ,business.industry ,Locality ,General Medicine ,02 engineering and technology ,Computer security ,computer.software_genre ,Computer Graphics and Computer-Aided Design ,01 natural sciences ,Refresh rate ,020202 computer hardware & architecture ,Software ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,General Earth and Planetary Sciences ,Code injection ,Cache ,business ,computer ,Row ,Dram ,General Environmental Science - Abstract
Ensuring the integrity and security of the memory system is critical. Recent studies have shown serious security concerns due to "rowhammer" attacks, where repeated accesses to a row of memory cause bit flips in adjacent rows. Recent work by Google's Project Zero has shown how to leverage rowhammer-induced bit-flips as the basis for security exploits that include malicious code injection and memory privilege escalation. Because rowhammer is an important security concern, industry has attempted to defend against these attacks. Deployed defenses employ two strategies: (1) doubling the system DRAM refresh rate and (2) restricting access to the CLFLUSH instruction that attackers use to bypass the cache to increase memory access frequency (i.e., the rate of rowhammering). We demonstrate that such defenses are inadequate: we implement rowhammer attacks that both avoid using the CLFLUSH instruction and cause bit flips with a doubled refresh rate. Our next-generation CLFLUSH-free rowhammer attack bypasses the cache by manipulating cache replacement state to allow frequent misses out of the last-level cache to DRAM rows of our choosing. To protect existing systems from more advanced rowhammer attacks, we develop a software-based defense, ANVIL, which thwarts all known rowhammer attacks on existing systems. ANVIL detects rowhammer attacks by tracking the locality of DRAM accesses using existing hardware performance counters. Our detector identifies the rows being frequently accessed (i.e., the aggressors), then selectively refreshes the nearby victim rows to prevent hammering. Experiments running on real hardware with the SPEC2006 benchmarks show that ANVIL has less than a 1% false positive rate and an average slowdown of 1%. ANVIL is low-cost and robust, and our experiments indicate that it is an effective approach for protecting existing and future systems from even advanced rowhammer attacks.
- Published
- 2016
- Full Text
- View/download PDF
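The ANVIL detection flow described above is essentially a two-stage filter over hardware performance-counter data. The sketch below mimics that flow with stubbed-in counter reads and a print statement standing in for the victim-row refresh; the thresholds and helper functions are invented for illustration and are not the authors' interfaces.

```c
/* Hedged sketch of an ANVIL-style two-stage detector: (1) a cheap check on
 * the last-level-cache miss rate, (2) a locality check over sampled miss
 * addresses, then a selective refresh of neighboring rows. Counter reads and
 * the refresh action are stubbed; the real system uses hardware performance
 * counters and DRAM geometry information. */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

#define MISS_RATE_THRESHOLD 100000u   /* misses per interval (assumed value) */
#define LOCALITY_THRESHOLD  0.5       /* fraction of misses on one row       */

/* Stubs standing in for perf-counter and address-translation plumbing. */
static uint64_t read_llc_miss_counter(void) { return 250000u; }
static size_t sample_missed_rows(uint64_t *rows, size_t max) {
    /* Pretend sampling saw mostly row 1000 (an aggressor) plus noise. */
    for (size_t i = 0; i < max; i++) rows[i] = (i % 4 == 0) ? 2000 + i : 1000;
    return max;
}
static void refresh_row(uint64_t row) {
    printf("refresh victim row %llu\n", (unsigned long long)row);
}

int main(void) {
    if (read_llc_miss_counter() < MISS_RATE_THRESHOLD) return 0;  /* first stage */

    uint64_t rows[64];
    size_t n = sample_missed_rows(rows, 64);

    for (size_t i = 0; i < n; i++) {                              /* second stage */
        size_t same = 0;
        for (size_t j = 0; j < n; j++) if (rows[j] == rows[i]) same++;
        if ((double)same / (double)n >= LOCALITY_THRESHOLD) {
            refresh_row(rows[i] - 1);   /* victims are the adjacent rows */
            refresh_row(rows[i] + 1);
            break;
        }
    }
    return 0;
}
```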
5. Vulnerability-tolerant secure architectures
- Author
-
Todd Austin, Sharad Malik, Mohit Tiwari, Valeria Bertacco, and Baris Kasikci
- Subjects
Computer science ,Offensive ,Vulnerability ,020207 software engineering ,02 engineering and technology ,Computer security ,computer.software_genre ,020202 computer hardware & architecture ,Immune system ,Software bug ,0202 electrical engineering, electronic engineering, information engineering ,Systems design ,State (computer science) ,Speculation ,computer - Abstract
Today, secure systems are built by identifying potential vulnerabilities and then adding protections to thwart the associated attacks. Unfortunately, the complexity of today's systems makes it impossible to prove that all attacks are stopped, so clever attackers find a way around even the most carefully designed protections. In this article, we take a sobering look at the state of secure system design, and ask ourselves why the "security arms race" never ends. The answer lies in our inability to develop adequate security verification technologies. We then examine an advanced defensive system in nature - the human immune system - and we discover that it does not remove vulnerabilities; rather, it adds offensive measures to protect the body when its vulnerabilities are penetrated. We close the article with brief speculation on how the human immune system could inspire more capable secure system designs.
- Published
- 2018
- Full Text
- View/download PDF
6. Energy efficient object detection on the mobile GP-GPU
- Author
-
Valeria Bertacco, Todd Austin, Jonathan Rose, and Fitsum Assamnew Andargie
- Subjects
Speedup ,business.product_category ,Computer science ,business.industry ,0211 other engineering and technologies ,021107 urban & regional planning ,02 engineering and technology ,computer.software_genre ,Object detection ,Instruction set ,03 medical and health sciences ,0302 clinical medicine ,Embedded system ,Laptop ,Operating system ,030212 general & internal medicine ,Mobile telephony ,Central processing unit ,Graphics ,business ,computer ,Efficient energy use - Abstract
Smartphones and tablets now include General Purpose Graphics Processing Units (GP-GPUs) that can be used for computation beyond driving the high-resolution screens. In this paper we present a mobile GP-GPU-based object detection algorithm and system, based on the work by Viola and Jones (which is also used in the OpenCV library). This implementation achieved a twofold speedup compared to OpenCV running on the CPU of the same smartphone, and up to 84% energy savings. Interestingly, the new implementation saves energy vs. the CPU even when it executes slower than the OpenCV implementation, because the GPU consumes less power than the CPU, something that is not typical in desktop or laptop systems.
- Published
- 2017
- Full Text
- View/download PDF
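The claim that a slower GP-GPU run can still save energy follows directly from energy being the product of power and time. The numbers in the snippet below are invented for illustration and are not measurements from the paper:

```c
/* Back-of-the-envelope illustration: a lower-power device can win on energy
 * even when it takes longer, because E = P x t. All values are hypothetical. */
#include <stdio.h>

int main(void) {
    double cpu_power_w = 2.0, cpu_time_s = 1.0;   /* hypothetical CPU run           */
    double gpu_power_w = 0.6, gpu_time_s = 1.5;   /* hypothetical GPU run, slower   */

    printf("CPU energy: %.2f J\n", cpu_power_w * cpu_time_s);   /* 2.00 J */
    printf("GPU energy: %.2f J\n", gpu_power_w * gpu_time_s);   /* 0.90 J */
    return 0;
}
```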
7. SNIFFER: A high-accuracy malware detector for enterprise-based systems
- Author
-
Valeria Bertacco, Harrison Davis, Evan Chavis, Salessawi Ferede Yitbarek, Matthew Hicks, Todd Austin, and Yijun Hou
- Subjects
Engineering ,business.industry ,Feature extraction ,computer.software_genre ,Computer security ,Cryptovirology ,Software ,Software deployment ,Server ,Code (cryptography) ,Malware ,Software system ,business ,computer - Abstract
In the continual battle between malware attacks and antivirus technologies, both sides strive to deploy their techniques at ever lower layers in the software system stack. The goal is to monitor and control the software executing in the levels above their own deployment, to detect attacks or to defeat defenses. Recent antivirus solutions have gone even below the software, by enlisting hardware support. However, so far, they have only mimicked classic software techniques by monitoring software clues of an attack. As a result, malware can easily defeat them by employing metamorphic manifestation patterns. With this work, we propose a hardware-monitoring solution, SNIFFER, which tracks malware manifestations in system-level behavior, rather than code patterns, and it thus cannot be circumvented unless malware renounces its very nature, that is, to attack. SNIFFER leverages in-hardware feature monitoring, and uses machine learning to assess whether a system shows signs of an attack. Experiments with a virtual SNIFFER implementation, which supports 13 features and tests against five common network-based malicious behaviors, show that SNIFFER detects malware nearly 100% of the time, unless the malware aggressively throttles its attack. Our experiments also highlight the need for machine-learning classifiers employing a range of diverse system features, as many of the tested malware require multiple, seemingly disconnected, features for accurate detection.
- Published
- 2017
- Full Text
- View/download PDF
8. Regaining Lost Cycles with HotCalls
- Author
-
Valeria Bertacco, Todd Austin, and Ofir Weisse
- Subjects
010302 applied physics ,Hardware security module ,Speedup ,business.industry ,Computer science ,020206 networking & telecommunications ,Cryptography ,Cloud computing ,02 engineering and technology ,computer.software_genre ,Encryption ,01 natural sciences ,System call ,Server ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Operating system ,Leverage (statistics) ,business ,computer - Abstract
Intel's SGX secure execution technology allows running computations on secret data using untrusted servers. While recent work showed how to port applications and large-scale computations to run under SGX, the performance implications of using the technology remain an open question. We present the first comprehensive quantitative study to evaluate the performance of SGX. We show that straightforward use of SGX library primitives for calling functions adds between 8,200 and 17,000 cycles of overhead, compared to 150 cycles for a typical system call. We quantify the performance impact of these library calls and show that in applications with high system call frequency, such as memcached, openVPN, and lighttpd, which all have high bandwidth network requirements, the performance degradation may be as high as 79%. We investigate the sources of this performance degradation by leveraging a new set of microbenchmarks for SGX-specific operations such as enclave entry-calls and out-calls, and encrypted memory I/O accesses. We leverage the insights we gain from these analyses to design a new SGX interface framework, HotCalls. HotCalls are based on a synchronization spin-lock mechanism and provide a 13-27x speedup over the default interface. It can easily be integrated into existing code, making it a practical solution. Compared to a baseline SGX implementation of memcached, openVPN, and lighttpd, we show that using the new interface boosts the throughput by 2.6-3.7x, and reduces application latency by 62-74%.
- Published
- 2017
- Full Text
- View/download PDF
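The abstract describes HotCalls as a spin-lock-based alternative to expensive enclave boundary crossings. The sketch below captures that shape with two ordinary threads sharing a lock-protected request slot; in the real system one side runs inside an SGX enclave, and the structure layout and names here are assumptions, not the published API.

```c
/* Hedged sketch of a HotCalls-style interface: instead of an expensive
 * enclave call per request, a worker thread ("inside the enclave") spins on a
 * shared, lock-protected request slot that the other side fills in. This toy
 * version uses C11 atomics and pthreads; compile with -pthread. */
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>
#include <stdbool.h>

typedef struct {
    atomic_flag lock;          /* spin-lock protecting the slot      */
    atomic_bool pending;       /* a request is waiting               */
    atomic_bool done;          /* shut down the worker               */
    int arg, result;
} hotcall_slot;

static hotcall_slot slot = { .lock = ATOMIC_FLAG_INIT };

static void *worker(void *p) {                 /* stands in for the enclave side */
    (void)p;
    for (;;) {
        while (atomic_flag_test_and_set(&slot.lock)) ;       /* spin */
        if (atomic_load(&slot.pending)) {
            slot.result = slot.arg * 2;                      /* the "hot call" body */
            atomic_store(&slot.pending, false);
        }
        bool finished = atomic_load(&slot.done);
        atomic_flag_clear(&slot.lock);
        if (finished) return NULL;
    }
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);

    while (atomic_flag_test_and_set(&slot.lock)) ;           /* issue a request */
    slot.arg = 21;
    atomic_store(&slot.pending, true);
    atomic_flag_clear(&slot.lock);

    while (atomic_load(&slot.pending)) ;                     /* wait for the answer */
    printf("result = %d\n", slot.result);

    atomic_store(&slot.done, true);
    pthread_join(t, NULL);
    return 0;
}
```

The point of the design, as the abstract reports, is that servicing a request costs a few shared-memory operations instead of a multi-thousand-cycle enclave exit and re-entry.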
9. When good protections go bad: Exploiting anti-DoS measures to accelerate rowhammer attacks
- Author
-
Misiker Tadesse Aga, Todd Austin, and Zelalem Birhanu Aweke
- Subjects
010302 applied physics ,Engineering ,business.industry ,Subtext ,02 engineering and technology ,computer.software_genre ,Computer security ,01 natural sciences ,020202 computer hardware & architecture ,Refresh rate ,Software ,Virtual machine ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Data Protection Act 1998 ,Cache ,business ,computer ,Dram ,Vulnerability (computing) ,computer.programming_language - Abstract
The rowhammer vulnerability, where repeated accesses to a DRAM row can speed the discharge of neighboring bits, has emerged as a significant security concern in the computing industry. To address the problem, computer and software vendors have: i) doubled DRAM refresh rates, ii) restricted access to virtual-to-physical page mappings, and iii) disabled access to cache-flush operations in sandboxed environments. While recent efforts have shown how to overcome each of these protections individually, machines today are protected from rowhammer attacks if they employ all three of these protections simultaneously. In this paper, we demonstrate the first rowhammer attack that overcomes all three of these protections when used in tandem. Our attack is a virtual-memory-based, cache-flush-free attack that is sufficiently fast to rowhammer even with a doubled refresh rate. The most astonishing aspect of our attack is that it is enabled by the recently introduced Cache Allocation Technology, a mechanism designed in part to protect virtual machines from inter-VM denial-of-service attacks. The subtext of this paper asks the question: "Is there any hope for system security, when the protections for one attack enable yet another?" We claim that the solution to this conundrum lies in the approach taken to protecting systems. Adopting a subtractive approach to secure systems, in contrast to additive measures, could go a long way toward building provably secure systems.
- Published
- 2017
- Full Text
- View/download PDF
10. A2: Analog Malicious Hardware
- Author
-
Matthew Hicks, Todd Austin, Qing Dong, Dennis Sylvester, and Kaiyuan Yang
- Subjects
Guard (information security) ,Analogue electronics ,business.industry ,Computer science ,Transistor ,020206 networking & telecommunications ,Hardware_PERFORMANCEANDRELIABILITY ,02 engineering and technology ,Integrated circuit design ,Computer security ,computer.software_genre ,020202 computer hardware & architecture ,law.invention ,Capacitor ,law ,Trojan ,Embedded system ,Hardware_INTEGRATEDCIRCUITS ,0202 electrical engineering, electronic engineering, information engineering ,business ,computer ,Computer hardware - Abstract
While the move to smaller transistors has been a boon for performance, it has dramatically increased the cost to fabricate chips using those smaller transistors. This forces the vast majority of chip design companies to trust a third party -- often overseas -- to fabricate their design. To guard against shipping chips with errors (intentional or otherwise), chip design companies rely on post-fabrication testing. Unfortunately, this type of testing leaves the door open to malicious modifications since attackers can craft attack triggers requiring a sequence of unlikely events, which will never be encountered by even the most diligent tester. In this paper, we show how a fabrication-time attacker can leverage analog circuits to create a hardware attack that is small (i.e., requires as little as one gate) and stealthy (i.e., requires an unlikely trigger sequence before effecting a chip's functionality). In the open spaces of an already placed and routed design, we construct a circuit that uses capacitors to siphon charge from nearby wires as they transition between digital values. When the capacitors fully charge, they deploy an attack that forces a victim flip-flop to a desired value. We weaponize this attack into a remotely-controllable privilege escalation by attaching the capacitor to an attacker-controllable wire and by selecting a victim flip-flop that holds the privilege bit for our processor. We implement this attack in an OR1200 processor and fabricate a chip. Experimental results show that our attacks work, show that our attacks elude activation by a diverse set of benchmarks, and suggest that our attacks evade known defenses.
- Published
- 2016
- Full Text
- View/download PDF
11. A case for unlimited watchpoints
- Author
-
Hongyi Xin, Todd Austin, Joseph L. Greathouse, and Yixin Luo
- Subjects
Computer science ,business.industry ,General Medicine ,computer.file_format ,computer.software_genre ,Computer Graphics and Computer-Aided Design ,Set (abstract data type) ,Range (mathematics) ,Taint checking ,Software ,Operating system ,Bitmap ,Cache ,Software analysis pattern ,business ,computer - Abstract
Numerous tools have been proposed to help developers fix software errors and inefficiencies. Widely-used techniques such as memory checking suffer from overheads that limit their use to pre-deployment testing, while more advanced systems have such severe performance impacts that they may require special-purpose hardware. Previous works have described hardware that can accelerate individual analyses, but such specialization stymies adoption; generalized mechanisms are more likely to be added to commercial processors. This paper demonstrates that the ability to set an unlimited number of fine-grain data watchpoints can reduce the runtime overheads of numerous dynamic software analysis techniques. We detail the watchpoint capabilities required to accelerate these analyses while remaining general enough to be useful in the future. We describe a hardware design that stores watchpoints in main memory and utilizes two different on-chip caches to accelerate performance. The first is a bitmap lookaside buffer that stores fine-grained watchpoints, while the second is a range cache that can efficiently hold large contiguous regions of watchpoints. As an example of the power of such a system, it is possible to use watchpoints to accelerate read/write set checks in a software data race detector by nearly 9x.
- Published
- 2012
- Full Text
- View/download PDF
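The proposal above keeps watchpoints in main memory and caches them on chip. A purely software version of the same bookkeeping looks like the sketch below: one bit per watched byte, consulted on every instrumented access. Sizes and function names are made up; the hardware design replaces the explicit checks with a bitmap lookaside buffer and a range cache.

```c
/* Hedged sketch of fine-grained data watchpoints backed by a bitmap. Here the
 * bitmap lives in ordinary memory and every instrumented store consults it in
 * software; the paper's hardware accelerates exactly this lookup. */
#include <stdio.h>
#include <stdint.h>

#define MEM_BYTES 4096
static uint8_t watch_bitmap[MEM_BYTES / 8];     /* one bit per watched byte */

static void set_watchpoint(uintptr_t addr, size_t len) {
    for (size_t i = 0; i < len; i++)
        watch_bitmap[(addr + i) / 8] |= (uint8_t)(1u << ((addr + i) % 8));
}

static int is_watched(uintptr_t addr) {
    return (watch_bitmap[addr / 8] >> (addr % 8)) & 1u;
}

/* An instrumented store: report to the analysis tool if the byte is watched. */
static void checked_store(uint8_t *mem, uintptr_t addr, uint8_t val) {
    if (is_watched(addr))
        printf("watchpoint hit: write to offset %lu\n", (unsigned long)addr);
    mem[addr] = val;
}

int main(void) {
    static uint8_t mem[MEM_BYTES];
    set_watchpoint(128, 16);          /* e.g., a race detector watching a field */
    checked_store(mem, 100, 1);       /* unwatched: no report */
    checked_store(mem, 130, 2);       /* watched: reported    */
    return 0;
}
```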
12. Demand-driven software race detection using hardware performance counters
- Author
-
Ramesh Peri, Zhiqiang Ma, Todd Austin, Matthew I. Frank, and Joseph L. Greathouse
- Subjects
Computer science ,business.industry ,Concurrency ,Thread (computing) ,General Medicine ,computer.software_genre ,Instruction set ,Software ,Embedded system ,Operating system ,Benchmark (computing) ,Cache ,business ,computer ,Computer hardware ,Cache coherence - Abstract
Dynamic data race detectors are an important mechanism for creating robust parallel programs. Software race detectors instrument the program under test, observe each memory access, and watch for inter-thread data sharing that could lead to concurrency errors. While this method of bug hunting can find races that are normally difficult to observe, it also suffers from high runtime overheads. It is not uncommon for commercial race detectors to experience 300x slowdowns, limiting their usage. This paper presents a hardware-assisted demand-driven race detector. We are able to observe cache events that are indicative of data sharing between threads by taking advantage of hardware available on modern commercial microprocessors. We use these to build a race detector that is only enabled when it is likely that inter-thread data sharing is occurring. When little sharing takes place, this demand-driven analysis is much faster than contemporary continuous-analysis tools without a large loss of detection accuracy. We modified the race detector in Intel(R) Inspector XE to utilize our hardware-based sharing indicator and were able to achieve performance increases of 3x and 10x in two parallel benchmark suites and 51x for one particular program.
- Published
- 2011
- Full Text
- View/download PDF
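The key idea above is to leave the heavyweight race detector disabled until hardware counters suggest that threads are actually sharing data. The sketch below gates a (stubbed) analysis callback on a fake sharing-event counter; the threshold and counter interface are assumptions, since real deployments program the CPU's performance-monitoring unit.

```c
/* Hedged sketch of demand-driven analysis: keep the expensive race detector
 * off until a counter of cross-thread sharing events (e.g. cache-to-cache
 * transfers) exceeds a threshold. Counter values here are simulated. */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define SHARING_THRESHOLD 1000u          /* events per interval (assumed) */

static uint64_t fake_hitm_counter = 0;   /* stands in for a PMU counter   */
static uint64_t read_sharing_events(void) { return fake_hitm_counter += 700; }

static bool detector_enabled = false;

static void on_memory_access(uintptr_t addr) {
    if (!detector_enabled) return;                   /* fast path: no analysis */
    printf("race detector: instrumenting access to %#lx\n", (unsigned long)addr);
}

int main(void) {
    for (int interval = 0; interval < 3; interval++) {
        detector_enabled = read_sharing_events() >= SHARING_THRESHOLD;
        printf("interval %d: detector %s\n", interval,
               detector_enabled ? "ON" : "off");
        on_memory_access(0x1000 + interval);
    }
    return 0;
}
```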
13. A Flexible Software-Based Framework for Online Detection of Hardware Defects
- Author
-
Valeria Bertacco, Onur Mutlu, Todd Austin, and Kypros Constantinides
- Subjects
business.industry ,Computer science ,Firmware ,media_common.quotation_subject ,Reliability (computer networking) ,Control reconfiguration ,Hardware_PERFORMANCEANDRELIABILITY ,computer.software_genre ,Theoretical Computer Science ,law.invention ,Microprocessor ,Software ,Computational Theory and Mathematics ,Debugging ,Hardware and Architecture ,law ,Embedded system ,business ,computer ,Computer hardware ,media_common ,Register-transfer level - Abstract
This work proposes a new, software-based, defect detection and diagnosis technique. We introduce a novel set of instructions, called access-control extensions (ACE), that can access and control the microprocessor's internal state. Special firmware periodically suspends microprocessor execution and uses the ACE instructions to run directed tests on the hardware. When a hardware defect is present, these tests can diagnose and locate it, and then activate system repair through resource reconfiguration. The software nature of our framework makes it flexible: testing techniques can be modified/upgraded in the field to trade-off performance with reliability without requiring any change to the hardware. We describe and evaluate different execution models for using the ACE framework. We also describe how the proposed ACE framework can be extended and utilized to improve the quality of post-silicon debugging and manufacturing testing of modern processors. We evaluated our technique on a commercial chip-multiprocessor based on Sun's Niagara and found that it can provide very high coverage, with 99.22 percent of all silicon defects detected. Moreover, our results show that the average performance overhead of software-based testing is only 5.5 percent. Based on a detailed register transfer level (RTL) implementation of our technique, we find its area and power consumption overheads to be modest, with a 5.8 percent increase in total chip area and a 4 percent increase in the chip's overall power consumption.
- Published
- 2009
- Full Text
- View/download PDF
14. Locking down insecure indirection with hardware-based control-data isolation
- Author
-
Reetuparna Das, William Arthur, Todd Austin, and Sahil Madeka
- Subjects
Indirection ,Exploit ,business.industry ,Memoization ,Computer science ,Program transformation ,Parallel computing ,computer.software_genre ,Control flow ,Embedded system ,Code injection ,Compiler ,Cache ,Isolation (database systems) ,Programmer ,business ,computer - Abstract
Arbitrary code injection remains a central issue in computer security, as attackers seek to exploit the software attack surface. A key component in many exploits today is the successful execution of a control-flow attack. Control-Data Isolation (CDI) has emerged as an approach that eliminates the root cause of contemporary control-flow attacks: indirect control flow instructions. These instructions are replaced by direct control flow edges dictated by the programmer and encoded into the application by the compiler. By subtracting the root cause of control-flow attacks, Control-Data Isolation sidesteps the vulnerabilities and restrictive threat models adopted by other solutions in this space (e.g., Control-Flow Integrity). The CDI approach, while eliminating contemporary control-flow attacks, introduces non-trivial overheads to validate indirect targets at runtime. In this work we introduce novel architectural support to accelerate the execution of CDI-compliant code. Through the addition of an edge cache, we are able to cache legal indirect target edges and eliminate nearly all execution overhead for indirection-free applications. We demonstrate that through memoization of compiler-confirmed control flow transitions, overheads are reduced from 19% to 0.5% on average for Control-Data Isolated applications. Additionally, we show that the edge cache can efficiently provide the double duty of predicting multi-way branch targets, thus even providing speedups for some CDI-compliant executions, compared to an architecture with unsophisticated indirect control prediction (e.g., BTB).
- Published
- 2015
- Full Text
- View/download PDF
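A software rendering of the validation step described above: an indirect branch is permitted only if its (source, target) edge appears in a compiler-emitted table, and a small direct-mapped cache memoizes recently validated edges so the common case skips the table walk. The table contents, sizes, and hash are illustrative assumptions, not the paper's encoding.

```c
/* Hedged sketch of CDI-style indirect target validation with a tiny "edge
 * cache" memoizing compiler-confirmed control-flow edges. */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t src, dst; } edge;

static const edge legal_edges[] = {          /* emitted by the compiler (made up) */
    { 0x10, 0x40 }, { 0x10, 0x80 }, { 0x20, 0x40 },
};
#define N_EDGES (sizeof legal_edges / sizeof legal_edges[0])

#define CACHE_SLOTS 8
static edge edge_cache[CACHE_SLOTS];         /* the memoization structure */

static bool edge_is_legal(uint32_t src, uint32_t dst) {
    unsigned slot = (src ^ dst) % CACHE_SLOTS;
    if (edge_cache[slot].src == src && edge_cache[slot].dst == dst)
        return true;                         /* edge-cache hit: no table walk */
    for (size_t i = 0; i < N_EDGES; i++)
        if (legal_edges[i].src == src && legal_edges[i].dst == dst) {
            edge_cache[slot] = legal_edges[i];
            return true;
        }
    return false;                            /* would raise a security fault */
}

int main(void) {
    printf("0x10 -> 0x40: %s\n", edge_is_legal(0x10, 0x40) ? "ok" : "BLOCKED");
    printf("0x10 -> 0x99: %s\n", edge_is_legal(0x10, 0x99) ? "ok" : "BLOCKED");
    return 0;
}
```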
15. Keynote talk I: Ending the Tyranny of Amdahl's Law
- Author
-
Todd Austin
- Subjects
Value (ethics) ,symbols.namesake ,Amdahl's law ,Work (electrical) ,Computer science ,Scale (chemistry) ,Scalability ,symbols ,Operating system ,Computer security ,computer.software_genre ,computer - Abstract
If the computing industry wants to continue to make scalability the primary source of value in tomorrow's computing systems, we will have to quickly find new and productive ways to scale the serial portions of important applications. In this talk, I will highlight my work and the work of others to do just this through the application of heterogeneous parallel designs. Of course, we will want to address the scalability of sequential codes, but future scalability success will ultimately hinge on how we address the scalability of future applications through more affordable design and manufacturing techniques.
- Published
- 2015
- Full Text
- View/download PDF
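For reference, the law in the title: if a fraction p of a program's work parallelizes perfectly across n processors, the overall speedup is bounded by the serial remainder, which is why the talk argues for scaling the serial portions.

```latex
S(n) = \frac{1}{(1 - p) + \frac{p}{n}}, \qquad \lim_{n \to \infty} S(n) = \frac{1}{1 - p}
```

For example, with p = 0.95 the speedup can never exceed 20x, no matter how many processors are added.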
16. On Architectural Support for Systems Security
- Author
-
Todd Austin and Mohit Tiwari
- Subjects
Architectural pattern ,Hardware and Architecture ,Computer science ,Covert channel ,Side channel attack ,Electrical and Electronic Engineering ,Systems modeling ,Architectural support ,Architectural technology ,Computer security ,computer.software_genre ,computer ,Software - Published
- 2016
- Full Text
- View/download PDF
17. SimpleScalar: an infrastructure for computer system modeling
- Author
-
Daniel J. Ernst, Eric D. Larson, and Todd Austin
- Subjects
Flexibility (engineering) ,Correctness ,General Computer Science ,Computer science ,business.industry ,Software performance testing ,Systems modeling ,computer.software_genre ,Microarchitecture ,Instruction set ,Software ,Computer architecture ,x86 ,business ,computer ,Interpreter - Abstract
Designers can execute programs on software models to validate a proposed hardware design's performance and correctness, while programmers can use these models to develop and test software before the real hardware becomes available. Three critical requirements drive the implementation of a software model: performance, flexibility, and detail. Performance determines the amount of workload the model can exercise given the machine resources available for simulation. Flexibility indicates how well the model is structured to simplify modification, permitting design variants or even completely different designs to be modeled with ease. Detail defines the level of abstraction used to implement the model's components. The SimpleScalar tool set provides an infrastructure for simulation and architectural modeling. It can model a variety of platforms ranging from simple unpipelined processors to detailed dynamically scheduled microarchitectures with multiple-level memory hierarchies. SimpleScalar simulators reproduce computing device operations by executing all program instructions using an interpreter. The tool set's instruction interpreters also support several popular instruction sets, including Alpha, PPC, x86, and ARM.
- Published
- 2002
- Full Text
- View/download PDF
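The abstract notes that SimpleScalar simulators reproduce device behavior by interpreting every program instruction. The skeleton below shows that interpreter style on a made-up three-instruction ISA; it is only a conceptual sketch, not SimpleScalar code, which models full ISAs (Alpha, PPC, x86, ARM) and detailed timing.

```c
/* Hedged sketch of an instruction interpreter: fetch, decode, and execute
 * each instruction of a toy ISA while keeping simulator statistics. */
#include <stdio.h>
#include <stdint.h>

enum { OP_ADDI, OP_JMP, OP_HALT };
typedef struct { uint8_t op; uint8_t rd; int32_t imm; } insn;

int main(void) {
    insn program[] = {               /* a made-up three-instruction program */
        { OP_ADDI, 1, 5 }, { OP_ADDI, 1, 7 }, { OP_HALT, 0, 0 },
    };
    int32_t regs[8] = {0};
    uint32_t pc = 0;
    uint64_t sim_insn = 0;           /* statistics, as a simulator would keep */

    for (;;) {
        insn i = program[pc];        /* fetch  */
        sim_insn++;
        switch (i.op) {              /* decode + execute */
        case OP_ADDI: regs[i.rd] += i.imm; pc++; break;
        case OP_JMP:  pc = (uint32_t)i.imm;       break;
        case OP_HALT: printf("r1=%d after %llu instructions\n",
                             regs[1], (unsigned long long)sim_insn);
                      return 0;
        }
    }
}
```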
18. Challenges in processor modeling and validation [Guest Editors' introduction]
- Author
-
T.M. Conte, Pradip Bose, and Todd Austin
- Subjects
Computer science ,Hardware description language ,Pipeline (software) ,Microarchitecture ,Instruction set ,Computer architecture ,Hardware and Architecture ,VHDL ,Verilog ,Electrical and Electronic Engineering ,Hardware_REGISTER-TRANSFER-LEVELIMPLEMENTATION ,Reference model ,computer ,Software ,Abstraction (linguistics) ,computer.programming_language - Abstract
The methodology for designing state-of-the-art microprocessors involves modeling at various levels of abstraction. In the pre-synthesis phase, this can range from early-stage (microarchitectural) performance-only models to final-stage, detailed register-transfer-level (RTL) models. Hierarchical modeling requires the use of an elaborate validation methodology to ensure inter- and intra-level model integrity. The RTL model, often coded in a hardware description language (e.g. Verilog or VHDL) captures the logical behavior of the entire chip: both in terms of function and cycle-by-cycle pipeline flow timing. It is this model that is subjected to simulation-based architectural validation prior to actual "tape-out" of the processor. The validated RTL specification is used as the source reference model for synthesizing the gate- and circuit-level descriptions of the processor.
- Published
- 1999
- Full Text
- View/download PDF
19. The SimpleScalar tool set, version 2.0
- Author
-
Doug Burger and Todd Austin
- Subjects
Computer science ,business.industry ,Re-order buffer ,Instruction window ,General Medicine ,computer.software_genre ,Set (abstract data type) ,Software portability ,Documentation ,Operating system ,Trace Cache ,Software engineering ,business ,computer - Abstract
This document describes release 2.0 of the SimpleScalar tool set, a suite of free, publicly available simulation tools that offer both detailed and high-performance simulation of modern microprocessors. The new release offers more tools and capabilities, precompiled binaries, cleaner interfaces, better documentation, easier installation, improved portability, and higher performance. This paper contains a complete description of the tool set, including retrieval and installation instructions, a description of how to use the tools, a description of the target SimpleScalar architecture, and many details about the internals of the tools and how to customize them. With this guide, the tool set can be brought up and generating results in under an hour (on supported platforms).
- Published
- 1997
- Full Text
- View/download PDF
20. Mobile supercomputers
- Author
-
Chaitali Chakrabarti, Scott Mahlke, Trevor Mudge, D. Blaauw, Wayne Wolf, and Todd Austin
- Subjects
Product (business) ,General Computer Science ,Computer science ,Operating system ,Mobile computing ,computer.software_genre ,Computer security ,computer - Abstract
We need mobile supercomputers that provide massive computational performance from the power in a battery. These supercomputers will make our personal devices much easier to use. They will perform real-time speech recognition, video transmission and analysis, and high bandwidth communication. And they will do so without us having to worry about where the next electrical outlet will be. But to achieve this functionality, we must rethink the way we design computers. Rather than worrying solely about performance, with the occasional nod to power consumption and cost, we need to judge computers by their performance-power-cost product. This new way of looking at processors will lead us to new computer architectures and new ways of thinking about computer system design.
- Published
- 2004
- Full Text
- View/download PDF
21. Schnauzer: scalable profiling for likely security bug sites
- Author
-
Valeria Bertacco, William Arthur, Todd Austin, Ricardo Rodriguez, and Biruk Mammo
- Subjects
Profiling (computer programming) ,Security bug ,Exploit ,Computer science ,business.industry ,Vulnerability management ,Computer security ,computer.software_genre ,Software ,Software bug ,Software security assurance ,Scalability ,business ,computer - Abstract
Software bugs comprise the greatest threat to computer security today. Though enormous effort has been expended on eliminating security exploits, contemporary testing techniques are insufficient to deliver software free of security vulnerabilities. In this paper we propose a novel approach to security vulnerability analysis: dynamic control frontier profiling. Security exploits are often buried in rarely executed code paths hidden just beyond the path space explored by end-users. Therefore, we develop Schnauzer, a distributed sampling technology to discover the dynamic control frontier, which forms the line of demarcation between dynamically executed and unseen paths. This frontier may then be used to direct tools (such as white-box fuzz testers) to attain a level of testing coverage currently unachievable. We further demonstrate that the dynamic control frontier paths are a rich source of security bugs, sensitizing many known security exploits.
- Published
- 2013
- Full Text
- View/download PDF
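The dynamic control frontier described above can be computed from aggregated branch-direction profiles: a frontier point is a branch where one outcome has been observed and the other never has. The toy profile and field names below are invented for illustration; the real system aggregates samples across many users' executions.

```c
/* Hedged sketch: flag "frontier" branches, where exactly one direction has
 * been observed in the aggregated profiles, as candidates for directed
 * (e.g. white-box fuzz) testing. */
#include <stdio.h>
#include <stdbool.h>

typedef struct { unsigned pc; bool taken_seen; bool not_taken_seen; } branch_profile;

int main(void) {
    branch_profile prof[] = {            /* aggregated from sampled runs (made up) */
        { 0x100, true,  true  },         /* both paths exercised          */
        { 0x200, true,  false },         /* frontier: fall-through unseen */
        { 0x300, false, true  },         /* frontier: taken path unseen   */
    };
    for (unsigned i = 0; i < sizeof prof / sizeof prof[0]; i++)
        if (prof[i].taken_seen != prof[i].not_taken_seen)
            printf("frontier branch at %#x: direct testing here\n", prof[i].pc);
    return 0;
}
```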
22. Efficient detection of all pointer and array access errors
- Author
-
Gurindar S. Sohi, Todd Austin, and Scott E. Breach
- Subjects
Computer science ,Programming language ,Interpreted language ,computer.software_genre ,Computer Graphics and Computer-Aided Design ,Bounds checking ,Dangling pointer ,Pointer (computer programming) ,Escape analysis ,Compiler ,Pointer analysis ,computer ,Memory safety ,Software - Abstract
We present a pointer and array access checking technique that provides complete error coverage through a simple set of program transformations. Our technique, based on an extended safe pointer representation, has a number of novel aspects. Foremost, it is the first technique that detects all spatial and temporal access errors. Its use is not limited by the expressiveness of the language; that is, it can be applied successfully to compiled or interpreted languages with subscripted and mutable pointers, local references, and explicit and typeless dynamic storage management, e.g., C. Because it is a source level transformation, it is amenable to both compile- and run-time optimization. Finally, its performance, even without compile-time optimization, is quite good. We implemented a prototype translator for the C language and analyzed the checking overheads of six non-trivial, pointer intensive programs. Execution overheads range from 130% to 540%; with text and data size overheads typically below 100%.
- Published
- 1994
- Full Text
- View/download PDF
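A minimal sketch of the kind of extended pointer representation the abstract describes, limited to spatial checking: the transformed program carries bounds with each pointer and checks them on every dereference. The struct layout and helper names are assumptions; the published scheme also tracks storage class and a capability for temporal (use-after-free) checks.

```c
/* Hedged sketch of a "safe pointer" transformation for spatial checking:
 * each pointer carries the bounds of its referent, and loads are checked. */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    char  *value;      /* the raw pointer               */
    char  *base;       /* start of the referent         */
    size_t size;       /* size of the referent in bytes */
} safe_ptr;

static safe_ptr safe_malloc(size_t n) {
    char *p = malloc(n);
    if (!p) exit(1);
    return (safe_ptr){ .value = p, .base = p, .size = n };
}

static char checked_load(safe_ptr p, size_t width) {
    if (p.value < p.base || p.value + width > p.base + p.size) {
        fprintf(stderr, "spatial access error\n");
        exit(1);
    }
    return *p.value;
}

int main(void) {
    safe_ptr p = safe_malloc(4);
    p.value += 2;
    checked_load(p, 1);      /* in bounds: fine            */
    p.value += 10;
    checked_load(p, 1);      /* out of bounds: caught here */
    return 0;
}
```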
23. The potential of sampling for dynamic analysis
- Author
-
Todd Austin and Joseph L. Greathouse
- Subjects
education.field_of_study ,Offset (computer science) ,Risk analysis (engineering) ,Computer science ,Research community ,Population ,Data mining ,education ,computer.software_genre ,computer ,Dynamic software - Abstract
This paper presents an argument for distributing dynamic software analyses to large populations of users in order to locate bugs that cause security flaws. We review a collection of dynamic analysis systems and show that, despite a great deal of effort from the research community, their performance is still too low to allow their use in the field. We then show that there are effective sampling mechanisms for accelerating a wide range of powerful dynamic analyses. These mechanisms reduce the rate at which errors are observed by individual analyses, but this loss can be offset by the subsequent increase in test population. Nevertheless, there are unsolved issues in this domain that deserve attention if this technique is to be widely utilized.
- Published
- 2011
- Full Text
- View/download PDF
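A back-of-the-envelope way to see why a large test population can offset per-user sampling (this model is an illustration, not taken from the paper): if a fully instrumented run would observe a given bug with probability p, and each of N users runs an analysis sampled down to a fraction s of its checks, then the chance that at least one user catches the bug is roughly

```latex
P_{\mathrm{detect}} \approx 1 - (1 - s\,p)^{N}
```

so even at s = 0.01, a population of a few thousand users recovers most of the detection power of a single heavyweight run, while each user pays only a small overhead.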
24. What input-language is the best choice for high level synthesis (HLS)?
- Author
-
Steve Svoboda, Todd Austin, and D.D. Gajski
- Subjects
Point (typography) ,Programming language ,Computer science ,Hardware description language ,Control (management) ,SystemVerilog ,computer.software_genre ,SystemC ,High-level synthesis ,Key (cryptography) ,Hardware_REGISTER-TRANSFER-LEVELIMPLEMENTATION ,computer ,Hardware_LOGICDESIGN ,computer.programming_language - Abstract
As of 2010, over 30 of the world's top semiconductor / systems companies have adopted HLS. In 2009, SoC tape-outs containing IPs developed using HLS exceeded 50 for the first time. Now that the practicality and value of HLS is established, engineers are turning to the question of "what input-language works best?" The answer is critical because it drives key decisions regarding the tool/methodology infrastructure companies will create around this new flow. ANSI-C/C++ advocates cite ease of learning and simulation speed. SystemC advocates make similar claims, and point to SystemC's hardware-oriented features. Proponents of BSV (Bluespec SystemVerilog) claim that the language enhances architectural transparency and control. To maximize the benefits of HLS, companies must consider many factors and tradeoffs.
- Published
- 2010
- Full Text
- View/download PDF
25. Using introspective software-based testing for post-silicon debug and repair
- Author
-
Todd Austin and Kypros Constantinides
- Subjects
business.industry ,Computer science ,Firmware ,Hardware_PERFORMANCEANDRELIABILITY ,Application software ,computer.software_genre ,Maintenance engineering ,Software quality ,law.invention ,Instruction set ,Microprocessor ,Software ,law ,Embedded system ,Microcode ,business ,computer - Abstract
As silicon process technology scales deeper into the nanometer regime, hardware defects are becoming more common, to the point of threatening yield rates and product lifetimes. Introspective software mechanisms hold great promise to address these reliability challenges with both low cost and high coverage. To address these challenges, we have developed a novel instruction set enhancement, called Access-Control Extensions (ACE), that can access and control a microprocessor's internal state. Using ACE technology, special firmware can periodically probe the microprocessor during execution to locate run-time faults, repair design errors (even those discovered in the field), and streamline manufacturing tests.
- Published
- 2010
- Full Text
- View/download PDF
26. Session details: Computation in the post-Turing era
- Author
-
Todd Austin
- Subjects
Theoretical computer science ,Computer science ,Computation ,Session (computer science) ,Turing ,computer ,computer.programming_language - Published
- 2009
- Full Text
- View/download PDF
27. Online design bug detection: RTL analysis, flexible mechanisms, and evaluation
- Author
-
Onur Mutlu, Kypros Constantinides, and Todd Austin
- Subjects
Computer science ,business.industry ,Bebugging ,Multiprocessing ,Integrated circuit design ,computer.software_genre ,Logic synthesis ,Software bug ,Embedded system ,Operating system ,OpenSPARC ,Algorithm design ,business ,computer ,Formal verification - Abstract
Higher levels of resource integration and the addition of new features in modern multi-processors put a significant pressure on their verification. Although a large amount of resources and time are devoted to the verification phase of modern processors, many design bugs escape the verification process and slip into processors operating in the field. These design bugs often lead to lower quality products, lower customer satisfaction, diminishing brand/company reputation, or even expensive product recalls. This paper proposes a flexible, low-overhead mechanism to detect the occurrence of design bugs during on-line operation. First, we analyze the actual design bugs found and fixed in a commercial chip multiprocessor, Sun's OpenSPARC T1, to understand the behavior and characteristics of design bugs. Our RTL analysis of design bugs shows that the number of signals that need to be monitored to detect design bugs is significantly larger than suggested by previous studies that analyzed design bugs at a higher level using processor errata sheets. Second, based on the insights obtained from our analyses, we propose a programmable, distributed online design bug detection mechanism that incorporates the monitoring of bugs into the flip-flops of the design. The key contribution of our mechanism is its ability to monitor all control signals in the design rather than a set of signals selected at design time. As a result, it is very flexible: when a bug is discovered after the processor is shipped, it can be detected by monitoring the set of control signals that trigger the design bug. We develop an RTL prototype implementation of our mechanism on the OpenSPARC T1 chip multiprocessor. We found its area overhead to be 10% and its power consumption overhead to be 3.5% over the whole OpenSPARC T1 chip.
- Published
- 2008
- Full Text
- View/download PDF
28. Testudo: Heavyweight security analysis via statistical sampling
- Author
-
Ilya Wagner, Joseph L. Greathouse, Gautam Bhatnagar, Todd Austin, Seth Pettie, Valeria Bertacco, and David A. Ramos
- Subjects
education.field_of_study ,Security analysis ,Computer science ,business.industry ,Dataflow ,media_common.quotation_subject ,Population ,computer.software_genre ,Taint checking ,Debugging ,Software bug ,Embedded system ,Operating system ,Instrumentation (computer programming) ,Cache ,business ,education ,computer ,media_common ,Data-flow analysis - Abstract
Heavyweight security analysis systems, such as taint analysis and dynamic type checking, are powerful technologies used to detect security vulnerabilities and software bugs. Traditional software implementations of these systems have high instrumentation overhead and suffer from significant performance impacts. To mitigate these slowdowns, a few hardware-assisted techniques have been recently proposed. However, these solutions incur a large memory overhead and require hardware platform support in the form of tagged memory systems and extended bus designs. Due to these costs and limitations, the deployment of heavyweight security analysis solutions is, as of today, limited to the research lab. In this paper, we describe Testudo, a novel hardware approach to heavyweight security analysis that is based on statistical sampling of a program's dataflow. Our dynamic distributed debugging reduces the memory overhead to a small storage space by selectively sampling only a few tagged variables to analyze during any particular execution of the program. Our system requires only small hardware modifications: it adds a small sample cache to the main processor and extends the pipeline registers to propagate analysis tags. To gain high analysis coverage, we rely on a population of users to run the program, sampling a different random set of variables during each new run. We show that we can achieve high coverage analysis at virtually no performance impact, even with a reasonably-sized population of users. In addition, our approach even scales to heavyweight debugging techniques by keeping per-user runtime overheads low despite performing traditionally costly analyses. Moreover, the low hardware cost of our implementation allows it to be easily distributed across large user populations, leading to a higher level of security analysis coverage than previously.
- Published
- 2008
- Full Text
- View/download PDF
29. Session details: Instruction-set optimisations
- Author
-
Georgi Gaydadjiev and Todd Austin
- Subjects
Instruction set ,Multimedia ,Computer science ,Session (computer science) ,computer.software_genre ,computer - Published
- 2008
- Full Text
- View/download PDF
30. Low-Cost Epoch-Based Correlation Prefetching for Commercial Applications
- Author
-
Valeria Bertacco, Todd Austin, Kypros Constantinides, and Onur Mutlu
- Subjects
business.industry ,Computer science ,Firmware ,Reliability (computer networking) ,Overhead (engineering) ,Process (computing) ,Hardware_PERFORMANCEANDRELIABILITY ,computer.software_genre ,Software quality ,law.invention ,Instruction set ,Microprocessor ,Software ,law ,Embedded system ,business ,computer ,Computer hardware - Abstract
As silicon process technology scales deeper into the nanometer regime, hardware defects are becoming more common. Such defects are bound to hinder the correct operation of future processor systems, unless new online techniques become available to detect and to tolerate them while preserving the integrity of software applications running on the system. This paper proposes a new, software-based, defect detection and diagnosis technique. We introduce a novel set of instructions, called Access-Control Extension (ACE), that can access and control the microprocessor's internal state. Special firmware periodically suspends microprocessor execution and uses the ACE instructions to run directed tests on the hardware. When a hardware defect is present, these tests can diagnose and locate it, and then activate system repair through resource reconfiguration. The software nature of our framework makes it flexible: testing techniques can be modified/upgraded in the field to trade off performance with reliability without requiring any change to the hardware. We evaluated our technique on a commercial chip-multiprocessor based on Sun's Niagara and found that it can provide very high coverage, with 99.22% of all silicon defects detected. Moreover, our results show that the average performance overhead of software-based testing is only 5.5%. Based on a detailed RTL-level implementation of our technique, we find its area overhead to be quite modest, with only a 5.8% increase in total chip area.
- Published
- 2007
- Full Text
- View/download PDF
31. Efficient checker processor design
- Author
-
Todd Austin, Saugata Chatterjee, and Christopher T. Weaver
- Subjects
Multi-core processor ,Computer science ,business.industry ,Pipeline (computing) ,Reliability (computer networking) ,Processor design ,computer.software_genre ,law.invention ,Microprocessor ,law ,Computer data storage ,Operating system ,Cache ,business ,computer - Abstract
The design and implementation of a modern microprocessor creates many reliability challenges. Designers must verify the correctness of large complex systems and construct implementations that work reliably in varied (and occasionally adverse) operating conditions. In our previous work, we proposed a solution to these problems by adding a simple, easily verifiable checker processor at pipeline retirement. Performance analyses of our initial design were promising: overall slowdowns due to checker processor hazards were less than 3%. However, slowdowns for some outlier programs were larger. The authors closely examine the operation of the checker processor. We identify the specific reasons why the initial design works well for some programs, but slows others. Our analyses suggest a variety of improvements to the checker processor storage system. Through the addition of a 4k checker cache and eight-entry store queue, our optimized design eliminates virtually all core processor slowdowns. Moreover, we develop insights into why the optimized checker processor performs well, insights that suggest it should perform well for any program.
- Published
- 2002
- Full Text
- View/download PDF
32. CryptoManiac
- Author
-
Christopher T. Weaver, Lisa Wu, and Todd Austin
- Subjects
Triple DES ,business.industry ,Computer science ,computer.internet_protocol ,Advanced Encryption Standard ,Cryptography ,General Medicine ,Cryptographic protocol ,Cipher ,Secure communication ,Embedded system ,IPsec ,business ,Communications protocol ,computer - Abstract
The growth of the Internet as a vehicle for secure communication and electronic commerce has brought cryptographic processing performance to the forefront of high throughput system design. This trend will be further underscored with the widespread adoption of secure protocols such as secure IP (IPSEC) and virtual private networks (VPNs). In this paper, we introduce the CryptoManiac processor, a fast and flexible co-processor for cryptographic workloads. Our design is extremely efficient; we present analysis of a 0.25um physical design that runs the standard Rijndael cipher algorithm 2.25 times faster than a 600MHz Alpha 21264 processor. Moreover, our implementation requires 1/100 th the area and power in the same technology. We demonstrate that the performance of our design rivals a state-of-the-art dedicated hardware implementation of the 3DES (triple DES) algorithm, while retaining the flexibility to simultaneously support multiple cipher algorithms. Finally, we define a scalable system architecture that combines CryptoManiac processing elements to exploit inter-session and inter-packet parallelism available in many communication protocols. Using I/O traces and detailed timing simulation, we show that chip multiprocessor configurations can effectively service high throughput applications including secure web and disk I/O processing.
- Published
- 2001
- Full Text
- View/download PDF
33. Compiler controlled value prediction using branch predictor based confidence
- Author
-
Todd Austin and Eric D. Larson
- Subjects
Speedup ,Computer science ,Value (computer science) ,Compiler ,Parallel computing ,Hardware_CONTROLSTRUCTURESANDMICROPROGRAMMING ,Arithmetic ,Branch predictor ,computer.software_genre ,Instruction-level parallelism ,Branch misprediction ,computer ,Critical path method - Abstract
Value prediction breaks data dependencies in a program thereby creating instruction level parallelism that can increase program performance. Hardware based value prediction techniques have been shown to increase speed, but at great cost as designs include prediction tables, selection logic, and a confidence mechanism. This paper proposes compiler-controlled value prediction optimizations that obtain good speedups while keeping hardware costs low. The branch predictor is used to estimate the confidence of the value predictor for speculated instructions. This technique obtains 4.6% speedup when completely implemented in software and 15.2% speedup when minimal hardware support (a 1 KB predictor table) is added. We also explore the use of critical path information to aid in the selection of value prediction candidates. The key result of our study is that programs with long dynamic dependence chains benefit with this technique while programs with shorter chains benefit more so from simple selection methods that favor optimization frequency. A new branch instruction that ignores innocuous value mispredictions is shown to eliminate unnecessary mispredictions when program semantics aren't violated by confidence branch mispredictions.
- Published
- 2000
- Full Text
- View/download PDF
34. Classifying load and store instructions for memory renaming
- Author
-
Brad Calder, Glenn Reinman, Todd Austin, Gary Tyson, and Dean M. Tullsen
- Subjects
Profiling (computer programming) ,Memory address ,Dependency (UML) ,Computer science ,Value (computer science) ,Table (database) ,Compiler ,Parallel computing ,computer.software_genre ,computer ,Bottleneck - Abstract
Memory operations remain a significant bottleneck in dynamically scheduled pipelined processors, due in part to the inability to statically determine the existence of memory address dependencies. Hardware memory renaming techniques have been proposed to predict which stores a load might be dependent upon. These prediction techniques can be used to speculatively forward a value from a predicted store dependency to a load through a value prediction table. However, these techniques require large, time-consuming hardware tables. In this paper we propose a software-guided approach for identifying dependencies between store and load instructions and the Load Marking (LM) architecture to communicate these dependencies to the hardware. Compiler analysis and profiles are used to find important store/load relationships, and these relationships are identified during execution via hints or an n-bit tag. For those loads that are not marked for renaming, we then use additional profiling information to further classify the loads into those that have accurate value prediction and those that do not. These classifications allow the processor to individually apply the most appropriate aggressive form of execution for each load.
- Published
- 1999
- Full Text
- View/download PDF
35. Dynamic dependency analysis of ordinary programs
- Author
-
Gurindar S. Sohi and Todd Austin
- Subjects
Cellular architecture ,Programming language ,Computer science ,Fortran ,Data parallelism ,Task parallelism ,Parallel computing ,Scalable parallelism ,computer.software_genre ,Minimal instruction set computer ,Memory-level parallelism ,Concurrent computing ,Implicit parallelism ,Instruction-level parallelism ,computer ,computer.programming_language - Abstract
A quantitative analysis of program execution is essential to the computer architecture design process. With the current trend in architecture of enhancing the performance of uniprocessors by exploiting fine-grain parallelism, first-order metrics of program execution, such as operation frequencies, are not sufficient; characterizing the exact nature of dependencies between operations is essential. This paper presents a methodology for constructing the dynamic execution graph that characterizes the execution of an ordinary program (an application program written in an imperative language such as C or FORTRAN) from a serial execution trace of the program. It then uses the methodology to study parallelism in the SPEC benchmarks. We see that the parallelism can be bursty in nature (periods of lots of parallelism followed by periods of little parallelism), but the average parallelism is quite high, ranging from 13 to 23,302 operations per cycle. Exposing this parallelism requires renaming of both registers and memory, though renaming registers alone exposes much of this parallelism. We also see that fairly large windows of dynamic instructions would be required to expose this parallelism from a sequential instruction stream.
- Published
- 1992
- Full Text
- View/download PDF
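The measurement the abstract describes can be approximated with a simple pass over a serial trace: schedule each operation one level after the latest producer of its inputs (registers assumed renamed, so only true dependencies count), and divide the operation count by the number of levels. The toy trace below is invented; the paper builds the full dynamic execution graph for SPEC programs.

```c
/* Hedged sketch of measuring available parallelism from a serial trace by
 * levelizing a dynamic dependence graph over register true dependencies. */
#include <stdio.h>

#define NREGS 8
typedef struct { int dst, src1, src2; } op;    /* dst <- f(src1, src2) */

int main(void) {
    op trace[] = {                 /* a made-up six-operation trace */
        {1, 0, 0}, {2, 0, 0}, {3, 1, 2}, {4, 1, 1}, {5, 3, 4}, {6, 2, 2},
    };
    int n = (int)(sizeof trace / sizeof trace[0]);

    int ready[NREGS] = {0};        /* dataflow level at which each reg is ready */
    int depth = 0;

    for (int i = 0; i < n; i++) {
        int level = 1 + (ready[trace[i].src1] > ready[trace[i].src2]
                             ? ready[trace[i].src1] : ready[trace[i].src2]);
        ready[trace[i].dst] = level;           /* renaming assumed: only true   */
        if (level > depth) depth = level;      /* dependencies limit the depth  */
    }
    printf("parallelism = %d ops / %d levels = %.2f ops per cycle\n",
           n, depth, (double)n / depth);
    return 0;
}
```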
36. Assessment of Low-Budget Targeted Cyberattacks Against Power Systems
- Author
-
Anastasis Keliris, Charalambos Konstantinou, XiaoRui Liu, Marios Sazos, Michail Maniatakos, Florida State University [Tallahassee] (FSU), Florida Agricultural and Mechanical University (FAMU), University of Florida [Gainesville] (UF), NYU Tandon School of Engineering, New York University [Abu Dhabi], NYU System (NYU), Nicola Bombieri, Graziano Pravadelli, Masahiro Fujita, Todd Austin, Ricardo Reis, TC 10, and WG 10.5
- Subjects
Public infrastructure ,Spoofing attack ,Computer science ,business.industry ,020209 energy ,Phasor ,02 engineering and technology ,Computer security ,computer.software_genre ,Units of measurement ,Electric power system ,Electric power transmission ,Assisted GPS ,0202 electrical engineering, electronic engineering, information engineering ,Global Positioning System ,[INFO]Computer Science [cs] ,business ,computer - Abstract
The security and well-being of societies and economies are tied to the reliable and resilient operation of power systems. In the next decades, power systems are expected to become more heavily loaded and operate closer to their stability limits and operating constraints. On top of that, in recent years, cyberattacks against computing systems and networks integrated in the power grid infrastructure are a real and growing threat. Such actions, especially in industrial environments such as power systems, are generally deemed feasible only by resource-wealthy nation state actors. This chapter challenges this perception and presents a methodology, named Open Source Exploitation (OSEXP), which utilizes information from public infrastructure to assess an advanced attack vector on power systems. The attack targets Phasor Measurement Units (PMUs) which depend on Global Positioning System (GPS) signals to provide time-stamped circuit quantities of power lines. Specifically, we present a GPS time spoofing attack using low-cost commercial devices and open source software. The necessary information for the instantiation of the OSEXP attack is extracted by developing a test case model of the power system in a digital real-time simulator (DRTS). DRTS is also employed to evaluate the effectiveness and impact of the developed OSEXP attack methodology. The presented targeted attack demonstrates that an actor with limited budget has the ability to cause significant disruption to a nation.
- Published
- 2018
- Full Text
- View/download PDF
37. Energy-Accuracy Scalable Deep Convolutional Neural Networks: A Pareto Analysis
- Author
-
Andrea Calimera, Valentino Peluso, Department of Computer Engineering (DAUIN), Politecnico di Torino = Polytechnic of Turin (Polito), Nicola Bombieri, Graziano Pravadelli, Masahiro Fujita, Todd Austin, Ricardo Reis, TC 10, and WG 10.5
- Subjects
Artificial neural network ,Heuristic (computer science) ,Computer science ,business.industry ,020208 electrical & electronic engineering ,Pareto principle ,02 engineering and technology ,Machine learning ,computer.software_genre ,Convolutional neural network ,020202 computer hardware & architecture ,Keyword spotting ,0202 electrical engineering, electronic engineering, information engineering ,[INFO]Computer Science [cs] ,Artificial intelligence ,Quantization (image processing) ,business ,Pareto analysis ,computer ,Assignment problem - Abstract
This work deals with the optimization of Deep Convolutional Neural Networks (ConvNets). It elaborates on the concept of Adaptive Energy-Accuracy Scaling through multi-precision arithmetic, a solution that allows ConvNets to be adapted at run-time and meet different energy budgets and accuracy constraints. The strategy is particularly suited for embedded applications made to run at the "edge" on resource-constrained platforms. After the very basics that distinguish the proposed adaptive strategy, the paper recalls the software-to-hardware vertical implementation of precision scalable arithmetic for ConvNets, then it focuses on the energy-driven per-layer precision assignment problem, describing a meta-heuristic that searches for the most suited representation of both weights and activations of the neural network. The same heuristic is then used to explore the optimal trade-off, providing the Pareto points in the energy-accuracy space. Experiments conducted on three different ConvNets deployed in real-life applications, i.e. Image Classification, Keyword Spotting, and Facial Expression Recognition, show that adaptive ConvNets reach a better energy-accuracy trade-off w.r.t. conventional static fixed-point quantization methods.
- Published
- 2018
- Full Text
- View/download PDF
38. A 65 nm CMOS Synthesizable Digital Low-Dropout Regulator Based on Voltage-to-Time Conversion with 99.6% Current Efficiency at 10-mA Load
- Author
-
Tetsuya Iizuka, Kunihiro Asada, Toru Nakura, Naoki Ojima, The University of Tokyo (UTokyo), Fukuoka University, VLSI Design and Education Center, Nicola Bombieri, Graziano Pravadelli, Masahiro Fujita, Todd Austin, Ricardo Reis, TC 10, and WG 10.5
- Subjects
Physics ,Low-dropout regulator ,Comparator ,business.industry ,020208 electrical & electronic engineering ,Electrical engineering ,02 engineering and technology ,Phase detector ,020202 computer hardware & architecture ,CMOS ,0202 electrical engineering, electronic engineering, information engineering ,Inverter ,Verilog ,[INFO]Computer Science [cs] ,business ,computer ,Voltage reference ,computer.programming_language ,Voltage - Abstract
A synthesizable digital LDO implemented with a standard-cell-based digital design flow is proposed. The difference between output and reference voltages is converted into a delay difference using inverter chains as voltage-controlled delay lines, then compared in the time domain. Since the time-domain difference is straightforwardly captured by a simple DFF-based phase detector, the proposed LDO does not need an analog voltage comparator, which requires careful manual design. All the components in the LDO can be described with Verilog code based on their specifications, and placed-and-routed with a commercial EDA tool. This automated layout design relaxes the burden and time of implementation, and enhances process portability. The proposed LDO implemented in a 65 nm standard CMOS technology occupies 0.015 mm² of area. With a 10.4 MHz internal clock, the tracking response of the LDO to 200 mV switching in the reference voltage is ~4.5 µs and the transient response to a 5 mA change in the load current is ~6.6 µs. At 10 mA load current, the quiescent current consumed by the LDO core is as low as 35.2 µA, which leads to 99.6% current efficiency.
- Published
- 2018
- Full Text
- View/download PDF