27 results for "Todd Austin"
Search Results
2. Cyclone
- Author
-
Pranav Kumar, Mohit Tiwari, Todd Austin, Shijia Wei, Austin Harris, and Prateek Sahu
- Subjects
Address space, Computer science, Bandwidth (signal processing), Speculative execution, Interference, Isolation, State (computer science), Cache, Computer network, Communication channel
- Abstract
Micro-architectural units like caches are notorious for leaking secrets across security domains. An attacker program can contend for on-chip state or bandwidth, and can even use speculative execution in processors to drive this contention; protecting against all contention-driven attacks is exceptionally challenging. Prior works mitigate contention channels through caches by partitioning the larger, lower-level caches or by looking for anomalous performance or contention behavior. Neither scales to the large number of fine-grained domains required by browsers and web services that place many domains within the same address space. We observe that cache contention channels have a unique property: contention leaks information only when it is cyclic, i.e., domain A interferes with domain B, followed by interference from B back to A. We propose to use this cyclic-interference property to detect micro-architectural attacks as anomalous cyclic interference. Unlike partitioning, our detection approach scales to many concurrent domains in a single address space; and unlike prior anomaly detectors, cyclic interference is robust to noise from benign interference. We track cyclic interference using non-intrusive detectors in an out-of-order core and stress test our prototype, Cyclone, with fine-grained isolation in browsers (against speculation-driven attacks) and coarse-grained isolation of cores (against covert channels embedded in database and machine-learning workloads). Full-system simulations on an ARM micro-architecture show close to perfect detection rates and 260-1000× lower false positives than using (state-of-the-art) contention alone, with slowdowns of only ~3.6%. (A brief illustrative sketch of the cyclic-interference check follows this entry.)
- Published
- 2019
- Full Text
- View/download PDF
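To make the cyclic-interference idea in the abstract above concrete, here is a minimal software sketch. The structure and names are hypothetical; the actual Cyclone detectors are non-intrusive hardware counters in an out-of-order core, not software. The sketch flags a pair of domains only when interference flows in both directions within an observation window.

```python
from collections import defaultdict

class CyclicInterferenceDetector:
    """Toy model of the cyclic-interference check: a pair of domains is
    suspicious only if A evicts B's state *and* B later evicts A's state
    within the same observation window (a deliberate simplification)."""

    def __init__(self):
        # interference[(a, b)] counts evictions where domain a displaced domain b
        self.interference = defaultdict(int)

    def record_eviction(self, evictor, victim):
        if evictor != victim:
            self.interference[(evictor, victim)] += 1

    def cyclic_pairs(self, threshold=1):
        """Return domain pairs with interference in both directions."""
        pairs = set()
        for (a, b), count in self.interference.items():
            if count >= threshold and self.interference.get((b, a), 0) >= threshold:
                pairs.add(frozenset((a, b)))
        return pairs

    def end_window(self):
        self.interference.clear()

# One-way interference (benign sharing) is ignored; two-way (cyclic)
# interference is reported as anomalous.
det = CyclicInterferenceDetector()
det.record_eviction("A", "B")   # A displaces B's cache line
det.record_eviction("B", "A")   # B displaces A's line -> cycle
det.record_eviction("C", "A")   # one-way only -> benign
print(det.cyclic_pairs())       # {frozenset({'A', 'B'})}
```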
3. Morpheus
- Author
-
Salessawi Ferede Yitbarek, Baris Kasikci, Mohit Tiwari, Misiker Tadesse Aga, Austin Harris, Valeria Bertacco, Zhixing Xu, Todd Austin, Sharad Malik, Zelalem Birhanu Aweke, Mark Gallagher, Shibo Chen, and Lauren Biernacki
- Subjects
Hardware architecture, Computer science, Encryption, Computer security, Security testing, Pointer (computer programming), Systems design, Moving target defense, Architecture
- Abstract
Attacks often succeed by abusing the gap between program-level and machine-level semantics: for example, by locating a sensitive pointer, exploiting a bug to overwrite this sensitive data, and hijacking the victim program's execution. In this work, we take secure system design on the offensive by continuously obfuscating information that attackers need but normal programs do not use, such as the representation of code and pointers or the exact location of code and data. Our secure hardware architecture, Morpheus, combines two powerful protections: ensembles of moving target defenses and churn. Ensembles of moving target defenses randomize key program values (e.g., relocating pointers and encrypting code and pointers), which forces attackers to extensively probe the system prior to an attack. To ensure attack probes fail, the architecture incorporates churn to transparently re-randomize program values underneath the running system. With frequent churn, systems quickly become impractically difficult to penetrate. We demonstrate Morpheus through a RISC-V-based prototype designed to stop control-flow attacks. Each moving target defense in Morpheus uses hardware support to individually offer more randomness at a lower cost than previous techniques. When ensembled with churn, Morpheus defenses offer strong protection against control-flow attacks, with our security testing and performance studies revealing: i) high-coverage protection for a broad array of control-flow attacks, including protections for advanced attacks and an attack disclosed after the design of Morpheus, and ii) negligible performance impacts (1%) with churn periods up to 50 ms, which our study estimates to be at least 5000x faster than the time necessary to possibly penetrate Morpheus. (A toy illustration of the churn idea follows this entry.)
- Published
- 2019
- Full Text
- View/download PDF
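A toy illustration of the ensemble-plus-churn idea described above. The names and the XOR encoding are purely hypothetical; the real Morpheus defenses are hardware mechanisms operating on pointer and code representations. The point is only that values an attacker would need are kept encoded under a key that is re-randomized every churn period, so anything probed earlier goes stale.

```python
import secrets
import threading

class ChurningPointerEncoder:
    """Toy model of a moving-target defense with churn: pointers are stored
    XOR-encoded under a key that is re-randomized every churn period."""

    def __init__(self):
        self._key = secrets.randbits(64)
        self._lock = threading.Lock()

    def encode(self, ptr):
        with self._lock:
            return ptr ^ self._key

    def decode(self, enc_ptr):
        with self._lock:
            return enc_ptr ^ self._key

    def churn(self, live_pointers):
        """Re-randomize the key and transparently re-encode live pointers,
        so any value an attacker probed earlier is now useless."""
        with self._lock:
            new_key = secrets.randbits(64)
            reencoded = [p ^ self._key ^ new_key for p in live_pointers]
            self._key = new_key
            return reencoded

# Usage: after churn, the previously observed encoded value no longer
# corresponds to the live key, but legitimate decoding still works.
enc = ChurningPointerEncoder()
p = enc.encode(0xDEADBEEF)
[p] = enc.churn([p])
assert enc.decode(p) == 0xDEADBEEF
```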
4. Vulnerability-tolerant secure architectures
- Author
-
Todd Austin, Sharad Malik, Mohit Tiwari, Valeria Bertacco, and Baris Kasikci
- Subjects
Computer science, Vulnerability, Computer security, Immune system, Software bug, Systems design, State (computer science), Speculation
- Abstract
Today, secure systems are built by identifying potential vulnerabilities and then adding protections to thwart the associated attacks. Unfortunately, the complexity of today's systems makes it impossible to prove that all attacks are stopped, so clever attackers find a way around even the most carefully designed protections. In this article, we take a sobering look at the state of secure system design and ask why the "security arms race" never ends. The answer lies in our inability to develop adequate security verification technologies. We then examine an advanced defensive system in nature, the human immune system, and we discover that it does not remove vulnerabilities; rather, it adds offensive measures to protect the body when its vulnerabilities are penetrated. We close the article with brief speculation on how the human immune system could inspire more capable secure system designs.
- Published
- 2018
- Full Text
- View/download PDF
5. SWAN
- Author
-
Pete Ehrett, Timothy Linscott, Todd Austin, and Valeria Bertacco
- Subjects
Computer science, Overhead (engineering), Ambiguity, Trojan, Decoy, Computer hardware
- Abstract
For the past decade, security experts have warned that malicious engineers could modify hardware designs to include hardware back-doors (trojans), which, in turn, could grant attackers full control over a system. Proposed defenses to detect these attacks have been outpaced by the development of increasingly small, but equally dangerous, trojans. To thwart trojan-based attacks, we propose a novel architecture that maps the security-critical portions of a processor design to a one-time-programmable, LUT-free fabric. The programmable fabric is automatically generated by analyzing the HDL of the targeted modules. We present our tools to generate the fabric and to map functionally equivalent designs onto it. By having a trusted party randomly select a mapping and configure each chip, we prevent an attacker from knowing the physical location of targeted signals at manufacturing time. In addition, we provide decoy options (canaries) for the mapping of security-critical signals, such that hardware trojans hitting a decoy are thwarted and exposed. Using this defense approach, any trojan capable of analyzing the entire configurable fabric must employ complex logic functions with a large silicon footprint, thus exposing it to detection by inspection. We evaluated our solution on a RISC-V BOOM processor and demonstrated that, by providing the ability to map each critical signal to 6 distinct locations on the chip, we can reduce the chance of attack success by an undetectable trojan by 99%, incurring only a 27% area overhead. (A back-of-the-envelope model of the randomized-mapping argument follows this entry.)
- Published
- 2018
- Full Text
- View/download PDF
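The toy Monte-Carlo model below (hypothetical, not the SWAN toolflow) captures only the guessing component of the randomized mapping: a trojan hard-wired to a single location hits the live signal roughly 1 time in 6 and otherwise lands where a decoy would expose it. The paper's reported 99% reduction reflects the full defense, not just this guessing probability.

```python
import random

def attack_success_rate(num_locations=6, trials=100_000, rng=random):
    """Monte-Carlo estimate of a fixed-location trojan's success probability
    when the trusted party picks one of `num_locations` mappings per chip."""
    trojan_target = 0  # attacker commits to one physical location
    hits = sum(rng.randrange(num_locations) == trojan_target
               for _ in range(trials))
    return hits / trials

# Roughly 1/6 (about 16.7%) of chips are hit by the naive trojan; on the
# rest it touches a decoy location and can be exposed.
print(f"~{attack_success_rate():.1%} success per chip")
```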
6. Reducing the overhead of authenticated memory encryption using delta encoding and ECC memory
- Author
-
Salessawi Ferede Yitbarek and Todd Austin
- Subjects
Random access memory, Delta encoding, Computer science, Encryption, ECC memory, Memory management, Embedded system, Error detection and correction, DRAM, Parity bit
- Abstract
Data stored in an off-chip memory, such as DRAM or non-volatile main memory, can potentially be extracted or tampered with by an attacker with physical access to a device. Protecting against such attacks requires storing message authentication codes (MACs) and counters, which incur a 22% storage overhead. In this work, we propose techniques for reducing these overheads. We first present a scheme that leverages ECC DRAMs to reduce MAC verification and storage overheads. We replace the parity bits in standard ECC with a combination of MAC and parity bits to provide both authentication and error correction. This eliminates the extra MAC storage and minimizes the verification overhead, as MACs can be read in parallel with data through the ECC bus. Next, we use efficient integer encodings to reduce the counter storage overhead by 6× while enhancing application performance. (A small sketch of the counter-compression idea follows this entry.)
- Published
- 2018
- Full Text
- View/download PDF
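A minimal sketch of the counter-compression idea, using an assumed base-plus-delta layout (the paper's exact integer encoding may differ): per-block counters in a region tend to cluster near a common value, so storing one wide base plus small deltas shrinks the counter storage.

```python
def delta_encode_counters(counters, delta_bits=8):
    """Encode per-block write counters as (base, small deltas).
    Returns None if any delta overflows the compact field, in which case a
    full-width encoding would be needed (a simplification of the scheme)."""
    base = min(counters)
    deltas = [c - base for c in counters]
    if max(deltas) >= (1 << delta_bits):
        return None  # deltas too large to compress
    return base, deltas

def delta_decode_counters(encoded):
    base, deltas = encoded
    return [base + d for d in deltas]

# 64 block counters for a 4 KB region: one 64-bit base plus 64 x 8-bit deltas
# is 576 bits instead of 64 x 64 = 4096 bits, roughly the order of compression
# behind a multi-fold counter-storage reduction.
counters = [1000 + i % 5 for i in range(64)]
enc = delta_encode_counters(counters)
assert delta_decode_counters(enc) == counters
```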
7. Regaining Lost Cycles with HotCalls
- Author
-
Valeria Bertacco, Todd Austin, and Ofir Weisse
- Subjects
Hardware security module, Speedup, Computer science, Cryptography, Cloud computing, Encryption, System call, Server, Operating system
- Abstract
Intel's SGX secure execution technology allows running computations on secret data using untrusted servers. While recent work has shown how to port applications and large-scale computations to run under SGX, the performance implications of using the technology remain an open question. We present the first comprehensive quantitative study of SGX performance. We show that straightforward use of the SGX library primitives for calling functions adds between 8,200 and 17,000 cycles of overhead, compared to 150 cycles for a typical system call. We quantify the performance impact of these library calls and show that in applications with a high system-call frequency, such as memcached, openVPN, and lighttpd, all of which have high-bandwidth network requirements, the performance degradation may be as high as 79%. We investigate the sources of this degradation by leveraging a new set of microbenchmarks for SGX-specific operations such as enclave entry-calls and out-calls, and encrypted memory I/O accesses. We use the insights gained from these analyses to design a new SGX interface framework, HotCalls. HotCalls are based on a synchronization spin-lock mechanism and provide a 13-27x speedup over the default interface. They can easily be integrated into existing code, making them a practical solution. Compared to a baseline SGX implementation of memcached, openVPN, and lighttpd, we show that using the new interface boosts throughput by 2.6-3.7x and reduces application latency by 62-74%. (A toy user-space analogue of the spin-lock call mechanism follows this entry.)
- Published
- 2017
- Full Text
- View/download PDF
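The abstract describes HotCalls as a spin-lock-based alternative to the expensive SGX ecall/ocall transitions. The sketch below is a hypothetical user-space analogue in Python, not the actual C/SGX implementation: a worker that is already "inside" waits on a shared request slot, so callers avoid an enter/exit transition on every request.

```python
import queue
import threading

class HotCallChannel:
    """Toy analogue of a HotCall: a worker thread that is already on the
    trusted side services a shared request queue, so callers never pay the
    expensive enclave enter/exit cost per request."""

    def __init__(self):
        self._requests = queue.Queue()

    def responder_loop(self):
        # In the real design this loop lives inside the enclave and
        # busy-waits on a shared-memory slot guarded by a spin lock.
        while True:
            func, args, done = self._requests.get()
            if func is None:
                break
            done["result"] = func(*args)
            done["event"].set()

    def call(self, func, *args):
        done = {"event": threading.Event()}
        self._requests.put((func, args, done))
        done["event"].wait()  # caller waits briefly instead of transitioning
        return done["result"]

channel = HotCallChannel()
threading.Thread(target=channel.responder_loop, daemon=True).start()
print(channel.call(lambda x: x * 2, 21))  # 42, with no costly transition
```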
8. Locking down insecure indirection with hardware-based control-data isolation
- Author
-
Reetuparna Das, William Arthur, Todd Austin, and Sahil Madeka
- Subjects
Indirection, Exploit, Memoization, Computer science, Program transformation, Parallel computing, Control flow, Embedded system, Code injection, Compiler, Cache, Isolation, Programmer
- Abstract
Arbitrary code injection remains a central issue in computer security, as attackers seek to exploit the software attack surface. A key component of many exploits today is the successful execution of a control-flow attack. Control-Data Isolation (CDI) has emerged as a technique that eliminates the root cause of contemporary control-flow attacks: indirect control-flow instructions. These instructions are replaced by direct control-flow edges dictated by the programmer and encoded into the application by the compiler. By removing the root cause of control-flow attacks, Control-Data Isolation sidesteps the vulnerabilities and restrictive threat models adopted by other solutions in this space (e.g., Control-Flow Integrity). The CDI approach, while eliminating contemporary control-flow attacks, introduces non-trivial overheads to validate indirect targets at runtime. In this work we introduce novel architectural support to accelerate the execution of CDI-compliant code. Through the addition of an edge cache, we are able to cache legal indirect target edges and eliminate nearly all execution overhead for indirection-free applications. We demonstrate that through memoization of compiler-confirmed control-flow transitions, overheads are reduced from 19% to 0.5% on average for Control-Data Isolated applications. Additionally, we show that the edge cache can efficiently do the double duty of predicting multi-way branch targets, thus providing speedups for some CDI-compliant executions compared to an architecture with unsophisticated indirect control prediction (e.g., a BTB). (A rough model of the edge-cache check follows this entry.)
- Published
- 2015
- Full Text
- View/download PDF
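A rough software model of the edge-cache check (hypothetical structure; the real design is a hardware cache in the core): an indirect transfer proceeds without overhead only if its (source, target) edge was confirmed by the compiler and is present in the cache.

```python
class EdgeCache:
    """Toy model of the CDI edge cache: caches compiler-confirmed
    (branch PC -> target PC) edges so validated indirect transfers skip
    the slow software check."""

    def __init__(self, legal_edges, capacity=256):
        self.legal_edges = legal_edges  # from the CDI-compiled binary
        self.cache = {}                 # (src, dst) -> True; LRU omitted
        self.capacity = capacity

    def check(self, src_pc, dst_pc):
        edge = (src_pc, dst_pc)
        if edge in self.cache:
            return True                 # fast path: no runtime overhead
        if edge in self.legal_edges:    # slow path: software validation
            if len(self.cache) >= self.capacity:
                self.cache.pop(next(iter(self.cache)))  # crude eviction
            self.cache[edge] = True
            return True
        return False                    # illegal edge: control-flow attack

legal = {(0x400100, 0x400800), (0x400100, 0x400900)}
ec = EdgeCache(legal)
assert ec.check(0x400100, 0x400800)      # legal indirect call
assert not ec.check(0x400100, 0x401337)  # injected target -> blocked
```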
9. Bridging the Moore's Law Performance Gap with Innovation Scaling
- Author
-
Todd Austin
- Subjects
Engineering, Moore's law, Dennard scaling, Performance gap, Scaling, Simulation
- Abstract
The end of Dennard scaling and the tyranny of Amdahl's law have created significant barriers to system scaling, leading to a gap between today's system performance and where Moore's law predicted it should be. I believe the solution to this problem is to scale innovation: finding better solutions to improve system performance and efficiency, and doing so more quickly than previously possible, could address the growing performance gap. In this talk, I will highlight a number of simple (and not so simple) ideas to address this challenge.
- Published
- 2015
- Full Text
- View/download PDF
10. EFFEX
- Author
-
Silvio Savarese, Robert Perricone, Andrew D. Jones, Jason Clemons, and Todd Austin
- Subjects
Speedup, Computer science, Feature extraction, Mobile computing, Scale-invariant feature transform, Pipeline (software), CUDA, Software, Computer architecture, Feature (computer vision), Embedded system, Memory architecture, Computer vision, Artificial intelligence
- Abstract
The deployment of computer vision algorithms in mobile applications is growing at a rapid pace. A primary component of the computer vision software pipeline is feature extraction, which identifies and encodes relevant image features. We present an embedded heterogeneous multicore design named EFFEX that incorporates novel functional units and memory architecture support, making it capable of increasing mobile vision performance while balancing power and area. We demonstrate this architecture running three common feature extraction algorithms, and show that it is capable of providing significant speedups at low cost. Our simulations show a speedup of as much as 14× for feature extraction with a decrease in energy of 40× for memory accesses.
- Published
- 2011
- Full Text
- View/download PDF
11. The potential of sampling for dynamic analysis
- Author
-
Todd Austin and Joseph L. Greathouse
- Subjects
Offset (computer science), Risk analysis (engineering), Computer science, Research community, Population, Data mining, Dynamic software
- Abstract
This paper presents an argument for distributing dynamic software analyses to large populations of users in order to locate bugs that cause security flaws. We review a collection of dynamic analysis systems and show that, despite a great deal of effort from the research community, their performance is still too low to allow their use in the field. We then show that there are effective sampling mechanisms for accelerating a wide range of powerful dynamic analyses. These mechanisms reduce the rate at which errors are observed by individual analyses, but this loss can be offset by the subsequent increase in test population. Nevertheless, there are unsolved issues in this domain that deserve attention if this technique is to be widely adopted. (A small numerical sketch of the sampling argument follows this entry.)
- Published
- 2011
- Full Text
- View/download PDF
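A small numerical sketch of the sampling argument, using entirely hypothetical numbers: if each user runs the heavyweight analysis on only a small random fraction of executions, per-user overhead drops, and the population as a whole still detects the bug with high probability.

```python
def population_detection_prob(per_run_detect_prob, sample_rate, users, runs_per_user):
    """Probability that at least one user in the population observes a bug
    when each execution is analyzed only with probability `sample_rate`."""
    p_miss_one_run = 1.0 - per_run_detect_prob * sample_rate
    return 1.0 - p_miss_one_run ** (users * runs_per_user)

# Hypothetical numbers: sampling 1% of executions slashes per-user overhead,
# yet 10,000 users x 100 runs each still catch the bug almost surely.
print(population_detection_prob(0.5, 0.01, users=10_000, runs_per_user=100))
```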
12. What input-language is the best choice for high level synthesis (HLS)?
- Author
-
Steve Svoboda, Todd Austin, and D.D. Gajski
- Subjects
Programming language, Computer science, Hardware description language, SystemVerilog, SystemC, High-level synthesis
- Abstract
As of 2010, over 30 of the world's top semiconductor/systems companies have adopted HLS. In 2009, SoC tape-outs containing IPs developed using HLS exceeded 50 for the first time. Now that the practicality and value of HLS are established, engineers are turning to the question of which input language works best. The answer is critical because it drives key decisions regarding the tool and methodology infrastructure companies will create around this new flow. ANSI-C/C++ advocates cite ease of learning and simulation speed. SystemC advocates make similar claims and point to SystemC's hardware-oriented features. Proponents of BSV (Bluespec SystemVerilog) claim that the language enhances architectural transparency and control. To maximize the benefits of HLS, companies must consider many factors and tradeoffs.
- Published
- 2010
- Full Text
- View/download PDF
13. Using introspective software-based testing for post-silicon debug and repair
- Author
-
Todd Austin and Kypros Constantinides
- Subjects
Computer science, Firmware, Application software, Maintenance engineering, Software quality, Instruction set, Microprocessor, Software, Embedded system, Microcode
- Abstract
As silicon process technology scales deeper into the nanometer regime, hardware defects are becoming more common, to the point of threatening yield rates and product lifetimes. Introspective software mechanisms hold great promise to address these reliability challenges with both low cost and high coverage. To address these challenges, we have developed a novel instruction set enhancement, called Access-Control Extensions (ACE), which can access and control a microprocessor's internal state. Using ACE technology, special firmware can periodically probe the microprocessor during execution to locate run-time faults, repair design errors (even those discovered in the field), and streamline manufacturing tests.
- Published
- 2010
- Full Text
- View/download PDF
14. Session details: Computation in the post-Turing era
- Author
-
Todd Austin
- Subjects
Theoretical computer science, Computer science, Computation, Session (computer science), Turing
- Published
- 2009
- Full Text
- View/download PDF
15. Session details: Microarchitecture analysis and optimisation
- Author
-
Todd Austin and Georgi Gaydadjiev
- Subjects
Computer architecture, Computer science, Session (computer science), Microarchitecture
- Published
- 2008
- Full Text
- View/download PDF
16. Session details: Instruction-set optimisations
- Author
-
Georgi Gaydadjiev and Todd Austin
- Subjects
Instruction set, Multimedia, Computer science, Session (computer science)
- Published
- 2008
- Full Text
- View/download PDF
17. Session details: Clocks, scheduling, and stores
- Author
-
Todd Austin
- Subjects
Computer science, Scheduling (computing), Computer network
- Published
- 2007
- Full Text
- View/download PDF
18. Reliability-aware data placement for partial memory protection in embedded processors
- Author
-
Todd Austin and Mojtaba Mehrara
- Subjects
Page fault, Computer science, General protection fault, Embedded system, Fault coverage, Overhead (computing), Fault tolerance, Fault injection, Segmentation fault, Memory protection
- Abstract
Low-cost protection of embedded systems against soft errors has recently become a major concern. This issue is even more critical for memory elements, which are inherently more prone to transient faults. In this paper, we propose a reliability-aware data placement technique that partially protects embedded memory systems. We show that by adopting this method instead of traditional placement schemes with complete memory protection, an acceptable level of fault tolerance can be achieved while incurring less area and power overhead. In this approach, each variable in the program is placed in either a protected or a non-protected memory area according to a profile-driven liveness analysis of all memory variables. To measure the level of fault coverage, we inject faults into the memory during program execution in a Monte Carlo simulation framework, and we calculate the coverage of the partial protection scheme from the number of protected, failed, and crashed runs in the fault-injection experiment. (A simplified sketch of the placement decision follows this entry.)
- Published
- 2006
- Full Text
- View/download PDF
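A simplified sketch of the profile-driven placement decision. The scoring heuristic (live time per byte) is an assumption made for illustration; the paper's approach uses profile-driven liveness analysis of all memory variables. Variables whose live data is exposed to soft errors the longest are placed in the protected region until it fills up.

```python
def place_variables(variables, protected_capacity_bytes):
    """variables: list of (name, size_bytes, profiled_live_time).
    Greedily fill the protected memory region with the variables whose
    values stay live longest, since they are most exposed to soft errors."""
    protected, unprotected = [], []
    used = 0
    # Prioritize long-lived data; weighting by live time per byte is an
    # illustrative assumption, not the paper's exact criterion.
    for name, size, live_time in sorted(
            variables, key=lambda v: v[2] / v[1], reverse=True):
        if used + size <= protected_capacity_bytes:
            protected.append(name)
            used += size
        else:
            unprotected.append(name)
    return protected, unprotected

prog_vars = [("state", 64, 9.0e6), ("scratch", 256, 1.0e4), ("table", 128, 5.0e6)]
print(place_variables(prog_vars, protected_capacity_bytes=200))
# (['state', 'table'], ['scratch'])
```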
19. Robust low power computing in the nanoscale era
- Author
-
Todd Austin
- Subjects
Engineering, Emerging technologies, Robustness (computer science), Embedded system, Speculation, Scaling, Microarchitecture, Reliability engineering
- Abstract
This tutorial will present recent results in robust low-power computing. The perspective is microarchitectural: what limitations do power and reliability place on the microarchitecture, and what can the microarchitect do to reduce power and improve robustness? The tutorial will start with a technology overview that charts future trends in power and reliability. We will present a summary of prior research in dynamic power reduction in microarchitectures and give some examples of industrial solutions. We will also review prior research, by us and others, in microarchitectural reduction of leakage. While the continued scaling that Moore's law predicts is in many ways good for reducing power, scaling also reduces reliability by increasing uncertainty in device performance. Therefore, in order to take advantage of scaling, it will be necessary to compute in the presence of various types of silicon-related faults. Two that are particularly important are single-event upsets and, even more serious, gates that do not meet their specifications. We will review techniques to provide robustness in light of these trends. In particular, we will revisit techniques developed by the fault-tolerant community as well as newer ideas in timing speculation, exemplified by our Razor research. The tutorial is intended for computer architects and circuit designers interested in a better understanding of current reliability challenges and emerging technologies to address them.
- Published
- 2006
- Full Text
- View/download PDF
20. A second-generation sensor network processor with application-driven memory optimizations and out-of-order execution
- Author
-
M. Minuth, Leyla Nazhandali, Todd Austin, David Blaauw, Bo Zhai, and J. Olson
- Subjects
Instruction prefetch, Out-of-order execution, Computer science, Operand, Instruction set, Microprocessor, Embedded system, Memory architecture, Systems design, Wireless sensor network, Computer hardware
- Abstract
In this paper we present a second-generation sensor network processor that consumes less than one picojoule per instruction (typical processors use hundreds to thousands of picojoules per instruction). As in our first-generation design effort, we strive to build microarchitectures that minimize area to reduce leakage, maximize transistor utility to reduce the energy-optimal voltage, and optimize CPI for efficient processing. The new design builds on our previous work on a low-power subthreshold-voltage sensor processor, this time improving the design by focusing on the ISA, memory system design, and microarchitectural optimizations that reduce overall design size and improve energy per instruction. The new design employs 8-bit datapaths and an ultra-compact 12-bit-wide RISC instruction set architecture, which enables high code density via micro-operations and flexible operand modes. The design also features a unique memory architecture with a prefetch buffer and predecoded address bits, which permits both faster access to the memory and smaller instructions due to fewer address bits. To achieve efficient processing, the design incorporates branch speculation and out-of-order execution, but in a simplified form for reduced area and leakage-energy overheads. Using SPICE-level timing and power simulation, we find that these optimizations produce a number of Pareto-optimal designs with varied performance-energy tradeoffs. Our most efficient design is capable of running at 142 kHz (0.1 MIPS) while consuming only 600 fJ/instruction, allowing the processor to run continuously for 41 years on the energy stored in a miniature 1 g lithium-ion battery. Work is ongoing to incorporate this design into an intra-ocular pressure sensor. (A back-of-the-envelope check of this operating point follows this entry.)
- Published
- 2005
- Full Text
- View/download PDF
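The headline operating point can be sanity-checked with simple arithmetic, using only the figures quoted in the abstract and ignoring self-discharge and any other system loads:

```python
# Back-of-the-envelope check of the stated design point.
freq_hz = 142e3               # 142 kHz (~0.1 MIPS)
energy_per_instr_j = 600e-15  # 600 fJ/instruction

power_w = freq_hz * energy_per_instr_j
print(f"core power ~ {power_w * 1e9:.0f} nW")         # ~85 nW

seconds = 41 * 365.25 * 24 * 3600
print(f"41-year energy ~ {power_w * seconds:.0f} J")  # ~110 J, about 0.03 Wh,
# comfortably within the rough usable energy of a miniature 1 g lithium cell
# (battery shelf life and other loads are ignored in this estimate).
```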
21. Microarchitectural power modeling techniques for deep sub-micron microprocessors
- Author
-
Taeho Kgil, Nam Sung Kim, Valeria Bertacco, Todd Austin, and Trevor Mudge
- Subjects
Computer engineering, Computer science, Low-power electronics, Embedded system, Scalability, Constraint (computer-aided design), Logic simulation, Microarchitecture, Power
- Abstract
The need to perform early design studies that combine architectural simulation with power estimation has become critical as power has become a first-order design constraint. To satisfy this demand, several microarchitectural power simulators have been developed around SimpleScalar, a widely used microarchitectural performance simulator. They have proven to be very useful at providing insights into power/performance trade-offs. However, they are neither parameterized nor technology-scalable. In this paper, we propose more accurate parameterized power modeling techniques that reflect actual technology parameters as well as input switching events for memory and execution units. Compared to HSPICE, the proposed techniques show 93% and 91% accuracy for those blocks, with a much faster simulation time. We also propose a more realistic power modeling technique for external I/O. In general, our approach includes more detailed microarchitectural and circuit modeling than earlier simulators, without incurring a significant simulation time overhead; the overhead can be as small as a few percent. (A generic flavor of such a parameterized power model follows this entry.)
- Published
- 2004
- Full Text
- View/download PDF
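For flavor, here is what a generic parameterized, activity-aware power model looks like. This is a textbook dynamic-plus-leakage formulation with hypothetical numbers, not the specific models developed in the paper.

```python
def block_power(cap_f, vdd, freq_hz, activity, leak_a):
    """Generic per-block power model: dynamic switching power scaled by an
    input-dependent activity factor, plus static leakage.
    P = activity * C * Vdd^2 * f + Vdd * I_leak"""
    dynamic = activity * cap_f * vdd**2 * freq_hz
    static = vdd * leak_a
    return dynamic + static

# Hypothetical numbers for an execution unit: ~1.2 mW dynamic + 1 mW leakage.
print(block_power(cap_f=2e-12, vdd=1.0, freq_hz=2e9, activity=0.3, leak_a=1e-3))
```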
22. Designing robust microarchitectures
- Author
-
Todd Austin
- Subjects
Engineering ,business.industry ,Computation ,Microarchitecture ,law.invention ,Microprocessor ,Robustness (computer science) ,law ,Embedded system ,Systems design ,Verifiable secret sharing ,System on a chip ,business ,Error detection and correction - Abstract
A fault-tolerant approach to microprocessor design, developed at the University of Michigan, is presented. Our approach is based on the use of in-situ checker components that validate the functional and electrical characteristics of complex microprocessor designs. Two design techniques are highlighted: a low-cost double-sampling latch design capable of eliminating power-hungry voltage margins, and a formally verifiable checker co-processor that validates all computation produced by a complex microprocessor core. By adopting a "better than worst-case" approach to system design, it is possible to address reliability and uncertainty concerns that arise during design, manufacturing and system operation.
- Published
- 2004
- Full Text
- View/download PDF
23. Architectural optimizations for low-power, real-time speech recognition
- Author
-
Scott Mahlke, Todd Austin, and Rajeev Krishna
- Subjects
Application domain, Computer science, Speech recognition, Interface (computing), Task parallelism, Energy consumption, Power domains, Microarchitecture
- Abstract
The proliferation of computing technology to low-power domains such as hand-held devices has led to increased interest in portable interface technologies, with particular interest in speech recognition. The computational demands of robust, large-vocabulary speech recognition systems, however, are currently prohibitive for such low-power devices. This work begins an exploration of domain-specific characteristics of speech recognition that might be exploited to achieve the requisite performance within the power constraints of such devices. We focus primarily on architectural techniques to exploit the massive amounts of potential thread-level parallelism apparent in this application domain, and consider the performance/power trade-offs of such architectures. Our results show that a simple, multi-threaded, multi-pipelined processor architecture can significantly improve the performance of the time-consuming search phase of modern speech recognition algorithms, and may reduce overall energy consumption by drastically reducing static power dissipation. We also show that the primary hurdle to achieving these performance benefits is the data request rate into the memory system, and consider some initial solutions to this problem.
- Published
- 2003
- Full Text
- View/download PDF
24. Compiler controlled value prediction using branch predictor based confidence
- Author
-
Todd Austin and Eric D. Larson
- Subjects
Speedup, Computer science, Value (computer science), Compiler, Parallel computing, Branch predictor, Instruction-level parallelism, Branch misprediction, Critical path method
- Abstract
Value prediction breaks data dependencies in a program, thereby creating instruction-level parallelism that can increase program performance. Hardware-based value prediction techniques have been shown to increase speed, but at great cost, as designs include prediction tables, selection logic, and a confidence mechanism. This paper proposes compiler-controlled value prediction optimizations that obtain good speedups while keeping hardware costs low. The branch predictor is used to estimate the confidence of the value predictor for speculated instructions. This technique obtains a 4.6% speedup when completely implemented in software and a 15.2% speedup when minimal hardware support (a 1 KB predictor table) is added. We also explore the use of critical-path information to aid in the selection of value prediction candidates. The key result of our study is that programs with long dynamic dependence chains benefit from this technique, while programs with shorter chains benefit more from simple selection methods that favor optimization frequency. A new branch instruction that ignores innocuous value mispredictions is shown to eliminate unnecessary mispredictions when program semantics are not violated by confidence-branch mispredictions. (A toy model of the confidence-gated prediction follows this entry.)
- Published
- 2000
- Full Text
- View/download PDF
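A toy model of the core idea above, with hypothetical structure: a last-value prediction for a chosen instruction is used only when a saturating counter, standing in for the branch predictor entry of the verification branch, reports high confidence. The real scheme operates at the compiler/ISA level, not as a Python class.

```python
class CompilerControlledVP:
    """Toy model: last-value prediction gated by a 2-bit saturating counter
    that plays the role of the branch predictor's confidence for the
    value-verification branch."""

    def __init__(self):
        self.last_value = None
        self.confidence = 0  # 0..3, like a 2-bit branch predictor counter

    def maybe_predict(self):
        # Speculate only when the verify branch has recently been correct;
        # otherwise fall back to normal (non-speculative) execution.
        if self.confidence >= 2 and self.last_value is not None:
            return self.last_value
        return None

    def update(self, actual_value):
        correct = (actual_value == self.last_value)
        if correct:
            self.confidence = min(3, self.confidence + 1)
        else:
            self.confidence = max(0, self.confidence - 1)
        self.last_value = actual_value
        return correct

vp = CompilerControlledVP()
for v in [7, 7, 7, 7, 9, 9]:
    pred = vp.maybe_predict()
    ok = vp.update(v)
    print(f"value={v} predicted={pred} correct={ok}")
```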
25. Architectural support for fast symmetric-key cryptography
- Author
-
Todd Austin, Jerome Burke, and John W. McDonald
- Subjects
Triple DES, Speedup, Modular arithmetic, Computer science, Cryptography, Encryption, Permutation, Secure communication, Cipher, Computer engineering, Symmetric-key algorithm, Strong cryptography, Software
- Abstract
The emergence of the Internet as a trusted medium for commerce and communication has made cryptography an essential component of modern information systems. Cryptography provides the mechanisms necessary to implement accountability, accuracy, and confidentiality in communication. As demands for secure communication bandwidth grow, efficient cryptographic processing will become increasingly vital to good system performance. In this paper, we explore techniques to improve the performance of symmetric-key cipher algorithms. Eight popular strong encryption algorithms are examined in detail. Analysis reveals that the algorithms are computationally complex and contain little parallelism. Overall throughput on a high-end microprocessor is quite poor: a 600 MHz processor is incapable of saturating a T3 communication line with 3DES (triple DES) encrypted data. We introduce new instructions that improve the efficiency of the analyzed algorithms. Our approach adds instruction set support for fast substitutions, general permutations, rotates, and modular arithmetic. Performance analysis of the optimized ciphers shows an overall speedup of 59% over a baseline machine with rotate instructions and a 74% speedup over a baseline without rotates. Even higher speedups are demonstrated with optimized substitutions (S-boxes) and additional functional unit resources. Our analyses of the original and optimized algorithms suggest future directions for the design of high-performance programmable cryptographic processors.
- Published
- 2000
- Full Text
- View/download PDF
26. Classifying load and store instructions for memory renaming
- Author
-
Brad Calder, Glenn Reinman, Todd Austin, Gary Tyson, and Dean M. Tullsen
- Subjects
Profiling (computer programming), Memory address, Dependency, Computer science, Value (computer science), Compiler, Parallel computing, Bottleneck
- Abstract
Memory operations remain a significant bottleneck in dynamically scheduled pipelined processors, due in part to the inability to statically determine the existence of memory address dependencies. Hardware memory renaming techniques have been proposed to predict which stores a load might be dependent upon. These prediction techniques can be used to speculatively forward a value from a predicted store dependency to a load through a value prediction table. However, these techniques require large, time-consuming hardware tables. In this paper we propose a software-guided approach for identifying dependencies between store and load instructions, and the Load Marking (LM) architecture to communicate these dependencies to the hardware. Compiler analysis and profiles are used to find important store/load relationships, and these relationships are identified during execution via hints or an n-bit tag. For those loads that are not marked for renaming, we then use additional profiling information to further classify the loads into those that have accurate value prediction and those that do not. These classifications allow the processor to individually apply the most appropriate aggressive form of execution to each load.
- Published
- 1999
- Full Text
- View/download PDF
27. Cache-conscious data placement
- Author
-
Simmi John, Todd Austin, Brad Calder, and Chandra Krintz
- Subjects
Computer science, Cache coloring, CPU cache, Parallel computing, Cache pollution, Virtual address space, Locality of reference, Cache, Cache algorithms, Heap (data structure)
- Abstract
As the gap between memory and processor speeds continues to widen, cache efficiency is an increasingly important component of processor performance. Compiler techniques have been used to improve instruction cache performance by mapping code with temporal locality to different cache blocks in the virtual address space, eliminating cache conflicts. These code placement techniques can be applied directly to the problem of placing data for improved data cache performance. In this paper we present a general framework for Cache-Conscious Data Placement. This is a compiler-directed approach that creates an address placement for the stack (local variables), global variables, heap objects, and constants in order to reduce data cache misses. The placement of data objects is guided by a temporal relationship graph between objects generated via profiling. Our results show that profile-driven data placement significantly reduces the data miss rate, by 24% on average. (A condensed sketch of the placement step follows this entry.)
- Published
- 1998
- Full Text
- View/download PDF
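A condensed sketch of the core placement step, using a hypothetical graph representation: objects that the profile shows are hot at the same time are assigned to different cache sets so they do not conflict. This greedy coloring is a simplification of the paper's framework.

```python
from collections import defaultdict

def place_objects(objects, edges, num_sets):
    """Greedy cache-conscious placement sketch: `edges[(a, b)]` is the
    profiled temporal-affinity weight between objects a and b.  Each object
    is assigned the cache set where it conflicts least with already-placed
    objects that are hot at the same time."""
    affinity = defaultdict(float)
    for (a, b), w in edges.items():
        affinity[(a, b)] += w
        affinity[(b, a)] += w

    placement = {}
    for obj in objects:  # e.g. considered hottest-first
        conflict = [0.0] * num_sets
        for other, s in placement.items():
            conflict[s] += affinity[(obj, other)]
        placement[obj] = conflict.index(min(conflict))
    return placement

objs = ["stack_buf", "global_tbl", "heap_node", "const_pool"]
profile_edges = {("stack_buf", "global_tbl"): 120.0,
                 ("global_tbl", "heap_node"): 80.0,
                 ("stack_buf", "const_pool"): 5.0}
print(place_objects(objs, profile_edges, num_sets=2))
# {'stack_buf': 0, 'global_tbl': 1, 'heap_node': 0, 'const_pool': 1}
```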