Author: "Farhad Mehdipour" / Topic: computer science - Searchworks@Jio Institute Digital Library Search Results

1. Intelligent Affect-Sensitive Tutoring Systems

Author: Hossein Sarrafzadeh and Farhad Mehdipour
Subjects: Computer science, Human–computer interaction, Affect (psychology)
Published: 2021
Full Text: View/download PDF

2. A Review of IoT Security Challenges and Solutions

Author: Farhad Mehdipour
Subjects: business.industry, Computer science, Resource constraints, Topology (electrical circuits), Computer security model, Internet of Things, business, Computer security, computer.software_genre, computer
Abstract: The Internet of Things (IoT) is growing exponentially; however, privacy and security vulnerabilities are major obstacles to rapid adoption of this technology. Due mainly to its decentralized topology and the resource constraints of the majority of its devices, conventional security and privacy approaches are inapplicable to IoT. Secure interaction and communication between a large number of devices is possible, but it can be expensive, time-consuming, and complex. Thus, there is a need for new security models to replace the current ones, which are mostly centralized. This paper provides an overview of IoT architectural layers and components, security issues and challenges at different layers, some solutions, and future directions.
Published: 2020
Full Text: View/download PDF

3. Novel Casestudy and Benchmarking of AlexNet for Edge AI: From CPU and GPU to FPGA

Author: Hewa Wts Nanayakkara, Thilina Doremure Gamage, Firas Al-Ali, Sayan Kumar Ray, and Farhad Mehdipour
Subjects: 021110 strategic, defence & security studies, Computer science, 0211 other engineering and technologies, 0202 electrical engineering, electronic engineering, information engineering, Graphics processing unit, 020201 artificial intelligence & image processing, 02 engineering and technology, Benchmarking, Parallel computing, Field-programmable gate array, Massively parallel, Rendering (computer graphics)
Abstract: Convolutional Neural Networks (CNNs) require massive parallelism due to the high-precision floating-point arithmetic operations they perform. Their demand for processing power is therefore significantly higher than what a standard CPU can offer. This has traditionally made CNNs better suited to running on a Graphics Processing Unit (GPU). However, GPUs consume much more power than CPUs, rendering them impractical for implementing CNNs in Edge AI (Artificial Intelligence), where restraining power consumption is paramount. FPGAs (Field Programmable Gate Arrays), on the other hand, are better suited to AI computing at the edge, as they consume much less power than GPUs and even CPUs. Additionally, GPUs and CPUs are not suited to real-time AI applications, which require both high throughput and low latency at the same time, whereas FPGAs excel at all these requirements. The purpose of this paper is to provide a study of the performance of FPGA as the most suitable platform for AI-based computing at the edge. To achieve this, we chose AlexNet, a popular CNN image classifier, for which we present a case study on four different platforms: CPU, GPU, embedded RISC core, and FPGA fabric. We then quantitatively measure and compare the performance in terms of inference time (the time needed to classify an image) on all these platforms. Inference time using FPGA is reduced by factors of almost 64, 1.6, and 1.1 compared to a dual-core ARM, an i5-6400 CPU, and an Nvidia GPU, respectively.
Published: 2020
Full Text: View/download PDF

4. Supercomputer Networks in the Datacenter: Benchmarking the Evolution of Communication Granularity from Macroscale down to Nanoscale

Author: Sayan Kumar Ray, Hewa Wts Nanayakkara, Farhad Mehdipour, Thilina Doremure Gamage, and Firas Al-Ali
Subjects: Multi-core processor, Grid computing, Computer science, Server, Computer cluster, InfiniBand, Granularity, Parallel computing, computer.software_genre, Field-programmable gate array, Supercomputer, computer
Abstract: In the datacenter, a supercomputer network refers to the interconnections between the clustered processing nodes within a single supercomputer. In this paper, we primarily aim to describe how, as supercomputers evolve, the granularity of this inter-node communication continues to scale down, as a direct result of the processing nodes scaling down from full-sized clustered computers (and servers) to interconnected processor cores and even smaller reconfigurable logic cells. We start by describing our exploration of the four generations of supercomputing and how they have evolved over the years from macroscale packet-switched coarse-grained cluster computing and grid computing, to conventional supercomputing, then to fine-grained supercomputer Networks-on-Chip (NoC), and finally to the emerging fine-grained nanoscale NoC FPGA (Field Programmable Gate Array) supercomputer-on-chip that we see today. In addition, we demonstrate and analyze the results of benchmarking Mandelbrot Set performance on a 3rd generation supercomputer, Adapteva's 16-core Epiphany supercomputer NoC. On the basis of our study, we infer that next-generation supercomputing-on-chip will most likely depend on the fine-tuning between multi-core NoCs and high-end FPGA co-processors built into these NoCs.
Published: 2019
Full Text: View/download PDF

5. Commercial Security Scanning: Point-on-Sale (POS) Vulnerability and Mitigation Techniques

Author: Rosario Del Pilar Soria Choque, Blake Mitchell Paul, Farhad Mehdipour, and Bahman A. Sassani Sarrafpour
Subjects: Point of sale, business.industry, Computer science, media_common.quotation_subject, Expense management, computer.software_genre, Computer security, Payment, Cash, Payment Card Industry Data Security Standard, Cash flow, business, computer, Database transaction, media_common, Vulnerability (computing)
Abstract: Point of Sale (POS) systems have become the technology of choice for most businesses, offering a number of advantages over traditional cash registers. They manage staff, customers, transactions, inventory, sales and labor reporting, and price adjustment, as well as keeping track of cash flow, managing expenses, reducing human errors, and more. Whether traditional on-premise POS or cloud-based POS, they help businesses run more efficiently. However, despite all these advantages, POS systems are becoming targets of a number of cyber-attacks. Security of a POS system is a key requirement of the Payment Card Industry Data Security Standard (PCI DSS). This paper undertakes research into the PCI DSS and its accompanying standards, in an attempt to break or bypass security measures using varying degrees of vulnerability and penetration attacks in a methodological format. The overall goal of this experimentation is to establish a basis from which attacks can be made against a realistic networking environment, in which an intruder can bypass security measures, thus exposing a vulnerability in the PCI DSS and potentially exposing confidential customer payment information.
Published: 2019
Full Text: View/download PDF

6. AgentPi: An IoT Enabled Motion CCTV Surveillance System

Author: Farhad Mehdipour, Xiaosong Li, Andrew David, and Bahman A. Sassani Sarrafpour
Subjects: Computer science, business.industry, Mobile broadband, 010401 analytical chemistry, Real-time computing, 020206 networking & telecommunications, 02 engineering and technology, 01 natural sciences, Motion (physics), 0104 chemical sciences, Tangible property, Server, 0202 electrical engineering, electronic engineering, information engineering, Wireless, Anomaly detection, business, Internet of Things, UMTS frequency bands
Abstract: In this paper, we introduce a prototype of a surveillance system equipped with several vital modules and with mobility as an optional functionality. The proposed cost-efficient system features motion and anomaly detection without physical sensors, instant trigger notices via SMS/text and e-mail, and live streaming both internally and remotely, including recording upon anomaly detection. As a result, it saves resources such as storage space, monitoring personnel, time, and costs. Mobility for this system is anticipated to be achieved through integration of ever-evolving mobile broadband with the widely available 4G LTE and 3G UMTS technologies. Most importantly, the proposed concept is affordable thanks to the availability of open source resources and minimal hardware, so that everyone can proactively exercise security precautions for their premises or tangible property.
Published: 2019
Full Text: View/download PDF

7. Physical-aware predictive dynamic thermal management of multi-core processors

Author: Bagher Salami, Hamid Noori, Farhad Mehdipour, and Mohammadreza Baharani
Subjects: 020203 distributed computing, Multi-core processor, Computer Networks and Communications, business.industry, Computer science, 02 engineering and technology, Thermal management of electronic devices and systems, Parallel computing, 020202 computer hardware & architecture, Theoretical Computer Science, Scheduling (computing), Artificial Intelligence, Hardware and Architecture, Embedded system, Thermal, 0202 electrical engineering, electronic engineering, information engineering, business, Software
Abstract: Advances in silicon process technology have made it possible to build processors with a larger number of cores. The increase in core count has been hindered by rising power consumption and heat dissipation, due to high power expenditure in a small die area. High temperature can degrade performance, reliability, and transition speed, accelerate transistor aging, and increase leakage current. In this paper, we present a method which considers the different thermal behavior of cores and uses both physical sensors and performance counters simultaneously to improve thermal management of both SMT multi-core processors with a physical sensor per core and Non-SMT multi-core processors with only one physical sensor for the processor. The experimental results indicate that our technique can significantly decrease the average and peak temperature in most cases compared to the Linux standard scheduler and two well-known thermal management techniques, PDTM and TAS. The method is a thermal-aware scheduling approach for both SMT and Non-SMT multi-core processors, based on the distinct thermal behavior of each core. Experimental results on commercial processors indicate that the proposed approach, under full workloads, outperforms the Linux standard scheduler and two existing DTM techniques. One of the unique features of the proposed algorithm is that it has an adaptive temperature threshold, unlike previous work, which assumes that the temperature threshold is a fixed value.
Published: 2016
Full Text: View/download PDF

8. Adaptive low‐complexity motion estimation algorithm for high efficiency video coding encoder

Author: Ahmed Medhat, Farhad Mehdipour, Mohammed S. Sayed, Maha Elsabrouty, and Ahmed Shalaby
Subjects: Pixel, Computational complexity theory, Computer science, Computation, Real-time computing, 020207 software engineering, 02 engineering and technology, Search algorithm, Motion estimation, Signal Processing, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Electrical and Electronic Engineering, Encoder, Image resolution, Algorithm, Software, Coding (social sciences)
Abstract: High quality video has become an essential requirement in recent applications. The high efficiency video coding (HEVC) standard provides an efficient solution for high quality video at lower bit rates. On the other hand, HEVC comes with a much higher computational cost. In particular, motion estimation (ME) in HEVC consumes the largest share of the computation. Therefore, fast ME algorithms and hardware accelerators have been proposed in order to speed up integer ME in HEVC. This study presents a fast centre search algorithm (FCSA) and an adaptive search window algorithm (ASWA) for integer pixel ME in HEVC. In addition, a centre adaptive search algorithm, a combination of the two proposed algorithms FCSA and ASWA, is proposed in order to achieve the best performance. Experimental results show a notable speed-up in terms of encoding time and bit rate saving with tolerable peak signal-to-noise ratio (PSNR) quality degradation. The proposed fast search algorithms reduce the computational complexity of the HEVC encoder by 57%. This improvement is accompanied by a modest average PSNR loss of 0.014 dB and a bit rate increase of 0.6385% when compared with related works.
Published: 2016
Full Text: View/download PDF

9. A design methodology and various performance and fabrication metrics evaluation of 3D Network-on-Chip with multiplexed Through-Silicon Vias

Author: Farhad Mehdipour, Mohamed El-Sayed, Mostafa Said, Morteza Biglari-Abhari, and Ahmed Shalaby
Subjects: Fabrication, Silicon, Computer Networks and Communications, Computer science, chemistry.chemical_element, Hardware_PERFORMANCEANDRELIABILITY, 02 engineering and technology, 01 natural sciences, Multiplexing, Reduction (complexity), Artificial Intelligence, 0103 physical sciences, Hardware_INTEGRATEDCIRCUITS, 0202 electrical engineering, electronic engineering, information engineering, Electronic engineering, Design methods, 010302 applied physics, business.industry, 020202 computer hardware & architecture, Power (physics), chemistry, Hardware and Architecture, Embedded system, Routing (electronic design automation), business, Software
Abstract: The use of short Through-Silicon Vias (TSVs) in 3D integration technology introduces a significant reduction in routing area, power consumption, and delay. However, there are still several challenges in 3D integration technology, mainly low yield, which is a direct result of the extra fabrication steps required for TSVs. Therefore, reducing TSV count has a considerable effect on improving yield and hence reducing cost. A TSV multiplexing technique called TSVBOX was introduced in Said et al. (2013) to reduce the TSV count without affecting the direct benefits of TSVs. Although the TSVBOX introduces some delay to the signals to be multiplexed, this delay effect of TSV multiplexing has not yet been addressed. In this paper, we analyze the TSVBOX timing requirements and propose a design methodology for TSVBOX-based 3D Network-on-Chip (NoC). Performance and power comparisons are then conducted to investigate the direct effects of TSV multiplexing on these two metrics. After that, the basic fabrication metrics are compared to investigate the effect of the proposed design methodology on yield and cost. We show that the TSVBOX greatly improves the fabrication metrics at minimal degradation in performance and power consumption, especially for Hotspot-like traffic patterns.
Published: 2016
Full Text: View/download PDF

10. Fog Computing Realization for Big Data Analytics

Author: Farhad Mehdipour, Aniket Mahanti, Bahman Javadi, and Guillermo Ramirez-Prado
Subjects: Software, Vehicular ad hoc network, business.industry, Computer science, Software deployment, Distributed computing, Big data, Data analysis, Table (database), Cloud computing, business, Communications protocol
Abstract: This chapter provides background on big data analytics and describes how fog‐engine (FE) can be deployed in the traditional centralized data analytics platform and how it enhances existing system capabilities. The FE provides on‐premise data analytics as well as the capabilities for Internet‐of‐Things (IoT) devices to communicate with each other and with the cloud. The chapter provides an overview of a typical FE deployment. It explains the system prototype and the results of the evaluation of the proposed solution. Two case studies describing how the proposed idea works for different applications are described. A table shows the list of IoT solutions from five well‐known cloud providers. Data collection is one of the basic aspects of these solutions, which specify the communication protocols between the components of an IoT software platform. Researchers have proposed various applications of fog computing in diverse scenarios such as health monitoring, smart cities, and vehicular networks.
Published: 2019
Full Text: View/download PDF

11. A Reconfigurable Data-Path Accelerator Based on Single Flux Quantum Circuits

Author: Farhad Mehdipour, Nobuyuki Yoshikawa, Hiroshi Kataoka, Kazuaki Murakami, Naofumi Takagi, Hiroaki Honda, Akira Fujimaki, and Hiroyuki Akaike
Subjects: CMOS, Computer science, Logic gate, Rapid single flux quantum, Magnetic flux quantum, Electronic engineering, Key (cryptography), Electrical and Electronic Engineering, FLOPS, Electronic, Optical and Magnetic Materials, Power (physics), Electronic circuit
Abstract: The single flux quantum (SFQ) is expected to be a next-generation high-speed and low-power technology in the field of logic circuits. CMOS, as the dominant technology for conventional processors, cannot be replaced with SFQ technology due to the difficulty of implementing feedback loops and conditional branches using SFQ circuits. This paper investigates the applicability of a reconfigurable data-path (RDP) accelerator based on SFQ circuits. The authors introduce detailed specifications of the SFQ-RDP architecture and compare its performance and power/performance ratio with those of a graphics-processing unit (GPU). The results show up to 1600 times higher efficiency in terms of Flops/W (floating-point operations per second/Watt) for some high-performance computing application programs. Keywords: single flux quantum, reconfigurable data-path, accelerator
Published: 2014
Full Text: View/download PDF

12. Analysis of NTP DRDoS attacks' performance effects and mitigation techniques

Author: Ivan Pitton, Craig Young, Farhad Mehdipour, Bahman A. Sassani, and Charly Abarro
Subjects: Ping (video games), Computer science, business.industry, Network security, ComputerSystemsOrganization_COMPUTER-COMMUNICATIONNETWORKS, Application layer DDoS attack, Denial-of-service attack, Computer security, computer.software_genre, ComputingMilieux_MANAGEMENTOFCOMPUTINGANDINFORMATIONSYSTEMS, Server, Network Time Protocol, Malware, business, Trinoo, computer, Computer network
Abstract: Denial of Service (DoS) attacks are a type of interruption (malicious and/or unintended) that restricts or completely denies services meant for legitimate users. One of the most relevant DoS attacks is the Distributed Denial of Service (DDoS) attack, which is a variant of DoS on a larger scale, using previously compromised, malware-infected computers known as “bots” or “zombies”. A DDoS attack works by generating large amounts of traffic towards an intended victim. This paper focuses on analyzing a variant of DDoS attacks known as the Network Time Protocol (NTP) Distributed Reflective Denial of Service (DRDoS) attack. The impact of the attack is measured in terms of the processor, memory, and network utilization and the ping of the most relevant devices. Further focus is on host- and network-based layered “defense-in-depth” techniques for mitigating NTP DRDoS attacks.
Published: 2016
Full Text: View/download PDF

13. A design scheme for a reconfigurable accelerator implemented by single-flux quantum circuits

Author: Koji Inoue, Hiroshi Kataoka, Hiroaki Honda, Farhad Mehdipour, and Kazuaki Murakami
Subjects: Scheme (programming language), business.industry, Computer science, Process (computing), FLOPS, Data flow diagram, Software, Computer architecture, Hardware and Architecture, Magnetic flux quantum, business, computer, Computer hardware, Data-flow analysis, computer.programming_language, Electronic circuit
Abstract: A large-scale reconfigurable data-path processor (LSRDP) implemented by single-flux quantum (SFQ) circuits is introduced, which is integrated with a general purpose processor to accelerate data flow graphs (DFGs) extracted from scientific applications. A number of applications are identified and analyzed throughout the LSRDP design procedure. Various design steps, particularly the DFG mapping process, are discussed, and our techniques for optimizing the area of the accelerator are presented as well. Different design alternatives are examined by exploring the LSRDP design space, and an appropriate architecture is determined for the accelerator. Initial experiments demonstrate the capability of the designed architecture to achieve performance values of up to 210 Gflops for the attempted applications.
Published: 2011
Full Text: View/download PDF

14. Improving performance and energy efficiency of embedded processors via post-fabrication instruction set customization

Author: Kazuaki Murakami, Hamid Noori, Farhad Mehdipour, and Koji Inoue
Subjects: Speedup, business.industry, Computer science, Extensibility, Theoretical Computer Science, Personalization, Instruction set, Software, Computer architecture, Hardware and Architecture, Embedded system, Basic block, business, Information Systems, Efficient energy use
Abstract: Encapsulating critical computation subgraphs as application-specific instruction set extensions is an effective technique to enhance the performance and energy efficiency of embedded processors. However, the addition of custom functional units to the base processor is required to support the execution of custom instructions. Although automated tools have been developed to reduce the long design time needed to produce a new extensible processor for each application, short time-to-market and significant non-recurring engineering and design costs remain issues. To address these concerns, we introduce an adaptive extensible processor in which custom instructions are generated and added after chip fabrication. To support this feature, custom functional units (CFUs) are replaced by a reconfigurable functional unit (RFU). The proposed RFU is based on a multi-cycle matrix of functional units with the capability of conditional execution. To generate more effective custom instructions, they are extended over basic blocks; hence, multi-exit custom instructions and the intuition behind them are introduced. Conditional execution capability has been added to the RFU to support the multi-exit feature of custom instructions. Because the proposed RFU has limitations on hardware resources (i.e., connections and processing elements), an integrated mapping-temporal partitioning framework is proposed to guarantee that the generated custom instructions can be mapped on the RFU (mappable custom instructions). Experimental results show that multi-exit custom instructions enhance the performance and energy efficiency by an average of 32% and 3%, respectively, compared to custom instructions limited to one basic block. A maximum speedup of 4.9, compared to a single-issue embedded processor, and an average speedup of 1.9 were achieved on the MiBench benchmark suite. The maximum and average energy savings are 56% and 22%, respectively. These performance and energy-efficiency gains are obtained at the cost of a 30% area overhead.
Published: 2010
Full Text: View/download PDF

15. A Combined Analytical and Simulation-Based Model for Performance Evaluation of a Reconfigurable Instruction Set Processor

Author: Farhad Mehdipour, Hamid Noori, Bahman Javadi, Hiroaki Honda, Koji Inoue, and Kazuaki Murakami
Subjects: Instruction set, chemistry, Computer engineering, Computer science, chemistry.chemical_element, Simulation based, Space exploration, Bismuth
Abstract: Performance evaluation is a serious challenge in designing or optimizing reconfigurable instruction set processors. The conventional approaches based on synthesis and simulation are very time-consuming and need considerable design effort. A combined analytical and simulation-based model (CAnSO) is proposed and validated for performance evaluation of a typical reconfigurable instruction set processor. The proposed model consists of an analytical core that incorporates statistics gathered from cycle-accurate simulation to make a reasonable evaluation and provide valuable insight. Compared to cycle-accurate simulation results, CAnSO shows almost 2% variation in the speedup measurement.
Published: 2009

16. Rapid Design Space Exploration of a Reconfigurable Instruction-Set Processor

Author: Koji Inoue, Farhad Mehdipour, Hamid Noori, and Kazuaki Murakami
Subjects: Instruction set, Signal processing, Computer architecture, Computer science, Design space exploration, Applied Mathematics, Signal Processing, Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design
Published: 2009
Full Text: View/download PDF

17. Enhancing Energy Efficiency of Processor-Based Embedded Systems through Post-Fabrication ISA Extension

Author: Hamid Noori, Farhad Mehdipour, Kazuaki Murakami, and Koji Inoue
Subjects: Flexibility (engineering), Custom Instruction, business.industry, Computer science, Register file, Energy consumption, Reconfigurable Functional Unit, Manufacturing cost, law.invention, Energy conservation, Instruction set, Microprocessor, Computer architecture, law, Embedded system, Low Energy Embedded Processor, business, Conditional Execution, Efficient energy use
Abstract: Application-specific instruction set extension is an effective technique for reducing accesses to components such as on- and off-chip memories and the register file, and for enhancing energy efficiency. However, supporting custom instructions requires the addition of custom functional units to the base processor, which is becoming an issue due to the increase in manufacturing and design costs in new nanometer-scale technologies and shorter time-to-market. To address these issues, our proposed approach uses an optimized reconfigurable functional unit instead, and instruction set customization is done after chip fabrication. Therefore, the low-energy benefit of customization is obtained while maintaining the flexibility of a conventional microprocessor. Experimental results show that the maximum and average energy savings are 67% and 22%, respectively, for our proposed architecture framework.
Published: 2008

18. A Reconfigurable Functional Unit with Conditional Execution for Multi-Exit Custom Instructions

Author: Farhad Mehdipour, Kazuaki Murakami, Hamid Noori, and Koji Inoue
Subjects: Instructions per cycle, Speedup, Computer science, business.industry, Suite, Extensibility, Electronic, Optical and Magnetic Materials, Instruction set, Embedded system, Basic block, Benchmark (computing), Electrical and Electronic Engineering, Engineering design process, business
Abstract: Encapsulating critical computation subgraphs as application-specific instruction set extensions is an effective technique to enhance the performance of embedded processors. However, the addition of custom functional units to the base processor is required to support the execution of these custom instructions. Although automated tools have been developed to reduce the long design time needed to produce a new extensible processor for each application, short time-to-market and significant non-recurring engineering and design costs remain issues. To address these concerns, we introduce an adaptive extensible processor in which custom instructions are generated and added after chip fabrication. To support this feature, custom functional units (CFUs) are replaced by a reconfigurable functional unit (RFU). The proposed RFU is based on a multi-cycle matrix of functional units with the capability of conditional execution. A quantitative approach is utilized to propose an efficient architecture for the RFU and fix its constraints. To generate more effective custom instructions, they are extended over basic blocks; hence, multi-exit custom instructions are proposed. Conditional execution has been added to the RFU to support the multi-exit feature of custom instructions. Experimental results show that multi-exit custom instructions enhance the performance by an average of 67% compared to custom instructions limited to one basic block. A maximum speedup of 4.7, compared to a general embedded processor, and an average speedup of 1.85 were achieved on the MiBench benchmark suite.
Published: 2008
Full Text: View/download PDF

19. An architecture framework for an adaptive extensible processor

Author: Hamid Noori, Farhad Mehdipour, Morteza Saheb Zamani, Koji Inoue, and Kazuaki Murakami
Subjects: Profiling (computer programming), Reduced instruction set computing, Computer science, Computation, Parallel computing, Supercomputer, Theoretical Computer Science, Instruction set, Architecture framework, Computer architecture, Hardware and Architecture, Systems architecture, Software, Information Systems
Abstract: To improve the performance of embedded processors, an effective technique is collapsing critical computation subgraphs as application-specific instruction set extensions and executing them on custom functional units. The problem with this approach is the immense cost and the long times required to design a new processor for each application. As a solution to this issue, we propose an adaptive extensible processor in which custom instructions (CIs) are generated and added after chip-fabrication. To support this feature, custom functional units are replaced by a reconfigurable matrix of functional units (FUs). A systematic quantitative approach is used for determining the appropriate structure of the reconfigurable functional unit (RFU). We also introduce an integrated framework for generating mappable CIs on the RFU. Using this architecture, performance is improved by up to 1.33, with an average improvement of 1.16, compared to a 4-issue in-order RISC processor. By partitioning the configuration memory, detecting similar/subset CIs and merging small CIs, the size of the configuration memory is reduced by 40%.
Published: 2008
Full Text: View/download PDF

20. A gravity-directed temporal partitioning approach

Author: Hiroaki Honda, Farhad Mehdipour, Koji Inoue, Kazuaki Murakami, and Hamid Noori
Subjects: Data flow diagram, Computer science, business.industry, Embedded system, Control reconfiguration, Parallel computing, Electrical and Electronic Engineering, Latency (engineering), Condensed Matter Physics, business, Electronic, Optical and Magnetic Materials, Data-flow analysis
Abstract: Reconfiguration latency has a significant impact on system performance in reconfigurable systems. A temporal partitioning approach is introduced for partitioning data flow graphs for a reconfigurable system comprising partially programmable fine-grained hardware. Residing eligibility, inspired by the law of universal gravitation, is introduced to express the eligibility of a node to stay in succeeding configurations (partitions) and to prevent it from being swapped in or out. Partitioning based on residing eligibility causes fewer nodes with different functionalities to be assigned to subsequent partitions. Thus, reconfiguration overhead time and unused hardware space both decrease, due to the common parts in consecutive configurations.
Published: 2008
Full Text: View/download PDF

21. Energy-Efficient Big Data Analytics in Datacenters

Author: Bahman Javadi, Hamid Noori, and Farhad Mehdipour
Subjects: business.industry, Computer science, Big data, Volume (computing), 020206 networking & telecommunications, Cloud computing, 02 engineering and technology, Energy consumption, computer.software_genre, Data science, Virtual machine, 020204 information systems, Server, Data_FILES, 0202 electrical engineering, electronic engineering, information engineering, Architecture, business, computer, Efficient energy use
Abstract: The volume of generated data increases with the rapid growth of the Internet of Things, leading to big data proliferation and more opportunities for datacenters. Highly virtualized cloud-based datacenters are currently considered for big data analytics. However, big data requires datacenters with enhanced infrastructure capable of undertaking more responsibilities for handling and analyzing data. Also, as the scale of datacenters continues to expand, minimizing energy consumption and operational cost is a vital concern. Future datacenter infrastructure, including the interconnection network, storage, and servers, should be able to handle big data applications in an energy-efficient way. In this chapter, we aim to explore different aspects of cloud-based datacenters for big data analytics. First, the datacenter architecture, including computing and networking technologies, as well as datacenters for cloud-based services, will be illustrated. Then the concepts of big data and cloud computing, and some of the existing cloud-based datacenter platforms including tools for big data analytics, will be introduced. We later discuss the techniques for improving energy efficiency in cloud-based datacenters for big data analytics. Finally, the current and future trends for datacenters, in particular with respect to energy consumption to support big data analytics, will be discussed.
Published: 2016
Full Text: View/download PDF

22. Improving Performance and Energy Saving in a Reconfigurable Processor via Accelerating Control Data Flow Graphs

Author: Kazuaki Murakami, Hamid Noori, Farhad Mehdipour, Koji Inoue, and Morteza Saheb Zamani
Subjects: reconfigurable accelerator, conditional execution, Speedup, Computer science, business.industry, Parallel computing, Integrated circuit, control data flow graph, temporal partitioning, law.invention, reconfigurable processor, Software, Artificial Intelligence, Hardware and Architecture, law, Embedded system, Hardware acceleration, Computer Vision and Pattern Recognition, Electrical and Electronic Engineering, business, Data-flow analysis
Abstract: Extracting frequently executed (hot) portions of an application and executing their corresponding data flow graphs (DFGs) on a hardware accelerator brings more speedup and energy saving to embedded systems comprising a base processor integrated with a tightly coupled accelerator. Extending DFGs to support control instructions, and using Control DFGs (CDFGs) instead of DFGs, allows a larger portion of the application code to be accelerated and hence yields more speedup and energy saving. In this paper, motivations for extending DFGs to CDFGs and handling control instructions are introduced. In addition, basic requirements for an accelerator with conditional execution support are proposed. Then, two algorithms are presented for temporal partitioning of CDFGs considering the architectural constraints of the target accelerator. To demonstrate the effectiveness of the proposed ideas, they are applied to the accelerator of a reconfigurable processor called AMBER. Experimental results confirm the remarkable effectiveness of covering control instructions and using CDFGs rather than DFGs in terms of performance and energy reduction.
Published: 2007
Full Text: View/download PDF

23. Handling Control Data Flow Graphs for a Tightly Coupled Reconfigurable Accelerator

Author: Morteza Saheb Zamani, Koji Inoue, Hamid Noori, Kazuaki Murakami, and Farhad Mehdipour
Subjects: data flow graph (DFG), Speedup, Flow (mathematics), Computer science, Control data, Register file, Code (cryptography), Parallel computing, Base (topology), Extensibility, Data-flow analysis
Abstract: In an embedded system including a base processor integrated with a tightly coupled accelerator, extracting frequently executed (hot) portions of the code and executing their corresponding data flow graphs (DFGs) on the accelerator brings more speedup. In this paper, we present our motivations for handling control instructions in DFGs and extending them to Control DFGs (CDFGs). In addition, basic requirements for an accelerator with conditional execution support are proposed. Moreover, some algorithms are presented for temporal partitioning of CDFGs considering the architectural specifications of the target accelerator. To show the effectiveness of the proposed ideas, we applied them to the accelerator of an extensible processor called AMBER. Experimental results demonstrate the effectiveness of covering control instructions and using CDFGs rather than DFGs.
Published: 2007

24. Dynamic Task Priority Scaling for Thermal Management of Multi-core Processors with Heavy Workload

Author: Saadat Pour Mozafari, Hamid Noori, Ali Akbari, and Farhad Mehdipour
Subjects: Multi-core processor, Software, Computer science, business.industry, Dynamic frequency scaling, Workload, Parallel computing, Thermal management of electronic devices and systems, Load balancing (computing), business, Distributed File System, Scaling
Abstract: This paper presents a task priority scaling algorithm for dynamic thermal management (DTM) of multi-core processors. The unique features of this algorithm include: 1) enabling task-level Dynamic Frequency Scaling (DFS) capability through software, 2) reducing task migration and providing load balancing using dynamic task priority scaling, and 3) targeting DTM for systems with high workload. This algorithm is evaluated on a commercial quad-core processor. The experimental results indicate that the proposed approach can decrease the average and peak temperature by 9.73% and 7.1%, respectively, compared to the standard Linux scheduler.
Published: 2015
Full Text: View/download PDF

25. Exploring Efficiency of Ring Oscillator-Based Temperature Sensor Networks on FPGAs (Abstract Only)

Author: Hamid Noori, Navid Rahmanikia, Amirali Amiri, and Farhad Mehdipour
Subjects: business.industry, Computer science, Hardware_PERFORMANCEANDRELIABILITY, Ring oscillator, Soft sensor, Bottleneck, Ring counter, Embedded system, Hardware_INTEGRATEDCIRCUITS, Electronic engineering, Electronics, Sensitivity (control systems), Field-programmable gate array, business, Wireless sensor network
Abstract: Due to technology advances and the complexity of designs, thermal issues are a bottleneck in electronic designs. Various dynamic thermal management techniques have been proposed to address this issue. To apply thermal management techniques effectively, an accurate thermal map of the chip is required. For this purpose, a network of temperature sensors has to be provided. There are various implementations of temperature sensors and sensor networks on Field Programmable Gate Arrays (FPGAs). This work defines and formulates four metrics, in terms of area, thermal, and power overheads and thermal map accuracy, for exploring and evaluating the efficiency of different implementations of Ring Oscillator-based Temperature Sensor (ROTS) networks on FPGAs, and reports comparison results for 12 networks with various sensor configurations. According to our metrics and experiments, the sensor composed of NOT gates with open latches and an RNS ring counter has lower thermal and power overheads than the other configurations. Moreover, in this work, a new ROTS is presented that occupies 25% fewer resources than the most compact temperature sensor. It also provides 1.72 times higher sensitivity than the most sensitive ROTS design.
Published: 2015
Full Text: View/download PDF

26. An Integrated Temporal Partitioning and Mapping Framework for Handling Custom Instructions on a Reconfigurable Functional Unit

Author: Kazuaki Murakami, Koji Inoue, Mehdi Sedighi, Farhad Mehdipour, Hamid Noori, and Morteza Saheb Zamani
Subjects: Instruction set, Speedup, Computer architecture, Computer science, Control reconfiguration, Parallel computing, Critical path method, Extensibility, Partition (database), Data-flow analysis
Abstract: Extensible processors allow customization for an application by extending the core instruction set architecture. Extracting appropriate custom instructions is an important phase for implementing an application on an extensible processor with a reconfigurable functional unit. Custom instructions (CIs) usually are extracted from critical portions of applications. This paper presents approaches for CI generation with respect to the RFU constraints to improve speedup of the extensible processor. First, our proposed RFU architecture for an adaptive dynamic extensible processor called AMBER is described. Then, an integrated temporal partitioning and mapping framework is presented to partition and map the CIs on the RFU. In this framework, a mapping aware temporal partitioning algorithm is used to generate CIs which are mappable on the RFU. Temporal partitioning iterates and modifies partitions incrementally to generate CIs. In addition, a mapping algorithm is presented which supports CIs with critical path length more than the RFU depth.
Published: 2006

27. An integrated temporal partitioning and physical design framework for static compilation of reconfigurable computing systems

Author: Mehdi Sedighi, Morteza Saheb Zamani, and Farhad Mehdipour
Subjects: Iterative design, Computer Networks and Communications, Computer science, Control reconfiguration, computer.software_genre, Reconfigurable computing, Scheduling (computing), Computer architecture, Artificial Intelligence, Hardware and Architecture, Compiler, Physical design, computer, Software
Abstract: Lack of appropriate compilers for generating configurations and their scheduling is one of the main challenges in the development of reconfigurable computing systems. In this paper, a new iterative design flow for reconfigurable computing systems is proposed that integrates the synthesis and physical design phases to perform a static compilation process. We propose a new temporal partitioning algorithm for partitioning and scheduling, which attempts to decrease the time of reconfiguration on a partially reconfigurable hardware. In addition, we perform an incremental physical design process based on similar configurations produced in the partitioning stage. To validate the effectiveness of our methodology and algorithms, we developed a framework according to the proposed methodology.
Published: 2006
Full Text: View/download PDF

28. Fast center search algorithm with hardware implementation for motion estimation in HEVC encoder

Author: Farhad Mehdipour, Mohammed S. Sayed, Ahmed Medhat, Ahmed Shalaby, and Maha Elsabrouty
Subjects: Computer science, business.industry, Search algorithm, Integer motion estimation, Motion estimation, Real-time computing, business, Field-programmable gate array, Frame rate, Encoder, Computer hardware, Quarter-pixel motion, Coding (social sciences)
Abstract: This paper presents a Fast Center Search Algorithm (FCSA) and its hardware implementation for integer motion estimation in High Efficiency Video Coding (HEVC). FCSA achieves an average time-saving ratio of up to 40% for HD video sequences with respect to full search, with insignificant loss in terms of PSNR performance and bit rate. The proposed hardware implementation meets the requirement of 30 4K frames per second with a ±16 search window at 550 MHz. The prototyped architecture utilizes 8% of the LUTs and 4% of the slice registers in a Xilinx Virtex-6 XC6VLX-550T FPGA.
Published: 2014
Full Text: View/download PDF

29. A highly parallel SAD architecture for motion estimation in HEVC encoder

Author: Ahmed Medhat, Ahmed Shalaby, Mohammed S. Sayed, Farhad Mehdipour, and Maha Elsabrouty
Subjects: 2K resolution, business.industry, Computer science, Motion estimation, Embedded system, Clock rate, business, Field-programmable gate array, Encoder, Transform coding, Computer hardware, Block (data storage), Quarter-pixel motion
Abstract: The high computational cost of the motion estimation module in the new HEVC standard raises the need for efficient hardware architectures that can meet the real-time processing constraint. In addition, targeting HD and UHD resolutions increases the motion estimation processing cost beyond the capabilities of existing architectures. This paper presents a highly parallel sum of absolute differences (SAD) architecture for motion estimation in an HEVC encoder. The proposed architecture has 64 PUs operating in parallel to calculate the SAD values of the prediction blocks. It processes block sizes from 4×4 up to 64×64. The proposed architecture has been prototyped, simulated, and synthesized on a Xilinx Virtex-7 XC7VX550T FPGA. At a clock frequency of 458 MHz, the proposed architecture processes 2K-resolution video at 30 fps with a search range of ±20 pixels. The prototyped architecture utilizes 7% of the LUTs and 5% of the slice registers in the Xilinx Virtex-7 XC7VX550T FPGA.
Published: 2014
Full Text: View/download PDF

30. A neuro-fuzzy fan speed controller for dynamic thermal management of multi-core processors

Author: Farhad Mehdipour, Javad Mohebbi Najm Abad, Hamid Noori, Ali Soleimani, and Bagher Salami
Subjects: Electronic speed control, Control theory, Computer science, Thermal resistance, Active cooling, Overhead (computing), Hardware_PERFORMANCEANDRELIABILITY, Heat sink, Efficient energy use, Power (physics)
Abstract: Using cooling equipment is a thermal management technique that reduces the thermal resistance of the heat sink without any performance degradation. A higher fan speed produces a lower thermal resistance, but at the expense of higher power consumption. Our proposed neuro-fuzzy fan speed controller (NFSC) minimizes fan power consumption while preventing the temperature from rising above a certain threshold. The experimental results indicate that our proposed model can significantly decrease the average fan power with negligible temperature overhead compared to the traditional fan controller.
Published: 2014
Full Text: View/download PDF

31. Keep-Out-Zone analysis for three-dimensional ICs

Author: Nobuaki Miyakawa, Mohamed El-Sayed, Mostafa Said, and Farhad Mehdipour
Subjects: Fabrication, Yield (engineering), Accurate estimation, Computer science, Hardware_INTEGRATEDCIRCUITS, Electronic engineering, Overhead (computing), Automotive engineering
Abstract: One of the main challenges of 3D integration is the area overhead, which has two main causes: first, the large TSV diameter, which is usually in the range of microns, and second, the Keep-Out-Zone (KOZ) overhead due to the high thermal stresses induced during fabrication. The area overhead, in addition to the fabrication process itself, adversely affects the overall yield and fabrication cost, so an increase in area will reduce the yield and increase the fabrication cost. In this paper, the effect of KOZ overhead on the overall area, yield, and fabrication cost is investigated. Various parameters that might change the KOZ overhead are also examined. We show that the share of area overhead caused by the KOZ is considerably higher than that caused by TSVs. Further, the impact of the KOZ is considered in order to obtain a more accurate estimate of the overall W2W yield and fabrication cost of a 3D-IC.
Published: 2014
Full Text: View/download PDF

32. Special Issue on Emerging Many-Core Systems for Exascale Computing

Author: Farhad Mehdipour, Hannu Tenhunen, Masoud Daneshtalab, and Zhiyi Yu
Subjects: Many core, Computer engineering, Hardware and Architecture, Computer science, Systems engineering, Electrical and Electronic Engineering, Software, Exascale computing
Published: 2015
Full Text: View/download PDF

33. Physical-aware task migration algorithm for dynamic thermal management of SMT multi-core processors

Author: Mohammadreza Baharani, Farhad Mehdipour, Bagher Salami, and Hamid Noori
Subjects: Imagination, Multi-core processor, Search engine, Computer science, business.industry, Embedded system, media_common.quotation_subject, Parallel computing, Thermal management of electronic devices and systems, business, Algorithm, Scheduling (computing), media_common
Abstract: This paper presents a task migration algorithm for dynamic thermal management of Simultaneous Multi-Threading (SMT) multi-core processors. The unique features of this algorithm include: 1) considering the SMT capability of processors for dynamic thermal management via task scheduling, 2) using an adaptive task migration threshold, and 3) considering the cores' physical features. This algorithm is evaluated on a commercial SMT quad-core processor. The experimental results indicate that our technique can significantly decrease the average and peak temperature compared to the standard Linux scheduler and two well-known thermal management techniques.
Published: 2014
Full Text: View/download PDF

34. Improving Performance and Fabrication Metrics of Three-Dimensional ICs by Multiplexing Through-Silicon Vias

Author: Mohamed El-Sayed, Mostafa Said, and Farhad Mehdipour
Subjects: Yield (engineering), business.product_category, Fabrication, business.industry, Computer science, Overhead (engineering), Stacking, Multiplexing, Reduction (complexity), Low-power electronics, Embedded system, Electronic engineering, Die (manufacturing), business
Abstract: Three-dimensional (3D) integration using through-silicon vias (TSVs) offers advantages over traditional 2D integration; however, several challenges still arise from stacking dies. The main challenges in 3D-ICs are the large area overhead of the TSVs, low yield due to stacking several dies, and the increased cost of fabrication. In this paper, a TSV multiplexing technique using a so-called TSV-BOX is proposed, which substitutes two TSVs with one TSV plus some extra hardware while resulting in a smaller overall die area. It does not impact the performance of the circuit. The TSV-BOX increases the total yield and reduces power consumption and fabrication cost due to the reduced TSV count. For a 100 mm² die with 2 × 10^5 TSVs and a TSV diameter of 8 µm, the TSV-BOX could achieve a 10% reduction in area, a 78% reduction in cost, and a 24-fold enhancement in yield.
Published: 2013
Full Text: View/download PDF

35. A Smart Cyber-physical Systems-Based Solution for Pest Control (Work in Progress)

Author: Farhad Mehdipour, Krishna Chaitanya Nunna, and Kazuaki Murakami
Subjects: Risk analysis (engineering), business.industry, Multidisciplinary approach, Computer science, Pest control, Cyber-physical system, Damages, Work in process, business, Wireless sensor network, Human being
Abstract: Rodents, as widespread pests, cause significant damage to crops, stored food, human beings, and property. So far there has been no systematic high-tech solution for rodent control. We aim to raise this problem as a new, challenging, and multidisciplinary research area. We propose a solution based on cyber-physical systems (CPSs) and elucidate the enabling technologies and frontiers for this research.
Published: 2013
Full Text: View/download PDF

36. Totally self-checking (TSC) VLSI circuits using Scalable Error Detection Coding (SEDC) technique

Author: Natarajan Somasundaram, N. Ramadass, Y. V. Ramana Rao, Lee Jeong A, and Farhad Mehdipour
Subjects: Very-large-scale integration, law, Computer science, Logic gate, Electronic engineering, Overhead (computing), Fault tolerance, Hardware_PERFORMANCEANDRELIABILITY, Integrated circuit, Transient (oscillation), Latency (engineering), Error detection and correction, law.invention
Abstract: Integrated circuits fabricated in deep sub-micron technology are vulnerable to intermittent or transient faults, which are the predominant cause of system failures. With continued scaling, reduced operating voltage levels, and the resultant decrease in noise margins, the possibility of transient faults is likely to increase. Also, during operation in adverse environments, transient faults occur upon exposure to ionizing radiation and neutron effects. These faults manifest themselves as unidirectional errors. The ability to operate in the intended manner even in the presence of faults is an important objective of all electronic systems. Totally Self-checking (TSC) circuits permit online detection of hardware faults. The Scalable Error Detection Coding (SEDC) technique, used to design self-checking circuits with faster execution and lower latency overhead for fault-tolerant VLSI circuits, is presented. The SEDC technique is formulated and its architecture designed in such a way that, for any input binary data length, only the area is scaled, with a constant latency of two logic gates and only a single clock cycle required for generating the SEDC code. It is shown that the proposed SEDC technique is significantly more efficient than existing unidirectional error detection techniques in terms of speed, latency, and area, while achieving 100% error detection.
Published: 2013
Full Text: View/download PDF

37. A Three-Dimensional Integrated Accelerator

Author: Kazuaki Murakami, Koji Inoue, Krishna Chaitanya Nunna, and Farhad Mehdipour
Subjects: Footprint (electronics), Data flow diagram, Stack (abstract data type), Computer science, Key (cryptography), Graph theory, Parallel computing, Routing (electronic design automation), Data mapping, Data-flow analysis
Abstract: We propose a three-dimensional (3D) reconfigurable data-path accelerator that is capable of running large partitioned data flow graphs (DFGs) on the layers of a 3D stack, with inter-layer connections implemented by means of through-silicon vias (TSVs). A tool for mapping data flow graphs has been developed, and a key 3D-specific problem, namely routing nets on a 3D architecture, is discussed in detail as well. Experiments demonstrate a smaller footprint area and higher performance for the 3D accelerator compared with its 2D counterpart.
Published: 2012
Full Text: View/download PDF

38. A thermal-aware mapping algorithm for reducing peak temperature of an accelerator deployed in a 3D stack

Author: Kazuaki Murakami, Koji Inoue, Lovic Gauthier, Farhad Mehdipour, and Krishna Chaitanya Nunna
Subjects: Imagination, Chemical substance, Computer science, media_common.quotation_subject, Base (geometry), Integrated circuit, Computational science, law.invention, Data flow diagram, Reduction (complexity), Search engine, Stack (abstract data type), law, Electronic engineering, media_common
Abstract: Thermal management is one of the main concerns in three-dimensional integration due to the difficulty of dissipating heat through the stack of the integrated circuit. This paper targets peak temperature reduction in a 3D stack involving a data-path accelerator, a base processor, and memory components. A mapping algorithm has been devised to distribute the operations of data flow graphs evenly over the processing elements of the target accelerator in two steps: thermal-aware partitioning of the input data flow graphs, and thermal-aware mapping of the partitions onto the processing elements. The efficiency of the proposed technique in reducing peak temperature is demonstrated through experiments.
Published: 2012
Full Text: View/download PDF

39. Hardware and software requirements for implementing a high-performance superconductivity circuits-based accelerator

Author: Koji Inoue, Hiroaki Honda, Kazuaki Murakami, and Farhad Mehdipour
Subjects: business.industry, Computer science, Reconfigurable processors, reconfigurable accelerator, placement and routing, single-flux quantum circuits, Reconfigurable computing, data flow graph, Data flow diagram, Computer Science::Hardware Architecture, Systems analysis, Formal specification, Algorithm design, Software requirements, Routing (electronic design automation), business, Computer hardware, Data-flow analysis
Abstract: The Single-Flux Quantum based large-scale data-path processor (SFQ-LSRDP) is a reconfigurable computing system implemented by means of superconductivity circuits. The SFQ-LSRDP is capable of accelerating data flow graphs (DFGs) extracted from scientific applications. Using an alternative technology instead of CMOS circuits for implementing such hardware entails considering particular constraints and conditions from the architecture and tools development perspectives. In this paper, we introduce the hardware specifications of the LSRDP and the tool chain developed for implementing applications. Placing and routing data flow graphs is a fundamental part of developing applications on the SFQ-LSRDP. Algorithms for placing DFG operations and routing the nets corresponding to the edges of data flow graphs are discussed in more detail. These algorithms have been applied to a number of data flow graphs, and the results demonstrate their efficiency. Further, simulation results demonstrate remarkable performance numbers in the range of hundreds of Gflops for the proposed architecture.
Published: 2011
Full Text: View/download PDF

40. Routing architecture and algorithms for a superconductivity circuits-based computing hardware

Author: Hiroshi Kataoka, Farhad Mehdipour, Koji Inoue, Kazuaki Murakami, and Hiroaki Honda
Subjects: Routing protocol, Static routing, Computer science, Equal-cost multi-path routing, business.industry, Routing table, Policy-based routing, Enhanced Interior Gateway Routing Protocol, Computer Science::Hardware Architecture, Routing domain, Link-state routing protocol, Multipath routing, business, Algorithm, Computer hardware, Triangular routing
Abstract: Dedicated tools for placing and routing data flow graphs extracted from computation-intensive applications are basic requirements for developing applications on a large-scale reconfigurable data-path processor (LSRDP) implemented by superconductivity circuits. Using an alternative technology instead of CMOS circuits for implementing such hardware entails considering particular constraints and conditions from the architecture and tools development perspectives. The main contribution of this work is to introduce an operand routing network (ORN) architecture as well as algorithms for routing the nets corresponding to the edges of the data flow graphs. Further, a micro-routing algorithm is proposed for routing and configuring the ORNs internally. These algorithms have been applied on a number of data flow graphs from target applications and the results demonstrate their efficacy.
Published: 2011
Full Text: View/download PDF

41. ALU-array based reconfigurable accelerator for energy efficient executions

Author: Kazuaki Murakami, Koji Inoue, Takaaki Hanada, Hamid Noori, and Farhad Mehdipour
Subjects: Computer science, business.industry, Branch predictor, Base (topology), Reconfigurable computing, law.invention, Microprocessor, Acceleration, law, Embedded system, Component (UML), Cache, Hardware_CONTROLSTRUCTURESANDMICROPROGRAMMING, business, Efficient energy use
Abstract: This paper introduces an energy-efficient acceleration technique for embedded microprocessors. By supporting an ALU-array-based coarse-grained reconfigurable functional unit, well-customized special instructions are identified and executed for each application program. Since the reconfigurable functional unit can execute several dependent instructions (a sequence of instructions) simultaneously, the performance of the base microprocessor can be dramatically improved. In addition, this kind of direct execution is very energy efficient because it reduces the activation counts of hardware components such as the instruction cache, the branch predictor, and the register file.
Published: 2009
Full Text: View/download PDF

42. An efficient Heterogeneous Reconfigurable Functional Unit for an Adaptive Dynamic Extensible Processor

Author: Hossein Pedram, Morteza Saheb Zamani, Farhad Mehdipour, Behnam Ghavami, and Arash Mehdizadeh
Subjects: Very-large-scale integration, Flexibility (engineering), Custom Instruction, Computer science, business.industry, Computation, Execution time, Extensibility, Custom instruction, Reconfigurable Functional Unit, Software, Computer architecture, Extensible Processor, Architecture, business
Abstract: Replacing the functional units of an extensible processor with reconfigurable functional units enhances the performance and flexibility of processors in executing custom instructions. This is due to the ability of reconfigurable functional units to perform computations in hardware to increase performance, while retaining much of the flexibility of a software solution. In this paper, we develop a heterogeneous architecture for the reconfigurable functional unit of an extensible processor. To verify the efficiency of our architecture, we applied it to 8 applications from MiBench. Our experiments show that, compared to similar architectures, ours supports a wide range of custom instructions. In addition, use of the new architecture improves the execution time of custom instructions by 20% to 30% on average. Moreover, compared with the previous architecture, area is reduced by 15%.
Published: 2007

43. Generating and Executing Multi-Exit Custom Instructions for an Adaptive Extensible Processor

Author: Farhad Mehdipour, Maziar Goudarzi, Hamid Noori, Kazuaki Murakami, and Koji Inoue
Subjects: Instruction set, Speedup, Application-specific integrated circuit, Reduced instruction set computing, business.industry, Computer science, Computation, Embedded system, Basic block, Benchmark (computing), Parallel computing, business
Abstract: To improve the performance of embedded processors, an effective technique is collapsing critical computation subgraphs into application-specific instruction set extensions and executing them on custom functional units. The problems with this approach are the immense cost and long design time. To address these issues, an adaptive extensible processor was proposed in which custom instructions (CIs) are generated and added after chip fabrication. To support this feature, custom functional units are replaced by a reconfigurable matrix of functional units with the capability of conditional execution. Unlike previously proposed CIs, these CIs can include multiple exits. Experimental results show that multi-exit CIs enhance performance by 46% on average compared to CIs limited to one basic block. A maximum speedup of 2.89 compared to a 4-issue in-order RISC processor, and a speedup of 1.66 on average, was achieved on the MiBench benchmark suite.
Published: 2007
Full Text: View/download PDF

44. A reconfigurable architecture for implementing multiple cipher algorithms

Author: B. Sadeghian, M. Saheb Zamani, A. Valizadeh, and Farhad Mehdipour
Subjects: Flexibility (engineering), Logic synthesis, Computer architecture, Cipher, Computer science, business.industry, Cryptography, PipeRench, business, Field-programmable gate array, Algorithm, Implementation, Reconfigurable computing
Abstract: Reconfigurable computing has grown to become an important and large field of research. It offers advantages over traditional hardware and software implementations of computational algorithms. In this paper, we implemented multiple cryptographic algorithms, namely DES, LOKI, DESX, Biham-DES, and SⁿDES, on reconfigurable hardware. Our implementation achieves high flexibility and a ciphering rate similar to that of previous FPGA implementations.
Published: 2006
Full Text: View/download PDF

45. Custom Instruction Generation Using Temporal Partitioning Techniques for a Reconfigurable Functional Unit

Author: Kazuaki Murakami, Mehdi Sedighi, Morteza Saheb Zamani, Koji Inoue, Farhad Mehdipour, and Hamid Noori
Subjects: Speedup, Computer architecture, Computer science, Critical path method, Custom instruction, Partition (database), Data-flow analysis
Abstract: Extracting appropriate custom instructions is an important phase for implementing an application on an extensible processor with a reconfigurable functional unit (RFU). Custom instructions (CIs) are usually extracted from critical portions of applications. It may not be possible to meet all of the RFU constraints when CIs are generated. This paper addresses the generation of mappable CIs on an RFU. In this paper, our proposed RFU architecture for an adaptive dynamic extensible processor is described. Then, an integrated framework for temporal partitioning and mapping is presented to partition and map the CIs on RFU. In this framework, two mapping aware temporal partitioning algorithms are used to generate CIs. Temporal partitioning iterates and modifies partitions incrementally to generate CIs. Using this framework brings about more speedup for the extensible processor.
Published: 2006
Full Text: View/download PDF

46. A high performance reconfigurable implementation of DES-like algorithms

Author: B. Najafi, M. Saheb Zamani, Farhad Mehdipour, A. Valizadeh, and B. Sadeghian
Subjects: Programmable logic device, Flexibility (engineering), Software, Computer architecture, Computer science, business.industry, Control reconfiguration, Overhead (computing), Encryption, business, Field-programmable gate array, Algorithm, Reconfigurable computing
Abstract: Reconfigurable computing has grown to become an important and large field of research. It offers advantages over traditional hardware and software implementations of computational algorithms. It is based on using field programmable gate arrays (FPGAs), which can be configured after fabrication to take advantage of a hardware design while still maintaining the flexibility of software. Particular applications, including encryption, that involve repetitive computation and have inherent parallelism are especially well suited to FPGAs. In this paper, we implemented four DES-like algorithms, namely DES, DESX, Biham-DES, and SⁿDES, on reconfigurable hardware so that each algorithm could be replaced by another with low reconfiguration overhead time. This kind of implementation not only has high flexibility but also has an acceptable encryption rate compared with the fastest implementation of DES.
Published: 2005
Full Text: View/download PDF

47. Implementation of cellular learning automata on reconfigurable computing systems

Author: Mohammad Reza Meybodi, Farhad Mehdipour, and Morteza Saheb Zamani
Subjects: Speedup, Learning automata, Computer science, Control reconfiguration, Parallel computing, Application software, computer.software_genre, Cellular automaton, Reconfigurable computing, Automaton, Computer architecture, Hardware_ARITHMETICANDLOGICSTRUCTURES, Field-programmable gate array, computer
Abstract: Reconfigurable computing systems (RCS) use the flexibility of programmable devices and the speed of hardware to implement high-performance systems. RCS are normally implemented by means of programmable devices such as FPGAs. Cellular learning automata (CLA), meanwhile, have recently been proposed as a combination of the conventional cellular automaton and the learning automaton. Software simulation of CLA has shown it to be successful in solving some hard problems. However, the process is slow on conventional computers. To overcome this problem, we implemented CLA in hardware. In addition, for some applications that require run-time changes to parameters, run-time reconfiguration (RTR) in hardware is a solution. In this paper, the design and implementation of CLA on a reconfigurable system are presented. Experimental results show a considerable speedup of the RCS version over the software version. Independence from CLA dimensions is another benefit of the reconfigurable hardware implementation of CLA. In other words, as the dimensions of the CLA increase, the time needed to run the reconfigurable CLA implemented in hardware remains constant.
Published: 2004
Full Text: View/download PDF

48. A survey on big data processing infrastructure: evolving role of FPGA

Author: Krishna Chaitanya Nunna, Kazuaki Murakami, Farhad Mehdipour, and Antoine Trouve
Subjects: Big data processing, Statement (computer science), Data processing, Resource (project management), Computer science, Real-time computing, Field-programmable gate array, Data science, Variety (cybernetics)
Abstract: In today's commercial world, information is becoming a major economic resource, leading to the statement that information is wealth. Managing and analysing the large volumes of data arriving continuously from a variety of resources is a technical challenge for computer systems. Experts are moving towards alternative hardware platforms to achieve high-speed data processing and analysis, especially for streaming applications. In this paper: a) existing trends in big data processing and the necessary systems involved are studied through a survey of available platforms; b) recommended features and suitable hardware systems are proposed based on the operations involved in the processing. The investigation shows that, in combination with CPUs and GPUs, the FPGA is a possible alternative. It can be part of a heterogeneous platform featuring parallelism, pipelining and high performance for the operations involved in big data processing.
Published: 2015
Full Text: View/download PDF

49. Performance Evaluation of a Reconfigurable Instruction Set Processor

Author: Koji Inoue, Farhad Mehdipour, Hamid Noori, Kazuaki Murakami, and Hiroaki Honda
Subjects: Speedup, A Combined Analytical and Simulation-Based Model, Computer science, business.industry, Application-specific instruction-set processor, Probability density function, analytical modeling, Space exploration, performance evaluation, RAC, Instruction set, Core (game theory), Software, Computer architecture, Calibration, Reconfigurable instruction set processors, business, CAnSO
Abstract: Performance evaluation is a serious challenge in designing or optimizing reconfigurable instruction set processors. A combined analytical and simulation-based model (CAnSO) is proposed and validated for performance evaluation of a typical reconfigurable instruction set processor. The proposed model consists of an analytical core that incorporates statistics gathered from cycle-accurate simulation to make a reasonable evaluation. CAnSO has clear speed advantages and, compared to cycle-accurate simulation, shows a variation of almost 2% in the speedup measurement.
Published: 2008

49 results on '"Farhad Mehdipour"'
