Author: "Seungjin Lee" / Topic: computer science - Searchworks@Jio Institute Digital Library Search Results

1. Analysis of Efficiency for the Selection of Benchmarking Model to Prepare When a Korean Nuclear-powered Submarine Will be Acquired: Focusing on a Directional Distance Function Model

Author: Youngjae Choi, Miae Jeong, Seungjin Lee, and Jung Hwan Lee
Subjects: Computer science, Submarine, Data mining, Benchmarking, computer.software_genre, computer, Selection (genetic algorithm)
Published: 2021

2. Sensor-Based Deviant Behavior Detection System Using Deep Learning to Help Dementia Caregivers

Author: Kookjin Kim, Jaekeun Kim, Dongkyoo Shin, Sungjoong Kim, Seungjin Lee, and Dongil Shin
Subjects: General Computer Science, Computer science, Population, 02 engineering and technology, Machine learning, computer.software_genre, deep-learning, Data modeling, mental disorders, 0202 electrical engineering, electronic engineering, information engineering, medicine, Dementia, General Materials Science, Hidden Markov model, education, autoencoder, education.field_of_study, business.industry, Deep learning, General Engineering, Deviant detection, 020206 networking & telecommunications, medicine.disease, Autoencoder, long short-term memory models, Outlier, Unsupervised learning, 020201 artificial intelligence & image processing, lcsh:Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, business, lcsh:TK1-9971, computer
Abstract: The number of elderly people suffering from dementia, a senile disease, is increasing day by day due to the rapid aging of the population. As a result, social and economic costs are also gradually increasing. To prevent such monetary losses, a system that can operate at a low cost is needed to care for dementia patients. Therefore, this research proposes a low-cost, sensor-based deviant behavior detection system that allows caregivers to manage dementia patients easily even when they are not in the same location as the patients. In this research, the autoencoder and LSTM models were used together, because labeled data for deviant behavior are difficult to obtain. The autoencoder model is a representative unsupervised learning model that can be used to extract characteristics of data, and it was used to learn the characteristics of normal behavioral data. The LSTM model is used to determine deviant behavior from the outlier data whose autoencoder output exceeds the threshold. In the experiment, the two models achieved more than 96% and more than 99% accuracy. This research is expected to help caregivers of dementia patients manage the elderly with dementia more inexpensively and efficiently.
Published: 2020
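
The thresholding step described in the abstract of item 2 (learn normal behavior, then flag data whose reconstruction error exceeds a threshold) can be illustrated with a minimal sketch. This is not the authors' model: a linear autoencoder fitted by SVD (equivalent to PCA) stands in for their neural autoencoder, the LSTM stage is omitted, and the synthetic data, the 99th-percentile threshold, and all function names are assumptions made for illustration.

```python
import numpy as np

def fit_linear_autoencoder(normal, n_components):
    """Fit a linear encoder/decoder (PCA via SVD) on normal-behavior vectors."""
    mean = normal.mean(axis=0)
    _, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(x, mean, components):
    """Mean squared error between each row of x and its reconstruction."""
    centered = x - mean
    reconstructed = centered @ components.T @ components
    return np.mean((centered - reconstructed) ** 2, axis=1)

rng = np.random.default_rng(0)

# Hypothetical "normal" sensor vectors lying close to a 2-D subspace of an 8-D space.
basis = rng.normal(size=(2, 8))
normal = rng.normal(size=(500, 2)) @ basis + 0.01 * rng.normal(size=(500, 8))

mean, components = fit_linear_autoencoder(normal, n_components=2)
normal_errors = reconstruction_error(normal, mean, components)

# Assumed rule: anything above the 99th percentile of normal errors is an outlier.
threshold = np.percentile(normal_errors, 99)

# A vector that does not follow the normal pattern reconstructs poorly.
candidate = 5.0 * rng.normal(size=(1, 8))
is_outlier = reconstruction_error(candidate, mean, components) > threshold
print(bool(is_outlier[0]))
```

In a pipeline like the one the abstract describes, only the vectors flagged here would be passed on to a second model for the final deviant-behavior decision.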

3. Direct Position Determination Method with Improved Accuracy for Estimating Static Source Position

Author: Seungjin Lee, Jae-Hoon Lee, Jong-In Song, Wonzoo Chung, and Jaehyuk Lim
Subjects: Computer science, Position (vector), business.industry, FDOA, Computer vision, Artificial intelligence, business, Multilateration
Published: 2018

4. Influence on Teachers’ Instructional Improvement Effort Inside the School Using Hierarchical Linear Modeling

Author: Seungjin Lee and Yu Nan Sook
Subjects: Computer science, Multilevel model, Industrial engineering
Published: 2017

5. An Optical Wireless Temperature Sensor

Author: Seungjin Lee, Xiaozhe Fan, and Walter D. Leon-Salas
Subjects: business.industry, Computer science, 020208 electrical & electronic engineering, Optical communication, Electrical engineering, 02 engineering and technology, Temperature measurement, law.invention, Transmission (telecommunications), law, Solar cell, Computer Science::Networking and Internet Architecture, 0202 electrical engineering, electronic engineering, information engineering, Optical wireless, Wireless, ComputerSystemsOrganization_SPECIAL-PURPOSEANDAPPLICATION-BASEDSYSTEMS, business, Energy harvesting
Abstract: This paper presents a wireless temperature sensor that uses a GaAs solar cell as a wireless transmitter of information. Transmission of information with a solar cell is possible by modulating the luminescent radiation emitted by the solar cell. This technique, dubbed Optical Frequency Identification or OFID, was recently reported in the literature and in this work is used to transmit temperature measurements wirelessly. The hardware design of an OFID temperature sensor tag and its corresponding reader is described. A prototype of the proposed sensor was built as a proof of concept. Experimental results demonstrate wireless data transmission at a distance of 1 m and at a bit rate of 1200 bps. The wireless temperature sensor has a maximum error of 0.39 °C (after calibration) with respect to a high-precision temperature meter.
Published: 2019

6. Sensor-based Abnormal Behavior Detection Using Autoencoder

Author: Dongil Shin, Dongkyoo Shin, and Seungjin Lee
Subjects: education.field_of_study, business.industry, Computer science, Deep learning, 010401 analytical chemistry, Population, 020206 networking & telecommunications, 02 engineering and technology, Machine learning, computer.software_genre, 01 natural sciences, Autoencoder, 0104 chemical sciences, Support vector machine, Economic cost, 0202 electrical engineering, electronic engineering, information engineering, Unsupervised learning, Artificial intelligence, Abnormality, education, business, Hidden Markov model, computer
Abstract: The population of elderly people is increasing, with the development of an aging society all over the world. As a result, the number of people who need to take care of themselves, such as elderly people living alone or suffering from dementia, is also increasing. Caring for these people requires not only social burdens but also economic costs. A system that manages their behavior is essential to reduce the cost of caring for them. In this study, we propose an abnormal behavior detection model using smart home sensor data to manage elderly people living alone and people with dementia. Previous studies have used probability models such as a hidden Markov model (HMM) or support vector machine (SVM) model. However, the HMM requires a process to estimate values such as the initial probability, or to define states. It is also possible to detect behavior using a classification model such as an SVM, but in this study, we used an autoencoder, which is a representative unsupervised learning model, to obtain a pattern from the behavior data. The autoencoder model can detect abnormal behavior by extracting the characteristics of the normal behavior data. The models used in this study were trained and tested with normal behavior data, showing an accuracy of more than 99%. For abnormal behavior data, a loss of about 10-30% was observed. This model is expected to assist in effectively managing elderly or demented patients and reduce the cost of caring for them.
Published: 2019

7. Classification of botnet attacks in IoT smart factory using honeypot combined with machine learning

Author: Seungjin Lee, Azween Abdullah, S. H. Kok, and Noor Zaman
Subjects: IoT, Service (systems architecture), Honeypot, General Computer Science, Computer Networks and Communications, Network security, Computer science, Data Mining and Machine Learning, Botnet, Denial-of-service attack, 02 engineering and technology, Machine learning, computer.software_genre, Process management (computing), Scientific Computing and Simulation, Resource (project management), 0202 electrical engineering, electronic engineering, information engineering, business.industry, Smart factory, Real-Time and Embedded Systems, Security and Privacy, 020206 networking & telecommunications, Botnets detection, Automation, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer
Abstract: The Industrial Revolution 4.0 began with the breakthrough technological advances in 5G, and artificial intelligence has innovatively transformed the manufacturing industry from digitalization and automation to the new era of smart factories. A smart factory can not only produce products in a digital and automatic system but also optimize production on its own by integrating production with process management, service distribution, and customized product requirements. A big challenge for the smart factory is to ensure that its network security can counteract cyber attacks such as botnets and Distributed Denial of Service, which are recognized to cause serious interruption in production and, consequently, economic losses for producers. Among many security solutions, botnet detection using a honeypot has been shown to be effective in some investigation studies. It is a method of detecting botnet attackers by intentionally creating a resource within the network with the purpose of closely monitoring and acquiring botnet attacking behaviors. For the first time, a proposed model of botnet detection was tested by combining a honeypot with machine learning to classify botnet attacks. A mimicked smart factory environment was created on an IoT device hardware configuration. Experimental results showed that the model achieved a high accuracy of above 96%, with a very short time of just 0.1 ms and a false positive rate of 0.24127, using the random forest algorithm with the Weka machine learning program. Hence, the honeypot combined with machine learning model in this study proved to be highly feasible to apply in the security network of a smart factory to detect botnet attacks.
Published: 2021

8. Analysis of Factors Affecting Achievement in Maker Programming Education in the Age of Wireless Communication

Author: Seungjin Lee, Ja Mee Kim, and Won Gyu Lee
Subjects: Root (linguistics), Class (computer programming), Programming education, Computer science, 05 social sciences, 050301 education, 020206 networking & telecommunications, 02 engineering and technology, Academic achievement, Computer Science Applications, Information and Communications Technology, Informatics, ComputingMilieux_COMPUTERSANDEDUCATION, 0202 electrical engineering, electronic engineering, information engineering, Mathematics education, Electrical and Electronic Engineering, 0503 education, Curriculum
Abstract: Since 2010 countries around the world have been emphasizing the importance of informatics education based on computer science rather than ICT use. Korea devised an informatics curriculum that emphasizes SW education as part of its revised curriculum of 2015. The purpose of this study is to examine the factors affecting SW education achievement before implementation of the curriculum so that the revised curriculum could be efficiently established in schools. SW education was provided for 4221 elementary school students, and the factors affecting academic achievement were extracted. The results of the study showed that the achievement level of female students was higher than that of male students, and the level of understanding increased in higher grades. It was found that satisfaction with overall SW education influenced academic achievement. That is, the more satisfied students were with the key factors, such as the infrastructure required for providing SW education and interaction among teachers and students during class time, the higher the level of satisfaction. This study intended to find the key factors necessary for helping SW education take root in schools.
Published: 2016

9. Optimization and characterization of a wideband multimode Tonpilz transducer for underwater acoustical arrays

Author: Yongrae Roh, Seungjin Lee, Muhammad Shakeel Afzal, Hong-Woo Yoon, and Youngsub Lim
Subjects: 010302 applied physics, Multi-mode optical fiber, Computer science, Acoustics, Bandwidth (signal processing), Metals and Alloys, 02 engineering and technology, 021001 nanoscience & nanotechnology, Condensed Matter Physics, 01 natural sciences, Finite element method, Surfaces, Coatings and Films, Electronic, Optical and Magnetic Materials, TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES, Transducer, 0103 physical sciences, ComputerSystemsOrganization_SPECIAL-PURPOSEANDAPPLICATION-BASEDSYSTEMS, Voltage response, Electrical and Electronic Engineering, Underwater, Wideband, 0210 nano-technology, Instrumentation
Abstract: A multimode underwater acoustical transducer is well-known for providing wider bandwidth than a conventional single-mode transducer. However, the design of a low-frequency multimode transducer to achieve the desired wideband characteristics for an underwater array is challenging when considering the array’s size and weight limitations. Therefore, this study focused on the development of a multimode transducer structure for superior wideband characteristics. The effect of various structural parameters on the performance of the multimode Tonpilz transducer was first analyzed with emphasis on its bandwidth using the finite element method (FEM). Then, the structure of the transducer was improved by analyzing the effect of the side acoustic window and incorporating realistic design considerations. Finally, the improved transducer structure was optimized to have the widest possible bandwidth while maintaining its transmitting voltage response (TVR) level over a typical power requirement. The final design was validated by fabricating a prototype transducer and evaluating its acoustical performance.
Published: 2020

10. Low-Energy Consumption Write Circuit using Comparing Operation in STT-MRAM

Author: Taegun Yim, Seungjin Lee, and Hongil Yoon
Subjects: 010302 applied physics, Consumption (economics), Magnetoresistive random-access memory, Hardware_MEMORYSTRUCTURES, business.industry, Computer science, Transistor, Electrical engineering, Process (computing), Energy consumption, 01 natural sciences, law.invention, Reliability (semiconductor), law, Transfer (computing), 0103 physical sciences, business, Energy (signal processing)
Abstract: In this paper, a method is proposed to reduce the write energy consumption in order to reduce the energy consumption of a circuit composed of spin-torque transfer magnetic RAM (STT-MRAM). Of the total energy consumption of the circuit, the write energy consumption occupies a large proportion. Omission of unnecessary write operations increases efficiency in terms of energy consumption. The data to be written are compared with the data already stored before the write operation proceeds, so that unnecessary write operations are omitted and only the necessary ones are executed. As the number of MTJs increases, the energy gain can increase, so the simulation was run with the number of MTJs increased to 512. Simulation results show that in the worst case the circuit consumes 16.5% more energy than the conventional one, but in the best case it reduces energy consumption by up to 90.6%. Reliability also increases through the process of comparing write data (WD) with data stored in the MTJ.
Published: 2018
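
The comparison-before-write idea described in the abstract of item 10 can be sketched in software as follows. This is only a behavioral illustration, not the authors' circuit: the list-of-bits cell model and the function name `comparison_write` are hypothetical, and the energy cost of the comparison itself (the source of the worst-case overhead the abstract reports) is not modeled.

```python
def comparison_write(stored, new):
    """Update `stored` in place, writing only the cells whose value changes.

    Returns the number of cells actually written.
    """
    if len(stored) != len(new):
        raise ValueError("stored and new data must have the same length")
    writes = 0
    for i, (old_bit, new_bit) in enumerate(zip(stored, new)):
        if old_bit != new_bit:  # compare first, write only on a mismatch
            stored[i] = new_bit
            writes += 1
    return writes

# Hypothetical 8-cell memory and incoming data word.
cells = [0, 1, 1, 0, 1, 0, 0, 1]
incoming = [0, 1, 0, 0, 1, 1, 0, 1]

written = comparison_write(cells, incoming)
print(written, "of", len(cells), "cells written")
```

A conventional write would drive all eight cells; here only the cells that differ are written, which is where the best-case saving comes from when most of the data are unchanged.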

11. Energy-efficient write circuit in STT-MRAM based look-up table (LUT) using comparison write scheme

Author: Taegun Yim, Hongil Yoon, Kyungseon Cho, Choongkeun Lee, and Seungjin Lee
Subjects: 010302 applied physics, Magnetoresistive random-access memory, Hardware_MEMORYSTRUCTURES, Computer science, Energy consumption, 01 natural sciences, Non-volatile memory, Memory management, Transfer (computing), 0103 physical sciences, Lookup table, Arithmetic, Energy (signal processing), Efficient energy use
Abstract: In this paper, a circuit is proposed that selectively changes only the memory that requires a write operation in a LUT composed of several spin-torque transfer magnetic RAM (STT-MRAM) cells. When new data are written to the STT-MRAM LUT, unnecessary energy is consumed because the write operation is performed even when the data are the same as the previous data. To overcome this problem, a comparison write scheme is proposed. For a 6-bit LUT with 64 memory cells, the energy consumption is lower than that of the previous circuit in [1] when 45 or fewer data values are rewritten.
Published: 2017

12. Low power multi-context look-up table (LUT) using spin-torque transfer magnetic RAM for non-volatile FPGA

Author: Kyungseon Cho, Choongkeun Lee, Hongil Yoon, Taegun Yim, and Seungjin Lee
Subjects: 010302 applied physics, Magnetoresistive random-access memory, Hardware_MEMORYSTRUCTURES, business.industry, Computer science, Spin-transfer torque, Context (language use), 02 engineering and technology, 021001 nanoscience & nanotechnology, 01 natural sciences, Non-volatile memory, Reliability (semiconductor), 0103 physical sciences, Lookup table, Static random-access memory, 0210 nano-technology, business, Field-programmable gate array, Computer hardware
Abstract: The conventional magnetic RAM (MRAM) LUTs for non-volatile field programmable gate arrays (FPGA) have excellent overall power characteristics for read and static modes, but sufficient reliability of data operation has not been met due to the large process variations in the cell process technology. The novel MRAM LUT proposed in this paper can enhance the reliability, reduce the power significantly, and reduce implementation size by structuring multi-context in a single MRAM LUT. Based on an output transition rate of 15%, the proposed 6-input MRAM LUT with 8-context shows 49.6% smaller power consumption and 18.1% smaller area compared to those of eight 6-input SRAM LUTs.
Published: 2017

13. Research on Performance Improvement for Wireless CCN

Author: Hong-Min Bae, Seungjin Lee, and Byung-Seo Kim
Subjects: Handshake, Computer science, business.industry, Wireless network, Network packet, Reliability (computer networking), ComputerSystemsOrganization_COMPUTER-COMMUNICATIONNETWORKS, Overhead (computing), Wireless, The Internet, Performance improvement, business, Computer network
Abstract: To resolve the inefficient content delivery mechanism in conventional internet-based networks, Content-Centric Networking (CCN) has been proposed for wired and wireless networks. One issue in wireless CCN-based networks is the overhead needed to achieve reliable content delivery, because CCN uses an end-to-end two-way handshake with Interest/content packets. In this paper, a novel protocol to reduce overhead and achieve reliability is proposed. The protocol allows an Interest packet to request multiple data packets, and multiple data packets to be sent in a row without intervening Interest packets. The protocol is evaluated through simulations, and the performance improvement is demonstrated.
Published: 2015

14. A Demonstration of TextDB: Declarative and Scalable Text Analytics on Large Data Sets

Author: Flavio Bayer, Jimmy Wang, Seungjin Lee, Qing Tang, Chen Li, Kishore Narendran, Zuozhi Wang, and Xuxi Pan
Subjects: Data set, Search engine, Information extraction, Text mining, Relational database management system, Database, Computer science, business.industry, Scalability, computer.software_genre, business, computer, Graphical user interface
Abstract: We are developing TextDB, an open-source data management system that supports text-centric operations in a declarative and efficient way using an algebraic approach as in relational DBMS. In this demonstration, we show scenarios where we can use TextDB to perform powerful information extraction easily and efficiently on text documents. Video: https://github.com/TextDB/textdb/wiki/Video.
Published: 2017

15. An 86 mW 98GOPS ANN-Searching Processor for Full-HD 30 fps Video Object Recognition With Zeroless Locality-Sensitive Hashing

Author: Gyeonghoon Kim, Hoi-Jun Yoo, Seungjin Lee, and Jinwook Oh
Subjects: business.industry, Computer science, Real-time computing, Cognitive neuroscience of visual object recognition, Cache-only memory architecture, Inter frame, k-nearest neighbors algorithm, Locality-sensitive hashing, Computer vision, Cache, Artificial intelligence, Electrical and Electronic Engineering, business, Throughput (business), Auxiliary memory
Abstract: Approximate nearest neighbor (ANN) searching is an essential task in object recognition. The ANN-searching stage, however, is the main bottleneck in the object recognition process due to increasing database size and massive dimensions of keypoint descriptors. In this paper, a high throughput ANN-searching processor is proposed for high-resolution (full-HD) and real-time (30 fps) video object recognition. The proposed ANN-searching processor adopts an interframe cache architecture as a hardware-oriented approach and a zeroless locality-sensitive-hashing (zeroless-LSH) algorithm as a software-oriented approach to reduce the external memory bandwidth required in nearest neighbor searching. A four-way set associative on-chip cache has a dedicated architecture to exploit data correlation at the frame-level. Zeroless-LSH minimizes data transactions from external memory at the vector-level. The proposed ANN-searching processor is fabricated as part of an object recognition SoC using a 0.13 μm 6 metal CMOS technology. It achieves 62 720 vectors/s throughput and 1140 GOPS/W power efficiency, which are 1.45 and 1.37 times higher than the state-of-the-art, respectively, enabling real-time object recognition for full-HD 30 fps video streams.
Published: 2013
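
As a quick consistency check on item 15, the 86 mW and 98 GOPS figures in the title agree with the 1140 GOPS/W power efficiency quoted in the abstract. The arithmetic below uses only those three numbers and assumes the title and abstract refer to the same operating point:

```latex
\[
  \frac{98\ \text{GOPS}}{1140\ \text{GOPS/W}} \approx 0.086\ \text{W} = 86\ \text{mW}
\]
```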

16. 1.2-mW Online Learning Mixed-Mode Intelligent Inference Engine for Low-Power Real-Time Object Recognition Processor

Author: Hoi-Jun Yoo, Jinwook Oh, and Seungjin Lee
Subjects: Artificial neural network, Hardware and Architecture, Image processor, Control theory, Computer science, Real-time computing, Cognitive neuroscience of visual object recognition, Process (computing), Inference, Electrical and Electronic Engineering, Inference engine, Fuzzy logic, Software
Abstract: Object recognition is computationally intensive and it is challenging to meet 30-f/s real-time processing demands under sub-watt low-power constraints of mobile platforms even for heterogeneous many-core architecture. In this paper, an intelligent inference engine (IIE) is proposed as a hardware controller for a many-core processor to satisfy the requirements of low-power real-time object recognition. The IIE exploits learning and inference capabilities of the neurofuzzy system by adopting the versatile adaptive neurofuzzy inference system (VANFIS) with the proposed hardware-oriented learning algorithm. Using the programmable VANFIS, the IIE can configure its hardware topology adaptively for different target classifications. Its architecture contains analog/digital mixed-mode neurofuzzy circuits for updating online parameters to increase attention efficiency of object recognition process. It is implemented in 0.13-μm CMOS process and achieves 1.2-mW power consumption with 94% average classification accuracy within 1-μs operation delay. The 0.765-mm² IIE achieves 76% attention efficiency and reduces power and processing delay of the 50-mm² image processor by up to 37% and 28%, respectively, when 96% recognition accuracy is achieved.
Published: 2013

17. A 320 mW 342 GOPS Real-Time Dynamic Object Recognition Processor for HD 720p Video Streams

Author: Gyeonghoon Kim, Hoi-Jun Yoo, Jinwook Oh, Injoon Hong, Jeong-Ho Woo, Jun-Young Park, Joo-Young Kim, and Seungjin Lee
Subjects: Power management, Computer science, Multithreading, Feature extraction, Real-time computing, Cache, Thread (computing), Electrical and Electronic Engineering, Simultaneous multithreading, Efficient energy use
Abstract: A heterogeneous multi-core processor is proposed to achieve real-time dynamic object recognition on HD 720p video streams. The context-aware visual attention model is proposed to reduce the required computing power for HD object recognition based on enhanced attention accuracy. In order to realize real-time execution of the proposed algorithm, the processor adopts a 5-stage task-level pipeline that maximizes the utilization of its 31 heterogeneous cores, comprising four simultaneous multithreading feature extraction clusters, a cache-based feature matching processor and a machine learning engine. Dynamic resource management is applied to adaptively tune thread allocation and power management during execution based on the detected amount of tasks and hardware utilization to increase energy efficiency. As a result, the 32 mm² chip, fabricated in 0.13 μm CMOS technology, achieves 30 frame/sec with 342 8-bit GOPS peak performance and 320 mW average power dissipation, which are a 2.72 times performance improvement and 2.54 times per-pixel energy reduction compared to the previous state-of-the-art.
Published: 2013

18. A 92-mW Real-Time Traffic Sign Recognition System With Robust Illumination Adaptation and Support Vector Machine

Author: Joo-Young Kim, Hoi-Jun Yoo, Joonsoo Kwon, Jinwook Oh, Jun-Young Park, and Seungjin Lee
Subjects: Support vector machine, Adaptive neuro fuzzy inference system, Contextual image classification, Color constancy, business.industry, Computer science, Real-time computing, Traffic sign recognition, Computer vision, Artificial intelligence, Electrical and Electronic Engineering, business, Memory controller
Abstract: A low-power real-time traffic sign recognition system that is robust under various illumination conditions is proposed. It is composed of a Retinex preprocessor and an SVM processor. The Retinex preprocessor performs the Multi-Scale Retinex (MSR) algorithm for robust light and dark adaptation under harsh illumination environments. In the Retinex preprocessor, the recursive Gaussian engine (RGE) and reflectance engine (RE) exploit parallelism of the MSR tasks with a two-stage pipeline, and a mixed-mode scale generator (SG) with adaptive neuro-fuzzy inference system (ANFIS) performs parameter optimizations for various scene conditions. The SVM processor performs the SVM algorithm for robust traffic sign classification. The proposed algorithm-optimized small-sized kernel cache and memory controller reduce power consumption and memory redundancy by 78% and 35%, respectively. The proposed system is implemented as two separated ICs in a 0.13-μm CMOS process, and the two chips are connected using network-on-chip off-chip gateway. The system achieves robust sign recognition operation with 90% sign recognition accuracy under harsh illumination conditions while consuming just 92 mW at 1.2 V.
Published: 2012

19. Low-Power, Real-Time Object-Recognition Processors for Mobile Vision Systems

Author: Joo-Young Kim, Jun-Young Park, Hoi-Jun Yoo, Seungjin Lee, Injoon Hong, Gyeonghoon Kim, Jeong-Ho Woo, and Jinwook Oh
Subjects: Multi-core processor, Pixel, Computer science, Real-time computing, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Chip, Energy conservation, Network on a chip, Hardware and Architecture, Robustness (computer science), Low-power electronics, Electrical and Electronic Engineering, Software, Efficient energy use
Abstract: A new low-power object-recognition processor achieves real-time robust recognition, satisfying modern mobile vision systems' requirements. The authors introduce an attention-based object-recognition algorithm for energy efficiency, a heterogeneous multicore architecture for data- and thread-level parallelism, and a network on a chip for high on-chip bandwidth. The fabricated chip achieves 30 frames/second throughput and an average 320 mW power consumption on test 720p video sequences, yielding 640 GOPS/W and 10.5 nJ/pixel energy efficiency.
Published: 2012

20. 24-GOPS 4.5-mm² Digital Cellular Neural Network for Rapid Visual Attention in an Object-Recognition SoC

Author: Hoi-Jun Yoo, Kwanho Kim, Minsu Kim, Joo-Young Kim, and Seungjin Lee
Subjects: Speedup, Computer Networks and Communications, business.industry, Computer science, Frame (networking), Cognitive neuroscience of visual object recognition, General Medicine, Frame rate, Computer Science Applications, Artificial Intelligence, Embedded system, Cellular neural network, System on a chip, Static random-access memory, business, Software, Computer hardware, Shift register
Abstract: This paper presents the Visual Attention Engine (VAE), which is a digital cellular neural network (CNN) that executes the VA algorithm to speed up object-recognition. The proposed time-multiplexed processing element (TMPE) CNN topology achieves high performance and small area by integrating 4800 (80 × 60) cells and 120 PEs. Pipelined operation of the PEs and single-cycle global shift capability of the cells result in a high PE utilization ratio of 93%. The cells are implemented by 6T static random access memory-based register files and dynamic shift registers to enable a small area of 4.5 mm². The bus connections between PEs and cells are optimized to minimize power consumption. The VAE is integrated within an object-recognition system-on-chip (SoC) fabricated in the 0.13-μm complementary metal-oxide-semiconductor process. It achieves 24 GOPS peak performance and 22 GOPS sustained performance at 200 MHz enabling one CNN iteration on an 80 × 60 pixel image to be completed in just 4.3 μs. With VA enabled using the VAE, the workload of the object-recognition SoC is significantly reduced, resulting in 83% higher frame rate while consuming 45% less energy per frame without degradation of recognition accuracy.
Published: 2011

21. A 345 mW Heterogeneous Many-Core Processor With an Intelligent Inference Engine for Robust Object Recognition

Author: Minsu Kim, Hoi-Jun Yoo, Joonsoo Kwon, Jinwook Oh, Seungjin Lee, and Jun-Young Park
Subjects: Artificial neural network, Computer science, business.industry, Feature extraction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Cognitive neuroscience of visual object recognition, Scale-invariant feature transform, Machine learning, computer.software_genre, Visualization, Robustness (computer science), Salience (neuroscience), Salient, Clutter, Saliency map, Computer vision, Artificial intelligence, Electrical and Electronic Engineering, Inference engine, business, computer
Abstract: Fast and robust object recognition of cluttered scenes presents two main challenges: (1) the large number of features to process requires high computational power, and (2) false matches from background clutter can degrade recognition accuracy. Previously, saliency based bottom-up visual attention [1,2] increased recognition speed by confining the recognition processing only to the salient regions. But these schemes had an inherent problem: the accuracy of the attention itself. If attention is paid to the false region, which is common when saliency cannot distinguish between clutter and object, recognition accuracy is degraded. In order to improve the attention accuracy, we previously reported an algorithm, the Unified Visual Attention Model (UVAM) [3], which incorporates the familiarity map on top of the saliency map for the search of attentive points. It can cross-check the accuracy of attention deployment by combining top-down attention, searching for “meaningful objects”, and bottom-up attention, just looking for conspicuous points. This paper presents a heterogeneous many-core (note: we use the term “many-core” instead of “multi-core” to emphasize the large number of cores) processor that realizes the UVAM algorithm to achieve fast and robust object recognition of cluttered video sequences.
Published: 2011

22. Performance analysis of OBS networks using the effective bandwidth method

Author: Jun Kyun Choi, Seungjin Lee, Yonghoon Choi, and Yonggyu Lee
Subjects: Scheme (programming language), Computer Networks and Communications, Computer science, business.industry, Optical burst switching, Blocking (statistics), Atomic and Molecular Physics, and Optics, Hardware and Architecture, Server, Bandwidth (computing), Electrical and Electronic Engineering, business, computer, Computer communication networks, Software, computer.programming_language, Computer network
Abstract: This article provides a new scheme for the blocking probability evaluation for optical burst switching networks. While several previous articles used mainly links as servers, we consider switches as servers. In order to evaluate the blocking probability at switches, we use the effective bandwidth method. The method shows more accurate results and the accuracy of the method is proven by simulation and numerical analyses.
Published: 2010

23. An attention controlled multi-core architecture for energy efficient object recognition

Author: Hoi-Jun Yoo, Seungjin Lee, Sejong Oh, Joo-Young Kim, Minsu Kim, and Jinwook Oh
Subjects: business.industry, Computer science, Cognitive neuroscience of visual object recognition, Dynamic priority scheduling, Chip, Object detection, law.invention, Scheduling (computing), law, Signal Processing, Internet Protocol, Computer Vision and Pattern Recognition, Electrical and Electronic Engineering, Task manager, business, Software, Simulation, Computer hardware, Efficient energy use
Abstract: In this paper, an attention controlled multi-core architecture is proposed for energy efficient object recognition. The proposed architecture employs two IP layers having different roles for energy efficient recognition processing: the attention/control IPs compute regions-of-interest (ROIs) of the entire image and control the multiple processing cores to perform local object recognition processing on selected area. To this end, a task manager is proposed to perform dynamic scheduling of various ROI tasks from the attention IP to multiple cores in a unit of small-sized grid-tile. Thanks to a number of grid-tile threads generated by the task manager, the utilization of the multiple cores amounts to 92% on average. As a result, the proposed architecture achieves 2.1x energy reduction in multi-core recognition system by indicating processing cores to focus on critical area of the image with a 0.87mJ attention processing. Finally, the proposed architecture is implemented in 0.13 μm CMOS technology and the fabricated chip verifies 3.2x lower energy dissipation per frame than the state-of-the-art object recognition processor.
Published: 2010

24. Visual Image Processing RAM: Memory Architecture With 2-D Data Location Search and Data Consistency Management for a Multicore Object Recognition Processor

Author: Hoi-Jun Yoo, Dong-Hyun Kim, Joo-Young Kim, Seungjin Lee, and Kwanho Kim
Subjects: Random access memory, Multi-core processor, Computer science, Real-time computing, 32-bit, Non-volatile memory, Read-write memory, Data access, Memory bank, Memory architecture, Media Technology, Data synchronization, Electrical and Electronic Engineering, Bitwise operation
Abstract: Visual image processing random access memory (VIP-RAM) is proposed for a real-time multicore object recognition processor. It has two key features for the overall processor: 1) single cycle local maximum location search (LMLS) for fast key-point localization in object recognition, and 2) data consistency management (DCM) for producer-consumer data transactions among the processors. To achieve single cycle LMLS operation for a 3 × 3 window, the VIP-RAM adopts a hierarchical three-bank architecture that finds the maximum of each row in each bank first, then finds the final maximum of the window and its address in the top level. To this end, each memory bank embeds specialized logic blocks, such as three successive data read logic and bitwise competition logic comparator. With the single cycle LMLS operation, the key-point localization task is accelerated by 2.6× with a 27% reduction of power. For the DCM function, the VIP-RAM includes a valid check unit (VCU) that automatically manages the validity of each 32-bit data. It dynamically updates/checks the validity of the shared data when the producer processor writes the data or the consumer processor reads data. With a customized single-ended memory cell and multibit-line selection logic, the VCU can provide a validity check not only for single data access, but also for multiple data accesses such as burst and LMLS operation. Eliminating data synchronization overhead with the DCM, the VIP-RAM reduces the amount of on-chip data transactions and execution time in producer-consumer data transactions by 22.6% and 15.4%, respectively. The overall object recognition processor that includes eight VIP-RAMs and ten processors is fabricated in 0.18 μm complementary metal-oxide-semiconductor technology with the chip size of 7.7 mm × 5 mm. The VIP-RAM occupies a 1.09 mm × 0.83 mm die area and dissipates 113.2 mW when it performs the LMLS operation in every cycle at 200 MHz frequency and 1.8-V supply.
Published: 2010

25. Familiarity based unified visual attention model for fast and robust object recognition

Author: Hoi-Jun Yoo, Seungjin Lee, Kwanho Kim, Joo-Young Kim, and Minsu Kim
Subjects: business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Cognitive neuroscience of visual object recognition, Scale-invariant feature transform, Object (computer science), Object detection, Artificial Intelligence, Salience (neuroscience), Robustness (computer science), Signal Processing, Pattern recognition (psychology), Metric (mathematics), Visual attention, Computer vision, Computer Vision and Pattern Recognition, Artificial intelligence, business, Software
Abstract: Even though visual attention models using bottom-up saliency can speed up object recognition by predicting object locations, in the presence of multiple salient objects, saliency alone cannot discern target objects from the clutter in a scene. Using a metric named familiarity, we propose a top-down method for guiding attention towards target objects, in addition to bottom-up saliency. To demonstrate the effectiveness of familiarity, the unified visual attention model (UVAM) which combines top-down familiarity and bottom-up saliency is applied to SIFT based object recognition. The UVAM is tested on 3600 artificially generated images containing COIL-100 objects with varying amounts of clutter, and on 126 images of real scenes. The recognition times are reduced by 2.7x and 2x, respectively, with no reduction in recognition accuracy, demonstrating the effectiveness and robustness of the familiarity based UVAM.
Published: 2010

26. A 201.4 GOPS 496 mW Real-Time Multi-Object Recognition Processor With Bio-Inspired Neural Perception Engine

Author: Hoi-Jun Yoo, Joo-Young Kim, Seungjin Lee, Sejong Oh, Jeong-Ho Woo, Dong-Hyun Kim, Kwanho Kim, Minsu Kim, and Jinwook Oh
Subjects: Hardware architecture, Visual perception, Artificial neural network, business.industry, Computer science, media_common.quotation_subject, Frame (networking), Feature extraction, Real-time computing, Cognitive neuroscience of visual object recognition, Network on a chip, Perception, Visual attention, SIMD, Electrical and Electronic Engineering, business, Computer hardware, media_common
Abstract: A 201.4 GOPS real-time multi-object recognition processor is presented with a three-stage pipelined architecture. Visual perception based multi-object recognition algorithm is applied to give multiple attentions to multiple objects in the input image. For human-like multi-object perception, a neural perception engine is proposed with biologically inspired neural networks and fuzzy logic circuits. In the proposed hardware architecture, three recognition tasks (visual perception, descriptor generation, and object decision) are directly mapped to the neural perception engine, 16 SIMD processors including 128 processing elements, and decision processor, respectively, and executed in the pipeline to maximize throughput of the object recognition. For efficient task pipelining, proposed task/power manager balances the execution times of the three stages based on intelligent workload estimations. In addition, a 118.4 GB/s multi-casting network-on-chip is proposed for communication architecture with incorporating overall 21 IP blocks. For low-power object recognition, workload-aware dynamic power management is performed in chip-level. The 49 mm² chip is fabricated in a 0.13 μm 8-metal CMOS process and contains 3.7 M gates and 396 KB on-chip SRAM. It achieves 60 frame/sec multi-object recognition up to 10 different objects for VGA (640 × 480) video input while dissipating 496 mW at 1.2 V. The obtained 8.2 mJ/frame energy efficiency is 3.2 times higher than the state-of-the-art recognition processor.
Published: 2010

27. A Configurable Heterogeneous Multicore Architecture With Cellular Neural Network for Real-Time Object Recognition

Author: Joo-Young Kim, Kwanho Kim, Hoi-Jun Yoo, Seungjin Lee, and Minsu Kim
Subjects: Multi-core processor, Network architecture, Speedup, Cellular architecture, Computer architecture simulator, Computer science, business.industry, Cognitive neuroscience of visual object recognition, Image processing, Processor array, Object detection, Computer architecture, Cellular neural network, Embedded system, Media Technology, SIMD, Electrical and Electronic Engineering, business, Massively parallel
Abstract: As object recognition requires huge computation power to deal with complex image processing tasks, it is very challenging to meet real-time processing demands under low-power constraints for embedded systems. In this paper, a configurable heterogeneous multicore architecture with a dual-mode linear processor array and a cellular neural network on the network-on-chip platform is presented for real-time object recognition. The bio-inspired attention-based object recognition algorithm is devised to reduce computational complexity of the object recognition. The cellular neural network is utilized to accelerate the visual attention algorithm for selecting salient image regions rapidly. The dual-mode parallel processor is configured into single instruction, multiple data (SIMD) or multiple-instruction-multiple-data modes to perform data-intensive image processing operations while exploiting pixel-level and feature-level parallelisms required for the attention-based object recognition. The algorithm's hybrid parallelization strategy on the proposed architecture is adopted to obtain maximum performance improvement. The performance analysis results, using a cycle-accurate architecture simulator, show that the proposed architecture achieves a speedup of 2.8 times for the target algorithm over conventional massively parallel SIMD architecture at low hardware cost overhead. A prototype chip of the proposed architecture, fabricated in 0.13 μm complementary metal-oxide-semiconductor technology, achieves 22 frames/s real-time object recognition with less than 600 mW power consumption.
Published: 2009

28. Real-Time Object Recognition with Neuro-Fuzzy Controlled Workload-Aware Task Pipelining

Author: Hoi-Jun Yoo, Jinwook Oh, Seungjin Lee, Sejong Oh, Joo-Young Kim, and Minsu Kim
Subjects: Multi-core processor, Neuro-fuzzy, business.industry, Computer science, Pipeline (computing), Real-time computing, Cognitive neuroscience of visual object recognition, Workload, Fuzzy control system, Scheduling (computing), Hardware and Architecture, Embedded system, Human visual system model, Electrical and Electronic Engineering, business, Software
Abstract: A proposed object recognition processor lightens its workload by estimating global region-of-interest features. A neuro-fuzzy controller performs intelligent ROI estimation by mimicking the human visual system, then manages the processor's overall pipeline stages using workload-aware task scheduling and applied database size control. The NFC performs workload-aware dynamic power management to reduce the proposed processor's power consumption.
Published: 2009

29. 81.6 GOPS Object Recognition Processor Based on a Memory-Centric NoC

Author: Joo-Young Kim, Hoi-Jun Yoo, Donghyun Kim, Se-Joong Lee, Seungjin Lee, and Kwanho Kim
Subjects: Computer science, Data parallelism, business.industry, Pipeline (computing), Multiprocessing, Network on a chip, Parallel processing (DSP implementation), Computer architecture, Hardware and Architecture, Embedded system, System on a chip, SIMD, Electrical and Electronic Engineering, business, Instruction-level parallelism, Software
Abstract: For mobile intelligent robot applications, an 81.6 GOPS object recognition processor is implemented. Based on an analysis of the target application, the chip architecture and hardware features are decided. The proposed processor aims to support both task-level and data-level parallelism. Ten processing elements are integrated for the task-level parallelism and single instruction multiple data (SIMD) instruction is added to exploit the data-level parallelism. The memory-centric network-on-chip (NoC) is proposed to support efficient pipelined task execution using the ten processing elements. It also provides coherence and consistency schemes tailored for 1-to-N and M-to-1 data transactions in a task-level pipeline. For further performance gain, the visual image processing memory is also implemented. The chip is fabricated in a 0.18-μm CMOS technology and computes the key-point localization stage of the SIFT object recognition twice faster than the 2.3 GHz Core 2 Duo processor.
Published: 2009

30. A 125 GOPS 583 mW Network-on-Chip Based Parallel Processor With Bio-Inspired Visual Attention Engine

Author: Hoi-Jun Yoo, Joo-Young Kim, Kwanho Kim, Minsu Kim, and Seungjin Lee
Subjects: Random access memory, business.industry, Computer science, Image processing, 32-bit, Chip, Network on a chip, Embedded system, Cellular neural network, Hardware_INTEGRATEDCIRCUITS, System on a chip, SIMD, Static random-access memory, Electrical and Electronic Engineering, business, Computer hardware, Shift register
Abstract: A network-on-chip (NoC) based parallel processor is presented for bio-inspired real-time object recognition with visual attention algorithm. It contains an ARM10-compatible 32-bit main processor, 8 single-instruction multiple-data (SIMD) clusters with 8 processing elements in each cluster, a cellular neural network based visual attention engine (VAE), a matching accelerator, and a DMA-like external interface. The VAE with 2-D shift register array finds salient objects on the entire image rapidly. Then, the parallel processor performs further detailed image processing within only the pre-selected attention regions. The low-latency NoC employs dual channel, adaptive switching and packet-based power management, providing 76.8 GB/s aggregated bandwidth. The 36 mm² chip contains 1.9 M gates and 226 kB SRAM in a 0.13 μm 8-metal CMOS technology. The fabricated chip achieves a peak performance of 125 GOPS and 22 frames/sec object recognition while dissipating 583 mW at 1.2 V.
Published: 2009

31. Coordination of wavelength and time-window assignment in WDM-based TDM hybrid-PONs

Author: Nam-Uk Kim, Seungjin Lee, Minho Kang, and Tae-Yeon Kim
Subjects: Ethernet, Access network, Computer Networks and Communications, Computer science, business.industry, Passive optical network, Multiplexing, Atomic and Molecular Physics, and Optics, Arrayed waveguide grating, law.invention, Bandwidth allocation, Transmission (telecommunications), Hardware and Architecture, law, Wavelength-division multiplexing, Electrical and Electronic Engineering, business, Software, Computer network
Abstract: In passive optical networks (PONs), the low effectiveness in terms of service utilization and network evolution have been important design issues. In this article, we introduce a hybrid access network architecture, so called scalable WDM-based Ethernet hybrid-PON (SWE-PON), which features a wavelength-division-multiplexed (WDM) feeder network using a combination of tunable laser device (TLD) and cyclic arrayed waveguide grating (AWG) and time-division-multiplexed (TDM) distribution network based on a reflective transmission mode. Necessary conditions needed to guarantee flawless packet transmission through normal WDM/TDM hierarchical PONs including the SWE-PON, are analyzed. We also propose a hierarchical fair time-window allocation mechanism which coordinates wavelength assignment and time-window bandwidth allocation so that high link utilization and fair bandwidth allocation are guaranteed in every multiplexing level.
Published: 2008

32. Cost-effective low-power graphics processing unit for handheld devices

Author: Seungjin Lee, Kwanho Kim, Hoi-Jun Yoo, Jeabin Lee, and Byeong-Gyu Nam
Subjects: Power management, Hardware architecture, Memory management, Computer Networks and Communications, business.industry, Computer science, Graphics processing unit, Bandwidth (computing), Memory bandwidth, Electrical and Electronic Engineering, business, Computer hardware, Computer Science Applications
Abstract: Cost-effective handheld graphics processing units are discussed in the aspects of performance, memory bandwidth, power, and area requirements. The proposed RamP architecture has special features of cost-effective low-power arithmetic units, memory bandwidth reduction, and dynamic power management schemes for handheld GPUs. The detailed design of RamP-VI is explained as an example of the RamP architecture. It adopts logarithmic arithmetic for power and area efficiency, and has a triple-domain power management scheme to minimize power consumption at a given performance level. The proposed GPU shows peak performance of 141 Mvertices/s and 52.4 mW power consumption when it operates at 60 frames/s. It shows 17.5 percent performance improvement and 50.5 percent power reduction compared to the latest work.
Published: 2008

33. Performance Improvement Using Self-Link-Breakage Announcement in Wireless Ad-hoc Networks

Author: Hyun-Ho Shin, Byung-Seo Kim, and Seungjin Lee
Subjects: Routing protocol, business.industry, Computer science, Wireless ad hoc network, Node (networking), Distributed computing, ComputerSystemsOrganization_COMPUTER-COMMUNICATIONNETWORKS, Physical layer, Wireless, Performance improvement, Routing (electronic design automation), business, Communication channel, Computer network
Abstract: Functions of wireless routing protocols are classified into two: finding a route and maintaining the route. The route maintenance function also has two functions: detecting a link breakage and recovering the broken route. This paper proposes a method to actively detect link breakages, unlike conventional methods such as using hello messages in the routing layer, using transmission failure notifications from the MAC layer, and using channel condition changes in the physical layer. This paper considers scenarios in which nodes can anticipate link breakages, such as those caused by a node's physical damage, power-off, or power-save mode. For these cases, this paper proposes that a node causing a link breakage actively sends a built-in message to its neighbors, allowing them to instantly detect the upcoming link breakage. The neighbor nodes can therefore quickly start link recovery processes. This paper proposes a brief system architecture of a node as well as a protocol for performing the method. The delay reduction in detecting link breakage with the proposed method is compared with the MAC layer-based detection method.
Published: 2013

34. Enhancements for Local Repair in AODV-Based Ad-Hoc Networks

Author: Hyun-Ho Shin, Seungjin Lee, and Byung-Seo Kim
Subjects: Distance-vector routing protocol, business.industry, Overhead (business), Computer science, Ad hoc On-Demand Distance Vector Routing, Wireless ad hoc network, Node (networking), Process (computing), business, Protocol (object-oriented programming), Computer network
Abstract: The route recovery process of the Ad-hoc On-demand Distance Vector (AODV) protocol has been extensively studied. However, the recovery process still incurs long delays and overhead. In this paper, an enhanced method to perform a quick local recovery process is proposed. In the proposed method, when a link is broken, the node detecting the link break asks its neighbor nodes whether any of them can substitute for the node causing the link break. If there is such a node, then the recovery is quickly and locally completed. The proposed method does not increase the overhead of finding the substitute compared to the conventional AODV protocol. This paper provides only the idea at this time; performance evaluations of the proposed method will be provided in upcoming work.
Published: 2013

35. Online Reinforcement Learning NoC for portable HD object recognition processor

Author: Hoi-Jun Yoo, Seungjin Lee, Gyeonghoon Kim, Junyoung Park, Jinwook Oh, and Injoon Hong
Subjects: Feature detection (web development), Network on a chip, Computer science, business.industry, Embedded system, Feature extraction, Cognitive neuroscience of visual object recognition, Reinforcement learning, business, Chip, Throughput (business), Electrical efficiency
Abstract: Heterogeneous multi-core object recognition processor with Reinforcement Learning (RL) NoC is proposed for efficient portable HD object recognition. RL NoC automatically learns management policies in the network of heterogeneous system without an explicit modeling. By adopting RL NoC, the throughput performances of feature detection and description are increased by 20.4% and 11.5%, respectively. As a result, the overall execution time of the object recognition is reduced by 38%. The implemented chip achieves 121mW power consumption with 1.24 TOPS/W power efficiency.
Published: 2012

36. A simultaneous multithreading heterogeneous object recognition processor with machine learning based dynamic resource management

Author: Seungjin Lee, Jinwook Oh, Hoi-Jun Yoo, Joo-Young Kim, Gyeonghoon Kim, Jun-Young Park, and Injoon Hong
Subjects: Multi-core processor, Computer architecture, Computer science, Multithreading, Dynamic priority scheduling, Simultaneous multithreading, Resource management (computing), Throughput (business), Temporal multithreading, Pipeline (software)
Abstract: A simultaneous multithreading multicore processor is proposed to accelerate object recognition for 720p HD video streams. The multithreading architecture with Q-learning based dynamic resource management enables concurrent processing of 8 regions-of-interest with a 5-stage fine-grained recognition pipeline, outperforming previous object recognition processors with 342GOPS computing power. In addition, the dynamic resource management contributes to an increase in energy efficiency by applying on-line learning DVFS and dynamic tile allocation based on task variance and hardware utilization to achieve 9.6mJ/frame with a 1280×720 pixel image. It achieves 2.72× throughput and 3.7× energy efficiency compared to previous recognition processors.
Published: 2012

37. A 92mW real-time traffic sign recognition system with robust light and dark adaptation

Author: Junyoung Park, Hoi-Jun Yoo, Joonsoo Kwon, Jinwook Oh, and Seungjin Lee
Subjects: Adaptive neuro fuzzy inference system, Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Support vector machine, Statistical classification, Robustness (computer science), Memory architecture, Preprocessor, Traffic sign recognition, Computer vision, Artificial intelligence, Cache, business
Abstract: A traffic sign recognition system that is robust under various lighting conditions is proposed with an image enhancement preprocessor and a recognition processor. The image enhancement preprocessor performs the Multi-scale Retinex (MSR) algorithm for robust light and dark adaptation. It includes a mixed-mode Adaptive Neuro-Fuzzy Inference System (ANFIS) engine that performs online optimizations for various scenes. The recognition processor performs the Support Vector Machine (SVM) algorithm for robust sign recognition. Its proposed algorithm-optimized kernel cache and memory architecture reduces the power consumption and memory redundancy by 78% and 35%, respectively. The proposed system is implemented in a 0.13μm CMOS process and is connected using Network-on-Chip (NoC) communication. As a result, the system achieves robust sign recognition under various lighting conditions while consuming just 92mW at 1.2V.
Published: 2011

38. A low-energy hybrid radix-4/-8 multiplier for portable multimedia applications

Author: Seungjin Lee, Hoi-Jun Yoo, Gyeonghoon Kim, and Jun-Young Park
Subjects: Multimedia, business.industry, Computer science, Propagation delay, Operand, computer.software_genre, Low energy, Logic gate, Radix, Multiplier (economics), Carry-save adder, Hardware_ARITHMETICANDLOGICSTRUCTURES, business, Encoder, computer, Computer hardware
Abstract: A hybrid radix-4/-8 multiplier is proposed for portable multimedia applications that demand high speed and low energy operation. Depending on the input pattern, the multiplier operates in the radix-8 mode in 56% of the input cases for low power, but reverts to the radix-4 mode in 44% of the slower input cases for high speed. For this, a mode detection circuit determines the mode signal from the input operand in just 2 gate delays. Based on the mode signal, the radix-4/-8 dual Booth encoder generates encoding signals in a hardware efficient way. Moreover, the carry save adder block is selectively activated to reduce power consumption. Compared to a conventional radix-4 multiplier, the proposed hybrid multiplier architecture consumes 33.5% less power at the expense of just 3.3% additional propagation delay, resulting in 31.3% less energy per operation.
Published: 2011

39. A 57mW embedded mixed-mode neuro-fuzzy accelerator for intelligent multi-core processor

Author: Hoi-Jun Yoo, Jun-Young Park, Jinwook Oh, Gyeonghoon Kim, and Seungjin Lee
Subjects: Multi-core processor, Speedup, Neuro-fuzzy, Artificial neural network, business.industry, Computer science, Fuzzy control system, Porting, Software, Application-specific integrated circuit, Embedded system, General-purpose computing on graphics processing units, business, Computer hardware
Abstract: Artificial intelligence (AI) functions are becoming important in smartphones, portable game consoles, and robots for such intelligent applications as object detection, recognition, and human-computer interfaces (HCI). Most of these functions are realized in software with neural networks (NN) and fuzzy systems (FS), but due to power and speed limitations, a hardware solution is needed. For example, software implementations of object-recognition algorithms like SIFT consume ∼10W and ∼1s delay even on a 2.4GHz PC CPU. Previously, GPGPUs or ASICs were used to realize AI functions [1–2]. But GPGPUs just emulate NN/FS with many processing elements to speed up the software, while still consuming a large amount of power. On the other hand, low-power ASICs have been mostly dedicated stand-alone processors, not suitable to be ported into many different systems [2].
Published: 2011

40. A 92mW 76.8GOPS vector matching processor with parallel Huffman decoder and query re-ordering buffer for real-time object recognition

Author: Hoi-Jun Yoo, Jun-Young Park, Joonsoo Kwon, Jinwook Oh, and Seungjin Lee
Subjects: Matching (statistics), symbols.namesake, Computer science, Feature extraction, Bandwidth (computing), symbols, Memory bandwidth, Parallel computing, Huffman coding, Bottleneck, Locality-sensitive hashing, Data compression
Abstract: A vector matching processor with memory bandwidth optimizations is proposed to achieve real-time matching of 128 dimensional SIFT features extracted from VGA video. The main bottleneck of feature-vector matching is the off-chip database access. We employ the locality sensitive hashing (LSH) algorithm which reduces the number of database comparisons required to match each query. In addition, database compression using Huffman coding increases the effective external bandwidth. Dedicated parallel Huffman decoder hardware ensures fast decompression of the database. A flexible query re-ordering buffer exploits overlapping accesses between queries by enabling out-of-order query processing to minimize redundant off-chip access. As a result, the 76.8 GOPS feature matching processor implemented in a 0.13um CMOS process achieves 43200 queries/second on a 100 object database while consuming peak power of 92mW.
Published: 2010

41. Intelligent NoC with neuro-fuzzy bandwidth regulation for a 51 IP object recognition processor

Author: Hoi-Jun Yoo, Minsu Kim, Joonsoo Kwon, Jinwook Oh, Seungjin Lee, Joo-Young Kim, and Jun-Young Park
Subjects: Multi-core processor, Weighted round robin, Computer science, Network packet, business.industry, Embedded system, Synchronization (computer science), Real-time computing, Bandwidth (computing), Overhead (computing), Inference engine, business, Block (data storage)
Abstract: Balancing the execution times of concurrent tasks in a multi-core processor is critical to achieving good performance scaling with increasing core count. However, this is difficult when the tasks' execution times are not known in advance. In this work, we propose an intelligent Network-on-Chip that performs bandwidth regulation using weighted round robin packet arbitration to balance the execution times of 4 Feature Extraction Clusters whose workloads vary depending on the input content. A neuro-fuzzy inference block, named the Intelligent Inference Engine, predicts the workload of each FEC, and assigns a priority weight to each FEC channel. As a result, 34% reduction in synchronization overhead due to unbalanced execution time was achieved, and the overall execution time was reduced by 11.5%.
Published: 2010

42. A 1.2mW on-line learning mixed mode intelligent inference engine for robust object recognition

Author: Hoi-Jun Yoo, Jun-Young Park, Minsu Kim, Joonsoo Kwon, Jinwook Oh, Joo-Young Kim, and Seungjin Lee
Subjects: Computer science, Robustness (computer science), business.industry, Real-time computing, Cognitive neuroscience of visual object recognition, Process control, Inference engine, Mixed mode, Cmos process, business, Processing delay, Computer hardware, Electronic circuit
Abstract: An intelligent inference engine (IIE) is proposed as a controller for low power high speed robust object recognition processor. It contains analog digital mixed mode neuro-fuzzy circuits for the on-line learning to increase attention efficiency. It is implemented in 0.13um CMOS process and achieves 1.2mW power consumption with 94% average classification accuracy within 1us operation. The 0.765mm2 IIE achieves 76% attention efficiency, and reduces power and processing delay of the 50mm2 recognition processor by up to 37% and 28%, respectively, with 96% recognition accuracy.
Published: 2010

43. A 30fps stereo matching processor based on belief propagation with disparity-parallel PE array architecture

Author: Seungjin Lee, Junyoung Park, and Hoi-Jun Yoo
Subjects: Memory management, CMOS, Pixel, Computer science, Pipeline (computing), Stereo matching, Parallel computing, Architecture, Belief propagation
Abstract: In this paper, we propose a real-time stereo matching processor based on the belief propagation algorithm. Computationally complex message construction is accelerated by a disparity-parallel PE array architecture, which calculates messages for all disparity levels (1–32) in parallel. A tile-based belief propagation approach reduces the on-chip memory requirements by 95.4% compared to the previous works. In addition, a two-level on-chip buffer and memory access pipelining enable high PE utilization of 89%. As a result, the message construction rate of the PEs is increased by 6.45x compared to previous works. The fabricated processor in a 0.18um CMOS process achieves 30 fps performance for QVGA (320×240) video inputs at 200 MHz operating frequency.
Published: 2010

44. A 118.4GB/s multi-casting network-on-chip for real-time object recognition processor

Author: Seungjin Lee, Hoi-Jun Yoo, Joo-Young Kim, Minsu Kim, Jinwook Oh, and Kwanho Kim
Subjects: Computer science, business.industry, Cognitive neuroscience of visual object recognition, Ring network, Topology (electrical circuits), Hardware_PERFORMANCEANDRELIABILITY, Energy consumption, Network topology, Synchronization, Network on a chip, CMOS, Embedded system, Hardware_INTEGRATEDCIRCUITS, business
Abstract: A 118.4 GB/s multi-casting network-on-chip (MC-NoC) is developed as a communication platform for a real-time object recognition processor. To support application-specific data transactions, the MC-NoC adopts a combination of hierarchical star and ring topologies with multi-casting capability. As a result, the proposed MC-NoC improves data transaction time and energy consumption by 20% and 23%, respectively, under target object recognition traffic. The 350k-gate MC-NoC, fabricated in a 0.13 µm CMOS process, consumes 48 mW at 400 MHz and 1.2 V.
Published: 2009

45. A 54GOPS 51.8mW analog-digital mixed mode Neural Perception Engine for fast object detection

Author: Joo-Young Kim, Minsu Kim, Hoi-Jun Yoo, Jinwook Oh, and Seungjin Lee
Subjects: Computational complexity theory, Computer science, Feature extraction, Real-time computing, Ode, Frame rate, Chip, Object detection, Power (physics), Electronic circuit
Abstract: A mixed-mode Neural Perception Engine (NPE) is proposed as the pre-processing accelerator of a multi-object recognition processor to reduce its computational complexity and increase its efficiency. It consists of a Motion Estimator (ME), a Visual Attention Engine (VAE), and an Object Detection Engine (ODE). The fabricated NPE achieves 54 GOPS at 51.8 mW. By implementing a fast and robust neuro-fuzzy algorithm in analog-digital mixed circuits, the area and power of the ODE are reduced by 59% and 44%, respectively, compared to those of an all-digital implementation. The NPE can increase the frame rate of the multi-object recognition processor by 2.09x and reduce its power consumption by 38%.
Published: 2009

46. A 60fps 496mW multi-object recognition processor with workload-aware dynamic power management

Author: Jinwook Oh, Hoi-Jun Yoo, Minsu Kim, Seungjin Lee, and Joo-Young Kim
Subjects: Software, Video Graphics Array, business.industry, Computer science, Embedded system, Frame (networking), Clock gating, Integrated circuit design, business, Chip, Computer hardware, Power (physics), Efficient energy use
Abstract: An energy-efficient object recognition processor is proposed for real-time visual applications. Its energy efficiency is improved by lowering average power consumption while sustaining a high frame rate. To this end, the proposed processor draws on features from all levels of chip design. At the architecture level, it performs 3-stage task pipelining for high-frame-rate operation and workload-aware dynamic power management for low power consumption. At the block level, energy-efficient special-purpose engines are employed, while software-controlled clock gating is exploited for fine-grained clock control. At the circuit level, analog-digital mixed design is used to reduce power at the same performance. As a result, the 49 mm² chip in a 0.13 µm technology achieves 60 fps object recognition for VGA (640×480) input with 496 mW power at a 1.2 V supply. This means only 8.2 mJ is dissipated per frame, which is 3.2x more energy efficient than the state of the art.
Published: 2009

47. An area efficient shared synapse cellular neural network for low power image processing

Author: Joo-Young Kim, Seungjin Lee, Jinwook Oh, and Hoi-Jun Yoo
Subjects: Very-large-scale integration, Capacitor, law, Computer science, Cellular neural network, Transistor, Electronic engineering, Image processing, Energy consumption, Sample and hold, law.invention, Power (physics)
Abstract: This paper presents an area and power efficient cellular neural network (CNN) that enables real-time image processing. The proposed shared synapse architecture halves the number of required synapse multipliers, which are the main contributor to area and power consumption of CNNs. For this, a current holder circuit is used to sample and hold the currents of non-changing synaptic circuit outputs. Compared to the conventional architecture of CNNs, power and area are reduced by 46% and 41%, respectively.
Published: 2009

48. A 66fps 38mW nearest neighbor matching processor with hierarchical VQ algorithm for real-time object recognition

Author: Joo-Young Kim, Kwanho Kim, Hoi-Jun Yoo, Minsu Kim, and Seungjin Lee
Subjects: Reduction (complexity), Pixel, Reduced instruction set computing, Matching (graph theory), Computer science, Vector quantization, Cognitive neuroscience of visual object recognition, Frame rate, Algorithm, k-nearest neighbors algorithm
Abstract: A 66 fps 38 mW nearest neighbor matching processor for real-time object recognition has been fabricated in 0.13 µm CMOS technology. It consists of a RISC processing core, a pre-fetch DMA, and two independent sets of logic-merged memories. Based on a hierarchical vector quantization (H-VQ) algorithm, the implemented processor achieves a 22.5x cycle time reduction in the matching process without any accuracy loss in the VQ operation. As a result, a 66 fps frame rate is obtained for QVGA (320×240 pixels) video images with a 5632-entry database.
Published: 2008

49. A 76.8 GB/s 46 mW low-latency network-on-chip for real-time object recognition processor

Author: Hoi-Jun Yoo, Joo-Young Kim, Seungjin Lee, Minsu Kim, and Kwanho Kim
Subjects: Power management, business.industry, Data parallelism, Computer science, Clock gating, Hardware_PERFORMANCEANDRELIABILITY, Network topology, Network on a chip, Embedded system, Low-power electronics, Hardware_INTEGRATEDCIRCUITS, Crossbar switch, Latency (engineering), business
Abstract: A 76.8 GB/s 46 mW low-latency network-on-chip (NoC) provides a communication platform for a real-time object recognition processor. The tree-based topology NoC with three crossbar switches is designed for low latency by adopting dual-channel and adaptive switching. The NoC can be dynamically configured to exploit both data-level and object-level parallelism on the object recognition processor. FLIT-level clock gating and a packet-based power management scheme are employed for low power consumption. The NoC is implemented in a 0.13 µm CMOS process and provides 76.8 GB/s aggregated bandwidth at 400 MHz with 2-clock-cycle latency while dissipating 46 mW at 1.2 V.
Published: 2008

50. A 211 GOPS/W dual-mode real-time object recognition processor with Network-on-Chip

Author: Hoi-Jun Yoo, Seungjin Lee, Minsu Kim, Kwanho Kim, and Joo-Young Kim
Subjects: Power management, MIMD, Network on a chip, Parallel processing (DSP implementation), CMOS, Computer science, Very long instruction word, Hardware_INTEGRATEDCIRCUITS, Parallel computing, SIMD, ComputerSystemsOrganization_PROCESSORARCHITECTURES, Chip
Abstract: This paper presents a 211 GOPS/W real-time object recognition processor with network-on-chip (NoC). The chip integrates 8 linearly connected SIMD clusters with 8 4-way VLIW processing elements (PEs) per cluster. The SIMD/MIMD dual-mode object recognition processor exploits both data-level and object-level parallelism based on the NoC configuration. The 8-way SIMD PE cluster is optimized for data-intensive object recognition tasks. A packet-based power management scheme is employed for low power consumption. The proposed processor occupies 36 mm² in a 0.13 µm CMOS process and achieves a peak performance of 96 GOPS at 200 MHz with 392 mW power consumption.
Published: 2008

79 results on '"Seungjin Lee"'
