17 results for "Ki-Seung Lee"
Search Results
2. Automatic Estimation of Food Intake Amount Using Visual and Ultrasonic Signals
- Author
-
Ki-Seung Lee
- Subjects
Food intake, Audio signal, Modality (human–computer interaction), Computer Networks and Communications, Computer science, Continuous monitoring, Ultrasonic doppler, food intake estimation, chewing sound detection, food image recognition, External noise, Identification (information), Hardware and Architecture, Control and Systems Engineering, Signal Processing, Computer vision, Ultrasonic sensor, Artificial intelligence, Electrical and Electronic Engineering, Electronics - Abstract
The continuous monitoring and recording of food intake amount without user intervention is very useful in the prevention of obesity and metabolic diseases. In this study, a technique was adopted that automatically recognizes food intake amount by combining the identification of food types through image recognition with the recognition of chewing events through an acoustic modality. The accuracy of audio-based detection of eating activity is seriously degraded in noisy environments. To alleviate this problem, contact sensing methods have conventionally been adopted, wherein sensors are attached to the face or neck region to reduce external noise. Such sensing methods, however, cause dermatological discomfort and a feeling of cosmetic unnaturalness for most users. Here, a noise-robust, non-contact sensing method was employed, wherein ultrasonic Doppler shifts were used to detect chewing events. The experimental results showed that the mean absolute percentage errors (MAPEs) of the ultrasonic-based method were comparable with those of the audio-based method (15.3 vs. 14.6) when 30 food items were used in the experiments. The food intake amounts were estimated for eight subjects in several noisy environments (cafeterias, restaurants, and home dining rooms). For all subjects, the estimation accuracy of the ultrasonic method was not degraded (the average MAPE was 15.02) even under noisy conditions. These results show that the proposed method has the potential to replace manual logging methods.
- Published
- 2021
- Full Text
- View/download PDF
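The abstract above reports accuracy as the mean absolute percentage error (MAPE) over per-food intake-amount estimates. A minimal sketch of how such a score is computed is given below; the toy amounts and the idea of deriving estimates from chew counts times a per-bite mass are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

def mape(true_amounts, estimated_amounts):
    """Mean absolute percentage error (in percent) over a set of intake estimates."""
    t = np.asarray(true_amounts, dtype=float)
    e = np.asarray(estimated_amounts, dtype=float)
    return 100.0 * np.mean(np.abs(t - e) / t)

# Hypothetical example: ground-truth grams vs. estimates (e.g., chew count x per-bite mass).
true_g = [120.0, 85.0, 200.0]
est_g  = [105.0, 92.0, 231.0]
print(f"MAPE = {mape(true_g, est_g):.1f}%")   # comparable in spirit to the 15.3 / 14.6 figures reported
```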
3. Food Intake Detection Using Ultrasonic Doppler Sonar
- Author
-
Ki-Seung Lee
- Subjects
Acoustics, Ultrasonic doppler, Sonar, stomatognathic system, Swallowing, otorhinolaryngologic diseases, Electrical and Electronic Engineering, Instrumentation, digestive, oral, and skin physiology, Ultrasound, Continuous monitoring, Chin, Ultrasonic sensor, Doppler effect - Abstract
Reliable, user-friendly and convenient sensing is highly desirable when the continuous monitoring of food intake is necessary. In this paper, food intake monitoring was performed during the processes of chewing and swallowing. Acoustic Doppler sonar (ADS) was used to detect chewing and swallowing events in a manner that is non-contact and free from acoustic interference. When a 40 kHz ultrasonic beam was focused on the lower jaw and neck, movements of the chin and neck caused Doppler frequency shifts and an amplitude envelope modulation of the ultrasonic signals. Hence, it was possible to detect chewing and swallowing events using the Doppler frequency shifts in the received ultrasound signals. To prevent spurious chewing events caused by talking from being recognized as food intake events, the log filter-bank energy of the voice band was also taken into consideration. Automatic detection of chewing and swallowing events was achieved via an artificial neural network. The experimental results showed that the proposed ADS-based food intake detection method yielded promising results, with maximum recognition rates of 91.4% and 78.4% for chewing and swallowing, respectively. As a result, it was confirmed that the proposed food intake detection method using ultrasonic Doppler yielded high recognition rates without discomfort to the user from continuous skin contact.
- Published
- 2017
- Full Text
- View/download PDF
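The abstract above attributes detection to Doppler frequency shifts of a 40 kHz carrier reflected from the moving chin and neck. A back-of-the-envelope sketch of the expected shift, using the standard two-way Doppler relation, follows; the surface velocities are illustrative assumptions, not measurements from the paper.

```python
# Doppler shift for a carrier reflected off a moving surface: f_d ~ 2 * v * f0 / c
F0 = 40_000.0       # carrier frequency in Hz (40 kHz, as in the abstract)
C_AIR = 343.0       # speed of sound in air, m/s

def doppler_shift_hz(surface_velocity_mps: float) -> float:
    """Approximate two-way Doppler shift for a reflector moving toward/away from the sensor."""
    return 2.0 * surface_velocity_mps * F0 / C_AIR

# Illustrative chin/neck velocities during chewing and swallowing (assumed values).
for v in (0.02, 0.05, 0.10):
    print(f"v = {v:4.2f} m/s  ->  Doppler shift of about {doppler_shift_hz(v):6.1f} Hz")
```

In the paper, such shifts, together with the amplitude-envelope modulation and a voice-band log filter-bank energy check, feed an artificial neural network that labels chewing and swallowing events.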
4. Restricted Boltzmann Machine-Based Voice Conversion for Nonparallel Corpus
- Author
-
Ki-Seung Lee
- Subjects
Restricted Boltzmann machine, Training set, Computer science, Applied Mathematics, Speech recognition, Feature extraction, Pattern recognition, Speech corpus, Probability density function, Conditional probability distribution, Distribution (mathematics), Signal Processing, Artificial intelligence, Electrical and Electronic Engineering - Abstract
A large parallel training corpus is necessary for robust, high-quality voice conversion. However, such parallel data may not always be available. This letter presents a new voice conversion method that needs no parallel speech corpus and adopts a restricted Boltzmann machine (RBM) to represent the distribution of the spectral features derived from a target speaker. A linear transformation was employed to convert the spectral and delta features. A conversion function was obtained by maximizing the conditional probability density function with respect to the target RBM. A feasibility test was carried out on the OGI VOICES corpus. Both subjective listening tests and objective evaluations showed that the proposed method outperforms the conventional GMM-based method.
- Published
- 2017
- Full Text
- View/download PDF
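The abstract above describes fitting an RBM to target-speaker spectral features and deriving a conversion by maximizing the conditional density under that RBM. The minimal sketch below shows the core quantity involved, the free energy of a Gaussian-visible RBM, and uses its gradient to push features toward high target-speaker likelihood. The parameter sizes and the simple gradient loop are illustrative assumptions, not the letter's exact formulation (which optimizes a linear transform).

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 24, 64                    # spectral feature dim, hidden units (illustrative sizes)
W = rng.normal(0, 0.1, (H, D))   # RBM weights (would be learned on target-speaker features)
b = np.zeros(D)                  # visible biases
c = np.zeros(H)                  # hidden biases

def free_energy(v):
    """Free energy of a Gaussian-visible / Bernoulli-hidden RBM (unit visible variance)."""
    return 0.5 * np.sum((v - b) ** 2) - np.sum(np.logaddexp(0.0, c + W @ v))

def free_energy_grad(v):
    """dF/dv: used to push converted features toward high density under the target RBM."""
    h = 1.0 / (1.0 + np.exp(-(c + W @ v)))   # hidden activation probabilities
    return (v - b) - W.T @ h

# Illustrative "conversion": start from a source feature vector and take a few gradient
# steps that lower the target-RBM free energy (i.e., raise its likelihood).
x = rng.normal(0, 1, D)          # hypothetical source spectral feature
y = x.copy()
for _ in range(50):
    y -= 0.05 * free_energy_grad(y)
print(f"free energy: source {free_energy(x):.2f} -> converted {free_energy(y):.2f}")
```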
5. Joint Audio-Ultrasound Food Recognition for Noisy Environments
- Author
-
Ki-Seung Lee
- Subjects
Adult, Male, Computer science, A-weighting, Signal-To-Noise Ratio, Pattern Recognition, Automated, Set (abstract data type), Eating, Young Adult, Health Information Management, Feature (machine learning), Humans, Ultrasonics, Electrical and Electronic Engineering, Linear combination, Noise measurement, Pattern recognition, Signal Processing, Computer-Assisted, Equipment Design, Middle Aged, Computer Science Applications, Noise, Food, Female, Artificial intelligence, Neural Networks, Computer, Joint (audio engineering), Biotechnology - Abstract
Continuous recognition of ingested foods without user intervention is very useful for the pre-screening of obesity and diet-related diseases. An automatic food recognition method that combines the two modalities of audio and ultrasonic signals (US) is proposed in this study. In a noise-free environment, the classification accuracy of an audio-only recognizer is generally higher than that of a US-only recognizer, but the performance of the US recognizer is unaffected by acoustic noise levels. In the recognition system presented herein, the likelihood score of the audio-US feature was given by a linear combination of class-conditional observation log-likelihoods from the two classifiers, using appropriate weights. We developed a weighting process adaptive to the signal-to-noise ratio (SNR). The main objective here involves determining the optimal SNR classification boundaries and constructing a set of optimum stream weights for each SNR class. A feasibility test was conducted to verify the usefulness of the proposed method through recognition experiments on seven types of food. The performance was compared with conventional methods that use in-ear and throat microphones. The proposed method yielded recognition rates of 90.13% with artificially added noise and 89.67% under actual noisy environments, when the SNR ranged from 0 to 20 dB.
- Published
- 2019
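The abstract above describes fusing the two classifiers through a weighted sum of class-conditional log-likelihoods, with stream weights chosen per estimated SNR class. A minimal sketch of that decision rule follows; the SNR boundaries, weights, and food labels are assumed placeholders, not the values learned in the paper.

```python
import numpy as np

# Assumed SNR classes (dB) and per-class audio-stream weights; the ultrasonic stream gets 1 - w.
SNR_BOUNDS_DB = [5.0, 10.0, 15.0]          # boundaries between SNR classes (placeholder values)
AUDIO_WEIGHTS = [0.2, 0.4, 0.6, 0.8]       # low SNR -> trust ultrasound more (placeholder values)

def audio_weight(snr_db: float) -> float:
    """Pick the audio stream weight for the SNR class that snr_db falls into."""
    idx = int(np.searchsorted(SNR_BOUNDS_DB, snr_db))
    return AUDIO_WEIGHTS[idx]

def fused_decision(loglik_audio, loglik_us, snr_db, labels):
    """Combine per-class log-likelihoods from both streams and return the best label."""
    w = audio_weight(snr_db)
    score = w * np.asarray(loglik_audio) + (1.0 - w) * np.asarray(loglik_us)
    return labels[int(np.argmax(score))]

labels = ["apple", "chips", "noodles"]                       # hypothetical food classes
print(fused_decision([-12.0, -9.5, -11.0],                   # audio log-likelihoods
                     [-10.2, -10.8, -9.9],                   # ultrasonic log-likelihoods
                     snr_db=3.0, labels=labels))
```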
6. HMM-Based Maximum Likelihood Frame Alignment for Voice Conversion from a Nonparallel Corpus
- Author
-
Ki-Seung Lee
- Subjects
Computer science, Maximum likelihood, Speech recognition, Frame (networking), Artificial Intelligence, Hardware and Architecture, Maximum likelihood criterion, Computer Vision and Pattern Recognition, Electrical and Electronic Engineering, Hidden Markov model, Software - Published
- 2017
- Full Text
- View/download PDF
7. Compensation for Shot-to-Shot Variations in Laser Pulse Energy for Photoacoustic Imaging
- Author
-
Ki-Seung Lee
- Subjects
Photoacoustic effect, Materials science, Photoacoustic imaging in biomedicine, Laser, Electronic, Optical and Magnetic Materials, Compensation (engineering), Photoacoustic Doppler effect, Optics, Shot (pellet), Electrical and Electronic Engineering, Pulse energy - Published
- 2017
- Full Text
- View/download PDF
8. Field-free switching of perpendicular magnetization through spin–orbit torque in antiferromagnet/ferromagnet/oxide structures
- Author
-
Hyun-Woo Lee, Kyoung-Whan Kim, Seung-heon Chris Baek, Gyungchoon Go, Chang Geun Yang, Young Wan Oh, Ki-Seung Lee, Y. M. Kim, Byong-Guk Park, Eun Sang Park, Hae Yeon Lee, Kyung Jin Lee, Jong-Ryul Jeong, Byoung-Chul Min, and Kyeong Dong Lee
- Subjects
Coupling, Physics, Field (physics), Spintronics, Condensed matter physics, Biomedical Engineering, Spin-transfer torque, Bioengineering, Condensed Matter Physics, Atomic and Molecular Physics, and Optics, Magnetic field, Exchange bias, Ferromagnetism, Antiferromagnetism, General Materials Science, Electrical and Electronic Engineering - Abstract
Spin-orbit torques arising from the spin-orbit coupling of non-magnetic heavy metals allow electrical switching of perpendicular magnetization. However, the switching is not purely electrical in laterally homogeneous structures. An extra in-plane magnetic field is indeed required to achieve deterministic switching, and this is detrimental for device applications. On the other hand, if antiferromagnets can generate spin-orbit torques, they may enable all-electrical deterministic switching because the desired magnetic field may be replaced by their exchange bias. Here we report sizeable spin-orbit torques in IrMn/CoFeB/MgO structures. The antiferromagnetic IrMn layer also supplies an in-plane exchange bias field, which enables all-electrical deterministic switching of perpendicular magnetization without any assistance from an external magnetic field. Together with sizeable spin-orbit torques, these features make antiferromagnets a promising candidate for future spintronic devices. We also show that the signs of the spin-orbit torques in various IrMn-based structures cannot be explained by existing theories and thus significant theoretical progress is required.
- Published
- 2016
- Full Text
- View/download PDF
9. Position-Dependent Crosstalk Cancellation Using Space Partitioning
- Author
-
Ki-Seung Lee
- Subjects
Acoustics and Ultrasonics, Artificial neural network, Computer science, Acoustics, Filter (signal processing), Correlation, Position (vector), Active listening, Electrical and Electronic Engineering, Space partitioning, Audio signal processing, Algorithm, Communication channel - Abstract
The present study tested a new stereo playback system that effectively cancels cross-talk signals at an arbitrary listening position. Such a playback system was implemented by integrating listener position tracking techniques and crosstalk cancellation techniques. The entire listening space was partitioned into a number of non-overlapped cells and a crosstalk cancellation filter was assigned to each cell. The listening space partitions and the corresponding crosstalk cancellation filters were constructed by maximizing the average channel separation ratio (CSR). Since the proposed method employed cell-based crosstalk cancellation, estimation of the exact position of the listener was not necessary. Instead, it was only necessary to determine the cell in which the listener was located. This was achieved by simply employing an artificial neural network (ANN) where the time delay to each pair of microphones was used as the ANN input and the ANN output corresponded to the index of cells. The experimental results showed that more than 95% of the experimental listening space had a CSR ≥ 10 dB when the number of clusters exceeded 12. Under these conditions, the correlation between the true directions of the virtual sound sources and the directions recognized by the subjects was greater than 0.9.
- Published
- 2013
- Full Text
- View/download PDF
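The abstract above describes partitioning the listening area into cells, assigning a precomputed crosstalk-cancellation filter to each cell, and using an ANN driven by inter-microphone time delays to pick the active cell. A minimal sketch of the cell-selection and filter-lookup logic follows, with a nearest-centroid stand-in for the ANN; all geometry, delays, and filters below are hypothetical.

```python
import numpy as np

# Hypothetical per-cell data: a centroid in TDOA space (seconds) and a 2x2 bank of FIR filters.
CELL_TDOA_CENTROIDS = np.array([[0.0e-3,  0.1e-3],
                                [0.4e-3, -0.2e-3],
                                [-0.3e-3, 0.5e-3]])          # 3 cells, 2 mic-pair delays each
CELL_FILTERS = {i: np.random.default_rng(i).normal(size=(2, 2, 64))  # placeholder filter taps
                for i in range(len(CELL_TDOA_CENTROIDS))}

def select_cell(tdoa_features: np.ndarray) -> int:
    """Stand-in for the ANN classifier: pick the cell whose TDOA centroid is closest."""
    dists = np.linalg.norm(CELL_TDOA_CENTROIDS - tdoa_features, axis=1)
    return int(np.argmin(dists))

def channel_separation_ratio_db(desired_power: float, crosstalk_power: float) -> float:
    """CSR in dB: power reaching the intended ear vs. power leaking to the other ear."""
    return 10.0 * np.log10(desired_power / crosstalk_power)

cell = select_cell(np.array([0.35e-3, -0.15e-3]))
filters = CELL_FILTERS[cell]                  # 2x2 filter bank applied to the stereo program
print("active cell:", cell, "| example CSR:",
      round(channel_separation_ratio_db(1.0, 0.08), 1), "dB")
```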
10. A Relevant Distance Criterion for Interpolation of Head-Related Transfer Functions
- Author
-
Seok-Pil Lee and Ki-Seung Lee
- Subjects
Acoustics and Ultrasonics, Human head, Distortion, Acoustics, Mel-frequency cepstrum, Electrical and Electronic Engineering, Horizontal plane, Transfer function, Head-related transfer function, Binaural recording, Mathematics, Interpolation - Abstract
In binaural synthesis, in order to realize more precise and accurate spatial sound, it would be desirable to measure a large number of head-related transfer functions (HRTFs) in various directions. To reduce the size of the HRTF set, interpolation is often employed, whereby the HRTF for any direction can be obtained from a limited number of representative HRTFs. In this paper, it is determined which distortion measure for interpolation of HRTFs in the horizontal plane is most suitable for predicting audible differences in sound location. Four HRTF sets, measured using three human heads and one mannequin (KEMAR), were prepared for this study. Using various objective distortion criteria, the differences between interpolated and measured HRTFs were computed. These were then related to the results of the listening tests through receiver operating characteristic (ROC) curves. The results of the present study indicated that, for the HRTF sets measured from the three human heads, the best predictor of performance was the distortion measure computed from the mel-cepstral coefficients, whereas the distortion measure associated with interaural time delay predicted audible differences in sound location reasonably well for the KEMAR HRTF set. A feasibility test was conducted to verify the usefulness of the selected distortion measure.
- Published
- 2011
- Full Text
- View/download PDF
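The study above finds that, for human-head HRTF sets, a distortion measure computed from mel-cepstral coefficients best predicts audible localization differences. A minimal sketch of that kind of measure, comparing cepstral coefficients of a measured and an interpolated HRTF, is given below; the coefficients are random placeholders, and the paper's exact cepstral order and weighting are not reproduced.

```python
import numpy as np

def mel_cepstral_distortion_db(c_ref: np.ndarray, c_test: np.ndarray) -> float:
    """Standard mel-cepstral distortion in dB between two cepstral vectors (c0 excluded)."""
    diff = np.asarray(c_ref)[1:] - np.asarray(c_test)[1:]
    return (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2))

rng = np.random.default_rng(1)
c_measured     = rng.normal(0, 0.3, 13)                  # placeholder mel-cepstrum of a measured HRTF
c_interpolated = c_measured + rng.normal(0, 0.05, 13)    # placeholder interpolated version
print(f"MCD = {mel_cepstral_distortion_db(c_measured, c_interpolated):.2f} dB")
```

In the paper, such scores were then related to listening-test outcomes via ROC curves to judge which measure predicts audibility best.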
11. A real-time audio system for adjusting the sweet spot to the listener's position
- Author
-
Seok-Pil Lee and Ki-Seung Lee
- Subjects
Engineering, Reverberation, Acoustics, Speech recognition, Direction of arrival, Rendering (computer graphics), Microprocessor, Stereophonic sound, Media Technology, Loudspeaker, Electrical and Electronic Engineering, Audio signal processing, Digital signal processing - Abstract
In the present study, a new stereophonic playback system was proposed, in which the crosstalk signals are reasonably cancelled at an arbitrary listener position. The system was composed of two major parts: the listener position tracking part and the sound rendering part. The position of the listener was estimated using acoustic signals from the listener (i.e., voice or hand-clapping signals). A direction-of-arrival (DOA) algorithm was adopted to estimate the directions of acoustic sources, with room reverberation effects taken into consideration. A crosstalk cancellation filter was designed using a free-field model. To determine the maximum tolerable shift of the listener position, a quantitative analysis of the channel separation ratio according to the displacement of the listener position was performed. Prototype hardware was implemented using a microprocessor board, a DSP board, a multi-channel ADC board and an analog front-end. The results showed that the average mean square error between the true direction of a listener and the estimated direction was about 5 degrees. More than 80% of the tested subjects indicated that better stereo images were obtained with the proposed system, compared with the non-processed signals.
- Published
- 2010
- Full Text
- View/download PDF
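The system above localizes the listener from his or her own voice or hand claps using a direction-of-arrival estimate on a microphone array. A minimal sketch of the far-field TDOA-to-angle relation that underlies such DOA estimators follows; the microphone spacing and delay are illustrative, and the paper's reverberation-aware algorithm is not reproduced.

```python
import math

def doa_from_tdoa(tdoa_s: float, mic_spacing_m: float, c: float = 343.0) -> float:
    """Far-field direction of arrival (degrees from broadside) from a mic-pair time delay."""
    s = max(-1.0, min(1.0, c * tdoa_s / mic_spacing_m))   # clamp for numerical safety
    return math.degrees(math.asin(s))

# Illustrative numbers: 20 cm spacing, 0.25 ms inter-mic delay.
print(f"estimated DOA of about {doa_from_tdoa(0.25e-3, 0.20):.1f} degrees")
```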
12. Statistical Approach for Voice Personality Transformation
- Author
-
Ki-Seung Lee
- Subjects
Probabilistic classification, Acoustics and Ultrasonics, Speech recognition, Vector quantization, Pattern recognition, Linear predictive coding, Speech processing, Speaker diarisation, Transformation (function), Cepstrum, Artificial intelligence, Electrical and Electronic Engineering, Prosody, Mathematics - Abstract
A voice transformation method which changes the source speaker's utterances so as to sound similar to those of a target speaker is described. Speaker individuality transformation is achieved by altering the LPC cepstrum, the average pitch period and the average speaking rate. The main objective of the work involves building a nonlinear relationship between the parameters for the acoustic features of the two speakers, based on a probabilistic model. The conversion rules involve a probabilistic classification and a cross-correlation probability between the acoustic features of the two speakers. The parameters of the conversion rules are estimated by maximizing the likelihood of the training data. To obtain transformed speech signals that are perceptually closer to the target speaker's voice, prosody modification is also involved. Prosody modification is achieved by scaling the excitation spectrum and applying time-scale modification with appropriate modification factors. An evaluation by objective tests and informal listening tests clearly indicated the effectiveness of the proposed transformation method. We also confirmed that the proposed method leads to smoothly evolving spectral contours over time, which, from a perceptual standpoint, produced results that were superior to those of conventional vector quantization (VQ)-based methods.
- Published
- 2007
- Full Text
- View/download PDF
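The abstract above describes a conversion rule built from a probabilistic classification of source features and cross-correlation statistics between the two speakers. The sketch below shows the widely used form of such a probabilistic spectral mapping, a posterior-weighted linear regression; the mixture parameters are random placeholders and the exact rule in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(2)
M, D = 4, 12                                 # mixtures, cepstral dimension (illustrative)
w      = np.full(M, 1.0 / M)                 # mixture weights
mu_x   = rng.normal(0, 1, (M, D))            # source-speaker means
mu_y   = rng.normal(0, 1, (M, D))            # target-speaker means
sigma2 = np.full((M, D), 1.0)                # diagonal source covariances
A      = np.stack([np.eye(D) * 0.8 for _ in range(M)])   # per-mixture regression matrices

def posterior(x):
    """p(m | x) under a diagonal-covariance Gaussian mixture on source features."""
    logp = -0.5 * np.sum((x - mu_x) ** 2 / sigma2 + np.log(2 * np.pi * sigma2), axis=1) + np.log(w)
    logp -= logp.max()
    p = np.exp(logp)
    return p / p.sum()

def convert(x):
    """Posterior-weighted sum of per-mixture linear predictions of the target spectrum."""
    p = posterior(x)
    preds = np.stack([mu_y[m] + A[m] @ (x - mu_x[m]) for m in range(M)])
    return p @ preds

x_src = rng.normal(0, 1, D)                  # hypothetical source LPC-cepstrum frame
print(convert(x_src).round(2))
```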
13. Robust Recognition of Fast Speech
- Author
-
Ki-Seung Lee
- Subjects
Signal processing, Voice activity detection, Degree (graph theory), Computer science, Speech recognition, Maximum likelihood, Word error rate, Speech processing, Artificial Intelligence, Hardware and Architecture, Cepstrum, Computer Vision and Pattern Recognition, Electrical and Electronic Engineering, Software, Utterance - Abstract
This letter describes a robust speech recognition system for recognizing fast speech by stretching the length of the utterance in the cepstrum domain. The degree of stretching for an utterance is determined by its rate of speech (ROS), which is estimated based on a maximum likelihood (ML) criterion. The proposed method was evaluated on 10-digit mobile phone numbers. The results of the simulation show that the overall error rate was reduced by 17.8% when the proposed method was employed.
- Published
- 2006
- Full Text
- View/download PDF
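The letter above stretches a fast utterance in the cepstrum domain by a factor tied to its estimated rate of speech (ROS). A minimal sketch of that time-axis stretch via linear interpolation of the cepstral frame sequence follows; the mapping from ROS to stretch factor is a placeholder assumption, whereas the paper chooses it under an ML criterion.

```python
import numpy as np

def stretch_cepstra(frames: np.ndarray, factor: float) -> np.ndarray:
    """Resample a (T, D) cepstral sequence to round(T * factor) frames by linear interpolation."""
    T, D = frames.shape
    new_T = max(2, int(round(T * factor)))
    src_pos = np.linspace(0.0, T - 1.0, new_T)
    out = np.empty((new_T, D))
    for d in range(D):
        out[:, d] = np.interp(src_pos, np.arange(T), frames[:, d])
    return out

def stretch_factor_from_ros(ros_phones_per_sec: float, nominal_ros: float = 12.0) -> float:
    """Placeholder rule: stretch fast speech back toward a nominal speaking rate."""
    return max(1.0, ros_phones_per_sec / nominal_ros)

cepstra = np.random.default_rng(3).normal(size=(80, 13))   # hypothetical 80-frame utterance
stretched = stretch_cepstra(cepstra, stretch_factor_from_ros(15.0))
print(cepstra.shape, "->", stretched.shape)                # (80, 13) -> (100, 13)
```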
14. MLP-based phone boundary refining for a TTS database
- Author
-
Ki-Seung Lee
- Subjects
Acoustics and Ultrasonics, Artificial neural network, Database, Computer science, Speech recognition, Speech synthesis, Pattern recognition, Speech corpus, Speech processing, Viterbi algorithm, Phone, Multilayer perceptron, Artificial intelligence, Electrical and Electronic Engineering, Hidden Markov model - Abstract
The automatic labeling of a large speech corpus plays an important role in the development of a high-quality text-to-speech (TTS) synthesis system. This paper describes a method for the automatic labeling of speech signals, aimed mainly at the construction of a large database for a TTS synthesis system. The main objective of the work involves the refinement of initial estimates of phone boundaries provided by an alignment based on a hidden Markov model. A multilayer perceptron (MLP) was employed to refine the phone boundaries. To increase the accuracy of phoneme segmentation, several specialized MLPs were individually trained based on phonetic transitions. The optimum partitioning of the entire phonetic-transition space and the corresponding MLPs were constructed from the standpoint of minimizing the overall deviation from the hand-labeled position. The experimental results showed that more than 93% of all phone boundaries deviated from the reference position by less than 20 ms. We also confirmed that a database constructed using the proposed method produced results that were perceptually comparable to a hand-labeled database, based on subjective listening tests.
- Published
- 2006
- Full Text
- View/download PDF
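The method above refines HMM-aligned phone boundaries with MLPs specialized per phonetic-transition class. A minimal sketch of the refinement step follows: features around the initial boundary go into the MLP for that transition class, which outputs a corrective offset. The network here is an untrained placeholder and the feature window size is an assumption.

```python
import numpy as np

rng = np.random.default_rng(4)

class TinyMLP:
    """Placeholder one-hidden-layer MLP mapping a boundary-context feature vector to an offset (ms)."""
    def __init__(self, d_in, d_hidden=16):
        self.W1 = rng.normal(0, 0.1, (d_hidden, d_in)); self.b1 = np.zeros(d_hidden)
        self.W2 = rng.normal(0, 0.1, d_hidden);         self.b2 = 0.0
    def __call__(self, x):
        h = np.tanh(self.W1 @ x + self.b1)
        return float(self.W2 @ h + self.b2)

# One specialized (untrained) MLP per broad phonetic-transition class, following the paper's idea.
transition_mlps = {"vowel->stop": TinyMLP(5 * 13), "nasal->vowel": TinyMLP(5 * 13)}

def refine_boundary(hmm_boundary_ms, context_frames, transition_class):
    """Add the MLP-predicted offset to the HMM-estimated boundary position."""
    offset_ms = transition_mlps[transition_class](context_frames.ravel())
    return hmm_boundary_ms + offset_ms

context = rng.normal(size=(5, 13))      # hypothetical 5 cepstral frames around the boundary
print(round(refine_boundary(1234.0, context, "vowel->stop"), 2))
```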
15. Context-adaptive smoothing for concatenative speech synthesis
- Author
-
Ki-Seung Lee and Sang-Ryong Kim
- Subjects
Computer science, Applied Mathematics, Speech recognition, Adaptive smoothing, Context (language use), Speech synthesis, Pattern recognition, Classification of discontinuities, Signal Processing, Artificial intelligence, Electrical and Electronic Engineering, Smoothing - Abstract
In text-to-speech synthesis, spectral smoothing is often employed to reduce artifacts at unit-joining points. A context-adaptive smoothing method is proposed in this letter, where the amount of smoothing is determined according to context information. Discontinuities at unit boundaries are predicted by a regression tree, and smoothing factors are computed by using predicted discontinuities and real discontinuities at unit boundaries. Experimental results are presented to demonstrate the effectiveness of the proposed method.
- Published
- 2002
- Full Text
- View/download PDF
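The letter above predicts an expected discontinuity at each unit joint with a regression tree and derives the smoothing strength from how the observed discontinuity compares to the prediction. A minimal sketch of that weighting idea and of applying it to the spectra on either side of the joint follows; the specific weighting rule below is an illustrative assumption, not the paper's formula.

```python
import numpy as np

def smoothing_factor(observed_disc: float, predicted_disc: float, eps: float = 1e-6) -> float:
    """Smooth only the part of the discontinuity that exceeds what the context predicts."""
    excess = max(0.0, observed_disc - predicted_disc)
    return excess / (observed_disc + eps)          # in [0, 1): 0 = leave joint untouched

def smooth_joint(spec_left: np.ndarray, spec_right: np.ndarray, alpha: float):
    """Pull the two boundary spectra toward their mean by the factor alpha."""
    mean = 0.5 * (spec_left + spec_right)
    return (1 - alpha) * spec_left + alpha * mean, (1 - alpha) * spec_right + alpha * mean

left, right = np.array([1.0, 2.0, 3.0]), np.array([1.6, 2.8, 2.2])
alpha = smoothing_factor(observed_disc=float(np.linalg.norm(left - right)), predicted_disc=0.4)
print(round(alpha, 2), smooth_joint(left, right, alpha))
```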
16. A very low bit rate speech coder based on a recognition/synthesis paradigm
- Author
-
R.V. Cox and Ki-Seung Lee
- Subjects
Acoustics and Ultrasonics, Computer science, Speech recognition, Concatenation, Speech coding, Speech synthesis, Intelligibility (communication), Speech processing, Computer Vision and Pattern Recognition, Electrical and Electronic Engineering, Prosody, Encoder, Software, Pitch contour - Abstract
Previous studies have shown that a concatenative speech synthesis system with a large database produces more natural-sounding speech. We apply this paradigm to the design of improved very low bit rate speech coders (below 1000 b/s). The proposed speech coder consists of unit selection, prosody coding, prosody modification and waveform concatenation. The encoder selects the best unit sequence from a large database and compresses the prosody information. The transmitted parameters include the unit indices and the prosody information. To increase naturalness as well as intelligibility, two costs are considered in the unit selection process: an acoustic target cost and a concatenation cost. A rate-distortion-based piecewise linear approximation is proposed to compress the pitch contour. The decoder concatenates the set of units and then synthesizes the resultant sequence of speech frames using the harmonic-plus-noise model (HNM) scheme. Before concatenating units, prosody modification, which includes pitch shifting and gain modification, is applied to match the prosody of the input speech. With single-speaker stimuli, a comparison category rating (CCR) test shows that the performance of the proposed coder is close to that of the 2400-b/s MELP coder at an average bit rate of about 800 b/s during talk spurts.
- Published
- 2001
- Full Text
- View/download PDF
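The coder above selects a unit sequence that minimizes the sum of an acoustic target cost and a concatenation cost, which is naturally solved with dynamic programming over candidate units per segment. A minimal sketch of that Viterbi-style search follows; the costs are random placeholders, and the paper's actual cost definitions and HNM synthesis are not reproduced.

```python
import numpy as np

def select_units(target_cost: np.ndarray, concat_cost: np.ndarray) -> list[int]:
    """target_cost: (T, N) cost of unit n at step t; concat_cost: (N, N) cost of joining n -> m."""
    T, N = target_cost.shape
    best = target_cost[0].copy()                  # best accumulated cost ending in each unit
    back = np.zeros((T, N), dtype=int)            # backpointers for path recovery
    for t in range(1, T):
        total = best[:, None] + concat_cost + target_cost[t][None, :]   # indexed [prev, cur]
        back[t] = np.argmin(total, axis=0)
        best = np.min(total, axis=0)
    path = [int(np.argmin(best))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

rng = np.random.default_rng(5)
T, N = 6, 8                                   # 6 target segments, 8 candidate units each (toy sizes)
units = select_units(rng.random((T, N)), rng.random((N, N)))
print("selected unit indices:", units)
```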
17. Temporal Decomposition Based on a Rate-Distortion Criterion
- Author
-
Ki-Seung Lee
- Subjects
Applied Mathematics, Pattern recognition, Spectral distortion, Amplitude distortion, Speech processing, Rate–distortion theory, Signal Processing, Bit rate, Spectral analysis, Artificial intelligence, Electrical and Electronic Engineering, Rate distortion, Algorithm, Mathematics - Abstract
This letter addresses a temporal decomposition (TD) technique that is based on a rate-distortion criterion. In the proposed TD scheme, a set of interpolation functions is constructed from a given training corpus, and the optimum target points are found in the sense of minimizing not only the spectral distortion but also the bit rate. The results of the simulation show that an average spectral distortion of about 1.4 dB can be achieved at an average bit rate of about 8 bits/frame.
- Published
- 2004
- Full Text
- View/download PDF
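The letter above selects target points by minimizing a combined rate-distortion objective rather than distortion alone. A minimal sketch of the Lagrangian trade-off that such a criterion implies follows; the candidate configurations and their numbers are illustrative placeholders (loosely echoing the roughly 1.4 dB at about 8 bits/frame reported in the abstract), not results from the paper.

```python
# Rate-distortion selection: pick the candidate target-point configuration minimizing D + lambda * R.
candidates = [
    {"name": "sparse targets", "distortion_db": 1.9, "rate_bits_per_frame": 6.0},
    {"name": "medium targets", "distortion_db": 1.4, "rate_bits_per_frame": 8.0},
    {"name": "dense targets",  "distortion_db": 1.1, "rate_bits_per_frame": 11.0},
]

def best_configuration(cands, lam):
    """Lagrangian cost J = D + lambda * R; lambda sets the distortion/bit-rate trade-off."""
    return min(cands, key=lambda c: c["distortion_db"] + lam * c["rate_bits_per_frame"])

for lam in (0.05, 0.2):
    chosen = best_configuration(candidates, lam)
    print(f"lambda = {lam:4.2f} -> {chosen['name']}")
```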