17 results for "Ki-Seung Lee"
Search Results
2. Automatic Estimation of Food Intake Amount Using Visual and Ultrasonic Signals
- Author
-
Ki-Seung Lee
- Subjects
Food intake, Audio signal, Modality (human–computer interaction), Computer Networks and Communications, Computer science, Continuous monitoring, Ultrasonic doppler, food intake estimation, chewing sound detection, food image recognition, External noise, Identification (information), Hardware and Architecture, Control and Systems Engineering, Signal Processing, Computer vision, Ultrasonic sensor, Artificial intelligence, Electrical and Electronic Engineering, Electronics - Abstract
The continuous monitoring and recording of food intake amount without user intervention is very useful in the prevention of obesity and metabolic diseases. In this study, a technique was adopted that automatically recognizes food intake amount by combining the identification of food types through image recognition with the recognition of chewing events through an acoustic modality. The accuracy of audio-based detection of eating activity is seriously degraded in noisy environments. To alleviate this problem, contact sensing methods have conventionally been adopted, wherein sensors are attached to the face or neck region to reduce external noise. Such sensing methods, however, cause dermatological discomfort and a feeling of cosmetic unnaturalness for most users. Here, a noise-robust, non-contact sensing method was employed, wherein ultrasonic Doppler shifts were used to detect chewing events. The experimental results showed that the mean absolute percentage errors (MAPEs) of the ultrasonic-based method were comparable with those of the audio-based method (15.3 vs. 14.6) when 30 food items were used in the experiments. The food intake amounts were estimated for eight subjects in several noisy environments (cafeterias, restaurants, and home dining rooms). For all subjects, the estimation accuracy of the ultrasonic method was not degraded (the average MAPE was 15.02) even under noisy conditions. These results show that the proposed method has the potential to replace manual logging methods.
- Published
- 2021
- Full Text
- View/download PDF
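The abstract above reports accuracy as the mean absolute percentage error (MAPE) over per-food intake-amount estimates. A minimal sketch of how such a score is computed is given below; the toy amounts and the idea of deriving estimates from chew counts times a per-bite mass are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

def mape(true_amounts, estimated_amounts):
    """Mean absolute percentage error (in percent) over a set of intake estimates."""
    t = np.asarray(true_amounts, dtype=float)
    e = np.asarray(estimated_amounts, dtype=float)
    return 100.0 * np.mean(np.abs(t - e) / t)

# Hypothetical example: ground-truth grams vs. estimates (e.g., chew count x per-bite mass).
true_g = [120.0, 85.0, 200.0]
est_g  = [105.0, 92.0, 231.0]
print(f"MAPE = {mape(true_g, est_g):.1f}%")   # comparable in spirit to the 15.3 / 14.6 figures reported
```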
3. Food Intake Detection Using Ultrasonic Doppler Sonar
- Author
-
Ki-Seung Lee
- Subjects
Acoustics, Ultrasonic doppler, Sonar, stomatognathic system, Swallowing, otorhinolaryngologic diseases, Electrical and Electronic Engineering, Instrumentation, digestive, oral, and skin physiology, Ultrasound, Continuous monitoring, Chin, Ultrasonic sensor, Doppler effect - Abstract
Reliable, user-friendly and convenient sensing is highly desirable when the continuous monitoring of food intake is necessary. In this paper, food intake monitoring was performed during the processes of chewing and swallowing. Acoustic Doppler sonar (ADS) was used to detect chewing and swallowing events in a manner that is non-contact and free from acoustic interference. When a 40 kHz ultrasonic beam was focused on the lower jaw and neck, movements of the chin and neck caused Doppler frequency shifts and an amplitude envelope modulation of the ultrasonic signals. Hence, it was possible to detect chewing and swallowing events using the Doppler frequency shifts in the received ultrasound signals. To prevent spurious chewing events caused by talking from being recognized as food intake events, the log filter-bank energy of the voice band was also taken into consideration. Automatic detection of chewing and swallowing events was achieved via an artificial neural network. The experimental results showed that the proposed ADS-based food intake detection method yielded promising results, with maximum recognition rates of 91.4% and 78.4% for chewing and swallowing, respectively. As a result, it was confirmed that the proposed food intake detection method using ultrasonic Doppler yielded high recognition rates without discomfort to the user from continuous skin contact.
- Published
- 2017
- Full Text
- View/download PDF
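The abstract above attributes detection to Doppler frequency shifts of a 40 kHz carrier reflected from the moving chin and neck. A back-of-the-envelope sketch of the expected shift, using the standard two-way Doppler relation, follows; the surface velocities are illustrative assumptions, not measurements from the paper.

```python
# Doppler shift for a carrier reflected off a moving surface: f_d ~ 2 * v * f0 / c
F0 = 40_000.0       # carrier frequency in Hz (40 kHz, as in the abstract)
C_AIR = 343.0       # speed of sound in air, m/s

def doppler_shift_hz(surface_velocity_mps: float) -> float:
    """Approximate two-way Doppler shift for a reflector moving toward/away from the sensor."""
    return 2.0 * surface_velocity_mps * F0 / C_AIR

# Illustrative chin/neck velocities during chewing and swallowing (assumed values).
for v in (0.02, 0.05, 0.10):
    print(f"v = {v:4.2f} m/s  ->  Doppler shift of about {doppler_shift_hz(v):6.1f} Hz")
```

In the paper, such shifts, together with the amplitude-envelope modulation and a voice-band log filter-bank energy check, feed an artificial neural network that labels chewing and swallowing events.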
4. Restricted Boltzmann Machine-Based Voice Conversion for Nonparallel Corpus
- Author
-
Ki-Seung Lee
- Subjects
Restricted Boltzmann machine, Training set, Computer science, Applied Mathematics, Speech recognition, Feature extraction, Pattern recognition, Speech corpus, Probability density function, Conditional probability distribution, Distribution (mathematics), Signal Processing, Artificial intelligence, Electrical and Electronic Engineering - Abstract
A large parallel training corpus is necessary for robust, high-quality voice conversion. However, such parallel data may not always be available. This letter presents a new voice conversion method that needs no parallel speech corpus and adopts a restricted Boltzmann machine (RBM) to represent the distribution of the spectral features derived from a target speaker. A linear transformation was employed to convert the spectral and delta features. A conversion function was obtained by maximizing the conditional probability density function with respect to the target RBM. A feasibility test was carried out on the OGI VOICES corpus. Both subjective listening tests and objective evaluations showed that the proposed method outperforms the conventional GMM-based method.
- Published
- 2017
- Full Text
- View/download PDF
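The abstract above describes fitting an RBM to target-speaker spectral features and deriving a conversion by maximizing the conditional density under that RBM. The minimal sketch below shows the core quantity involved, the free energy of a Gaussian-visible RBM, and uses its gradient to push features toward high target-speaker likelihood. The parameter sizes and the simple gradient loop are illustrative assumptions, not the letter's exact formulation (which optimizes a linear transform).

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 24, 64                    # spectral feature dim, hidden units (illustrative sizes)
W = rng.normal(0, 0.1, (H, D))   # RBM weights (would be learned on target-speaker features)
b = np.zeros(D)                  # visible biases
c = np.zeros(H)                  # hidden biases

def free_energy(v):
    """Free energy of a Gaussian-visible / Bernoulli-hidden RBM (unit visible variance)."""
    return 0.5 * np.sum((v - b) ** 2) - np.sum(np.logaddexp(0.0, c + W @ v))

def free_energy_grad(v):
    """dF/dv: used to push converted features toward high density under the target RBM."""
    h = 1.0 / (1.0 + np.exp(-(c + W @ v)))   # hidden activation probabilities
    return (v - b) - W.T @ h

# Illustrative "conversion": start from a source feature vector and take a few gradient
# steps that lower the target-RBM free energy (i.e., raise its likelihood).
x = rng.normal(0, 1, D)          # hypothetical source spectral feature
y = x.copy()
for _ in range(50):
    y -= 0.05 * free_energy_grad(y)
print(f"free energy: source {free_energy(x):.2f} -> converted {free_energy(y):.2f}")
```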
5. Joint Audio-Ultrasound Food Recognition for Noisy Environments
- Author
-
Ki-Seung Lee
- Subjects
Adult, Male, Computer science, A-weighting, Signal-To-Noise Ratio, Pattern Recognition, Automated, Set (abstract data type), Eating, Young Adult, Health Information Management, Feature (machine learning), Humans, Ultrasonics, Electrical and Electronic Engineering, Linear combination, Noise measurement, Pattern recognition, Signal Processing, Computer-Assisted, Equipment Design, Middle Aged, Computer Science Applications, Noise, Food, Female, Artificial intelligence, Neural Networks, Computer, Joint (audio engineering), Biotechnology - Abstract
Continuous recognition of ingested foods without user intervention is very useful for the pre-screening of obesity and diet-related diseases. An automatic food recognition method that combines the two modalities of audio and ultrasonic signals (US) is proposed in this study. In a noise-free environment, the classification accuracy of an audio-only recognizer is generally higher than that of a US-only recognizer, but the performance of the US recognizer is unaffected by acoustic noise levels. In the recognition system presented herein, the likelihood score of the audio-US feature was given by a linear combination of class-conditional observation log-likelihoods from the two classifiers, using appropriate weights. We developed a weighting process adaptive to the signal-to-noise ratio (SNR). The main objective here involves determining the optimal SNR classification boundaries and constructing a set of optimum stream weights for each SNR class. A feasibility test was conducted to verify the usefulness of the proposed method through recognition experiments on seven types of food. The performance was compared with conventional methods that use in-ear and throat microphones. The proposed method yielded recognition rates of 90.13% with artificially added noise and 89.67% under actual noisy environments, when the SNR ranged from 0 to 20 dB.
- Published
- 2019
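The abstract above describes fusing the two classifiers through a weighted sum of class-conditional log-likelihoods, with stream weights chosen per estimated SNR class. A minimal sketch of that decision rule follows; the SNR boundaries, weights, and food labels are assumed placeholders, not the values learned in the paper.

```python
import numpy as np

# Assumed SNR classes (dB) and per-class audio-stream weights; the ultrasonic stream gets 1 - w.
SNR_BOUNDS_DB = [5.0, 10.0, 15.0]          # boundaries between SNR classes (placeholder values)
AUDIO_WEIGHTS = [0.2, 0.4, 0.6, 0.8]       # low SNR -> trust ultrasound more (placeholder values)

def audio_weight(snr_db: float) -> float:
    """Pick the audio stream weight for the SNR class that snr_db falls into."""
    idx = int(np.searchsorted(SNR_BOUNDS_DB, snr_db))
    return AUDIO_WEIGHTS[idx]

def fused_decision(loglik_audio, loglik_us, snr_db, labels):
    """Combine per-class log-likelihoods from both streams and return the best label."""
    w = audio_weight(snr_db)
    score = w * np.asarray(loglik_audio) + (1.0 - w) * np.asarray(loglik_us)
    return labels[int(np.argmax(score))]

labels = ["apple", "chips", "noodles"]                       # hypothetical food classes
print(fused_decision([-12.0, -9.5, -11.0],                   # audio log-likelihoods
                     [-10.2, -10.8, -9.9],                   # ultrasonic log-likelihoods
                     snr_db=3.0, labels=labels))
```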
6. HMM-Based Maximum Likelihood Frame Alignment for Voice Conversion from a Nonparallel Corpus
- Author
-
Ki-Seung Lee
- Subjects
Computer science, Maximum likelihood, Speech recognition, Frame (networking), Artificial Intelligence, Hardware and Architecture, Maximum likelihood criterion, Computer Vision and Pattern Recognition, Electrical and Electronic Engineering, Hidden Markov model, Software - Published
- 2017
- Full Text
- View/download PDF
7. Compensation for Shot-to-Shot Variations in Laser Pulse Energy for Photoacoustic Imaging
- Author
-
Ki-Seung Lee
- Subjects
Photoacoustic effect, Materials science, Photoacoustic imaging in biomedicine, Laser, Electronic, Optical and Magnetic Materials, Compensation (engineering), Photoacoustic Doppler effect, Optics, Shot (pellet), Electrical and Electronic Engineering, Pulse energy - Published
- 2017
- Full Text
- View/download PDF
8. Field-free switching of perpendicular magnetization through spin–orbit torque in antiferromagnet/ferromagnet/oxide structures
- Author
-
Hyun-Woo Lee, Kyoung-Whan Kim, Seung-heon Chris Baek, Gyungchoon Go, Chang Geun Yang, Young Wan Oh, Ki-Seung Lee, Y. M. Kim, Byong-Guk Park, Eun Sang Park, Hae Yeon Lee, Kyung Jin Lee, Jong-Ryul Jeong, Byoung-Chul Min, and Kyeong Dong Lee
- Subjects
Coupling, Physics, Field (physics), Spintronics, Condensed matter physics, Biomedical Engineering, Spin-transfer torque, Bioengineering, Condensed Matter Physics, Atomic and Molecular Physics, and Optics, Magnetic field, Exchange bias, Ferromagnetism, Antiferromagnetism, General Materials Science, Electrical and Electronic Engineering - Abstract
Spin-orbit torques arising from the spin-orbit coupling of non-magnetic heavy metals allow electrical switching of perpendicular magnetization. However, the switching is not purely electrical in laterally homogeneous structures. An extra in-plane magnetic field is indeed required to achieve deterministic switching, and this is detrimental for device applications. On the other hand, if antiferromagnets can generate spin-orbit torques, they may enable all-electrical deterministic switching because the desired magnetic field may be replaced by their exchange bias. Here we report sizeable spin-orbit torques in IrMn/CoFeB/MgO structures. The antiferromagnetic IrMn layer also supplies an in-plane exchange bias field, which enables all-electrical deterministic switching of perpendicular magnetization without any assistance from an external magnetic field. Together with sizeable spin-orbit torques, these features make antiferromagnets a promising candidate for future spintronic devices. We also show that the signs of the spin-orbit torques in various IrMn-based structures cannot be explained by existing theories and thus significant theoretical progress is required.
- Published
- 2016
- Full Text
- View/download PDF
9. Position-Dependent Crosstalk Cancellation Using Space Partitioning
- Author
-
Ki-Seung Lee
- Subjects
Acoustics and Ultrasonics, Artificial neural network, Computer science, Acoustics, Filter (signal processing), Correlation, Position (vector), Active listening, Electrical and Electronic Engineering, Space partitioning, Audio signal processing, Algorithm, Communication channel - Abstract
The present study tested a new stereo playback system that effectively cancels cross-talk signals at an arbitrary listening position. Such a playback system was implemented by integrating listener position tracking techniques and crosstalk cancellation techniques. The entire listening space was partitioned into a number of non-overlapped cells and a crosstalk cancellation filter was assigned to each cell. The listening space partitions and the corresponding crosstalk cancellation filters were constructed by maximizing the average channel separation ratio (CSR). Since the proposed method employed cell-based crosstalk cancellation, estimation of the exact position of the listener was not necessary. Instead, it was only necessary to determine the cell in which the listener was located. This was achieved by simply employing an artificial neural network (ANN) where the time delay to each pair of microphones was used as the ANN input and the ANN output corresponded to the index of cells. The experimental results showed that more than 95% of the experimental listening space had a CSR ≥ 10 dB when the number of clusters exceeded 12. Under these conditions, the correlation between the true directions of the virtual sound sources and the directions recognized by the subjects was greater than 0.9.
- Published
- 2013
- Full Text
- View/download PDF
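The abstract above describes partitioning the listening area into cells, assigning a precomputed crosstalk-cancellation filter to each cell, and using an ANN driven by inter-microphone time delays to pick the active cell. A minimal sketch of the cell-selection and filter-lookup logic follows, with a nearest-centroid stand-in for the ANN; all geometry, delays, and filters below are hypothetical.

```python
import numpy as np

# Hypothetical per-cell data: a centroid in TDOA space (seconds) and a 2x2 bank of FIR filters.
CELL_TDOA_CENTROIDS = np.array([[0.0e-3,  0.1e-3],
                                [0.4e-3, -0.2e-3],
                                [-0.3e-3, 0.5e-3]])          # 3 cells, 2 mic-pair delays each
CELL_FILTERS = {i: np.random.default_rng(i).normal(size=(2, 2, 64))  # placeholder filter taps
                for i in range(len(CELL_TDOA_CENTROIDS))}

def select_cell(tdoa_features: np.ndarray) -> int:
    """Stand-in for the ANN classifier: pick the cell whose TDOA centroid is closest."""
    dists = np.linalg.norm(CELL_TDOA_CENTROIDS - tdoa_features, axis=1)
    return int(np.argmin(dists))

def channel_separation_ratio_db(desired_power: float, crosstalk_power: float) -> float:
    """CSR in dB: power reaching the intended ear vs. power leaking to the other ear."""
    return 10.0 * np.log10(desired_power / crosstalk_power)

cell = select_cell(np.array([0.35e-3, -0.15e-3]))
filters = CELL_FILTERS[cell]                  # 2x2 filter bank applied to the stereo program
print("active cell:", cell, "| example CSR:",
      round(channel_separation_ratio_db(1.0, 0.08), 1), "dB")
```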
10. A Relevant Distance Criterion for Interpolation of Head-Related Transfer Functions
- Author
-
Seok-Pil Lee and Ki-Seung Lee
- Subjects
Acoustics and Ultrasonics, Human head, Distortion, Acoustics, Mel-frequency cepstrum, Electrical and Electronic Engineering, Horizontal plane, Transfer function, Head-related transfer function, Binaural recording, Mathematics, Interpolation - Abstract
In binaural synthesis, in order to realize more precise and accurate spatial sound, it would be desirable to measure a large number of head-related transfer functions (HRTFs) in various directions. To reduce the size of the HRTF set, interpolation is often employed, whereby the HRTF for any direction can be obtained from a limited number of representative HRTFs. In this paper, it is determined which distortion measure for interpolation of HRTFs in the horizontal plane is most suitable for predicting audible differences in sound location. Four HRTF sets, measured using three human heads and one mannequin (KEMAR), were prepared for this study. Using various objective distortion criteria, the differences between interpolated and measured HRTFs were computed. These were then related to the results of the listening tests through receiver operating characteristic (ROC) curves. The results of the present study indicated that, for the HRTF sets measured from the three human heads, the best predictor of performance was the distortion measure computed from the mel-cepstral coefficients, whereas the distortion measure associated with interaural time delay predicted audible differences in sound location reasonably well for the KEMAR HRTF set. A feasibility test was conducted to verify the usefulness of the selected distortion measure.
- Published
- 2011
- Full Text
- View/download PDF
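The study above finds that, for human-head HRTF sets, a distortion measure computed from mel-cepstral coefficients best predicts audible localization differences. A minimal sketch of that kind of measure, comparing cepstral coefficients of a measured and an interpolated HRTF, is given below; the coefficients are random placeholders, and the paper's exact cepstral order and weighting are not reproduced.

```python
import numpy as np

def mel_cepstral_distortion_db(c_ref: np.ndarray, c_test: np.ndarray) -> float:
    """Standard mel-cepstral distortion in dB between two cepstral vectors (c0 excluded)."""
    diff = np.asarray(c_ref)[1:] - np.asarray(c_test)[1:]
    return (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2))

rng = np.random.default_rng(1)
c_measured     = rng.normal(0, 0.3, 13)                  # placeholder mel-cepstrum of a measured HRTF
c_interpolated = c_measured + rng.normal(0, 0.05, 13)    # placeholder interpolated version
print(f"MCD = {mel_cepstral_distortion_db(c_measured, c_interpolated):.2f} dB")
```

In the paper, such scores were then related to listening-test outcomes via ROC curves to judge which measure predicts audibility best.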
11. A real-time audio system for adjusting the sweet spot to the listener's position
- Author
-
Seok-Pil Lee and Ki-Seung Lee
- Subjects
Engineering, Reverberation, Acoustics, Speech recognition, Direction of arrival, Rendering (computer graphics), Microprocessor, Stereophonic sound, Media Technology, Loudspeaker, Electrical and Electronic Engineering, Audio signal processing, Digital signal processing - Abstract
In the present study, a new stereophonic playback system was proposed, in which the crosstalk signals are reasonably cancelled at an arbitrary listener position. The system was composed of two major parts: the listener position tracking part and the sound rendering part. The position of the listener was estimated using acoustic signals from the listener (i.e., voice or hand-clapping signals). A direction-of-arrival (DOA) algorithm was adopted to estimate the directions of acoustic sources, with room reverberation effects taken into consideration. A crosstalk cancellation filter was designed using a free-field model. To determine the maximum tolerable shift of the listener position, a quantitative analysis of the channel separation ratio according to the displacement of the listener position was performed. Prototype hardware was implemented using a microprocessor board, a DSP board, a multi-channel ADC board and an analog front-end. The results showed that the average mean square error between the true direction of a listener and the estimated direction was about 5 degrees. More than 80% of the tested subjects indicated that better stereo images were obtained with the proposed system, compared with the non-processed signals.
- Published
- 2010
- Full Text
- View/download PDF
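The system above localizes the listener from his or her own voice or hand claps using a direction-of-arrival estimate on a microphone array. A minimal sketch of the far-field TDOA-to-angle relation that underlies such DOA estimators follows; the microphone spacing and delay are illustrative, and the paper's reverberation-aware algorithm is not reproduced.

```python
import math

def doa_from_tdoa(tdoa_s: float, mic_spacing_m: float, c: float = 343.0) -> float:
    """Far-field direction of arrival (degrees from broadside) from a mic-pair time delay."""
    s = max(-1.0, min(1.0, c * tdoa_s / mic_spacing_m))   # clamp for numerical safety
    return math.degrees(math.asin(s))

# Illustrative numbers: 20 cm spacing, 0.25 ms inter-mic delay.
print(f"estimated DOA of about {doa_from_tdoa(0.25e-3, 0.20):.1f} degrees")
```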
12. Statistical Approach for Voice Personality Transformation
- Author
-
Ki-Seung Lee
- Subjects
Probabilistic classification, Acoustics and Ultrasonics, Speech recognition, Vector quantization, Pattern recognition, Linear predictive coding, Speech processing, Speaker diarisation, Transformation (function), Cepstrum, Artificial intelligence, Electrical and Electronic Engineering, Prosody, Mathematics - Abstract
A voice transformation method which changes the source speaker's utterances so as to sound similar to those of a target speaker is described. Speaker individuality transformation is achieved by altering the LPC cepstrum, the average pitch period and the average speaking rate. The main objective of the work involves building a nonlinear relationship between the parameters for the acoustic features of the two speakers, based on a probabilistic model. The conversion rules involve a probabilistic classification and a cross-correlation probability between the acoustic features of the two speakers. The parameters of the conversion rules are estimated by maximizing the likelihood of the training data. To obtain transformed speech signals that are perceptually closer to the target speaker's voice, prosody modification is also involved. Prosody modification is achieved by scaling the excitation spectrum and applying time-scale modification with appropriate modification factors. An evaluation by objective tests and informal listening tests clearly indicated the effectiveness of the proposed transformation method. We also confirmed that the proposed method leads to smoothly evolving spectral contours over time, which, from a perceptual standpoint, produced results that were superior to those of conventional vector quantization (VQ)-based methods.
- Published
- 2007
- Full Text
- View/download PDF
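The abstract above describes a conversion rule built from a probabilistic classification of source features and cross-correlation statistics between the two speakers. The sketch below shows the widely used form of such a probabilistic spectral mapping, a posterior-weighted linear regression; the mixture parameters are random placeholders and the exact rule in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(2)
M, D = 4, 12                                 # mixtures, cepstral dimension (illustrative)
w      = np.full(M, 1.0 / M)                 # mixture weights
mu_x   = rng.normal(0, 1, (M, D))            # source-speaker means
mu_y   = rng.normal(0, 1, (M, D))            # target-speaker means
sigma2 = np.full((M, D), 1.0)                # diagonal source covariances
A      = np.stack([np.eye(D) * 0.8 for _ in range(M)])   # per-mixture regression matrices

def posterior(x):
    """p(m | x) under a diagonal-covariance Gaussian mixture on source features."""
    logp = -0.5 * np.sum((x - mu_x) ** 2 / sigma2 + np.log(2 * np.pi * sigma2), axis=1) + np.log(w)
    logp -= logp.max()
    p = np.exp(logp)
    return p / p.sum()

def convert(x):
    """Posterior-weighted sum of per-mixture linear predictions of the target spectrum."""
    p = posterior(x)
    preds = np.stack([mu_y[m] + A[m] @ (x - mu_x[m]) for m in range(M)])
    return p @ preds

x_src = rng.normal(0, 1, D)                  # hypothetical source LPC-cepstrum frame
print(convert(x_src).round(2))
```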
13. Robust Recognition of Fast Speech
- Author
-
Ki-Seung Lee
- Subjects
Signal processing, Voice activity detection, Degree (graph theory), Computer science, Speech recognition, Maximum likelihood, Word error rate, Speech processing, Artificial Intelligence, Hardware and Architecture, Cepstrum, Computer Vision and Pattern Recognition, Electrical and Electronic Engineering, Software, Utterance - Abstract
This letter describes a robust speech recognition system for recognizing fast speech by stretching the length of the utterance in the cepstrum domain. The degree of stretching for an utterance is determined by its rate of speech (ROS), which is estimated based on a maximum likelihood (ML) criterion. The proposed method was evaluated on 10-digit mobile phone numbers. The results of the simulation show that the overall error rate was reduced by 17.8% when the proposed method was employed.
- Published
- 2006
- Full Text
- View/download PDF
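The letter above stretches a fast utterance in the cepstrum domain by a factor tied to its estimated rate of speech (ROS). A minimal sketch of that time-axis stretch via linear interpolation of the cepstral frame sequence follows; the mapping from ROS to stretch factor is a placeholder assumption, whereas the paper chooses it under an ML criterion.

```python
import numpy as np

def stretch_cepstra(frames: np.ndarray, factor: float) -> np.ndarray:
    """Resample a (T, D) cepstral sequence to round(T * factor) frames by linear interpolation."""
    T, D = frames.shape
    new_T = max(2, int(round(T * factor)))
    src_pos = np.linspace(0.0, T - 1.0, new_T)
    out = np.empty((new_T, D))
    for d in range(D):
        out[:, d] = np.interp(src_pos, np.arange(T), frames[:, d])
    return out

def stretch_factor_from_ros(ros_phones_per_sec: float, nominal_ros: float = 12.0) -> float:
    """Placeholder rule: stretch fast speech back toward a nominal speaking rate."""
    return max(1.0, ros_phones_per_sec / nominal_ros)

cepstra = np.random.default_rng(3).normal(size=(80, 13))   # hypothetical 80-frame utterance
stretched = stretch_cepstra(cepstra, stretch_factor_from_ros(15.0))
print(cepstra.shape, "->", stretched.shape)                # (80, 13) -> (100, 13)
```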
14. MLP-based phone boundary refining for a TTS database
- Author
-
Ki-Seung Lee
- Subjects
Acoustics and Ultrasonics, Artificial neural network, Database, Computer science, Speech recognition, Speech synthesis, Pattern recognition, Speech corpus, Speech processing, Viterbi algorithm, Phone, Multilayer perceptron, Artificial intelligence, Electrical and Electronic Engineering, Hidden Markov model - Abstract
The automatic labeling of a large speech corpus plays an important role in the development of a high-quality text-to-speech (TTS) synthesis system. This paper describes a method for the automatic labeling of speech signals, aimed mainly at the construction of a large database for a TTS synthesis system. The main objective of the work involves the refinement of initial estimates of phone boundaries provided by an alignment based on a hidden Markov model. A multilayer perceptron (MLP) was employed to refine the phone boundaries. To increase the accuracy of phoneme segmentation, several specialized MLPs were individually trained based on phonetic transitions. The optimum partitioning of the entire phonetic-transition space and the corresponding MLPs were constructed from the standpoint of minimizing the overall deviation from the hand-labeled position. The experimental results showed that more than 93% of all phone boundaries deviated from the reference position by less than 20 ms. We also confirmed that a database constructed using the proposed method produced results that were perceptually comparable to a hand-labeled database, based on subjective listening tests.
- Published
- 2006
- Full Text
- View/download PDF
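The method above refines HMM-aligned phone boundaries with MLPs specialized per phonetic-transition class. A minimal sketch of the refinement step follows: features around the initial boundary go into the MLP for that transition class, which outputs a corrective offset. The network here is an untrained placeholder and the feature window size is an assumption.

```python
import numpy as np

rng = np.random.default_rng(4)

class TinyMLP:
    """Placeholder one-hidden-layer MLP mapping a boundary-context feature vector to an offset (ms)."""
    def __init__(self, d_in, d_hidden=16):
        self.W1 = rng.normal(0, 0.1, (d_hidden, d_in)); self.b1 = np.zeros(d_hidden)
        self.W2 = rng.normal(0, 0.1, d_hidden);         self.b2 = 0.0
    def __call__(self, x):
        h = np.tanh(self.W1 @ x + self.b1)
        return float(self.W2 @ h + self.b2)

# One specialized (untrained) MLP per broad phonetic-transition class, following the paper's idea.
transition_mlps = {"vowel->stop": TinyMLP(5 * 13), "nasal->vowel": TinyMLP(5 * 13)}

def refine_boundary(hmm_boundary_ms, context_frames, transition_class):
    """Add the MLP-predicted offset to the HMM-estimated boundary position."""
    offset_ms = transition_mlps[transition_class](context_frames.ravel())
    return hmm_boundary_ms + offset_ms

context = rng.normal(size=(5, 13))      # hypothetical 5 cepstral frames around the boundary
print(round(refine_boundary(1234.0, context, "vowel->stop"), 2))
```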
15. Context-adaptive smoothing for concatenative speech synthesis
- Author
-
Ki-Seung Lee and Sang-Ryong Kim
- Subjects
Computer science, Applied Mathematics, Speech recognition, Adaptive smoothing, Context (language use), Speech synthesis, Pattern recognition, Classification of discontinuities, Signal Processing, Artificial intelligence, Electrical and Electronic Engineering, Smoothing - Abstract
In text-to-speech synthesis, spectral smoothing is often employed to reduce artifacts at unit-joining points. A context-adaptive smoothing method is proposed in this letter, where the amount of smoothing is determined according to context information. Discontinuities at unit boundaries are predicted by a regression tree, and smoothing factors are computed by using predicted discontinuities and real discontinuities at unit boundaries. Experimental results are presented to demonstrate the effectiveness of the proposed method.
- Published
- 2002
- Full Text
- View/download PDF
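The letter above predicts an expected discontinuity at each unit joint with a regression tree and derives the smoothing strength from how the observed discontinuity compares to the prediction. A minimal sketch of that weighting idea and of applying it to the spectra on either side of the joint follows; the specific weighting rule below is an illustrative assumption, not the paper's formula.

```python
import numpy as np

def smoothing_factor(observed_disc: float, predicted_disc: float, eps: float = 1e-6) -> float:
    """Smooth only the part of the discontinuity that exceeds what the context predicts."""
    excess = max(0.0, observed_disc - predicted_disc)
    return excess / (observed_disc + eps)          # in [0, 1): 0 = leave joint untouched

def smooth_joint(spec_left: np.ndarray, spec_right: np.ndarray, alpha: float):
    """Pull the two boundary spectra toward their mean by the factor alpha."""
    mean = 0.5 * (spec_left + spec_right)
    return (1 - alpha) * spec_left + alpha * mean, (1 - alpha) * spec_right + alpha * mean

left, right = np.array([1.0, 2.0, 3.0]), np.array([1.6, 2.8, 2.2])
alpha = smoothing_factor(observed_disc=float(np.linalg.norm(left - right)), predicted_disc=0.4)
print(round(alpha, 2), smooth_joint(left, right, alpha))
```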
16. A very low bit rate speech coder based on a recognition/synthesis paradigm
- Author
-
R.V. Cox and Ki-Seung Lee
- Subjects
Acoustics and Ultrasonics, Computer science, Speech recognition, Concatenation, Speech coding, Speech synthesis, Intelligibility (communication), Speech processing, Computer Vision and Pattern Recognition, Electrical and Electronic Engineering, Prosody, Encoder, Software, Pitch contour - Abstract
Previous studies have shown that a concatenative speech synthesis system with a large database produces more natural-sounding speech. We apply this paradigm to the design of improved very low bit rate speech coders (below 1000 b/s). The proposed speech coder consists of unit selection, prosody coding, prosody modification and waveform concatenation. The encoder selects the best unit sequence from a large database and compresses the prosody information. The transmitted parameters include the unit indices and the prosody information. To increase naturalness as well as intelligibility, two costs are considered in the unit selection process: an acoustic target cost and a concatenation cost. A rate-distortion-based piecewise linear approximation is proposed to compress the pitch contour. The decoder concatenates the set of units and then synthesizes the resultant sequence of speech frames using the harmonic-plus-noise model (HNM) scheme. Before concatenating units, prosody modification, which includes pitch shifting and gain modification, is applied to match the prosody of the input speech. With single-speaker stimuli, a comparison category rating (CCR) test shows that the performance of the proposed coder is close to that of the 2400-b/s MELP coder at an average bit rate of about 800 b/s during talk spurts.
- Published
- 2001
- Full Text
- View/download PDF
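The coder above selects a unit sequence that minimizes the sum of an acoustic target cost and a concatenation cost, which is naturally solved with dynamic programming over candidate units per segment. A minimal sketch of that Viterbi-style search follows; the costs are random placeholders, and the paper's actual cost definitions and HNM synthesis are not reproduced.

```python
import numpy as np

def select_units(target_cost: np.ndarray, concat_cost: np.ndarray) -> list[int]:
    """target_cost: (T, N) cost of unit n at step t; concat_cost: (N, N) cost of joining n -> m."""
    T, N = target_cost.shape
    best = target_cost[0].copy()                  # best accumulated cost ending in each unit
    back = np.zeros((T, N), dtype=int)            # backpointers for path recovery
    for t in range(1, T):
        total = best[:, None] + concat_cost + target_cost[t][None, :]   # indexed [prev, cur]
        back[t] = np.argmin(total, axis=0)
        best = np.min(total, axis=0)
    path = [int(np.argmin(best))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

rng = np.random.default_rng(5)
T, N = 6, 8                                   # 6 target segments, 8 candidate units each (toy sizes)
units = select_units(rng.random((T, N)), rng.random((N, N)))
print("selected unit indices:", units)
```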
17. Temporal Decomposition Based on a Rate-Distortion Criterion
- Author
-
Ki-Seung Lee
- Subjects
Applied Mathematics, Pattern recognition, Spectral distortion, Amplitude distortion, Speech processing, Rate–distortion theory, Signal Processing, Bit rate, Spectral analysis, Artificial intelligence, Electrical and Electronic Engineering, Rate distortion, Algorithm, Mathematics - Abstract
This letter addresses a temporal decomposition (TD) technique that is based on a rate-distortion criterion. In the proposed TD scheme, a set of interpolation functions is constructed from a given training corpus, and the optimum target points are found in the sense of minimizing not only the spectral distortion but also the bit rate. The results of the simulation show that an average spectral distortion of about 1.4 dB can be achieved at an average bit rate of about 8 bits/frame.
- Published
- 2004
- Full Text
- View/download PDF
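The letter above selects target points by minimizing a combined rate-distortion objective rather than distortion alone. A minimal sketch of the Lagrangian trade-off that such a criterion implies follows; the candidate configurations and their numbers are illustrative placeholders (loosely echoing the roughly 1.4 dB at about 8 bits/frame reported in the abstract), not results from the paper.

```python
# Rate-distortion selection: pick the candidate target-point configuration minimizing D + lambda * R.
candidates = [
    {"name": "sparse targets", "distortion_db": 1.9, "rate_bits_per_frame": 6.0},
    {"name": "medium targets", "distortion_db": 1.4, "rate_bits_per_frame": 8.0},
    {"name": "dense targets",  "distortion_db": 1.1, "rate_bits_per_frame": 11.0},
]

def best_configuration(cands, lam):
    """Lagrangian cost J = D + lambda * R; lambda sets the distortion/bit-rate trade-off."""
    return min(cands, key=lambda c: c["distortion_db"] + lam * c["rate_bits_per_frame"])

for lam in (0.05, 0.2):
    chosen = best_configuration(candidates, lam)
    print(f"lambda = {lam:4.2f} -> {chosen['name']}")
```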