Author: "Kim, Taesu" / Topic: fos: computer and information sciences - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Kim, Taesu"' showing total 7 results

1. Cross-Speaker Emotion Transfer by Manipulating Speech Style Latents

Author: Jo, Suhee, Lee, Younggun, Shin, Yookyung, Hwang, Yeongtae, and Kim, Taesu
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Computation and Language, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computation and Language (cs.CL), Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In recent years, emotional text-to-speech has shown considerable progress. However, it requires a large amount of labeled data, which is not easily accessible. Even if it is possible to acquire an emotional speech dataset, there is still a limitation in controlling emotion intensity. In this work, we propose a novel method for cross-speaker emotion transfer and manipulation using vector arithmetic in latent style space. By leveraging only a few labeled samples, we generate emotional speech from reading-style speech without losing the speaker identity. Furthermore, emotion strength is readily controllable using a scalar value, providing an intuitive way for users to manipulate speech. Experimental results show the proposed method affords superior performance in terms of expressiveness, naturalness, and controllability, preserving speaker identity. Comment: Accepted to ICASSP 2023
Published: 2023

2. Squeezing Large-Scale Diffusion Models for Mobile

Author: Choi, Jiwoong, Kim, Minkyu, Ahn, Daehyun, Kim, Taesu, Kim, Yulhwa, Jo, Dongwon, Jeon, Hyesung, Kim, Jae-Joon, and Kim, Hyungjun
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Machine Learning (cs.LG)
Abstract: The emergence of diffusion models has greatly broadened the scope of high-fidelity image synthesis, resulting in notable advancements in both practical implementation and academic research. With the active adoption of the model in various real-world applications, the need for on-device deployment has grown considerably. However, deploying large diffusion models such as Stable Diffusion, with more than one billion parameters, to mobile devices poses distinctive challenges due to limited computational and memory resources, which may vary according to the device. In this paper, we present the challenges and solutions for deploying Stable Diffusion on mobile devices with the TensorFlow Lite framework, which supports both iOS and Android devices. The resulting Mobile Stable Diffusion achieves an inference latency of less than 7 seconds for 512x512 image generation on Android devices with mobile GPUs. Comment: 7 pages, 8 figures, ICML 2023 Workshop on Challenges in Deployable Generative AI
Published: 2023
Full Text: View/download PDF

3. Affective responses to chromatic ambient light in a vehicle

Author: Kim, Taesu, Choi, Kyungah, and Suk, Hyeon-Jeong
Subjects: FOS: Computer and information sciences, Computer Science - Human-Computer Interaction, Human-Computer Interaction (cs.HC)
Abstract: This study investigates the emotional responses to the color of vehicle interior lighting using self-assessment and electroencephalography (EEG). The study was divided into two sessions: the first session investigated the potential of ambient lighting colors, and the second session was used to develop in-vehicle lighting color guidelines. Each session included thirty subjects. In the first session, four lighting colors were assessed using seventeen adjectives. As a result, Preference, Softness, Brightness, and Uniqueness were found to be the four factors that best characterize the atmospheric properties of interior lighting in vehicles. According to the EEG data, ambient illumination increased people's arousal and lowered their alpha waves. The second session investigated a wider spectrum of colors using the four factors extracted from the first session. As a result, bluish and purplish lighting colors had the highest preference and uniqueness among ten lighting colors. Green received an intermediate preference and a high uniqueness score. With its high brightness and softness, Neutral White also achieved an intermediate preference rating. Despite receiving a low preference rating, warm colors were considered to be soft. Red was the least preferred color, but its uniqueness and roughness were highly rated. This study is expected to provide a basic theory on emotional lighting guidelines in the vehicle context, providing manufacturers with an objective rationale.
Published: 2022

4. GP22: A Car Styling Dataset for Automotive Designers

Author: Lee, Gyunpyo, Kim, Taesu, and Suk, Hyeon-Jeong
Subjects: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Automated design data archiving could reduce the time designers lose that would otherwise go to working creatively and effectively. Though many datasets for classification, detection, and instance segmentation of car exteriors exist, these large datasets are not relevant for design practices, as their primary purpose lies in autonomous driving or vehicle verification. Therefore, we release GP22, composed of car styling features defined by automotive designers. The dataset contains 1480 car side profile images from 37 brands and ten car segments. It also contains annotations of design features that follow the taxonomy of car exterior design features as defined from the automotive designer's point of view. We trained the baseline model using YOLO v5 as the design feature detection model with the dataset. The presented model resulted in an mAP score of 0.995 and a recall of 0.984. Furthermore, exploration of the model performance on sketches and rendering images of the car side profile implies the scalability of the dataset for design purposes. Comment: 5th CVFAD workshop, CVPR 2022
Published: 2022

5. Affective Role of the Future Autonomous Vehicle Interior

Author: Kim, Taesu, Lee, Gyunpyo, Hong, Jiwoo, and Suk, Hyeon-Jeong
Subjects: FOS: Computer and information sciences, Computer Science - Human-Computer Interaction, Human-Computer Interaction (cs.HC)
Abstract: Recent advancements in autonomous technology allow for new opportunities in vehicle interior design. The resulting shift in in-vehicle activity suggests that vehicle interior spaces should be designed with users' affective desires in mind. Therefore, this study aims to investigate the affective role of future vehicle interiors. Thirty-one participants in ten focus groups were interviewed about challenges they face regarding their current vehicle interior and expectations they have for future vehicles. Results from content analyses revealed the affective role of future vehicle interiors. Advanced exclusiveness and advanced convenience were the two primary aspects identified. The affective roles identified comprise eight visceral levels in total, four for each aspect: focused, stimulating, amused, pleasant, safe, comfortable, accommodated, and organized. We expect the results from this study to lead to the development of affective vehicle interiors by providing the fundamental knowledge for developing conceptual direction and evaluating its impact on user experiences. Comment: 15 pages, 4 figures, 2 tables
Published: 2022
Full Text: View/download PDF

6. Voice Imitating Text-to-Speech Neural Networks

Author: Lee, Younggun, Kim, Taesu, and Lee, Soo-Young
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: We propose a neural text-to-speech (TTS) model that can imitate a new speaker's voice using only a small speech sample. We demonstrate voice imitation using only a 6-second-long speech sample without any other information such as transcripts. Our model also enables voice imitation instantly without additional training of the model. We implemented the voice imitating TTS model by combining a speaker embedder network with a state-of-the-art TTS model, Tacotron. The speaker embedder network takes a new speaker's speech sample and returns a speaker embedding. The speaker embedding and a target sentence are fed to Tacotron, and speech is generated with the new speaker's voice. We show that the speaker embeddings extracted by the speaker embedder network can represent the latent structure in different voices. The generated speech samples from our model have comparable voice quality to the ones from existing multi-speaker TTS models.
Published: 2018
Full Text: View/download PDF

7. Learning pronunciation from a foreign language in speech synthesis networks

Author: Lee, Younggun, Shon, Suwon, and Kim, Taesu
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Sound (cs.SD), Computer Science - Computation and Language, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computation and Language (cs.CL), Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, Machine Learning (cs.LG)
Abstract: Although there are more than 6,500 languages in the world, the pronunciations of many phonemes sound similar across languages. When people learn a foreign language, their pronunciation often reflects their native language's characteristics. This motivates us to investigate how a speech synthesis network learns pronunciation from datasets in different languages. In this study, we are interested in analyzing and taking advantage of a multilingual speech synthesis network. First, we train the speech synthesis network bilingually in English and Korean and analyze how the network learns the relations of phoneme pronunciation between the languages. Our experimental results show that the learned phoneme embedding vectors are located closer together if their pronunciations are similar across the languages. Consequently, the trained networks can synthesize English speakers' Korean speech and vice versa. Using this result, we propose a training framework to utilize information from a different language. Specifically, we pre-train a speech synthesis network using datasets from both a high-resource language and a low-resource language, then fine-tune the network using the low-resource language dataset. Finally, we conducted more simulations on 10 different languages to show that the framework is generally extendable to other languages.
Published: 2018
Full Text: View/download PDF
