Author: "Lee, Seolhwa" / Topic: fos: computer and information sciences - Searchworks@Jio Institute Digital Library Search Results

1. Synthetic Alone: Exploring the Dark Side of Synthetic Data for Grammatical Error Correction

Author: Park, Chanjun, Koo, Seonmin, Lee, Seolhwa, Seo, Jaehyung, Eo, Sugyeong, Moon, Hyeonseok, and Lim, Heuiseok
Subjects: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computation and Language (cs.CL)
Abstract: The data-centric AI approach aims to enhance model performance without modifying the model and has been shown to impact model performance positively. While recent attention has been given to data-centric AI based on synthetic data, due to its potential for performance improvement, data-centric AI has long been exclusively validated using real-world data and publicly available benchmark datasets. In this respect, data-centric AI still highly depends on real-world data, and the verification of models using synthetic data has not yet been thoroughly carried out. Given the challenges above, we ask the question: Does data quality control (noise injection and balanced data), a data-centric AI methodology acclaimed to have a positive impact, exhibit the same positive impact in models trained solely with synthetic data? To address this question, we conducted comparative analyses between models trained on synthetic and real-world data based on the grammatical error correction (GEC) task. Our experimental results reveal that the data quality control method has a positive impact on models trained with real-world data, as previously reported in existing studies, while a negative impact is observed in models trained solely on synthetic data.
Comment: Accepted for Data-centric Machine Learning Research (DMLR) Workshop at ICML 2023
Published: 2023

2. Private Meeting Summarization Without Performance Loss

Author: Lee, Seolhwa and Søgaard, Anders
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computer Science - Cryptography and Security, Computation and Language (cs.CL), Cryptography and Security (cs.CR)
Abstract: Meeting summarization has enormous business potential, but in addition to being a hard problem, roll-out is challenged by privacy concerns. We explore the problem of meeting summarization under differential privacy constraints and find, to our surprise, that while differential privacy leads to slightly lower performance on in-sample data, differential privacy improves performance when evaluated on unseen meeting types. Since meeting summarization systems will encounter a great variety of meeting types in practical employment scenarios, this observation makes safe meeting summarization seem much more feasible. We perform extensive error analysis and identify potential risks in meeting summarization under differential privacy, including a faithfulness analysis.
Comment: SIGIR23 Main conference
Published: 2023

3. Assessing Cross-Cultural Alignment between ChatGPT and Human Societies: An Empirical Study

Author: Cao, Yong, Zhou, Li, Lee, Seolhwa, Cabello, Laura, Chen, Min, and Hershcovich, Daniel
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: The recent release of ChatGPT has garnered widespread recognition for its exceptional ability to generate human-like responses in dialogue. Given its usage by users from various nations and its training on a vast multilingual corpus that incorporates diverse cultural and societal norms, it is crucial to evaluate its effectiveness in cultural adaptation. In this paper, we investigate the underlying cultural background of ChatGPT by analyzing its responses to questions designed to quantify human cultural differences. Our findings suggest that, when prompted with American context, ChatGPT exhibits a strong alignment with American culture, but it adapts less effectively to other cultural contexts. Furthermore, by using different prompts to probe the model, we show that English prompts reduce the variance in model responses, flattening out cultural differences and biasing them towards American culture. This study provides valuable insights into the cultural implications of ChatGPT and highlights the necessity of greater diversity and cultural awareness in language technologies.
Comment: C3NLP@EACL 2023
Published: 2023

4. Self-Improving-Leaderboard(SIL): A Call for Real-World Centric Natural Language Processing Leaderboards

Author: Park, Chanjun, Moon, Hyeonseok, Lee, Seolhwa, Seo, Jaehyung, Eo, Sugyeong, and Lim, Heuiseok
Subjects: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computation and Language (cs.CL)
Abstract: Leaderboard systems allow researchers to objectively evaluate Natural Language Processing (NLP) models and are typically used to identify models that exhibit superior performance on a given task in a predetermined setting. However, we argue that evaluation on a given test dataset is just one of many performance indications of the model. In this paper, we claim leaderboard competitions should also aim to identify models that exhibit the best performance in a real-world setting. We highlight three issues with current leaderboard systems: (1) the use of a single, static test set, (2) the discrepancy between testing and real-world application, and (3) the tendency for leaderboard-centric competition to be biased towards the test set. As a solution, we propose a new paradigm of leaderboard systems that addresses these issues of current leaderboard systems. Through this study, we hope to induce a paradigm shift towards more real-world-centric leaderboard competitions.
Published: 2023

5. What does the Failure to Reason with 'Respectively' in Zero/Few-Shot Settings Tell Us about Language Models?

Author: Cui, Ruixiang, Lee, Seolhwa, Hershcovich, Daniel, and Søgaard, Anders
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computation and Language (cs.CL)
Abstract: Humans can effortlessly understand the coordinate structure of sentences such as "Niels Bohr and Kurt Cobain were born in Copenhagen and Seattle, respectively". In the context of natural language inference (NLI), we examine how language models (LMs) reason with respective readings (Gawron and Kehler, 2004) from two perspectives: syntactic-semantic and commonsense-world knowledge. We propose a controlled synthetic dataset WikiResNLI and a naturally occurring dataset NatResNLI to encompass various explicit and implicit realizations of "respectively". We show that fine-tuned NLI models struggle with understanding such readings without explicit supervision. While few-shot learning is easy in the presence of explicit cues, longer training is required when the reading is evoked implicitly, leaving models to rely on common sense inferences. Furthermore, our fine-grained analysis indicates models fail to generalize across different constructions. To conclude, we demonstrate that LMs still lag behind humans in generalizing to the long tail of linguistic constructions.
Comment: To appear at ACL 2023
Published: 2023
Full Text: View/download PDF

6. FreeTalky: Don't Be Afraid! Conversations Made Easier by a Humanoid Robot using Persona-based Dialogue

Author: Park, Chanjun, Jang, Yoonna, Lee, Seolhwa, Park, Sungjin, and Lim, Heuiseok
Subjects: FOS: Computer and information sciences, Computer Science - Computers and Society, Computer Science - Computation and Language, Computers and Society (cs.CY), Computation and Language (cs.CL)
Abstract: We propose a deep learning-based foreign language learning platform, named FreeTalky, for people who experience anxiety dealing with foreign languages, by employing a humanoid robot NAO and various deep learning models. A persona-based dialogue system that is embedded in NAO provides an interesting and consistent multi-turn dialogue for users. Also, a grammar error correction system promotes improvement in grammar skills of the users. Thus, our system enables personalized learning based on persona dialogue and facilitates grammar learning of a user using grammar error feedback. Furthermore, we verified whether FreeTalky provides practical help in alleviating xenoglossophobia by replacing the real human in the conversation with a NAO robot, through human evaluation.
Comment: Accepted for Artificial Intelligence for Education (AI4EDU) workshop at AAAI 2022
Published: 2021

7. PicTalky: Augmentative and Alternative Communication Software for Language Developmental Disabilities

Author: Park, Chanjun, Jang, Yoonna, Lee, Seolhwa, Seo, Jaehyung, Yang, Kisu, and Lim, Heuiseok
Subjects: FOS: Computer and information sciences, Computer Science - Computers and Society, Computer Science - Computation and Language, Computers and Society (cs.CY), Computer Science - Human-Computer Interaction, Computation and Language (cs.CL), Human-Computer Interaction (cs.HC)
Abstract: Augmentative and alternative communication (AAC) is a practical means of communication for people with language disabilities. In this study, we propose PicTalky, which is an AI-based AAC system that helps children with language developmental disabilities to improve their communication skills and language comprehension abilities. PicTalky can process both text and pictograms more accurately by connecting a series of neural-based NLP modules. Moreover, we perform quantitative and qualitative analyses on the essential features of PicTalky. It is expected that those suffering from language problems will be able to express their intentions or desires more easily and improve their quality of life by using this service. We have made the models freely available alongside a demonstration of the Web interface. Furthermore, we implemented robotics AAC for the first time by applying PicTalky to the NAO robot.
Comment: Accepted in AACL 2022 Demo Track
Published: 2021

8. Empirical Analysis of Korean Public AI Hub Parallel Corpora and in-depth Analysis using LIWC

Author: Park, Chanjun, Shim, Midan, Eo, Sugyeong, Lee, Seolhwa, Seo, Jaehyung, Moon, Hyeonseok, and Lim, Heuiseok
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: A machine translation (MT) system aims to translate a source language into a target language. Recent studies on MT systems mainly focus on neural machine translation (NMT). One factor that significantly affects the performance of NMT is the availability of high-quality parallel corpora. However, high-quality parallel corpora concerning Korean are relatively scarce compared to those associated with other high-resource languages, such as German or Italian. To address this problem, AI Hub recently released seven types of parallel corpora for Korean. In this study, we conduct an in-depth verification of the quality of corresponding parallel corpora through Linguistic Inquiry and Word Count (LIWC) and several relevant experiments. LIWC is a word-counting software program that can analyze corpora in multiple ways and extract linguistic features as a dictionary base. To the best of our knowledge, this study is the first to use LIWC to analyze parallel corpora in the field of NMT. Our findings suggest the direction of further research toward obtaining improved-quality parallel corpora through our correlation analysis of LIWC and NMT performance.
Published: 2021
Full Text: View/download PDF

9. How should human translation coexist with NMT? Efficient tool for building high quality parallel corpus

Author: Park, Chanjun, Lee, Seolhwa, Moon, Hyeonseok, Eo, Sugyeong, Seo, Jaehyung, and Lim, Heuiseok
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computation and Language (cs.CL)
Abstract: This paper proposes a tool for efficiently constructing high-quality parallel corpora while minimizing human labor, and makes this tool publicly available. Our proposed construction process is based on neural machine translation (NMT), allowing it not only to coexist with human translation but also to improve its efficiency by combining data quality control with human translation in a data-centric approach.
Comment: Accepted for Data-centric AI workshop at NeurIPS 2021
Published: 2021
Full Text: View/download PDF

10. An Evaluation Protocol for Generative Conversational Systems

Author: Lee, Seolhwa, Lim, Heuiseok, and Sedoc, João
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: There is a multitude of novel generative models for open-domain conversational systems; however, there is no systematic evaluation of different systems. Systematic comparisons require consistency in experimental design, evaluation sets, conversational systems and their outputs, and statistical analysis. We lay out a protocol for the evaluation of conversational models using head-to-head pairwise comparison. We analyze ten recent models that claim state-of-the-art performance using a paired head-to-head performance (win-loss-tie) on five evaluation datasets. Our findings show that DialoGPT and Blender are superior systems using Bradley-Terry model and TrueSkill ranking methods. These findings demonstrate the feasibility of our protocol to evaluate conversational agents and evaluation sets. Finally, we make all code and evaluations publicly available for researchers to compare their model to other state-of-the-art dialog models.
Published: 2020

11. EmotionX-KU: BERT-Max based Contextual Emotion Classifier

Author: Yang, Kisu, Lee, Dongyub, Whang, Taesun, Lee, Seolhwa, and Lim, Heuiseok
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: We propose a contextual emotion classifier based on a transferable language model and dynamic max pooling, which predicts the emotion of each utterance in a dialogue. A representative emotion analysis task, EmotionX, requires considering contextual information from colloquial dialogues and dealing with a class imbalance problem. To alleviate these problems, our model leverages the self-attention based transferable language model and the weighted cross entropy loss. Furthermore, we apply post-training and fine-tuning mechanisms to enhance the domain adaptability of our model and utilize several machine learning techniques to improve its performance. We conduct experiments on two emotion-labeled datasets named Friends and EmotionPush. As a result, our model outperforms the previous state-of-the-art model and also shows competitive performance in the EmotionX 2019 challenge. The code will be available on the GitHub page.
Comment: The 7th International Workshop on Natural Language Processing for Social Media (in conjunction with IJCAI 2019); figure modified
Published: 2019
Full Text: View/download PDF

11 results on '"Lee, Seolhwa"'