Author: "Imperial, Joseph Marvin" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Imperial, Joseph Marvin"' showing total 48 results

1. SpeciaLex: A Benchmark for In-Context Specialized Lexicon Learning

Author: Imperial, Joseph Marvin and Madabushi, Harish Tayyar
Subjects: Computer Science - Computation and Language
Abstract: Specialized lexicons are collections of words with associated constraints such as special definitions, specific roles, and intended target audiences. These constraints are necessary for content generation and documentation tasks (e.g., writing technical manuals or children's reading materials), where the goal is to reduce the ambiguity of text content and increase its overall readability for a specific audience. Understanding how large language models can capture these constraints can help researchers build better, more impactful tools for wider use beyond the NLP community. Towards this end, we introduce SpeciaLex, a benchmark for evaluating a language model's ability to follow specialized lexicon-based constraints across 18 diverse subtasks with 1,785 test instances covering core tasks of Checking, Identification, Rewriting, and Open Generation. We present an empirical evaluation of 15 open and closed-source LLMs and discuss insights on how factors such as model scale, openness, setup, and recency affect performance upon evaluating with the benchmark.
Comment: Camera-ready for EMNLP 2024 (Findings)
Published: 2024

2. SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

Author: Lovenia, Holy, Mahendra, Rahmad, Akbar, Salsabil Maulana, Miranda, Lester James V., Santoso, Jennifer, Aco, Elyanah, Fadhilah, Akhdan, Mansurov, Jonibek, Imperial, Joseph Marvin, Kampman, Onno P., Moniz, Joel Ruben Antony, Habibi, Muhammad Ravi Shulthan, Hudi, Frederikus, Montalan, Railey, Ignatius, Ryan, Lopo, Joanito Agili, Nixon, William, Karlsson, Börje F., Jaya, James, Diandaru, Ryandito, Gao, Yuze, Amadeus, Patrick, Wang, Bin, Cruz, Jan Christian Blaise, Whitehouse, Chenxi, Parmonangan, Ivan Halim, Khelli, Maria, Zhang, Wenyu, Susanto, Lucky, Ryanda, Reynard Adha, Hermawan, Sonny Lazuardi, Velasco, Dan John, Kautsar, Muhammad Dehan Al, Hendria, Willy Fitra, Moslem, Yasmin, Flynn, Noah, Adilazuarda, Muhammad Farid, Li, Haochen, Lee, Johanes, Damanhuri, R., Sun, Shuo, Qorib, Muhammad Reza, Djanibekov, Amirbek, Leong, Wei Qi, Do, Quyet V., Muennighoff, Niklas, Pansuwan, Tanrada, Putra, Ilham Firdausi, Xu, Yan, Tai, Ngee Chia, Purwarianti, Ayu, Ruder, Sebastian, Tjhi, William, Limkonchotiwat, Peerat, Aji, Alham Fikri, Keh, Sedrick, Winata, Genta Indra, Zhang, Ruochen, Koto, Fajri, Yong, Zheng-Xin, and Cahyawijaya, Samuel
Subjects: Computer Science - Computation and Language
Abstract: Southeast Asia (SEA) is a region rich in linguistic diversity and cultural variety, with over 1,300 indigenous languages and a population of 671 million people. However, prevailing AI models suffer from a significant lack of representation of texts, images, and audio datasets from SEA, compromising the quality of AI models for SEA languages. Evaluating models for SEA languages is challenging due to the scarcity of high-quality datasets, compounded by the dominance of English training data, raising concerns about potential cultural misrepresentation. To address these challenges, we introduce SEACrowd, a collaborative initiative that consolidates a comprehensive resource hub that fills the resource gap by providing standardized corpora in nearly 1,000 SEA languages across three modalities. Through our SEACrowd benchmarks, we assess the quality of AI models on 36 indigenous languages across 13 tasks, offering valuable insights into the current AI landscape in SEA. Furthermore, we propose strategies to facilitate greater AI advancements, maximizing potential utility and resource equity for the future of AI in SEA.
Comment: https://seacrowd.github.io/ Accepted in EMNLP 2024
Published: 2024

3. Near to Mid-term Risks and Opportunities of Open-Source Generative AI

Author: Eiras, Francisco, Petrov, Aleksandar, Vidgen, Bertie, de Witt, Christian Schroeder, Pizzati, Fabio, Elkins, Katherine, Mukhopadhyay, Supratik, Bibi, Adel, Csaba, Botos, Steibel, Fabro, Barez, Fazl, Smith, Genevieve, Guadagni, Gianluca, Chun, Jon, Cabot, Jordi, Imperial, Joseph Marvin, Nolazco-Flores, Juan A., Landay, Lori, Jackson, Matthew, Röttger, Paul, Torr, Philip H. S., Darrell, Trevor, Lee, Yong Suk, and Foerster, Jakob
Subjects: Computer Science - Machine Learning
Abstract: In the next few years, applications of Generative AI are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about potential risks and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation is likely to put at risk the budding field of open-source Generative AI. We argue for the responsible open sourcing of generative AI models in the near and medium term. To set the stage, we first introduce an AI openness taxonomy system and apply it to 40 current large language models. We then outline differential benefits and risks of open versus closed source AI and present potential risk mitigation, ranging from best practices to calls for technical and scientific contributions. We hope that this report will add a much needed missing voice to the current public discourse on near to mid-term AI safety and other societal impact.
Comment: Accepted to ICML'24 as a position paper
Published: 2024

4. Introducing v0.5 of the AI Safety Benchmark from MLCommons

Author: Vidgen, Bertie, Agrawal, Adarsh, Ahmed, Ahmed M., Akinwande, Victor, Al-Nuaimi, Namir, Alfaraj, Najla, Alhajjar, Elie, Aroyo, Lora, Bavalatti, Trupti, Bartolo, Max, Blili-Hamelin, Borhane, Bollacker, Kurt, Bomassani, Rishi, Boston, Marisa Ferrara, Campos, Siméon, Chakra, Kal, Chen, Canyu, Coleman, Cody, Coudert, Zacharie Delpierre, Derczynski, Leon, Dutta, Debojyoti, Eisenberg, Ian, Ezick, James, Frase, Heather, Fuller, Brian, Gandikota, Ram, Gangavarapu, Agasthya, Gangavarapu, Ananya, Gealy, James, Ghosh, Rajat, Goel, James, Gohar, Usman, Goswami, Sujata, Hale, Scott A., Hutiri, Wiebke, Imperial, Joseph Marvin, Jandial, Surgan, Judd, Nick, Juefei-Xu, Felix, Khomh, Foutse, Kailkhura, Bhavya, Kirk, Hannah Rose, Klyman, Kevin, Knotz, Chris, Kuchnik, Michael, Kumar, Shachi H., Kumar, Srijan, Lengerich, Chris, Li, Bo, Liao, Zeyi, Long, Eileen Peters, Lu, Victor, Luger, Sarah, Mai, Yifan, Mammen, Priyanka Mary, Manyeki, Kelvin, McGregor, Sean, Mehta, Virendra, Mohammed, Shafee, Moss, Emanuel, Nachman, Lama, Naganna, Dinesh Jinenhally, Nikanjam, Amin, Nushi, Besmira, Oala, Luis, Orr, Iftach, Parrish, Alicia, Patlak, Cigdem, Pietri, William, Poursabzi-Sangdeh, Forough, Presani, Eleonora, Puletti, Fabrizio, Röttger, Paul, Sahay, Saurav, Santos, Tim, Scherrer, Nino, Sebag, Alice Schoenauer, Schramowski, Patrick, Shahbazi, Abolfazl, Sharma, Vin, Shen, Xudong, Sistla, Vamsi, Tang, Leonard, Testuggine, Davide, Thangarasa, Vithursan, Watkins, Elizabeth Anne, Weiss, Rebecca, Welty, Chris, Wilbers, Tyler, Williams, Adina, Wu, Carole-Jean, Yadav, Poonam, Yang, Xianjun, Zeng, Yi, Zhang, Wenhui, Zhdanov, Fedor, Zhu, Jiacheng, Liang, Percy, Mattson, Peter, and Vanschoren, Joaquin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English), and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts. There are 43,090 test items in total, which we created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform, and downloadable tool, called ModelBench that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark.
Published: 2024

5. Standardize: Aligning Language Models with Expert-Defined Standards for Content Generation

Author: Imperial, Joseph Marvin, Forey, Gail, and Madabushi, Harish Tayyar
Subjects: Computer Science - Computation and Language
Abstract: Domain experts across engineering, healthcare, and education follow strict standards for producing quality content such as technical manuals, medication instructions, and children's reading materials. However, current works in controllable text generation have yet to explore using these standards as references for control. Towards this end, we introduce Standardize, a retrieval-style in-context learning-based framework to guide large language models to align with expert-defined standards. Focusing on English language standards in the education domain as a use case, we consider the Common European Framework of Reference for Languages (CEFR) and Common Core Standards (CCS) for the task of open-ended content generation. Our findings show that models can gain a 45% to 100% increase in precise accuracy across open and commercial LLMs evaluated, demonstrating that the use of knowledge artifacts extracted from standards and integrating them in the generation process can effectively guide models to produce better standard-aligned content.
Comment: Camera-ready for EMNLP 2024 (Main)
Published: 2024

6. Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark

Author: Mayhew, Stephen, Blevins, Terra, Liu, Shuheng, Šuppa, Marek, Gonen, Hila, Imperial, Joseph Marvin, Karlsson, Börje F., Lin, Peiqin, Ljubešić, Nikola, Miranda, LJ, Plank, Barbara, Riabi, Arij, and Pinter, Yuval
Subjects: Computer Science - Computation and Language
Abstract: We introduce Universal NER (UNER), an open, community-driven project to develop gold-standard NER benchmarks in many languages. The overarching goal of UNER is to provide high-quality, cross-lingually consistent annotations to facilitate and standardize multilingual NER research. UNER v1 contains 18 datasets annotated with named entities in a cross-lingually consistent schema across 12 diverse languages. In this paper, we detail the dataset creation and composition of UNER; we also provide initial modeling baselines on both in-language and cross-lingual learning settings. We release the data, code, and fitted models to the public.
Comment: NAACL 2024 Camera-ready
Published: 2023

7. BasahaCorpus: An Expanded Linguistic Resource for Readability Assessment in Central Philippine Languages

Author: Imperial, Joseph Marvin and Kochmar, Ekaterina
Subjects: Computer Science - Computation and Language
Abstract: Current research on automatic readability assessment (ARA) has focused on improving the performance of models in high-resource languages such as English. In this work, we introduce and release BasahaCorpus as part of an initiative aimed at expanding available corpora and baseline models for readability assessment in lower resource languages in the Philippines. We compiled a corpus of short fictional narratives written in Hiligaynon, Minasbate, Karay-a, and Rinconada -- languages belonging to the Central Philippine family tree subgroup -- to train ARA models using surface-level, syllable-pattern, and n-gram overlap features. We also propose a new hierarchical cross-lingual modeling approach that takes advantage of a language's placement in the family tree to increase the amount of available training data. Our study yields encouraging results that support previous work showcasing the efficacy of cross-lingual models in low-resource settings, as well as similarities in highly informative linguistic features for mutually intelligible languages.
Comment: Final camera-ready paper for EMNLP 2023 (Main)
Published: 2023

8. CebuaNER: A New Baseline Cebuano Named Entity Recognition Model

Author: Pilar, Ma. Beatrice Emanuela, Papas, Ellyza Mari, Buenaventura, Mary Loise, Dedoroy, Dane, Montefalcon, Myron Darrel, Padilla, Jay Rhald, Maceda, Lany, Abisado, Mideth, and Imperial, Joseph Marvin
Subjects: Computer Science - Computation and Language
Abstract: Despite being one of the most linguistically diverse groups of countries, computational linguistics and language processing research in Southeast Asia has struggled to match the level of countries from the Global North. Thus, initiatives such as open-sourcing corpora and the development of baseline models for basic language processing tasks are important stepping stones to encourage the growth of research efforts in the field. To answer this call, we introduce CebuaNER, a new baseline model for named entity recognition (NER) in the Cebuano language. Cebuano is the second most-used native language in the Philippines, with over 20 million speakers. To build the model, we collected and annotated over 4,000 news articles, the largest of any work in the language, retrieved from online local Cebuano platforms to train algorithms such as Conditional Random Field and Bidirectional LSTM. Our findings show promising results as a new baseline model, achieving over 70% performance on precision, recall, and F1 across all entity tags, as well as potential efficacy in a crosslingual setup with Tagalog.
Comment: Accepted for PACLIC2023
Published: 2023

9. Flesch or Fumble? Evaluating Readability Standard Alignment of Instruction-Tuned Language Models

Author: Imperial, Joseph Marvin and Madabushi, Harish Tayyar
Subjects: Computer Science - Computation and Language
Abstract: Readability metrics and standards such as Flesch Kincaid Grade Level (FKGL) and the Common European Framework of Reference for Languages (CEFR) exist to guide teachers and educators to properly assess the complexity of educational materials before administering them for classroom use. In this study, we select a diverse set of open and closed-source instruction-tuned language models and investigate their performances in writing story completions and simplifying narratives -- tasks that teachers perform -- using standard-guided prompts controlling text readability. Our extensive findings provide empirical proof of how globally recognized models like ChatGPT may be considered less effective and may require more refined prompts for these generative tasks compared to other open-sourced models such as BLOOMZ and FlanT5, which have shown promising results.
Comment: Final camera-ready for EMNLP GEM Workshop 2023
Published: 2023

10. Automatic Readability Assessment for Closely Related Languages

Author: Imperial, Joseph Marvin and Kochmar, Ekaterina
Subjects: Computer Science - Computation and Language
Abstract: In recent years, the main focus of research on automatic readability assessment (ARA) has shifted towards using expensive deep learning-based methods with the primary goal of increasing models' accuracy. This, however, is rarely applicable for low-resource languages where traditional handcrafted features are still widely used due to the lack of existing NLP tools to extract deeper linguistic representations. In this work, we take a step back from the technical component and focus on how linguistic aspects such as mutual intelligibility or degree of language relatedness can improve ARA in a low-resource setting. We collect short stories written in three languages in the Philippines -- Tagalog, Bikol, and Cebuano -- to train readability assessment models and explore the interaction of data and features in various cross-lingual setups. Our results show that the inclusion of CrossNGO, a novel specialized feature exploiting n-gram overlap applied to languages with high mutual intelligibility, significantly improves the performance of ARA models compared to the use of off-the-shelf large multilingual language models alone. Consequently, when both linguistic representations are combined, we achieve state-of-the-art results for Tagalog and Cebuano, and baseline scores for ARA in Bikol.
Comment: Camera-ready version for ACL 2023
Published: 2023

11. Uniform Complexity for Text Generation

Author: Imperial, Joseph Marvin and Madabushi, Harish Tayyar
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Large language models (LLMs) have shown promising results in a wide array of generative NLP tasks, such as summarization and machine translation. In the context of narrative generation, however, existing models still do not capture factors that contribute to producing consistent text. For instance, it is logical that a piece of text or a story should be uniformly readable throughout and that this form of complexity should be controllable. As such, if the complexity of an input text prompt is rated first-grade reading level in the Flesch Reading Ease test, then the generated text continuing the plot should also be within this range of complexity. With this in mind, we introduce Uniform Complexity for Text Generation (UCTG), a new benchmark test which raises the challenge of making generative models observe uniform linguistic properties with respect to prompts. We experiment with over 150 linguistically and cognitively motivated features for evaluating text complexity in humans and generative models. From our results, we find that models such as GPT-2 struggle to preserve the complexity of input prompts used in their generations, even if finetuned with professionally written texts.
Comment: Final camera-ready for EMNLP 2023
Published: 2022

12. A Baseline Readability Model for Cebuano

Author: Reyes, Lloyd Lois Antonie, Ibañez, Michael Antonio, Sapinit, Ranz, Hussien, Mohammed, and Imperial, Joseph Marvin
Subjects: Computer Science - Computation and Language
Abstract: In this study, we developed the first baseline readability model for the Cebuano language. Cebuano is the second most-used native language in the Philippines with about 27.5 million speakers. As the baseline, we extracted traditional or surface-based features, syllable patterns based on Cebuano's documented orthography, and neural embeddings from the multilingual BERT model. Results show that the use of the first two handcrafted linguistic features obtained the best performance trained on an optimized Random Forest model with approximately 87% across all metrics. The feature sets and algorithm used are also similar to previous results in readability assessment for the Filipino language, showing the potential of crosslingual application. To encourage more work for readability assessment in Philippine languages such as Cebuano, we open-sourced both code and data.
Comment: Accepted to BEA Workshop at NAACL 2022
Published: 2022

13. NU HLT at CMCL 2022 Shared Task: Multilingual and Crosslingual Prediction of Human Reading Behavior in Universal Language Space

Author: Imperial, Joseph Marvin
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: In this paper, we present a unified model that works for both multilingual and crosslingual prediction of reading times of words in various languages. The secret behind the success of this model is in the preprocessing step where all words are transformed to their universal language representation via the International Phonetic Alphabet (IPA). To the best of our knowledge, this is the first study to favorably exploit this phonological property of language for the two tasks. Various feature types were extracted covering basic frequencies, n-grams, information theoretic, and psycholinguistically-motivated predictors for model training. A finetuned Random Forest model obtained the best performance for both tasks with 3.8031 and 3.9065 MAE scores for mean first fixation duration (FFDAvg) and mean total reading time (TRTAvg) respectively.
Published: 2022

14. Under the Microscope: Interpreting Readability Assessment Models for Filipino

Author: Imperial, Joseph Marvin and Ong, Ethel
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Readability assessment is the process of identifying the level of ease or difficulty of a certain piece of text for its intended audience. Approaches have evolved from the use of arithmetic formulas to more complex pattern-recognizing models trained using machine learning algorithms. While using these approaches provides competitive results, limited work is done on analyzing how linguistic variables affect model inference quantitatively. In this work, we dissect machine learning-based readability assessment models in Filipino by performing global and local model interpretation to understand the contributions of varying linguistic features and discuss their implications in the context of the Filipino language. Results show that using a model trained with top features from global interpretation obtained higher performance than the ones using features selected by Spearman correlation. Likewise, we also empirically observed local feature weight boundaries for discriminating reading difficulty at an extremely fine-grained level and their corresponding effects if values are perturbed.
Comment: Accepted for oral presentation at PACLIC 2021
Published: 2021

15. Discovering Insights via Hybrid Thematic Analysis: A Case Study on Disaster Risk Reduction and Management for Legazpi City, Albay

Author: Abisado, Mideth, primary, Maceda, Lany, additional, Rodriguez, Ramon, additional, Imperial, Joseph Marvin, additional, Montefalcon, Myron Darrel, additional, Padilla, Jay Rhald, additional, and Ponce, Gizelle, additional
Published: 2023

16. Diverse Linguistic Features for Assessing Reading Difficulty of Educational Filipino Texts

Author: Imperial, Joseph Marvin and Ong, Ethel
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: In order to ensure quality and effective learning, fluency, and comprehension, the proper identification of the difficulty levels of reading materials should be observed. In this paper, we describe the development of automatic machine learning-based readability assessment models for educational Filipino texts using the most diverse set of linguistic features for the language. Results show that using a Random Forest model obtained a high performance of 62.7% in terms of accuracy, and 66.1% when using the optimal combination of feature sets consisting of traditional and syllable pattern-based predictors.
Comment: Accepted at ICCE 2021
Published: 2021

17. How Do Pedophiles Tweet? Investigating the Writing Styles and Online Personas of Child Cybersex Traffickers in the Philippines

Author: Imperial, Joseph Marvin
Subjects: Computer Science - Computation and Language
Abstract: One of the most important humanitarian responsibilities of every individual is to protect the future of our children. This entails not only protection of physical welfare but also from ill events that can potentially affect the mental well-being of a child such as sexual coercion and abuse which, in worst-case scenarios, can result in lifelong trauma. In this study, we perform a preliminary investigation of how child sex peddlers spread illegal pornographic content and target minors for sexual activities on Twitter in the Philippines using Natural Language Processing techniques. Results of our studies show frequently used and co-occurring words that traffickers use to spread content as well as four main roles played by these entities that contribute to the proliferation of child pornography in the country.
Comment: Submitted as a short paper for a conference
Published: 2021

18. BERT Embeddings for Automatic Readability Assessment

Author: Imperial, Joseph Marvin
Subjects: Computer Science - Computation and Language
Abstract: Automatic readability assessment (ARA) is the task of evaluating the level of ease or difficulty of text documents for a target audience. For researchers, one of the many open problems in the field is to make such models trained for the task show efficacy even for low-resource languages. In this study, we propose an alternative way of utilizing the information-rich embeddings of BERT models with handcrafted linguistic features through a combined method for readability assessment. Results show that the proposed method outperforms classical approaches in readability assessment using English and Filipino datasets, obtaining as high as a 12.4% increase in F1 performance. We also show that the general information encoded in BERT embeddings can be used as a substitute feature set for low-resource languages like Filipino with limited semantic and syntactic NLP tools to explicitly extract feature values for the task.
Comment: Accepted at RANLP 2021
Published: 2021

19. A Simple Post-Processing Technique for Improving Readability Assessment of Texts using Word Mover's Distance

Author: Imperial, Joseph Marvin and Ong, Ethel
Subjects: Computer Science - Computation and Language
Abstract: Assessing the proper difficulty levels of reading materials or texts in general is the first step towards effective comprehension and learning. In this study, we improve the conventional methodology of automatic readability assessment by incorporating the Word Mover's Distance (WMD) of ranked texts as an additional post-processing technique to further ground the difficulty level given by a model. Results of our experiments on three multilingual datasets in Filipino, German, and English show that the post-processing technique outperforms previous vanilla and ranking-based models using SVM.
Published: 2021

20. A Simple Disaster-Related Knowledge Base for Intelligent Agents

Author: Paulo, Clark Emmanuel, Ramirez, Arvin Ken, Reducindo, David Clarence, Mateo, Rannie Mark, and Imperial, Joseph Marvin
Subjects: Computer Science - Computation and Language
Abstract: In this paper, we describe our efforts in establishing a simple knowledge base by building a semantic network composed of concepts and word relationships in the context of disasters in the Philippines. Our primary source of data is a collection of news articles scraped from various Philippine news websites. Using word embeddings, we extract semantically similar and co-occurring words from an initial seed words list. We arrive at an expanded ontology with a total of 450 word assertions. We let experts from the fields of linguistics, disasters, and weather science evaluate our knowledge base and arrived at an agreeability rate of 64%. We then perform a time-based analysis of the assertions to identify important semantic changes captured by the knowledge base such as the (a) trend of roles played by human entities, (b) memberships of human entities, and (c) common association of disaster-related words. The context-specific knowledge base developed from this study can be adapted by intelligent agents such as chat bots integrated in platforms such as Facebook Messenger for answering disaster-related queries.
Comment: 7 tables, 1 figure, presented at 34th Pacific Asia Conference on Language, Information and Computation
Published: 2021

21. Application of Lexical Features Towards Improvement of Filipino Readability Identification of Children's Literature

Author: Imperial, Joseph Marvin and Ong, Ethel
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Proper identification of grade levels of children's reading materials is an important step towards effective learning. Recent studies in readability assessment for the English domain applied modern approaches in natural language processing (NLP) such as machine learning (ML) techniques to automate the process. There is also a need to extract the correct linguistic features when modeling readability formulas. In the context of the Filipino language, limited work has been done [1, 2], especially in considering the language's lexical complexity as main features. In this paper, we explore the use of lexical features towards improving the development of readability identification of children's books written in Filipino. Results show that combining lexical features (LEX) consisting of type-token ratio, lexical density, lexical variation, foreign word count with traditional features (TRAD) used by previous works such as sentence length, average syllable length, polysyllabic words, word, sentence, and phrase counts increased the performance of readability models by almost a 5% margin (from 42% to 47.2%). Further analysis and ranking of the most important features were shown to identify which features contribute the most in terms of reading complexity.
Comment: 8 tables, 1 figure. Presented at the Philippine Computing Science Congress 2020
Published: 2021

22. Sentiment Analysis of Typhoon Related Tweets using Standard and Bidirectional Recurrent Neural Networks

Author: Imperial, Joseph Marvin, Orosco, Jeyrome, Mazo, Shiela Mae, and Maceda, Lany
Subjects: Computer Science - Neural and Evolutionary Computing, Computer Science - Computation and Language
Abstract: The Philippines is prone to natural calamities like typhoons, floods, volcanic eruptions and earthquakes. With Twitter as one of the most used social media platforms in the Philippines, a total of 39,867 preprocessed tweets were obtained given a time frame starting from November 1, 2013 to January 31, 2014. Sentiment analysis determines the underlying emotion given a series of words. The main purpose of this study is to identify the sentiments expressed in the tweets sent by the Filipino people before, during, and after Typhoon Yolanda using two variations of Recurrent Neural Networks: standard and bidirectional. The best generated models after training with various hyperparameters achieved a high accuracy of 81.79% for fine-grained classification using standard RNN and 87.69% for binary classification using bidirectional RNN. Findings revealed that 51.1% of the tweets sent were positive expressing support, love, and words of courage to the victims; 19.8% were negative stating sadness and despair for the loss of lives and hate for corrupt officials; while the other 29% were neutral tweets from local news stations, announcements of relief operations, donation drives, and observations by citizens.
Comment: 5 figures, 2 tables, presented at the 14th National Natural Language Processing Research Symposium - Student Research Workshop
Published: 2019

23. On Applicability of Neural Language Models for Readability Assessment in Filipino

Author: Ibañez, Michael, Reyes, Lloyd Lois Antonie, Sapinit, Ranz, Hussien, Mohammed Ahmed, Imperial, Joseph Marvin, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Rodrigo, Maria Mercedes, editor, Matsuda, Noburu, editor, Cristea, Alexandra I., editor, and Dimitrova, Vania, editor
Published: 2022
Full Text: View/download PDF

24. Cross-Textual Analysis of COVID-19 Tweets: On Themes and Trends Over Time

Author: Imperial, Joseph Marvin, De La Cruz, Angelica, Malaay, Emmanuel, and Roxas, Rachel Edita
Editor: Kacprzyk, Janusz (Series Editor), Gomide, Fernando (Advisory Editor), Kaynak, Okyay (Advisory Editor), Liu, Derong (Advisory Editor), Pedrycz, Witold (Advisory Editor), Polycarpou, Marios M. (Advisory Editor), Rudas, Imre J. (Advisory Editor), Wang, Jun (Advisory Editor), Yang, Xin-She (editor), Sherratt, Simon (editor), Dey, Nilanjan (editor), and Joshi, Amit (editor)
Published: 2022
Full Text: View/download PDF

25. Semi-automatic Construction of Sight Words Dictionary for Filipino Text Readability

Author: Imperial, Joseph Marvin and Ong, Ethel
Editor: Goos, Gerhard (Founding Editor), Hartmanis, Juris (Founding Editor), Bertino, Elisa (Editorial Board Member), Gao, Wen (Editorial Board Member), Steffen, Bernhard (Editorial Board Member), Woeginger, Gerhard (Editorial Board Member), Yung, Moti (Editorial Board Member), Uehara, Hiroshi (editor), Yamaguchi, Takayasu (editor), and Bai, Quan (editor)
Published: 2021
Full Text: View/download PDF

26. On Applicability of Neural Language Models for Readability Assessment in Filipino

Author: Ibañez, Michael, primary, Reyes, Lloyd Lois Antonie, additional, Sapinit, Ranz, additional, Hussien, Mohammed Ahmed, additional, and Imperial, Joseph Marvin, additional
Published: 2022
Full Text: View/download PDF

27. Cross-Textual Analysis of COVID-19 Tweets: On Themes and Trends Over Time

Author: Imperial, Joseph Marvin, primary, De La Cruz, Angelica, additional, Malaay, Emmanuel, additional, and Roxas, Rachel Edita, additional
Published: 2021
Full Text: View/download PDF

28. Semi-automatic Construction of Sight Words Dictionary for Filipino Text Readability

Author: Imperial, Joseph Marvin, primary and Ong, Ethel, additional
Published: 2021
Full Text: View/download PDF

29. Comparative Thematic Analysis of Reflections from Physical and Virtual Internship Experiences of Computing Undergraduates Students

Author: Rodriguez, Ramon, primary, Imperial, Joseph Marvin, additional, Darrel Montefalcon, Myron, additional, Padilla, Jay Rhald, additional, Trillanes, Arlene, additional, and Abisado, Mideth, additional
Published: 2023
Full Text: View/download PDF

30. Convolutions vs. Sequences: Understanding performances of neural-based methods for automatic Baybayin script recognition

Author: Dela Rosa, Cerwin Dexter, primary, Lagunilla, Kreed Zion, additional, Ramos, Jomari, additional, San Pedro, Austin Kenneth, additional, and Imperial, Joseph Marvin, additional
Published: 2023
Full Text: View/download PDF

31. Automatic Readability Assessment for Closely Related Languages

Author: Imperial, Joseph Marvin, primary and Kochmar, Ekaterina, additional
Published: 2023
Full Text: View/download PDF

32. Is Twitter an Echo Chamber? Connecting Online Public Sentiments to Actual Results From the 2019 Philippine Midterm Elections

Author: Cruz, Lamar Clarence, primary, dela Cruz, Jessica Nicole, additional, Maglangit, Shane Francis, additional, Magtira, Mico, additional, Imperial, Joseph Marvin, additional, and Rodriguez, Ramon, additional
Published: 2022
Full Text: View/download PDF

33. WikAnalytics: A Web-based Application for Identifying Linguistic Features of a Text Group Supporting Filipino, English, and Taglish Languages

Author: Ramos, Jomari Valmadrid, primary, Ballesta, John Michael, additional, Lee Yam, Andrew Kobe, additional, Mogol, Moises Kairon, additional, Rodriguez, Ramon, additional, and Imperial, Joseph Marvin, additional
Published: 2022
Full Text: View/download PDF

34. A Baseline Readability Model for Cebuano

Author: Imperial, Joseph Marvin, primary, Reyes, Lloyd Lois Antonie, additional, Ibanez, Michael Antonio, additional, Sapinit, Ranz, additional, and Hussien, Mohammed, additional
Published: 2022
Full Text: View/download PDF

35. NU HLT at CMCL 2022 Shared Task: Multilingual and Crosslingual Prediction of Human Reading Behavior in Universal Language Space

Author: Imperial, Joseph Marvin, primary
Published: 2022
Full Text: View/download PDF

36. Audio-Based Hate Speech Classification from Online Short-Form Videos

Author: Ibanez, Michael, primary, Sapinit, Ranz, additional, Reyes, Lloyd Antonie, additional, Hussien, Mohammed, additional, Imperial, Joseph Marvin, additional, and Rodriguez, Ramon, additional
Published: 2021
Full Text: View/download PDF

37. Under the Microscope: Interpreting Readability Assessment Models for Filipino

Author: Imperial, Joseph Marvin and Ong, Ethel
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Readability assessment is the process of identifying the level of ease or difficulty of a certain piece of text for its intended audience. Approaches have evolved from the use of arithmetic formulas to more complex pattern-recognizing models trained using machine learning algorithms. While these approaches provide competitive results, limited work has been done on quantitatively analyzing how linguistic variables affect model inference. In this work, we dissect machine learning-based readability assessment models in Filipino by performing global and local model interpretation to understand the contributions of varying linguistic features and discuss their implications in the context of the Filipino language. Results show that a model trained with top features from global interpretation obtained higher performance than the ones using features selected by Spearman correlation. Likewise, we also empirically observed local feature weight boundaries for discriminating reading difficulty at an extremely fine-grained level and their corresponding effects if values are perturbed.
Published: 2021

38. Deploying Kalahok 1.0: Profiling Disaster-Stricken Communities Towards Intervention Initiatives

Author: Imperial, Joseph Marvin, primary, Octaviano, Manolito, additional, Zuniega, Jesvir, additional, De La Cruz, Angelica, additional, and Roxas, Rachel Edita, additional
Published: 2021
Full Text: View/download PDF

39. Understanding Facial Expression Expressing Hate from Online Short-form Videos

Author: Montefalcon, Myron Darrel, primary, Padilla, Jay Rhald, additional, Paulino, Joshua, additional, Go, Jeline, additional, Llabanes Rodriguez, Ramon, additional, and Imperial, Joseph Marvin, additional
Published: 2021
Full Text: View/download PDF

40. A BERT-based Hate Speech Classifier from Transcribed Online Short-Form Videos

Author: Hernandez Urbano Jr., Rommel, primary, Uy Ajero, Jeffrey, additional, Legaspi Angeles, Angelic, additional, Hacar Quintos, Maria Nikki, additional, Regalado Imperial, Joseph Marvin, additional, and Llabanes Rodriguez, Ramon, additional
Published: 2021
Full Text: View/download PDF

41. Exploring Hybrid Linguistic Feature Sets to Measure Filipino Text Readability

Author: Imperial, Joseph Marvin, primary and Ong, Ethel, additional
Published: 2020
Full Text: View/download PDF

42. An experimental Tagalog Finite State Automata spellchecker with Levenshtein edit-distance feature

Author: Imperial, Joseph Marvin R., primary, Ya-On, Czeritonnie Gail V., additional, and Ureta, Jennifer C., additional
Published: 2019
Full Text: View/download PDF

43. Understanding Anonymous Social Media Posts using Topic Modeling

Author: Valencia, John Daniel M., primary, Laure, Al Joseph T., additional, Centino, Nino Mark R., additional, Fabito, Bernie S., additional, Imperial, Joseph Marvin R., additional, Rodriguez, Ramon L., additional, De la Cruz, Angelica H., additional, Octaviano, Manolito V., additional, and Jamis, Marilou N., additional
Published: 2019
Full Text: View/download PDF

44. Developing a machine learning-based grade level classifier for Filipino children’s literature

Author: Imperial, Joseph Marvin, primary, Roxas, Rachel Edita, additional, Campos, Erica Mae, additional, Oandasan, Jemelee, additional, Caraballo, Reyniel, additional, Sabdani, Ferry Winsley, additional, and Almaroi, Ani Rosa, additional
Published: 2019
Full Text: View/download PDF

45. Motif search using Gibbs sampling: Notes on effectiveness in a distributed environment

Author: Imperial, Joseph Marvin, primary, Gail Ya-On, Czeritonnie, additional, and Cu, Gregory, additional
Published: 2019
Full Text: View/download PDF

46. Doctor’s Cursive Handwriting Recognition System Using Deep Learning

Author: Fajardo, Lovely Joy, primary, Sorillo, Nino Joshua, additional, Garlit, Jaycel, additional, Tomines, Cia Dennise, additional, Abisado, Mideth B., additional, Imperial, Joseph Marvin R., additional, Rodriguez, Ramon L., additional, and Fabito, Bernie S., additional
Published: 2019
Full Text: View/download PDF

47. Convolutions vs. Sequences: Understanding performances of neural-based methods for automatic Baybayin script recognition

Author: Dela Rosa, Cerwin Dexter L., Lagunilla, Kreed Zion Lorenzo G., Ramos, Jomari V., San Pedro, Austin Kenneth V., and Imperial, Joseph Marvin R.
Published: 2023
Full Text: View/download PDF

48. Convolutions vs. Sequences: Understanding performances of neural-based methods for automatic Baybayin script recognition

Author: Ma, Jixin, Dela Rosa, Cerwin Dexter L., Lagunilla, Kreed Zion Lorenzo G., Ramos, Jomari V., San Pedro, Austin Kenneth V., and Imperial, Joseph Marvin R.
Published: 2023
Full Text: View/download PDF
