5,419 results for "WordNet"
Search Results
2. Integrating YOLO and WordNet for automated image object summarization.
- Author
-
Saqib, Sheikh Muhammad, Aftab, Aamir, Mazhar, Tehseen, Iqbal, Muhammad, Shahazad, Tariq, Almogren, Ahmad, and Hamam, Habib
- Abstract
The demand for methods that automatically create text summaries from images containing many things has recently grown. Our research introduces a fresh and creative way to achieve this. We bring together the WordNet dictionary and the YOLO model to make this happen. YOLO helps us find where the things are in the images, while WordNet provides their meanings. Our process then crafts a summary for each object found. This new technique can have a big impact on computer vision and natural language processing. It can make understanding complicated images, filled with lots of things, much simpler. To test our approach, we used 1381 pictures from the Google Image search engine. Our results showed high accuracy, with 72% for object detection. The precision was 85%, the recall was 72%, and the F1-score was 74%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
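The pipeline this abstract describes, detect objects and then attach dictionary meanings, can be sketched in a few lines. The mini-lexicon and mock detections below are hypothetical stand-ins for WordNet glosses and YOLO output, not the authors' actual data:

```python
# Toy sketch of the gloss-lookup step: detected object labels are mapped to
# dictionary definitions and turned into short per-object summaries.
# MINI_WORDNET is a hypothetical stand-in for WordNet synset glosses, and
# `detections` stands in for YOLO output (label, confidence).
MINI_WORDNET = {
    "dog": "a domesticated carnivorous mammal kept as a pet or for work",
    "bicycle": "a two-wheeled vehicle propelled by pedals",
    "car": "a road vehicle powered by an engine, typically for four people",
}

def summarize_objects(detections):
    """Build one summary line per detected object label."""
    lines = []
    for label, confidence in detections:
        gloss = MINI_WORDNET.get(label, "no gloss available")
        lines.append(f"{label} ({confidence:.0%}): {gloss}")
    return lines

detections = [("dog", 0.91), ("bicycle", 0.78)]  # mock YOLO output
for line in summarize_objects(detections):
    print(line)
```

A real implementation would query WordNet glosses for each detected label (e.g., via the NLTK WordNet interface) instead of the toy dictionary.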
3. An Efficient Ant Colony Optimization Optimized Deep Belief Network Based Text Summarization Using Diverse Beam Search Computation for Social Media Content Extraction.
- Author
-
Vinitha M. and Vasundra S.
- Subjects
ANT algorithms, TEXT summarization, SOCIAL media, MACHINE learning, RECOMMENDER systems, K-means clustering - Abstract
Social media is a platform for sharing hashtags, news, and posts within a community. The ever-growing volume of text and information makes content difficult to read and understand. At present, every business must analyze social media data, as social media platforms are where people worldwide discuss and share social commentary. Automated text summarization is crucial for condensing lengthy content into concise summaries using learning-based methods. The unstructured data in social media contain significant phrases, supported by their sources, for analyzing sentiment and extracting the importance of the content. Previously, data were analyzed by matching related content for similarity. The main drawback is that the data must be examined to rephrase or summarize the essential extracted terms; improper content extraction lowers precision and increases the false-content rate. To tackle these problems, this paper presents a Machine Learning (ML) intelligence algorithm with Diverse Beam Search-Based Maximum Mutual Information (DBSMMI) and an Ant Colony Optimization (ACO)-optimized Deep Belief Network Based Text Summarization (DBNTS). Initially, the COVID-19 Twitter dataset is preprocessed to remove noise, and a pheromone value set is created based on a k-means semantic similarity algorithm. Our work analyzes and clusters the data according to their theme (area). Data analysis, the central step, is performed using WordNet keyword matching and semantic matching of words. Similar words are then clustered using a semantic-similarity-based k-means clustering algorithm. DBSMMI is used to identify maximally supported identical content phrases for term and sentence extraction, and the maximally supported cluster group is optimized for its theme using ant colony optimization with the DBNTS algorithm. The algorithm's efficiency is tested against existing classifier algorithms. An ACO semantic recommender system is also implemented to recommend relevant news to Twitter users. The proposed simulation attains 92.35% accuracy and 90.29% precision, efficiently improving classification accuracy and precision compared to other methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
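The semantic-similarity-based k-means step this abstract describes can be sketched with cosine similarity over toy vectors. The 2-d vectors, the themes, and k=2 are illustrative assumptions; the paper clusters tweets by theme with WordNet-based semantic matching, which this simplifies to cosine over embeddings:

```python
import math

# Sketch of semantic-similarity-based k-means: words are grouped by the
# cosine similarity of their vectors. The 2-d toy vectors below are
# hypothetical, not the paper's COVID-19 Twitter data.
WORDS = {"vaccine": (0.9, 0.1), "virus": (0.8, 0.2), "mask": (0.85, 0.3),
         "election": (0.1, 0.9), "senate": (0.2, 0.8)}

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def kmeans(words, k, iters=10):
    centroids = [words[w] for w in list(words)[:k]]  # simple init
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, vec in words.items():
            # assign each word to the most similar centroid
            best = max(range(k), key=lambda i: cos(vec, centroids[i]))
            clusters[best].append(w)
        # recompute centroids as the mean vector of each cluster
        centroids = [
            tuple(sum(words[w][d] for w in cl) / len(cl) for d in range(2))
            if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters

print(kmeans(WORDS, 2))  # health-themed vs. politics-themed words
```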
4. A hybrid model to improve IC-related metrics of semantic similarity between words.
- Author
-
Xiao, Jia
- Subjects
WILCOXON signed-rank test, STATISTICAL correlation, RANK correlation (Statistics), CONFIDENCE intervals, SAMPLE size (Statistics) - Abstract
This paper proposes a hybrid model, named IC+SP, to improve Information Content (IC)-related metrics of semantic similarity between words. It is based on the hypothesis that IC and the shortest path are two relatively independent pieces of semantic evidence with approximately equal influence on the semantic similarity metric. The paradigm of IC+SP is to linearly combine the IC-related metric with the shortest path. A transformation from the semantic similarity of concepts to that of words is also presented by maximizing every component of IC+SP. Thirteen improved IC-related metrics based on IC+SP are formed and implemented on the experimental platform HESML (Lastra-Díaz, Inf Syst 66:97–118, 2017). To evaluate IC+SP, Pearson's and Spearman's correlation coefficients on well-accepted benchmarks for the improved metrics are compared with those for the original ones. I introduce the Wilcoxon Signed-Rank Test, which requires no normal-distribution hypothesis, whereas the T-Test requires this hypothesis on small samples. Both tests are conducted on the differences between the correlation coefficients of the improved and original metrics. It is expected that the improved IC-related metrics significantly outperform their original counterparts, and the experimental results, including comparisons of the mean and maximum correlation coefficients as well as the p-values and confidence intervals of both tests, confirm this expectation in the vast majority of cases. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
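The core of IC+SP, a linear combination of an IC-based similarity with a shortest-path similarity, can be sketched on a toy taxonomy. The tree, the concept counts, the Lin-style IC metric, and the weight alpha=0.5 are illustrative assumptions, not the paper's actual data or parameterization:

```python
import math

# Minimal sketch of the IC+SP paradigm: alpha * sim_IC + (1 - alpha) * sim_SP.
# The toy taxonomy and frequencies below are hypothetical.
PARENT = {"dog": "mammal", "cat": "mammal", "salmon": "fish",
          "mammal": "animal", "fish": "animal", "animal": None}
COUNT = {"dog": 30, "cat": 25, "salmon": 10,
         "mammal": 60, "fish": 15, "animal": 80}  # assumed corpus counts

def ancestors(c):
    """Path from a concept up to the root, starting at the concept itself."""
    path = []
    while c is not None:
        path.append(c)
        c = PARENT[c]
    return path

def lcs(a, b):
    """Lowest common subsumer: first ancestor of a that also subsumes b."""
    anc_b = set(ancestors(b))
    return next(c for c in ancestors(a) if c in anc_b)

def ic(c):
    """Information content from the (toy) corpus frequencies."""
    return -math.log(COUNT[c] / COUNT["animal"])

def sim_ic(a, b):
    """Lin-style IC similarity."""
    denom = ic(a) + ic(b)
    return 2 * ic(lcs(a, b)) / denom if denom else 1.0

def sim_sp(a, b):
    """Shortest-path similarity, 1 / (1 + edge count)."""
    s = lcs(a, b)
    return 1.0 / (1.0 + ancestors(a).index(s) + ancestors(b).index(s))

def sim_icsp(a, b, alpha=0.5):
    """IC+SP: linear combination of the two semantic evidences."""
    return alpha * sim_ic(a, b) + (1 - alpha) * sim_sp(a, b)

print(round(sim_icsp("dog", "cat"), 3))     # closer concepts score higher
print(round(sim_icsp("dog", "salmon"), 3))
```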
5. Naïve Bayes classifier for Kashmiri word sense disambiguation.
- Author
-
Mir, Tawseef Ahmad and Lawaye, Aadil Ahmad
- Abstract
Many applications of Natural Language Processing (NLP), such as machine translation, document clustering, and information retrieval, make use of Word Sense Disambiguation (WSD). WSD automatically predicts the sense of an ambiguous word that exactly fits the given context. While it may seem easy for humans to interpret the meaning of natural language, machines require the processing of huge amounts of data for similar tasks. In this paper, we propose an automatic WSD system for the Kashmiri language based on the Naive Bayes classifier. To the best of our knowledge, this work is the first attempt towards developing a WSD system for Kashmiri. Bag-of-Words (BoW) and Part-of-Speech (PoS) features are used to develop the system. Experiments are carried out on a manually crafted sense-tagged dataset of 60 ambiguous Kashmiri words, selected based on their frequency in the collected raw corpus. The senses used to annotate these ambiguous words are extracted from Kashmiri WordNet. The performance of the proposed system is measured using accuracy, precision, recall, and F-1 measure. The proposed WSD model performed best (accuracy = 89.92, precision = 0.84, recall = 0.89, F-1 measure = 0.86) when PoS and BoW features were used together. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
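Naive Bayes WSD with bag-of-words features, as this abstract describes, can be sketched from scratch. The tiny sense-tagged sample for the English word "bank" is a hypothetical stand-in for the paper's manually crafted Kashmiri dataset:

```python
import math
from collections import Counter, defaultdict

# Toy Naive Bayes word sense disambiguation with bag-of-words features and
# Laplace smoothing. TRAIN is illustrative data, not the paper's corpus.
TRAIN = [
    ("river bank water fish", "bank/GEO"),
    ("muddy bank of the stream", "bank/GEO"),
    ("bank loan interest money", "bank/FIN"),
    ("deposit money at the bank", "bank/FIN"),
]

def train(examples):
    prior = Counter()                   # sense counts
    word_counts = defaultdict(Counter)  # per-sense word counts
    vocab = set()
    for text, sense in examples:
        prior[sense] += 1
        for w in text.split():
            word_counts[sense][w] += 1
            vocab.add(w)
    return prior, word_counts, vocab

def predict(context, prior, word_counts, vocab):
    """Pick the sense maximizing log P(sense) + sum log P(word | sense)."""
    total = sum(prior.values())
    best, best_lp = None, -math.inf
    for sense in prior:
        lp = math.log(prior[sense] / total)
        n = sum(word_counts[sense].values())
        for w in context.split():
            # Laplace smoothing over the shared vocabulary
            lp += math.log((word_counts[sense][w] + 1) / (n + len(vocab)))
        if lp > best_lp:
            best, best_lp = sense, lp
    return best

model = train(TRAIN)
print(predict("fish near the river", *model))
```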
6. System Fusion Based on WordNet Word Sense Disambiguation.
- Author
-
Duan, Mengtao and Luan, Tingyan
- Subjects
EXPERIMENTAL groups, CONTROL groups - Abstract
In the realm of natural language processing (NLP), Word Sense Disambiguation (WSD) is a crucial task, and WSD systems are used in many NLP applications. Systems built on WordNet (e.g., Lesk) have made encouraging progress in word sense disambiguation. Yet the performance of WordNet-based WSD systems can be limited when disambiguating polysemous words. The purpose of this research was to investigate the discrepancies between a systematic fusion of WordNet WSD systems and a single best-performing system. In the experimental condition, the fusion approach was used to disambiguate; in the control condition, the single best system was used. The accuracy, recall, and disambiguation time of the two groups were compared on the same test dataset. The results show that the accuracy and recall of the experimental group are better than those of the control group: fusing the decisions of multiple systems strengthens the accuracy and comprehensiveness of the overall system. In terms of disambiguation time, the experimental group still achieved an acceptable disambiguation rate. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
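The fusion idea this abstract describes can be sketched as a majority vote over the outputs of several WSD systems. The system names and sense labels below are hypothetical:

```python
from collections import Counter

# Minimal sketch of WSD system fusion: several systems each propose a sense
# for the same ambiguous word, and the fused decision is a majority vote.
def fuse(votes):
    """Return the sense proposed by the most systems."""
    return Counter(votes).most_common(1)[0][0]

votes = {
    "lesk": "bass#fish",             # gloss-overlap system
    "path_similarity": "bass#fish",  # taxonomy-distance system
    "first_sense": "bass#music",     # most-frequent-sense baseline
}
print(fuse(votes.values()))  # two of three systems agree on bass#fish
```

A confidence-weighted vote (summing each system's score per candidate sense) is a natural extension of the same sketch.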
7. Bridging Natural Language Processing and psycholinguistics: computationally grounded semantic similarity datasets for Basque and Spanish
- Author
-
Josu Goikoetxea, Itziar San Martin, and Miren Arantzeta
- Subjects
WordNet, text, psycholinguistic features, word similarity, embeddings, nouns, Language and Literature - Abstract
Introduction: Semantic relations are crucial in various cognitive processes, highlighting the need to understand concept interactions and how such relations are represented in the brain. Psycholinguistics research requires computationally grounded datasets that include word similarity measures controlled for the variables that play a significant role in lexical processing. This work presents a dataset of noun pairs in Basque and European Spanish based on two well-known Natural Language Processing resources: text corpora and knowledge bases. Methods: The dataset creation consisted of three steps: (1) computing four key psycholinguistic features for each noun (concreteness, frequency, and semantic and phonological neighborhood density); (2) pairing nouns across these four variables; (3) assigning each noun pair three types of word similarity measurements, computed from text, WordNet, and hybrid embeddings. Results: A dataset of noun pairs in Basque and Spanish with three types of word similarity measurements, along with four lexical features for each noun in the pair, namely word frequency, concreteness, and semantic and phonological neighbors. The selection of the nouns in each pair was controlled by these variables, which play a significant role in lexical processing. The three similarity measurements differ in their embedding computation: semantic relatedness from text-based embeddings, pure similarity from WordNet-based embeddings, and both categorical and associative relations from hybrid embeddings. Discussion: The present work fills an existing gap for Basque and Spanish, namely the lack of datasets that include both word similarity and detailed lexical properties, and provides a more useful resource for psycholinguistics research in these languages.
- Published
- 2024
- Full Text
- View/download PDF
8. BioBERT for Multiple Knowledge-Based Question Expansion and Biomedical Extractive Question Answering
- Author
-
Gabsi, Imen, Kammoun, Hager, Wederni, Asma, Amous, Ikram, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Nguyen, Ngoc Thanh, editor, Franczyk, Bogdan, editor, Ludwig, André, editor, Núñez, Manuel, editor, Treur, Jan, editor, Vossen, Gottfried, editor, and Kozierkiewicz, Adrianna, editor
- Published
- 2024
- Full Text
- View/download PDF
9. Word2Vec-GloVe-BERT Embeddings for Query Expansion
- Author
-
Gabsi, Imen, Kammoun, Hager, Mtar, Rawed, Amous, Ikram, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Abraham, Ajith, editor, Bajaj, Anu, editor, Hanne, Thomas, editor, and Hong, Tzung-Pei, editor
- Published
- 2024
- Full Text
- View/download PDF
10. Knowledge Graph-Based Evaluation Metric for Conversational AI Systems: A Step Towards Quantifying Semantic Textual Similarity
- Author
-
Gaur, Rajat, Dwivedi, Ankit, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Dhar, Suparna, editor, Goswami, Sanjay, editor, Dinesh Kumar, U., editor, Bose, Indranil, editor, Dubey, Rameshwar, editor, and Mazumdar, Chandan, editor
- Published
- 2024
- Full Text
- View/download PDF
11. Multi-objective Black-Box Test Case Prioritization Based on Wordnet Distances
- Author
-
van Dinten, Imara, Zaidman, Andy, Panichella, Annibale, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Arcaini, Paolo, editor, Yue, Tao, editor, and Fredericks, Erik M., editor
- Published
- 2024
- Full Text
- View/download PDF
12. Advances Toward Word-Sense Disambiguation
- Author
-
Mir, Tawseef Ahmad, Lawaye, Aadil Ahmad, Ahmed, Ghayas, Rana, Parveen, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Hassanien, Aboul Ella, editor, Castillo, Oscar, editor, Anand, Sameer, editor, and Jaiswal, Ajay, editor
- Published
- 2024
- Full Text
- View/download PDF
13. Data Augmentation Techniques in Automatic Translation of Vietnamese Sign Language for the Deaf
- Author
-
Do, Duy Cop, Ho, Thi Tuyen, Nguyen, Thi Bich Diep, Magjarević, Ratko, Series Editor, Ładyżyński, Piotr, Associate Editor, Ibrahim, Fatimah, Associate Editor, Lackovic, Igor, Associate Editor, Rock, Emilio Sacristan, Associate Editor, Vo, Van Toi, editor, Nguyen, Thi-Hiep, editor, Vong, Binh Long, editor, Le, Ngoc Bich, editor, and Nguyen, Thanh Qua, editor
- Published
- 2024
- Full Text
- View/download PDF
14. A hybrid model to improve IC-related metrics of semantic similarity between words
- Author
-
Jia Xiao
- Subjects
Semantic similarity, Ontology, Information content, WordNet, Wilcoxon Signed-Rank Test, Electronic computers. Computer science, QA75.5-76.95, Information technology, T58.5-58.64 - Abstract
This paper proposes a hybrid model, named IC+SP, to improve Information Content (IC)-related metrics of semantic similarity between words. It is based on the hypothesis that IC and the shortest path are two relatively independent pieces of semantic evidence with approximately equal influence on the semantic similarity metric. The paradigm of IC+SP is to linearly combine the IC-related metric with the shortest path. A transformation from the semantic similarity of concepts to that of words is also presented by maximizing every component of IC+SP. Thirteen improved IC-related metrics based on IC+SP are formed and implemented on the experimental platform HESML (Lastra-Díaz, Inf Syst 66:97–118, 2017). To evaluate IC+SP, Pearson’s and Spearman’s correlation coefficients on well-accepted benchmarks for the improved metrics are compared with those for the original ones. I introduce the Wilcoxon Signed-Rank Test, which requires no normal-distribution hypothesis, whereas the T-Test requires this hypothesis on small samples. Both tests are conducted on the differences between the correlation coefficients of the improved and original metrics. It is expected that the improved IC-related metrics significantly outperform their original counterparts, and the experimental results, including comparisons of the mean and maximum correlation coefficients as well as the p-values and confidence intervals of both tests, confirm this expectation in the vast majority of cases.
- Published
- 2024
- Full Text
- View/download PDF
15. A hybrid semantic recommender system based on an improved clustering.
- Author
-
Bahrani, Payam, Minaei-Bidgoli, Behrouz, Parvin, Hamid, Mirzarezaee, Mitra, and Keshavarz, Ahmad
- Subjects
RECOMMENDER systems, K-nearest neighbor classification, SCALABILITY - Abstract
A recommender system is a model that automatically recommends meaningful items (such as clips/films/goods) to clients/users according to their (previous) interests. There are two traditional general recommender system models, i.e., the Collaborative Filtering Recommender System (ColFRS) and the Content-based Filtering Recommender System (ConFRS). There is also a hybrid of the two, called a Hybrid Recommender System (HRS), which usually outperforms the simple traditional systems. Scalability, cold start, and sparsity are among the main problems any recommender system should solve. Memory-based (modeless) recommender systems achieve good accuracy but lack admissible scalability, while model-based recommender systems are scalable but lack admissible accuracy. In this paper, we propose a hybrid model based on an automatically improved ontology to address the scalability, cold-start, and sparsity problems. Our proposed HRS also uses an innovative clustering approach as an augmented section. When there are enough ratings, it uses collaborative filtering to predict the missing ratings; when there are not, it uses content-based filtering. In the content-based filtering section, ontology concepts are used to improve the accuracy of rating prediction. If the target client is severely sparse, even the ratings predicted by the content-based filtering section cannot be trusted; in that case, our proposed HRS uses additive clustering to improve the prediction of the missing ratings. It is experimentally shown that our model outperforms many newly developed recommender systems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
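The switching logic this abstract describes, collaborative filtering when the target user has enough ratings and a content-based (ontology-concept) estimate otherwise, can be sketched on toy data. The ratings, the item concepts, and the threshold of 3 ratings are illustrative assumptions, not the paper's actual design:

```python
# Toy sketch of the hybrid CF / content-based switch. All data below are
# hypothetical; the paper's ontology and clustering steps are simplified
# to a shared-concept overlap check.
RATINGS = {
    "u1": {"i1": 5, "i2": 4, "i3": 4},
    "u2": {"i2": 5},                     # sparse user
    "u3": {"i1": 2, "i3": 5, "i4": 4},
}
ITEM_CONCEPTS = {"i1": {"action"}, "i2": {"action", "scifi"}, "i4": {"scifi"}}

def collaborative(user, item):
    """Toy CF: mean rating of the item across the other users."""
    vals = [r[item] for u, r in RATINGS.items() if u != user and item in r]
    return sum(vals) / len(vals) if vals else None

def content_based(user, item):
    """Mean of the user's ratings on items sharing an ontology concept."""
    target = ITEM_CONCEPTS.get(item, set())
    shared = [score for i, score in RATINGS.get(user, {}).items()
              if ITEM_CONCEPTS.get(i, set()) & target]
    return sum(shared) / len(shared) if shared else None

def predict(user, item, min_ratings=3):
    if len(RATINGS.get(user, {})) >= min_ratings:
        return collaborative(user, item)
    return content_based(user, item)

print(predict("u1", "i4"))  # enough ratings: collaborative path
print(predict("u2", "i4"))  # sparse user: content-based path
```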
16. Machine Hands on Flaws to Machine: The Surprising Sources of Biases in Machine Learning Models.
- Author
-
Steinfeld, Kyle
- Subjects
GENERATIVE adversarial networks, REINFORCEMENT learning, COLLEGE teachers, ARTIFICIAL intelligence - Abstract
After musing on the history and varying media of the concept of 'gone viral', Associate Professor of Architecture at the University of California, Berkeley, Kyle Steinfeld further investigates computational design through the lens of cultural practices. Even the seemingly most contemporary and innovative technological ideas and gizmos can be traced back to a series of legacy notions that remain silently present in new advances. The article discusses such 'hinge' moments and searches for them in AI. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. Query reformulation system based on WordNet and word vectors clusters.
- Author
-
Jumde, Amol and Keskar, Ravindra
- Subjects
CHATBOTS, MATHEMATICAL reformulation, SEARCH engines, CHATGPT, VECTOR spaces, PERSONAL assistants, INTERNET users - Abstract
With the tremendous evolution of the internet, it has become a household utility. Internet users use search engines or personal assistants to request information. Search results depend greatly on the entered keywords, and casual users may enter vague queries for lack of knowledge of domain-specific words. We propose a query reformulation system that determines the context of the query, decides which keywords to replace, and outputs a better, modified query. We propose strategies for keyword replacement and metrics for checking query improvement. We have found that if keywords are projected into a vector space using word embedding techniques, a correct keyword replacement makes the cluster of the new keyword set more cohesive. This assumption forms the basis of our proposed work. To prove the effectiveness of the proposed system, we applied it to ad-hoc retrieval tasks over two benchmark corpora, viz. TREC-CDS 2014 and OHSUMED. We indexed these corpora with the Whoosh search engine and evaluated the system on the queries provided with each corpus. Experimental results show that the proposed techniques achieve a 9-11% improvement in precision and recall. Using Google's popularity index, we also show that the reformulated queries are not only more accurate but also more popular. The proposed system also applies to conversational AI chatbots like ChatGPT, where users must rephrase their queries to obtain better results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
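The cohesion check this abstract describes, keep a keyword replacement if it makes the keyword vectors cluster more tightly, can be sketched with mean pairwise cosine similarity. The 3-d "embeddings" below are hypothetical toy vectors, not real word2vec output:

```python
import math

# Toy sketch of the cluster-cohesion test for query reformulation: a
# replacement is judged an improvement when the keyword set's mean
# pairwise cosine similarity increases. VECS is illustrative data.
VECS = {
    "heart":   (0.9, 0.1, 0.0),
    "cardiac": (0.85, 0.15, 0.05),
    "attack":  (0.8, 0.2, 0.1),
    "party":   (0.0, 0.9, 0.3),
}

def cos(u, v):
    def norm(w):
        return math.sqrt(sum(x * x for x in w))
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (norm(u) * norm(v))

def cohesion(words):
    """Mean pairwise cosine similarity of the keywords' vectors."""
    pairs = [(a, b) for i, a in enumerate(words) for b in words[i + 1:]]
    return sum(cos(VECS[a], VECS[b]) for a, b in pairs) / len(pairs)

original = ["heart", "attack", "party"]    # vague query
reformed = ["heart", "attack", "cardiac"]  # "party" replaced
print(cohesion(reformed) > cohesion(original))  # True: more cohesive
```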
18. Internetowy słownik języka zawodowego polskich dziennikarzy prasowych. Koncepcja tezaurusa dziedzinowego typu wordnet - preliminaria [An online dictionary of the professional language of Polish press journalists: the concept of a wordnet-type domain thesaurus - preliminaries].
- Author
-
Jarosz, Beata
- Published
- 2024
- Full Text
- View/download PDF
19. Towards a Multimodal WordNet for Language Learning in Bulgarian
- Author
-
Petya Osenova and Kiril Simov
- Subjects
Wordnet, Sub-lexicons, Language Learning, Images, Bulgarian, Information technology, T58.5-58.64 - Abstract
In this paper we present modifications to and extensions of a Wordnet for Bulgarian designed to make it more appropriate for applications in language learning. To support education, we need to ensure the appropriate selection of sets of synonyms (synsets) from BTB-Wordnet depending on the education level of the learners, as well as various types of exercises based on integrating the learning topic with the semantic information in the Wordnet. For this purpose, our focus is mainly on combining the lexemes (lemmas), with their meanings and examples, with specially designed pictures illustrating those meanings within the synsets. We report on our preliminary results.
- Published
- 2024
- Full Text
- View/download PDF
20. Enhancing Aspect-based Sentiment Analysis with ParsBERT in Persian Language
- Author
-
Farid Ariai, Maryam Tayefeh Mahmoudi, and Ali Moeini
- Subjects
opinion mining, sentiment analysis, aspect-based sentiment analysis, lexical semantic disambiguation, wordnet, Information technology, T58.5-58.64, Computer software, QA76.75-76.765 - Abstract
In the era of pervasive internet use and the dominance of social networks, researchers face significant challenges in Persian text mining, including the scarcity of adequate datasets in Persian and the inefficiency of existing language models. This paper specifically tackles these challenges, aiming to amplify the efficiency of language models tailored to the Persian language. Focusing on enhancing the effectiveness of sentiment analysis, our approach employs an aspect-based methodology utilizing the ParsBERT model, augmented with a relevant lexicon. The study centers on sentiment analysis of user opinions extracted from the Persian website 'Digikala.' The experimental results not only highlight the proposed method's superior semantic capabilities but also showcase its efficiency gains with an accuracy of 88.2% and an F1 score of 61.7. The importance of enhancing language models in this context lies in their pivotal role in extracting nuanced sentiments from user-generated content, ultimately advancing the field of sentiment analysis in Persian text mining by increasing efficiency and accuracy.
- Published
- 2024
- Full Text
- View/download PDF
21. Developing Lexico-Semantic Relations of Saraiki Nouns: A Corpus-Based Study
- Author
-
Musarat Nazeer, Musarrat Azher, Azhar Pervaiz, and Iqra Yasmeen
- Subjects
corpus-based study, saraiki nouns, lexico-semantic relations, wordnet, nlp, English literature, PR1-9680, Language. Linguistic theory. Comparative grammar, P101-410 - Abstract
Saraiki, being the fourth most widely spoken language in Pakistan and being used in some parts of India and Afghanistan, is of significant geographical, historical, and cultural importance. However, it remains neglected in terms of proper documentation and identification of its unique linguistic features. The current study is centered on identifying the lexico-semantic categories of Saraiki nouns and then developing their hierarchical relationships (Miller et al., 1993). This quantitative research is designed to contribute to the process of developing Saraiki WordNet and is related to Natural Language Processing (NLP). A corpus of 3 million words was developed on the basis of data collected from different genres of the Saraiki language, including newspapers, academic essays, literary texts, and religious books. Both expansion and merge approaches were used to analyze the data. A wordlist of 1500 most occurring nouns was extracted from the corpus using Antconc 3.4.4.0, followed by manual tagging in Microsoft Excel 2010. Resultantly, 39 most occurring nouns from the wordlist were used to develop 173 related synsets, and lexico-semantic relationships among these nouns were identified with the help of 30 hierarchies (Miller et al., 1993). This study is limited to areas like Bahawalpur, Multan, and Muzaffarabad. It would be a milestone for Saraiki language learners, SWN development, Saraiki lexical resources, online SL dictionaries, and a guide for researchers.
- Published
- 2024
22. Conversion of the Spanish WordNet databases into a Prolog-readable format
- Author
-
Julián-Iranzo, Pascual, Rigau, Germán, Sáenz-Pérez, Fernando, and Velasco-Crespo, Pablo
- Published
- 2024
- Full Text
- View/download PDF
23. Are We Talking about the Same Thing? Modeling Semantic Similarity between Common and Specialized Lexica in WordNet.
- Author
-
Barbero, Chiara and Amaro, Raquel
- Subjects
CODE switching (Linguistics), DATABASES, LEXICON, EXPERTISE, SHARING - Abstract
Specialized languages can activate different sets of semantic features compared to general language, or express concepts through different words according to the domain. The specialized lexicon, i.e., lexical units that denote more specific concepts and knowledge emerging from specific domains, co-exists with the common lexicon, i.e., the set of lexical units that denote concepts and knowledge shared by average speakers, regardless of their specific training or expertise. Communication between specialists and non-specialists can show a big gap between the language(s), and therefore the lexical units, used by the two groups. Quite often, however, semantic and conceptual overlap between specialized and common lexical units occurs, and in many cases the specialized and common units refer to close concepts or even point to the same reality. Considering the modeling of meaning in functional lexical resources, this paper puts forth a solution that links common and specialized lexica within the WordNet model framework. We propose a new relation expressing semantic proximity between common and specialized units and define the conditions for its establishment. Besides contributing to the observation and understanding of the process of knowledge specialization and its reflex in the lexicon, the proposed relation allows for the integration of specialized and non-specialized lexicons into a single database, directly improving communication in specialist/non-specialist contexts, such as teaching-learning situations or health professional-patient interactions, among many others, where code-switching is frequent and necessary. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
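The cross-lexicon link this abstract proposes can be sketched as a wordnet-style relation store extended with one extra relation type. All synset ids and the relation name "semantic_proximity" are illustrative, not the paper's actual data:

```python
# Toy sketch of a wordnet-style relation store extended with a
# "semantic_proximity" relation connecting a common-language synset to its
# specialized-domain counterpart. All entries are hypothetical.
relations = {
    ("heart_attack.n.01", "hypernym"): "attack.n.02",           # common lexicon
    ("myocardial_infarction.n.01", "domain"): "medicine.n.01",  # specialized
}

# the new relation linking the common and specialized units
relations[("heart_attack.n.01", "semantic_proximity")] = "myocardial_infarction.n.01"

def related(synset, rel):
    """Follow one relation edge, or return None if absent."""
    return relations.get((synset, rel))

print(related("heart_attack.n.01", "semantic_proximity"))
```

In a single integrated database, following this edge lets an application map a lay term to its domain equivalent (and back) without leaving the WordNet model.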
24. Plagiarism Detection System by Semantic and Syntactic Analysis Based on Latent Dirichlet Allocation Algorithm.
- Author
-
Nahar, Khalid M. O., Alshtaiwi, Ma’moun, Alikhashashneh, Enas, Shatnawi, Nahlah, Al-Shannaq, Moy’awiah A., Abual-Rub, Mohammed, and BaniIsmail, Basel
- Subjects
NATURAL language processing, PLAGIARISM, LATENT variables, ALGORITHMS - Abstract
The process of plagiarism detection is one of the challenges in revealing the originality of a document, especially in the fields of science and research. Natural language processing methods can recognize and determine the level of similarity between different documents. In this paper, we tackle the task of extrinsic plagiarism detection using semantic and syntactic approaches. The objective is to identify segments of a document that show strong similarity to a group of reference documents on the same topic. We present a hybrid approach that implements semantic and syntactic features based on Latent Dirichlet Allocation (LDA) and the Wu & Palmer algorithm. The proposed approach has been evaluated on the public PAN13 dataset with a total accuracy of 85%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
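The Wu & Palmer measure used in the hybrid approach above is sim = 2·depth(LCS) / (depth(a) + depth(b)), where LCS is the lowest common subsumer. It can be sketched on a toy taxonomy (illustrative, not the paper's resource):

```python
# Wu & Palmer similarity on a toy is-a taxonomy. The tree below is a
# hypothetical stand-in for WordNet's noun hierarchy.
PARENT = {"dog": "canine", "wolf": "canine", "canine": "mammal",
          "cat": "mammal", "mammal": "entity", "entity": None}

def depth(c):
    """Node depth, counting the root 'entity' as depth 1."""
    d = 0
    while c is not None:
        d += 1
        c = PARENT[c]
    return d

def ancestors(c):
    out = []
    while c is not None:
        out.append(c)
        c = PARENT[c]
    return out

def wu_palmer(a, b):
    # lowest common subsumer: first ancestor of a that also subsumes b
    anc_b = set(ancestors(b))
    lcs = next(c for c in ancestors(a) if c in anc_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))

print(wu_palmer("dog", "wolf"))  # 2*3/(4+4) = 0.75
print(wu_palmer("dog", "cat"))   # 2*2/(4+3) ≈ 0.57
```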
25. A hybrid semantic recommender system enriched with an imputation method.
- Author
-
Bahrani, Payam, Minaei-Bidgoli, Behrouz, Parvin, Hamid, Mirzarezaee, Mitra, and Keshavarz, Ahmad
- Abstract
Recommender systems are widely used in many applications. They can be viewed as predictor systems that suggest accurate and highly preferred items to consumers or clients, and can be considered information filtering systems. They face important challenges such as cold start (the absence of enough data about a new item to make accurate recommendations), scalability, and sparsity. Memory-based recommender systems have high accuracy but lack scalability, while model-based systems are scalable but less accurate. Current recommender systems use hybrid methods, usually combining content-based filtering with collaborative filtering, to deal with the most important shortcomings of the traditional filtering approaches. In this paper, a hybrid recommender system is presented to meet the stated challenges, increase system performance, and provide more accurate recommendations. The system uses both content-based filtering and collaborative filtering. In addition, using an automatically collected wordnet, we create an ontology that is used in the content-based filtering section of our proposed approach. Furthermore, the framework applies the KNN (k-nearest neighbors) algorithm and clustering to improve its functionality. The proposed system is evaluated on a real benchmark. The experiments show that the proposed method performs better than the current superior related methods, and that our recommender system has desirable scalability compared with state-of-the-art recommender systems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. A new ontology-based similarity approach for measuring caching coverages provided by mediation systems.
- Author
-
Ajarroud, Ouafa, Zellou, Ahmed, and Idri, Ali
- Subjects
CACHE memory, ONTOLOGIES (Information retrieval), SEMANTIC computing, ONOMASIOLOGY, SYNTAX (Grammar) - Abstract
Most mediation systems use a caching policy to overcome their performance challenges. One of the most widely adopted strategies is known as semantic caching. Semantic caches are so called because they store the descriptions of all submitted queries. Although their name suggests they are based on semantics, this is not really the case: they actually compare the syntax of the cached queries to the syntax of the new query to retrieve responses from the cache. This can lead to significant delays, especially if many queries are stored in the cache. In this work, we propose a new semantic approach based on ontologies to compute the semantic similarity between two given queries, and we also provide a new algorithm to filter out all regions of the cache that do not semantically cover a user query. In this way, the use of the cache is both optimal and fast, despite the large number of regions in the cache; only the most beneficial regions are processed to retrieve data from the cache. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
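The ontology-based query-coverage idea in entry 26 can be illustrated with a tiny hand-coded hypernym taxonomy. The taxonomy below is hypothetical data standing in for a real ontology such as WordNet, and `path_similarity` follows the same 1/(1 + path length) shape as WordNet's path measure; this is a sketch of the idea, not the paper's actual algorithm.

```python
# Tiny hand-coded hypernym taxonomy standing in for a real ontology
# (hypothetical data; a production system would load WordNet or a domain ontology).
HYPERNYM = {
    "car": "vehicle", "truck": "vehicle", "vehicle": "artifact",
    "price": "value", "cost": "value", "value": "attribute",
    "artifact": "entity", "attribute": "entity",
}

def path_to_root(term):
    """Chain of hypernyms from a term up to the taxonomy root."""
    path = [term]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path

def path_similarity(a, b):
    """1 / (1 + length of the path through the lowest common ancestor),
    the same shape as WordNet's path measure."""
    pa, pb = path_to_root(a), path_to_root(b)
    for i, node in enumerate(pa):
        if node in pb:
            return 1.0 / (1 + i + pb.index(node))
    return 0.0

def query_coverage(q1, q2):
    """Average best-match similarity of q1's terms against q2:
    how well a cached query q2 semantically covers a new query q1."""
    return sum(max(path_similarity(t1, t2) for t2 in q2)
               for t1 in q1) / len(q1)
```

A cached query `["truck", "cost"]` then partially covers a new query `["car", "price"]` even though the two share no surface terms, which is the point of moving from syntactic to semantic cache matching.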
27. Detection of Hate Speech in Assamese Text
- Author
-
Baruah, Nomi, Gogoi, Arjun, Neog, Mandira, Bansal, Jagdish Chand, Series Editor, Deep, Kusum, Series Editor, Nagar, Atulya K., Series Editor, Kumar, Sandeep, editor, Hiranwal, Saroj, editor, Purohit, S.D., editor, and Prasad, Mukesh, editor
- Published
- 2023
- Full Text
- View/download PDF
28. Evaluating a Synthetic Image Dataset Generated with Stable Diffusion
- Author
-
Stöckl, Andreas, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Yang, Xin-She, editor, Sherratt, R. Simon, editor, Dey, Nilanjan, editor, and Joshi, Amit, editor
- Published
- 2023
- Full Text
- View/download PDF
29. Automatic Text Summarization Using Word Embeddings
- Author
-
Antony, Sophiya, Pankaj, Dhanya S., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Sharma, Neha, editor, Goje, Amol, editor, Chakrabarti, Amlan, editor, and Bruckstein, Alfred M., editor
- Published
- 2023
- Full Text
- View/download PDF
30. GujAGra: An Acyclic Graph to Unify Semantic Knowledge, Antonyms, and Gujarati–English Translation of Input Text
- Author
-
Patel, Margi, Joshi, Brijendra Kumar, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Hirche, Sandra, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Möller, Sebastian, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Zhang, Junjie James, Series Editor, Singh, Pradeep, editor, Singh, Deepak, editor, Tiwari, Vivek, editor, and Misra, Sanjay, editor
- Published
- 2023
- Full Text
- View/download PDF
31. Semantic Similarity based Automated Answer Script Evaluation System using Machine Learning Pipeline and Natural Language Processing
- Author
-
Shabariram, C. P., Priya Ponnuswamy, P., Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Smys, S., editor, Tavares, João Manuel R. S., editor, and Shi, Fuqian, editor
- Published
- 2023
- Full Text
- View/download PDF
32. Using Classifier Ensembles to Predict Election Results Using Twitter Data Sentiment Analysis
- Author
-
Sharma, Pinki, Kumar, Santosh, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Mahapatra, Rajendra Prasad, editor, Peddoju, Sateesh K., editor, Roy, Sudip, editor, and Parwekar, Pritee, editor
- Published
- 2023
- Full Text
- View/download PDF
33. Word Sense Disambiguation from English to Indic Language: Approaches and Opportunities
- Author
-
Mishra, Binod Kumar, Jain, Suresh, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Patel, Kanubhai K., editor, Santosh, K. C., editor, and Patel, Atul, editor
- Published
- 2023
- Full Text
- View/download PDF
34. HanaNLG: A Flexible Hybrid Approach for Natural Language Generation
- Author
-
Barros, Cristina, Lloret, Elena, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, and Gelbukh, Alexander, editor
- Published
- 2023
- Full Text
- View/download PDF
35. Exploiting Metonymy from Available Knowledge Resources
- Author
-
Gonzalez-Dios, Itziar, Álvez, Javier, Rigau, German, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, and Gelbukh, Alexander, editor
- Published
- 2023
- Full Text
- View/download PDF
36. Russian Emotional Concepts in the Multilingual Technological Environment
- Author
-
Serikov, Andrei E., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Bylieva, Daria, editor, and Nordmann, Alfred, editor
- Published
- 2023
- Full Text
- View/download PDF
37. Retrospective Inspection for Research in Natural Language Processing of Hindi Language Using Fuzzy Logic
- Author
-
Vij, Sonakshi, Virmani, Deepali, Jain, Amita, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Tuba, Milan, editor, Akashe, Shyam, editor, and Joshi, Amit, editor
- Published
- 2023
- Full Text
- View/download PDF
38. A Survey of Different Approaches for Word Sense Disambiguation
- Author
-
Ransing, Rasika, Gulati, Archana, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Fong, Simon, editor, Dey, Nilanjan, editor, and Joshi, Amit, editor
- Published
- 2023
- Full Text
- View/download PDF
39. Semantic-Based Feature Extraction and Feature Selection in Digital Library User Behaviour Dataset
- Author
-
Fernandez, F. Mary Harin, Punithavathi, I. S. Hephzi, Ramana, T. Venkata, Ramana, K. Venkata, Xhafa, Fatos, Series Editor, Smys, S., editor, Lafata, Pavel, editor, Palanisamy, Ram, editor, and Kamel, Khaled A., editor
- Published
- 2023
- Full Text
- View/download PDF
40. A Novel Hysynset-Based Topic Modeling Approach for Marathi Language
- Author
-
Bafna, Prafulla B., Saini, Jatinderkumar R., Howlett, Robert J., Series Editor, Jain, Lakhmi C., Series Editor, Choudrie, Jyoti, editor, Mahalle, Parikshit, editor, Perumal, Thinagaran, editor, and Joshi, Amit, editor
- Published
- 2023
- Full Text
- View/download PDF
41. A Graph-Based Extractive Assamese Text Summarization
- Author
-
Baruah, Nomi, Sarma, Shikhar Kr., Borkotokey, Surajit, Borah, Randeep, Phukan, Rakhee D., Gogoi, Arjun, Xhafa, Fatos, Series Editor, Asari, Vijayan K., editor, Singh, Vijendra, editor, Rajasekaran, Rajkumar, editor, and Patel, R. B., editor
- Published
- 2023
- Full Text
- View/download PDF
42. User's learning capability aware E-content recommendation system for enhanced learning experience
- Author
-
P. Vijayakumar and G. Jagatheeshkumar
- Subjects
E-Learning ,Recommendation system ,Learning experience ,Classification ,WordNet ,Electric apparatus and materials. Electric circuits. Electric networks ,TK452-454.4 - Abstract
E-learning is inevitable during these pandemic days, and most learners find it comfortable to learn online. However, the main challenge is to locate the appropriate data in line with the learner's requirements. Considering the necessity of this issue, this article presents an e-content recommendation system that considers the user's learning capability. This work categorizes documents into three levels: basic, intermediate and advanced. Based on the user's learning capability, corresponding documents are recommended, and this idea enhances the overall learning experience. The work consists of three phases: data pre-processing, feature extraction and classification. The collected documents are pre-processed to prepare them for further processing. Features such as Part-Of-Speech (POS) tags, Term Frequency - Inverse Document Frequency (TF-IDF) and WordNet-based semantic similarity are extracted, and a multiclass Support Vector Machine (SVM) is employed for distinguishing between the classes. The performance of the work is tested, and the results prove the efficacy of the work with a 98% accuracy rate, in contrast to the comparative techniques.
- Published
- 2024
- Full Text
- View/download PDF
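The feature-extraction and classification pipeline in entry 42 can be sketched in miniature: a standard-library TF-IDF implementation plus a nearest-document cosine classifier standing in for the paper's multiclass SVM. The toy documents and level labels below are invented for illustration, not taken from the paper.

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF vectors (as dicts) for a list of tokenised documents."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [{t: (c / len(d)) * math.log(n / df[t])
             for t, c in Counter(d).items()}
            for d in docs]

def cosine(u, v):
    num = sum(w * v.get(t, 0.0) for t, w in u.items())
    den = (math.sqrt(sum(w * w for w in u.values()))
           * math.sqrt(sum(w * w for w in v.values())))
    return num / den if den else 0.0

# Toy corpus: one document per difficulty level (invented for illustration).
docs = ["variables and loops explained simply".split(),
        "classes objects and inheritance in practice".split(),
        "metaclasses descriptors and the import machinery".split()]
labels = ["basic", "intermediate", "advanced"]

def classify(query):
    """Nearest-document cosine classifier, a stand-in for the SVM stage."""
    vecs = tfidf(docs + [query.split()])  # refit so the query shares the idf
    q = vecs[-1]
    best = max(range(len(docs)), key=lambda i: cosine(q, vecs[i]))
    return labels[best]
```

A beginner-flavoured query such as "loops and variables for beginners" lands on the basic-level document because only that document shares its informative (non-stopword) terms.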
43. Ensemble-Based Short Text Similarity: An Easy Approach for Multilingual Datasets Using Transformers and WordNet in Real-World Scenarios.
- Author
-
Gagliardi, Isabella and Artese, Maria Teresa
- Subjects
LANGUAGE models ,CULTURAL property - Abstract
When integrating data from different sources, there are problems of synonymy, different languages, and concepts of different granularity. This paper proposes a simple yet effective approach to evaluate the semantic similarity of short texts, especially keywords. The method is capable of matching keywords from different sources and languages by exploiting transformers and WordNet-based methods. Key features of the approach include its unsupervised pipeline, mitigation of the lack of context in keywords, scalability for large archives, support for multiple languages and real-world scenarios adaptation capabilities. The work aims to provide a versatile tool for different cultural heritage archives without requiring complex customization. The paper aims to explore different approaches to identifying similarities in 1- or n-gram tags, evaluate and compare different pre-trained language models, and define integrated methods to overcome limitations. Tests to validate the approach have been conducted using the QueryLab portal, a search engine for cultural heritage archives, to evaluate the proposed pipeline. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
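The ensemble idea in entry 43, combining transformer-based and WordNet-based scores, reduces to a weighted combination that skips any scorer unable to handle a given pair. A minimal sketch, with a character-trigram scorer and an exact-match scorer standing in for the real embedding and WordNet components (both stand-ins are my assumptions):

```python
def ensemble_similarity(a, b, scorers, weights):
    """Weighted ensemble of similarity scorers; a scorer that cannot
    handle a pair returns None and is skipped, and the remaining
    weights are renormalised."""
    scores, ws = [], []
    for scorer, w in zip(scorers, weights):
        s = scorer(a, b)
        if s is not None:
            scores.append(s)
            ws.append(w)
    return sum(s * w for s, w in zip(scores, ws)) / sum(ws) if ws else 0.0

# Hypothetical stand-ins for the real components (a transformer-embedding
# scorer and a WordNet-based scorer).
def char_ngram_sim(a, b, n=3):
    """Jaccard over character trigrams; abstains on too-short strings."""
    ga = {a[i:i + n] for i in range(len(a) - n + 1)}
    gb = {b[i:i + n] for i in range(len(b) - n + 1)}
    return len(ga & gb) / len(ga | gb) if ga | gb else None

def exact_sim(a, b):
    """1.0 on a case-insensitive exact match, otherwise abstains."""
    return 1.0 if a.lower() == b.lower() else None

sim = ensemble_similarity("folk dance", "folk dances",
                          [char_ngram_sim, exact_sim], [0.7, 0.3])
```

Because `exact_sim` abstains on the near-duplicate pair, its weight is redistributed and the ensemble falls back to the trigram score, which is the kind of graceful degradation the paper's multilingual pipeline needs when one resource lacks coverage.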
44. Multi-knowledge resources-based semantic similarity models with application for movie recommender system.
- Author
-
Huang, Guangjian, Zhu, Xingtu, Wasti, Shahbaz Hassan, and Jiang, Yuncheng
- Subjects
AMBIGUITY ,RECOMMENDER systems ,RESEARCH personnel - Abstract
In recent years, researchers have proposed several feature-based methods to measure semantic similarity using knowledge resources like Wikipedia and WordNet. While Wikipedia covers millions of concepts with multiple features, it has some limitations such as articles with limited content and concept ambiguity. Disambiguating these concepts remains a challenge. Conversely, WordNet offers unambiguous terms by covering all possible senses, making it a useful resource for disambiguating Wikipedia concepts. Additionally, WordNet can enrich the limited content of Wikipedia articles. Thus, we present a new approach that combines both resources to enhance previous feature-based methods of semantic similarity. We begin by analyzing the limitations of previous research, followed by introducing a novel method to disambiguate Wikipedia concepts using WordNet's synonym structure, resulting in more effective disambiguation. Furthermore, we use WordNet to supplement the features in Wikipedia articles and redefine the feature similarity functions. Finally, we train non-linear fitting-based models to measure semantic similarity. Our approach outperforms other previous methods on various benchmarks. To further showcase our approach, we apply our models to develop a movie recommender system using the MovieLens dataset, which consistently outperforms other systems. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
45. Specialised language and conceptual knowledge in lexicographic portals.
- Author
-
Giacomini, Laura
- Abstract
Copyright of Lexicographica is the property of De Gruyter and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2023
- Full Text
- View/download PDF
46. Lexical Semantics Identification Using Fuzzy Centrality Measures and BERT Embedding
- Author
-
Jain, Minni, Jindal, Rajni, and Jain, Amita
- Published
- 2024
- Full Text
- View/download PDF
47. Reversal of the Word Sense Disambiguation Task Using a Deep Learning Model
- Author
-
Algirdas Laukaitis
- Subjects
word sense disambiguation ,natural language processing ,WordNet ,Technology ,Engineering (General). Civil engineering (General) ,TA1-2040 ,Biology (General) ,QH301-705.5 ,Physics ,QC1-999 ,Chemistry ,QD1-999 - Abstract
Word sense disambiguation (WSD) remains a persistent challenge in the natural language processing (NLP) community. While various NLP packages exist, the Lesk algorithm in the NLTK library demonstrates suboptimal accuracy. In this research article, we propose an innovative methodology and an open-source framework that effectively addresses the challenges of WSD by optimizing memory usage without compromising accuracy. Our system seamlessly integrates WSD into NLP tasks, offering functionality similar to that provided by the NLTK library. However, we go beyond the existing approaches by introducing a novel idea related to WSD. Specifically, we leverage deep neural networks and consider the language patterns learned by these models as the new gold standard. This approach suggests modifying existing semantic dictionaries, such as WordNet, to align with these patterns. Empirical validation through a series of experiments confirmed the effectiveness of our proposed method, achieving state-of-the-art performance across multiple WSD datasets. Notably, our system does not require the installation of additional software beyond the well-known Python libraries. The classification model is saved in a readily usable text format, and the entire framework (model and data) is publicly available on GitHub for the NLP research community.
- Published
- 2024
- Full Text
- View/download PDF
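Entry 47 mentions the Lesk algorithm in NLTK; its core idea, picking the sense whose dictionary gloss overlaps the context most, fits in a few lines. Below is a sketch over hand-written toy glosses (real use would call `nltk.wsd.lesk` against WordNet definitions):

```python
def simplified_lesk(word, context, glosses):
    """Pick the sense whose gloss shares the most words with the
    context: the classic Lesk idea behind nltk.wsd.lesk."""
    ctx = set(context)
    best, best_overlap = None, -1
    for sense, gloss in glosses[word].items():
        overlap = len(ctx & set(gloss.split()))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best

# Hand-written toy glosses standing in for WordNet definitions.
GLOSSES = {
    "bank": {
        "bank.n.01": "sloping land beside a body of water river",
        "bank.n.02": "financial institution that accepts deposits money",
    }
}
sense = simplified_lesk("bank", "i deposited money at the bank".split(), GLOSSES)
```

Here the word "money" in the context tips the decision toward the financial sense; with no overlapping words at all, plain Lesk simply falls back to the first listed sense, which is one reason its accuracy is often criticised.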
48. Monitoring semantic relatedness and revealing fairness and biases through trend tests
- Author
-
Jean-Rémi Bourguet and Adama Sow
- Subjects
Semantic relatedness ,Fairness ,Biases ,WordNet ,ReVerb ,Visualization ,Information technology ,T58.5-58.64 - Abstract
An emerging application domain for content-based recommender systems is the better consideration of the semantics behind textual descriptions; traditional approaches often miss relevant information because they focus solely on syntax. The Semantic Web community, however, has enriched resources with cultural and linguistic background knowledge, offering new standards for word categorization. This paper proposes a framework that combines the information extractor ReVerb with the WordNet taxonomy to monitor global semantic relatedness scores. Additionally, an experimental validation confronts human-based semantic relatedness scores with theoretical ones, employing Mann–Kendall trend tests to reveal fairness and biases. Overall, our framework introduces a novel approach to semantic relatedness monitoring, providing valuable insights into fairness and biases.
- Published
- 2025
- Full Text
- View/download PDF
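The Mann–Kendall trend test used in entry 48 is straightforward to sketch: count concordant minus discordant pairs (the S statistic) and convert to a two-sided p-value via a normal approximation. The version below omits the tie correction for brevity; the input sequence is invented for illustration.

```python
from itertools import combinations
from statistics import NormalDist

def mann_kendall(xs):
    """Mann-Kendall trend test: S statistic and a two-sided p-value
    via the normal approximation (tie correction omitted for brevity)."""
    n = len(xs)
    # S = number of increasing pairs minus number of decreasing pairs
    s = sum((b > a) - (b < a) for a, b in combinations(xs, 2))
    var = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / var ** 0.5
    elif s < 0:
        z = (s + 1) / var ** 0.5
    else:
        z = 0.0
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, p

# A steadily rising sequence of relatedness scores: a clear upward trend.
s, p = mann_kendall([0.10, 0.20, 0.25, 0.30, 0.45, 0.50, 0.60, 0.70])
```

For this monotonically increasing sequence every pair is concordant, so S equals the number of pairs (28 for n = 8) and the p-value is well below 0.05, i.e. the trend is significant.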
49. Semantic-Based Integrated Plagiarism Detection Approach for English Documents.
- Author
-
Kaur, Manpreet, Gupta, Vishal, and Kaur, Ravreet
- Subjects
PLAGIARISM ,NATURAL language processing ,PERFORMANCE standards - Abstract
The proposed work models a novel plagiarism detection system based on semantic features to uncover cases of plagiarism. The system constructs a dynamic relation matrix for each suspicious and source sentence pair to measure their degree of similarity using semantic features. Two procedures, Weighted Inverse Distance and GlossDice, which exploit several text properties (synonyms, shortest path, etc.) to overcome the limitations of the existing features, and a new similarity metric for plagiarism detection are presented in this paper. Moreover, this research investigates the independent performance of various features in detecting plagiarized cases and combines the best features by assigning different weight contributions to further enhance system performance. Weighted Inverse Distance integrated with SynJaccard boosts the system performance and shows promising results. Initially, all the experiments were performed on the PAN-PC-11 dataset, and then the PAN-14 text alignment dataset was used to validate the results of the proposed system. The effectiveness of the proposed system has been measured using standard performance measures, i.e. Precision, Recall, F-measure, Granularity, and Plagdet score. The proposed system outperformed the other baseline systems with precision (0.9459), recall (0.8861), f-measure (0.8917), and plagdet (0.8857) on the PAN-PC-11 dataset. For PAN-14 text alignment, the system exhibits precision (0.9257), recall (0.9055), f-measure (0.8931), and plagdet (0.8806). [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
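The SynJaccard feature named in entry 49 can be sketched as Jaccard similarity over synonym-canonicalised token sets. The tiny synonym map below is hypothetical, standing in for WordNet synsets; it only illustrates the idea, not the paper's exact formulation.

```python
# Tiny synonym map standing in for WordNet synsets (hypothetical data).
SYNONYMS = {"buy": "purchase", "car": "automobile", "big": "large"}

def canon(token):
    """Map a token to a canonical representative of its synonym set."""
    return SYNONYMS.get(token, token)

def syn_jaccard(s1, s2):
    """Jaccard over synonym-canonicalised token sets, so that sentences
    that share meaning but few surface words still overlap."""
    a = {canon(t) for t in s1.split()}
    b = {canon(t) for t in s2.split()}
    return len(a & b) / len(a | b)

score = syn_jaccard("he wants to buy a big car",
                    "he wants to purchase a large automobile")
```

Plain Jaccard on the same sentence pair scores 0.4 (only the function words overlap); after canonicalisation the token sets match exactly and `score` is 1.0, which is precisely the paraphrase case a plagiarism detector must catch.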
50. Analysis of web data classification methods based on semantic similarity measure.
- Author
-
Ramesh, Kante and R, Mohanasundram
- Subjects
OPTIMIZATION algorithms ,DATA analysis ,EVIDENCE gaps ,WEB search engines ,CLASSIFICATION - Abstract
In this survey, 60 research papers on web data classification techniques are reviewed; these techniques are used for effective classification of web data and for measuring the semantic relatedness between two words. The web data classification techniques are classified into three types, namely the semantic-based approach, the search engine-based approach, and the WordNet-based approach, and the research issues and challenges confronted by the existing techniques are reported in this survey. Moreover, an analysis of the research works is carried out based on the categorized web data classification techniques, the datasets used, and the evaluation metrics. From the analysis, it is clear that the semantic-based approach is the most widely used technique in the classification of web data. Similarly, the Miller-Charles dataset is the most commonly used dataset in most of the research papers, and evaluation metrics like precision, recall, and F-measure are widely utilized in web data classification. The insights from this manuscript can be used to understand various research gaps and problems in this area, which could be addressed in the future by developing novel optimization algorithms that might enhance the performance of web data classification. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
Discovery Service for Jio Institute Digital Library