95 results
Search Results
2. Generating keyphrases for readers: A controllable keyphrase generation framework.
- Author
- Jiang, Yi, Meng, Rui, Huang, Yong, Lu, Wei, and Liu, Jiawei
- Subjects
- *SEMANTICS, *NATURAL language processing, *TASK performance, *CONCEPTUAL structures, *INFORMATION retrieval, *ACCESS to information, *INFORMATION science, *DESCRIPTIVE statistics, *RESEARCH funding, *ABSTRACTING & indexing services, *READING, *BLOGS, *INFORMATION technology
- Abstract
With the wide application of keyphrases in many Information Retrieval (IR) and Natural Language Processing (NLP) tasks, automatic keyphrase prediction has emerged as an active research topic. However, these statistically important phrases contribute increasingly less to the related tasks because the end‐to‐end learning mechanism enables models to learn the important semantic information of the text directly. Similarly, keyphrases are of little help to readers trying to quickly grasp a paper's main idea because the relationship between the keyphrase and the paper is not explicit. Therefore, we propose to generate keyphrases with specific functions for readers, to bridge the semantic gap between them and the information producers, and we verify the effectiveness of the keyphrase function in assisting users' comprehension with a user experiment. A controllable keyphrase generation framework (CKPG) that uses the keyphrase function as a control code to generate categorized keyphrases is proposed and implemented based on Transformer, BART, and T5, respectively. For the Computer Science domain, the macro-averages of P@5, R@5, and F1@5 on the Papers with Code dataset reach 0.680, 0.535, and 0.558, respectively. Our experimental results indicate the effectiveness of the CKPG models. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
3. Semantic-Based Hybrid Query Reformulation for Biomedical Information Retrieval.
- Author
- Selmi, Wided, Kammoun, Hager, and Amous, Ikram
- Subjects
- *INFORMATION retrieval, *INFORMATION storage & retrieval systems, *SEMANTICS
- Abstract
Query reformulation is a well-known technique intended to improve the performance of Information Retrieval Systems. Among the several available techniques, Query Expansion (QE) reformulates the initial query by adding similar terms, drawn from several sources (corpus, knowledge resources), to the query terms in order to retrieve more relevant documents. Most QE methods rely on the relationships between the original query terms and candidate terms (new terms) in order to select the most similar expansion terms. In this paper, we suggest a new hybrid query reformulation combining QE and term re-weighting techniques. The suggested approach aims to demonstrate the effectiveness of QE, with a semantic selection of candidate terms according to the specificity of the original query terms, in improving retrieval performance. To this end, we exploit both the relationships defined by knowledge resources and the distributed semantics recently revealed by neural network analysis. For term re-weighting, we propose a new semantic method based on a semantic similarity measure that assigns a weight to each term of the expanded query. The experiments conducted on the OHSUMED and TREC 2014 CDS test collections, including long and short queries, yielded significant results that outperformed the baseline and state-of-the-art approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
4. Extracting the evolutionary backbone of scientific domains: The semantic main path network analysis approach based on citation context analysis.
- Author
- Jiang, Xiaorui and Liu, Junjun
- Subjects
- *SEMANTICS, *DEEP learning, *MOTIVATION (Psychology), *RESEARCH methodology, *LINGUISTICS, *INFORMATION resources management, *NATURAL language processing, *TASK performance, *CITATION analysis, *CONCEPTUAL structures, *SEMANTIC Web, *INFORMATION science, *LATENT semantic analysis, *INTELLECT, *INFORMATION retrieval, *RESEARCH funding, *AUTOMATION, *PATH analysis (Statistics), *ELECTRONIC publications, *STATISTICAL models, *INFORMATION technology, *ALGORITHMS
- Abstract
Main path analysis is a popular method for extracting the scientific backbone from the citation network of a research domain. Existing approaches ignore the semantic relationships between citing and cited publications, resulting in several adverse issues in terms of the coherence of main paths and the coverage of significant studies. This paper advocates a semantic main path network analysis approach to alleviate these issues based on citation function analysis. A wide variety of SciBERT‐based deep learning models were designed for identifying citation functions. Semantic citation networks were built by either including important citations, for example, extension, motivation, usage and similarity, or excluding incidental citations like background and future work. The semantic main path network was built by merging the top‐K main paths extracted from various time slices of the semantic citation network. In addition, a three‐way framework was proposed for the quantitative evaluation of main path analysis results. Both qualitative and quantitative analysis on three research areas of computational linguistics demonstrated that, compared to semantics‐agnostic counterparts, different types of semantic main path networks provide complementary views of scientific knowledge flows. Combining them, we obtained a more precise and comprehensive picture of domain evolution and uncovered more coherent development pathways between scientific ideas. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
5. Assisting researchers in bibliographic tasks: A new usable, real‐time tool for analyzing bibliographies.
- Author
- Dattolo, Antonina and Corbatto, Marco
- Subjects
- *SEMANTICS, *BIBLIOGRAPHIC databases, *METADATA, *USER interfaces, *TASK performance, *BIBLIOGRAPHY, *DOCUMENTATION, *BIBLIOGRAPHICAL citations, *DESCRIPTIVE statistics, *INFORMATION retrieval, *STATISTICAL models, *DATA analysis
- Abstract
The number of scientific papers is growing together with the development of science itself; but, although there is an unprecedented availability of large citation indexes, some daily activities of researchers remain time‐consuming and poorly supported. In this paper, we present Visual Bibliographies (VisualBib), a real-time visual platform, designed using a zz‐structure‐based model for linking metadata and a narrative, visual approach for showing bibliographies. VisualBib represents a usable, advanced, and visual tool, which simplifies the management of bibliographies, supports a core set of bibliographic tasks, and helps researchers during complex analyses of scientific bibliographies. We present the variety of metadata formats and visualization methods, proposing two use case scenarios. The maturity of the system implementation allowed us to conduct two studies, evaluating both the effectiveness of VisualBib in providing answers to specific data-analysis tasks and its ability to support experienced users during real-life use. The results of the evaluation are positive and describe an effective and usable platform. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
6. LEOnto+: a scalable ontology enrichment approach.
- Author
- Sassi, Salma, Tissaoui, Anis, and Chbeir, Richard
- Subjects
- *ONTOLOGY, *DISTRIBUTION (Probability theory), *DIGITAL libraries, *INFORMATION retrieval, *SEMANTICS, *ONTOLOGIES (Information retrieval), *SCALABILITY
- Abstract
Distributional semantic models like the Latent Dirichlet Allocation (LDA) model (Guo et al., Concurr. Comput.: Pract. Exper. 29(3), 319–343, 2016) define similar representations for words that appear in similar contexts. LDA was originally used to model documents and extract topics in Information Retrieval. In recent years, LDA has become a hot topic in ontology learning because of the exponential increase in the number of documents and textual data, not only on the web but also in digital libraries. LDA-based approaches have proven to provide the best results. However, they suffer from several limitations related to concept and relation extraction, as well as to handling corpus evolution and maintenance. To cope with these problems, we propose in this paper LEOnto+, an extended version of LEOnto (Tissaoui et al. 2020, Tissaoui et al. SN Comput. Sci. J. 1: 336, 2020), providing a new approach for automatically enriching an ontology from a textual corpus. In LEOnto+, LDA is used to provide dimension reduction and to identify semantic relationships between topic-document and word-topic pairs using probability distributions. We report several experiments conducted using several evaluation techniques (criteria-based, gold-standard, expert, task-based, and corpus-based evaluation). We also compare the results of LEOnto+ with two existing methods using their respective datasets. The evaluation results show that LEOnto+ outperforms the aforementioned methods, particularly in terms of precision. We also evaluate our approach on two large corpora in order to demonstrate its scalability. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
7. In favour of or against multi-lingual Q&A sites? Exploring the evidence from user and knowledge perspectives.
- Author
- Jia, Junfang, Tumanian, Valeriia, and Li, Guoqiang
- Subjects
- *COMPUTER software, *STATISTICS, *SEMANTICS, *ENGLISH language, *MULTILINGUALISM, *CONSUMER attitudes, *INTELLECT, *SEMANTIC Web, *INFORMATION retrieval, *RESEARCH funding, *WEB development, *ARTIFICIAL neural networks, *DATA analysis, *STATISTICAL sampling, *WORLD Wide Web, *TRANSLATIONS
- Abstract
Many Q&A sites initially run only in English and then gradually release multi-lingual variants to serve users who speak other languages. The launch of such multi-lingual sites always leads to an intense dispute about their pros and cons. Although all arguments and concerns sound reasonable, people can rarely provide solid evidence to convince each other. In this paper, from users' comments about the launch of several non-English Stack Overflow sites, we first identify three major concerns: community split, knowledge needs and interests in other languages, and knowledge fragmentation and duplication. To validate these three concerns, we conduct an evidence-based data analysis and comparison of user characteristics, tag usage and cross-site links between the Russian Stack Overflow and the English Stack Overflow. Our study sheds light on the existence value and risks of multi-lingual Q&A sites. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
8. Exploiting Subspace Relation in Semantic Labels for Cross-Modal Hashing.
- Author
- Shen, Heng Tao, Liu, Luchen, Yang, Yang, Xu, Xing, Huang, Zi, Shen, Fumin, and Hong, Richang
- Subjects
- *BINARY codes, *INFORMATION retrieval, *MATHEMATICAL optimization, *COMPUTER programming education, *DATA mapping
- Abstract
Hashing methods have been extensively applied to efficient multimedia data indexing and retrieval on account of the explosion of multimedia data. Cross-modal hashing usually learns binary codes by mapping multi-modal data into a common Hamming space. Most supervised methods utilize relation information, such as class labels, as pairwise similarities of cross-modal data pairs to narrow the intra-modal and inter-modal gap. In this paper, we propose a novel supervised cross-modal hashing method dubbed Subspace Relation Learning for Cross-modal Hashing (SRLCH), which exploits the relation information of labels in semantic space to make similar data from different modalities closer in the low-dimensional Hamming subspace. SRLCH preserves the modality relationships, the discrete constraints and nonlinear structures, while admitting a closed-form binary code solution, which effectively enhances training efficiency. An iterative alternating optimization algorithm is developed to simultaneously learn both the hash functions and the unified binary codes. With these binary codes and hash functions, we can index multimedia data and search them efficiently. Evaluations on two cross-modal retrieval tasks over several widely used datasets show that the proposed SRLCH outperforms most cross-modal hashing methods. Theoretical analysis also explains why subspace relation learning improves the method's performance. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
9. A Personalized Search Query Generating Method for Safety-Enhanced Vehicle-to-People Networks.
- Author
- Yan, Xiaodan, Zhang, Jiwei, Elahi, Haroon, Jiang, Meiyi, and Gao, Hui
- Subjects
- *INFORMATION storage & retrieval systems, *INFORMATION retrieval, *SEARCH engines, *TRAFFIC safety, *TRAFFIC accidents, *INFORMATION services
- Abstract
Distracted driving due to smartphone use is one of the key causes of road accidents. However, 6G super-heterogeneous network systems and highly differentiated application scenarios require highly elastic and endogenous information services, involving the use of smart apps and related information retrieval by drivers in modern Vehicle-to-People (V2P) networks. The tension arising from the conflicting attention requirements of driving and information retrieval can be resolved by designing information retrieval solutions that demand minimal user interaction. In this paper, we construct a Personalized Search Query Generator (PSQG) to reduce driver-mobile interaction during information retrieval in the 6G era. This system has a query generator and a query recommendation component that dynamically update two sets of relationships: one between the query and the title, the other between search and recommendation. The proposed system learns a user's intent from historical query records and recommends personalized queries, thus reducing driver-mobile interaction time. We deploy the system in a real search engine and conduct several online experiments, using a custom-constructed dataset comprising ten million samples. We use the BLEU-score metric and perform A/B testing. The results demonstrate that our system can assist users in making precise queries efficiently, and it can improve drivers' safety if used in smartphones and other in-vehicle information retrieval systems. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
10. Multi-Turn Video Question Generation via Reinforced Multi-Choice Attention Network.
- Author
- Guo, Zhaoyu, Zhao, Zhou, Jin, Weike, Wei, Zhicheng, Yang, Min, Wang, Nannan, and Yuan, Nicholas Jing
- Subjects
- *VIDEOS, *INFORMATION retrieval, *REINFORCEMENT learning, *MACHINE learning
- Abstract
Video question generation (VQG) is a challenging task in visual information retrieval: generating questions given a sequence of video frames. Existing methods mainly tackle single-turn video question generation, but a single-turn conversation usually cannot meet the needs of video information acquisition. In this paper, we propose a new framework for single-turn VQG, which introduces an attention mechanism to process inference over the dialog history, and a selection mechanism to choose among the candidate questions generated from each round of dialog history. In the framework, we leverage a recent video question answering model to predict the answer to the generated question and adopt the answer quality as the reward to fine-tune our model with a reinforcement learning mechanism. We also introduce the new task of multi-turn video question generation (M-VQG): generating multiple questions based on dialog history and video information to build a conversation step by step. Our method achieves state-of-the-art performance on the single-turn VQG task on two large-scale datasets, YouTube-Clips and TACoS-MultiLevel, and provides a baseline approach for the M-VQG task. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
11. Creation, Storage and Presentation of Information Content - Semantics, Sharing, Presentation, and Archiving.
- Author
- Masner, Jan, Šimek, Pavel, Kánská, Eva, and Vaněk, Jiří
- Subjects
- *INFORMATION retrieval, *SEMANTIC Web, *SEMANTICS, *WORLD Wide Web, *SEARCH engines, *OPEN source software, *ELECTRONIC publications
- Abstract
People are getting more and more used to consuming digital, online content. Many outlets have switched to online publication or at least increased their online presence. Moreover, online publication is no longer the sole domain of publishing houses: many different organisations and companies, including those in agriculture and rural development, provide online content in the form of articles. The importance of the semantic web is growing constantly; together with metadata descriptions, it is necessary for all current search engines, smart assistants and AI technologies. Public standards and open-source software can significantly speed up development and reduce costs when it comes to the Internet and the World Wide Web. The paper provides an overview of an updated methodology for the creation, storage and presentation of online information content in the World Wide Web environment. The latest research focused mainly on presentation and semantics. The whole research process is described, as well as the final formulation of the methodology. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
12. Query Expansion With Local Conceptual Word Embeddings in Microblog Retrieval.
- Author
- Wang, Yashen, Huang, Heyan, and Feng, Chong
- Subjects
- *INFORMATION retrieval, *INFORMATION needs, *VOCABULARY
- Abstract
Since the length of microblog texts, such as tweets, is strictly limited to 140 characters, traditional Information Retrieval techniques suffer severely from the vocabulary mismatch problem and cannot yield good performance in the microblogosphere. To address this critical challenge, in this paper we focus on the use of local conceptual word embeddings to enhance microblog retrieval effectiveness. In particular, we propose a novel k-Nearest Neighbor (kNN) based Query Expansion (QE) algorithm that generates words from local word embeddings to expand the original query, which leads to a better understanding of the information need. Besides, in order to further satisfy users’ real-time information needs, we incorporate temporal evidence into the expansion algorithm, which can boost recent tweets in the retrieval results with respect to a given topic. Experimental results on the official TREC Twitter corpora demonstrate the significant superiority of our approach over baseline methods. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
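The core idea of the kNN-based query expansion described in the abstract above can be sketched in a few lines: for each query term, rank the vocabulary by cosine similarity in an embedding space and append the nearest neighbours to the query. This is a minimal illustration, not the authors' exact algorithm (their embeddings are locally trained and temporally weighted); the toy embedding values below are hypothetical.

```python
import math

# Toy word embeddings (hypothetical values, for illustration only).
EMBEDDINGS = {
    "storm":   [0.9, 0.1, 0.0],
    "weather": [0.8, 0.2, 0.1],
    "rain":    [0.7, 0.3, 0.0],
    "concert": [0.0, 0.9, 0.4],
    "music":   [0.1, 0.8, 0.5],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_expand(query_terms, k=2):
    """Return the query plus the k nearest neighbours of each query term."""
    expanded = list(query_terms)
    for term in query_terms:
        if term not in EMBEDDINGS:
            continue
        neighbours = sorted(
            (w for w in EMBEDDINGS if w != term and w not in expanded),
            key=lambda w: cosine(EMBEDDINGS[term], EMBEDDINGS[w]),
            reverse=True,
        )
        expanded.extend(neighbours[:k])
    return expanded

print(knn_expand(["storm"]))  # → ['storm', 'weather', 'rain']
```

In a real system the expanded term list would then be fed back into the retrieval model, optionally down-weighting the added terms relative to the originals.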
13. Event Detection on Microposts: A Comparison of Four Approaches.
- Author
- Bhardwaj, Akansha, Blarer, Albert, Cudre-Mauroux, Philippe, Lenders, Vincent, Motik, Boris, Tanner, Axel, and Tonon, Alberto
- Subjects
- *INFORMATION retrieval, *INFORMATION resources, *NUMBER systems, *KEYWORD searching, *TIME series analysis, *SEMANTIC Web
- Abstract
Microblogging services such as Twitter are important, up-to-date, and live sources of information on a multitude of topics and events. An increasing number of systems use such services to detect and analyze events in real-time as they unfold. In this context, we recently proposed ArmaTweet—a system developed in collaboration among armasuisse and the Universities of Oxford and Fribourg to support semantic event detection on Twitter streams. Our experiments have shown that ArmaTweet is successful at detecting many complex events that cannot be detected by simple keyword-based search methods alone. Building on this work, we explore in this paper several approaches for event detection on microposts. In particular, we describe and compare four different approaches based on keyword search (Plain-Seed-Query), information retrieval (Temporal Query Expansion), Word2Vec word embeddings (Embedding), and semantic retrieval (ArmaTweet). We provide an extensive empirical evaluation of these techniques using a benchmark dataset of about 200 million tweets on six event categories that we collected. While the performance of individual systems varies depending on the event category, our results show that ArmaTweet outperforms the other approaches on five out of six categories, and that a combined approach offers the highest recall without adversely affecting the precision of event detection. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
14. An Art of Review on Conceptual based Information Retrieval.
- Author
- Mahalakshmi, P. and Fathima, N. Sabiyath
- Subjects
- *CONCEPTUAL art, *INFORMATION retrieval, *QUERY (Information retrieval system), *SEMANTICS, *KEYWORD searching, *DOCUMENT clustering
- Abstract
In conventional information retrieval systems, keywords are used to index and retrieve documents for a user query. When more than one keyword is used to define a single concept in documents and queries, keyword-based retrieval systems produce inaccurate and incomplete results. Additionally, manual intervention is required to determine the semantic relationships between related keywords in order to produce accurate results, which has paved the way for semantic search. Various research has been carried out on concept-based information retrieval to tackle the difficulties caused by conventional keyword search and semantic search systems. This paper elucidates the various representations of text responsible for retrieving relevant search results, the approaches and evaluations carried out in conceptual information retrieval, and the challenges faced by existing research, in order to expound the requirements of future research. In addition, the conceptual information extracted from different sources to support semantic representation in existing systems is discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
15. Research Collaboration and Authorship Pattern in the field of Semantic Digital Libraries.
- Author
- Pandey, Shriram and Sahoo, Sidhartha
- Subjects
- *DIGITAL libraries, *AUTHORSHIP collaboration, *SEMANTICS, *METADATA, *INFORMATION retrieval, *NATURAL language processing, *INFORMATION science, *SEMANTIC Web
- Abstract
This study aims to explore research collaborations and authorship patterns in the field of semantic digital libraries (SDL). The data (N = 2075) were extracted from the Scopus database using keywords related to semantic digital libraries, considering all types of publications during 1983–2019. The analysis of each document is based on the following scientometric indicators: author productivity, degree of collaboration, collaboration index, collaboration coefficient and modified collaboration coefficient. Correlation matrices were also calculated and inferences drawn in terms of authors and publications. The network visualisation tool VOSviewer was used to present authorship correlation network strength and keyword mapping for better insight into emerging areas in the field of SDL. The resulting average degree of collaboration of 0.898 indicates that a large number of publications are multi-authored and that there is a high level of collaborative research in the field of semantic digital libraries. Meghini C from the Institute of Information Science and Technologies, Italy, has produced the highest number of research papers (n = 18), whereas Egenhofer MJ was found to be a highly influential author with 851 citations in the studied domain. Results also reveal that the focus areas of research related to SDL include digital libraries, the semantic web, ontology, metadata and information retrieval. However, keywords such as natural language processing system, computational linguistics and linked data are also repeated frequently in the published literature, revealing emerging areas of future research in the domain of SDL. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
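The "degree of collaboration" of 0.898 reported in the abstract above is conventionally computed (following Subramanyam's formula) as the ratio of multi-authored publications to all publications. A minimal sketch, assuming that standard formulation and a hypothetical sample of author counts:

```python
def degree_of_collaboration(author_counts):
    """Subramanyam's degree of collaboration:
    multi-authored papers divided by total papers."""
    if not author_counts:
        return 0.0
    multi = sum(1 for n in author_counts if n > 1)
    return multi / len(author_counts)

# Hypothetical sample: number of authors on each of ten publications.
papers = [1, 2, 3, 1, 4, 2, 5, 3, 2, 1]
print(degree_of_collaboration(papers))  # → 0.7
```

A value close to 1, like the 0.898 reported for SDL, means nearly all publications in the sample have two or more authors.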
16. Detecting trends in academic research from a citation network using network representation learning.
- Author
- Asatani, Kimitaka, Mori, Junichiro, Ochi, Masanao, and Sakata, Ichiro
- Subjects
- *CITATION networks, *EDUCATION research, *INFORMATION retrieval, *ALTERNATIVE fuels, *LINEAR algebra
- Abstract
Several network features and information retrieval methods have been proposed to elucidate the structure of citation networks and to detect important nodes. However, it is difficult to retrieve information related to trends in an academic field and to detect cutting-edge areas from a citation network. In this paper, we propose a novel framework that detects a trend as the growth direction of a citation network using network representation learning (NRL). We presume that the linear growth of a citation network in the latent space obtained by NRL is the result of the iterative edge-addition process of the citation network. On APS datasets and papers from several domains of the Web of Science, we confirm the existence of trends by observing that an academic field grows linearly in a specific direction in latent space. Next, we calculate each node's degree of trend-following as an indicator called the intrinsic publication year (IPY). As a result, there is a correlation between this indicator and the number of future citations. Furthermore, a word frequently used in the abstracts of cutting-edge papers (high-IPY papers) is likely to be used often in future publications. These results confirm the validity of the detected trend for predicting citation network growth. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
17. A Novel Method on Information Recommendation via Hybrid Similarity.
- Author
- Zhao, Qin, Wang, Cheng, Wang, Pengwei, Zhou, MengChu, and Jiang, Changjun
- Subjects
- *DATA mining, *INFORMATION retrieval
- Abstract
Link similarity is widely applied to measure the similarity between such objects as Web pages, scientific papers, and social networks. However, the existing methods for measuring it have some deficiencies: they cannot handle semantically similar contents, and their computation may not lead to accurate results in some cases. This paper presents a novel method to address these problems. It introduces semantic similarity into the calculation of the similarity between two given objects, overcoming the drawback that existing methods ignore the semantic information of objects. It also gives a novel computation function that makes the similarity results more accurate. [ABSTRACT FROM PUBLISHER]
- Published
- 2018
- Full Text
- View/download PDF
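The abstract above does not give the authors' actual computation function, but the general shape of a hybrid link-plus-semantic similarity can be sketched as a weighted blend of a link-based measure (here Jaccard overlap of neighbour sets, a common choice) and a content-based cosine similarity. All names, vectors, and the weight `alpha` below are illustrative assumptions, not the paper's method.

```python
import math

def link_similarity(links_a, links_b):
    """Jaccard similarity over two objects' link (neighbour) sets."""
    a, b = set(links_a), set(links_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def semantic_similarity(vec_a, vec_b):
    """Cosine similarity over content feature vectors."""
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    na = math.sqrt(sum(x * x for x in vec_a))
    nb = math.sqrt(sum(y * y for y in vec_b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_similarity(links_a, links_b, vec_a, vec_b, alpha=0.5):
    """Linear blend of link and semantic similarity; alpha is a free weight."""
    return (alpha * link_similarity(links_a, links_b)
            + (1 - alpha) * semantic_similarity(vec_a, vec_b))

# Hypothetical objects: partially overlapping links, similar content vectors.
s = hybrid_similarity({"p1", "p2", "p3"}, {"p2", "p3", "p4"},
                      [1.0, 0.5], [0.9, 0.6], alpha=0.5)
print(round(s, 3))
```

The semantic component lets two objects score as similar even when their link neighbourhoods barely overlap, which is exactly the failure mode of pure link similarity that the abstract points out.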
18. An analytical study of content and context of keywords on physics.
- Author
- Dutta, Bidyarthi
- Subjects
- *PHYSICS, *KEYWORDS, *SEMANTICS, *CONTEXTUAL analysis, *VOCABULARY
- Abstract
This paper is based on the analysis of author-assigned and title keywords and their constituent component words collected from 769 articles published in the journal Low Temperature Physics from 2006 to 2010. The total number of distinct keywords is 1155, of which 869 are single keywords with a total occurrence frequency of 2287. The single keywords have been categorized into four broad classes: eponymous word, form word, acronym and semantic word. A semantic word bears several contexts and may thus be considered relevant in several other subject areas. The probable subject areas have been identified with the aid of two popular online reference tools. The semantic words are further categorized into twelve classes according to their contexts. Some parameters have been defined on the basis of associations among words and the consequent formation of keywords, i.e. Word Association Density, Word Association Coefficient and Keyword Formation Density. The values of these parameters have been observed for different word categories. This study reveals the statistics of word associations tending toward keyword formation, and the allied subject domains also become predictable from it. [ABSTRACT FROM AUTHOR]
- Published
- 2020
19. Learning cross-modal correlations by exploring inter-word semantics and stacked co-attention.
- Author
- Yu, Jing, Lu, Yuhang, Zhang, Weifeng, Qin, Zengchang, Liu, Yanbing, and Hu, Yue
- Subjects
- *ARTIFICIAL neural networks, *SEMANTICS, *FEATURE extraction, *DEEP learning, *INFORMATION retrieval, *COST functions
- Abstract
• Featured graph models inter-word semantics of texts.
• Stacked co-attention learns fine-grained correlations.
• 19% increase on MAP.

Cross-modal information retrieval aims to find heterogeneous data of various modalities from a given query of one modality. The main challenge is to learn the semantic correlations between different modalities and measure the distance across modalities. For text-image retrieval, existing work mostly uses an off-the-shelf Convolutional Neural Network (CNN) for image feature extraction. For texts, word-level features such as bag-of-words or word2vec are employed to build deep learning models to represent texts. Besides word-level semantics, the semantic relations between words are also informative but less explored. In this paper, we explore inter-word semantics by modelling texts as graphs using a similarity measure based on word2vec. Besides feature representations, we further study the problem of information imbalance between different modalities when describing the same semantics. For example, textual descriptions often contain background information that cannot be conveyed by images, and vice versa. We propose a stacked co-attention network to progressively learn the mutually attended features of different modalities and enhance their fine-grained correlations. A dual-path neural network is proposed for cross-modal information retrieval. The model is trained with a pairwise similarity loss function to maximize the similarity of relevant text-image pairs and minimize the similarity of irrelevant pairs. Experimental results show that the proposed model outperforms the state-of-the-art methods significantly, with a 19% improvement in accuracy for the best case. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
20. Long-Form Video Question Answering via Dynamic Hierarchical Reinforced Networks.
- Author
- Zhao, Zhou, Zhang, Zhu, Xiao, Shuwen, Xiao, Zhenxin, Yan, Xiaohui, Yu, Jun, Cai, Deng, and Wu, Fei
- Subjects
- *STREAMING video & television, *NATURAL languages, *VIDEOS, *INFORMATION retrieval, *REINFORCEMENT learning, *QUESTIONING
- Abstract
Open-ended long-form video question answering is a challenging task in visual information retrieval, which automatically generates a natural language answer from the referenced long-form video contents according to a given question. However, the existing works mainly focus on short-form video question answering, due to the lack of modeling semantic representations from long-form video contents. In this paper, we introduce a dynamic hierarchical reinforced network for open-ended long-form video question answering, which employs an encoder–decoder architecture with a dynamic hierarchical encoder and a reinforced decoder. Concretely, we first propose a frame-level dynamic long-short term memory (LSTM) network with binary segmentation gate to learn frame-level semantic representations according to the given question. We then develop a segment-level highway LSTM network with a question-aware highway gate for segment-level semantic modeling. Furthermore, we devise the reinforced decoder with a hierarchical attention mechanism to generate natural language answers. We construct a large-scale long-form video question answering dataset. The extensive experiments on the long-form dataset and another public short-form dataset show the effectiveness of our method. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
21. Should one use term proximity or multi-word terms for Arabic information retrieval?
- Author
-
El Mahdaouy, Abdelkader, Gaussier, Eric, and El Alaoui, Saïd Ouatik
- Subjects
- *
INFORMATION retrieval , *PROXIMITY matrices , *DEPENDENCY grammar , *SEMANTICS , *ARABIC language -- Terms & phrases , *WORD formation (Grammar) - Abstract
• Explore whether term dependencies (TDs) can help improve Arabic IR systems. • Consider explicit term dependencies based on MWTs and implicit term proximity. • The study provides complete extensions and their comparison to deal with TDs. • Formal condition that IR models should satisfy to deal with term dependencies. • The difference between both TDs performances is not statistically significant. Recently, several information retrieval (IR) models have been proposed in order to boost the retrieval performance using term dependencies. However, in the context of the Arabic language, most IR researchers have focused on the problem of stemming, which is highly challenging in this language. In this paper, we propose to explore whether term dependencies can help improve Arabic IR systems, and what are the best methods to use. To do so, we consider both explicit term dependencies based on multi-word terms (MWTs) that are extracted using syntactic patterns and statistical filters, as well as implicit ones based on the notion of cross-terms or term proximities. Our experiments, performed on standard TREC Arabic IR collections, show the importance of taking into account term dependencies for Arabic IR. To the best of our knowledge, this is the first study that provides complete extensions, and their comparison, of most standard IR models to deal with term dependencies in the Arabic language. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
22. Modeling and Learning Distributed Word Representation with Metadata for Question Retrieval.
- Author
-
Zhou, Guangyou and Huang, Jimmy Xiangji
- Subjects
- *
METADATA , *FACILITATED learning , *INTERROGATIVE (Grammar) , *DOCUMENT type definitions , *ARCS Model of Motivational Design - Abstract
Community question answering (cQA) has become an important issue due to the popularity of cQA archives on the Web. This paper focuses on addressing the lexical gap problem in question retrieval. Question retrieval in cQA archives aims to find the existing questions that are semantically equivalent or relevant to the queried questions. However, the lexical gap problem brings a new challenge for question retrieval in cQA. In this paper, we propose to model and learn distributed word representations with metadata of category information within cQA pages for question retrieval using two novel category powered models. One is a basic category powered model called MB-NET and the other one is an enhanced category powered model called ME-NET which can better learn the distributed word representations and alleviate the lexical gap problem. To deal with the variable size of word representation vectors, we employ the Fisher kernel framework to transform them into fixed-length vectors. Experimental results on large-scale English and Chinese cQA data sets show that our proposed approaches can significantly outperform state-of-the-art retrieval models for question retrieval in cQA. Moreover, we further conduct our approaches on large-scale automatic evaluation experiments. The evaluation results show that promising and significant performance improvements can be achieved. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
23. Relations in KOS: is it possible to couple a common nature with different roles?
- Author
-
Mazzocchi, Fulvio
- Subjects
- *
KNOWLEDGE management , *HERMENEUTICS , *SYSTEMS design , *INFORMATION science , *SEMANTIC computing - Abstract
Purpose -- The purpose of this paper, which extends and deepens what was expressed in a previous work (Mazzocchi et al., 2007), is to scrutinize the underlying assumptions of the types of relations included in thesauri, particularly the genus-species relation. Logicist approaches to information organization, which are still dominant, are compared with hermeneutically oriented approaches. In the light of these approaches, the nature and features of the relations, and what the notion of a priori could possibly mean with regard to them, are examined, together with the implications for designing and implementing knowledge organization systems (KOS). Design/methodology/approach -- The inquiry is based on how the relations are described in the literature, engaging in particular in a discussion with Hjørland (2015) and Svenonius (2004). The philosophical roots of today's leading views are briefly illustrated, in order to put them in perspective and deconstruct the uncritical reception of their authority. To corroborate the discussion, a semantic analysis of specific terms and relations is also provided. Findings -- All relations should be seen as "perspectival" (not as a priori). On the other hand, different types of relations, depending on the conceptual features of the terms involved, can hold a different degree of "stability." On this basis, they could be used to address different information concerns (e.g. interoperability vs expressiveness). Research limitations/implications -- Some arguments that the paper puts forth at the conceptual level need to be tested in application contexts. Originality/value -- This paper considers that the standpoints of logic and of hermeneutics (usually seen as conflicting) are both significant for information organization, and could be pragmatically integrated. In accordance with this view, an extension of the thesaurus relations' set is advised, meaning that perspective hierarchical relations (i.e. relations that are not logically based but function contingently) should also be included in such a set. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
24. Making knowledge machine-processable: some implications of general semantic search.
- Author
-
Waller, Vivienne
- Subjects
- *
CLASSIFICATION , *INFORMATION retrieval , *SEMANTICS , *WORLD Wide Web , *REFERENCE sources , *SEARCH engines - Abstract
The central argument of this paper is that the design, implementation and use of technologies that underpin general semantic search have implications for what we know and the way in which knowledge is understood. Semantic search is an assemblage of technologies that most Internet users would use regularly without necessarily realising it. Users of search engines implementing semantic search can obtain answers to questions rather than just retrieve pages that include their search query. This paper critically examines the design of the Semantic Web, upon which semantic search is based. It demonstrates that implicit in the design of the Semantic Web are particular assumptions about the nature of classification and the nature of knowledge. The Semantic Web was intended for interoperability within specific domains. It is here argued that the extension to general semantic search, for use by the general public, has implications for what type of knowledge is visible and what counts as legitimate knowledge. The provision of a definitive answer to a query, via the reduction of discursive knowledge into machine-processable data, provides the illusion of objectivity and authority in a way that is increasingly impenetrable to critical scrutiny. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
25. Detecting, classifying, and tracing non-functional software requirements.
- Author
-
Mahmoud, Anas and Williams, Grant
- Subjects
- *
COMPUTER software quality control , *FRAD (Conceptual model) , *COMPUTER software selection , *DOCUMENT clustering , *SOURCE code , *COMPUTER software - Abstract
In this paper, we describe a novel unsupervised approach for detecting, classifying, and tracing non-functional software requirements (NFRs). The proposed approach exploits the textual semantics of software functional requirements (FRs) to infer potential quality constraints enforced in the system. In particular, we conduct a systematic analysis of a series of word similarity methods and clustering techniques to generate semantically cohesive clusters of FR words. These clusters are classified into various categories of NFRs based on their semantic similarity to basic NFR labels. Discovered NFRs are then traced to their implementation in the solution space based on their textual semantic similarity to source code artifacts. Three software systems are used to conduct the experimental analysis in this paper. The results show that methods that exploit massive sources of textual human knowledge are more accurate in capturing and modeling the notion of similarity between FR words in a software system. Results also show that hierarchical clustering algorithms are more capable of generating thematic word clusters than partitioning clustering techniques. In terms of performance, our analysis indicates that the proposed approach can discover, classify, and trace NFRs with accuracy levels that can be adequate for practical applications. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
26. Introducing Multimedia Information Retrieval to libraries.
- Author
-
Raieli, Roberto
- Subjects
- *
MULTIMEDIA systems , *DIGITAL libraries , *INFORMATION retrieval , *SEMANTICS , *INFORMATION processing - Abstract
The paper aims to introduce libraries to the view that operating within the terms of traditional Information Retrieval (IR), only through textual language, is limiting, and that broader criteria, such as those of Multimedia Information Retrieval (MIR), are necessary. The paper traces the history of MIR's fundamental principles, from the early years of questioning in documentation to today's theories on semantic means. New issues for a LIS methodology of processing and searching multimedia documents are theoretically argued, introducing MIR as a holistic whole composed of content-based and semantic information retrieval methodologies. MIR offers a better way of searching for information: every kind of digital document can be analyzed and retrieved through the elements of language appropriate to its own nature. The MIR approach directly handles the concrete content of documents, while also considering semantic aspects. The paper's conclusions highlight the organic integration of the revolutionary contentual conception of information processing with an improved conception of semantics, gathering and composing the advantages of both systems for accessing information. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
27. Local Semantic-Aware Deep Hashing With Hamming-Isometric Quantization.
- Author
-
Wang, Yunbo, Liang, Jian, Cao, Dong, and Sun, Zhenan
- Subjects
- *
HASHING , *BINARY codes , *HAMMING distance , *INFORMATION retrieval , *BIG data , *ARTIFICIAL neural networks , *IMAGE retrieval - Abstract
Hashing is a promising approach for compact storage and efficient retrieval of big data. Compared to conventional hashing methods using handcrafted features, emerging deep hashing approaches employ deep neural networks to learn both feature representations and hash functions, which have been proven to be more powerful and robust in real-world applications. Currently, most existing deep hashing methods construct pairwise or triplet-wise constraints to obtain similar binary codes between a pair of similar data points or relatively similar binary codes within a triplet. However, we argue that some critical local structures have not been fully exploited. So, this paper proposes a novel deep hashing method named local semantic-aware deep hashing with Hamming-isometric quantization (LSDH), aiming to make full use of local similarity in hash function learning. Specifically, the potential semantic relation is exploited to robustly preserve the local similarity of data in the Hamming space. In addition to reducing the error introduced by binary quantization, a Hamming-isometric objective is designed to maximize the consistency of similarity between pairwise binary-like features and their corresponding binary code pairs, which is shown to improve the quality of the binary codes. Extensive experimental results on several benchmark datasets, including three single-label datasets and one multi-label dataset, demonstrate that the proposed LSDH achieves better performance than the latest state-of-the-art hashing methods. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
28. Deep Ordinal Hashing With Spatial Attention.
- Author
-
Jin, Lu, Shu, Xiangbo, Li, Kai, Li, Zechao, Qi, Guo-Jun, and Tang, Jinhui
- Subjects
- *
INFORMATION retrieval , *HASHING , *COMPUTER vision , *INFORMATION sharing , *ALGORITHMS - Abstract
Hashing has attracted increasing research attention in recent years due to its high efficiency of computation and storage in image retrieval. Recent works have demonstrated the superiority of simultaneously learning feature representations and hash functions with deep neural networks. However, most existing deep hashing methods directly learn hash functions by encoding global semantic information, while ignoring the local spatial information of images. The loss of local spatial structure creates a performance bottleneck for hash functions, limiting their application for accurate similarity retrieval. In this paper, we propose a novel deep ordinal hashing (DOH) method, which learns ordinal representations to generate ranking-based hash codes by leveraging the ranking structure of the feature space from both local and global views. In particular, to effectively build the ranking structure, we propose to learn the rank correlation space by simultaneously exploiting the local spatial information from a fully convolutional network and the global semantic information from a convolutional neural network. More specifically, an effective spatial attention model is designed to capture the local spatial information by selectively learning well-specified locations closely related to target objects. In this hashing framework, the local spatial and global semantic nature of images is captured in an end-to-end ranking-to-hashing manner. Experimental results conducted on three widely used datasets demonstrate that the proposed DOH method significantly outperforms the state-of-the-art hashing methods. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
29. Automating the search for a patent’s prior art with a full text similarity search.
- Author
-
Helmers, Lea, Horn, Franziska, Biegler, Franziska, Oppermann, Tim, and Müller, Klaus-Robert
- Subjects
- *
FULL-text databases , *PATENT infringement , *PATENT applications , *KEYWORD searching , *NATURAL language processing - Abstract
More than ever, technical inventions are the symbol of our society’s advance. Patents guarantee their creators protection against infringement. For an invention to be patentable, its novelty and inventiveness have to be assessed. Therefore, a search for published work that describes similar inventions to a given patent application needs to be performed. Currently, this so-called search for prior art is executed with semi-automatically composed keyword queries, which is not only time consuming, but also prone to errors. In particular, errors may systematically arise by the fact that different keywords for the same technical concepts may exist across disciplines. In this paper, a novel approach is proposed, where the full text of a given patent application is compared to existing patents using machine learning and natural language processing techniques to automatically detect inventions that are similar to the one described in the submitted document. Various state-of-the-art approaches for feature extraction and document comparison are evaluated. In addition to that, the quality of the current search process is assessed based on ratings of a domain expert. The evaluation results show that our automated approach, besides accelerating the search process, also improves the search results for prior art with respect to their quality. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
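As a minimal illustration of the kind of full-text similarity search the abstract above describes, here is a TF-IDF/cosine sketch; the tokenization, weighting scheme, and function names are assumptions for illustration, not the paper's pipeline (which evaluates several richer feature-extraction approaches):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a small corpus."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(t for tokens in tokenized for t in set(tokens))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Ranking all published documents by cosine similarity to a query patent's full text then replaces the manual keyword-query step.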
30. Disease vocabulary size as a surrogate marker for physicians’ disease knowledge volume.
- Author
-
Tanaka, Hiroaki, Ueda, Kazuhiro, Watanuki, Satoshi, Watari, Takashi, Tokuda, Yasuharu, and Okumura, Takashi
- Subjects
- *
PHYSICIANS , *MEDICAL terminology , *DECISION support systems , *PROBLEM solving , *QUESTIONNAIRES - Abstract
Objective: Recognizing what physicians know and do not know about a particular disease is one of the keys to designing clinical decision support systems, since these systems can fulfill complementary role by recognizing this boundary. To our knowledge, however, no study has attempted to quantify how many diseases physicians actually know and thus the boundary is unclear. This study explores a method to solve this problem by investigating whether the vocabulary assessment techniques developed in the linguistics field can be applied to assess physicians’ knowledge. Methods: The test design required us to pay special attention to disease knowledge assessment. First, to avoid imposing unnecessary burdens on the physicians, we chose a self-assessment questionnaire that was straightforward to fill out. Second, to prevent overestimation, we used a “pseudo-word” approach: fictitious diseases were included in the questionnaire, and positive responses to them were penalized. Third, we used paper-based tests, rather than computer-based ones, to further prevent participants from cheating by using a search engine. Fourth, we selectively used borderline diseases, i.e., diseases that physicians might or might not know about, rather than well-known or little-known diseases, in the questionnaire. Results: We collected 102 valid answers from 109 physicians who attended the seminars we conducted. On the basis of these answers, we estimated that the average physician knew of 2008 diseases (95% confidence interval: (1939, 2071)). This preliminary estimation agrees with the guideline for the national license examination in Japan, suggesting that this vocabulary assessment was able to evaluate physicians’ knowledge. The survey included physicians with various backgrounds, but there were no significant differences between subgroups. 
Other implications for research on clinical decision support, and limitations of the sampling method adopted in this study, are also discussed, toward more rigorous estimation in future surveys. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
31. Hyperspectral Unmixing Based on Dual-Depth Sparse Probabilistic Latent Semantic Analysis.
- Author
-
Fernandez-Beltran, Ruben, Plaza, Antonio, Plaza, Javier, and Pla, Filiberto
- Subjects
- *
REMOTE sensing , *HYPERSPECTRAL imaging systems , *LATENT semantic analysis , *INFORMATION retrieval , *IMAGING systems - Abstract
This paper presents a novel approach for spectral unmixing of remotely sensed hyperspectral data. It exploits probabilistic latent topics in order to take advantage of the semantics pervading the latent topic space when identifying spectral signatures and estimating fractional abundances from hyperspectral images. Despite the contrasted potential of topic models to uncover image semantics, they have been merely used in hyperspectral unmixing as a straightforward data decomposition process. This limits their actual capabilities to provide semantic representations of the spectral data. The proposed model, called dual-depth sparse probabilistic latent semantic analysis (DEpLSA), makes use of two different levels of topics to exploit the semantic patterns extracted from the initial spectral space in order to relieve the ill-posed nature of the unmixing problem. In other words, DEpLSA defines a first level of deep topics to capture the semantic representations of the spectra, and a second level of restricted topics to estimate endmembers and abundances over this semantic space. An experimental comparison is conducted with two standard topic models and seven state-of-the-art unmixing methods from the literature. Our experiments, conducted using four different hyperspectral images, reveal that the proposed approach is able to provide competitive advantages over available unmixing approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
32. Kernel-Based Semantic Hashing for Gait Retrieval.
- Author
-
Zhou, Yucan, Huang, Yongzhen, Hu, Qinghua, and Wang, Liang
- Subjects
- *
BIOMETRIC identification , *GAIT in humans , *KERNEL functions , *HAMMING distance , *SIMILARITY judgment , *INFORMATION retrieval , *VIDEO surveillance - Abstract
Quickly retrieving a specific person is very important for locating and tracking missing people as well as suspects. However, the well-studied face-based and appearance-based individual retrieval methods are ineffective in surveillance scenarios because of long capture distances, low camera resolutions, long time intervals, and complex lighting conditions. To avoid the disadvantages of face-based and appearance-based methods, we propose to retrieve individuals from surveillance videos with the gait biometric, which has been proven beneficial for remote person recognition and robust to lighting variations. What’s more, the gait biometric can be collected without conscious cooperation, making data collection much easier. But it varies greatly with the view angles, the clothing style, and the carrying conditions. Therefore, the videos of the target person from a similar view angle with the same clothing style and carrying conditions should rank higher than the others. To achieve this purpose and improve efficiency, this paper proposes a kernel-based semantic hashing (KSH) model, which is learnt by optimizing a semantic triplet ranking loss. Specifically, in the training phase, a semantic similarity score, which depends on the view angles, the clothing style, and the carrying conditions, is calculated for each training pair. Then, a weighted triplet loss considering these semantic scores is designed, which encourages videos with a higher score to stay closer to the gallery in the binary Hamming space. To evaluate the performance of the proposed method, we compare it with several methods on the CASIA Gait Database B and the OU-ISIR Gait Database. The experimental results demonstrate that KSH is effective and efficient. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
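The weighted triplet ranking loss described in the abstract above can be sketched on binary codes in Hamming space; the exact way the semantic score enters the loss (here, scaling the hinge) is an assumption for illustration, not the authors' formulation:

```python
def hamming(a, b):
    """Hamming distance between two binary codes of equal length."""
    return sum(x != y for x, y in zip(a, b))

def weighted_triplet_loss(anchor, positive, negative, sem_score, margin=2.0):
    """Triplet ranking hinge on binary codes: the positive should be
    at least `margin` bits closer to the anchor than the negative.
    A higher semantic score (similar view/clothing/carrying) weights
    the violation more, pulling such videos closer to the gallery."""
    violation = hamming(anchor, positive) - hamming(anchor, negative) + margin
    return sem_score * max(0.0, violation)
```

With a zero semantic score the triplet contributes nothing; with score 1 it reduces to a standard triplet ranking hinge.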
33. Semantic association ranking schemes for information retrieval applications using term association graph representation.
- Author
-
VENINGSTON, K., SHANMUGALAKSHMI, R., and NIRMALA, V.
- Subjects
- *
SEMANTICS , *INFORMATION retrieval , *APPLICATION software , *GRAPH theory , *DATA analysis - Abstract
Most Information Retrieval (IR) techniques represent documents using the traditional vector space or probabilistic language model, i.e., the bag-of-words model. In this paper, associations among words in documents are assessed and expressed in a Term Association Graph model to represent the document content and the relationships among keywords. An earlier attempt to exploit the term association graph addressed the non-personalized document re-ranking task. This paper experiments with improved non-personalized and personalized re-ranking strategies which exploit the term association graph data structure to assess the importance of a document for the user query; documents are thus re-ranked according to the association and similarity that exist among the documents. This paper proposes various approaches under two models, namely the Term Rank based Approach (TRA) and Path Traversal based Approaches (PTA1, PTA2, and PTA3). These approaches employ the term association graph and have been evaluated using a manually prepared real dataset and the benchmark OHSUMED dataset. The results obtained are reasonably promising. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
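A term association graph of the kind the abstract above builds on can be sketched from windowed co-occurrence, together with a simple association-based document score; the window size, edge weighting, and function names are illustrative assumptions, not the TRA/PTA algorithms themselves:

```python
from collections import defaultdict

def build_term_graph(docs, window=3):
    """Term association graph: nodes are terms, edge weights count
    co-occurrences within a sliding window over each document."""
    graph = defaultdict(float)
    for doc in docs:
        tokens = doc.lower().split()
        for i in range(len(tokens)):
            for j in range(i + 1, min(i + window, len(tokens))):
                edge = tuple(sorted((tokens[i], tokens[j])))
                if edge[0] != edge[1]:
                    graph[edge] += 1.0
    return graph

def score_document(doc, query_terms, graph):
    """Re-ranking score: total association strength between the
    query terms and the document's terms, read off the graph."""
    tokens = set(doc.lower().split())
    return sum(graph.get(tuple(sorted((q, t))), 0.0)
               for q in query_terms for t in tokens if q != t)
```

Documents whose terms are strongly associated with the query terms in the corpus-level graph score higher and are moved up in the ranking.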
34. Improving visual-semantic embeddings by learning semantically-enhanced hard negatives for cross-modal information retrieval.
- Author
-
Gong, Yan and Cosma, Georgina
- Subjects
- *
INFORMATION retrieval , *SEMANTICS , *SET functions , *LEARNING , *BOOSTING algorithms , *PARTICLE swarm optimization - Abstract
• Existing Visual Semantic Embedding (VSE) networks are trained by a hard negatives loss function that learns an objective margin between the similarity of relevant and irrelevant image–description embedding pairs and ignores the semantic differences between the irrelevant pairs. • We propose a novel Semantically Enhanced Hard negatives Loss function (LSEH) for Cross-modal Information Retrieval that considers the semantic differences between irrelevant training pairs. • The proposed LSEH function dynamically adjusts the learning objectives of VSE networks to make their learning flexible and efficient. • Experiments with various benchmark datasets and VSE networks revealed that the proposed LSEH function reduces their training epochs by approximately 50% and also improves their retrieval performance. Visual Semantic Embedding (VSE) networks aim to extract the semantics of images and their descriptions and embed them into the same latent space for cross-modal information retrieval. Most existing VSE networks are trained by adopting a hard negatives loss function which learns an objective margin between the similarity of relevant and irrelevant image–description embedding pairs. However, the objective margin in the hard negatives loss function is set as a fixed hyperparameter that ignores the semantic differences of the irrelevant image–description pairs. To address the challenge of measuring the optimal similarities between image–description pairs before obtaining the trained VSE networks, this paper presents a novel approach that comprises two main parts: (1) finding the underlying semantics of image descriptions; and (2) proposing a novel semantically-enhanced hard negatives loss function, where the learning objective is dynamically determined based on the optimal similarity scores between irrelevant image–description pairs.
Extensive experiments were carried out by integrating the proposed methods into five state-of-the-art VSE networks that were applied to three benchmark datasets for cross-modal information retrieval tasks. The results revealed that the proposed methods achieved the best performance and can also be adopted by existing and future VSE networks. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
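The hard negatives loss with per-pair margins that the abstract above discusses can be sketched as follows; treating the margin as a precomputed matrix is an illustrative stand-in for LSEH's dynamically determined objective, and the function names are assumptions, not the authors' code:

```python
def hardest_negative_loss(sim, margins):
    """Hardest-negative hinge loss over an NxN similarity matrix,
    where sim[i][i] is the relevant (matched) pair. margins[i][j]
    gives a per-pair margin; a constant matrix recovers the standard
    fixed-margin hard negatives loss."""
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        hinges = [max(0.0, margins[i][j] + sim[i][j] - pos)
                  for j in range(n) if j != i]
        total += max(hinges)  # only the hardest negative contributes
    return total / n
```

Shrinking margins[i][j] for semantically close irrelevant pairs relaxes the objective for those pairs, which is the intuition behind the semantically-enhanced variant.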
35. Towards an evaluation of semantic searching in digital repositories: a DSpace case-study.
- Author
-
Solomou, Georgia and Koutsomitropoulos, Dimitrios
- Subjects
- *
INSTITUTIONAL repositories , *SEMANTICS , *METADATA , *ONTOLOGIES (Information retrieval) , *INFORMATION retrieval - Abstract
Purpose -- Successful learning infrastructures and repositories often depend on well-organized content collections for effective dissemination, maintenance and preservation of resources. By combining semantic descriptions already lying or implicit within their descriptive metadata, reasoning-based or semantic searching of these collections can be enabled and produce novel possibilities for content browsing and retrieval. The specifics and necessities of such an approach, however, make it hard to assess and measure its effectiveness. The paper aims to discuss these issues. Design/methodology/approach -- Therefore in this paper the authors introduce a concrete methodology toward a pragmatic evaluation of semantic searching in such scenarios, which is exemplified through the semantic search plugin the authors have developed for the popular DSpace repository system. Findings -- The results reveal that this approach can be appealing to expert as well as novice users alike, improve the effectiveness of content discovery and enable new retrieval possibilities in comparison to traditional, keyword-based search. Originality/value -- This paper presents applied research efforts to employ semantic searching techniques on digital repositories and to construct a novel methodology for evaluating the outcomes against various perspectives. Although this is original in itself, value lies also within the concrete and measurable results presented, accompanied by an analysis, that would be helpful to assess similar (i.e. semantic query answering and searching) techniques in the particular scenario of digital repositories and libraries and to evaluate corresponding investments. To the authors' knowledge there has hardly been any other evaluation effort in the literature for this particular case; that is, to assess the merit and usage of advanced semantic technologies in digital repositories. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
36. Biomedical literature classification with a CNNs-based hybrid learning network.
- Author
-
Yan, Yan, Yin, Xu-Cheng, Yang, Chun, Li, Sujian, and Zhang, Bo-Wen
- Subjects
- *
DEEP learning , *MACHINE learning , *BIOINFORMATICS , *ARTIFICIAL neural networks , *INFORMATION retrieval - Abstract
Deep learning techniques, e.g., Convolutional Neural Networks (CNNs), have been explosively applied to research in the fields of information retrieval and natural language processing. However, few research efforts have addressed semantic indexing with deep learning. The use of semantic indexing in the biomedical literature has been limited for several reasons. For instance, MEDLINE citations contain a large number of semantic labels from automatically annotated MeSH terms, and for a great deal of the literature, only the information of the title and the abstract is readily available. In this paper, we propose a Boltzmann Convolutional Neural Network framework (B-CNN) for biomedical semantic indexing. In our hybrid learning framework, the CNN can adaptively deal with features of documents that have sequence relationships, and can capture context information accordingly; the Deep Boltzmann Machine (DBM) merges global (the entity in each document) and local information through its training with undirected connections. Additionally, we have designed a hierarchical coarse-to-fine indexing structure for learning and classifying documents, and a novel feature extension approach with word sequence embedding and Wikipedia categorization. Comparative experiments were conducted for semantic indexing of biomedical abstract documents; these experiments verified the encouraging performance of our B-CNN model. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
37. Multi-View Missing Data Completion.
- Author
-
Zhang, Lei, Zhao, Yao, Zhu, Zhenfeng, Shen, Dinggang, and Ji, Shuiwang
- Subjects
- *
DEEP learning , *DATA transmission systems , *ARTIFICIAL intelligence , *INFORMATION retrieval , *DATA mining , *LINEAR programming - Abstract
A growing amount of multi-view data arises naturally in many scenarios, including medical diagnosis, webpage classification, and multimedia analysis. A challenge in learning from multi-view data is that not all instances are fully represented in all views, resulting in missing view data. In this paper, we focus on feature-level completion for missing views of multi-view data. Aiming at capturing both semantic complementarity and identical distribution among different views, an Isomorphic Linear Correlation Analysis (ILCA) method is proposed to linearly map multi-view data to a feature-isomorphic subspace through learning a set of excellent isomorphic features, thereby unfolding the shared information from different views. Meanwhile, we assume that the missing view obeys a normal distribution. Then, the missing view data matrix can be modeled as a low-rank component plus a sparse contribution. Thus, to accomplish missing view completion, an Identical Distribution Pursuit Completion (IDPC) model based on the learned features is proposed, in which the identical distribution constraint of the missing view to the other available one in the feature-isomorphic subspace is fully exploited. Comprehensive experiments on several multi-view datasets demonstrate that our proposed framework yields promising results. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
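The low-rank modeling step described in the abstract above can be illustrated with a toy sketch: a rank-1 alternating-least-squares completion fitted only to the observed entries, from which missing entries are read off the fitted model. This is an illustrative stand-in, not the paper's IDPC model; the sparse component and the identical-distribution constraint are omitted, and the function name `complete_rank1` is hypothetical.

```python
def complete_rank1(M, observed, iters=100):
    """Rank-1 alternating least squares: model the matrix as an outer
    product u v^T fitted to the observed entries only, then read the
    missing entries off the fitted model."""
    m, n = len(M), len(M[0])
    u, v = [1.0] * m, [1.0] * n
    for _ in range(iters):
        # Fix v, solve for each u[i] in closed form over observed entries.
        for i in range(m):
            num = sum(M[i][j] * v[j] for j in range(n) if (i, j) in observed)
            den = sum(v[j] ** 2 for j in range(n) if (i, j) in observed)
            if den:
                u[i] = num / den
        # Fix u, solve for each v[j] the same way.
        for j in range(m and n):
            num = sum(M[i][j] * u[i] for i in range(m) if (i, j) in observed)
            den = sum(u[i] ** 2 for i in range(m) if (i, j) in observed)
            if den:
                v[j] = num / den
    return [[u[i] * v[j] for j in range(n)] for i in range(m)]

# A rank-1 matrix with one missing entry; the true value at (1, 2) is 10.
M = [[3.0, 4.0, 5.0], [6.0, 8.0, 0.0]]
observed = {(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)}
completed = complete_rank1(M, observed)
```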
38. SLAMM: Visual monocular SLAM with continuous mapping using multiple maps.
- Author
-
Daoud, Hayyan Afeef, Md. Sabri, Aznul Qalid, Loo, Chu Kiong, and Mansoor, Ali Mohammed
- Subjects
- *
COMPUTER systems , *ELECTRONIC systems , *COMPUTER software , *ARTIFICIAL intelligence , *ROBOTICS - Abstract
This paper presents the concept of Simultaneous Localization and Multi-Mapping (SLAMM). It is a system that ensures continuous mapping and information preservation despite failures in tracking due to corrupted frames or sensor malfunction, making it suitable for real-world applications. It works with single or multiple robots. In a single-robot scenario, the algorithm generates a new map at the time of tracking failure and later merges the maps at the event of loop closure. Similarly, maps generated by multiple robots are merged without prior knowledge of their relative poses, which makes this algorithm flexible. The system works in real time at frame-rate speed. The proposed approach was tested on the KITTI and TUM RGB-D public datasets and showed superior results compared to the state of the art in calibrated visual monocular keyframe-based SLAM. The mean tracking time is around 22 milliseconds. The initialization is twice as fast as in ORB-SLAM, and the retrieved map can preserve up to 90 percent more information, depending on tracking loss and loop closure events. For the benefit of the community, the source code along with a framework to be run with the Bebop drone are made available at . [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
39. Learning discriminative representations for semantical crossmodal retrieval.
- Author
-
Jiang, Aiwen, Li, Hanxi, Li, Yi, and Wang, Mingwen
- Subjects
- *
LEARNING , *INFORMATION retrieval , *MULTIMEDIA systems , *TASK performance , *SEMANTICS - Abstract
The heterogeneous gap among different modalities emerges as one of the critical issues in multimedia retrieval. Unlike traditional unimodal cases, where raw features are extracted and directly measured, the heterogeneous nature of crossmodal tasks requires intrinsic semantic representations to be compared in a unified framework. Based on a flexible "feature up-lifting and down-projecting" mechanism, this paper studies the learning of crossmodal semantic features that can be retrieved across different modalities. Two effective methods are proposed to mine semantic correlations: one for traditional handcrafted features, the other based on a deep neural network. We treat them respectively as the normal and deep versions of our proposed shared discriminative semantic representation learning (SDSRL) framework. We evaluate both methods on two public multimodal datasets for crossmodal and unimodal retrieval tasks. The experimental results demonstrate that our proposed methods outperform the compared baselines and achieve state-of-the-art performance in most scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
40. Tracking word semantic change in biomedical literature.
- Author
-
Yan, Erjia and Zhu, Yongjun
- Subjects
- *
MEDICAL literature , *SEMANTICS , *MINIMALIST theory (Communication) , *DISCOURSE analysis , *LANGUAGE & languages , *INFORMATION retrieval , *MEDICAL research , *MEDLINE , *ONLINE information services , *READABILITY (Literary style) - Abstract
Up to this point, research on written scholarly communication has focused primarily on syntactic, rather than semantic, analyses. Consequently, we have yet to understand semantic change as it applies to disciplinary discourse. The objective of this study is to illustrate word semantic change in biomedical literature. To that end, we identify a set of representative words in biomedical literature based on word frequency and word-topic probability distributions. A word2vec language model is then applied to the identified words in order to measure word- and topic-level semantic changes. We find that for the selected words in PubMed, overall, meanings are becoming more stable in the 2000s than they were in the 1980s and 1990s. At the topic level, the global distance of most topics (19 out of 20 tested) is declining, suggesting that the words used to discuss these topics are stabilizing semantically. Similarly, the local distance of most topics (19 out of 20) is also declining, showing that the meanings of words from these topics are becoming more consistent with those of their semantic neighbors. At the word level, this paper identifies two different trends in word semantics, as measured by the aforementioned distance metrics: on the one hand, words can form clusters with their semantic neighbors, and these words, as a cluster, coevolve semantically; on the other hand, words can drift apart from their semantic neighbors while nonetheless stabilizing in the global context. In relating our work to language laws on semantic change, we find no overwhelming evidence to support either the law of parallel change or the law of conformity. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
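The "local distance" idea in the abstract above, comparing a word's meaning against its semantic neighbors over time, can be sketched as a second-order measure: compute the word's cosine-similarity profile against a fixed neighbor set inside each time slice, then compare the two profiles. Because each profile is computed within its own embedding space, no cross-space alignment is needed. A minimal sketch, not the authors' exact metric; the toy embeddings and the name `local_change` are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def local_change(word, neighbors, emb_t1, emb_t2):
    """Second-order semantic change: how much a word's similarity
    profile to a fixed neighbor set differs between two time slices."""
    p1 = [cosine(emb_t1[word], emb_t1[n]) for n in neighbors]
    p2 = [cosine(emb_t2[word], emb_t2[n]) for n in neighbors]
    return 1.0 - cosine(p1, p2)

# Toy 2-d "embeddings" for two periods: "virus" drifts, "cell" does not.
emb_1990s = {"virus": [1.0, 0.0], "cell": [0.9, 0.1], "code": [0.0, 1.0]}
emb_2000s = {"virus": [0.2, 1.0], "cell": [0.9, 0.1], "code": [0.0, 1.0]}
```

With real data the per-period vectors would come from word2vec models trained on time-sliced corpora, as in the study.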
41. A graph-based semantic relatedness assessment method combining wikipedia features.
- Author
-
Li, Pu, Zhang, Zhifeng, Xiao, Bao, Ma, Wenjun, and Jiang, Yuncheng
- Subjects
- *
ARTIFICIAL intelligence , *SEMANTICS , *INFORMATION retrieval , *EVALUATION - Abstract
Semantic relatedness assessment between concepts is a critical issue in many domains such as artificial intelligence, information retrieval, psychology, biology, linguistics and cognitive science. Therefore, several methods assess relatedness by exploiting knowledge bases to express the semantics of concepts. However, these methods have limitations such as high-dimensional representation spaces, high computational complexity, and poor fit for dynamic domains. Wikipedia, a domain-independent encyclopedic repository that provides very large coverage, has been exploited by many methods as a huge semantic resource. In this paper, we propose a novel graph-based relatedness assessment method using Wikipedia features to avoid some of the limitations and drawbacks mentioned above. Firstly, for each term in a word pair, the top k most relevant Wikipedia concepts are returned by the Naive-ESA algorithm to reduce the dimensional space of the Explicit Semantic Analysis (ESA) method. Secondly, for each candidate concept in the two relevant concept sets, we collect its category set from the Wikipedia Category Graph (WCG). Based on the categories in the WCG network, the relatedness between concepts at corresponding positions of the two sorted concept sets is computed as the association coefficient. Thirdly, based on this parameter, a novel relatedness assessment metric is presented. The evaluation is performed on well-recognized benchmark datasets, using several widely used metrics and a new metric of our own design. The results demonstrate that our method correlates better with human judgments than other related works. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
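The positional comparison described in the abstract above can be sketched with a simple stand-in for the association coefficient: Jaccard overlap between the Wikipedia category sets of concepts at corresponding positions of the two ranked concept lists, averaged over positions. This is an illustrative simplification of the paper's metric; the function names and toy category data are assumptions.

```python
def jaccard(a, b):
    """Jaccard overlap between two category sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def relatedness(concepts1, concepts2, categories):
    """Average category overlap between concepts at corresponding
    positions of two ranked concept lists (a simplified stand-in for
    the paper's association coefficient over the WCG)."""
    pairs = list(zip(concepts1, concepts2))
    scores = [jaccard(categories[c1], categories[c2]) for c1, c2 in pairs]
    return sum(scores) / len(scores) if scores else 0.0

# Toy category sets standing in for Wikipedia Category Graph lookups.
cats = {
    "Apple_Inc": {"Technology", "Companies"},
    "Microsoft": {"Technology", "Companies"},
    "Apple_fruit": {"Fruits", "Plants"},
}
```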
42. An NLP-guided ontology development and refinement approach to represent and query visual information.
- Author
-
Patel, Ashish Singh, Merlino, Giovanni, Puliafito, Antonio, Vyas, Ranjana, Vyas, O.P., Ojha, Muneendra, and Tiwari, Vivek
- Subjects
- *
ONTOLOGIES (Information retrieval) , *NATURAL language processing , *DESCRIPTION logics , *ONTOLOGY , *SEMANTIC Web , *VIDEO surveillance , *DATA warehousing , *MULTIMEDIA systems , *SEMANTICS - Abstract
The ubiquitous presence of surveillance systems generates massive amounts of video data. Storage and analysis of this data in real-time is a substantial challenge. There is huge potential in representing data in machine-readable and machine-interpretable format due to the presence of hidden semantics in images and videos. However, such representation requires an ontology, which calls for expert domain knowledge. In this paper, a novel NLP-guided approach to generate an ontology for multimedia representation and information retrieval is proposed. A semi-automatic NLP-guided framework, which extracts all possible relations among objects, is presented. This framework leverages the textual data of the domain to generate possible descriptions and actions within the domain. Relations among objects get embedded as object properties, whereas the category of an object is encoded as a class. Features and attributes of objects encode the data properties of the ontology. The proposed ontology is compared with existing multimedia ontologies and evaluated with regard to its capability to represent relations occurring in benchmark datasets, demonstrating the completeness and thorough coverage of the domain concepts. Spatial reasoning rules are established using Semantic Web Rule Language (SWRL) rules, and information retrieval is demonstrated using Description Logic (DL) and SPARQL queries. The proposed NLP-guided ontology generation approach is general enough to help in the development of ontologies for other domains as well, by providing video and textual data of the domain of interest, with limited human involvement. • Object properties extraction with textbase and NLP-guided approach. • Frequency-based semantic refinement to enhance domain-specific relations. • A novel data-driven video surveillance ontology "VizITS" for the ITS domain. • Semantic information retrieval as events, metadata, and question-answering. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
44. Exploration on efficient similar sentences extraction.
- Author
-
Gu, Yanhui, Yang, Zhenglu, Xu, Guandong, Nakano, Miyuki, Toyoda, Masashi, and Kitsuregawa, Masaru
- Subjects
- *
SEMANTICS , *WEBSITES , *INFORMATION retrieval , *QUESTION answering systems , *DATA extraction - Abstract
Measuring the semantic similarity between sentences is an essential issue for many applications, such as text summarization, Web page retrieval, question-answer models, image extraction, and so forth. Several studies have explored this issue with techniques such as knowledge-based, corpus-based, and hybrid strategies. Most of these studies focus on improving the effectiveness of the task. In this paper, we address the efficiency issue, i.e., for a given sentence collection, how to efficiently discover the top-k semantically similar sentences to a query. Previous methods cannot handle big data efficiently: applying such strategies directly is time consuming because every candidate sentence needs to be tested. In this paper, we propose efficient strategies to tackle this problem based on a general framework. The basic idea is that for each similarity, we build a corresponding index during preprocessing. Traversing these indices in the querying process avoids testing many candidates, so as to improve efficiency. Moreover, an optimal aggregation algorithm is introduced to assemble these similarities. Our framework is general enough that many similarity metrics can be incorporated, as will be discussed in the paper. We conduct extensive experimental evaluation on three real datasets to evaluate the efficiency of our proposal. In addition, we illustrate the trade-off between effectiveness and efficiency. The experimental results demonstrate that our proposal outperforms the state-of-the-art techniques in efficiency while keeping the same high precision. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
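The index-then-aggregate idea in the abstract above resembles threshold-style top-k aggregation over per-similarity sorted indexes, in the spirit of Fagin's threshold algorithm: scan each index in score order, maintain the best aggregates seen, and stop once no unseen sentence can still enter the top-k. A minimal sketch of that family of algorithms, not the paper's exact method; `topk_ta` and its input layout are hypothetical.

```python
import heapq

def topk_ta(indices, weights, k):
    """Threshold-algorithm-style top-k. `indices` is a list of
    per-similarity indexes, each a list of (score, sentence_id) sorted
    by score descending; the aggregate is a weighted sum. We stop when
    the k-th best aggregate already seen is at least the threshold
    formed by the current frontier scores, so no unseen sentence can
    still enter the top-k."""
    lookup = [dict((sid, s) for s, sid in idx) for idx in indices]
    best = []   # min-heap of (aggregate, sentence_id), size <= k
    seen = set()
    depth = 0
    while depth < max(len(idx) for idx in indices):
        frontier = []
        for idx, _ in zip(indices, weights):
            if depth < len(idx):
                score, sid = idx[depth]
                frontier.append(score)
                if sid not in seen:
                    seen.add(sid)
                    # Random access: fetch this sentence's score in every index.
                    scores = [lk.get(sid, 0.0) for lk in lookup]
                    agg = sum(w * s for w, s in zip(weights, scores))
                    heapq.heappush(best, (agg, sid))
                    if len(best) > k:
                        heapq.heappop(best)
            else:
                frontier.append(0.0)
        depth += 1
        threshold = sum(w * s for w, s in zip(weights, frontier))
        if len(best) == k and best[0][0] >= threshold:
            break   # early termination: remaining candidates cannot win
    return sorted(best, reverse=True)

# Two toy similarity indexes (e.g., lexical and semantic), equal weights.
indices = [
    [(0.9, "a"), (0.8, "b"), (0.1, "c")],
    [(0.95, "b"), (0.5, "a"), (0.2, "c")],
]
result = topk_ta(indices, [0.5, 0.5], k=2)
```

On this toy input the algorithm stops after scanning only two positions per index, never fully scoring sentence "c".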
45. Exploring intrinsic information content models for addressing the issues of traditional semantic measures to evaluate verb similarity.
- Author
-
Krishna Siva Prasad, M. and Sharma, Poonam
- Subjects
- *
VERBS , *SIMILARITY (Language learning) , *SEMANTICS , *NATURAL language processing , *INFORMATION retrieval , *NOUNS , *SYNONYMS - Abstract
Semantic similarity measures play an important role in many natural language processing and information retrieval activities. It is highly challenging to measure semantic similarity with high accuracy. A notable branch of semantic similarity evaluation is based on information content (IC); intrinsic information content (IIC) models are another wing of IC-based evaluation. Both IC-based and IIC-based approaches mainly handle similarity evaluation of nouns, and research on the semantic similarity assessment of verb pairs is rarely discussed. To bridge this gap, this work examines various IC-based and IIC-based approaches on verb pairs. A detailed discussion of the existing measures and their drawbacks is provided. Strategies based on information content, length and depth of the concepts are discussed and tested on benchmark datasets. Existing intrinsic information content models are enhanced by addressing issues such as (a) dealing with concepts that have no path in WordNet and (b) handling the synonym sets of verb concepts. Measures based on path length, intrinsic information content, combined strategies and non-linear strategies for verb pairs are thoroughly inspected. This paper also presents novel strategies addressing aspects not covered before. The strategies are evaluated by generating the synonym sets of the required parts of speech, which proved very effective in improving the correlation with human judgment. Results on benchmark datasets indicate that the proposed approaches for verb similarity will be a guiding factor for understanding natural language processing tasks. • Semantic similarity measures to address verb pairs. • Path-based, information-content-based, combined and non-linear strategies for evaluating semantic similarity. • Intrinsic information content models for evaluating semantic similarity.
• Handling word pairs that have no path in the semantic network proved effective. • Generating proper synonym sets of the concepts improved the correlation of the proposed strategies. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
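One widely used intrinsic IC model of the kind surveyed in the abstract above is Seco et al.'s structural formula, which derives information content purely from a taxonomy's hyponym counts (no corpus statistics), and can then feed a similarity measure such as Lin's. A minimal sketch; the function names are ours, not the paper's.

```python
import math

def intrinsic_ic(num_hyponyms, total_concepts):
    """Seco-style intrinsic information content: a leaf concept
    (no hyponyms) is maximally informative (IC = 1), while the root
    that subsumes every concept is uninformative (IC = 0). Only the
    taxonomy's structure is used."""
    return 1.0 - math.log(num_hyponyms + 1) / math.log(total_concepts)

def lin_similarity(ic_c1, ic_c2, ic_lcs):
    """Lin's measure on top of any IC function: similarity of two
    concepts relative to the IC of their least common subsumer."""
    return 2.0 * ic_lcs / (ic_c1 + ic_c2) if (ic_c1 + ic_c2) else 0.0
```

In a WordNet-like taxonomy of 1000 concepts, a leaf verb gets IC 1.0 and the root gets IC 0.0; Lin similarity then ranges over [0, 1] as the shared subsumer becomes more or less specific.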
46. Classification of text documents based on score level fusion approach.
- Author
-
Bharath Bhushan, S.N. and Danti, Ajit
- Subjects
- *
INFORMATION retrieval , *TEXT mining , *VECTOR spaces , *ARTIFICIAL neural networks , *SEMANTICS - Abstract
Text document classification is a well-known theme in the fields of information retrieval and text mining. Selection of the most relevant features in the text document plays a vital role in the classification problem. This research article addresses the problem of text classification by considering Sentence-Vector Space Model (S-VSM) and Unigram representation models for the text document. An enhanced S-VSM model is considered for the constructive representation of text documents. A neural-network-based representation for text documents is proposed for effectively capturing the semantic information of the text data. Two different classifiers are designed based on the two representation models of the text documents. Score-level fusion is applied to the two proposed models to determine the overall accuracy of the proposed model. Key contributions of the paper are an enhanced S-VSM model, an interval-valued representation model for the proposed S-VSM approach, a word-level representation model that preserves the semantic information of the text document, and a score-level fusion approach. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
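The score-level fusion step described in the abstract above can be sketched in a few lines: combine the per-class scores of two classifiers by weighted sum and pick the arg-max class. A minimal sketch of generic score-level fusion, not the paper's specific classifiers; `fuse_scores` and the toy scores are assumptions.

```python
def fuse_scores(scores_a, scores_b, w=0.5):
    """Score-level fusion: weighted sum of per-class scores from two
    classifiers, then arg-max over the fused scores. Returns the
    predicted class and the fused score dictionary."""
    classes = set(scores_a) | set(scores_b)
    fused = {c: w * scores_a.get(c, 0.0) + (1 - w) * scores_b.get(c, 0.0)
             for c in classes}
    return max(fused, key=fused.get), fused

# Classifier A (e.g., the S-VSM-based one) and classifier B (e.g., the
# unigram-based one) disagree; fusion resolves the conflict.
label, fused = fuse_scores({"sports": 0.9, "tech": 0.1},
                           {"sports": 0.3, "tech": 0.7})
```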
47. A semantic-grained perspective of latent knowledge modeling.
- Author
-
Della Rocca, Paola, Senatore, Sabrina, and Loia, Vincenzo
- Subjects
- *
KNOWLEDGE management , *SEMANTICS , *INTERNETWORKING , *INFORMATION sharing , *INFORMATION retrieval , *DIRICHLET problem - Abstract
In the era of Web 2.0, knowledge is the de facto social currency in the global network environment. Knowledge is not an accumulation of data, but a relation-based representation of the information content, which needs to be distilled and arranged in a semantic infrastructure to guarantee interoperability and shared understanding. In light of this scenario, the paper introduces a semantically enhanced document retrieval system that describes each retrieved document with an ontological multi-grained network of the extracted conceptualization. The system is based on two well-known latent models, Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). LSA provides a spatial distribution of the input documents, facilitating their retrieval thanks to an ontological representation of their relationship network. LDA works instead at a deeper level: it drives the ontological structuring of the knowledge inside the individual retrieved documents in terms of words, concepts and topics. The novelty of this approach is a multi-level granulation of the knowledge: from a document matching the query (coarse granularity), to the topics that join documents, down to the words describing a concept within a topic (fine granularity). The final result is a SKOS-based ontology created ad hoc for a document corpus; with graphical support for navigation, it enables the exploration of the concepts at different granularity levels. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
48. Control and syntagmatization: Vocabulary requirements in information retrieval thesauri and natural language lexicons.
- Author
-
Engerer, Volkmar
- Subjects
- *
INFORMATION retrieval , *INTELLECT , *LINGUISTICS , *NATURAL language processing , *SEMANTICS , *SUBJECT headings - Abstract
This paper explores the relationships between natural language lexicons in lexical semantics and thesauri in information retrieval research. These different areas of knowledge have different restrictions on use of vocabulary; thesauri are used only in information search and retrieval contexts, whereas lexicons are mental systems and generally applicable in all domains of life. A set of vocabulary requirements that defines the more concrete characteristics of vocabulary items in the 2 contexts can be derived from this framework: lexicon items have to be learnable, complex, transparent, etc., whereas thesaurus terms must be effective, current and relevant, searchable, etc. The differences in vocabulary properties correlate with 2 other factors, the well-known dimension of Control (deliberate, social activities of building and maintaining vocabularies), and Syntagmatization, which is less known and describes vocabulary items' varying formal preparedness to exit the thesaurus/lexicon, enter into linear syntactic constructions, and, finally, acquire communicative functionality. It is proposed that there is an inverse relationship between Control and Syntagmatization. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
49. Diversity Regularized Latent Semantic Match for Hashing.
- Author
-
Chen, Yong, Zhang, Hui, Tong, Yongxin, and Lu, Ming
- Subjects
- *
HASHING , *SEMANTICS , *COMPUTER vision , *INFORMATION retrieval , *NATURAL language processing , *NEAREST neighbor analysis (Statistics) - Abstract
Hashing-based approximate nearest neighbor (ANN) search has drawn considerable attention owing to its low-memory storage and hardware-level logical computing, which make it applicable to many large-scale practical scenarios, such as information retrieval, computer vision and natural language processing. However, most existing hashing methods concentrate either on images only or on pairwise image-texts (labels, short documents) and rarely utilize the more common sentences. In this paper, we propose Diversity Regularized Latent Semantic Match for Hashing (DRLSMH), a new multimodal hashing method that projects images and sentences into a shared latent semantic space with label-supervised semantic constraints to perform multimodal retrieval. Notably, soft orthogonality is introduced as a novel regularizer to preserve diverse hashing functions for compact and accurate representations; what's more, this kind of regularization also benefits the derivation of closed-form solutions with some proper relaxations under an iterative optimization framework. Extensive experiments on two public datasets demonstrate the advantages of our method over state-of-the-art baselines for cross-modal retrieval on image-query-image, image-query-text and text-query-image tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
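The retrieval stage common to hashing methods like the one in the abstract above can be sketched generically: binarize a (latent) feature vector by taking the sign of each projection, then rank database items by Hamming distance to the query code. This is the generic hashing pipeline, not DRLSMH's learned, orthogonality-regularized projections; the projection matrix and function names here are toy assumptions.

```python
def hash_code(vec, projections):
    """Binary code: sign of each projection of the feature vector.
    Each row of `projections` acts as one hashing function."""
    return tuple(1 if sum(w * x for w, x in zip(row, vec)) >= 0 else 0
                 for row in projections)

def hamming(a, b):
    """Number of differing bits between two codes."""
    return sum(x != y for x, y in zip(a, b))

def rank_by_hamming(query_code, db_codes):
    """Return database item names ordered by Hamming distance to the
    query code (hardware-friendly logical computing in practice)."""
    return sorted(db_codes, key=lambda name: hamming(query_code, db_codes[name]))

# Toy 3-bit codes from 2-d features; in DRLSMH both images and
# sentences would be projected into the same latent space first.
P = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
db = {
    "near_img": hash_code([0.9, 0.3], P),
    "mid_text": hash_code([0.5, 0.6], P),
    "far_img": hash_code([-1.0, -0.5], P),
}
ranking = rank_by_hamming(hash_code([1.0, 0.2], P), db)
```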
50. A method for determining ontology-based user profile in document retrieval system.
- Author
-
Maleszka, Bernadetta
- Subjects
- *
INFORMATION retrieval , *DOCUMENTATION , *ONTOLOGY , *INFORMATION theory , *THEORY of knowledge , *PHILOSOPHY , *SEMANTICS - Abstract
Information overload has become a very important aspect of the information retrieval domain. Even if a user knows where to look for interesting information, he can have a problem with precisely formulating his information needs. A solution to this problem is personalization and recommendation systems: they observe user activities and analyze them to discover important preferences. Based on this information, the system can improve the effectiveness of the results. In this paper we present a method for determining a user profile in a document retrieval system. We propose an ontology-based profile. Such a structure allows processing semantic relations between users' queries. We focus on methods for adapting the profile, because only an up-to-date profile can help the user obtain results that correspond to his information needs. We present a set of postulates for adaptation methods. Experimental evaluations of the developed methods are promising. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
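The profile-adaptation requirement stated in the abstract above (keeping the profile up to date) can be sketched with a simple exponential-decay update: observed query terms are reinforced while all other weights decay. This is a minimal flat-weight sketch only; the paper's profile is ontology-based and processes semantic relations between query terms, and `adapt_profile` and its parameters are assumptions.

```python
def adapt_profile(profile, query_terms, alpha=0.2):
    """Exponential-decay adaptation: decay every existing term weight
    by (1 - alpha), then reinforce each observed query term by alpha,
    so the profile tracks the user's current interests."""
    updated = {t: (1 - alpha) * w for t, w in profile.items()}
    for t in query_terms:
        updated[t] = updated.get(t, 0.0) + alpha
    return updated

# A user who searched about NLP now issues an IR query.
profile = adapt_profile({"nlp": 1.0}, ["ir"])
```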