Search Results (147 results)
2. A dynamic snow depth retrieval model based on time-series clustering optimization for GPS-IR.
- Author
- Wang, Tianyu, Zhang, Rui, Yang, Yunjie, Liu, Anmengyun, Jiang, Yao, Lv, Jichao, Tu, Jinsheng, and Song, Yunfan
- Subjects
- *SNOW accumulation, *GLOBAL Positioning System, *MACHINE learning, *GPS receivers, *INFORMATION retrieval
- Abstract
Due to the influence of environmental factors (i.e., terrain and surface coverage) around the GPS receivers, the snow depth retrieval results obtained by the existing global positioning system interferometric reflection (GPS-IR) method show significant variability. The resulting loss of reliability and accuracy limits the broad application of this technology. Therefore, this paper proposes a dynamic snow depth retrieval model based on time-series clustering optimization for GPS-IR to fully leverage multi-source satellite observation data for automatic and high-precision snow depth retrieval. The model employs Dynamic Time Warping distance measurement combined with the K-Medoids clustering algorithm to categorize frequency sequences obtained from various satellite trajectories, facilitating effective integration of multi-constellation data and acquisition of optimal datasets. Additionally, Long Short-Term Memory networks are integrated to capture and process the long-term dependencies in snow depth data, enhancing the model's adaptability in handling time-series data. Validated against SNOTEL measured data and standard machine learning algorithms (such as BP Neural Networks, RBF, and SVM), the model's retrieval capability is confirmed. For the P351 and AB39 sites, the correlation coefficients for L1 band data retrieval were both 0.996, with RMSEs of 0.051 and 0.018 m, respectively. The experimental results show that the proposed model demonstrates superior precision and robustness in snow depth retrieval compared to the previous method. We then analyze the accuracy loss caused by sudden snowfall events. The proposed model and methodology offer new insights into the in-depth study of snow depth monitoring. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
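As an illustration of the clustering step described in entry 2 above, here is a minimal sketch, assuming each satellite track yields a one-dimensional frequency series; the DTW distance and the K-Medoids loop are hand-rolled toy versions, not the authors' implementation, and the input series are synthetic.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def k_medoids(dist, k, n_iter=20, seed=0):
    """Basic K-Medoids over a precomputed distance matrix."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members):
                # New medoid: member minimizing total within-cluster distance.
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, labels

# Toy stand-in for per-satellite frequency series (hypothetical data).
series = [np.sin(np.linspace(0, 6, 50) + p) for p in np.linspace(0, 2, 8)]
n = len(series)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = dtw_distance(series[i], series[j])
medoids, labels = k_medoids(dist, k=2)
print("cluster labels:", labels)
```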
3. A hierarchical shape description approach and its application in similarity measurement of polygon entities.
- Author
- Ma, Jingzhen, Sun, Qun, Ma, Chao, Lyu, Zheng, Sun, Shijie, and Wen, Bowei
- Subjects
- *SHAPE measurement, *MULTISENSOR data fusion, *INFORMATION retrieval, *POLYGONS, *INFORMATION processing, *MEASUREMENT
- Abstract
Spatial similarity provides an important basis for geographic information processing and is widely applied in multi-source data fusion and update, data retrieval and query, and cartographic generalization. To address the shape description and similarity measurement of polygon entities, this study presents a new hierarchical shape description approach and examines its application in similarity measurement of polygon entities. Using the rotation and segmentation methods, we first constructed a hierarchical shape description model for target polygon entities, followed by measurement of the global and hierarchical shape descriptions of polygon entities, respectively, using the farthest-point-distance and geometric feature description methods. Finally, we constructed a comprehensive similarity measurement model through a weighted integration of position, size, direction, and shape. The hierarchical shape description approach proposed in this paper can be applied to the shape similarity measurement of polygon elements, similarity measurement after spatial object simplification, and multi-scale polygon entity matching. The experimental results showed that the hierarchical shape description approach and similarity measurement model are able to effectively measure spatial similarity between different polygon entities, and achieve good results in application. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
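A rough sketch of the farthest-point-distance idea used for the global shape description in entry 3 above; the boundary resampling, the normalization, and the final similarity score are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def farthest_point_distance(poly, n_samples=64):
    """For points sampled along the polygon boundary, record the distance
    to the farthest vertex; the normalized sequence is a simple global
    shape signature on which two polygons can be compared."""
    pts = np.asarray(poly, dtype=float)
    # Resample the closed boundary at n_samples evenly spaced arc lengths.
    closed = np.vstack([pts, pts[:1]])
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(seg)])
    s = np.linspace(0, t[-1], n_samples, endpoint=False)
    xs = np.interp(s, t, closed[:, 0])
    ys = np.interp(s, t, closed[:, 1])
    samples = np.stack([xs, ys], axis=1)
    d = np.array([np.max(np.linalg.norm(pts - p, axis=1)) for p in samples])
    return d / d.max()  # divide by max distance for scale invariance

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
tall_rect = [(0, 0), (1, 0), (1, 3), (0, 3)]
sig_a = farthest_point_distance(square)
sig_b = farthest_point_distance(tall_rect)
print("shape similarity:", 1.0 - np.mean(np.abs(sig_a - sig_b)))
```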
4. The Process and Algorithm Analysis of Text Mining System Based on Artificial Intelligence.
- Author
- Chai, Xiaoliang, Xu, Songxiao, Li, Shilin, and Zhao, Junyu
- Subjects
- TEXT mining, ARTIFICIAL intelligence, GENETIC algorithms, ALGORITHMS, INFORMATION retrieval, INFORMATION networks
- Abstract
The rapid development of the Internet has led to rapid growth of network information, a phenomenon often called the information explosion. The Internet is full of information, and it is difficult for users to find useful information and knowledge in this ocean of data. The Web has become the world's largest information repository, and there is an urgent need for efficient access to the valuable knowledge in vast amounts of web information. The purpose of this paper is to study the process and algorithm analysis of a text mining system based on artificial intelligence. This paper presents an algorithm for document feature acquisition based on a genetic algorithm. Selecting suitable features is an important task in text classification and information retrieval. Finding appropriate feature vectors to represent the text will undoubtedly help with subsequent sorting and grouping. Based on a genetic algorithm with variable-length chromosomes, this paper improves the crossover, mutation and selection operations, and proposes an algorithm for obtaining text feature vectors. This method has a wide range of applications and good results. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
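A toy sketch of genetic-algorithm feature selection in the spirit of entry 4 above, assuming a fixed-length bit-mask chromosome (the paper uses variable-length chromosomes) and a made-up class-separability fitness; the data, population size and rates are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical setup: term-document matrix X and binary class labels y.
n_docs, n_terms, subset = 200, 300, 30
X = rng.random((n_docs, n_terms))
y = rng.integers(0, 2, n_docs)

def fitness(mask):
    """Score a feature subset by how far the class means separate on it
    (a stand-in for a classification-driven fitness)."""
    cols = np.where(mask)[0]
    if len(cols) == 0:
        return 0.0
    mu0 = X[y == 0][:, cols].mean(axis=0)
    mu1 = X[y == 1][:, cols].mean(axis=0)
    return float(np.abs(mu0 - mu1).mean())

def evolve(pop_size=40, generations=50, p_mut=0.02):
    # Each individual is a boolean mask over the vocabulary.
    pop = rng.random((pop_size, n_terms)) < (subset / n_terms)
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        # Tournament selection of parents.
        parents = pop[[max(rng.integers(0, pop_size, 2), key=lambda i: scores[i])
                       for _ in range(pop_size)]]
        # One-point crossover on consecutive parent pairs.
        cut = rng.integers(1, n_terms, pop_size // 2)
        children = parents.copy()
        for k, c in enumerate(cut):
            a, b = 2 * k, 2 * k + 1
            children[a, c:], children[b, c:] = parents[b, c:], parents[a, c:]
        # Bit-flip mutation.
        flips = rng.random(children.shape) < p_mut
        pop = np.where(flips, ~children, children)
    best = pop[np.argmax([fitness(ind) for ind in pop])]
    return np.where(best)[0]

print("selected feature ids:", evolve()[:10])
```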
5. Hybrid Approach To Unsupervised Keyphrase Extraction.
- Author
- Singh, Vijender and Bolla, Bharat Kumar
- Subjects
- INFORMATION retrieval
- Abstract
The exponential growth of textual data poses a monumental challenge for extracting meaningful knowledge. Manually identifying descriptive keywords or keyphrases for each document is infeasible given the massive amount of text generated daily. Automatic keyphrase extraction is, therefore, essential. However, current techniques struggle with learning the most salient semantic features from lengthy documents. This hybrid keyphrase extraction framework uniquely combines the complementary strengths of graph-based and textual feature methods. Our approach demonstrates improved performance over relying solely on statistical or graphical methods. Graph-based systems leverage word co-occurrence networks to score importance. Textual methods extract keyphrases using linguistic properties. Together, these complementary techniques overcome the limitations of relying on any single strategy. The hybrid approach is evaluated on the standard SemEval 2017 Task 10 and SemEval 2010 Task 5 benchmark datasets for scientific paper keyphrase extraction. Performance is quantified using the F1 score relative to human-annotated ground-truth keyphrases. Results quantify effectiveness on long documents with thousands of terms where only a few keywords represent salient concepts. Results show our technique effectively identifies the most salient semantic keywords, overcoming limitations of current techniques that struggle to mix features of graphical or statistical methods. Our experiments demonstrate that the proposed hybrid approach achieves superior F1 scores compared to current state-of-the-art methods on benchmark datasets. These results validate that synergistically combining graph and textual features enables more accurate keyphrase extraction, especially for long documents laden with extraneous terms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
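A compact sketch of the hybrid idea in entry 5 above: blend a graph-based score (PageRank over a word co-occurrence network, via networkx) with a simple statistical score. The blending weight and window size are assumptions, and a real system would add candidate-phrase extraction and linguistic filtering.

```python
import networkx as nx
from collections import Counter

def hybrid_keyword_scores(tokens, window=3, alpha=0.5):
    """Blend a graph-based importance score (PageRank on a co-occurrence
    graph) with a statistical score (normalized term frequency)."""
    graph = nx.Graph()
    for i, w in enumerate(tokens):
        for u in tokens[i + 1:i + window]:   # link words co-occurring in window
            if u != w:
                graph.add_edge(w, u)
    pr = nx.pagerank(graph)
    freq = Counter(tokens)
    max_pr, max_f = max(pr.values()), max(freq.values())
    return {w: alpha * pr[w] / max_pr + (1 - alpha) * freq[w] / max_f
            for w in pr}

tokens = ("keyphrase extraction ranks candidate phrases graph methods score "
          "co occurrence structure while statistical methods score frequency").split()
scores = hybrid_keyword_scores(tokens)
print(sorted(scores, key=scores.get, reverse=True)[:5])
```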
6. Listwise learning to rank method combining approximate NDCG ranking indicator with Conditional Generative Adversarial Networks.
- Author
- Li, Jinzhong, Zeng, Huan, Xiao, Cunwei, Ouyang, Chunjuan, and Liu, Hua
- Subjects
- *GENERATIVE adversarial networks, *INFORMATION retrieval
- Abstract
Some previous empirical studies have shown that the performance of listwise learning to rank approaches is in general better than that of pointwise or pairwise learning to rank techniques. Listwise learning to rank methods that directly optimize information retrieval indicators are an essential and popular type of learning to rank method. However, the existing learning to rank approaches based on Generative Adversarial Networks (GAN) do not utilize a loss function based on information retrieval indicators to optimize the generator and/or discriminator. Thus, an approach to learning to rank that combines an approximate Normalized Discounted Cumulative Gain (NDCG) ranking indicator with Conditional Generative Adversarial Networks (CGAN) is proposed in this paper, named NCGAN-LTR. The NCGAN-LTR approach constructs loss functions of the generator and discriminator based on the Plackett-Luce model and an approximate version of the NDCG ranking indicator, which is utilized to train the network parameters of the CGAN. The experimental results on four benchmark datasets of learning to rank, i.e., TREC TD2004, OHSUMED, MQ2008, and MSLR-WEB10K, demonstrate that our proposed NCGAN-LTR approach has superior performance across almost all ranking indicators of information retrieval compared with the IRGAN-List approach. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
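A numpy sketch of an approximate-NDCG surrogate of the kind entry 6 above optimizes: each item's rank is smoothed with sigmoids of pairwise score differences so the indicator becomes differentiable. The temperature and exact smoothing follow the common approximation literature, not necessarily the paper's formulation.

```python
import numpy as np

def approx_ndcg(scores, relevance, temperature=0.1):
    """Differentiable NDCG surrogate: approximate each item's rank by
    1 + sum_j sigmoid((s_j - s_i) / T), then plug it into the DCG gain."""
    s = np.asarray(scores, float)
    rel = np.asarray(relevance, float)
    diff = (s[None, :] - s[:, None]) / temperature   # diff[i, j] = s_j - s_i
    # Sum includes the self term sigmoid(0) = 0.5, so subtract it.
    approx_rank = 1.0 + (1.0 / (1.0 + np.exp(-diff))).sum(axis=1) - 0.5
    gains = (2.0 ** rel - 1.0) / np.log2(1.0 + approx_rank)
    ideal = np.sort(rel)[::-1]
    idcg = ((2.0 ** ideal - 1.0) / np.log2(2.0 + np.arange(len(ideal)))).sum()
    return gains.sum() / idcg if idcg > 0 else 0.0

# A well-ordered list scores close to 1; swapping items lowers the value.
print(approx_ndcg([2.0, 1.0, 0.1], [2, 0, 1]))
```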
7. Microprism-based layered BIM modeling for railway station subgrade.
- Author
- Fan, Xiaomeng, Pu, Hao, Schonfeld, Paul, Zhang, ShiHong, Li, Wei, Ran, Yang, and Wang, Jia
- Subjects
- *BUILDING information modeling, *RAILROAD stations, *PARALLEL processing, *INFORMATION retrieval, *GEOLOGY
- Abstract
Volumetric modeling for Railway Station Subgrade (RSS) is a complex task in applying BIM to railway stations. However, an effective method for handling RSS volumetric modeling, which is further complicated by heterogeneous data sources and geometric features, has been unavailable. To address this issue, this paper presents a BIM modeling method that generates layered microprisms to represent the volumetric information of an RSS. The proposed method is demonstrated with a real-world case, through which the modeling accuracy and the efficiency of spatial data retrieval are verified to be satisfactory. This paper contributes to the existing body of knowledge by proposing a unified and accurate volumetric modeling method for multi-layer structures. In the future, modeling efficiency can be further improved by introducing GPU-based parallel processing. • Comprehensive volumetric information of railway station subgrade (RSS) is expressed with a multi-layer semantic model. • Involved layers of filler, geology and terrain are depicted by the proposed unified modeling method. • Multi-layer 3D microprisms are generated to express the irregular spaces among the involved layers. • A grid-based spatial retrieval method is developed to achieve highly efficient spatial queries. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Learning from construction accidents in virtual reality with an ontology-enabled framework.
- Author
- Pedro, Akeem, Bao, Quy Lan, Hussain, Rahat, Soltani, Mehrtash, Pham, Hai Chien, and Park, Chansik
- Subjects
- *INTERACTIVE learning, *INFORMATION retrieval, *ACCESS to information, *EDUCATIONAL games, *LEARNING modules
- Abstract
Learning from accidents is essential in preventing their recurrence; however, the unstructured nature of construction accident data poses a significant challenge to the retrieval of insightful information from past incidents. The absence of engaging training tools that facilitate access to such information also impedes learning. This paper aims to develop an ontology-enabled Virtual Reality (VR) framework to provide access to incident data in immersive educational settings. The framework comprises three modules: 1) Ontology module for structuring information from accidents; 2) VR module for interactive learning based on accident cases; and 3) Semantic enrichment module for embedding accident information in VR scenarios. A prototype was developed to verify the framework's technical feasibility, usability, and educational potential. User trials confirm that the solution offers a promising medium for learning from accidents. It is anticipated that the framework would enhance practices for learning from accidents and contribute to improved safety outcomes in construction. • Learning from accidents is crucial for enhancing construction safety. • This paper proposes the CASE-VR framework to improve learning from accidents. • CASE-VR provides an ontology-enabled VR platform with structured accident data. • A prototype demonstrates the feasibility and usability of the proposed framework. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Vietnamese Legal Text Retrieval based on Sparse and Dense Retrieval approaches.
- Author
- Khang, Nguyen Hoang Gia, Nhat, Nguyen Minh, Quoc, Trung Nguyen, and Hoang, Vinh Truong
- Subjects
- LANGUAGE models, VIETNAMESE language, DATA augmentation, LEGAL documents, INFORMATION retrieval
- Abstract
We introduce the combination of two techniques, Sparse Retrieval and Dense Retrieval, while experimenting with different training approaches to find the optimal method for the Vietnamese Legal Text Retrieval task. Moreover, the Question Answering task was built only on the open domain of UIT-ViQuAD but showed promising results on the in-domain legal dataset. Finally, we also describe augmenting the legal-document data up to 3 GB to train the PhoBERT language model, and improving this backbone with Condenser and coCondenser in this paper. Furthermore, these techniques can be utilized for other information retrieval tasks in languages with limited resources. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
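A minimal hybrid sparse+dense retrieval sketch in the spirit of entry 9 above, assuming the rank_bm25 package for the sparse side. The "dense encoder" here is a stand-in random projection, whereas the paper uses a PhoBERT/Condenser bi-encoder, and the score-fusion weight is arbitrary.

```python
import numpy as np
from rank_bm25 import BM25Okapi  # sparse retriever

docs = ["the contract must be signed by both parties",
        "penalties apply when a party breaches the agreement",
        "the court may void an unsigned contract"]
tokenized = [d.split() for d in docs]
bm25 = BM25Okapi(tokenized)

# Stand-in dense encoder: a fixed random projection of bag-of-words counts.
vocab = {w: i for i, w in enumerate(sorted({w for d in tokenized for w in d}))}
rng = np.random.default_rng(0)
proj = rng.normal(size=(len(vocab), 32))

def encode(text):
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in vocab:
            v[vocab[w]] += 1.0
    e = v @ proj
    return e / (np.linalg.norm(e) + 1e-9)

doc_emb = np.stack([encode(d) for d in docs])

def hybrid_search(query, alpha=0.5):
    sparse = np.asarray(bm25.get_scores(query.split()))
    sparse = sparse / (sparse.max() + 1e-9)   # crude score normalization
    dense = doc_emb @ encode(query)           # cosine similarity
    return np.argsort(-(alpha * sparse + (1 - alpha) * dense))

print(hybrid_search("breach of contract penalties"))
```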
10. A new weakly supervised discrete discriminant hashing for robust data representation.
- Author
- Wan, Minghua, Chen, Xueyu, Zhao, Cairong, Zhan, Tianming, and Yang, Guowei
- Subjects
- *INFORMATION retrieval, *COMPUTER programming education, *MACHINE learning, *SUPERVISED learning, *INFORMATION processing
- Abstract
In real applications, the label information of much data is inaccurate, or completely reliable labels can only be obtained at high cost. The previous supervised hashing algorithms consider only the label information in the mapping process from Euclidean space to Hamming space when learning hash codes. However, there is no doubt that these algorithms are suboptimal in maintaining the relationships between high-dimensional data spaces. To overcome this problem, this paper proposes a new weakly supervised discrete discriminant hashing (WDDH) to ensure a more effective representation of data and better retrieval of information. First, we consider the nearest neighbour relationship between samples, and new neighbourhood graphs are constructed to describe the geometric relationship between samples. Second, the algorithm embeds the learning of the hash function into the model and optimises the hash codes by a one-step iterative updating algorithm. Finally, it is compared with existing classical unsupervised and supervised hashing algorithms on different databases. The results and discussion of the experiments clearly show that the proposed WDDH algorithm is more robust for data representation when learning from low-quality label data, coarse-grained label data and noisy data. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
11. A Review on recent research in information retrieval.
- Author
- Ibrihich, S., Oussous, A., Ibrihich, O., and Esghir, M.
- Subjects
- INFORMATION retrieval, LITERATURE reviews, NATURAL language processing
- Abstract
In this paper, we present a survey of modeling and simulation approaches used to describe information retrieval basics. We investigate its methods, its challenges, its models, its components and its applications. Our contribution is twofold: on the one hand, we review the literature to identify search techniques that help obtain pertinent results and achieve an effective search; on the other hand, we discuss different research perspectives for studying and comparing the techniques used in information retrieval. This paper also sheds light on some well-known AI applications in the legal field. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
12. Retrieval of behavior trees using map-and-reduce technique.
- Author
- Abbas, Safia, Hodhod, Rania, and El-Sheikh, Mohamed
- Subjects
- TREES, TIME management, COGNITIVE structures, SOCIAL interaction
- Abstract
There has been an increased interest in the creation of AI social agents who possess complex behaviors that allow them to perform social interactions. Behavior trees provide a plan execution model that has been widely used to build complex behaviors for AI social agents. Behavior trees can be represented in the form of a memory structure known as cognitive scripts, which allows them to evolve through further development over multiple exposures to repeated enactment of a particular behavior or similar ones. Behavior trees that share the same context will then be able to learn from each other, resulting in new behavior trees with richer experience. The main challenge lies in the expensive cost of retrieving contextually similar behavior trees (scripts) from a repertoire of scripts to allow for that learning process to occur. This paper introduces a novel application of the map-and-reduce technique to retrieve cognitive scripts with low computational time and memory allocation. The paper focuses on the design of a corpus of cognitive scripts, as a knowledge engineering key challenge, and the application of map-and-reduce with semantic information to retrieve contextually similar cognitive scripts. The results are compared to other techniques used to retrieve cognitive scripts in the literature, such as Pharaoh, which uses the least common parent (LCP) technique at its core. The results show that the map-and-reduce technique can be successfully used to retrieve cognitive scripts with a high retrieval accuracy of 92.6%, in addition to being cost effective. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
13. Secure multi-dimensional data retrieval with access control and range query in the cloud.
- Author
- Mei, Zhuolin, Yu, Jin, Zhang, Caicai, Wu, Bin, Yao, Shimao, Shi, Jiaoli, and Wu, Zongda
- Subjects
- *ACCESS control, *INFORMATION retrieval, *DATA encryption, *DATA security
- Abstract
Outsourcing data to the cloud offers various advantages, such as improved reliability, enhanced flexibility, accelerated deployment, and so on. However, data security concerns arise due to potential threats such as malicious attacks and internal misuse of privileges, resulting in data leakage. Data encryption is a recognized solution to address these issues and ensure data confidentiality even in the event of a breach. However, encrypted data presents challenges for common operations like access control and range queries. To address these challenges, this paper proposes Secure Multi-dimensional Data Retrieval with Access Control and Range Search in the Cloud (SMDR). In this paper, we propose the SMDR policy, which supports both access control and range queries. The design of the SMDR policy cleverly utilizes the minimum and maximum points of buckets, making the SMDR policy highly appropriate for supporting range queries on multi-dimensional data. Additionally, we have made modifications to Ciphertext Policy-Attribute Based Encryption (CP-ABE) to enable effective integration with the SMDR policy, and then constructed a secure index using the SMDR policy and CP-ABE. By utilizing the secure index, access control and range queries can be effectively supported over the encrypted multi-dimensional data. To evaluate the efficiency of SMDR, extensive experiments have been conducted. The experimental results demonstrate the effectiveness and suitability of SMDR in handling encrypted multi-dimensional data. Additionally, we provide a detailed security analysis of SMDR. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. On the trade-off between ranking effectiveness and fairness.
- Author
- Melucci, Massimo
- Subjects
- *INFORMATION storage & retrieval systems, *FAIRNESS, *SYMMETRIC matrices, *AUTHOR-editor relationships, *ACCESS to information
- Abstract
This paper addresses the problem of maximizing the effectiveness of the ranking produced by information retrieval or recommender systems while at the same time maximizing two fairnesses, that of the group and that of the individual. The context of this paper is therefore that of access to information carried out by users, who aim to satisfy their own information needs, to documents produced by authors and curators, who aim to be exposed in a fair manner, i.e. without discriminating between groups or individuals. The paper describes a general method based on the spectral decomposition of mixtures of symmetric matrices, each of which represents a variable to be maximized, and experiments conducted with the use of a test collection. The method described in this paper explains if and how the trade-offs between effectiveness, group fairness and individual fairness manifest themselves. The experimental results show that (a) maintaining an acceptable level of effectiveness and fairness at the same time is feasible and (b) the trade-offs exist, but the order of magnitude of the variations depends on the measure of effectiveness used, and therefore on what the user's model of access to information is, as well as on the fairness measures, and therefore on how authors or editors should be exposed. • Modern information access systems should balance fairness and effectiveness. • A single eigensystem achieves simultaneous maximization. • Fairness, effectiveness, and access measures are crucial in trade-offs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
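A small sketch of the spectral idea in entry 14 above ("a single eigensystem achieves simultaneous maximization"): form a weighted mixture of symmetric matrices, one per objective, and take the leading eigenvector of the mixture. The matrices here are random placeholders for the effectiveness and fairness variables.

```python
import numpy as np

rng = np.random.default_rng(1)

def sym(n):
    """Random symmetric matrix standing in for one objective."""
    a = rng.normal(size=(n, n))
    return (a + a.T) / 2

# Hypothetical matrices encoding effectiveness, group fairness and
# individual fairness over n ranking dimensions.
n = 6
effectiveness, group_fair, indiv_fair = sym(n), sym(n), sym(n)

def best_direction(weights):
    """Leading eigenvector of the weighted mixture maximizes the blended
    quadratic objective x^T (w1*A + w2*B + w3*C) x over unit vectors x."""
    w1, w2, w3 = weights
    mix = w1 * effectiveness + w2 * group_fair + w3 * indiv_fair
    vals, vecs = np.linalg.eigh(mix)   # eigenvalues in ascending order
    return vals[-1], vecs[:, -1]

for w in [(1.0, 0.0, 0.0), (0.5, 0.25, 0.25), (0.0, 0.5, 0.5)]:
    val, _ = best_direction(w)
    print(w, "->", round(val, 3))
```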
15. A fast local citation recommendation algorithm scalable to multi-topics.
- Author
- Yin, Maxwell J., Wang, Boyu, and Ling, Charles
- Subjects
- *NATURAL language processing, *VECTOR spaces, *ALGORITHMS, *RESEARCH personnel
- Abstract
In the era of rapid paper publications in various venues, automatic citation recommendations would be highly useful to researchers when they write papers. Local citation recommendation aims to recommend possible papers to cite given local citation contexts. Previous work mainly computes the similarity score between citation contexts and cited papers on a one-to-one basis, which is quite time-consuming. We train a pair of neural network encoders that map citation contexts and all possible cited papers to the same vector space, respectively. After that, we index the positions of all cited papers in the vector space. This makes our process for searching recommended papers considerably faster. On the other hand, existing methods tend to recommend papers that are highly similar to each other, which makes recommendations lack diversity. Therefore, we extend our algorithm to perform multi-topic recommendations. We generate multi-topic training examples based on the index we mentioned earlier. Furthermore, we specially design a multi-group contrastive learning method to train our model so that it can distinguish different topics. Empirical experiments show that our model outperforms previous methods by a wide margin. Our model is also lightweight and has been deployed online so that researchers can use it to obtain recommended citations for their own papers in real time. • Proposed FLCR algorithm with sentence-transformer & k-d tree. • Introduced multi-topic citation recommendation for diverse contexts. • Developed large-scale dataset with 1.7 million citation contexts for evaluation. • Demonstrated significant performance improvement over previous methods. • Deployed demo for real-time citation recommendations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
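A sketch of the indexing trick in entry 15 above: once both encoders map contexts and papers into one vector space, nearest-neighbor search replaces one-to-one scoring. A k-d tree (SciPy's cKDTree, which the entry's highlights also mention) stands in here, and the embeddings are random placeholders.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

# Hypothetical pre-computed embeddings of 10k candidate papers from the
# cited-paper encoder; the context encoder maps queries to the same space.
paper_emb = rng.normal(size=(10_000, 64))
paper_emb /= np.linalg.norm(paper_emb, axis=1, keepdims=True)
index = cKDTree(paper_emb)   # built once, queried many times

def recommend(context_emb, k=5):
    """Return indices of the k nearest candidate papers to the encoded
    citation context: one tree query instead of 10k pairwise scores."""
    context_emb = context_emb / np.linalg.norm(context_emb)
    _, idx = index.query(context_emb, k=k)
    return idx

print(recommend(rng.normal(size=64)))
```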
16. Textual tag recommendation with multi-tag topical attention.
- Author
- Xu, Pengyu, Xia, Mingxuan, Xiao, Lin, Liu, Huafeng, Liu, Bing, Jing, Liping, and Yu, Jian
- Subjects
- *TAGS (Metadata), *INFORMATION retrieval, *INFORMATION services, *RECOMMENDER systems, *USER-generated content, *NEUROPROSTHESES
- Abstract
Tagging can be regarded as the action of connecting relevant user-defined keywords to an item, indirectly improving the quality of the information retrieval services that rely on tags as data sources. Tag recommendation dramatically enhances the quality of tags by assisting users in tagging. Although there exist many studies on tag recommendation for textual content, few of them consider two characteristics of real applications, i.e., the long-tail distribution of tags and the topic-tag correlation. In this paper, we propose a Topic-Guided Tag Recommendation (TGTR) model to recommend tags by jointly incorporating dynamic neural topics. Specifically, TGTR first generates dynamic neural topics that indicate the tags via a neural topic generator. Then, a sequence encoder is used to distill indicative features from the post. To effectively leverage the topics and alleviate the data imbalance, we design a multi-tag topical attention mechanism to get a tag-specific post representation for each tag with the help of the dynamic neural topics. These three modules are seamlessly joined together via an end-to-end multi-task learning model, which helps the three parts enhance each other and balances the effects of topics and tags. Extensive experiments have been conducted on four real-world datasets and demonstrate that our model outperforms the state-of-the-art approaches by a large margin, especially on tail-tags. The code, data and hyper-parameter settings are publicly released for reproducibility. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
17. Multi-granularity retrieval of mineral resource geological reports based on multi-feature association.
- Author
- Ma, Kai, Deng, Junyuan, Tian, Miao, Tao, Liufeng, Liu, Junjie, Xie, Zhong, Huang, Hua, and Qiu, Qinjun
- Subjects
- *MINES & mineral resources, *APRIORI algorithm, *GEOLOGICAL research, *INFORMATION retrieval, *GEOLOGICAL maps, *NATURAL language processing
- Abstract
• Multi-granularity information association approaches are used to uncover multi-level geological knowledge and support accurate retrieval of mineral resource geological reports. • Extracting geological multiple-feature information (topic, time, space, figure, and table) based on the multi-granularity of geological reports. • Mining potential multiple-feature information associations with an improved apriori algorithm. Massive geologic reports contain all kinds of multimodal geologic data (geologic text, geologic maps, geologic tables, etc.), which hold rich basic geologic knowledge and expert experience about rocks and minerals, stratigraphic structure, geologic age, geographic location, and so on. Accurate retrieval of specific information from massive geologic data has become an important need for geologic information retrieval. However, the majority of existing research primarily revolves around extracting and associating information at a single granularity to facilitate geological semantic retrieval, which ignores many potential semantic associations, leading to ambiguity and fuzziness in semantic retrieval. To solve these problems, this paper proposes a multi-granularity (document-chapter-paragraph) geological information retrieval framework for accurate semantic retrieval. The framework first extracts topic feature information, spatiotemporal feature information, and figure and table feature information based on the multi-granularity of geological reports. Then, an improved apriori algorithm is used to mine and visualize the associations among the feature information to discover the semantic associations of the geological reports at multiple levels of granularity. Finally, experiments are designed to validate the application of the proposed multi-granularity information retrieval framework to the accurate retrieval of geological reports. The experimental results show that the proposed multi-granularity information retrieval framework can dig deeper into underlying geo-semantic information and realize accurate retrieval. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. A neural harmonic-aware network with gated attentive fusion for singing melody extraction.
- Author
- Yu, Shuai, Yu, Yi, Sun, Xiaoheng, and Li, Wei
- Subjects
- *MELODY, *CONVOLUTIONAL neural networks, *SINGING, *INFORMATION retrieval
- Abstract
Singing melody extraction from polyphonic musical audio is one of the most challenging tasks in music information retrieval (MIR). Recently, data-driven methods based on convolutional neural networks (CNNs) have achieved great success for this task. In the literature, the harmonic relationship has been proven crucial for this task. However, few existing CNN-based singing melody extraction methods consider the harmonic relationship in the training stage. The state-of-the-art CNN-based methods are not capable of capturing such long-dependency harmonic relationships due to limited receptive fields and unacceptable computation cost. In this paper, we introduce a neural harmonic-aware network with gated attentive fusion (NHAN-GAF) for singing melody extraction. Specifically, in the 2-D spectrogram modeling branch, we propose to employ multiple parallel 1-D CNN kernels to capture the harmonic relations between 1–2 octaves along the frequency axis in the spectrogram. Considering the advantage of jointly using Time-Frequency (T-F) domain and time domain information, we use two-branch neural nets to learn discriminative representations for this task. A novel gated attentive fusion (GAF) network is suggested to encode potential correlations between the two branches and fuse the descriptors learned from the raw waveform and T-F spectrograms. Moreover, the idea of GAF can be extended to multimedia applications with multimodal analysis. With the two proposed components, our model is capable of learning the harmonic relationship in the spectrogram and better capturing the contextual but discriminative features for singing melody extraction. We use part of the vocal tracks of the RWC dataset and the MIR-1K dataset to train the model and evaluate the performance of the proposed model on the ADC 2004, MIREX 05 and MedleyDB datasets. The experimental results show that the proposed method outperforms the state-of-the-art ones. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
19. A semi-automatic data integration process of heterogeneous databases.
- Author
- Barbella, Marcello and Tortora, Genoveffa
- Subjects
- *DATA integration, *ELECTRONIC data processing, *INFORMATION retrieval, *DATABASES, *CONTENT analysis
- Abstract
• Data Integration of two or more heterogeneous databases. • Syntactic and semantic analysis of textual data. • Semi-automatic process. One of the most difficult issues today is the integration of data from various sources; thus, the need for automatic Data Integration (DI) methods arises. The literature offers fully automatic and semi-automatic DI techniques, but they require the involvement of IT experts with specific domain skills. In this paper we present a novel DI methodology that does not require the involvement of IT experts; in this methodology, syntactically and semantically similar entities present in the sources are merged by exploiting an information retrieval technique, a clustering method and a trained neural network. Although the suggested process is completely automated, we planned some interactions with the Company Manager, a figure who is not required to have IT skills, but whose only contribution is to define limits and tolerance thresholds during the DI process, based on the interests of the company. The proposed approach achieved an integration accuracy between 99% and 100%. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
20. An efficacy analysis of data encryption architecture for cloud platform.
- Author
- Malhotra, Sheenam and Singh, Williamjeet
- Subjects
- DATA encryption, INFORMATION storage & retrieval systems, CLOUD computing, CLOUD storage, DATA analysis, INFORMATION retrieval
- Abstract
In recent times, cloud computing is being utilized largely for storage and information sharing purposes in several established commercial segments, particularly those where online businesses are prevalent, such as Google, Amazon, etc. Cloud systems present several benefits to users in terms of easy operations and low implementation and maintenance expenses. However, significant risks are encountered in the data security procedures of cloud systems. Although the area is frequently being analyzed and reformed, the concern of cloud data protection and user reliability remains uncertain due to growing cyber-attack schemes as well as cloud storage system errors. To deal with this risk and contribute to the endeavor of providing optimal data security solutions in cloud data storage and retrieval systems, this paper proposes a Symmetric Searchable Encryption influenced, Machine Learning based cloud data encryption and retrieval model. The proposed model enhances data security and employs an effective keyword ranking approach by using an Artificial Neural Network. The comparative assessment of the proposed model against multiclass SVM and Naïve Bayes establishes the better operational potential of the model. The effectiveness of the proposed work is justified by the association between a high TPR and a low FPR. Further, a low CCR of 0.6973 adds to the success of the proposed work. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
21. Perf-Use-Sport study: Consumption of performance-enhancing substances among athletes consulting in primary care centers of Hérault.
- Author
- Jeannou, B., Feuvrier, F., Peyre-Costa, D., and Griffiths, K.
- Subjects
- *PRIMARY care, *GENERAL practitioners, *MEDICAL practice, *MEDICAL centers, *INFORMATION retrieval
- Abstract
To our knowledge, data about supplementation among recreational athletes are very limited in France. The health of these athletes is mostly under the supervision of their general practitioner. If these athletes are shown to use substances, their general practitioners will need to find effective ways to protect their health. The main objective of this study is to estimate the prevalence of use of performance-enhancing substances among athletes consulting in primary care centers. Other goals are to collect information about users' motivations, the persons advising consumers, and the places where substances are purchased. All adult athletes consulting in general medical practice between the 24th of August 2020 and the 6th of November 2020 were invited to answer an anonymous questionnaire. A total of 40 randomized physicians of Hérault participated in the study. We installed posters and flyers in waiting rooms to provide information and allow athletes to complete the online or paper version of the questionnaire. A total of 281 athletes completed the questionnaire, 54.5% male (n = 153) and 45.5% female (n = 128), with an average age of 39.7 years (± 14.9). Over 96% were recreational athletes (n = 272), and 59.9% of them reported using at least one substance. About 66.9% of consumers reported using dietary supplements, 67.5% medications and 13.6% illicit drugs or anabolic agents. Motivations and places of purchase depend on the substances. Nearly 60% of athletes consulting in primary care centers report using performance-enhancing substances. It seems important that physicians are aware of this in order to help athletes protect their health. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
22. BagFormer: Better cross-modal retrieval via bag-wise interaction.
- Author
- Hou, Haowen, Yan, Xiaopeng, and Zhang, Yigeng
- Subjects
- *INFORMATION retrieval
- Abstract
In the field of cross-modal retrieval, single encoder models tend to perform better than dual encoder models, but they suffer from high latency and low throughput. In this paper, we propose a dual encoder model called BagFormer that utilizes a bag-wise late interaction mechanism to improve re-rank performance without sacrificing latency and throughput. BagFormer achieves this by employing a bagging layer, which facilitates the transformation of text to an appropriate granularity. This not only mitigates the issue of modal granularity mismatch but also enables the integration of entity knowledge into the model. Our experiments have shown that BagFormer ViT-B outperforms the traditional dual-encoder model CLIP ViT-B by 7.97% in zero-shot settings. Under fine-tuned conditions, BagFormer ViT-B demonstrates an even more significant improvement of 17.98% over CLIP ViT-B. Moreover, BagFormer not only matches the performance of cutting-edge single-encoder models in cross-modal retrieval tasks but also provides efficient inference processes characterized by lower latency and higher throughput. Compared to single-encoder models, BagFormer can achieve a speedup ratio of 38.14 when re-ranking individual candidates. Code and models are available at github.com/howard-hou/BagFormer. • BagFormer introduces bag-wise interaction for cross-modal retrieval. • Reduces modal granularity mismatch, improving dual encoder performance. • Achieves state-of-the-art results with lower latency and higher throughput. • Utilizes bagging layer to transform text for better image–text alignment. • Outperforms other dual encoder models in retrieval metrics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
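A toy numpy rendering of bag-wise late interaction as described in entry 22 above: token embeddings are pooled into bags, each bag is matched to its best image patch, and bag scores are aggregated. Mean pooling and MaxSim aggregation are assumptions here, not BagFormer's exact operators.

```python
import numpy as np

def bag_late_interaction(text_bags, image_patches):
    """Bag-wise late-interaction score sketch: each text 'bag' (a group
    of token embeddings, e.g. an entity phrase) is matched against its
    best-matching image patch, and per-bag scores are averaged."""
    score = 0.0
    for bag in text_bags:                       # bag: (n_tokens, dim)
        bag_vec = bag.mean(axis=0)              # pool tokens into one bag vector
        bag_vec /= np.linalg.norm(bag_vec) + 1e-9
        sims = image_patches @ bag_vec          # cosine sims to all patches
        score += sims.max()                     # MaxSim per bag
    return score / len(text_bags)

rng = np.random.default_rng(0)
patches = rng.normal(size=(49, 128))            # e.g. a 7x7 patch grid
patches /= np.linalg.norm(patches, axis=1, keepdims=True)
bags = [rng.normal(size=(3, 128)) for _ in range(4)]
print(round(bag_late_interaction(bags, patches), 4))
```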
23. On suffix tree detection.
- Author
- Amir, Amihood, Kondratovsky, Eitan, and Levy, Avivit
- Subjects
- *DATA structures, *REVERSE engineering, *SUFFIXES & prefixes (Grammar), *INFORMATION retrieval, *INFORMATION processing
- Abstract
A suffix tree is a fundamental data structure for string processing and information retrieval; however, its structure is still not well understood. The suffix tree reverse engineering problem, whose study aims to reduce this gap, is the following. Given an ordered rooted tree T with unlabeled edges, determine whether there exists a string w such that the unlabeled-edges suffix tree of w is isomorphic to T. Previous studies of this problem consider the relaxation of having the suffix links as input and assume a binary alphabet. This paper is the first to consider the suffix tree detection problem, in which the relaxation of having suffix links as input is removed. We study suffix tree detection in two scenarios that are interesting per se. We provide a suffix tree detection algorithm for general alphabet periodic strings. Given an ordered tree T with n leaves, our detection algorithm takes O(n + |Σ|p) time, where p is the length, unknown in advance, of a period that repeats at least 3 times in a string S having a suffix tree structure identical to T, if such S exists. Therefore, it is a polynomial-time algorithm if p is a constant and a linear-time algorithm if, in addition, the alphabet has a sub-linear size. We also show some necessary (but insufficient) conditions for suffix tree detection of general strings over a binary alphabet. By this we take another step towards understanding suffix tree structure. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. Information extraction from medical case reports using OpenAI InstructGPT.
- Author
- Sciannameo, Veronica, Pagliari, Daniele Jahier, Urru, Sara, Grimaldi, Piercesare, Ocagli, Honoria, Ahsani-Nasab, Sara, Comoretto, Rosanna Irene, Gregori, Dario, and Berchialla, Paola
- Abstract
• We explored the ability of pre-trained language models (LLMs) to extract information from raw PDFs of clinical case reports. • We used InstructGPT to extract data on pediatric body injuries from articles written in multiple languages and achieved high accuracy in extracting child's sex and type of foreign body causing injury. • Compared InstructGPT to other NLP systems and highlighted its flexibility in being easily adaptable to various domains and tasks without extensive training data. • Showed advantages of using LLMs for data extraction, such as not requiring data pre-processing or programming skills and reducing the need for high-performing computers. Researchers commonly use automated solutions such as Natural Language Processing (NLP) systems to extract clinical information from large volumes of unstructured data. However, clinical text's poor semantic structure and domain-specific vocabulary can make it challenging to develop a one-size-fits-all solution. Large Language Models (LLMs), such as OpenAI's Generative Pre-Trained Transformer 3 (GPT-3), offer a promising solution for capturing and standardizing unstructured clinical information. This study evaluated the performance of InstructGPT, a family of models derived from LLM GPT-3, to extract relevant patient information from medical case reports and discussed the advantages and disadvantages of LLMs versus dedicated NLP methods. In this paper, 208 articles related to case reports of foreign body injuries in children were identified by searching PubMed, Scopus, and Web of Science. A reviewer manually extracted information on sex, age, the object that caused the injury, and the injured body part for each patient to build a gold standard to compare the performance of InstructGPT. InstructGPT achieved high accuracy in classifying the sex, age, object and body part involved in the injury, with 94%, 82%, 94% and 89%, respectively. When excluding articles for which InstructGPT could not retrieve any information, the accuracy for determining the child's sex and age improved to 97%, and the accuracy for identifying the injured body part improved to 93%. InstructGPT was also able to extract information from non-English language articles. The study highlights that LLMs have the potential to eliminate the necessity for task-specific training (zero-shot extraction), allowing the retrieval of clinical information from unstructured natural language text, particularly from published scientific literature like case reports, by directly utilizing the PDF file of the article without any pre-processing and without requiring any technical expertise in NLP or Machine Learning. The diverse nature of the corpus, which includes articles written in languages other than English, some of which contain a wide range of clinical details while others lack information, adds to the strength of the study. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
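A sketch of the zero-shot extraction setup in entry 24 above, written against the legacy (pre-1.0) openai Python SDK that matches the InstructGPT era; the prompt wording, model choice and JSON output contract are illustrative assumptions, not the study's exact protocol.

```python
import json
import openai  # legacy (<1.0) SDK interface, matching the InstructGPT era

openai.api_key = "YOUR_API_KEY"  # placeholder

PROMPT = """Extract the following fields from the case report below and
answer only with JSON: sex, age, object_causing_injury, injured_body_part.
If a field is not reported, use null.

Case report:
{text}
"""

def extract_fields(case_text):
    response = openai.Completion.create(
        model="text-davinci-003",      # an InstructGPT-family model
        prompt=PROMPT.format(text=case_text),
        max_tokens=120,
        temperature=0,                 # deterministic, extraction-style output
    )
    # Assumes the model honors the JSON-only instruction; a robust
    # pipeline would validate and retry on malformed output.
    return json.loads(response["choices"][0]["text"].strip())

example = ("A 3-year-old boy presented after swallowing a button battery, "
           "which was lodged in the esophagus.")
print(extract_fields(example))
```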
25. CAST: Cross-Modal Retrieval and Visual Conditioning for image captioning.
- Author
- Cao, Shan, An, Gaoyun, Cen, Yigang, Yang, Zhaoqilin, and Lin, Weisi
- Subjects
- *IMAGE representation, *EPISODIC memory, *INFORMATION retrieval, *COGNITION, *HALLUCINATIONS (Artificial intelligence)
- Abstract
Image captioning requires not only accurate recognition of objects and corresponding relationships, but also full comprehension of the scene information. However, existing models suffer from partial understanding and object hallucination. In this paper, a Cross-modal retrievAl and viSual condiTioning model (CAST) is proposed to address the above issues for image captioning with three key modules: an image–text retriever, an image & memory comprehender and a dual attention decoder. Aiming at a comprehensive understanding, we propose to exploit cross-modal retrieval to mimic human cognition, i.e., to trigger retrieval of contextual information (called episodic memory) about a specific event. Specifically, the image–text retriever searches the top n relevant sentences which serve as episodic memory for each input image. Then the image & memory comprehender encodes an input image and enriches episodic memory by self-attention and relevance attention respectively, which can encourage CAST to comprehend the scene thoroughly and support decoding more effectively. Finally, such image representation and memory are integrated into our dual attention decoder, which performs visual conditioning by re-weighting image and text features to alleviate object hallucination. Extensive experiments are conducted on MS COCO and Flickr30k datasets, which demonstrate that our CAST achieves state-of-the-art performance. Our model also has a promising performance even in low-resource scenarios (i.e., 0.1%, 0.5% and 1% of the MS COCO training set). • An image–text retriever is proposed to search contextual information for captioning. • An image & memory comprehender is proposed for further understanding the scene. • A dual attention decoder is proposed to alleviate object hallucination. • The cross-modal retrieval and visual conditioning model achieves SOTA performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. Chemical, biological, radiological and nuclear event detection and classification using ontology interrogation and social media data.
- Author
- Alrefaie, Mohamed Taher, Jackson, Tom W., Onojeharho, Ejovwoke, and Elayan, Suzanne
- Subjects
- *NATURAL language processing, *POISONS, *EMERGENCY management, *PUBLIC safety, *INFORMATION resources management
- Abstract
In an era where chemical, biological, radiological, and nuclear (CBRN) incidents present a grave threat to public safety, timely and accurate information is paramount. The complexity of the CBRN concept encompasses a range of incidents, each with unique and overlapping symptoms, related substances, and event descriptions. This study introduces an innovative approach to the development of a CBRN-specific ontology, uniting diverse data sources and domain expertise to construct a comprehensive repository of CBRN events, sub-events, their causes, symptoms, and toxic substances. Unlike prior methodologies reliant on keyword searches and predefined categories, our approach enables a holistic analysis of textual data by capturing intricate relationships between symptoms and toxic substances. We leverage this ontology in conjunction with a tailored interrogation algorithm to detect potential CBRN incidents through social media data. The algorithm was then tested on datasets of three actual CBRN incidents, one fictional incident (a TV show) that simulated a nuclear incident, and one non-CBRN event. The interrogation algorithm was able to detect the five CBRN incidents accurately. However, the study showcased the need to extend the algorithm to distinguish between real and fictional CBRN incidents. These findings underscore the potential of this approach to deliver timely information on potential CBRN incidents. Nevertheless, the study acknowledged the inherent challenges and limitations in utilizing social media data, including the risk of misinformation, fictional events, fake news, and interference from malicious actors, all of which can affect the accuracy and reliability of the information collected. • This paper proposes a novel approach to building a comprehensive ontology of Chemical, Biological, Radiological and Nuclear (CBRN) symptoms and toxic substances. • The proposed ontology captures the relationships among CBRN events, sub-events, event type, their related symptoms and toxic substances, enabling a more comprehensive analysis of social media data. • An ontology interrogation algorithm is proposed to analyze social media data for potential CBRN incidents. • Results show the potential of this approach to provide timely and accurate information to emergency responders. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. Evaluating semantic similarity and relatedness between concepts by combining taxonomic and non-taxonomic semantic features of WordNet and Wikipedia.
- Author
- Hussain, Muhammad Jawad, Bai, Heming, Wasti, Shahbaz Hassan, Huang, Guangjian, and Jiang, Yuncheng
- Subjects
- *HYPERLINKS, *ARTIFICIAL intelligence, *INFORMATION retrieval, *COGNITIVE science
- Abstract
Many applications in cognitive science and artificial intelligence utilize semantic similarity and relatedness to solve difficult tasks such as information retrieval, word sense disambiguation, and text classification. Previously, several approaches for evaluating concept similarity and relatedness based on WordNet or Wikipedia have been proposed. WordNet-based methods rely on highly precise knowledge but have limited lexical coverage. In contrast, Wikipedia-based models achieve more coverage but sacrifice knowledge quality. Therefore, in this paper, we focus on developing a comprehensive semantic similarity and relatedness method based on WordNet and Wikipedia. To improve the accuracy of existing measures, we combine various taxonomic and non-taxonomic features of WordNet, including gloss, lemmas, examples, sister-terms, derivations, holonyms/meronyms, and hypernyms/hyponyms, with Wikipedia gloss and hyperlinks, to describe concepts. We present a novel technique for extracting 'is-a' and 'part-whole' relationships between concepts using the Wikipedia link structure. The suggested technique identifies taxonomic and non-taxonomic relationships between concepts and offers dense vector representations of concepts. To fully exploit WordNet and Wikipedia's semantic attributes, the proposed method integrates their semantic knowledge at feature level, combining semantic similarity and relatedness into a single comprehensive measure. The experimental results demonstrate the effectiveness of the proposed method over state-of-the-art measures on various gold standard benchmarks. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
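A simplified sketch of the WordNet side of entry 27 above: build a bag of taxonomic and non-taxonomic feature words per synset and compare bags. The paper combines these features with Wikipedia glosses and hyperlinks and uses dense vectors rather than the Jaccard overlap used here (requires the NLTK WordNet data).

```python
from nltk.corpus import wordnet as wn  # run nltk.download("wordnet") once

def feature_bag(synset):
    """Collect words describing a synset from its gloss, lemmas, examples
    and hypernym/hyponym and meronym/holonym neighbors: a simplified
    version of the feature set combined in the paper."""
    words = set(synset.definition().lower().split())
    words.update(l.lower() for l in synset.lemma_names())
    for ex in synset.examples():
        words.update(ex.lower().split())
    for rel in (synset.hypernyms() + synset.hyponyms()
                + synset.part_meronyms() + synset.part_holonyms()):
        words.update(l.lower() for l in rel.lemma_names())
    return words

def similarity(word_a, word_b):
    """Max Jaccard overlap of feature bags over all synset pairs."""
    best = 0.0
    for sa in wn.synsets(word_a):
        fa = feature_bag(sa)
        for sb in wn.synsets(word_b):
            fb = feature_bag(sb)
            best = max(best, len(fa & fb) / len(fa | fb))
    return best

print(similarity("car", "automobile"))   # high overlap expected
print(similarity("car", "banana"))       # low overlap expected
```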
28. Improving software maintenance with improved bug triaging.
- Author
- Gupta, Chetna, Inácio, Pedro R.M., and Freire, Mário M.
- Subjects
- FUZZY decision making, SOFTWARE maintenance, MATHEMATICAL optimization, JUDGMENT (Psychology), TOPSIS method, LEGAL judgments
- Abstract
Bug triaging is a critical and time-consuming activity of software maintenance. This paper aims to present an automated heuristic approach combined with fuzzy multi-criteria decision-making for bug triaging. To date, studies lack consideration of multi-criteria inputs to gather decisive and explicit knowledge of bug reports. The proposed approach builds a bug priority queue using the multi-criteria fuzzy Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) method and combines it with Bacterial Foraging Optimization Algorithm (BFOA) and Bar Systems (BAR) optimization to select developers. A relative threshold value is computed and categorization of developers is performed using hybrid optimization techniques to make a distinction between active, inactive, or new developers for bug allocation. The harmonic mean of precision, recall, f-measure, and accuracy obtained is 92.05%, 89.21%, 85.09%, and 93.11% respectively. This indicates increased overall accuracy of 90%±2% when compared with existing approaches. Overall, it is a novel solution to improve the bug assignment process which utilizes intuitive judgment of triagers using fuzzy multi-criteria decision making and is capable of making a distinction between active, inactive, and new developers based on their relative workload categorization. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
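A crisp (non-fuzzy) TOPSIS sketch of the priority-queue step in entry 28 above; the criteria, weights and benefit/cost directions are invented for illustration, and the paper additionally wraps TOPSIS in fuzzy judgments and combines it with BFOA/BAR optimization for developer selection.

```python
import numpy as np

def topsis(matrix, weights, benefit):
    """Rank alternatives (bug reports) by closeness to the ideal solution.
    matrix: alternatives x criteria; benefit[j] True if higher is better."""
    m = np.asarray(matrix, float)
    m = m / np.linalg.norm(m, axis=0)          # vector normalization
    m = m * np.asarray(weights, float)          # weighted normalized matrix
    ideal = np.where(benefit, m.max(axis=0), m.min(axis=0))
    anti = np.where(benefit, m.min(axis=0), m.max(axis=0))
    d_pos = np.linalg.norm(m - ideal, axis=1)
    d_neg = np.linalg.norm(m - anti, axis=1)
    closeness = d_neg / (d_pos + d_neg)
    return np.argsort(-closeness), closeness

# Hypothetical criteria: severity, #duplicates, days open (lower is better).
bugs = [[5, 12, 3],
        [3,  2, 30],
        [4,  7, 10]]
order, cc = topsis(bugs, weights=[0.5, 0.3, 0.2], benefit=[True, True, False])
print("priority queue (bug indices):", order)
```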
29. Codebook-softened product quantization for high accuracy approximate nearest neighbor search.
- Author
- Fan, Jingya, Pan, Zhibin, Wang, Liangzhuang, and Wang, Yang
- Subjects
- *PATTERN recognition systems, *INFORMATION retrieval
- Abstract
Product quantization (PQ) is a fundamental technique for approximate nearest neighbor (ANN) search in many applications such as information retrieval, computer vision and pattern recognition. In existing PQ-based methods for approximate nearest neighbor search, the best achievable search accuracy is obtained using fixed codebooks, and search performance is limited by the quality of these hard codebooks. Unlike the existing methods, in this paper we present a novel codebook-softened product quantization (CSPQ) method that achieves more quantization levels by softening codebooks. We first analyze how well the database vectors match the trained codebooks by examining the quantization error for each database vector, and select the badly matching database vectors. Then, we give the trained codebooks b-bit freedom to soften the codebooks. Finally, by minimizing quantization errors, the badly matching vectors are encoded by softened codebooks and the labels of the best-matching codebooks are recorded. Experimental results on SIFT1M, GIST1M and SIFT10M show that, despite its simplicity, our proposed method achieves higher accuracy compared with PQ, and it can be combined with other non-exhaustive frameworks to achieve fast search. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
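A plain product-quantization sketch to ground entry 29 above, assuming a recent SciPy for kmeans2: it encodes vectors with hard per-subspace codebooks and flags the worst-matching vectors by quantization error, which is exactly the population CSPQ would soften; the softening step itself is not reproduced.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 16)).astype(np.float32)   # toy database vectors

M, K = 4, 32                  # 4 subspaces, 32 centroids per subspace
sub_dim = X.shape[1] // M
codebooks, codes = [], []
for m in range(M):
    sub = X[:, m * sub_dim:(m + 1) * sub_dim]
    centroids, labels = kmeans2(sub, K, minit="points", seed=1)
    codebooks.append(centroids)
    codes.append(labels)
codes = np.stack(codes, axis=1)                      # (n, M) code indices

def quantization_error(i):
    """Reconstruction error of database vector i under the hard codebooks;
    vectors with large error are candidates for codebook softening."""
    rec = np.concatenate([codebooks[m][codes[i, m]] for m in range(M)])
    return float(np.linalg.norm(X[i] - rec))

errs = np.array([quantization_error(i) for i in range(len(X))])
bad = np.argsort(-errs)[:100]    # worst-matching vectors
print("mean error:", errs.mean(), "worst:", errs[bad[0]])
```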
30. Short text classification for Arabic social media tweets.
- Author
- Alzanin, Samah M., Azmi, Aqil M., and Aboalsamh, Hatim A.
- Subjects
- MICROBLOGS, DEEP learning, FOLKSONOMIES, SOCIAL media, USER-generated content, RADIAL basis functions, INFORMATION retrieval, SUPPORT vector machines
- Abstract
With the rapid growth in the number of tweets published daily on Twitter, automated classification of tweets becomes necessary for broad diverse applications (e.g., information retrieval, topic labeling, sentiment analysis, rumors detection) to better understand what these tweets are, and what the users are expressing on this social platform. Text classification is the process of assigning one or more pre-defined categories to text according to its content. Tweets are short, and short text does not have enough contextual information, which is part of the challenge in their classification. Adding to the challenge is the increase in ambiguity, since the diacritical marking is not explicitly specified in most Modern Standard Arabic (MSA) texts. Not to mention that Arabic tweets are known to contain fused text of MSA and dialectal Arabic. In this paper, we propose a scheme to classify textual tweets in the Arabic language based on their linguistic characteristics and content into five different categories. We explore two different textual representations: word embedding using Word2vec and stemmed text with term frequency-inverse document frequency (tf-idf). We tested three different classifiers: Support Vector Machine (SVM), Gaussian Naive Bayes (GNB), and Random Forest (RF). All the classifiers had their hyperparameters tuned. We collected and manually annotated a dataset of approximately 35,600 Arabic tweets for the experiments. Statistically, the RF and the SVM with radial basis function (RBF) kernel performed equally well when used with stemming and tf-idf, achieving macro-F1 scores ranging between 98.09% and 98.14%. The GNB with word embedding was a disappointingly poor performer. Our result tops the current state-of-the-art score of 92.95% using a deep learning approach, RNN-GRU (recurrent neural network-gated recurrent unit). [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
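A minimal scikit-learn pipeline matching the best-performing setup reported in entry 30 above (stemmed text with tf-idf plus an RBF-kernel SVM); the toy tweets, labels and hyperparameters are placeholders, and the study also applies Arabic stemming before vectorization and tunes hyperparameters.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy stand-in tweets; the study used ~35,600 manually labeled Arabic tweets.
tweets = ["الفريق فاز بالمباراة اليوم", "انخفاض اسعار الاسهم في السوق",
          "مباراة قوية بين الفريقين", "البورصة تسجل ارتفاعا جديدا"]
labels = ["sports", "economy", "sports", "economy"]

model = make_pipeline(
    TfidfVectorizer(),            # term frequency-inverse document frequency
    SVC(kernel="rbf", C=10.0),    # RBF-kernel SVM, as in the paper's best setup
)
model.fit(tweets, labels)
print(model.predict(["اسعار جديدة في سوق الاسهم"]))
```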
31. Embedding and generalization of formula with context in the retrieval of mathematical information.
- Author
- Dadure, Pankaj, Pakray, Partha, and Bandyopadhyay, Sivaji
- Subjects
- INFORMATION retrieval, GENERALIZATION, DOCUMENT imaging systems, TECHNOLOGICAL innovations, MATTRESSES
- Abstract
Retrieval of mathematical information from scientific documents is a crucial task. Numerous Mathematical Information Retrieval (MIR) systems have been developed, mainly focusing on improving the indexing and searching mechanisms; the poor results obtained on evaluation measures reveal major limitations of such systems. This leaves considerable scope for improvement and innovation through the inclusion of functionalities that can resolve the challenges of MIR systems. To improve the performance of MIR systems, this paper proposes a formula embedding and generalization approach with context, in addition to an innovative relevance measurement technique. In this approach, documents are preprocessed by the document preprocessor module, which extracts the formulas in Presentation MathML format together with their context. The formula embedding and generalization modules of the proposed approach form binary vectors where '1' represents the presence and '0' represents the absence of a particular entity in a formula; subsequently, the vectors of formulas with context are indexed by the indexer. The innovative relevance measurement technique of the proposed approach ranks first those documents that are retrieved by both the formula embedding and generalization modules, ahead of those retrieved by an individual module. The proposed approach has been tested on the MathTagArticles of Wikipedia from NTCIR-12, and the obtained results verify the significance of the context of the formula and the dissimilarity factor in the retrieval of mathematical information. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
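A minimal sketch of the binary presence/absence embedding idea from the entry above: '1' marks the presence and '0' the absence of an entity in a formula (plus its context). The entity vocabulary here is invented for illustration, not the paper's actual generalization scheme:

```python
# Hypothetical entity vocabulary (operators, identifiers, context terms).
entities = ["frac", "sqrt", "x", "y", "=", "integral", "energy"]

def embed(formula_entities):
    """Map the set of entities found in one formula (with its context)
    to a fixed-length binary vector."""
    present = set(formula_entities)
    return [1 if e in present else 0 for e in entities]

# e.g. the formula "x = sqrt(y)" appearing near the context word "energy":
vec = embed({"x", "=", "sqrt", "y", "energy"})   # -> [0, 1, 1, 1, 1, 0, 1]
```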
32. Framework for tasks suggestion on web search based on unsupervised learning techniques.
- Author
-
Alsulmi, Mohammad and Alshamarani, Reham
- Subjects
INTERNET searching ,INFORMATION needs ,WEB accessibility ,WEB search engines ,MACHINE translating - Abstract
Search systems have played an essential role in improving user experience and information accessibility on the web, allowing users to express their information needs (provided as search queries) and serving users with results that satisfy those needs. However, a user's search task can be complex and may not be expressible in a single search query, requiring the user to write several queries to fulfill all aspects of his or her needs. In such scenarios, an intelligent search system would be beneficial for identifying and understanding the original search task issued by a user and then suggesting several search tasks (in the form of key-phrases or short topics) related to it. Aiming to tackle this challenge, this paper proposes a framework for applying several unsupervised learning approaches, including topic modeling and log mining. The results of applying these approaches to large user-session data show that they are indeed applicable to search suggestion and task recommendation, reaching a significant improvement over a strong baseline. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
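One unsupervised ingredient named in the entry above is topic modeling over logged queries; below is a hedged scikit-learn sketch using LDA, with invented queries and topic count, not the paper's framework or data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

queries = [  # toy session queries (placeholders)
    "book cheap flight paris", "paris hotel deals", "paris museum tickets",
    "python list sort", "python dictionary loop", "python read csv",
]
vec = CountVectorizer()
X = vec.fit_transform(queries)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Top words per topic can be surfaced as suggested search tasks.
terms = vec.get_feature_names_out()
for topic in lda.components_:
    print([terms[i] for i in topic.argsort()[-3:]])
```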
33. Combining Data Lake and Data Wrangling for Ensuring Data Quality in CRIS.
- Author
-
Azeroual, Otmane, Schöpfel, Joachim, Ivanovic, Dragan, and Nikiforova, Anastasija
- Subjects
DATA libraries ,DATA quality ,ELECTRONIC data processing ,INFORMATION retrieval ,INFORMATION storage & retrieval systems ,DATA integration ,SCALABILITY - Abstract
Consolidation of research information improves the quality of data integration, reducing duplicates between systems and enabling the required flexibility and scalability when processing various data sources. We assume that the combination of a data lake as a data repository and a data wrangling process should allow low-quality or "bad" data to be identified and eliminated, leaving only high-quality data, referred to as "research information" in the Research Information System (RIS) domain, and allowing the most accurate insights to be gained on their basis. This, in turn, increases the value of both the data themselves and the data-driven actions that contribute to more accurate and aware decision-making. The cleansed research information is then entered into the appropriate target Current Research Information System (CRIS) so that it can be used for further data processing steps. In order to minimize the effort required for the analysis, proliferation, and enrichment of large amounts of data and metadata, and to achieve far-reaching added value in information retrieval for CRIS employees, developers, and end users, this paper outlines the concept of a curated data lake with a data wrangling process, showing how it can be used in CRIS to clean up data from heterogeneous data sources during their collection and integration. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
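A minimal data-wrangling sketch in the spirit of the entry above: pull raw records from a data lake, drop duplicates and incomplete rows, and keep only the cleansed records for loading into a CRIS. The column names and the quality rule are assumptions for illustration, not the paper's actual pipeline:

```python
import pandas as pd

# Raw records as they might land in the data lake (toy data).
raw = pd.DataFrame({
    "doi":   ["10.1/a", "10.1/a", None,      "10.1/b"],
    "title": ["Paper A", "Paper A", "Paper C", "Paper B"],
    "year":  [2021, 2021, 2020, None],
})

clean = (raw
         .drop_duplicates()                  # remove exact duplicate records
         .dropna(subset=["doi", "year"]))    # discard records missing key fields
# 'clean' now holds the high-quality research information for the CRIS.
```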
34. Metadata implementation and data discoverability: A survey on university libraries' Dataverse portals.
- Author
-
Chiu, Tzu-Heng, Chen, Hsin-liang, and Cline, Ellen
- Subjects
- *
ACADEMIC libraries , *METADATA , *DATA management , *INFORMATION retrieval , *INSTITUTIONAL repositories - Abstract
The purpose of this practical case study is to examine the development of Dataverse, a global research data management consortium. This paper is the second in a project focusing on data discoverability and current metadata implementation on the Dataverse portals established by 27 university libraries worldwide. Five research questions were proposed to identify the most popular metadata standards and elements, search interface options, and result display formats by those portals. The data were collected from 27 university libraries worldwide between December 1, 2020 and January 31, 2021. According to the results of the descriptive analyses, the most popular metadata elements for the dataset overview were Subject and Description, while Dataset persistent ID, Publication Date, Title, Author, Contact, Deposit Date, Depositor, Description, and Subject were the most popular elements for the metadata record of each dataset. Publication Year, Author Names, and Subject were found to be the most common search facets used by the portals. English was the most common language used for the search interfaces and metadata descriptions. Based on their findings from this evidence-based study, the authors recommend future research on the development of institutional data portal infrastructure, on stakeholder outreach and training, and on user studies on dataset retrieval. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
35. The incomplete analytic hierarchy process and Bradley–Terry model: (In)consistency and information retrieval.
- Author
-
Gyarmati, László, Orbán-Mihálykó, Éva, Mihálykó, Csaba, Szádoczki, Zsombor, and Bozóki, Sándor
- Subjects
- *
ANALYTIC hierarchy process , *INFORMATION retrieval , *INFORMATION theory , *STOCHASTIC models , *INFORMATION modeling , *COINCIDENCE - Abstract
Several methods of preference modeling, ranking, voting, and multi-criteria decision-making include pairwise comparisons. It is usually simpler to compare two objects at a time; furthermore, some relations (e.g., the outcomes of sports matches) are naturally known for pairs. This paper investigates and compares pairwise comparison models and the stochastic Bradley–Terry model. It is proved that they provide the same priority vectors for consistent (complete or incomplete) comparisons. For incomplete comparisons, all filling-in levels are considered. Recent results identified the optimal subsets and sequences of multiplicative/additive/reciprocal pairwise comparisons for small numbers of items (up to n = 6). Simulations in this paper show that the same subsets and sequences are optimal for the Bradley–Terry and the Thurstone models as well. This somewhat surprising coincidence suggests the existence of a more general result. Further models of information and preference theory are subject to future investigation to identify optimal subsets of input data. • Multiplicative/additive/reciprocal pairwise comparisons vs. the Bradley–Terry model. • Their equivalence is shown in the case of consistency. • All filling-in levels are investigated. • The same subsets of comparisons are shown to be optimal even in inconsistent cases. • Results raise a new research direction: are such coincidences valid in further models? [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
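For readers unfamiliar with the Bradley–Terry model referenced above, here is a compact sketch of maximum-likelihood fitting via the classical minorization-maximization (MM) iteration; the win-count matrix is a toy example, not data from the paper:

```python
import numpy as np

W = np.array([[0, 3, 2],     # W[i, j] = number of times item i beat item j
              [1, 0, 4],
              [2, 0, 0]], dtype=float)

p = np.ones(3)               # initial strength parameters
for _ in range(200):
    N = W + W.T                                        # comparisons per pair
    wins = W.sum(axis=1)                               # total wins per item
    denom = (N / (p[:, None] + p[None, :])).sum(axis=1)
    p = wins / denom                                   # MM update
    p /= p.sum()                                       # normalize (identifiability)

# p[i] / (p[i] + p[j]) estimates the probability that item i beats item j.
```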
36. Dynamic prompt-based virtual assistant framework for BIM information search.
- Author
-
Zheng, Junwen and Fischer, Martin
- Subjects
- *
LANGUAGE models , *BUILDING information modeling , *NATURAL language processing , *VIRTUAL prototypes , *ELECTRONIC data processing , *QUESTION answering systems - Abstract
Efficient information search from building information models (BIMs) requires deep BIM knowledge or extensive engineering effort to build natural language (NL)-based interfaces. To address this challenge, this paper introduces a dynamic prompt-based virtual assistant framework dubbed "BIMS-GPT" that integrates generative pre-trained transformer (GPT) technologies to support NL-based BIM search. A dynamic prompt-based process was developed to understand users' NL queries, extract relevant information from BIM databases, and deliver NL responses along with 3D visualizations. In a case study, BIMS-GPT's functionality is demonstrated through a virtual assistant prototype for a hospital building. When evaluated on a BIM query dataset, the approach achieves an accuracy rate of 99.5% for classifying NL queries while incorporating only 2% of the data in prompts. This paper contributes to the advancement of effective and versatile virtual assistants for BIMs in the construction industry, as it significantly enhances BIM accessibility while reducing the engineering and training data prerequisites for processing NL queries. • Introduced a dynamic prompt-based virtual assistant for BIM information search • Integrated BIM and GPT technologies for developing natural language interfaces • Explored prompt engineering for GPT to interpret NL queries and summarize NL answers • Improved BIM accessibility by enabling NL-based interactions with 3D visualizations • Reduced engineering and training data prerequisites for processing BIM NL queries [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
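A hedged sketch of the general "dynamic prompt" idea from the entry above: classify an NL BIM query by packing a handful of labeled examples into the prompt sent to a GPT-style model. The example queries, labels, and the `call_llm` callable are hypothetical stand-ins, not the BIMS-GPT implementation:

```python
# A tiny labeled subset (per the abstract, ~2% of the data goes into prompts).
EXAMPLES = [
    ("How many doors are on level 2?", "quantity_query"),
    ("Show me the HVAC ducts in zone B.", "visualization_query"),
]

def build_prompt(user_query: str) -> str:
    # Assemble few-shot examples followed by the new query to classify.
    shots = "\n".join(f"Q: {q}\nLabel: {lbl}" for q, lbl in EXAMPLES)
    return ("Classify each BIM query into a category.\n"
            f"{shots}\nQ: {user_query}\nLabel:")

def classify(user_query: str, call_llm) -> str:
    # `call_llm` is any text-completion callable supplied by the caller
    # (hypothetical; no specific LLM API is assumed here).
    return call_llm(build_prompt(user_query)).strip()
```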
37. A review of different prediction methods for reversible data hiding.
- Author
-
Kumar, Rajeev, Sharma, Deepak, Dua, Amit, and Jung, Ki-Hyun
- Subjects
- *
DATA encryption , *PREDICTION models , *INFORMATION retrieval , *IMAGE quality analysis , *MATHEMATICAL optimization - Abstract
In recent times, prediction error expansion (PEE)-based reversible data hiding (RDH) schemes have gained significant traction due to their performance in terms of embedding capacity and image quality. However, much of their performance depends on how good the prediction is. For good prediction, various predictors such as median edge detection (MED), rhombus mean, least square, and the convolutional neural network-based predictor (CNNP) have been introduced. In this paper, a review of the predictors used in PEE-RDH is presented and discussed. In addition, a new predictor using extreme gradient boosting (XGBoost) is introduced for reversible data hiding. The XGBoost predictor uses a machine learning algorithm in which several optimization techniques are combined to obtain accurate results. To evaluate the performance comprehensively, experimental results on different test images have been analyzed. From the analysis, it has been found that XGBoost provides better prediction accuracy than some of the existing predictors; however, its performance does not reach the level of some other popular predictors such as least square and CNNP. • The paper first presents a detailed review of various predictors used in RDH. • The rhombus mean predictor is found to be the simplest while retaining good prediction accuracy. • A new XGBoost predictor is introduced for RDH. • Performance analysis and comparisons are presented. • Future research directions are recommended. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
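A sketch of the XGBoost-predictor idea surveyed above: learn to predict each pixel from its already-visited neighbors so that the prediction errors can be expanded to embed data. The neighbor choice and hyperparameters are assumptions, not the paper's configuration:

```python
import numpy as np
from xgboost import XGBRegressor

img = np.random.randint(0, 256, size=(64, 64)).astype(float)  # toy grayscale image

# Features: left, top, and top-left neighbors (MED-style causal context).
X, y = [], []
for r in range(1, img.shape[0]):
    for c in range(1, img.shape[1]):
        X.append([img[r, c - 1], img[r - 1, c], img[r - 1, c - 1]])
        y.append(img[r, c])

model = XGBRegressor(n_estimators=100, max_depth=4).fit(np.array(X), np.array(y))
pred = model.predict(np.array(X))
errors = np.array(y) - pred   # prediction errors to be expanded for embedding
```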
38. Calibration-free retrieval of density information from computed tomography data.
- Author
-
Moonen, Peter, Dhaene, Jelle, Van den Bulcke, Jan, Boone, Matthieu N., and Van Hoorebeke, Luc
- Subjects
- *
COMPUTED tomography , *INFORMATION retrieval , *ATTENUATION coefficients , *MATERIALS science , *X-ray imaging , *X-ray tubes - Abstract
In many research areas, particularly in materials science, retrieving the 3D distribution of material density in a sample is of key interest. Although standard lab-based X-ray μCT imaging can retrieve the 3D distribution of local attenuation coefficients, the local density cannot be determined unambiguously without calibration standards, due to the polychromaticity of the X-ray tubes used. In this paper, two methods are presented to retrieve this information based on accurate knowledge of the polychromatic properties of the source and detector in the μCT system used. The methods take very different approaches, one being a polychromatic pre-processing method and the other relying on a novel polychromatic iterative reconstruction method, yet they produce equivalent results. They are limited to objects with a homogeneous composition, but are extremely powerful in a broad range of applications in materials characterization where the density distribution is of interest. Moreover, the method relying on iterative reconstruction has the potential to be applied to multi-material objects as well. A number of application examples illustrate the potential of the methods not only to retrieve quantitative estimates of density and sub-voxel porosity, but also to greatly reduce or eliminate the spectral artifacts that often hinder conventional calibration schemes. • Two methods for quantitative reconstruction of polychromatic μCT data are presented. • The material density is directly retrieved for mono-material objects. • Using the proposed methodologies, no material-dependent calibration is required. • The methods improve the reconstruction quality for highly-attenuating materials. • One of the methods offers potential for multi-material objects. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
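The textbook attenuation relations underlying the entry above (standard CT physics, not the authors' specific algorithm) make the ambiguity concrete: for a polychromatic source with spectrum-detector weighting w(E),

```latex
\[
  I = I_0 \int w(E)\,
      \exp\!\Big(-\!\int_L \mu(E, s)\, \mathrm{d}s\Big)\, \mathrm{d}E,
  \qquad
  \mu(E) = \rho \,\Big(\tfrac{\mu}{\rho}\Big)(E),
\]
```

so the reconstructed attenuation mixes energies, and the density ρ of a homogeneous material can only be separated out when w(E) and the mass attenuation coefficient (μ/ρ)(E) are accurately known, which is exactly the knowledge the two proposed methods exploit.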
39. A Domain Specific Multi-Document Reading Comprehension Method for Artificial Intelligence Application.
- Author
-
Lei, Chen, Baojin, Zhao, Xinran, Dong, and Zaixing, Cui
- Subjects
ARTIFICIAL intelligence ,READING comprehension ,DATA mining ,INFORMATION retrieval - Abstract
With the development of artificial intelligence, information retrieval, information extraction, and knowledge services over large-scale text have become among people's most urgent needs. Machine reading comprehension is one of the key technologies that can be applied to knowledge mining. At present, multi-document reading comprehension has received a lot of attention, and its application scenarios are very extensive. The main goal of this article is to answer users' questions about smartphone operation by finding the answers in a large collection of smartphone manuals. This paper designs a pipeline structure with three modules: retrieval, extraction, and sorting. It also designs auxiliary tasks for the extraction model to improve extraction ability, and uses a new answer-scoring method to select answers. The final experiments show that our method can effectively improve answer quality. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
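A structural sketch of the three-module pipeline named in the entry above (retrieval, extraction, sorting). The scoring functions here are trivial word-overlap stand-ins; the paper's retriever, extractor, and answer scorer are learned models:

```python
def retrieve(question, manuals, k=3):
    # Toy retriever: rank manual passages by word overlap with the question.
    q = set(question.lower().split())
    ranked = sorted(manuals, key=lambda p: -len(q & set(p.lower().split())))
    return ranked[:k]

def extract(question, passage):
    # Placeholder extractor: return the passage sentence with the best overlap.
    q = set(question.lower().split())
    sentences = passage.split(". ")
    return max(sentences, key=lambda s: len(q & set(s.lower().split())))

def answer(question, manuals):
    # Sorting stage: score candidate spans and pick one (shortest, as a toy rule).
    candidates = [extract(question, p) for p in retrieve(question, manuals)]
    return min(candidates, key=len)
```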
40. Unsupervised emotional state recognition based on clustering of EEG features.
- Author
-
Dura, Aleksandra and Wosiak, Agnieszka
- Subjects
EMOTIONAL state ,EMOTION recognition ,MACHINE learning ,CONTEXT effects (Psychology) ,LEARNING strategies ,INFORMATION retrieval - Abstract
Efficient information retrieval from EEG sensors is a complex and challenging task, particularly in the context of psychology, including emotional states. Therefore, different machine learning strategies are considered to improve processes based on EEG signal analysis. Most of them use supervised approaches, since EEG datasets usually include metadata and descriptions that can be used for learning. However, these descriptions are mainly based on self-reports of emotional states, which means they may not be reliable or objective. The paper proposes an approach that incorporates unsupervised learning techniques to support classification where class labels may be uncertain. The research shows that our approach improves the recognition of emotions, giving results with an average accuracy greater by five percentage points. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
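A minimal sketch of the unsupervised route described in the entry above: cluster EEG feature vectors and use the cluster structure to support uncertain self-reported labels. The band-power feature matrix is simulated; real features would come from preprocessed EEG recordings:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
features = rng.normal(size=(120, 10))   # 120 epochs x 10 band-power features (toy)

X = StandardScaler().fit_transform(features)
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
# Cluster ids can then support (or correct) uncertain self-reported labels.
```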
41. Musical Instrument Recognition with a Convolutional Neural Network and Staged Training.
- Author
-
Szeliga, Dominika, Tarasiuk, Paweł, Stasiak, Bartłomiej, and Szczepaniak, Piotr S.
- Subjects
CONVOLUTIONAL neural networks ,MUSICAL instruments ,RECOMMENDER systems ,MUSICAL composition ,INFORMATION retrieval - Abstract
Musical instrument recognition is an important task within the broad field of Music Information Retrieval. It helps to build recommendation systems, compute similarity between musical compositions, and enable automatic search of music collections by instrument. The task has two variants that differ in difficulty. The simpler one is classification based on the sound of a single instrument, while the more difficult challenge is to recognize the predominant instrument in polyphonic recordings. In this paper, we used a convolutional neural network to solve both of these problems. As the analysis of monotimbral recordings is relatively easy, we used the knowledge acquired while solving it to train a neural network to tackle the more complex predominant-instrument recognition problem. Within this staged training scenario, we also examined the impact of introducing intermediate stages during the training sessions. The results showed that such a training approach has the potential to improve classification accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
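A hedged PyTorch sketch of the staged-training idea in the entry above: pre-train a CNN on the easier single-instrument task, then reuse its feature extractor as the starting point for predominant-instrument recognition. The architecture, class count, and omitted training loops are invented for illustration:

```python
import torch.nn as nn

class InstrumentCNN(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):                       # x: (batch, 1, freq, time) spectrograms
        return self.head(self.features(x).flatten(1))

# Stage 1: train on monotimbral clips (training loop omitted).
stage1 = InstrumentCNN(n_classes=11)

# Stage 2: keep the learned feature extractor, attach a fresh classifier head,
# and fine-tune on polyphonic recordings for predominant-instrument labels.
stage2 = InstrumentCNN(n_classes=11)
stage2.features.load_state_dict(stage1.features.state_dict())
```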
42. Joint-Modal Graph Convolutional Hashing for unsupervised cross-modal retrieval.
- Author
-
Meng, Hui, Zhang, Huaxiang, Liu, Li, Liu, Dongmei, Lu, Xu, and Guo, Xinru
- Subjects
- *
INFORMATION retrieval , *LEARNING modules , *COMPUTER programming education - Abstract
Cross-modal hashing retrieval has garnered significant attention for its exceptional retrieval efficiency and low storage consumption, especially in large-scale data retrieval. However, due to modality differences and the semantic gap, existing methods fail to fuse multi-modal information effectively or to adjust weights adaptively, which further damages the discriminative ability of the generated hash codes. In this paper, we propose an innovative approach called Joint-Modal Graph Convolutional Hashing (JMGCH), which uses adaptive weight assignment for unsupervised cross-modal retrieval. JMGCH consists of a Feature Encoding Module (FEM), a Joint-Modal Graph Convolutional Module (JMGCM), an Adaptive Weight Allocation Fusion Module (AWAFM), and a Hash Code Learning Module (HCLM). After the image and text have been encoded, we use a graph convolutional network to further explore the semantic structure. To consider both intra-modal and inter-modal semantic relationships, the JMGCM captures the correlations between different modalities, and the designed AWAFM then fuses the uni-modal and cross-modal features. Finally, to obtain hash codes with greater expressive capacity, the features of one modality are used to reconstruct the features of the other, so as to reduce the gap between modalities. We conduct extensive experiments on three widely used cross-modal retrieval datasets, and the results demonstrate that our proposed framework achieves satisfactory retrieval performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
43. Measuring information as an expanding resource: Information production and its TFP-information absorption ecosystem "multiplier".
- Author
-
Merva, Mary and Costagli, Simona
- Subjects
- *
INDUSTRIAL productivity , *INFORMATION retrieval , *DESIGN thinking , *INFORMATION technology , *INVESTMENT management - Abstract
Indices that measure the extent and penetration of information in an economy are static measures. Information, however, is a dynamic, expanding resource. Information is produced when data from digital computers articulate with human-generated systems, where they are transformed into information. Economic information is effective when it fuels technological innovations contributing to total factor productivity (TFP) growth. This paper develops a theoretical model to measure effective economic information (EEI) as a dynamic process. Using systemic design thinking, design systems in data value chains are embedded within their TFP-information absorption ecosystem. The system's characteristics of human capital and economic, institutional, and regulatory factors determine the quality and amount of EEI produced. The EEI measurement model uses differential equations that allow for the dynamic expansion of information to include information-knowledge spillovers. Empirical tests of EEI on TFP for EU countries show that EEI is the key driver of TFP growth, while IT investment alone is only a necessary condition. EEI-type measures direct policy attention towards improving TFP-information absorption ecosystems and supporting the adoption of design processes better suited to those ecosystems. Both are necessary to connect IT investment to TFP growth. • Information is a dynamic, expanding economic resource contributing to TFP growth. • A dynamic measure for information is developed using systemic design thinking. • Data are processed into information via TFP-information absorption ecosystems. • Systems expand information, fueling tech innovations and spillovers for TFP growth. • Effective economic information drives TFP growth; IT investment alone does not. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. Deep learning with the generative models for recommender systems: A survey.
- Author
-
Nahta, Ravi, Chauhan, Ganpat Singh, Meena, Yogesh Kumar, and Gopalani, Dinesh
- Subjects
RECOMMENDER systems ,INFORMATION retrieval ,TAXONOMY - Abstract
The enormous variety of information on the web encourages the field of recommender systems (RS) to flourish. In recent times, deep learning techniques have significantly impacted information retrieval tasks, including RS. The probabilistic and non-linear views of neural networks give rise to generative models for recommendation tasks. At present, there is no extensive survey of deep generative models for RS. Therefore, this article aims at providing a coherent and comprehensive survey of recent efforts on deep generative models for RS. In particular, we devise a taxonomy of deep generative models for RS, along with a summary of state-of-the-art methods. Lastly, we highlight potential future prospects based on recent trends and new research avenues in this interesting and developing field. Public code links, papers, and popular datasets covered in this survey are accessible at: https://github.com/creyesp/Awesome-recsys?tab=readme-ov-file#papers. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. A set of novel HTML document quality features for Web information retrieval: Including applications to learning to rank for information retrieval.
- Author
-
Aydın, Ahmet, Arslan, Ahmet, and Dinçer, Bekir Taner
- Subjects
- *
INFORMATION retrieval , *WEBSITES , *FUNCTIONAL groups , *SEARCH engines - Abstract
Past work on Information Retrieval (IR) targeting web document collections shows that incorporating a measure of web document quality, that is, a document prior (e.g., PageRank), into an IR system improves retrieval effectiveness. In this study, we introduce new document priors and empirically investigate their effect by employing them as features in a learning to rank (LTR) deployment. The experiments are performed on two standard Web IR test collections, the ClueWeb09 and ClueWeb12 datasets, which include 500 and 733 million web documents, respectively, and the associated TREC & NTCIR query sets with a total of 1,204 queries. A strong baseline is formed using standard features introduced in previous works, against which the effect of the newly introduced features is empirically compared. We test our features with LambdaMART, a state-of-the-art LTR technique. The results reveal that the features introduced in this work led to improved retrieval performance on the test collections in use. The introduced features are classified into 5 groups with respect to their functional properties, and each group is also analyzed in detail. • Measuring the quality of a web document is a challenging task. • A web page contains various elements that indicate quality. • Document priors can be used as query-independent features in LTR deployments. • The introduced query-independent features can measure the quality of web documents. • The contribution of the new feature groups to retrieval performance is examined. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
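A sketch of how document priors serve as query-independent LTR features with a LambdaMART-style ranker, in the spirit of the entry above; it uses LightGBM's lambdarank objective as a stand-in, and the feature matrix, labels, and group sizes are synthetic:

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
# Columns: [BM25 score, PageRank-like prior, HTML-quality prior] (illustrative).
X = rng.normal(size=(100, 3))
y = rng.integers(0, 3, size=100)   # graded relevance labels (toy)
group = [10] * 10                  # 10 queries x 10 candidate documents each

ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=50)
ranker.fit(X, y, group=group)
scores = ranker.predict(X[:10])    # rank the first query's candidates
```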
46. OLCH: Online Label Consistent Hashing for streaming cross-modal retrieval.
- Author
-
Peng, Shu-Juan, Yi, Jinhan, Liu, Xin, Cheung, Yiu-ming, Cui, Zhen, and Li, Taihao
- Subjects
- *
SUPERVISED learning , *ONLINE education , *INFORMATION retrieval , *COMPUTER programming education , *BATCH processing , *AUTOMATED storage retrieval systems - Abstract
Cross-modal hashing has received growing interest for facilitating efficient retrieval across large-scale multi-modal data, yet existing methods still face three challenges: 1) most offline learning works are unsuitable for processing and training with streaming multi-modal data; 2) current online learning methods rarely consider the potential interdependency between label categories; 3) existing supervised methods often utilize pairwise label similarities or adopt a relaxation scheme to learn hash codes, which, respectively, require much computation time or accumulate large quantization loss during learning. To alleviate these challenges, this paper presents an efficient Online Label Consistent Hashing (OLCH) approach for streaming cross-modal retrieval. The proposed approach first exploits the relative similarity of semantic labels and utilizes multi-class classification to derive a common semantic vector. Then, an online semantic representation learning framework is adaptively designed to preserve semantic similarity across different modalities, and a mini-batch online gradient descent approach with forward–backward splitting is developed to discriminatively optimize the hash functions. Accordingly, the hash codes are incrementally learned with high discriminative capability, while avoiding the high computational complexity of processing streaming data. Extensive experiments highlight the superiority of the proposed approach and show very competitive performance in comparison with the state-of-the-art. • Present an efficient online label consistent hashing approach for cross-modal retrieval over streaming data. • Develop an online semantic representation learning framework to preserve semantic similarity. • Propose a mini-batch online gradient descent approach to incrementally optimize the hash functions. • Exploit a forward–backward splitting scheme to learn discriminative hash codes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
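A generic sketch of the mini-batch online update with forward-backward splitting mentioned in the entry above: a gradient (forward) step on the smooth loss followed by a proximal (backward) step, here with an L1 prox purely for illustration; OLCH's actual objective and proximal operator differ:

```python
import numpy as np

def soft_threshold(w, t):
    # Proximal operator of the L1 norm (illustrative choice of prox).
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def online_step(w, grad_fn, batch, lr=0.1, lam=0.01):
    w = w - lr * grad_fn(w, batch)       # forward: gradient of the smooth part
    return soft_threshold(w, lr * lam)   # backward: proximal step

# Each arriving mini-batch of streaming data incrementally updates the
# hash projection parameters `w` without retraining from scratch.
```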
47. Forwarding strategies in vehicular named data networks: A survey.
- Author
-
Ahed, Kaoutar, Benamar, Maria, Lahcen, Ayoub Ait, and Ouazzani, Rajae El
- Subjects
VERNACULAR architecture ,INFORMATION retrieval ,VEHICULAR ad hoc networks - Abstract
The traditional Internet architecture provides applications with a stable end-to-end connection between a requester and a source node holding the information. In a vehicular environment, however, this connection is not stable, because the mobility of vehicles changes the location of the requester, the source, or both. To overcome these situations, Named Data Networking (NDN) has been proposed as a new architecture for data retrieval and mobility support. This new paradigm enhances content access and dissemination by decoupling content from its original location. In this paper, we introduce forwarding in NDN-based VANETs and highlight its benefits and limitations. We propose a classification of NDN-based VANET forwarding strategies, then detail the representative schemes. Furthermore, we review and compare existing forwarding strategies in terms of various attributes, such as transmission mode, forwarding strategy, changes to the NDN architecture, application scenarios, the problem addressed, evaluation metrics, and simulation platform. Finally, we conclude by identifying the main open research challenges that can be exploited in future work. We believe that this survey will help the NDN-based VANET research community understand forwarding in vehicular environments, avoid duplicating existing solutions, and find inspiration for designing new protocols that improve NDN-based VANETs. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
48. Modelling short-term variations of differential code bias aiding in extraction of ionospheric observables with sparse kernel learning.
- Author
-
Xu, Lei, Li, Zengke, Gao, Jingxiang, Yang, Xu, and She, Wenwen
- Subjects
- *
RADIAL basis functions , *ORBIT determination , *SOLAR activity , *STANDARD deviations , *INFORMATION retrieval - Abstract
As short-term variations in the differential code bias of receivers (RDCB) decrease the accuracy of ionospheric observables extracted by the carrier-to-code leveling (CCL) method, a modified carrier-to-code leveling (MCCL) method was previously proposed to estimate RDCB offsets epoch by epoch in the retrieval of ionospheric information. However, the MCCL method is hampered in efficiency and precision at higher sampling rates. To address RDCB offsets in the higher-sampling-rate domain, this paper models the RDCB offsets produced by MCCL from lower-sampling-rate data using a kernel model in the form of radial basis functions (RBF). Owing to the ill-posed problem induced by setting too many kernels, L1-norm regularization was employed to promote sparsity, and the fast iterative shrinkage-thresholding algorithm (FISTA) was adopted to find the sparse solution. Data sets from 350 IGS stations evenly distributed across high- and low-solar-activity years were chosen to illustrate the general existence of RDCB offsets and their dependence on solar activity. To validate the reliability and efficiency of the proposed method, two of these stations were selected to construct the sparse kernel model. As shown in the experimental results, the kernel model performs well on the training data, with a sparsity rate of over 90%, and all standard deviations of the errors between the original and fitted data are around 0.4 ns. In terms of efficiency, it is a striking finding that the proposed method and CCL are much more efficient than MCCL in extracting ionospheric observables from fixed equipment. For further validation, a generalized triangular series function (GTSF) was constructed using ionospheric observables derived from precise point positioning (PPP), CCL, MCCL, and the proposed method. With the Center for Orbit Determination in Europe (CODE) Global Ionospheric Map (GIM) as reference, the VTEC estimates derived from MCCL and the proposed method show accuracies improved by 10% to 20% compared with those derived from CCL. Overall, these results indicate that the proposed method has promising applications in the retrieval of ionospheric observables in terms of accuracy and efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
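A sketch of the fitting machinery described in the entry above: an RBF design matrix with L1 regularization solved by FISTA. The time series, kernel width, and regularization weight are toy values, not the paper's GNSS settings:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)                        # epochs (toy time axis)
y = np.sin(6 * t) + 0.05 * rng.normal(size=200)   # stand-in RDCB offset series

centers = np.linspace(0, 1, 50)                   # one RBF kernel per center
Phi = np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2 * 0.05 ** 2))

def fista(Phi, y, lam=0.1, iters=300):
    L = np.linalg.norm(Phi, 2) ** 2               # Lipschitz constant of the gradient
    w_old = np.zeros(Phi.shape[1])
    z, tk = w_old.copy(), 1.0
    for _ in range(iters):
        v = z - Phi.T @ (Phi @ z - y) / L         # forward (gradient) step
        w = np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)  # soft-thresholding
        tk_new = (1 + np.sqrt(1 + 4 * tk * tk)) / 2
        z = w + ((tk - 1) / tk_new) * (w - w_old)  # momentum step
        w_old, tk = w, tk_new
    return w

w = fista(Phi, y)   # most coefficients shrink exactly to zero (sparse fit)
```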
49. Frequent itemset-based feature selection and Rider Moth Search Algorithm for document clustering.
- Author
-
Yarlagadda, Madhulika, Gangadhara Rao, K., and Srikrishna, A.
- Subjects
DOCUMENT clustering ,SEARCH algorithms ,FEATURE selection ,MOTHS ,INFORMATION retrieval - Abstract
Document clustering has recently received great attention in the retrieval, navigation, and summarization of huge volumes of documents. With a better document clustering approach, computers can automatically organize a document corpus into meaningful clusters, enabling efficient navigation and browsing of the corpus. Document navigation and browsing is a valuable complement to the deficiencies of information retrieval technologies. This paper introduces Modsup-based frequent itemset feature selection and a Rider Optimization-based Moth Search Algorithm (Rn-MSA) for clustering documents. At first, the input documents are pre-processed, and then extraction is carried out based on TF-IDF and WordNet features. Once the extraction is done, feature selection is carried out based on frequent itemsets to establish feature knowledge. At last, document clustering is done using the proposed Rn-MSA, which is designed by combining the Rider Optimization Algorithm (ROA) and the Moth Search Algorithm (MSA). The performance of document clustering based on the proposed Modsup + Rn-MSA is evaluated in terms of precision, recall, F-measure, and accuracy. The developed document clustering method achieves a maximal precision of 95.90%, maximal recall of 96.41%, maximal F-measure of 96.41%, and maximal accuracy of 95.12%, indicating its superiority. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
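As a rough illustration of the frequent-itemset feature-selection step in the entry above, here is a simplification that keeps only terms (1-itemsets) whose document support clears a minimum-support threshold; the documents and threshold are toy values, not the paper's Modsup criterion:

```python
from collections import Counter

docs = [{"cluster", "search", "rank"},
        {"cluster", "rank"},
        {"moth", "search"}]          # each document as a set of terms (toy)
min_support = 2 / 3                  # minimum fraction of documents

counts = Counter(term for d in docs for term in d)
selected = {t for t, c in counts.items() if c / len(docs) >= min_support}
# -> {'cluster', 'search', 'rank'} become the feature vocabulary for clustering
```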
50. Label embedding semantic-guided hashing.
- Author
-
Long, Jun, Sun, Longzhi, Guo, Lin, Hua, Liujie, and Yang, Zhan
- Subjects
- *
SUPERVISED learning , *AUTOMATED storage retrieval systems , *CONSUMPTION (Economics) , *SCALABILITY , *INFORMATION retrieval , *MATHEMATICAL optimization , *WIKIS , *TASK performance - Abstract
• Proposed a novel two-step label embedding semantic-guided hashing method. • Proposed a fast alternative optimization strategy to address the non-convex problem. • Evaluated the effectiveness of the proposed method on three well-known datasets. Hashing technologies have been widely used for information retrieval tasks due to their efficient retrieval and storage capabilities. Generally, most current supervised learning only utilizes labels to construct a binary similarity matrix of instance pairs and ignores the rich semantic information contained in the labels. Indeed, the reason supervised hashing outperforms unsupervised hashing is that the labels themselves carry strong discriminative information. Therefore, effectively exploring the label information is one way to improve the performance of retrieval tasks. In addition, existing hashing methods suffer from high time consumption and weak scalability when facing large-scale data. To remedy these problems, this paper presents a flexible two-step label embedding hashing method named Label Embedding Semantic-Guided Hashing (LESGH). In the first step, LESGH leverages an asymmetric discrete learning framework to learn discriminative compact hash codes from label information alone, and adds bit-balance and bit-decorrelation constraints to boost the quality of hash code generation. In the second step, LESGH learns the hash projection function from the hash codes generated in the first step. Moreover, an effective and fast iterative discrete optimization algorithm is presented to solve the discrete problem instead of using a relaxation-based scheme. In doing so, we not only simplify the optimization process but also scale easily to large-scale data. We conduct several experiments on three public datasets, WIKI, MIRFlickr, and NUS-WIDE, demonstrating that LESGH improves retrieval performance over the compared state-of-the-art baselines. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
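A simplified sketch of the two-step recipe described in the entry above: (1) learn binary codes from label information alone, (2) regress hash projections onto those codes. The random projection and least-squares forms below are generic stand-ins for LESGH's discrete optimization, shown only to make the two-step structure concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.integers(0, 2, size=(500, 20)).astype(float)   # multi-label matrix (toy)
X = rng.normal(size=(500, 128))                        # instance features (toy)
bits = 32

# Step 1: binary codes derived from an embedding of the labels only.
B = np.sign(Y @ rng.normal(size=(20, bits)))           # codes in {-1, +1}

# Step 2: least-squares hash projection mapping features to the codes.
W = np.linalg.lstsq(X, B, rcond=None)[0]
codes_for_new_items = np.sign(X[:5] @ W)               # hashing unseen items
```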