2,234 results
Search Results
52. Impact of Channel Selection with Different Bandwidths on Retrieval at 50–60 GHz.
- Author
- Zhang, Minjie, Ma, Gang, He, Jieying, and Zhang, Chao
- Subjects
- *BANDWIDTHS, *METEOROLOGICAL satellites, *RADIATIVE transfer, *ENTROPY (Information theory), *TRACE gases, *INFORMATION retrieval, *RADIATIVE transfer equation
- Abstract
Microwave hyperspectral instruments represent one of the main atmospheric sounders of China's next-generation Fengyun meteorological satellites. In order to better apply microwave hyperspectral observations in the fields of atmospheric parameter retrieval and data assimilation, this paper analyzes the sensitivity of trace gases to five selected bandwidth channels using a radiative transfer model, based on simulated microwave hyperspectral radiances at 50–60 GHz. The method uses information entropy and a weighting function to select channels, and analyzes the impact of selection on the retrieval accuracy of atmospheric profiles before and after channel selection. The experimental results show that channel selection can reduce the number of channels by approximately 74.05% while maintaining a large amount of information content, and the retrieval effect is significantly better than that of MWTS-III. After channel selection, the 10 MHz, 30 MHz, and 50 MHz bandwidths give the best retrieval results in the stratosphere, the whole atmosphere, and the troposphere, respectively. Considering the number of channels, the computational scale, and the retrieval results comprehensively, the channel selection method is effective. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
53. An advanced spatial co-registration of cloud properties for the atmospheric Sentinel missions: Application to TROPOMI.
- Author
- Argyrouli, Athina, Loyola, Diego, Romahn, Fabian, Lutz, Ronny, García, Víctor Molina, Hedelt, Pascal, Heue, Klaus-Peter, and Siddans, Richard
- Subjects
- *REFLECTANCE measurement, *ALBEDO, *METEOROLOGICAL satellites, *INFRARED imaging, *SPECTRAL imaging, *INFORMATION retrieval, *ORBITS (Astronomy)
- Abstract
The retrieval of cloud parameters from the atmospheric Sentinel missions requires Earth reflectance measurements from a set of spectral bands. Frequently, the ground pixel footprints of the involved spectral bands are not fully aligned, and therefore special treatment is required within the operational algorithms. This so-called inter-band spatial mis-registration of passive spectrometers is present when the Earth reflectance measurements in different spectral bands are captured by different spectrometers. The cloud retrieval algorithm requires reflectance measurements in the UV (ultraviolet)/VIS (visible) band, where the first cloud parameter (i.e., the radiometric cloud fraction) is retrieved by the OCRA (Optical Cloud Recognition Algorithm) algorithm. In addition, Earth reflectances in the NIR (near-infrared) band are needed for the retrieval of two additional cloud parameters (i.e., cloud height and cloud albedo, or cloud-top height and optical thickness) by the ROCINN (Retrieval of Cloud Information using Neural Networks) algorithm. In the former TROPOMI (TROPOspheric Monitoring Instrument)/S5P (Sentinel-5 Precursor) retrieval, a co-registration scheme mapped the derived cloud parameters from the source band to the target band using pre-calculated mapping weights from UV/VIS to NIR, and vice versa. In this paper we present a new scheme for the co-registration of the TROPOMI cloud parameters using collocated VIIRS (Visible Infrared Imaging Radiometer Suite)/SNPP (Suomi National Polar-orbiting Partnership) information. A great benefit of the new co-registration scheme based on the VIIRS data is that it improves the overall quality of the TROPOMI cloud products and, in addition, allows the reconstruction of the cloud parameters on the first UV/VIS detector pixel, which was impossible with the former scheme based on static mapping tables.
In practice, this means that a significant number of valid data points have been added to the TROPOMI cloud, total ozone, SO2, and HCHO products since November 26, 2023 (orbit 31705), when UPAS version 2.6 with the new co-registration scheme was activated operationally. From a comparison analysis between the two techniques, we found that the largest differences mainly appear for inhomogeneous scenes. From a validation exercise of TROPOMI against VIIRS in the across-track flight direction, we found that the old co-registration scheme tends to smooth out cloud structures along the scanline, whereas such structures are maintained with the new scheme. The need to implement a similar inter-band spatial co-registration scheme is foreseen for the Sentinel-4/MTG-S (Meteosat Third Generation - Sounder) and Sentinel-5/MetOp-SG (Meteorological Operational Satellite - Second Generation) missions. In the case of the Sentinel-4 instrument, the external cloud information will originate from collocated FCI (Flexible Combined Imager) data on board the MTG-I (Meteosat Third Generation - Imager) satellite. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
54. Audiovisual artefacts: the African politics of moving image loss.
- Author
- Blaylock, Jennifer
- Subjects
- *ANTIQUITIES, *MOTION picture industry, *NIGERIAN films, *INFORMATION retrieval, *IMAGE analysis
- Abstract
Artefacts are human-made objects deemed culturally and historically significant. But they are also those scratches, burns and glitches that appear on audiovisual screens due to poor projection, improper storage or faulty processing. They are those unwanted additions that visualise the presence of loss. This paper explores the politics of audiovisual loss by looking at the history of the Ghana Film Industry Corporation film collection's demise alongside the 2020 films by Onyeka Igwe – a so-called archive and No Archive Can Restore You – which feature a similar collection in Nigeria. Igwe's exploratory camera floats around the Nigerian Film Corporation building documenting the filmic carnage within its vaults, reminding audiences that archival horror lies in the "colonial residue" of the archive's architecture. Artefacts of decay in Igwe's films, which mark the elimination of information from the image that restoration seeks to renew, are not the result of inaction but acts of refusal. Film artefacts not only mark loss but are also traces of postcolonial affect. As such, I argue that archival neglect and the losses it produces may also be acts of archival labour – an articulation of artefacting. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
55. Ontology-Based Generalized Zero-Shot Learning with Generative Networks.
- Author
- Akdemir, Emre and Barışçı, Necattin
- Subjects
- *NATURAL language processing, *MACHINE learning, *GENERATIVE adversarial networks, *ONTOLOGY, *INFORMATION retrieval
- Abstract
Zero-Shot Learning (ZSL) aims to classify images of new categories in the testing phase without labeled images of those categories during training, using examples from categories with labeled images and some auxiliary information. The auxiliary information includes semantic attributes, textual descriptions, word embeddings, etc., for both labeled and unlabeled classes, obtained with Natural Language Processing (NLP) approaches. The resulting word embeddings are limited by semantic attributes and textual descriptions in which the semantics of categories are insufficiently captured. This paper introduces a study of Generalized Zero-Shot Learning (GZSL), a variant of ZSL, that integrates the rich semantics offered by an ontology. The semantic attributes used for semantic representation are supported by the ontology. Variational Autoencoder (VAE) and Generative Adversarial Network (GAN) architectures are used together to synthesize visual features. Our work was evaluated on the AWA2 dataset, and an improvement in GZSL performance was achieved. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
56. Reinforcement learning inspired forwarding strategy for information centric networks using Q‐learning algorithm.
- Author
- Delvadia, Krishna and Dutta, Nitul
- Subjects
- *INFORMATION networks, *DATA packeting, *FREIGHT forwarders, *REINFORCEMENT learning, *INFORMATION retrieval, *AUTODIDACTICISM
- Abstract
Summary: Content interest forwarding is a prominent research area in Information Centric Networking (ICN). An efficient forwarding strategy can significantly improve data retrieval latency, origin server load, network congestion, and overhead. State-of-the-art work is driven either by a flooding approach that tries to minimize the adverse effects of Interest flooding, or by a path-driven approach that tries to minimize the additional cost of maintaining routing information. These approaches are less efficient due to storm issues and excessive overhead. The proposed protocol aims to forward Interests to the nearest cache without requiring FIB construction, with significant improvements in latency and overhead. This paper demonstrates the feasibility of integrating a reinforcement-learning-based Q-learning strategy for forwarding in ICN. By revising Q-learning to address these inherent challenges, we introduce Q-learning-based Interest-packet and Data-packet forwarding mechanisms, namely IPQ-learning and DPQ-learning. They learn from historical events and select the best next node to forward an Interest. Each node in the network acts as an agent that forwards each packet to the best next hop according to its Q value, so that content can be fetched along the fastest possible route; every action becomes a learning step that improves the accuracy of the Q values. A performance investigation of the protocol in ndnSIM-2.0 shows improvements in the range of 10%–35% for metrics such as data retrieval delay, server hit rate, network overhead, network throughput, and network load. Outcomes are compared by integrating the proposed protocol with state-of-the-art caching protocols, and also against recent forwarding mechanisms. [ABSTRACT FROM AUTHOR]
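The Q-learning forwarding idea described in the abstract can be sketched in a few lines of Python. This is an illustrative toy, not the paper's IPQ-/DPQ-learning implementation: the table layout, the delay-based reward, and the epsilon-greedy policy are assumptions made for the sketch.

```python
import random

# Per-node Q table: (content_prefix, next_hop) -> Q value.
# Each ICN node acts as an agent that scores its neighbors per content prefix.
ALPHA = 0.5   # learning rate
GAMMA = 0.9   # discount factor
q_table = {}

def q_value(prefix, hop):
    return q_table.get((prefix, hop), 0.0)

def update_q(prefix, hop, reward, neighbor_hops):
    """Standard Q-learning update after a Data packet returns via `hop`;
    `reward` would be derived from the observed retrieval delay."""
    best_next = max((q_value(prefix, h) for h in neighbor_hops), default=0.0)
    old = q_value(prefix, hop)
    q_table[(prefix, hop)] = old + ALPHA * (reward + GAMMA * best_next - old)

def choose_next_hop(prefix, neighbor_hops, epsilon=0.1):
    """Epsilon-greedy selection of the next hop for an Interest packet."""
    if random.random() < epsilon:
        return random.choice(neighbor_hops)  # occasionally explore another path
    return max(neighbor_hops, key=lambda h: q_value(prefix, h))
```

Each forwarded Interest thus doubles as a training step: the returning Data packet's delay updates the Q value, which in turn biases the next hop selection.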
- Published
- 2024
- Full Text
- View/download PDF
57. Chinese-Vietnamese cross-lingual event retrieval method based on knowledge distillation.
- Author
- Gao, Shengxiang, He, Zhilei, Yu, Zhengtao, Zhu, Enchang, and Wu, Shaoyang
- Subjects
- *INFORMATION retrieval, *KNOWLEDGE transfer, *PROBLEM solving
- Abstract
Cross-lingual event retrieval is an information retrieval task aimed at retrieving text or documents related to a specific event across multiple languages. In Chinese-Vietnamese cross-lingual event retrieval specifically, a Chinese query is used to retrieve Vietnamese documents related to the query event. The critical issue is how to efficiently align query and document representations with limited resources. Existing cross-lingual pre-trained models are trained on large-scale multilingual corpora, but their training objectives do not include explicit language-alignment tasks. Due to the uneven distribution of training corpora across languages, these models suffer from language bias, and cross-lingual retrieval built on them inherits this bias. To solve this problem, this paper proposes a Chinese-Vietnamese cross-lingual event retrieval method based on knowledge distillation. The approach enables the model to learn good query-document matching features from monolingual retrieval by transferring knowledge from high-resource to low-resource languages. By enhancing the alignment between queries and documents of different languages in a shared semantic space, the method improves the performance of Chinese-Vietnamese cross-lingual event retrieval. [ABSTRACT FROM AUTHOR]
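As a minimal illustration of the distillation objective, a cross-lingual student can be trained to reproduce the relevance scores a monolingual teacher assigns to query-document pairs; the MSE form and the function name here are assumptions made for the sketch, not the paper's exact loss.

```python
def mse_distillation_loss(teacher_scores, student_scores):
    """Mean squared error between the monolingual teacher's relevance scores
    (e.g., Chinese query vs. Chinese version of a document) and the
    cross-lingual student's scores (Chinese query vs. Vietnamese document)."""
    assert len(teacher_scores) == len(student_scores) > 0
    return sum((t - s) ** 2
               for t, s in zip(teacher_scores, student_scores)) / len(teacher_scores)
```

Minimizing this loss pulls the student's cross-lingual query-document matching toward the alignment the teacher already exhibits in the high-resource language.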
- Published
- 2024
- Full Text
- View/download PDF
58. Beyond the Algorithm: Understanding How ChatGPT Handles Complex Library Queries.
- Author
- Yang, Sharon Q. and Mason, Sarah
- Subjects
- *WORLD Wide Web, *LIBRARY reference services, *T-test (Statistics), *PLAGIARISM, *ARTIFICIAL intelligence, *STATISTICAL sampling, *QUESTIONNAIRES, *ACADEMIC libraries, *LIBRARIANS, *DESCRIPTIVE statistics, *INFORMATION services, *INFORMATION retrieval, *CONFIDENCE intervals, *ALGORITHMS, *REFERENCE interviews (Library science)
- Abstract
The introduction of ChatGPT 3.5 in November 2022 ignited a sensation in the academic community, leaving many astounded by its capabilities. This new release more closely emulates human responses than its predecessors. Among its remarkable capabilities, it can answer questions, catalog items in MARC21, recommend reading lists, and make suggestions on a wide array of topics. To assess ChatGPT's efficacy in aiding library users, the authors of this paper conducted an experiment comparing ChatGPT's performance with that of librarians in answering reference questions. Thirty questions were randomly selected from the transaction log of reference inquiries between June 1 and July 31, 2023 at the Rider University Libraries. These queries constituted 34% of the total user questions during this two-month period. The authors compared the answers by ChatGPT and those by reference librarians for their accuracy, relevance, and friendliness. The findings indicate that reference librarians markedly outperformed their robotic counterpart. An evident issue arises from ChatGPT's deficiency in understanding local policies and practices, which hinders its ability to provide satisfactory answers in those areas. OpenAI posits that ChatGPT's proficiency can be enhanced through targeted fine-tuning using locally specific information. At the moment, ChatGPT remains a great tool for librarians. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
59. MEDLINE citation tool accuracy: an analysis in two platforms.
- Author
- Scheinfeld, Laurel and Chung, Sunny
- Subjects
- *DATABASES, *RESEARCH methodology evaluation, *ARTIFICIAL intelligence, *LIBRARIANS, *STATISTICAL sampling, *CITATION analysis, *AUTHORSHIP, *MEDLINE, *BIBLIOGRAPHICAL citations, *PUBLISHING, *INFORMATION literacy, *INFORMATION retrieval, *ADULT education workshops, *BIBLIOGRAPHY, *BIBLIOMETRICS, *RESEARCH, *ELECTRONIC publications, *ONLINE information services
- Abstract
Background: Libraries provide access to databases with auto-cite features embedded into their services; however, the accuracy of these auto-cite buttons is not very high in humanities and social sciences databases. Case Presentation: This case compares two biomedical databases, Ovid MEDLINE and PubMed, to see if either is reliable enough to confidently recommend to students for use when writing papers. A total of 60 citations were assessed, 30 from each citation generator, based on the top 30 articles in PubMed from 2010 to 2020. Conclusions: Error rates were higher in Ovid MEDLINE than in PubMed, but neither database platform provided error-free references. The auto-cite tools were not reliable: none of the 60 citations examined was 100% correct. Librarians should continue to advise students not to rely solely upon citation generators in these biomedical databases. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
60. Search Engine for Open Geospatial Consortium Web Services Improving Discoverability through Natural Language Processing-Based Processing and Ranking.
- Author
- Ferrari, Elia, Striewski, Friedrich, Tiefenbacher, Fiona, Bereuter, Pia, Oesch, David, and Di Donato, Pasquale
- Subjects
- *NATURAL language processing, *WEB services, *CONSORTIA, *WORLD Wide Web, *WEB search engines, *SEARCH engines, *INFORMATION retrieval
- Abstract
The improvement of search engines for geospatial data on the World Wide Web has been a subject of research, particularly concerning the challenges in discovering and utilizing geospatial web services. Despite the establishment of standards by the Open Geospatial Consortium (OGC), the implementation of these services varies significantly among providers, leading to issues in dataset discoverability and usability. This paper presents a proof of concept for a search engine tailored to geospatial services in Switzerland. It addresses challenges such as scraping data from various OGC web service providers, enhancing metadata quality through Natural Language Processing, and optimizing search functionality and ranking methods. Semantic augmentation techniques are applied to enhance metadata completeness and quality, which are stored in a high-performance NoSQL database for efficient data retrieval. The results show improvements in dataset discoverability and search relevance, with NLP-extracted information contributing significantly to ranking accuracy. Overall, the GeoHarvester proof of concept demonstrates the feasibility of improving the discoverability and usability of geospatial web services through advanced search engine techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
61. Blockchain-Based Method for Spatial Retrieval and Verification of Remote Sensing Images.
- Author
- Liu, Yujie and Chang, Yuanfei
- Subjects
- *INFORMATION retrieval, *IMAGE transmission, *IMAGE retrieval, *LAND management, *BLOCKCHAINS
- Abstract
Remote sensing images are a vital basis for land management decisions. Many scholars have applied blockchain's notarization function to the protection of remote sensing images, yet research on efficient retrieval of such images on the blockchain remains sparse. Addressing this issue, this paper introduces a blockchain-based spatial index verification method using Hyperledger Fabric. It linearizes the spatial information of remote sensing images via Geohash and integrates it with LSM trees for effective retrieval and verification. The system also incorporates IPFS as an underlying storage unit for Hyperledger Fabric, ensuring the safe storage and transmission of images. The experiments indicate that this method significantly reduces the latency of data retrieval and verification without impacting the write performance of Hyperledger Fabric, enhancing throughput and providing a solid foundation for efficient blockchain-based verification of remote sensing images in land registry systems. [ABSTRACT FROM AUTHOR]
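The Geohash linearization the abstract relies on can be sketched as follows. This is the standard Geohash algorithm in plain Python (not the paper's code): longitude and latitude halving decisions are interleaved into bits and emitted as base-32 characters, so spatially nearby images share key prefixes and sort next to each other in an LSM tree, letting a prefix scan approximate a spatial range query.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=8):
    """Interleave longitude/latitude halving decisions into a base-32 string."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    bits, ch, even, out = 0, 0, True, []
    while len(out) < precision:
        rng = lon_rng if even else lat_rng    # even-numbered bits refine longitude
        val = lon if even else lat
        mid = (rng[0] + rng[1]) / 2
        if val > mid:
            ch = (ch << 1) | 1                # point is in the upper half
            rng[0] = mid
        else:
            ch = ch << 1                      # point is in the lower half
            rng[1] = mid
        even = not even
        bits += 1
        if bits == 5:                         # 5 bits per base-32 character
            out.append(BASE32[ch])
            bits, ch = 0, 0
    return "".join(out)
```

A shorter Geohash is simply a prefix of a longer one for the same point, which is what makes range scans over sorted keys work.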
- Published
- 2024
- Full Text
- View/download PDF
62. Curriculum work and hermeneutics.
- Author
- Hodge, Steven
- Subjects
- *CURRICULUM, *TEACHER attitudes, *HERMENEUTICS, *MODERN society, *INFORMATION retrieval
- Abstract
The curriculum work of teachers is understood and conceptualised in different ways. A prevalent view is that teachers are an integral part of a system of transmission and their work with curriculum essentially a technical exercise. Some form of this view seems to be assumed by policymakers, parents and at least some teachers. However, when this same work is considered from the perspective of interpretation theory or 'hermeneutics', a contrasting picture emerges. Rather than positioning teachers as relays of explicit information, interpretation theory alerts us to the complexities and uncertainties inherent in the reading, appreciation and application of texts. On this view, meaning is not stable but transformed through interpretation, and the process itself has no definite end point. Interpretation theory thus undermines major assumptions of the technical‐transmission understanding of curriculum work. In this paper, the potential contribution of this body of theory to illuminating curriculum work is explored. Noting that hermeneutics is a vast area of study, a selection of concepts will be made that demonstrates some of these insights. The exploration will yield both critical and generative contributions to curriculum theory. The way hermeneutics undermines the transmission view of teachers' curriculum work will become clear, and at the same time, the inherent creativity of curriculum interpretation becomes an inescapable feature of teacher expertise that could be celebrated rather than neglected, denied or repressed. A hermeneutic analysis of curriculum work thus has implications for what teachers know and do and for their role in contemporary society. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
63. A discovery system for narrative query graphs: entity-interaction-aware document retrieval.
- Author
- Kroll, Hermann, Pirklbauer, Jan, Kalo, Jan-Christoph, Kunz, Morris, Ruthmann, Johannes, and Balke, Wolf-Tilo
- Subjects
- *INFORMATION retrieval, *INFORMATION needs, *RESEARCH personnel, *ACCESS to information, *NARRATIVES
- Abstract
Finding relevant publications in the scientific domain can be quite tedious: accessing large-scale document collections often means formulating an initial keyword-based query followed by many refinements to retrieve a sufficiently complete, yet manageable set of documents satisfying one's information need. Since keyword-based search limits researchers to formulating their information needs as a set of unconnected keywords, retrieval systems try to guess each user's intent. In contrast, distilling short narratives of the searchers' information needs into simple, yet precise entity-interaction graph patterns provides all the information needed for a precise search. As an additional benefit, such graph patterns may also feature variable nodes to flexibly allow for different substitutions of entities taking a specified role. An evaluation over the PubMed document collection quantifies the gains in precision of our novel entity-interaction-aware search. Moreover, we perform expert interviews and a questionnaire to verify the usefulness of our system in practice. This paper extends our previous work by giving a comprehensive overview of the discovery system realizing narrative query graph retrieval. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
64. A Secure Face Verification Scheme Based on Fully Homomorphic Encryption with Anonymity.
- Author
- Wang, Xingchen and Li, Peng
- Subjects
- *ANONYMITY, *CLOUD computing, *UPLOADING of data, *INFORMATION retrieval
- Abstract
With the widespread adoption of cloud computing, the face verification process often requires the client to upload the face to an untrusted cloud server to obtain the verification results. Privacy leakage issues may arise if the client's private information is not protected. This paper proposes a secure and anonymous face verification scheme using fully homomorphic encryption technology and SealPIR. Our scheme is a three-party solution that requires a third-party server trusted by the client. This scheme not only prevents the client's facial data from being obtained by untrusted data servers but also prevents the data server from learning the index corresponding to the face that the client wants to verify. In a single-face verification process, the client only needs to perform one upload operation and one download operation, with a communication volume of 264 KB. We can complete a privacy-protected anonymous face verification process in 84.91 ms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
65. Listwise learning to rank method combining approximate NDCG ranking indicator with Conditional Generative Adversarial Networks.
- Author
- Li, Jinzhong, Zeng, Huan, Xiao, Cunwei, Ouyang, Chunjuan, and Liu, Hua
- Subjects
- *GENERATIVE adversarial networks, *INFORMATION retrieval
- Abstract
Some previous empirical studies have shown that the performance of listwise learning-to-rank approaches is in general better than that of pointwise or pairwise learning-to-rank techniques. Listwise learning-to-rank methods that directly optimize information retrieval indicators are an essential and popular class of learning-to-rank methods. However, existing learning-to-rank approaches based on Generative Adversarial Networks (GAN) do not use a loss function based on information retrieval indicators to optimize the generator and/or discriminator. Thus, this paper proposes a learning-to-rank approach that combines an approximate Normalized Discounted Cumulative Gain (NDCG) ranking indicator with Conditional Generative Adversarial Networks (CGAN), named NCGAN-LTR. The NCGAN-LTR approach constructs loss functions for the generator and discriminator based on the Plackett-Luce model and an approximate version of the NDCG ranking indicator, which are used to train the network parameters of the CGAN. Experimental results on four learning-to-rank benchmark datasets, i.e., TREC TD2004, OHSUMED, MQ2008, and MSLR-WEB10K, demonstrate that the proposed NCGAN-LTR approach has superior performance on almost all information retrieval ranking indicators compared with the IRGAN-List approach. [ABSTRACT FROM AUTHOR]
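For reference, the exact NDCG indicator that NCGAN-LTR approximates can be computed as below. This is a generic sketch using the common gain form 2^rel - 1; the smoothed, differentiable approximation actually used for training is not reproduced here.

```python
import math

def dcg(rels):
    """Discounted Cumulative Gain: gain (2^rel - 1), log2 position discount."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(ranked_rels, k=None):
    """NDCG of a ranked list of graded relevance labels, optionally cut at k."""
    cut = ranked_rels[:k] if k is not None else ranked_rels
    ideal = sorted(ranked_rels, reverse=True)
    if k is not None:
        ideal = ideal[:k]
    idcg = dcg(ideal)
    return dcg(cut) / idcg if idcg > 0 else 0.0
```

Because the sorting and truncation make NDCG non-differentiable in the model scores, listwise methods of this kind optimize a smooth surrogate during training and report the exact value only at evaluation time.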
- Published
- 2024
- Full Text
- View/download PDF
66. Leveraging Large Language Models for Clinical Abbreviation Disambiguation.
- Author
- Hosseini, Manda, Hosseini, Mandana, and Javidan, Reza
- Subjects
- *PHILOSOPHY of education, *MEDICAL informatics, *INFORMATION retrieval, *ELECTRONIC health records, *ABBREVIATIONS, *LANGUAGE acquisition
- Abstract
Clinical abbreviation disambiguation is a crucial task in the biomedical domain, as the accurate identification of the intended meanings or expansions of abbreviations in clinical texts is vital for medical information retrieval and analysis. Existing approaches have shown promising results, but challenges such as limited instances and ambiguous interpretations persist. In this paper, we propose an approach to address these challenges and enhance the performance of clinical abbreviation disambiguation. Our objective is to leverage the power of Large Language Models (LLMs) and employ a Generative Model (GM) to augment the dataset with contextually relevant instances, enabling more accurate disambiguation across diverse clinical contexts. We integrate the contextual understanding of LLMs, represented by BlueBERT and Transformers, with data augmentation using a Generative Model, the Biomedical Generative Pre-trained Transformer (BIOGPT), which is pretrained on an extensive corpus of biomedical literature to capture the intricacies of medical terminology and context. By providing BIOGPT with relevant medical terms and sense information, we generate diverse instances of clinical text that accurately represent the intended meanings of abbreviations. We evaluate our approach on the widely recognized CASI dataset, carefully partitioned into training, validation, and test sets. The incorporation of data augmentation with the GM improves the model's performance, particularly for senses with limited instances, effectively addressing dataset imbalance and challenges posed by similar concepts. The results demonstrate the efficacy of our proposed method, showcasing the significance of LLMs and generative techniques in clinical abbreviation disambiguation. Our model achieves good accuracy on the test set, outperforming previous methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
68. On private information retrieval supporting range queries.
- Author
- Hayata, Junichiro, Schuldt, Jacob C. N., Hanaoka, Goichiro, and Matsuura, Kanta
- Subjects
- *INFORMATION retrieval, *MULTIDIMENSIONAL databases, *DATABASES, *IMAGE databases
- Abstract
Private information retrieval (PIR) allows a client to retrieve data from a database without the database server learning which data are being retrieved. Although many PIR schemes have been proposed in the literature, almost all of them focus on retrieval of a single database element and do not consider more flexible retrieval queries such as basic range queries. Furthermore, while practically oriented database schemes aiming at flexible, privacy-preserving queries have been proposed, to the best of our knowledge no formal treatment of range queries has been considered for these. In this paper, we first highlight that a simple extension of the standard PIR security notion to range queries is insufficient in many usage scenarios, and propose a stronger security notion aimed at addressing this. We then show a simple generic construction of a PIR scheme meeting our stronger security notion, and propose a more efficient direct construction based on function secret sharing: while the former has round complexity logarithmic in the size of the database, the round complexity of the latter is constant. We then report on the practical performance of our direct construction. Finally, we extend the results to multi-dimensional databases and show a construction of a PIR scheme supporting multi-dimensional range queries. The communication round complexity of our scheme is O(k log n) in the worst case, where n is the size of the database and k is the number of elements retrieved by the query. [ABSTRACT FROM AUTHOR]
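For intuition, the simplest information-theoretic two-server PIR (the classic XOR construction, shown here as a sketch; it is not the function-secret-sharing scheme proposed in the paper) retrieves one record without either server learning the index. FSS-based PIR can be viewed as compressing the two query vectors below so they no longer cost O(n) communication.

```python
import secrets

def client_queries(n, index):
    """Client: a uniformly random bit vector q1, and q2 = q1 with the target
    position flipped. Each vector on its own is uniform, revealing nothing."""
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = list(q1)
    q2[index] ^= 1
    return q1, q2

def server_answer(db, q):
    """Each (non-colluding) server XORs the records its query bits select."""
    ans = 0
    for record, bit in zip(db, q):
        if bit:
            ans ^= record
    return ans

def reconstruct(a1, a2):
    """Records selected by both vectors cancel; only the target survives."""
    return a1 ^ a2
```

Extending this single-element primitive to range queries without leaking the position or size of the range is exactly the subtlety the stronger security notion in the paper targets.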
- Published
- 2024
- Full Text
- View/download PDF
69. Immediate Pose Recovery Method for Untracked Frames in Feature-Based SLAM.
- Author
- Dou, Hexuan, Wang, Zhenhuan, Wang, Changhong, and Zhao, Xinyang
- Subjects
- *INFORMATION retrieval, *MONOCULARS, *COMPUTER vision
- Abstract
In challenging environments, feature-based visual SLAM encounters frequent failures in frame tracking, introducing unknown poses into robotic applications. This paper introduces an immediate approach for recovering untracked camera poses. By retrieving key information from elapsed untracked frames, lost poses are efficiently restored in a short time. Taking the reconstructed poses and map points into account during local optimization, a denser local map is constructed around ambiguous frames to enhance the subsequent SLAM procedure. The proposed method is implemented in a SLAM system, and monocular experiments are conducted on datasets. The experimental results demonstrate that our method can reconstruct the untracked frames in nearly real time, effectively complementing missing segments of the trajectory. Concurrently, the accuracy and robustness of subsequent tracking are improved through the integration of recovered poses and map points. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
70. Cognitive Aspects in War-time Ukrainian Humorous Discourse.
- Author
-
Kharchenko, Oleg V.
- Subjects
- *
COGNITIVE ability , *UKRAINIANS , *WIT & humor , *LANGUAGE & languages , *INFORMATION retrieval - Abstract
The article analyses the functioning of the war-time Ukrainian humorous discourse in general and the cognitive mechanisms of humor, including the "Availability Heuristic" cognitive pattern in particular. The study aims to examine 12 Ukrainian war-time jokes and to reveal the main cognitive patterns and accompanying stylistic figures producing the comic effect. All jokes manifest the application of the "Availability heuristic" cognitive pattern inherent to the war period, when old pre-war realities are replaced with new war-time realities and explained in a funny way. The study addresses some cognitive and pragmatic aspects of the war-time Ukrainian humor, focusing on the role of cognitive patterns in selecting stylistic humor devices, while processing the incoming information and shaping the cognitive frameworks of humor perception and creation. The paper reveals the main cognitive patterns, including the "Availability Heuristic," the "Distinct contrast," the "Negativity Thinking," the "Superiority or Illusionary Superiority," the "Easel," and their humorous actualization through such stylistic figures as paraprosdokian, irony, bathos, double entendre, pun, metaphor, and pastiche in its narrow meaning. The researched Ukrainian jokes are interwoven with the situational context of the dramatic events within the temporality of the speedy streaming news of the Russian-Ukrainian war. The paper explores the main pragmatic functions of Ukrainian war-time humorous discourse and makes additional remarks about a number of affirmations from some humor theories. Plain Language Summary: Cognitive patterns of Ukrainian wartime humor The article analyses cognitive, pragmatic and stylistic features of war-time Ukrainian humorous discourse in general and the "Availability Heuristic" cognitive pattern in particular. The purpose of the research is to determine the role of this cognitive pattern in the creation of the comic effect in war-time verbal humor in Ukraine.
The paper presents the results of the cognitive, pragmatic and stylistic analyses of twelve war-time Ukrainian jokes. The study determined that the twelve researched jokes apply the "Availability Heuristic" cognitive pattern, which is characterized by solution search through pursuing fresh experience or information reflecting recent war events. It is accompanied by the "Distinct contrast" cognitive pattern, both of which are manifestations of the incongruity mechanism. Besides, it is used together with the "Negativity Thinking" pattern, focusing on negative intentions, the cognitive "Easel" pattern, responsible for the dominance of visual pictures and images, and the cognitive pattern of "Superiority or Illusionary Superiority." All cognitive patterns have sets of stylistic figures standing behind them. In the researched jokes, the cognitive patterns are exhibited through paraprosdokian, irony, bathos, allusions, double entendre, puns, metaphors, and pastiche in its narrow meaning. The analyzed Ukrainian jokes are interwoven with the situational context of the dramatic events within the temporality of the speedy streaming news of the Russian-Ukrainian war. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
71. Conceptual data retrieval from FDB Databases.
- Author
-
Petraki, Evangelia, Kapetis, Chrysostomos, and Yannakoudakis, Emmanuel J.
- Subjects
- *
INFORMATION retrieval , *CONCEPTUAL models , *DATABASES , *SEARCH algorithms - Abstract
FDB is a set theoretical model which allows the definition of multilingual databases and thesauri through a universal schema. One or more multilingual thesauri can be defined in the FDB model, while the linking of each frame object (data record in terms of a traditional database) with the underlying thesauri can be implemented automatically. FDB offers administration utilities at both data and interface level, the definition of variable length objects, authority control, etc. The purpose of this paper is to present the implementation of conceptual searching in any FDB database by using the information provided by one or more multilingual thesauri that have already been defined in the FDB model. Many different parameters can define the conceptual searching process in an FDB database. In this paper we first briefly present the FDB model, and then present (a) the search algorithms that exploit the information provided by the multilingual thesauri and implement conceptual searching in any FDB database, and (b) all the parameters that the user can define in order to determine the different search criteria. [ABSTRACT FROM AUTHOR]
- Published
- 2023
72. A typology of research discovery tools.
- Author
-
Nishikawa-Pacher, Andreas
- Subjects
- *
SEARCH engines , *INFORMATION retrieval , *RECOMMENDER systems , *KEYWORD searching , *SYSTEMS theory , *HARBORS , *EXPERIMENTAL design - Abstract
There has been a proliferation of new research discovery tools that aid scientists in finding relevant publications. To obtain a general overview of this development, this article generates a conceptual typology of all possible research discovery tools by drawing from the information-theoretical concepts of redundancy/variety. Bibliometric links between scholarly publications can thus exhibit 'redundancy' (i.e. expectable linkages between academic works) or 'variety' (i.e. original co-occurrence patterns). On the redundancy-reproducing end of the typology are machines that harness extant co-citations or keyword queries, such as academic search engines and paper recommender systems. The variety end of the spectrum harbours services that enable categorial browsing or that suggest publications randomly, such as journals' tables of contents or random paper bots. The typology has implications for understanding how the design of research discovery platforms may ultimately shape aggregate citational networks of science. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
73. Steps toward preregistration of research on research integrity.
- Author
-
Sijtsma, Klaas, Emons, Wilco H. M., Steneck, Nicholas H., and Bouter, Lex M.
- Subjects
- *
INFERENTIAL statistics , *INFORMATION retrieval , *DATA analysis - Abstract
Background: A proposal to encourage the preregistration of research on research integrity was developed and adopted as the Amsterdam Agenda at the 5th World Conference on Research Integrity (Amsterdam, 2017). This paper reports on the degree to which abstracts of the 6th World Conference in Research Integrity (Hong Kong, 2019) reported on preregistered research. Methods: Conference registration data on participants presenting a paper or a poster at the 6th WCRI were made available to the research team. Because the data set was too small for inferential statistics, this report is limited to a basic description of results and some recommendations that should be considered when taking further steps to improve preregistration. Results: 19% of the 308 presenters preregistered their research. Of the 56 usable cases, less than half provided information on the six key elements of the Amsterdam Agenda. Others provided information that invalidated their data, such as an uninformative URL. There was no discernible difference between qualitative and quantitative research. Conclusions: Some presenters at the WCRI have preregistered their research on research integrity, but further steps are needed to increase the frequency and completeness of preregistration. One approach to increase preregistration would be to make it a requirement for research presented at the World Conferences on Research Integrity. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
74. Construction and application of barrel finishing underlying database platform.
- Author
-
Gao, Wei, Yang, Shengqiang, Tian, Jianyan, Banerjee, Amit, and Yan, Fei
- Subjects
- *
ELECTRONIC records , *CASE-based reasoning , *ELECTRONIC paper , *DATABASES , *INFORMATION retrieval , *DATABASE design , *DATA structures - Abstract
The present methods of data preservation and representation for barrel finishing processes, which include paper and electronic documents, have several disadvantages such as restrictions in size and complexity, and limitations on query and update speed. To address these disadvantages, a new database platform for barrel finishing data has been constructed by using database technology and case-based reasoning. The design procedure of the database platform is expounded in detail, covering the analysis of database platform requirements, the establishment of the conceptual model of the database data structure, the design of its logical model, the determination of its physical model, the choice of the network structure of the database platform, and the data management and storage method. The application results demonstrate that the database platform can ensure the safe and convenient storage as well as the sharing of experimental data of the barrel finishing process. It can also provide guidance and technical information for scientific researchers, experts, technicians, and production site operators to choose the processing technology and the processing parameters reasonably. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
75. Ontology for the User-Learner Profile Personalizes the Search Analysis of Online Learning Resources: The Case of Thematic Digital Universities.
- Author
-
Kordahi, Marilou
- Subjects
- *
ONLINE education , *DIGITAL libraries , *RESEARCH , *USER interfaces , *RESEARCH methodology , *INTERVIEWING , *QUALITATIVE research , *INFORMATION resources , *QUESTIONNAIRES , *INFORMATION retrieval , *UNIVERSITIES & colleges , *ONTOLOGIES (Information retrieval) - Abstract
We hope to contribute to the field of research in information technology and digital libraries by analyzing the connections between Thematic Digital Universities and digital user-learner profiles. Thematic Digital Universities are similar to digital libraries, and focus on creating and indexing open educational resources, as well as improving learning in the information age. The digital user profile relates to the digital representation of a person's identity and characteristics. In this paper we present the design of an ontology for the digital User-Learner Profile (OntoULP) and its application program. OntoULP is used to structure a user-learner's digital profile. The application provides each user-learner with tailor-made analyses based on informational behaviors, needs, and preferences. We rely on an exploratory research approach and on methods of ontologies, user modeling, and semantic matching to design the OntoULP and its application program. Any user-learner could use the OntoULP and its application program. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
76. A fast retrieval method of drug information based on multidimensional data analysis.
- Author
-
Yu, Chenggong
- Subjects
- *
DATA analysis , *HOSPITAL administration , *HOSPITALS , *INFORMATION retrieval , *INFORMATION resources management , *FAST reactors , *DRUGSTORES - Abstract
The medical industry is constantly improving its own structure with the development of society. However, most current drug management systems cannot meet the needs of actual drug management. There are many problems such as incomplete system functions, confused drug management, unclear division of modules, and loss and waste of human resources. At present, hospitals need a new and complete hospital drug information management system. Drug management is an indispensable part of the hospital management system. This paper completes the design of a rapid drug retrieval system, realized using multidimensional data analysis technology, and tests the multidimensional data analysis algorithm model used. The improved multidimensional data analysis algorithm greatly improves retrieval accuracy. Design and simulation experiments show that the system can effectively improve the drug processing efficiency of existing pharmacies, enable the pharmacy department to cooperate better with other departments, make the cooperation between different departments more effective, and address the hospital's work efficiency problems. By introducing multidimensional data analysis technology into the field of drug information retrieval, this paper designs an effective and fast retrieval method. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
77. Generating keyphrases for readers: A controllable keyphrase generation framework.
- Author
-
Jiang, Yi, Meng, Rui, Huang, Yong, Lu, Wei, and Liu, Jiawei
- Subjects
- *
SEMANTICS , *NATURAL language processing , *TASK performance , *CONCEPTUAL structures , *INFORMATION retrieval , *ACCESS to information , *INFORMATION science , *DESCRIPTIVE statistics , *RESEARCH funding , *ABSTRACTING & indexing services , *READING , *BLOGS , *INFORMATION technology - Abstract
With the wide application of keyphrases in many Information Retrieval (IR) and Natural Language Processing (NLP) tasks, automatic keyphrase prediction has been emerging. However, these statistically important phrases are contributing increasingly less to the related tasks because the end‐to‐end learning mechanism enables models to learn the important semantic information of the text directly. Similarly, keyphrases are of little help for readers to quickly grasp the paper's main idea because the relationship between the keyphrase and the paper is not explicit to readers. Therefore, we propose to generate keyphrases with specific functions for readers to bridge the semantic gap between them and the information producers, and verify the effectiveness of the keyphrase function for assisting users' comprehension with a user experiment. A controllable keyphrase generation framework (the CKPG) that uses the keyphrase function as a control code to generate categorized keyphrases is proposed and implemented based on Transformer, BART, and T5, respectively. For the Computer Science domain, the Macro‐avgs of P@5, R@5, and F1@5 on the Paper with Code dataset are up to 0.680, 0.535, and 0.558, respectively. Our experimental results indicate the effectiveness of the CKPG models. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
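The control-code mechanism and the reported metrics can be sketched briefly. In the sketch below, the control token names are hypothetical (the CKPG paper's actual category labels may differ); the P@k/R@k/F1@k definitions are the standard ones used in keyphrase-generation evaluation.

```python
# Hypothetical control tokens for keyphrase functions; the CKPG
# framework's actual category labels may differ.
CONTROL_CODES = {"method": "<kp_method>", "task": "<kp_task>"}

def build_input(text: str, kp_function: str) -> str:
    """Prepend a control code so a seq2seq model (Transformer, BART,
    or T5) can condition generation on the desired keyphrase function."""
    return f"{CONTROL_CODES[kp_function]} {text}"

def precision_recall_f1_at_k(predicted, gold, k=5):
    """Standard P@k, R@k, and F1@k over predicted keyphrases."""
    hits = sum(1 for kp in predicted[:k] if kp in set(gold))
    p = hits / k
    r = hits / len(gold)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

src = build_input("We present a model for keyphrase generation.", "method")
assert src.startswith("<kp_method>")
p, r, f1 = precision_recall_f1_at_k(["a", "b", "c", "d", "e"], ["a", "c", "x"])
assert abs(f1 - 0.5) < 1e-9
```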
78. Factors Influencing Use of Web for Academic Pursuit in Relation to User Categories.
- Author
-
Datta, Swati and Kumar, Shiv
- Subjects
- *
INFORMATION retrieval , *INTERNET searching , *WEB databases , *UNIVERSITY faculty , *ACADEMIC libraries , *ELECTRONIC journals - Abstract
This paper examines the factors accountable for users' tendency to use free web resources for achieving their academic goals, and also the reasons that prevented the users from using the credible sources of the library for their academic needs. Further, this study also attempted to ascertain whether user category has any influence on the factors which boost the use of the web for academic purposes. A questionnaire-based survey was conducted for five universities of Chandigarh, Haryana and Punjab, including post graduate students, research scholars and faculty members. The findings showed that a majority of respondents from the three categories chose the web because of the speedy retrieval of information, ease of access, and no need for advanced search skills, and further it saved their time when compared with the online e-resources subscribed by the library. Most of them found it difficult to physically visit the library. They were not familiar with which e-resources were available on subscribed databases or on the web. About half of the users stated that the library website was not updated with newly subscribed e-resources. Discontinued e-resources were not removed from the library website, so users searching them for information found nothing. The paper recommends that user education programmes should be conducted and the interface of the library should be made user-friendly so as to encourage the users to use authentic information from the e-resources subscribed by the library for academic purposes. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
79. Developing "Energy Access" Ontology using Protégé Tool.
- Author
-
Sharma, Reeta and Kanjilal, Uma
- Subjects
- *
ONTOLOGIES (Information retrieval) , *INFORMATION storage & retrieval systems , *WORLD Wide Web , *SEMANTIC Web , *ONTOLOGY , *KNOWLEDGE representation (Information theory) - Abstract
In this internet-based world where a large amount of information exists on the World Wide Web (WWW), information retrieval systems are mainly centred on topic-based classification. Considerable effort and time are consumed extracting relevant information through major search engines when data is not organised correctly. Ontologies have proved to be an effective technique for representing and retrieving information, which is the key idea in semantic web applications. Ontologies not only help in efficient knowledge representation and information retrieval, but also help in mapping the hidden knowledge about a subject. This paper discusses the process and method of building an ontology on the "Energy access" domain. The methodology is based on the tools used in developing the ontology. Several tools are used to create an ontology; Protégé is one of the most popular tools for ontology editing and development. In this paper, various aspects like the superclass and subclass hierarchy, creating a subclass, instances for class illustration, the query retrieval process, visualisation, and graph views have been demonstrated by using Protégé software. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
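The superclass/subclass hierarchy, instances, and transitive query retrieval that Protégé demonstrates can be mimicked in a few lines of plain code. The class names and instances below are illustrative assumptions, not the paper's actual "Energy access" ontology:

```python
# A minimal stand-in for a class hierarchy one would model in
# Protege; the subclasses and instances are illustrative only.
subclass_of = {
    "SolarEnergy": "RenewableEnergy",
    "WindEnergy": "RenewableEnergy",
    "RenewableEnergy": "EnergyAccess",
    "GridElectricity": "EnergyAccess",
}
instances = {
    "SolarEnergy": ["rooftop_pv_program"],
    "WindEnergy": ["offshore_wind_farm"],
    "GridElectricity": ["rural_grid_extension"],
}

def is_subclass(cls, ancestor):
    """Walk the hierarchy upward (transitive subclass relation)."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = subclass_of.get(cls)
    return False

def query_instances(cls):
    """Retrieve instances of a class and all its subclasses, as a
    class-expression query in Protege would."""
    return sorted(
        ind
        for c, inds in instances.items()
        if is_subclass(c, cls)
        for ind in inds
    )

assert query_instances("RenewableEnergy") == ["offshore_wind_farm", "rooftop_pv_program"]
```

A query on the top class returns every instance in the hierarchy, which is exactly the retrieval advantage ontologies offer over flat topic lists.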
80. Improved Object Detection with Content and Position Separation in Transformer.
- Author
-
Wang, Yao and Ha, Jong-Eun
- Subjects
- *
TRANSFORMER models , *ATTENTION span , *INFORMATION retrieval - Abstract
In object detection, Transformer-based models such as DETR have exhibited state-of-the-art performance, capitalizing on the attention mechanism to handle spatial relations and feature dependencies. One inherent challenge these models face is the intertwined handling of content and positional data within their attention spans, potentially blurring the specificity of the information retrieval process. We regard object detection as a composite task, and merging content and positional information simultaneously, as previous models do, can exacerbate task complexity. This paper presents the Multi-Task Fusion Detector (MTFD), a novel architecture that innovatively dissects the detection process into distinct tasks, addressing content and position through separate decoders. By utilizing assumed fake queries, the MTFD framework enables each decoder to operate under a presumption of known ancillary information, ensuring more specific and enriched interactions with the feature map. Experimental results affirm that this methodical separation followed by a deliberate fusion not only simplifies the detection process but also augments accuracy and clarifies the role of each component, providing a fresh perspective on object detection in Transformer-based architectures. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
81. Measuring performance of metasearch engines to access information: an exploratory study based on precision metrics.
- Author
-
Bhardwaj, Raj Kumar, Kumar, Ritesh, and Nazim, Mohammad
- Subjects
- *
HEALTH information services , *WORLD Wide Web , *LIBRARIES , *INFORMATION services , *SEARCH engines , *INFORMATION retrieval , *RESEARCH , *ELECTRONIC publications , *ACCESS to information , *MEDICINE information services , *USER interfaces - Abstract
Purpose: This paper evaluates the precision of four metasearch engines (MSEs) – DuckDuckGo, Dogpile, Metacrawler and Startpage – to determine which metasearch engine exhibits the highest level of precision and to identify the metasearch engine that is most likely to return the most relevant search results. Design/methodology/approach: The research is divided into two parts: the first phase involves four queries categorized into two segments (4-Q-2-S), while the second phase includes six queries divided into three segments (6-Q-3-S). These queries vary in complexity, falling into three types: simple, phrase and complex. The precision, average precision and the presence of duplicates across all the evaluated metasearch engines are determined. Findings: The study clearly demonstrated that Startpage returned the most relevant results and achieved the highest precision (0.98) among the four MSEs. Conversely, DuckDuckGo exhibited consistent performance across both phases of the study. Research limitations/implications: The study only evaluated four metasearch engines, which may not be representative of all available metasearch engines. Additionally, a limited number of queries were used, which may not be sufficient to generalize the findings to all types of queries. Practical implications: The findings of this study can be valuable for accreditation agencies in managing duplicates, improving their search capabilities and obtaining more relevant and precise results. These findings can also assist users in selecting the best metasearch engine based on precision rather than interface. Originality/value: The study is the first of its kind to evaluate these four metasearch engines. No similar study has been conducted in the past to measure the performance of metasearch engines. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
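The precision figures reported above follow the usual set-based definition: the fraction of retrieved results judged relevant, averaged over queries. A minimal sketch of that computation (the paper's exact judgment protocol may differ):

```python
def precision(results, relevant):
    """Fraction of retrieved results that were judged relevant."""
    if not results:
        return 0.0
    rel = set(relevant)
    return sum(1 for r in results if r in rel) / len(results)

def mean_precision(per_query_scores):
    """Average precision across queries, used to compare MSEs."""
    return sum(per_query_scores) / len(per_query_scores)

# Hypothetical relevance judgments for one query on one MSE.
retrieved = ["d1", "d2", "d3", "d4", "d5"]
judged_relevant = ["d1", "d2", "d4", "d5"]
assert precision(retrieved, judged_relevant) == 0.8
```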
82. Reflection on future directions: a systematic review of reported limitations and solutions in interactive information retrieval user studies.
- Author
-
Jiang, Tianji and Liu, Jiqun
- Subjects
- *
INFORMATION retrieval , *RESEARCH questions , *RESEARCH personnel , *INFORMATION science , *BIBLIOGRAPHY - Abstract
Purpose: Understanding how users behave and evaluating how systems interact with users are essential for interactive information retrieval (IIR) research. User study methodology serves as a primary approach to answering IIR research questions. In addition to designing user study procedures, understanding the limitations of varying study designs and discussing solutions to the limitations is also critical for improving the methods and advancing the knowledge in IIR. Design/methodology/approach: Given this unresolved gap, we apply the faceted framework developed by Liu and Shah (2019) in systematically reviewing 131 IIR user studies recently published (2016–2021) in multiple IR and information science venues. Findings: Our study achieves three goals: (1) extracting and synthesizing the reported limitations on multiple aspects of user study (e.g. recruitment, tasks, study procedures, system interfaces, data analysis methods) under associated facets; (2) summarizing the reported solutions to the limitations; (3) clarifying the connections between types of limitations and types of solutions. Practical implications: The bibliography of user studies can be used by students and junior researchers who are new to user-centered IR studies as references for study design. Our results can facilitate the reflection and improvement on IR research methodology and serve as a checklist for evaluating customized IIR user studies in varying problem spaces. Originality/value: To our knowledge, this work is the first study that systematically reviews the study limitations and solutions reported by IIR researchers in articles and empirically examines their connections to different study components. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
83. Building knowledge graphs from technical documents using named entity recognition and edge weight updating neural network with triplet loss for entity normalization.
- Author
-
Jeon, Sung Hwan, Lee, Hye Jin, Park, Jihye, and Cho, Sungzoon
- Subjects
- *
KNOWLEDGE graphs , *PATENT offices , *TEXT mining , *MACHINE learning , *INFORMATION retrieval - Abstract
Attempts to express information from various documents in graph form are rapidly increasing. The speed and volume in which these documents are being generated call for an automated process, based on machine learning techniques, for cost-effective and timely analysis. Past studies responded to such needs by building knowledge graphs or technology trees from the bibliographic information of documents, or by relying on text mining techniques in order to extract keywords and/or phrases. While these approaches provide an intuitive glance into the technological hotspots or the key features of the select field, there still is room for improvement, especially in terms of recognizing the same entities appearing in different forms so as to interconnect closely related technological concepts properly. In this paper, we propose to build a patent knowledge network using the United States Patent and Trademark Office (USPTO) patent filings for the semiconductor device sector by fine-tuning Huggingface's named entity recognition (NER) model with our novel edge weight updating neural network. For the named entity normalization, we employ edge weight updating neural network with positive and negative candidates that are chosen by substring matching techniques. Experiment results show that our proposed approach performs very competitively against the conventional keyword extraction models frequently employed in patent analysis, especially for the named entity normalization (NEN) and document retrieval tasks. By grouping entities with named entity normalization model, the resulting knowledge graph achieves higher scores in retrieval tasks. We also show that our model is robust to the out-of-vocabulary problem by employing the fine-tuned BERT NER model. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
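The triplet loss used for entity normalization above has a simple core: pull an entity's embedding toward a positive candidate (the same entity in a different surface form) and push it away from a negative candidate, up to a margin. A minimal sketch with toy 2-D embeddings (the paper's edge weight updating network learns these embeddings; the vectors here are assumptions for illustration):

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: zero once the positive candidate
    is closer than the negative by at least the margin."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings: a surface variant of the same entity (positive)
# versus an unrelated entity (negative).
loss = triplet_loss([1.0, 0.0], [1.0, 0.1], [-1.0, 0.0])
assert loss == 0.0  # positive is already much closer than negative
```

During training, gradients of this loss move same-entity surface forms into one cluster, so that named entity normalization reduces to nearest-neighbor lookup.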
84. Improving entity linking by combining semantic entity embeddings and cross-attention encoder.
- Author
-
Li, Shi and Zhang, Yongkang
- Subjects
- *
KNOWLEDGE graphs , *CONTEXTUAL learning , *INFORMATION retrieval - Abstract
Entity linking is an important task for information retrieval and knowledge graph construction. Most existing methods use a bi-encoder structure to encode mentions and entities in the same space, and learn contextual features for entity linking. However, this type of system still faces some problems: (1) the entity embedding part of the model only learns from the local context of the target entity, which is too specific for the entity linking model to learn the contextual commonalities of the information; (2) the entity disambiguation part only uses a single similarity calculation to determine the target entity, resulting in insufficient interaction between the mentions and candidate entities, and ineffective recall of real entities. We propose a new entity linking model based on graph neural networks. Different from other bi-encoder retrieval systems, this paper introduces fine-grained semantic enhancement information into the entity embedding part of the bi-encoder to reduce the specificity of the model. Then, a cross-attention encoder is used to re-rank the target mention and each candidate entity after the entity retrieval model. Experimental results show that although the model is not optimal in inference speed, it outperforms all baseline methods on the AIDA-CoNLL dataset, and generalizes well to four datasets in different fields such as MSNBC and ACE2004. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
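The retrieve-then-rerank pipeline described above can be sketched in miniature. Here the bi-encoder stage is cosine similarity over toy embeddings, and the cross-encoder stage is replaced by a word-overlap scorer as a stand-in for the paper's cross-attention encoder; all entity names, vectors, and descriptions are invented for illustration:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def retrieve(mention_vec, entity_vecs, k=2):
    """Bi-encoder stage: rank all entities by embedding similarity."""
    ranked = sorted(entity_vecs, key=lambda e: cosine(mention_vec, entity_vecs[e]), reverse=True)
    return ranked[:k]

def rerank(context, candidates, descriptions):
    """Cross-encoder stand-in: jointly score the mention context
    against each candidate's description (here, plain word overlap)."""
    def score(c):
        return len(set(context.lower().split()) & set(descriptions[c].lower().split()))
    return max(candidates, key=score)

entity_vecs = {"Paris_city": [1.0, 0.0], "Paris_Hilton": [0.9, 0.1], "Texas": [0.0, 1.0]}
descriptions = {
    "Paris_city": "capital city of France",
    "Paris_Hilton": "American media personality",
    "Texas": "state in the United States",
}
cands = retrieve([1.0, 0.05], entity_vecs)          # fast, coarse recall
best = rerank("the capital of France", cands, descriptions)  # slow, precise
assert best == "Paris_city"
```

The design point mirrors the paper's: the cheap first stage narrows thousands of entities to a few candidates, and the expensive joint scorer runs only on that short list.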
85. Policy punctuations and agenda diversity in China: a national level analysis from 1980 to 2019.
- Author
-
Qin, Xiaolei and Huang, Jing
- Subjects
- *
PUNCTUATED equilibrium (Social science) , *DEMOCRACY , *POLITICAL science , *INFORMATION retrieval - Abstract
Based on data sources systematically tracking government activity such as budgets and bill hearings, the Punctuated Equilibrium Theory literature has demonstrated that policy processes in both democracies and nondemocracies feature long periods of stasis interrupted by dramatic changes. However, there is a lack of research that systematically examines China's policy process. In response, this article introduces a new dataset drawn from China State Council Gazettes from 1980 to 2019 to measure policy punctuations and agenda diversity in China. We find that punctuations in China's policy process are more intense than those in democracies. The findings further show that China's policy process features more positive punctuations than negative punctuations. We also find an overall increasing trend of agenda diversity and a pattern of alternation between agenda expansion and concentration across the forty years analyzed in this paper. These findings provide new long-term evidence regarding patterns of policy stability and change in the Chinese context and contribute to our understanding of China's politics of attention and its linkage with information inefficiency and survival politics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
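The abstract does not spell out its measures, but the Punctuated Equilibrium Theory literature commonly operationalizes punctuations via the kurtosis of period-to-period change distributions and agenda diversity via Shannon entropy over topic shares. A sketch under that assumption:

```python
import math
import statistics

def excess_kurtosis(changes):
    """Sample excess kurtosis of period-to-period changes. PET work
    reads strongly positive values (a tall central peak plus heavy
    tails) as evidence of punctuated rather than incremental change."""
    n = len(changes)
    mean = statistics.fmean(changes)
    var = sum((x - mean) ** 2 for x in changes) / n
    m4 = sum((x - mean) ** 4 for x in changes) / n
    return m4 / var**2 - 3.0

def agenda_diversity(topic_shares):
    """Shannon entropy over topic shares: higher entropy means
    attention is spread across a more diverse policy agenda."""
    return -sum(p * math.log(p) for p in topic_shares if p > 0)

assert abs(excess_kurtosis([0, 0, 0, 0, 10]) - 0.25) < 1e-9
assert abs(agenda_diversity([0.25] * 4) - math.log(4)) < 1e-9
```

Applied to yearly topic counts from the Gazettes, rising entropy would indicate agenda expansion and falling entropy agenda concentration, the alternation the authors report.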
86. Forum: The Limitations of Large Language Models and Emerging Correctives to Support Social Work Scholarship: Selecting the Right Tool for the Task.
- Author
-
Victor, Bryan G., Perron, Brian E., and Goldkind, Lauri
- Subjects
- *
LANGUAGE models , *SOCIAL work education , *SCHOLARSHIPS , *DIGITAL literacy , *CHATGPT , *INFORMATION retrieval - Abstract
The emergence of large language models (LLMs) like ChatGPT, Gemini, and Claude offers significant potential for the social work profession. However, these LLMs are not without their ethical and practical challenges, particularly concerning the accuracy of the information provided by these models. This commentary explores the importance of developing digital literacy among social work professionals to effectively navigate the capabilities and limitations of LLMs. Through an understanding that LLMs are designed to generate human-like text outputs rather than serve as tools for information retrieval, users can align their expectations and uses of these models accordingly. The paper highlights a specific instance where ChatGPT produced inaccurate scholarly references as a clear example of a model output with factually incorrect information, an occurrence often referred to as a hallucination. The authors then describe recent technology advancements such as the integration of Internet search capability with LLMs and an approach known as retrieval-augmented generation that can enhance the ability of LLMs to provide users with more accurate and relevant information. The commentary ends with a call for concerted efforts to equip social work students, practitioners, educators, and scholars with the skills needed to use emerging AI technologies ethically and effectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
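Retrieval-augmented generation, the corrective the commentary points to, has a simple shape: retrieve relevant text first, then ask the model to answer from that text rather than from its parametric memory. The sketch below uses naive term overlap as a stand-in retriever and only builds the grounded prompt; no real LLM is called, and the documents are invented examples:

```python
def retrieve(query, documents, k=1):
    """Score documents by term overlap with the query (a stand-in
    for a real dense or keyword retriever)."""
    q = set(query.lower().split())
    ranked = sorted(documents, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(query, documents):
    """Ground the model's answer in retrieved text: the core idea
    of retrieval-augmented generation (RAG)."""
    context = "\n".join(retrieve(query, documents))
    return (
        "Answer using only the context below, and cite it.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

docs = [
    "Social work licensure requirements vary by state.",
    "Transformers use self-attention over token sequences.",
]
prompt = build_prompt("What do licensure requirements depend on?", docs)
assert "vary by state" in prompt
```

Because the model is instructed to answer from retrieved text, fabricated references of the kind the authors describe become easier to detect: an unsupported claim has no matching passage in the context.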
87. Evaluating the Criteria for Selection of Web Resources for Academic Pursuit.
- Author
-
Datta, Swati and Kumar, Shiv
- Subjects
- *
WORLD Wide Web , *BEHAVIORAL objectives (Education) , *ACADEMIC libraries , *TEACHING , *LEARNING , *CHI-squared test , *DESCRIPTIVE statistics , *INFORMATION retrieval , *ACADEMIC achievement , *LIBRARY public services , *RESEARCH , *DATA quality , *AUTHORITY ,RESEARCH evaluation - Abstract
This paper focuses on finding the criteria adopted by users to select free information retrieved from the Web for academic use. A close-ended questionnaire was formulated to record the opinions of the respondents. A survey for various categories of users such as post graduate students, research scholars and faculty members from five universities of Chandigarh, Haryana, and Punjab was administered. The category-wise and discipline-wise analysis depicted that quite a good number of respondents applied various parameters while referring to Web resources, but a reasonable number of users did not apply certain parameters to verify the nature of free information being used for educational purposes. The convenience factor leads them to depend on free Web resources which can be accessed anywhere and saves time. The findings of the study suggest that user education programs should be conducted to create awareness regarding the credibility of the subscribed library resources and their effectiveness in enhancing the quality of teaching, learning, and research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
88. Exploring ChatGPT for next-generation information retrieval: Opportunities and challenges.
- Author
-
Huang, Yizheng and Huang, Jimmy X.
- Subjects
- *
CHATGPT , *GENERATIVE artificial intelligence , *SUPERVISED learning , *INFORMATION retrieval , *LANGUAGE models , *ARTIFICIAL intelligence - Abstract
The rapid advancement of artificial intelligence (AI) has spotlighted ChatGPT as a key technology in the realm of information retrieval (IR). Unlike its predecessors, it offers notable advantages that have captured the interest of both industry and academia. While some consider ChatGPT to be a revolutionary innovation, others believe its success stems from smart product and market strategy integration. The advent of ChatGPT and GPT-4 has ushered in a new era of Generative AI, producing content that diverges from training examples, and surpassing the capabilities of OpenAI's previous GPT-3 model. In contrast to the established supervised learning approach in IR tasks, ChatGPT challenges traditional paradigms, introducing fresh challenges and opportunities in text quality assurance, model bias, and efficiency. This paper aims to explore the influence of ChatGPT on IR tasks, providing insights into its potential future trajectory. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
89. Determining the trend of geometrical changes of a hydrotechnical object based on data in the form of LiDAR point clouds.
- Author
-
Kowalska, Maria, Zaczek-Peplinska, Janina, and Piasta, Łukasz
- Subjects
- *
LIDAR , *POINT cloud , *TECHNOLOGICAL innovations , *INFORMATION retrieval , *DATA analysis - Abstract
Monitoring the technical condition of hydrotechnical facilities is crucial for ensuring their safe usage. This process typically involves tracking environmental variables (e.g., concrete damming levels, temperatures, piezometer readings) as well as geometric and physical variables (deformation, cracking, filtration, pore pressure, etc.), whose long-term trends provide valuable information for facility managers. Research on methods of analyzing geodetic monitoring data (manual and automatic) and sensor data is vital for assessing the technical condition and safety of facilities, particularly when utilizing new measurement technologies. Emerging technologies for obtaining data on changes in the surface of objects employ laser scanning techniques (LiDAR, Light Detection and Ranging) from various heights: terrestrial, unmanned aerial vehicles (UAVs, drones), and satellites, using sensors that record geospatial and multispectral data. This article introduces an algorithm to determine geometric change trends using terrestrial laser scanning data for both concrete and earth surfaces. In the consecutive steps of the algorithm, normal vectors are used to analyze changes, calculate local surface deflection angles, and determine object alterations; these normal vectors are derived by fitting local planes to the point cloud using the least squares method. In most applications, surface strain and deformation analyses based on laser scanning point clouds involve direct comparisons using the Cloud to Cloud (C2C) method, resulting in complex, difficult-to-interpret deformation maps. In contrast, preliminary trend analysis using local normal vectors allows for rapid threat detection. This approach significantly reduces computation, with detailed point cloud interpretation commencing only after a change on the object is indicated by the normal vectors in the form of an increasing deflection trend. Referred to by the authors as the cluster algorithm, this method can be applied to monitor both concrete and earth objects, with examples of analyses for different object types presented in the article. [ABSTRACT FROM AUTHOR]
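The core geometric step described in the abstract, deriving a local surface normal by least-squares plane fitting and tracking the deflection angle between measurement epochs, can be sketched as follows. This is an illustrative sketch only, not the authors' implementation; neighborhood selection and the cluster logic of the paper are omitted.

```python
import numpy as np

def local_normal(points):
    """Unit normal of the least-squares best-fit plane through a point neighborhood.

    The normal is the right singular vector for the smallest singular value of
    the centered neighborhood (a total least squares plane fit).
    """
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[-1]

def deflection_deg(n1, n2):
    """Angle between normals from two measurement epochs, in degrees."""
    c = abs(float(np.dot(n1, n2))) / (np.linalg.norm(n1) * np.linalg.norm(n2))
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))
```

An increasing deflection angle over consecutive epochs for the same surface patch would then flag the region for detailed point cloud comparison.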
- Published
- 2024
- Full Text
- View/download PDF
90. Get Spatial from Non-Spatial Information: Inferring Spatial Information from Textual Descriptions by Conceptual Spaces.
- Author
-
Abbasi, Omid Reza, Alesheikh, Ali Asghar, and Razavi-Termeh, Seyed Vahid
- Subjects
- *
UNCERTAINTY (Information theory) , *INFORMATION retrieval , *SOCIAL media , *RECOMMENDER systems - Abstract
With the rapid growth of social media, the volume of textual content is increasing rapidly. Unstructured texts are a rich source of latent spatial information. Extracting such information is useful in query processing, geographical information retrieval (GIR), and recommender systems. In this paper, we propose a novel approach to infer spatial information from salient features of a non-spatial nature in text corpora. We propose two methods, namely DCS and RCS, to represent place-based concepts. In addition, two measures, the Shannon entropy and Moran's I, are proposed to calculate the degree of geo-indicativeness of terms in texts. The methodology is compared with a Latent Dirichlet Allocation (LDA) approach to estimate the accuracy improvement. We evaluated the methods on a dataset of rental property advertisements in Iran and a dataset of Persian Wikipedia articles. The results show that our proposed approach enhances the relative accuracy of predictions by about 10% for the rental advertisements and by 13% for the Wikipedia articles. The average distance error is about 13.3 km for the advertisements and 10.3 km for the Wikipedia articles, making the method suitable for inferring the general region of the city in which a property is located. The proposed methodology is promising for inferring spatial knowledge from textual content that lacks spatial terms. [ABSTRACT FROM AUTHOR]
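The entropy measure mentioned in the abstract can be illustrated with a minimal sketch (hypothetical counts; the paper's conceptual-space representation is more involved): a term whose occurrences concentrate in one region has low entropy and is strongly geo-indicative, while a term spread evenly across regions has high entropy and carries little spatial signal.

```python
import math

def shannon_entropy(region_counts):
    """Shannon entropy (in bits) of a term's occurrence distribution over regions."""
    total = sum(region_counts)
    probs = (c / total for c in region_counts if c > 0)
    return -sum(p * math.log2(p) for p in probs)

# Hypothetical occurrence counts of two terms over four regions:
concentrated = [40, 0, 0, 0]     # entropy 0.0 bits -> strongly geo-indicative
uniform = [10, 10, 10, 10]       # entropy 2.0 bits -> not geo-indicative
```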
- Published
- 2023
- Full Text
- View/download PDF
91. EVALUATION OF SAMPLE SIZE AND EFFICIENT FIELD SAMPLING PLAN IN HDP APPLE ORCHARDS.
- Author
-
Mushtaq, Tabasum, Lone, Mushtaq A., Mir, S. A., Powar, Sonali Kedar, Rather, Aafaq A., Khan, Adil H., and Danish, Faizan
- Subjects
- *
SAMPLE size (Statistics) , *INFORMATION retrieval , *APPLE orchards - Abstract
An essential stage in research is choosing an adequate sample size and sampling strategy. This paper provides a proper procedure for selecting the sample and an effective sampling strategy to obtain the most accurate estimates possible when surveying high-density apple orchards. For this study, primary information gathered over a two-year period from the SKUAST-Kashmir exotic apple block Plate I was employed. The investigation used the trunk cross-sectional area (TCSA) of exotic apple trees of the Gala and Fuji varieties. Samples were obtained using a variety of sampling techniques in order to estimate the population parameters. Findings revealed that, for both varieties, proportional allocation under a stratified sampling technique produces the most efficient estimates of the population parameters. [ABSTRACT FROM AUTHOR]
- Published
- 2023
92. REDESCENDING M-ESTIMATOR BASED LASSO FOR FEATURE SELECTION.
- Author
-
KRISHNAN, R. MUTHU and JAMES, C. K.
- Subjects
- *
FEATURE selection , *INFORMATION retrieval - Abstract
Aim: Regression analysis is a statistical method that helps to model data and make predictions. A large data set with many variables often creates problems due to its dimensionality, making it difficult to extract important information from the data. There is therefore a need for a method that can simultaneously select the important variables, which carry most of the information, and fit the model. The least absolute shrinkage and selection operator (LASSO) is a popular choice for shrinkage estimation and variable selection. However, LASSO uses the conventional least squares technique, which is very sensitive to outliers. As a result, when the data set is contaminated with bad observations (outliers), the LASSO technique gives unreliable results. The focus of this paper is therefore to create a method that can resist outliers in the data and give meaningful results. Method: We propose a new procedure, a weighted LASSO that uses the concept of a redescending M-estimator and can resist outliers in both the dependent and independent variables. Observations of greater importance receive higher weights, and the least important observations receive lower weights. Findings: The efficiency of the proposed method was studied in real and simulated environments and compared with other existing procedures using measures such as Median Absolute Error (MDAE), False Positive Rate (FPR), False Negative Rate (FNR), and Mean Absolute Percentage Error (MAPE). The proposed method with the redescending M-estimator shows higher resistance to outliers than conventional LASSO and other robust existing procedures. Conclusion: The study reveals that the proposed method outperforms other existing procedures in terms of MDAE, FPR, FNR, and MAPE, indicating its superior performance in variable selection on outlier-contaminated datasets. [ABSTRACT FROM AUTHOR]
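The redescending weighting idea can be illustrated with Tukey's bisquare function, a standard redescending M-estimator weight (shown here generically; the paper's exact weighting scheme and tuning constant may differ): residuals near zero get full weight, and observations beyond the cutoff c receive exactly zero weight, so gross outliers contribute nothing to the fit.

```python
def bisquare_weight(residual, c=4.685):
    """Tukey bisquare (redescending) weight for a scaled residual.

    The weight decreases smoothly from 1 at residual 0 and is exactly 0
    for |residual| >= c, removing gross outliers from the fit entirely.
    """
    if abs(residual) >= c:
        return 0.0
    return (1.0 - (residual / c) ** 2) ** 2
```

In a weighted LASSO of the kind the abstract describes, such weights would multiply each observation's contribution to the least-squares loss before the L1 penalty is applied.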
- Published
- 2023
93. Privacy, power, and relationship: ethics and the home-school partnership.
- Author
-
Hart, Peter and Bracey, Elena
- Subjects
- *
ETHICS , *HOME schooling , *SECONDARY school students , *PARENT attitudes , *INFORMATION retrieval - Abstract
Research on the ethics of the home-school partnerships in secondary education is scarce. This paper uses data from three case studies to argue: students have a right to privacy which home-school partnerships can circumvent, parents can be used as a resource to leverage compliance from students which undermines young people's privacy, and developing trusting relationships between parents and teachers is complex when considering the power differentials within that relationship. This article concludes that specific areas around privacy that require greater consideration include: the use of parents to leverage behavioural change in students, the sharing of information students may legitimately believe is intimate without consent, and seeking a change in values within the home. We also consider the areas of resistance students have displayed towards an encroachment on their private spheres. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
94. Optical character recognition quality affects subjective user perception of historical newspaper clippings.
- Author
-
Kettunen, Kimmo, Keskustalo, Heikki, Kumpulainen, Sanna, Pääkkönen, Tuula, and Rautiainen, Juha
- Subjects
- *
OPTICAL character recognition , *INFORMATION retrieval , *NEWSPAPERS - Abstract
Purpose: This study aims to identify user perception of different qualities of optical character recognition (OCR) in texts. The purpose of this paper is to study the effect of different-quality OCR on users' subjective perception through an interactive information retrieval task on a collection of one digitized historical Finnish newspaper. Design/methodology/approach: The study is based on the simulated work task model used in interactive information retrieval. Thirty-two users searched an article collection of the Finnish newspaper Uusi Suometar 1869–1918, which consists of ca. 1.45 million autosegmented articles. The article search database had two versions of each article with different-quality OCR. Each user performed six pre-formulated and six self-formulated short queries and subjectively evaluated the top 10 results using a graded relevance scale of 0–3. Users were not informed about the OCR quality differences between the otherwise identical articles. Findings: The main result of the study is that improved OCR quality positively affects subjective user perception of historical newspaper articles: higher relevance scores are given to better-quality texts. Originality/value: To the best of the authors' knowledge, this simulated interactive work task experiment is the first to show empirically that users' subjective relevance assessments are affected by a change in the quality of an optically read text. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
95. Tashaphyne0.4: a new arabic light stemmer based on rhyzome modeling approach.
- Author
-
Al-Khatib, Ra’ed M., Zerrouki, Taha, Abu Shquier, Mohammed M., and Balla, Amar
- Abstract
Stemming algorithms are crucial tools for enhancing the information retrieval process in natural language processing. This paper presents a novel Arabic light stemming algorithm called Tashaphyne0.4. The idea behind this algorithm is to extract the most precise 'roots' and 'stems' from the words of an Arabic text; thus, the proposed algorithm acts as a rooter, a stemmer, and a segmentation tool at the same time. Our approach involves three phases (i.e., Preparation, Stems-Extractor, and Root-Extractor). Tashaphyne0.4 has shown better results than six other stemmers (i.e., the Khoja, ISRI, Motaz/Light10, Tashaphyne0.3, FARASA, and Assem stemmers). The comparison is performed using four different comprehensive Arabic benchmark datasets. In conclusion, our proposed stemmer achieved remarkable results and outperformed other competitive stemmers in extracting 'Roots' and 'Stems'. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
96. Temporal information retrieval using bitwise operators.
- Author
-
Koirala, Prasanna, Aygun, Ramazan, Mukherjee, Tathagata, and Chung, Haeyong
- Subjects
- *
INFORMATION retrieval , *DATA libraries , *PROGRAMMING languages , *DATA structures - Abstract
The plethora of available and stored temporal data has necessitated the development of effective algorithms for information retrieval. Previous research on temporal information retrieval predominantly focused on the correctness of retrieval results and on supporting wider types of temporal operators. Many of these algorithmic approaches are based on high-level data structures and libraries supported by high-level programming languages, thus limiting their running time performance. In this paper, we develop querying and information retrieval for temporal queries based on Allen's interval algebra, which provides a calculus for temporal reasoning by defining thirteen basic relations between two intervals. To increase retrieval performance, we propose using bitmaps and bitwise operations to identify all of Allen's thirteen relations between any two events across the entirety of the data, where events are represented as bitmaps. The indexes in the bitmap represent time instances in the data, and the values 1 and 0 correspond to the presence and absence of an event. Using bitwise operators such as AND, OR, and bit-shifts on our compressed representation of the events, we establish expressions for each of Allen's relations. Our experiments show that, for two events with roughly 5 × 10⁶ intervals each, the bitwise operation-based methods are almost 42 times faster than conventional interval-based linear lookups and almost 21 times faster than conventional parallel pattern-finding techniques. [ABSTRACT FROM AUTHOR]
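As a toy illustration of the bitmap idea (not the paper's implementation), Python integers can serve as arbitrary-length bitmaps, with bit i set when an event is active at time instant i; a couple of Allen's relations then reduce to bitwise tests, where `x & -x` isolates an event's first active instant and `bit_length()` locates its last.

```python
def bitmap(intervals):
    """Encode half-open [start, end) intervals as an integer bitmap."""
    b = 0
    for start, end in intervals:
        for i in range(start, end):
            b |= 1 << i
    return b

def during(a, b):
    """Allen 'during': every instant of A lies in B, and B extends past A on both sides."""
    return (a & b) == a and (b & -b) < (a & -a) and b.bit_length() > a.bit_length()

def overlaps(a, b):
    """Allen 'overlaps': A starts first, the two share instants, and B ends last."""
    return (a & b) != 0 and (a & -a) < (b & -b) and a.bit_length() < b.bit_length()

a = bitmap([(2, 5)])  # event A active at instants 2, 3, 4
b = bitmap([(0, 8)])  # event B active at instants 0..7
```

The remaining relations (meets, starts, finishes, etc.) admit analogous bitwise formulations, which is what makes a single pass over compressed bitmaps so much cheaper than interval-by-interval comparison.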
- Published
- 2023
- Full Text
- View/download PDF
97. Investigating better context representations for generative question answering.
- Author
-
Francis, Sumam and Moens, Marie-Francine
- Abstract
Generating natural language answers for question-answering (QA) tasks has recently surged in popularity with the rise of task-based personalized assistants. Most QA research is on extractive QA, methods that find answer spans in text passages. However, the extracted answers are often incomplete and sound unnatural in a conversational context. In contrast, generative QA systems aim to generate well-formed natural language answers. For this type of QA, the answer generation method and context play crucial roles in the model performance. A challenge of generative QA is simultaneously incorporating all facts in the context necessary to answer the question and discarding irrelevant information. In this paper, we investigate efficient ways to utilize the context and to generate better contextual answers. We present a framework for generative QA that effectively selects relevant parts from context documents by eliminating extraneous information. We first present multiple strong generative baselines that use transformer-based encoder-decoder architectures to synthesize answers. These models perform equal to or better than the current state-of-the-art generative models. We next investigate the selection of relevant information from context. The context selector component can be a summarizer, reranker, evidence extractor or a combination of these. Finally, we effectively use this filtered context information to provide the most pertinent cues to the generative model to synthesize factually correct natural language answers. This significantly boosts the model’s performance. The setting with the reranked context together with evidence gives the best performance. We also study the impact of different training strategies on the answer generation capability. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
98. Bengali document retrieval using a language modeling approach enhanced by improved cluster-based smoothing.
- Author
-
Chatterjee, Soma and Sarkar, Kamal
- Subjects
- *
LANGUAGE models , *INFORMATION retrieval , *BENGALI language , *ORAL communication , *ALGORITHMS - Abstract
Zero frequency is a fundamental problem in information retrieval using language models, and smoothing is applied to deal with this problem. The cluster-based smoothing method has been found effective for information retrieval using language models. Since the effectiveness of cluster-based smoothing depends on clustering quality, there is scope for improvement by enhancing the clustering algorithm. In this paper, we present a study on how to improve cluster-based smoothing using a histogram-based incremental clustering algorithm and word embeddings. To our knowledge, this is the first study of a cluster-based smoothing method integrated with a language model for developing an effective IR system for Bengali, one of the most widely spoken Indian languages. The proposed method has been tested on two benchmark Bengali IR datasets. The experimental results show that our proposed model for Bengali document retrieval is effective and outperforms several baseline IR models. [ABSTRACT FROM AUTHOR]
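The general shape of cluster-based smoothing can be sketched as a two-stage linear interpolation (a generic sketch with hypothetical mixing parameters `lam` and `beta`; the paper's histogram-based incremental clustering and embedding enhancements are not reproduced here): a word unseen in a document still receives probability mass from the document's cluster and from the whole collection, which is how the zero-frequency problem is avoided.

```python
def smoothed_prob(word, doc_tf, cluster_tf, coll_tf, lam=0.7, beta=0.5):
    """Document LM smoothed by the document's cluster LM and the collection LM.

    doc_tf / cluster_tf / coll_tf are term-frequency dicts; the result is a
    mixture that assigns nonzero probability to any word seen in the collection.
    """
    def mle(tf):
        total = sum(tf.values())
        return tf.get(word, 0) / total if total else 0.0

    return lam * mle(doc_tf) + (1 - lam) * (
        beta * mle(cluster_tf) + (1 - beta) * mle(coll_tf))
```

Ranking then scores each document by the product (or log-sum) of these smoothed probabilities over the query terms.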
- Published
- 2023
- Full Text
- View/download PDF
99. Consistency, Extent, and Validation of the Utilization of the MARC 21 Bibliographic Standard in the College Libraries of Assam in India.
- Author
-
Boruah, Bidyut Bikash, Ravikumar, S., and Lamin Gayang, Fullstar
- Subjects
- *
ACADEMIC libraries , *BIBLIOGRAPHY , *INFORMATION retrieval , *LIBRARY catalogs - Abstract
This paper sheds light on the existing cataloging practice in the college libraries of Assam in terms of utilizing the MARC 21 standard and its structure, i.e., the tags, subfield codes, and indicators. Catalog records from six college libraries were collected, and a survey was conducted to understand local users' information requirements for the catalog. Areas where libraries have scope to improve, and the divisions of tags that could be most helpful for information retrieval, are identified and suggested. This study fulfills the need for a local-level assessment of the catalogs. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
100. A mapping exercise using automated techniques to develop a search strategy to identify systematic review tools.
- Author
-
Sutton, Anthea, O'Keefe, Hannah, Johnson, Eugenie Evelynne, and Marshall, Christopher
- Subjects
- *
BOOLEAN searching , *MEDLINE , *SOFTWARE development tools , *INFORMATION retrieval - Abstract
The Systematic Review Toolbox aims to provide a web‐based catalogue of tools that support various tasks within the systematic review and wider evidence synthesis process. Identifying publications about specific systematic review tools is currently challenging, leading to a high screening burden for few eligible records. We aimed to develop a search strategy that could be run regularly and automatically to identify eligible records for the SR Toolbox, thus reducing time on task and burden for those involved. We undertook a mapping exercise to identify the PubMed IDs of papers indexed within the SR Toolbox. We then used the Yale MeSH Analyser and the Visualisation of Similarities (VOS) Viewer text‐mining software to identify the most commonly used MeSH terms and text words within the eligible records. These MeSH terms and text words were combined using Boolean operators into a search strategy for Ovid MEDLINE. Prior to the mapping exercise and search strategy development, 81 software tools and 55 'Other' tools were included within the SR Toolbox. Since implementation of the search strategy, 146 tools have been added. There has been an increase in tools added to the toolbox since the search was developed and its corresponding auto‐alert in MEDLINE was originally set up. Developing a search strategy based on a mapping exercise is an effective way of identifying new tools to support the systematic review process. Further research could be conducted to help prioritise records for screening, reducing reviewer burden further, and to adapt the strategy for disciplines beyond healthcare. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF