50,530 results
Search Results
2. A Year of Papers Using Biomedical Texts.
- Author
-
Grouin C and Grabar N
- Subjects
- Information Storage and Retrieval methods, Data Mining methods, Electronic Health Records, Natural Language Processing
- Abstract
Objectives: Analyze papers published in 2019 within the medical natural language processing (NLP) domain in order to select the best works of the field. Methods: We performed an automatic and manual pre-selection of papers to be reviewed and finally selected the best NLP papers of the year. We also propose an analysis of the content of NLP publications in 2019. Results: Three best papers were selected this year, covering the generation of synthetic record texts in Chinese, a method to identify contradictions in the literature, and the BioBERT word representation. Conclusions: The year 2019 was very rich, and various NLP issues and topics were addressed by research teams. This shows the will and capacity of researchers to move towards robust and reproducible results. Researchers also proved creative in addressing original issues with relevant approaches. Competing Interests: The authors report no conflicts of interest in this work. (Georg Thieme Verlag KG Stuttgart.)
- Published
- 2020
- Full Text
- View/download PDF
3. The plan to mine the world's research papers.
- Author
-
Pulla P
- Subjects
- Big Data economics, Data Mining trends, Datasets as Topic economics, Datasets as Topic legislation & jurisprudence, India, Open Access Publishing economics, Research Report, Unsupervised Machine Learning legislation & jurisprudence, Unsupervised Machine Learning trends, Big Data supply & distribution, Data Mining methods, Datasets as Topic supply & distribution, Information Dissemination legislation & jurisprudence, Information Dissemination methods, Open Access Publishing legislation & jurisprudence, Research
- Published
- 2019
- Full Text
- View/download PDF
4. Incidences of problematic cell lines are lower in papers that use RRIDs to identify cell lines.
- Author
-
Babic Z, Capes-Davis A, Martone ME, Bairoch A, Ozyurt IB, Gillespie TH, and Bandrowski AE
- Subjects
- Cell Line, Humans, Periodicals as Topic, PubMed, Bibliometrics, Biomedical Research standards, Cell Line Authentication statistics & numerical data, Data Mining methods
- Abstract
The use of misidentified and contaminated cell lines continues to be a problem in biomedical research. Research Resource Identifiers (RRIDs) should reduce the prevalence of misidentified and contaminated cell lines in the literature by alerting researchers to cell lines that are on the list of problematic cell lines, which is maintained by the International Cell Line Authentication Committee (ICLAC) and the Cellosaurus database. To test this assertion, we text-mined the methods sections of about two million papers in PubMed Central, identifying 305,161 unique cell-line names in 150,459 articles. We estimate that 8.6% of these cell lines were on the list of problematic cell lines, whereas only 3.3% of the cell lines in the 634 papers that included RRIDs were on the problematic list. This suggests that the use of RRIDs is associated with a lower reported use of problematic cell lines. Competing Interests: ZB, TG: no competing interests declared. AC runs the cell bank in Australia and heads the ICLAC consortium. MM, AB: heads the RRID project and founded SciCrunch, a company that supports the RRID project. AB develops the Cellosaurus database. IO works as a consultant for SciCrunch. (© 2019, Babic et al.)
- Published
- 2019
- Full Text
- View/download PDF
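The core check described in record 4's abstract, matching text-mined cell-line names against the ICLAC/Cellosaurus list of problematic lines, reduces to a set lookup. Below is a minimal illustrative sketch in Python; the in-line problematic list and the mined names are toy examples, not the authors' pipeline or data.

```python
# Toy subset of the problematic cell line list (ICLAC and Cellosaurus maintain
# the real one); HEp-2 and INT 407 are known HeLa-contaminated lines.
problematic = {"HEP-2", "INT 407", "KB", "CHANG LIVER"}

def problematic_fraction(mined_names):
    """Fraction of text-mined cell-line names that are on the problematic list."""
    hits = [n for n in mined_names if n.strip().upper() in problematic]
    return len(hits) / len(mined_names) if mined_names else 0.0

# Names as they might come out of a methods-section text-mining pass.
mined = ["HeLa", "HEp-2", "U-937", "INT 407", "MCF-7"]
print(f"{problematic_fraction(mined):.1%} of mined cell lines are problematic")
```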
5. Construction of an Assisted Reading System for Scientific Papers Based on Machine Reading Comprehension (基于机器阅读理解的论文辅助阅读系统构建).
- Author
-
秘蓉新, 姚文文, and 阮宏坤
- Subjects
- *LANGUAGE models, *SCIENTIFIC literature, *LITERATURE reviews, *DATA mining, *READING comprehension
- Abstract
In the era of informatization and digitization, the rapid increase in the number of scientific papers has given rise to various challenges, such as lengthy articles, difficulty in information extraction, and the high time cost of reading; reviewing the literature has become increasingly tedious and time-consuming for researchers. Building on language models, an assisted reading system for scientific papers was designed to address these challenges. With machine reading comprehension technology at its core, the system parses scientific texts and answers a set of common questions automatically. By fully utilizing the pre-trained language model PERT, the system enhances its capabilities in semantic understanding and information extraction, effectively resolving various challenges in reading scientific papers and helping readers improve the efficiency of scientific literature review. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
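The system in record 5 centers on extractive machine reading comprehension over paper text. A minimal sketch of that pattern with the Hugging Face transformers pipeline follows; the paper fine-tunes the Chinese PERT model, whereas the stock English SQuAD checkpoint below is a stand-in so the sketch runs as-is.

```python
from transformers import pipeline

# Extractive QA over a chunk of paper text; a SQuAD-tuned checkpoint stands in
# for the PERT-based reader used in the paper.
qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

context = ("We evaluate the proposed system on two benchmark datasets and "
           "observe a 4.2 point F1 improvement over the strongest baseline.")

for question in ("What is evaluated?", "How large is the improvement?"):
    result = qa(question=question, context=context)
    print(f"{question} -> {result['answer']} (score={result['score']:.2f})")
```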
6. Artificial intelligence for knowledge management : second IFIP WG 12.6 International Workshop, AI4KM 2014, Warsaw, Poland, September 7-10, 2014, revised selected papers.
- Author
-
Boulanger, Danielle, Mercier-Laurent, Eunika, and Owoc, Mieczysław Lech
- Subjects
Artificial intelligence, Data mining, Database management, Knowledge management - Abstract
Summary: This book features a selection of papers presented at the Second IFIP WG 12.6 International Workshop on Artificial Intelligence for Knowledge Management, AI4KM 2014, held in Warsaw, Poland, in September 2014, in the framework of the Federated Conferences on Computer Science and Information Systems, FedCSIS 2014. The 9 revised and extended papers and one invited paper were carefully reviewed and selected for inclusion in this volume. They present new research and innovative aspects in the field of knowledge management and are organized in the following topical sections: tools and methods for knowledge acquisition; models and functioning of knowledge management; techniques of artificial intelligence supporting knowledge management; and components of knowledge flow.
- Published
- 2016
7. Identification of data mining research frontier based on conference papers
- Author
-
Huang, Yue, Liu, Hu, and Pan, Jing
- Published
- 2021
- Full Text
- View/download PDF
8. Identification of data mining research frontier based on conference papers
- Author
-
Yue Huang, Hu Liu, and Jing Pan
- Subjects
data mining, bibliometrics, citespace, conference papers, research frontier, Technology, Engineering (General). Civil engineering (General), TA1-2040 - Abstract
Purpose: Identifying the frontiers of a specific research field is one of the most basic tasks in bibliometrics, and research published in leading conferences is crucial to the data mining research community, yet few studies have focused on it. The purpose of this study is to detect the intellectual structure of data mining based on conference papers. Design/methodology/approach: This study takes papers from the nine top-ranked conferences in the data mining field, as provided by Google Scholar Metrics, as its sample. Based on paper counts, it first examines the annual number of published documents and their distribution across conferences. Furthermore, from the perspective of keywords, CiteSpace was used to mine the conference papers and identify the frontiers of data mining, focusing on keyword term frequency, keyword betweenness centrality, keyword clustering and burst keywords. Findings: The research heat of data mining followed a linear upward trend between 2007 and 2016. The frontier identification based on the conference papers revealed five research hotspots in data mining: clustering, classification, recommendation, social network analysis and community detection. The research content embodied in the conference papers was also very rich. Originality/value: This study detected the research frontier from leading data mining conference papers. Based on the keyword co-occurrence network, along four dimensions (keyword term frequency, betweenness centrality, clustering analysis and burst analysis), it identified and analyzed the research frontiers of the data mining discipline from 2007 to 2016.
- Published
- 2021
- Full Text
- View/download PDF
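The keyword-level analysis in records 7-8 (term frequency, betweenness centrality, clustering, bursts) is run in CiteSpace, but the first two measures are easy to reproduce on a keyword co-occurrence network with networkx; a toy sketch follows (the keyword lists are invented).

```python
from collections import Counter
from itertools import combinations
import networkx as nx

# Stand-in for per-paper keyword lists extracted from conference papers.
papers = [
    ["clustering", "social network analysis"],
    ["classification", "recommendation"],
    ["clustering", "community detection", "social network analysis"],
    ["recommendation", "classification", "community detection"],
]

freq = Counter(kw for kws in papers for kw in kws)

# Keyword co-occurrence network: edge weight = number of papers sharing both.
G = nx.Graph()
for kws in papers:
    for a, b in combinations(sorted(set(kws)), 2):
        G.add_edge(a, b, weight=G[a][b]["weight"] + 1 if G.has_edge(a, b) else 1)

centrality = nx.betweenness_centrality(G)
for kw, n in freq.most_common():
    print(f"{kw:25s} freq={n}  betweenness={centrality.get(kw, 0.0):.3f}")
```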
9. References.
- Author
-
Schmidt, Julia, Pilgrim, Graham, and Mourougane, Annabelle
- Subjects
WORKING papers, LABOR market, DATA mining - Published
- 2023
- Full Text
- View/download PDF
10. Incidences of problematic cell lines are lower in papers that use RRIDs to identify cell lines.
- Author
-
Babic, Zeljana, Capes-Davis, Amanda, Martone, Maryann E, Bairoch, Amos, Ozyurt, I Burak, Gillespie, Thomas H, and Bandrowski, Anita E
- Subjects
Cell Line, Humans, Biomedical Research, Bibliometrics, PubMed, Periodicals as Topic, Data Mining, Cell Line Authentication, authentication, cell line, computational biology, reproducibility, rigor, software, systems biology, text mining, Biochemistry and Cell Biology - Abstract
The use of misidentified and contaminated cell lines continues to be a problem in biomedical research. Research Resource Identifiers (RRIDs) should reduce the prevalence of misidentified and contaminated cell lines in the literature by alerting researchers to cell lines that are on the list of problematic cell lines, which is maintained by the International Cell Line Authentication Committee (ICLAC) and the Cellosaurus database. To test this assertion, we text-mined the methods sections of about two million papers in PubMed Central, identifying 305,161 unique cell-line names in 150,459 articles. We estimate that 8.6% of these cell lines were on the list of problematic cell lines, whereas only 3.3% of the cell lines in the 634 papers that included RRIDs were on the problematic list. This suggests that the use of RRIDs is associated with a lower reported use of problematic cell lines.
- Published
- 2019
11. Personalized paper recommendation for postgraduates using multi-semantic path fusion.
- Author
-
Xiao, Xia, Jin, Bo, and Zhang, Chengde
- Subjects
INTERGENERATIONAL mobility, EDUCATIONAL mobility, GRADUATE education, DATA mining, ELECTRONIC data processing, SHIFT registers - Abstract
During graduate education, postgraduates have to spend considerable time finding papers to explore the development branches of their field. However, existing paper recommendation methods focus on a few attributes (title, author, keyword, venue, etc.). The network schema constructed from these attributes is extremely sparse, which easily causes the loss of important semantic paths between attributes. This results in a lack of correlations among relevant papers, which reduces recommendation effectiveness. Moreover, relationships between multiple semantic paths can be found through common homogeneous and heterogeneous attributes, and these relationships can establish many correlations among relevant papers. To address the above problems, this paper proposes a new approach that fuses multi-semantic paths into a heterogeneous educational network (HEN) for personalized paper recommendation. After data processing, a new HEN schema is built by enriching nodes and edges in heterogeneous networks. Then, different semantic meta-paths are generated by projection sub-nets. Next, a new HEN embedding method based on multi-semantic path fusion is proposed to generate rich HEN node sequences. Finally, personalized paper recommendations for postgraduates are produced using targeted path similarity. The proposed method was evaluated on two paper datasets, covering educational intergenerational mobility from 1987 to 2021 and data mining and intelligent media from 1997 to 2021. Extensive experiments demonstrate that the proposed approach is effective. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
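One standard way to realize the multi-semantic-path fusion described in record 11 is to generate meta-path-guided random walks over the heterogeneous network and feed them to a skip-gram model. The sketch below (networkx plus gensim, on a toy author-paper-keyword graph) illustrates that idea; it is an assumption-laden stand-in, not the authors' HEN embedding method.

```python
import random
import networkx as nx
from gensim.models import Word2Vec

# Toy heterogeneous network: node attribute "t" is the node type.
G = nx.Graph()
G.add_nodes_from([("a1", {"t": "author"}), ("a2", {"t": "author"}),
                  ("p1", {"t": "paper"}), ("p2", {"t": "paper"}),
                  ("k1", {"t": "keyword"})])
G.add_edges_from([("a1", "p1"), ("a2", "p2"), ("p1", "k1"), ("p2", "k1")])

def metapath_walk(G, start, pattern=("author", "paper", "keyword", "paper"), length=8):
    """Random walk that only steps to neighbors matching the next type in the pattern."""
    walk, i = [start], 0
    while len(walk) < length:
        i = (i + 1) % len(pattern)
        nbrs = [n for n in G[walk[-1]] if G.nodes[n]["t"] == pattern[i]]
        if not nbrs:
            break
        walk.append(random.choice(nbrs))
    return walk

walks = [metapath_walk(G, a) for a in G if G.nodes[a]["t"] == "author" for _ in range(10)]
model = Word2Vec(walks, vector_size=32, window=3, min_count=1, sg=1, epochs=20)
print(model.wv.most_similar("p1"))  # papers linked via shared keywords rank high
```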
12. Automatic extraction of significant terms from the title and abstract of scientific papers using the machine learning algorithm: A multiple module approach.
- Author
-
Mukherjee, Bhaskar and Majhi, Debasis
- Abstract
Keyword extraction is the task of identifying terms or phrases that are most representative of a source document. Although automatic keyword extraction from titles is an old method, it has mainly been applied to single web documents; our approach differs from previous research on keyword extraction in several aspects. For non-experts in a scientific field, understanding research trends is difficult, and the purpose of this study is to develop an automatic method that gives non-experts an overview of a scientific field by capturing research trends. This empirical study explores significant term extraction using Natural Language Processing (NLP) tools. Our dataset consisted of more than 15,000 titles saved in a .csv file, and scripts written in Python were used to compare how far the significant terms of the title corpus resemble or differ from the terms available in the abstracts of the same articles. A lightweight unsupervised keyword extractor, Yet Another Keyword Extractor (YAKE), was used to extract the results. Based on our analysis, we conclude that these algorithms can also be used by non-experts in other subject fields to extract significant words automatically and understand trends. Our algorithm could be a solution to reduce the labour-intensive manual indexing process. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
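YAKE, the extractor named in record 12, has a small Python API, so the title-versus-abstract term comparison described above is straightforward to reproduce. A minimal sketch (not the authors' scripts; the texts are invented) follows.

```python
import yake  # pip install yake

extractor = yake.KeywordExtractor(lan="en", n=2, top=5)  # up to 2-word phrases

title = "Automatic extraction of significant terms from scientific papers"
abstract = ("We study automatic keyword extraction from titles and abstracts "
            "of scientific papers using an unsupervised extractor.")

title_terms = {kw for kw, score in extractor.extract_keywords(title)}
abstract_terms = {kw for kw, score in extractor.extract_keywords(abstract)}

# Overlap between title terms and abstract terms, as in the study's comparison.
print("title terms:   ", title_terms)
print("abstract terms:", abstract_terms)
print("overlap:       ", title_terms & abstract_terms)
```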
13. Development of an Embedding Framework for Clustering Scientific Papers
- Author
-
Songhee Kim, Suyeong Lee, and Byungun Yoon
- Subjects
Clustering method, data mining, text mining, text analysis, scientific publishing, fuel cells, Electrical engineering. Electronics. Nuclear engineering, TK1-9971 - Abstract
In this era, research and development have become a continuous and accelerating process because technology changes rapidly and has a short lifecycle. As a result, various methodologies are being developed to monitor these rapidly changing research trends; in particular, studies on clustering methods for science and technology documents are being developed with a variety of approaches. However, previous studies on document clustering focus on a specific field or language and do not take into consideration certain important pieces of information in science and technology documents. Therefore, this study proposes an embedding methodology that uses important content from scientific and technical documents. We considered the importance of information contained in the core structures of science and technology documents and propose a clustering methodology that analyzes structured and unstructured data, such as textual information, author information, and citation information. The proposed method combines both textual and structural data from the paper, focusing on screening important information by section in science and technology documents. Then, Girvan-Newman clustering and Louvain clustering models are applied to the generated embedding vectors, and evaluation results are reported through clustering indices. As a practical example, we applied the proposed methodology to paper data from the field of hydrogen fuel cell vehicles. The results of this study will be effective in identifying gaps in technology for new technological development, identifying technology trends, and providing directional information for future technology development.
- Published
- 2022
- Full Text
- View/download PDF
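Record 13's clustering step applies Girvan-Newman and Louvain to a graph built from textual and structural paper data. Both algorithms ship with networkx (Louvain from version 2.8), so the step can be sketched on a toy paper graph; the edges below are invented.

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman, louvain_communities

# Toy paper graph; edges might come from citations, shared authors, or
# similarity between section-weighted text embeddings (as in the paper).
G = nx.Graph([("p1", "p2"), ("p2", "p3"), ("p1", "p3"),
              ("p4", "p5"), ("p5", "p6"), ("p4", "p6"), ("p3", "p4")])

# Louvain (modularity-based) communities; requires networkx >= 2.8.
print("Louvain:", louvain_communities(G, seed=42))

# Girvan-Newman: repeatedly remove highest-betweenness edges; take first split.
first_split = next(girvan_newman(G))
print("Girvan-Newman:", [sorted(c) for c in first_split])
```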
14. Extracting laboratory test information from paper-based reports.
- Author
-
Ma, Ming-Wei, Gao, Xian-Shu, Zhang, Ze-Yu, Shang, Shi-Yu, Jin, Ling, Liu, Pei-Lin, Lv, Feng, Ni, Wei, Han, Yu-Chen, and Zong, Hui
- Subjects
- *OPTICAL character recognition, *NATURAL language processing, *HEALTH information systems, *TEXT recognition, *RANDOM fields, *DATA mining
- Abstract
Background: In the healthcare domain today, despite the substantial adoption of electronic health information systems, a significant proportion of medical reports still exist in paper-based formats. As a result, there is a significant demand for the digitization of information from these paper-based reports. However, the digitization of paper-based laboratory reports into a structured data format can be challenging due to their non-standard layouts, which include various data types such as text, numeric values, reference ranges, and units. Therefore, it is crucial to develop a highly scalable and lightweight technique that can effectively identify and extract information from laboratory test reports and convert them into a structured data format for downstream tasks. Methods: We developed an end-to-end Natural Language Processing (NLP)-based pipeline for extracting information from paper-based laboratory test reports. Our pipeline consists of two main modules: an optical character recognition (OCR) module and an information extraction (IE) module. The OCR module is applied to locate and identify text from scanned laboratory test reports using state-of-the-art OCR algorithms. The IE module is then used to extract meaningful information from the OCR results to form digitized tables of the test reports. The IE module consists of five sub-modules: time detection, headline position, line normalization, Named Entity Recognition (NER) with a Conditional Random Fields (CRF)-based method, and step detection for multi-column layouts. Finally, we evaluated the performance of the proposed pipeline on 153 laboratory test reports collected from Peking University First Hospital (PKU1). Results: In the OCR module, we evaluated the accuracy of text detection and recognition at three different levels and achieved an average accuracy of 0.93. In the IE module, we extracted four laboratory test entities: test item name, test result, test unit, and reference value range. The overall F1 score is 0.86 on the 153 laboratory test reports collected from PKU1. With a single CPU, the average inference time per report is only 0.78 s. Conclusion: In this study, we developed a practical lightweight pipeline to digitize and extract information from paper-based laboratory test reports of diverse types and layouts that can be adopted in real clinical environments with the lowest possible computing resource requirements. The high evaluation performance on the real-world hospital dataset validated the feasibility of the proposed pipeline. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
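A heavily simplified stand-in for record 14's two-module pipeline: pytesseract plays the OCR module, and a regular expression plays the CRF-based NER sub-module. The report line format assumed by the regex is hypothetical.

```python
import re
import pytesseract
from PIL import Image

def extract_lab_results(image_path):
    """OCR a scanned report, then pull (item, result, unit, reference range)."""
    text = pytesseract.image_to_string(Image.open(image_path))
    # Hypothetical line format: "Hemoglobin  13.5  g/dL  12.0-16.0"
    row = re.compile(r"^(?P<item>[A-Za-z][\w \-]*?)\s+(?P<result>\d+(?:\.\d+)?)"
                     r"\s+(?P<unit>\S+)\s+(?P<ref>\d+(?:\.\d+)?-\d+(?:\.\d+)?)")
    results = []
    for line in text.splitlines():
        m = row.match(line.strip())
        if m:
            results.append(m.groupdict())
    return results

# print(extract_lab_results("scan_page1.png"))  # path is illustrative
```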
15. Analyzing the Accuracy of Answer Sheet Data in Paper-based Test Using Decision Tree
- Author
-
Edy Suharto, Aris Puji Widodo, and Suryono Suryono
- Subjects
data mining, decision tree, paper-based test, education, Electronic computers. Computer science, QA75.5-76.95, Economic growth, development, planning, HD72-88 - Abstract
In education quality assurance, the accuracy of test data is crucial. However, there is still the possibility of incorrect data being filled in by test takers during a paper-based test; by contrast, this problem does not appear in computer-based tests. In this study, a method was proposed to analyze the accuracy of answer-sheet filling in paper-based tests using data mining techniques. A single layer of data comprehension was added within the method instead of working on raw data. The results of the study were a web-based program for data pre-processing and decision tree models. In total, 374 instances were analyzed. The accuracy of answer-sheet filling reached 95.19%, while the accuracy of classification varied from 99.47% to 100% depending on the evaluation method chosen. This study could motivate administrators to improve testing, since the findings favour computer-based over paper-based tests.
- Published
- 2019
- Full Text
- View/download PDF
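The decision-tree step in record 15 maps directly onto scikit-learn; the sketch below uses invented answer-sheet features, and the cross-validation call mirrors the abstract's note that accuracy varies with the evaluation method chosen.

```python
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical pre-processed answer-sheet features:
# [blank_fields, double_marks, smudged_marks] -> 1 = accurately filled out
X = [[0, 0, 0], [1, 0, 1], [0, 2, 1], [0, 0, 1], [2, 1, 3], [0, 1, 0]]
y = [1, 0, 0, 1, 0, 1]

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
scores = cross_val_score(clf, X, y, cv=3)  # evaluation method affects accuracy
print("fold accuracies:", scores)
```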
16. Emotion Mining: from Unimodal to Multimodal Approaches
- Author
-
Zucco, Chiara, Calabrese, Barbara, Cannataro, Mario, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Amunts, Katrin, editor, Grandinetti, Lucio, editor, Lippert, Thomas, editor, and Petkov, Nicolai, editor
- Published
- 2021
- Full Text
- View/download PDF
17. A pipeline for the retrieval and extraction of domain-specific information with application to COVID-19 immune signatures.
- Author
-
Newton, Adam J. H., Chartash, David, Kleinstein, Steven H., and McDougal, Robert A.
- Subjects
DATA mining, COVID-19, GENE expression, SARS-CoV-2 - Abstract
Background: The accelerating pace of biomedical publication has made it impractical to manually, systematically identify papers containing specific information and extract this information. This is especially challenging when the information itself resides beyond titles or abstracts. For emerging science, with a limited set of known papers of interest and an incomplete information model, this is of pressing concern. A timely example in retrospect is the identification of immune signatures (coherent sets of biomarkers) driving differential SARS-CoV-2 infection outcomes. Implementation: We built a classifier to identify papers containing domain-specific information from the document embeddings of the title and abstract. To train this classifier with limited data, we developed an iterative process leveraging pre-trained SPECTER document embeddings, SVM classifiers and web-enabled expert review to iteratively augment the training set. This training set was then used to create a classifier to identify papers containing domain-specific information. Finally, information was extracted from these papers through a semi-automated system that directly solicited the paper authors to respond via a web-based form. Results: We demonstrate a classifier that retrieves papers with human COVID-19 immune signatures with a positive predictive value of 86%. The type of immune signature (e.g., gene expression vs. other types of profiling) was also identified with a positive predictive value of 74%. Semi-automated queries to the corresponding authors of these publications requesting signature information achieved a 31% response rate. Conclusions: Our results demonstrate the efficacy of using a SVM classifier with document embeddings of the title and abstract, to retrieve papers with domain-specific information, even when that information is rarely present in the abstract. Targeted author engagement based on classifier predictions offers a promising pathway to build a semi-structured representation of such information. Through this approach, partially automated literature mining can help rapidly create semi-structured knowledge repositories for automatic analysis of emerging health threats. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
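Record 17's retrieval classifier pairs SPECTER document embeddings of title and abstract with an SVM. The sketch below shows that combination via sentence-transformers' allenai-specter checkpoint on two invented training papers; the "[SEP]"-joined input format follows SPECTER's usual convention, and the texts and labels are toy data.

```python
from sentence_transformers import SentenceTransformer
from sklearn.svm import SVC

model = SentenceTransformer("allenai-specter")  # SPECTER document embeddings

# SPECTER-style input: "title [SEP] abstract"; toy training examples.
papers = [
    "Immune signatures in COVID-19 [SEP] We profile gene expression in patients.",
    "Road defect detection [SEP] We train a CNN on asphalt images.",
]
labels = [1, 0]  # 1 = contains a human COVID-19 immune signature

clf = SVC(kernel="linear").fit(model.encode(papers), labels)

query = "Cytokine biomarkers of severe SARS-CoV-2 [SEP] We report a signature."
print(clf.decision_function(model.encode([query])))  # >0 leans toward class 1
```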
18. Comparison of three filter paper-based devices for safety and stability of viral sample collection in poultry
- Author
-
Suwarak Wannaratana, Aunyaratana Thontiravong, and Somsak Pakpinyo
- Subjects
General Immunology and Microbiology, Food Animals, Filter paper, DNA stability, viruses, Animal Science and Zoology, Sample collection, Data mining, Biology - Abstract
General diagnosis of poultry viruses primarily relies on detection of viruses in samples, but many farms are located in remote areas requiring logistic transportation. Filter paper cards are a usef...
- Published
- 2020
19. A Text Mining Approach to Covid-19 Literature.
- Author
-
Liu, Fangyao, Ergu, Daji, Li, Biao, Deng, Wei, Chen, Zhengxin, Lu, Guoqing, and Shi, Yong
- Subjects
TEXT mining, SARS-CoV-2, COVID-19, MEDICAL research personnel, DATA mining - Abstract
The novel coronavirus disease (COVID-19) is a historic catastrophe that has had many devastating impacts on human life and wellness. Researchers in academia and industry strive to understand the causes of this pandemic disease and to find new therapeutics to combat it. Consequently, the number of COVID-19 related publications is increasing rapidly, and it is too difficult for medical researchers and practitioners to keep up with the latest research and development. Text mining is a powerful tool for literature filtering, categorization, and knowledge discovery. In this paper, we propose a text mining method to explore the categories of COVID-19 related themes and to identify the standard methodologies that have been used. We discuss the potential limitations of this preliminary study and present future perspectives for COVID-19 research. This paper provides a mixed quantitative and qualitative example of applying data mining methods to research papers to uncover hidden information, and lays a foundation for data scientists to develop more effective algorithms for COVID-19 related problems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
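Record 19 does not name its exact stack, so the sketch below uses scikit-learn's count features plus LDA as a generic stand-in for the theme-categorization step; the four mini-abstracts are invented.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "vaccine trial antibody response in patients",
    "lockdown policy economic impact on employment",
    "antibody neutralization of spike protein variants",
    "school closure policy and remote work impact",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"theme {k}: {top}")  # top words characterizing each discovered theme
```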
20. A Review Paper on Big Data and Data Mining Concepts and Techniques
- Author
-
Prasdika Prasdika and Bambang Sugiantoro
- Subjects
data, big data, data mining, Electronic computers. Computer science, QA75.5-76.95, Economic growth, development, planning, HD72-88 - Abstract
In today's digital era, data in databases grow very rapidly; everything related to technology contributes heavily to this growth, including social media, financial technology, and scientific data. Therefore, big data and data mining are topics that are often discussed. Data mining is a method of extracting information from big data to produce information patterns or detect data anomalies.
- Published
- 2018
- Full Text
- View/download PDF
21. Analyzing the Accuracy of Answer Sheet Data in Paper-based Test Using Decision Tree
- Author
-
Aris Puji Widodo, Edy Suharto, and Suryono Suryono
- Subjects
education, Computer science, Decision tree, Paper-based test, data mining, Test (assessment), Comprehension, Order (business), Raw data, Single layer, Test data, Electronic computers. Computer science, QA75.5-76.95, Economic growth, development, planning, HD72-88 - Abstract
In education quality assurance, the accuracy of test data is crucial. However, there is still the possibility of incorrect data being filled in by test takers during a paper-based test; by contrast, this problem does not appear in computer-based tests. In this study, a method was proposed to analyze the accuracy of answer-sheet filling in paper-based tests using data mining techniques. A single layer of data comprehension was added within the method instead of working on raw data. The results of the study were a web-based program for data pre-processing and decision tree models. In total, 374 instances were analyzed. The accuracy of answer-sheet filling reached 95.19%, while the accuracy of classification varied from 99.47% to 100% depending on the evaluation method chosen. This study could motivate administrators to improve testing, since the findings favour computer-based over paper-based tests.
- Published
- 2019
22. The Arquive of Tatuoca Magnetic Observatory Brazil: from paper to intelligent bytes
- Author
-
Cristian Berrio-Zapata, Ester Ferreira da Silva, Mayara Costa Pinheiro, Vinicius Augusto Carvalho de Abreu, Cristiano Mendel Martins, Mario Augusto Gongora, and Kelso Dunman
- Subjects
Big Data, records management, Observatories, Deep learning, Geomagnetism, geophysics computing, information retrieval systems, Collaboration, Data mining - Abstract
The Magnetic Observatory of Tatuoca (TTB) was installed by Observatório Nacional (ON) in 1957, near Belém city in the state of Pará, in the Brazilian Amazon. Its history goes back to 1933, when a Danish mission used this location to collect data, owing to its privileged position near the terrestrial equator. Between 1957 and 2007, TTB produced 18,000 magnetograms on paper using photographic variometers, along with associated documents such as absolute value forms and yearbooks. Data were obtained manually from these graphs with rulers and grids, taking 24 average readings per day, that is, one per hour. In 2017, the Federal University of Pará (UFPA in the Portuguese acronym) and ON collaborated to rescue this physical archive. In 2022, UFPA took a step forward and proposed not only digitizing the documents but also developing an intelligent agent capable of reading and extracting the information in the curves at a resolution better than one hour, this being the central goal of the project. If the project succeeds, it will rescue 50 years of data imprisoned in paper, increasing measurement sensitivity far beyond what these sources used to provide. It will also open the possibility of applying the same AI to similar documents in other observatories or disciplines such as seismography. This article recaps the project and the complex challenges faced in articulating Archival Science principles with AI and Geoscience.
- Published
- 2022
- Full Text
- View/download PDF
23. Predicting rank for scientific research papers using supervised learning.
- Author
-
El Mohadab, Mohamed, Bouikhalene, Belaid, and Safi, Said
- Subjects
ELECTRONIC data processing, SUPERVISED learning, MACHINE learning, INFORMATION & communication technologies, INFORMATION technology, ELECTRONIC services - Abstract
Automatic data processing represents the future for the development of any system, especially in scientific research. In this paper, we describe an automatic classification method applied to scientific research as a supervised learning task. Throughout the process, we identify the main features that play a significant role in predicting the new rank under the supervised learning setup. First, we give an overview of prior work on ranking scientific research papers. Second, we evaluate and compare state-of-the-art approaches to classification by supervised, semi-supervised and unsupervised learning. In preliminary tests we obtained good performance on a realistic corpus, and we compared performance metrics such as NDCG, MAP, GMAP, F-measure, precision and recall in order to identify the influential features in our work. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
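Of the metrics listed in record 23, NDCG is the one most often hand-rolled; a self-contained reference implementation follows (the graded relevance labels are invented).

```python
import numpy as np

def dcg(relevances):
    """Discounted cumulative gain with the 2^rel - 1 gain formulation."""
    r = np.asarray(relevances, dtype=float)
    return np.sum((2 ** r - 1) / np.log2(np.arange(2, r.size + 2)))

def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Relevance of papers in predicted rank order (3 = highly relevant ... 0 = not).
predicted_order = [3, 2, 0, 1]
print(f"NDCG = {ndcg(predicted_order):.3f}")  # 1.0 iff the order is ideal
```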
24. Using a Data-Mining Approach to Unveil Greenhouse Gas Emission Intensities of Different Pulp and Paper Products.
- Author
-
Nabinger, Alec, Tomberlin, Kristen, Venditti, Richard, and Yao, Yuan
- Abstract
Life Cycle Assessment (LCA) has been used to evaluate the life-cycle Greenhouse Gas (GHG) emissions of pulp and paper production, and most previous studies rely on process-based models for specific product types (e.g., printing paper), industry-average data, or information from a few mills. In this work, a data-mining approach is used to quantify the GHG emission intensities of different paper products manufactured by U.S. mills. Facility-level emission data collected from publicly available governmental databases and mill-level production data collected from the private sector were integrated to track the GHG emissions for different product lines and paper products in mills (in total, 165 mills were matched and analyzed). The results highlight the ranges of GHG emission intensities across product groups and categories, and can be used as a transparent data source for LCA practitioners, policymakers, and the pulp and paper industry to perform further analysis on carbon accounting and strategic planning for GHG mitigation. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
25. Classification models for likelihood prediction of diabetes at early stage using feature selection
- Author
-
Oladimeji, Oladosu Oyebisi, Oladimeji, Abimbola, and Oladimeji, Olayanju
- Published
- 2024
- Full Text
- View/download PDF
26. Auto-generated Test Paper Based on Knowledge Embedding
- Author
-
Guo-Sheng Hao, Fang Luo, Yi-Yang He, Xiao-Dan He, Zeng-Hui Duan, and Xing-Liu Hu
- Subjects
Computer science, Embedding, Data mining, Paper based, Computer Science Applications, Education, Test (assessment) - Published
- 2019
27. CORE: A Global Aggregation Service for Open Access Papers.
- Author
-
Knoth, Petr, Herrmannova, Drahomira, Cancellieri, Matteo, Anastasiou, Lucas, Pontika, Nancy, Pearce, Samuel, Gyawali, Bikash, and Pride, David
- Subjects
ELECTRONIC journals, OPEN access publishing, TEXT mining, SCIENTIFIC knowledge, SCIENTIFIC literature, DATA mining - Abstract
This paper introduces CORE, a widely used scholarly service, which provides access to the world's largest collection of open access research publications, acquired from a global network of repositories and journals. CORE was created with the goal of enabling text and data mining of scientific literature and thus supporting scientific discovery, but it is now used in a wide range of use cases within higher education, industry, not-for-profit organisations, as well as by the general public. Through the provided services, CORE powers innovative use cases, such as plagiarism detection, in market-leading third-party organisations. CORE has played a pivotal role in the global move towards universal open access by making scientific knowledge more easily and freely discoverable. In this paper, we describe CORE's continuously growing dataset and the motivation behind its creation, present the challenges associated with systematically gathering research papers from thousands of data providers worldwide at scale, and introduce the novel solutions that were developed to overcome these challenges. The paper then provides an in-depth discussion of the services and tools built on top of the aggregated data and finally examines several use cases that have leveraged the CORE dataset and services. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
28. Experimental Comparison in Sensing Breast Cancer Mutations by Signal ON and Signal OFF Paper-Based Electroanalytical Strips
- Author
-
Emily P. Nguyen, Fabiana Arduini, Claudio Parolo, Giulia Cinotti, Danila Moscone, Stefano Cinti, Arben Merkoçi, and Veronica Caratelli
- Subjects
Paper, DNA, Single-Stranded, Breast Neoplasms, STRIPS, Biosensing Techniques, Signal, Analytical Chemistry, DNA-based biosensors, Breast cancer, Settore CHIM/01, Design and Development, Experimental comparison, Detection methods, Humans, Liquid biopsy, Protocol (science), Chemistry, Analytical performance, Electrochemical Techniques, Emerging technologies, Mutation, Single strand DNA, Female, Data mining, Detection protocols, Biosensor - Abstract
Other grants: the ICN2 is funded by the CERCA Programme / Generalitat de Catalunya. The development of paper-based electroanalytical strips as powerful diagnostic tools has gained a lot of attention within the sensor community. In particular, the detection of nucleic acids in complex matrices is a trending topic, especially in the development of emerging technologies such as liquid biopsy. DNA-based biosensors have been widely applied in this direction, and currently there are two main approaches based on target/probe hybridization reported in the literature, namely Signal ON and Signal OFF. In this technical note, the two approaches are evaluated in combination with paper-based electrodes, using a single-strand DNA corresponding to the H1047R (A3140G) missense mutation in exon 20 in breast cancer as the model target. A detailed comparison of the analytical performance, detection protocol, and cost associated with the two systems is provided, highlighting the advantages and drawbacks depending on the application. The present work is aimed at a wide audience, particularly those in the field of point-of-care testing, and is intended to provide the know-how to manage the design and development stages and to optimize the platform for sensing nucleic acids using a paper-based detection method.
- Published
- 2019
29. Learning Embeddings for Academic Papers
- Author
-
Zhang, Yi
- Subjects
Skip-gram, Academic papers, Networks, Data mining, Embeddings - Abstract
Academic papers contain both text and citation links. Representing such data is crucial for many downstream tasks, such as classification, disambiguation, duplicate detection, recommendation and influence prediction. The success of the Skip-gram with Negative Sampling model (hereafter SGNS) has inspired many algorithms to learn embeddings for words, documents, and networks. However, there is limited research on learning the representation of linked documents such as academic papers. This dissertation first studies the norm convergence issue in SGNS and proposes an L2 regularization to fix the problem. Our experiments show that our method improves SGNS and its variants on different types of data. We observe improvements of up to 17.47% for word embeddings, 1.85% for document embeddings, and 46.41% for network embeddings. To learn embeddings for academic papers, we propose several neural network based algorithms that can learn high-quality embeddings from different types of data: N2V (network2vector) for networks, D2V (document2vector) for documents, and P2V (paper2vector) for academic papers. Experiments show that our models outperform traditional algorithms and state-of-the-art neural network methods on various datasets under different machine learning tasks. With the high-quality embeddings, we design and present four applications on real-world datasets, i.e., academic paper and author search engines, author name disambiguation, and paper influence prediction.
- Published
- 2019
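The fix record 29 proposes, an L2 penalty added to the SGNS objective to control embedding-norm growth, can be written out in a few lines of PyTorch. This is a schematic loss under assumed dimensions and penalty weight, not the dissertation's code.

```python
import torch
import torch.nn.functional as F

V, D, LAM = 1000, 64, 1e-3  # vocab size, embedding dim, L2 weight (assumed)
center = torch.nn.Embedding(V, D)
context = torch.nn.Embedding(V, D)

def sgns_l2_loss(c, pos, neg):
    """Skip-gram negative-sampling loss plus L2 regularization on embeddings."""
    vc, vp, vn = center(c), context(pos), context(neg)   # (B,D),(B,D),(B,K,D)
    pos_score = F.logsigmoid((vc * vp).sum(-1))                              # (B,)
    neg_score = F.logsigmoid(-(vn @ vc.unsqueeze(-1)).squeeze(-1)).sum(-1)   # (B,)
    l2 = vc.pow(2).sum(-1) + vp.pow(2).sum(-1) + vn.pow(2).sum(-1).sum(-1)   # (B,)
    return (-(pos_score + neg_score) + LAM * l2).mean()

c = torch.randint(0, V, (8,)); p = torch.randint(0, V, (8,)); n = torch.randint(0, V, (8, 5))
print(sgns_l2_loss(c, p, n).item())
```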
30. Low-Rank and Sparse Matrix Factorization for Scientific Paper Recommendation in Heterogeneous Network
- Author
-
Li Zhu, Xiaoyan Cai, Tianyu Gao, Shirui Pan, and Tao Dai
- Subjects
General Computer Science, Rank (linear algebra), Computer science, heterogeneous network, General Engineering, low rank and sparse matrix factorization, Recommender system, Matrix decomposition, Matrix (mathematics), Paper recommendation, Cold start, Collaborative filtering, General Materials Science, Data mining, Electrical engineering. Electronics. Nuclear engineering, TK1-9971, Sparse matrix - Abstract
© 2013 IEEE. With the rapid growth of scientific publications, it is hard for researchers to acquire appropriate papers that meet their expectations. Recommendation system for scientific articles is an essential technology to overcome this problem. In this paper, we propose a novel low-rank and sparse matrix factorization-based paper recommendation (LSMFPRec) method for authors. The proposed method seamlessly combines low-rank and sparse matrix factorization method with fine-grained paper and author affinity matrixes that are extracted from heterogeneous scientific network. Thus, it can effectively alleviate the sparsity and cold start problems that exist in traditional matrix factorization based collaborative filtering methods. Moreover, LSMFPRec can significantly reduce the error propagated from intermediate outputs. In addition, the proposed method essentially captures the low-rank and sparse characteristics that exist in scientific rating activities; therefore, it can generate more reasonable predicted ratings for influential and uninfluential papers. The effectiveness of the proposed LSMFPRec is demonstrated by the recommendation evaluation conducted on the AAN and CiteULike data sets.
- Published
- 2018
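The backbone of record 30's LSMFPRec, factorizing a sparse rating matrix over observed entries only, can be sketched with plain numpy gradient descent. The low-rank constraint appears here simply as a small factor rank; the paper's affinity matrices and explicit sparse term are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
R = np.array([[5, 0, 3], [4, 0, 0], [0, 2, 5]], dtype=float)  # 0 = unobserved
M = R > 0                                                     # observed mask
k, lr, reg = 2, 0.05, 0.01                                    # rank, step, L2

U = rng.normal(scale=0.1, size=(R.shape[0], k))
V = rng.normal(scale=0.1, size=(R.shape[1], k))
for _ in range(2000):
    E = M * (R - U @ V.T)            # error on observed entries only
    U += lr * (E @ V - reg * U)
    V += lr * (E.T @ U - reg * V)

print(np.round(U @ V.T, 2))          # predicted ratings, incl. unobserved cells
```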
31. Data science fundamentals for Python and MongoDB.
- Author
-
Paper, David
- Subjects
MongoDB, Data mining, Python (Computer program language), COMPUTERS -- General, Programming & scripting languages - Abstract
Summary: Build the foundational data science skills necessary to work with and better understand complex data science algorithms. This example-driven book provides complete Python coding examples to complement and clarify data science concepts, and enrich the learning experience. Coding examples include visualizations whenever appropriate. The book is a necessary precursor to applying and implementing machine learning algorithms. The book is self-contained. All of the math, statistics, stochastic, and programming skills required to master the content are covered. In-depth knowledge of object-oriented programming isn't required because complete examples are provided and explained. Data Science Fundamentals with Python and MongoDB is an excellent starting point for those interested in pursuing a career in data science. Like any science, the fundamentals of data science are a prerequisite to competency. Without proficiency in mathematics, statistics, data manipulation, and coding, the path to success is "rocky" at best. The coding examples in this book are concise, accurate, and complete, and perfectly complement the data science concepts introduced. What You'll Learn: Prepare for a career in data science Work with complex data structures in Python Simulate with Monte Carlo and Stochastic algorithms Apply linear algebra using vectors and matrices Utilize complex algorithms such as gradient descent and principal component analysis Wrangle, cleanse, visualize, and problem solve with data Use MongoDB and JSON to work with data.
- Published
- 2018
32. Identification of data mining research frontier based on conference papers
- Author
-
Jing Pan, Yue Huang, and Hu Liu
- Subjects
Computer science, Sample (statistics), Bibliometrics, Field (computer science), Ranking (information retrieval), Identification (information), Betweenness centrality, Computer Science (miscellaneous), Business, Management and Accounting (miscellaneous), Decision Sciences (miscellaneous), Data mining, Cluster analysis, Social network analysis - Abstract
Purpose: Identifying the frontiers of a specific research field is one of the most basic tasks in bibliometrics, and research published in leading conferences is crucial to the data mining research community, yet few studies have focused on it. The purpose of this study is to detect the intellectual structure of data mining based on conference papers. Design/methodology/approach: This study takes papers from the nine top-ranked conferences in the data mining field, as provided by Google Scholar Metrics, as its sample. Based on paper counts, it first examines the annual number of published documents and their distribution across conferences. Furthermore, from the perspective of keywords, CiteSpace was used to mine the conference papers and identify the frontiers of data mining, focusing on keyword term frequency, keyword betweenness centrality, keyword clustering and burst keywords. Findings: The research heat of data mining followed a linear upward trend between 2007 and 2016. The frontier identification based on the conference papers revealed five research hotspots in data mining: clustering, classification, recommendation, social network analysis and community detection. The research content embodied in the conference papers was also very rich. Originality/value: This study detected the research frontier from leading data mining conference papers. Based on the keyword co-occurrence network, along four dimensions (keyword term frequency, betweenness centrality, clustering analysis and burst analysis), it identified and analyzed the research frontiers of the data mining discipline from 2007 to 2016.
- Published
- 2021
33. ANALYSIS METHOD OF RESEARCH PAPERS PUBLISHED FOR AUDIT
- Author
-
GREAVU-ȘERBAN VALERICĂ
- Subjects
audit, Google Scholar, web scraping, data mining, DataMiner, Commercial geography. Economic geography, HF1021-1027, Economics as a science, HB71-74 - Abstract
Representing a strong instrument of control and feedback used by top management, regulatory institutions and independent bodies, the audit and its methods and techniques attract the interest of specialists, professionals, professors and researchers across all socio-economic activities. The way domain experts write about audit is often reflected in the keywords they choose for the title and the article. This study is a detailed analysis of the assignment of the articles published in the "Financial Audit" journal to specific thematic areas, covering all issues published in electronic format between 2003 and 2015. The study differs from other similar research in its methodology and in the type of information extracted. The main purpose is to identify the most used keywords in the titles and content of the articles published over time and to trace future research directions. The conclusions of the analysis give a comprehensive picture of the multidisciplinary nature of audit, thus providing researchers in several economic fields with an overview of the publication's content, and quality information for readers, authors and future authors.
- Published
- 2015
34. Application of COReS to Compute Research Papers Similarity
- Author
-
Muhammad Abdul Qadir, Muhammad Afzal, and Qamar Mahmood
- Subjects
General Computer Science, Process (engineering), Computer science, content based similarity, Ontology (information science), Semantics, ranking, Similarity (network science), Comprehensive similarity computation, research paper similarity, General Materials Science, ontology, Cluster analysis, Measure (data warehouse), Information retrieval, General Engineering, Encyclopedia, Ontology, Electrical engineering. Electronics. Nuclear engineering, Data mining, TK1-9971 - Abstract
Over the decades, immense growth in research publications has been reported due to continuous developments in science. To date, various approaches have been proposed that find similarity between research papers by applying different similarity measures, collectively or individually, based on the content of the papers. However, contemporary schemes are not conceptualized enough to find related research papers in a coherent manner. This paper aims to find related research papers through a comprehensive and conceptualized model built as an ontology named COReS: Content-based Ontology for Research Paper Similarity. The ontology is built by establishing explicit relationships (i.e., super-type/sub-type, disjointedness, and overlapping) between state-of-the-art similarity techniques. This paper presents applications of the COReS model in the form of a case study followed by an experiment. The case study uses in-text citation-based and vector space-based similarity measures and the relationships between these measures as defined in COReS. The experiment focuses on the computation of comprehensive similarity and other content-based similarity measures, and on the ranking of research papers according to these measures. The Spearman correlation coefficients obtained between the ranks of research papers for the different similarity measures and a user-study-based measure justify the application of COReS for the computation of document similarity. COReS is in the process of being evaluated for ontological errors. In the future, COReS will be enriched to provide more knowledge and improve the process of comprehensive research paper similarity computation.
- Published
- 2017
35. A Hybrid Model Based on LFM and BiGRU Toward Research Paper Recommendation
- Author
-
Ziqing Nie, Xu Zhao, Chenkun Meng, Tie Feng, and Hui Kang
- Subjects
Word embedding, General Computer Science, Computer science, Feature vector, Feature extraction, Semantics, LFM, Matrix decomposition, Recommender systems, General Materials Science, BiGRU, user attention, Artificial neural network, Deep learning, General Engineering, TK1-9971, Artificial intelligence, Data mining, Electrical engineering. Electronics. Nuclear engineering, Word (computer architecture) - Abstract
To improve the accuracy of user implicit rating prediction, we combine the traditional latent factor model (LFM) and bidirectional gated recurrent unit neural network (BiGRU) model to propose a hybrid model that deeply mines the latent semantics in the unstructured content of the text and generates a more accurate rating matrix. First, we utilize the user’s historical behavior (favorites records) to build a user rating matrix and decompose the matrix to obtain the latent factor vectors of users and literature. We also apply the BERT model for word embedding of the research papers to obtain the sequence of word vectors. Then, we apply the BiGRU with the user attention mechanism to mine the research paper textual content and to generate the new literature latent feature vectors that are used to replace the original literature latent factor vectors decomposed from the rating matrix. Finally, a new rating matrix is generated to obtain users’ ratings of noninteractive research papers and to generate the recommendation list according to the user latent factor vector. We design experiments on the real datasets and verify that the research paper recommendation model is superior to traditional recommendation models in terms of precision, recall, F1-value, coverage, popularity and diversity.
- Published
- 2020
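The text side of record 35's hybrid, a BiGRU with user attention that produces an item latent vector to replace the one from matrix factorization, can be sketched as follows; the dimensions and the dot-product attention form are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class BiGRUItemEncoder(nn.Module):
    """Encode a paper's word embeddings into a latent vector via BiGRU + attention."""
    def __init__(self, emb_dim=64, hidden=32, n_factors=16):
        super().__init__()
        self.gru = nn.GRU(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, n_factors)

    def forward(self, words, user_vec):
        h, _ = self.gru(words)                        # (B, T, 2*hidden)
        q = self.proj(h)                              # (B, T, n_factors)
        att = torch.softmax((q * user_vec.unsqueeze(1)).sum(-1), dim=1)  # (B, T)
        return (q * att.unsqueeze(-1)).sum(1)         # (B, n_factors)

enc = BiGRUItemEncoder()
words = torch.randn(2, 30, 64)      # e.g. BERT embeddings of 30 tokens
user = torch.randn(2, 16)           # user latent factors from the LFM step
item_vec = enc(words, user)
rating = (user * item_vec).sum(-1)  # predicted implicit rating
print(rating)
```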
36. Findings Seminal Papers Using Data Mining Techniques
- Author
-
Debrayan Bravo Hidalgo and Alexander Báez Hernández
- Subjects
Entrepreneurship, Computer science, Scopus, Space (commercial competition), Publish or Perish, Software, Index (publishing), Similarity (psychology), Anomaly detection, Data mining - Abstract
The aim of this contribution is to show how seminal papers can be detected using data mining techniques. To achieve the objective of this research, the RapidMiner Studio software and its data mining tools are used on data built from information extracted from Google Scholar and Scopus in three different areas of knowledge. In this process, other software such as Microsoft Excel and Publish or Perish is also used. Comparing the results obtained for the searches in Knowledge Management, Entrepreneurship and Marketing, we found no marked similarity between the sets of articles obtained from Google Scholar and Scopus: the values of the Similarity Index remained below 0.52%, similar for Knowledge Management and Entrepreneurship but lower for Marketing. The detection of outliers using data mining techniques, in particular using RapidMiner, made it possible to determine the seminal papers for the three search terms analyzed and to characterize them in Google Scholar and Scopus. It was shown that the seminal articles can differ depending on whether Google Scholar or Scopus is used. The results suggest checking, for other search terms, whether the observed trend holds.
- Published
- 2020
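Record 36 performs its outlier detection in RapidMiner; an equivalent step in scikit-learn might look like the sketch below, where the per-paper citation features are invented and extreme rows are flagged as candidate seminal papers. This is a named substitution, not the study's workflow.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical per-paper features: [total citations, citations per year]
X = np.array([[12, 1.5], [30, 3.1], [25, 2.6],
              [4100, 180.0], [18, 2.0], [2900, 95.0]])

iso = IsolationForest(contamination=0.34, random_state=0).fit(X)
flags = iso.predict(X)             # -1 = outlier (candidate seminal paper)
print(np.where(flags == -1)[0])    # indices of flagged papers
```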
37. (Short Paper) Effectiveness of Entropy-Based Features in High- and Low-Intensity DDoS Attacks Detection
- Author
-
Abigail Koay, Winston K. G. Seah, and Ian Welch
- Subjects
Rényi entropy, Computer science, Computer-communication networks, Short paper, Denial-of-service attack, Data mining - Abstract
DDoS attack detection using entropy-based features of network traffic has become a popular approach among researchers in the last five years. Traffic distribution features constructed using entropy measures have been proposed as a better way to detect Distributed Denial of Service (DDoS) attacks than conventional volumetric methods, but they still lack generality in accurately detecting attacks of varying intensity. In this paper, we focus on identifying effective entropy-based features for detecting both high- and low-intensity DDoS attacks by exploring how well such features distinguish attack traffic from normal traffic patterns. We hypothesise that the choice of entropy measure, window size, and entropy-based feature may affect the accuracy of detecting DDoS attacks; that is, certain entropy measures, window sizes, and entropy-based features may reveal attack traffic amongst normal traffic better than others. Our experimental results show that the Shannon, Tsallis and Zhou entropy measures achieve a clearer distinction between DDoS attack traffic and normal traffic than Rényi entropy. In addition, the window size used in entropy construction has minimal influence in differentiating between DDoS attack traffic and normal traffic. The effectiveness ranking shows that the commonly used features are less effective than other features extracted from traffic headers.
- Published
- 2019
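The features compared in record 37 are entropies of header-field distributions over traffic windows. A sketch computing Shannon, Rényi and Tsallis entropy of source-IP counts follows (window handling is simplified and the traffic is synthetic).

```python
import numpy as np
from collections import Counter

def entropies(values, alpha=2.0, q=2.0):
    """Shannon, Rényi(alpha) and Tsallis(q) entropy of a window of header values."""
    p = np.array(list(Counter(values).values()), dtype=float)
    p /= p.sum()
    shannon = -np.sum(p * np.log2(p))
    renyi = np.log2(np.sum(p ** alpha)) / (1 - alpha)
    tsallis = (1 - np.sum(p ** q)) / (q - 1)
    return shannon, renyi, tsallis

normal = ["10.0.0.%d" % i for i in range(50)]                      # diverse sources
attack = ["10.0.0.1"] * 45 + ["10.0.0.%d" % i for i in range(5)]   # concentrated

for name, window in [("normal", normal), ("attack", attack)]:
    s, r, t = entropies(window)
    print(f"{name}: shannon={s:.2f} renyi={r:.2f} tsallis={t:.2f}")
```

The concentrated attack window yields markedly lower entropy than the diverse normal window, which is the signal the paper's features exploit.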
38. [Paper] Multimodal Stress Estimation Using Multibiological Information: Towards More Accurate and Detailed Stress Estimation
- Author
-
Takumi Nagasawa, Norimichi Tsumura, Ryo Takahashi, and Keiko Ogawa-Ochiai
- Subjects
Computer science, Signal Processing, Stress estimation, Media Technology, Data mining, Computer Graphics and Computer-Aided Design - Published
- 2021
39. Reproducibility Companion Paper
- Author
-
Zhenzhong Kuang, Xinke Li, Zekun Tong, Cise Midoglu, Yabang Zhao, Yuqing Liao, and Andrew Lim
- Subjects
Source code, Computer science, Deep learning, Point cloud, File format, Replication (computing), Photogrammetry, Benchmark (surveying), Segmentation, Artificial intelligence, Data mining - Abstract
This companion paper supports the replication of the paper "Campus3D: A Photogrammetry Point Cloud Benchmark for Outdoor Scene Hierarchical Understanding", which was presented at ACM Multimedia 2020. The supported paper's main purpose was to provide a photogrammetry point cloud-based dataset with hierarchical multilabels to facilitate the area of 3D deep learning. Based on the provided dataset and source code, in this work we build a complete package to reimplement the proposed methods and experiments (i.e., the hierarchical learning framework and the benchmarks of the hierarchical semantic segmentation task). Specifically, this paper contains the technical details of the package, including the file structure, dataset preparation, installation, and how the experiments are conducted. We also present the replicated experiment results and indicate our contributions to the original implementation.
- Published
- 2021
40. ANALYSIS OF ENERGY SAVING AND EMISSION REDUCTION OF SECONDARY FIBER MILL BASED ON DATA MINING.
- Author
-
Song HU, Jigeng LI, Mengna HONG, and Yi MAN
- Subjects
PAPER recycling, WASTE paper, DATA mining, PAPER pulp, ENVIRONMENTAL protection, COMMERCIAL buildings, BLEACHING (Chemistry) - Abstract
Waste paper recycling is an important way to realize the environmentally friendly development of the papermaking industry. The quality of the pulp affects the pulp sales of secondary fiber paper mills. Waste paper pulp can be adjusted by controlling the working conditions of the pulping process, but the process involves many parameters, and because the parameters are coupled with each other, it is difficult to control. In order to find the best working conditions and improve pulp quality, this study uses an association rules algorithm to optimize the parameters of the waste paper pulping process: refiner power, refiner waste-paper concentration, the volume of slurry entering the deinking process, deinking agent dosage, deinking time, deinking temperature, bleaching agent dosage, bleaching time, and bleaching temperature. The test results show that the qualified rate of the pulp produced under the improved working conditions is 92.56%, an increase of 6.93%, and the average electricity consumption per ton of pulp is reduced by 5.76 kWh/t. In addition to potential economic benefits, this method can reduce carbon emissions. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
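To make the association-rules step above concrete, here is a minimal sketch using the mlxtend library on a hypothetical one-hot table of discretized working conditions and a pass/fail quality label; the column names, bins, and thresholds are illustrative assumptions rather than the study's actual data.

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Hypothetical one-hot table: each row is a production batch, each column a
# discretized working condition or the quality outcome (values illustrative).
batches = pd.DataFrame({
    "refiner_power=high":  [1, 0, 1, 1, 0, 1],
    "deinking_temp=mid":   [1, 1, 0, 1, 1, 1],
    "bleach_amount=low":   [0, 1, 1, 1, 0, 1],
    "pulp_qualified":      [1, 0, 1, 1, 0, 1],
}).astype(bool)

# Mine frequent itemsets, then keep rules that predict a qualified batch.
itemsets = apriori(batches, min_support=0.3, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.8)
rules = rules[rules["consequents"] == frozenset({"pulp_qualified"})]
print(rules[["antecedents", "support", "confidence", "lift"]])

The antecedents of the surviving rules are the working-condition combinations most strongly associated with qualified pulp, which is the spirit of the parameter optimization the abstract describes.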
41. Image Matching Across Wide Baselines: From Paper to Practice
- Author
- Yuhe Jin, Kwang Moo Yi, Pascal Fua, Eduard Trulls, Jiri Matas, Dmytro Mishkin, and Anastasiia Mishchuk
- Subjects
- FOS: Computer and information sciences, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 02 engineering and technology, computer.software_genre, benchmark, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, dataset, Structure from motion, local features, 3d reconstruction, structure from motion, stereo, Benchmarking, Pipeline (software), Pattern recognition (psychology), Metric (mathematics), Benchmark (computing), Embedding, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Data mining, Heuristics, computer, performance, Software
- Abstract
We introduce a comprehensive benchmark for local features and robust estimation algorithms, focusing on the downstream task -- the accuracy of the reconstructed camera pose -- as our primary metric. Our pipeline's modular structure allows easy integration, configuration, and combination of different methods and heuristics. This is demonstrated by embedding dozens of popular algorithms and evaluating them, from seminal works to the cutting edge of machine learning research. We show that with proper settings, classical solutions may still outperform the perceived state of the art. Besides establishing the actual state of the art, the conducted experiments reveal unexpected properties of Structure from Motion (SfM) pipelines that can help improve their performance, for both algorithmic and learned methods. Data and code are online https://github.com/vcg-uvic/image-matching-benchmark, providing an easy-to-use and flexible framework for the benchmarking of local features and robust estimation methods, both alongside and against top-performing methods. This work provides a basis for the Image Matching Challenge https://vision.uvic.ca/image-matching-challenge., Comment: Added: KeyNet-SOSNet, AffNet-HardNet, TFeat, MKD from kornia
- Published
- 2020
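As a sketch of the downstream, pose-based metric this benchmark centres on (the exact evaluation protocol is defined in the linked repository), the following computes angular rotation error and a mean-average-accuracy-style score over error thresholds; the function names and thresholds are our assumptions.

import numpy as np

def rotation_error_deg(R_est, R_gt):
    # Angle (degrees) of the relative rotation between estimate and ground truth.
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def mean_average_accuracy(errors_deg, max_threshold_deg=10):
    # Fraction of poses below each 1..max_threshold_deg degree threshold,
    # averaged over thresholds (a mean-average-accuracy-style score).
    errors = np.asarray(errors_deg)
    thresholds = np.arange(1, max_threshold_deg + 1)
    return float(np.mean([(errors < t).mean() for t in thresholds]))

# Hypothetical usage: perfect estimates give a score of 1.0.
errors = [rotation_error_deg(np.eye(3), np.eye(3)) for _ in range(5)]
print(mean_average_accuracy(errors))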
42. Keywords-Driven and Weight-aware Paper Recommendation via Paper Correlation Pattern Mining
- Author
- Jun Hou, Qianmu Li, Jian Jiang, and Hanwen Liu
- Subjects
- Correlation, Computer science, Data mining, computer.software_genre, computer
- Abstract
Currently, readers often prefer to search for papers of interest using a set of typed query keywords. As the keywords of a single paper are often limited, paper recommender systems often need to recommend a set of papers which collectively satisfy the reader's keyword query. However, the topics of the recommended papers are often not correlated with each other, which fails to meet readers' requirements for in-depth and continuous academic research. Furthermore, although existing paper citation graphs can model correlations between papers, they often suffer from data sparsity, which hinders accurate paper recommendation. To address these issues, we propose a keywords-driven and weight-aware paper recommendation approach, named LP-PRk+w (link prediction-paper recommendation), based on a weighted paper correlation graph. Concretely, we first optimize existing paper citation graph models by introducing a weighted similarity, obtaining a weighted paper correlation graph. Then we recommend a set of correlated papers based on the weighted paper correlation graph and the reader's query keywords. Finally, we conduct large-scale experiments on the real-world Hep-Th dataset. Experimental results demonstrate that our proposal improves paper recommendation performance considerably compared to other related solutions.
- Published
- 2021
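The abstract above describes link prediction over a weighted paper correlation graph. The following minimal sketch scores candidate papers with a classic weighted common-neighbours rule using networkx; the graph, weights, and scoring rule are illustrative assumptions, not the LP-PRk+w method itself.

import networkx as nx

# Hypothetical weighted paper-correlation graph: nodes are papers, edge
# weights blend citation links with keyword similarity (values illustrative).
G = nx.Graph()
G.add_weighted_edges_from([
    ("p1", "p2", 0.9), ("p1", "p4", 0.3),
    ("p2", "p3", 0.6), ("p3", "p4", 0.8),
    ("p2", "p5", 0.7), ("p4", "p5", 0.5),
])

def weighted_common_neighbours(G, u, v):
    # Link-prediction score: sum of edge weights on paths through shared neighbours.
    return sum(G[u][w]["weight"] + G[v][w]["weight"]
               for w in nx.common_neighbors(G, u, v))

# Rank papers not yet linked to the query paper "p1" by predicted relevance.
candidates = [v for v in G if v != "p1" and not G.has_edge("p1", v)]
print(sorted(candidates, key=lambda v: weighted_common_neighbours(G, "p1", v), reverse=True))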
43. Position paper: Sensitivity analysis of spatially distributed environmental models- a pragmatic framework for the exploration of uncertainty sources
- Author
- Takuya Iwanaga, Xifu Sun, Anthony Jakeman, Tianxiang Yue, Barry Croke, Xin Li, Hyeongmo Koo, Xintao Liu, Wenping Yuan, Guonian Lü, Min Chen, Jing Yang, and Hsiao-Hsuan Wang
- Subjects
- Environmental Engineering, 010504 meteorology & atmospheric sciences, Soil and Water Assessment Tool, Spatial structure, Computer science, Ecological Modeling, media_common.quotation_subject, 0208 environmental biotechnology, 02 engineering and technology, computer.file_format, computer.software_genre, 01 natural sciences, 020801 environmental engineering, Environmental modeling, Information system, Position paper, Quality (business), Data mining, Sensitivity (control systems), Raster graphics, computer, Software, 0105 earth and related environmental sciences, media_common
- Abstract
Sensitivity analysis (SA) has been used to evaluate the behavior and quality of environmental models by estimating the contributions of potential uncertainty sources to quantities of interest (QoI) in the model output. Although there is a growing literature on applying SA in environmental modeling, a pragmatic, specific framework for spatially distributed environmental models (SD-EMs) is lacking and remains a challenge. This article reviews the SA literature in order to provide a step-by-step pragmatic framework to guide SA, with an emphasis on addressing potential uncertainty sources related to spatial datasets and their consequent impact on model predictive uncertainty in SD-EMs. The framework includes: identifying potential uncertainty sources; selecting appropriate SA methods and QoI in prediction according to the SA purposes and SD-EM properties; propagating perturbations of the selected potential uncertainty sources while accounting for spatial structure; and verifying the SA measures through post-processing. The proposed framework was applied to a SWAT (Soil and Water Assessment Tool) application to demonstrate the sensitivities of the selected QoI to spatial inputs, including both raster and vector datasets (for example, DEM and meteorological information) and SWAT (sub)model parameters. The framework should benefit SA users not only in environmental modeling but also in other modeling domains, such as those embraced by geographical information system communities.
- Published
- 2020
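As a minimal sketch of the framework's "select an SA method, propagate perturbations, compute SA measures" steps, here is a variance-based (Sobol) analysis using the SALib library, with a toy algebraic model standing in for an SD-EM; the factor names and bounds are illustrative assumptions.

import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

# Toy stand-in for an SD-EM: the real quantities of interest would come from
# running e.g. SWAT under perturbed spatial inputs and parameters.
problem = {
    "num_vars": 3,
    "names": ["dem_error", "rainfall_scale", "cn2"],  # illustrative factors
    "bounds": [[-5.0, 5.0], [0.8, 1.2], [35.0, 98.0]],
}

X = saltelli.sample(problem, 1024)                             # propagate perturbations
Y = np.array([x[0] ** 2 + 10 * x[1] + 0.1 * x[2] for x in X])  # toy QoI

Si = sobol.analyze(problem, Y)                                 # first-order and total indices
for name, s1, st in zip(problem["names"], Si["S1"], Si["ST"]):
    print(f"{name}: S1={s1:.2f} ST={st:.2f}")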
44. Statin intolerance – an attempt at a unified definition. Position paper from an International Lipid Expert Panel
- Author
- Dimitri P. Mikhailidis, Assen Goudev, Manfredi Rizzo, Robert S. Greenfield, Daniel Lighezan, Karam Kostner, Richard Ceska, Dragan M. Djuric, Maciej Banach, Wilbert S. Aronow, Daniel Pella, Michael H. Davidson, Nathan D. Wong, Corina Serban, Marlena Broncel, Marat V. Ezhov, Jacek Rysz, Stephen J. Nicholls, Steven R. Jones, Peter P. Toth, Kausik K. Ray, Raman Puri, Vasilis G. Athyros, Paul Muntner, Zlatko Fras, Michel Farnier, Laszlo Bajnok, Dragana Nikolic, Khalid Al-Rasadi, Patrick M. Moriarty, and G. Kees Hovingh
- Subjects
- medicine.medical_specialty, Statin, business.industry, medicine.drug_class, Alternative medicine, Placebo-controlled study, nutritional and metabolic diseases, General Medicine, Disease, computer.software_genre, 3. Good health, law.invention, Randomized controlled trial, law, Post-hoc analysis, medicine, Position paper, lipids (amino acids, peptides, and proteins), cardiovascular diseases, Data mining, business, Intensive care medicine, Adverse effect, computer
- Abstract
Statins are among the most commonly prescribed drugs in clinical practice. They are usually well tolerated and effectively prevent cardiovascular events. Most adverse effects associated with statin therapy are muscle-related. The recent statement of the European Atherosclerosis Society (EAS) focused on statin-associated muscle symptoms (SAMS) and avoided the term 'statin intolerance'. Although muscle syndromes are the most common adverse effects observed after statin therapy, excluding other side effects might underestimate the number of patients with statin intolerance, which may affect 10-15% of patients. In clinical practice, statin intolerance limits effective treatment of patients at risk of, or with, cardiovascular disease. Knowledge of the most common adverse effects of statin therapy that might cause statin intolerance, together with a clear definition of the phenomenon, is crucial to effectively treat patients with lipid disorders. Therefore, the aim of this position paper is to suggest a unified definition of statin intolerance and to complement the recent EAS statement on SAMS, in which the pathophysiology, diagnosis and management were comprehensively presented.
- Published
- 2015
45. Editorial Note to the Special Issue: "Trends and Applications in Information Systems and Technologies".
- Author
- Ilieva, Galina and Tsihrintzis, George A.
- Subjects
- INFORMATION storage & retrieval systems, DIGITAL technology, DATA mining, BLOCKCHAINS, INFORMATION processing
- Abstract
This document is an editorial note introducing a special issue of the journal Electronics on the topic of "Trends and Applications in Information Systems and Technologies." The note highlights the importance of information technologies in reshaping business models and improving operational efficiency. It also discusses the challenges associated with information systems, such as cybersecurity issues and data-centric decision-making processes. The special issue includes fourteen selected papers covering topics such as the adoption of blockchain technology, digital business solutions, ICT penetration, processing and information extraction in information systems, and innovative applications of information systems. The note concludes by expressing the hope that readers will find the special issue informative and inspiring for further research. [Extracted from the article]
- Published
- 2023
- Full Text
- View/download PDF
46. Brief Paper: Simplified Tag Identification Algorithm by Modifying Tag Collection Command in Active RFID System
- Author
- Intaek Lim
- Subjects
- Identification (information), Computer science, Data mining, computer.software_genre, computer
- Published
- 2020
47. Survey on Research Paper Classification based on TF-IDF and Stemming Technique using Classification Algorithm
- Author
- S. A. and Kshitija G.
- Subjects
- Computer science, Data mining, tf–idf, computer.software_genre, computer
- Published
- 2020
48. Automatic Identification of Compare Paper Relations
- Author
- Yuliant Sibaroni, Masayu Leylia Khodra, and Dwi H. Widyantoro
- Subjects
- Computer science, General Engineering, Identification (biology), Data mining, computer.software_genre, computer
- Published
- 2020
49. Adaptive weights clustering of research papers
- Author
- Kirill Efimov, Larisa Adamyan, Wolfgang Karl Härdle, and Cathy Yi-Hsuan Chen
- Subjects
- JEL system, Adaptive algorithm, Point (typography), Computer science, 330 Wirtschaft, 05 social sciences, Nonparametric statistics, 020206 networking & telecommunications, 02 engineering and technology, computer.software_genre, Clustering, Weighting, 0502 economics and business, ddc:330, 0202 electrical engineering, electronic engineering, information engineering, Key (cryptography), Economic articles, Nonparametric, Data mining, 050207 economics, Cluster analysis, computer, Research center
- Abstract
The JEL classification system is a standard way of assigning key topics to economics articles so that they are more easily retrievable in today's massive literature. Usually the JEL (Journal of Economic Literature) codes are picked by the author(s), bearing the risk of suboptimal assignment. Using the database of the Collaborative Research Center at Humboldt-Universität zu Berlin, we employ a new adaptive clustering technique to identify interpretable JEL (sub)clusters. The proposed Adaptive Weights Clustering (AWC) is available on http://www.quantlet.de/ and is based on the idea of locally weighting each point (document, abstract) in terms of cluster membership. Comparison with k-means and CLUTO reveals the excellent performance of AWC.
- Published
- 2020
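AWC itself is distributed via http://www.quantlet.de/; for orientation, here is a minimal sketch of the k-means baseline the authors compare against, applied to a toy abstract corpus with scikit-learn. The corpus, vectorizer settings, and cluster count are illustrative assumptions.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Toy abstracts standing in for the economics corpus.
abstracts = [
    "monetary policy and inflation targeting",
    "inflation expectations under monetary policy shocks",
    "labor market frictions and unemployment",
    "unemployment insurance and labor supply",
]

X = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # two JEL-like (sub)clusters, e.g. [0, 0, 1, 1]

Unlike k-means, which fixes the number of clusters in advance, the adaptive-weights idea lets cluster membership emerge from locally weighted comparisons between points.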
50. An Intelligent Paper Currency Recognition System
- Author
- Muhammad Sarfraz
- Subjects
- Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Paper currency, classification, computer.software_genre, image processing, radial basis function network, Currency, Pattern recognition (psychology), Recognition system, General Earth and Planetary Sciences, Data mining, intelligent system, computer, General Environmental Science
- Abstract
Paper currency recognition (PCR) is an important area of pattern recognition. A paper currency recognition system is a kind of intelligent system that modern automation increasingly requires, with potential applications including electronic banking, currency monitoring systems, and money exchange machines. This paper proposes an automatic paper currency recognition system. The method is based on interesting features and the correlation between images, and uses a Radial Basis Function Network for classification, taking Saudi Arabian paper currency as a model. The system deals with 110 images, 10 of which are tilted at an angle of less than 15°; the remaining 100 images comprise noisy and normal images, 50 of each. It uses the fourth series (1984-2007) of currency issued by the Saudi Arabian Monetary Agency (SAMA) as the model currency under consideration. The system achieves recognition accuracies of 95.37%, 91.65%, and 87.5% for the normal non-tilted, noisy non-tilted, and tilted images, respectively; the overall average recognition rate over the 110 images is 91.51%. The proposed algorithm is fully automatic, requires no human intervention, and produces quite satisfactory results in terms of recognition and efficiency.
- Published
- 2015
- Full Text
- View/download PDF
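The system above classifies correlation-based image features with a Radial Basis Function Network. The following minimal sketch shows such a network (a Gaussian hidden layer plus a least-squares linear readout) on synthetic features; the feature dimensionality, class count, and centre-selection rule are illustrative assumptions, not the paper's configuration.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature vectors (e.g. correlations of a banknote image against
# class templates); real features would come from the image-processing stage.
X = rng.normal(size=(120, 8))
y = rng.integers(0, 5, size=120)                     # five illustrative classes

def rbf_design(X, centers, gamma=1.0):
    # Gaussian hidden layer: phi[i, j] = exp(-gamma * ||x_i - c_j||^2).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

centers = X[rng.choice(len(X), 20, replace=False)]   # simple centre selection
Phi = rbf_design(X, centers)
T = np.eye(5)[y]                                     # one-hot targets
W, *_ = np.linalg.lstsq(Phi, T, rcond=None)          # least-squares readout

pred = rbf_design(X[:5], centers) @ W                # scores for five samples
print(pred.argmax(axis=1), y[:5])                    # synthetic data: agreement is illustrative only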