12 results
Search Results
2. Peaks, Slopes, Canyons and Plateaus: Identifying Technology Trends Throughout the Life Cycle.
- Author
Efimenko, Irina V. and Khoroshevsky, Vladimir F.
- Subjects
TECHNOLOGICAL innovations, TRENDS, TEXT files, NATURAL language processing, ALGORITHMS, SEMANTICS, DATA extraction, DATA mining
- Abstract
A novel domain-independent approach to technology trend monitoring is presented in the paper. It is based on the ontology of a technology trend, hype cycles methodology, and semantic indicators which provide evidence of the maturity level of a technology. This approach forms the basis for implementation of text-mining software tools. The algorithms behind these tools help users avoid overly general or garbage results, which make it impossible to identify promising technologies at early stages (early detection, weak signals). In addition, these algorithms provide high-quality results in the extraction of complex multiword terms which correspond to the technological concepts forming a trend. The methodology and software developed as a result of this study are applicable to various industries with minor adjustments and require no deep expert knowledge from the user. [ABSTRACT FROM AUTHOR]
- Published
2017
3. Technical Review: Sarcasm Detection Algorithms.
- Author
Yavanoglu, Uraz, Ibisoglu, Taha Yasin, and Wıcana, Setra Genyang
- Subjects
DATA mining, PSYCHOLINGUISTICS, MACHINE learning
- Abstract
In this paper, we review one of the challenging problems in opinion mining: sarcasm detection. To that end, many researchers have explored properties of sarcasm such as theories of sarcasm, syntactic properties, the psycholinguistics of sarcasm, lexical features, and semantic properties. Studies conducted within the last 15 years have not only made progress in semantic features but have also shown an increasing number of analysis methods that use a machine-learning approach to process data. Therefore, this paper explains the most commonly used current methods for detecting sarcasm. Lastly, we present our findings, which might help other researchers obtain better results in the future. [ABSTRACT FROM AUTHOR]
- Published
2018
4. AN EFFICIENT OBJECT ORIENTED TEXT ANALYSIS (OOTA) APPROACH TO CONSTRUCT STATIC STRUCTURE WITH DYNAMIC BEHAVIOR.
- Author
EL-SAID, ASMAA M., ELDESOKY, ALI I., and ARAFAT, HESHAM A.
- Subjects
OBJECT-oriented methods (Computer science), ELECTRONIC data processing, INFORMATION overload, DATA mining, MACHINE learning, NATURAL language processing, INFORMATION retrieval, KNOWLEDGE management
- Abstract
In many fields of science, IT applications, and business environments, systems have successfully evolved to receive vast amounts of electronic data and information. Because of this growth, much recent research has tried to resolve the crisis of information overload. These solutions combine techniques from data mining, machine learning, natural language processing, information retrieval, information extraction, and knowledge management. A great challenge is how to exploit these information and knowledge resources and turn them into useful knowledge available to the people concerned. The value of knowledge increases when people can share and capitalize on it. Thus, approaches that help researchers benefit from existing hidden knowledge are needed, and these require tools that can analyze, extract, and explore relevant and useful information together with its relations. The main contribution of this paper is to integrate XML technology with text analysis to introduce an efficient concept-based structure model that represents text in a form that can be easily understood, shared, managed, and mined. This paper describes an efficient object oriented text analysis (OOTA) approach that generates an object-oriented model, transforming unstructured text into a specific structured form stored in XML format. The experimental results show that this approach yields a good improvement in results. [ABSTRACT FROM AUTHOR]
- Published
2013
5. Extraction of Meaningful Information from Unstructured Clinical Notes Using Web Scraping.
- Author
Varshini, K. Sukanya and Uthra, R. Annie
- Subjects
NATURAL language processing, MACHINE learning, DATA mining, FEATURE selection, RANDOM forest algorithms, MEDICAL transcription
- Abstract
In the medical field, the clinical notes taken by the doctor, nurse, or medical practitioner are considered to be one of the most important medical documents. These documents hold information regarding the patient, including the patient's current condition, family history, disease, symptoms, medications, lab test reports, and other vital information. Although these documents hold important information regarding the patients, they cannot be used directly because the data are unstructured. Organizing a huge amount of data without any mistakes is practically impossible for humans, yet ignoring unstructured data is not advisable. Hence, to overcome this issue, the web scraping method is used to extract the clinical notes from the Medical Transcription (MT) samples, which hold many transcribed clinical notes from various departments. In the proposed method, Natural Language Processing (NLP) is used to pre-process the data, and the variants of the Term Frequency-Inverse Document Frequency (TF-IDF)-based vector model are used for feature selection, thus extracting the required data from the clinical notes. Performance measures including accuracy, precision, recall and F1 score are used in the identification of disease, and the result obtained from the proposed system is compared with the best performing machine learning algorithms, including Logistic Regression, Multinomial Naive Bayes, the Random Forest classifier and Linear SVC. The results show that the Random Forest classifier obtained a higher accuracy, 90%, than the other algorithms. [ABSTRACT FROM AUTHOR]
- Published
2023
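As a rough illustration of the TF-IDF vector model named in the abstract of result 5, the sketch below computes TF-IDF weights in plain Python. It is not the authors' pipeline: the toy notes, the whitespace tokenizer, the function names, and the smoothed IDF formula are all assumptions made for illustration, and the classification step is left out.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log((1 + N) / (1 + df)) + 1   (a smoothed variant, assumed here)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy notes, not taken from the paper's dataset.
notes = [
    "patient reports chest pain",
    "patient reports knee pain",
    "patient denies fever",
]
weights = tf_idf(notes)
```

In this toy corpus, a term that occurs in every note ("patient") gets a lower weight than one that occurs in a single note ("chest"), which is the property that makes TF-IDF usable for feature selection.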
6. A NEW METHODOLOGY FOR DOMAIN ONTOLOGY CONSTRUCTION FROM THE WEB.
- Author
FRIKH, BOUCHRA, DJAANFAR, AHMED SAID, and OUHBI, BRAHIM
- Subjects
ONTOLOGIES (Information retrieval), WEB services, NATURAL language processing, INFORMATION retrieval, INTERNET, DATA mining, INFORMATION resources, ALGORITHMS
- Abstract
Resources like ontologies are used in a number of applications, including natural language processing and information retrieval (especially from the Internet). Different methods have been proposed to build such resources. This paper proposes a new method to extract information from the Web to build a taxonomy of terms and Web resources for a given domain. First, a (CHIR) method is used to identify candidate terms. Then a similarity (SIM) measure is introduced to select relevant concepts to build the ontology. Our new algorithm, called (CHIRSIM), is easy to implement and can be efficiently integrated into an information retrieval system to help improve the retrieval performance. Experimental results show that the proposed approach can effectively and efficiently construct a cancer domain ontology from unstructured text documents. [ABSTRACT FROM AUTHOR]
- Published
2011
7. Mining Semantics Structures from Syntactic Structures in Web Document Corpora.
- Author
Mousavi, Hamid, Gao, Shi, Kerr, Deirdre, Iseli, Markus, and Zaniolo, Carlo
- Subjects
WORLD Wide Web, DATA mining, SEMANTIC computing, DATA management, DATA science
- Abstract
The Web is making possible many advanced text-mining applications, such as news summarization, essay grading, question answering, semantic search and structured queries on corpora of Web documents. For many such applications, statistical text-mining techniques are of limited effectiveness since they do not utilize the morphological structure of the text. On the other hand, many approaches use NLP-based techniques that parse the text into parse trees, and then use patterns to mine and analyze parse trees, which are often unnecessarily complex. To reduce this complexity and ease the entire process of text mining, we propose a weighted-graph representation of text, called TextGraphs, which captures the grammatical and semantic relations between words and terms in the text. TextGraphs are generated using a new text mining framework, which is the main focus of this paper. Our framework, SemScape, uses a statistical parser to generate a few of the most probable parse trees for each sentence and employs a novel two-step pattern-based technique to extract candidate terms and their grammatical relations from the parse trees. Moreover, SemScape resolves coreferences by a novel technique, generates domain-specific TextGraphs by consulting ontologies, and provides a SPARQL-like query language and an optimized engine for semantically querying and mining TextGraphs. [ABSTRACT FROM AUTHOR]
- Published
2014
8. A Hybrid Multilingual Fuzzy-Based Approach to the Sentiment Analysis Problem Using SentiWordNet.
- Author
Madani, Youness, Erritali, Mohammed, Bengourram, Jamaa, and Sailhan, Francoise
- Subjects
SENTIMENT analysis, NATURAL language processing, FUZZY logic, SOCIAL network analysis, DATA mining, PRODUCT reviews
- Abstract
Sentiment analysis, or in particular social network analysis (SNA), is a new research area that has grown explosively. This domain has become a very active research topic in data mining and natural language processing. Sentiment analysis (opinion mining) consists of analyzing and extracting emotions, opinions or attitudes from product reviews, movie reviews, etc., and classifying them into classes such as positive, negative and neutral, or extracting the degree of importance (polarity). In this paper, we propose a new hybrid approach for classifying tweets into classes, based on fuzzy logic and a lexicon-based approach using SentiWordNet. Our approach consists of classifying tweets into three classes (positive, negative or neutral) using SentiWordNet and fuzzy logic with its three important steps: Fuzzification, Rule Inference/aggregation, and Defuzzification. The dataset of tweets to classify and the result of the classification are stored in the Hadoop Distributed File System (HDFS), and we use Hadoop MapReduce to apply our proposal. [ABSTRACT FROM AUTHOR]
- Published
2020
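The three fuzzy-logic steps named in the abstract of result 8 (fuzzification, rule inference/aggregation, defuzzification) can be sketched as follows. This is only an illustrative sketch, not the authors' system: the mini-lexicon stands in for SentiWordNet lookups, and the membership functions, rules, class centroids, and threshold are all assumed values. The Hadoop storage and MapReduce parts are omitted.

```python
# Hypothetical mini-lexicon of (positive, negative) scores in [0, 1];
# a stand-in for SentiWordNet lookups, not real SentiWordNet data.
LEXICON = {
    "great": (0.75, 0.0),
    "love": (0.625, 0.0),
    "awful": (0.0, 0.875),
    "slow": (0.0, 0.5),
}

# Assumed crisp centre for each output class, used in defuzzification.
CENTROIDS = {"negative": -1.0, "neutral": 0.0, "positive": 1.0}


def tweet_scores(text):
    """Average the lexicon scores of the words found in the text."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    if not hits:
        return 0.0, 0.0
    pos = sum(p for p, _ in hits) / len(hits)
    neg = sum(n for _, n in hits) / len(hits)
    return pos, neg


def fuzzify(x):
    """Step 1: map a crisp score to 'low' and 'high' membership degrees."""
    x = min(max(x, 0.0), 1.0)
    return {"low": 1.0 - x, "high": x}


def infer(pos, neg):
    """Step 2: apply the rules (AND = min) and aggregate (OR = max)."""
    p, n = fuzzify(pos), fuzzify(neg)
    return {
        "positive": min(p["high"], n["low"]),
        "negative": min(n["high"], p["low"]),
        "neutral": max(min(p["low"], n["low"]), min(p["high"], n["high"])),
    }


def defuzzify(strengths):
    """Step 3: weighted average of class centroids, giving a score in [-1, 1]."""
    total = sum(strengths.values())
    if total == 0:
        return 0.0
    return sum(CENTROIDS[k] * v for k, v in strengths.items()) / total


def classify(text, threshold=0.25):
    """Label a text as positive, negative or neutral (threshold is assumed)."""
    pos, neg = tweet_scores(text)
    score = defuzzify(infer(pos, neg))
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

A text with no lexicon hits falls through to "neutral", because both scores are zero and only the neutral rule fires.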
9. AUTOMATIC GENERATION OF CROSSWORD PUZZLES.
- Author
RIGUTINI, LEONARDO, DILIGENTI, MICHELANGELO, MAGGINI, MARCO, and GORI, MARCO
- Subjects
AUTOMATION, CROSSWORD puzzles, HUMAN-computer interaction, CONSTRAINT satisfaction, COMPUTER programming, DATA mining, NATURAL language processing
- Abstract
Crossword puzzles are used every day by millions of people for entertainment, but also have applications in educational and rehabilitation contexts. Unfortunately, the generation of ad-hoc puzzles, especially on specific subjects, typically requires a great deal of human expert work. This paper presents the architecture of WebCrow-generation, a system that is able to generate crosswords with no human intervention, including clue generation and crossword compilation. In particular, the proposed system crawls information sources on the Web, extracts definitions from the downloaded pages using state-of-the-art natural language processing techniques and, finally, compiles the crossword schema with the extracted definitions by constraint satisfaction programming. The system has been tested on the creation of Italian crosswords, but the extensive use of machine learning makes the system easily portable to other languages. [ABSTRACT FROM AUTHOR]
- Published
2012
10. Solving Arithmetic Word Problems by Object Oriented Modeling and Query-Based Information Processing.
- Author
Mandal, Sourav and Naskar, Sudip Kumar
- Subjects
INFORMATION processing, INFORMATION modeling, LEARNING Management System, DATA mining, ONLINE education, NATURAL language processing
- Abstract
The paper presents an Object Oriented Analysis and Design (OOAD) approach to modeling and reasoning, and a database query based approach to processing and solving addition-subtraction (Add-Sub) type arithmetic Mathematical Word Problems (MWP) of elementary school level. The system identifies and extracts the key entities in a word problem, such as owners, items and their attributes and quantities, and verbs, from all the input sentences, using a rule based Information Extraction (IE) approach based on the Semantic Role Labeling (SRL) technique. This information is then stored in predefined templates, which are further modeled to represent an MWP in the object-oriented paradigm and processed using a query based approach to generate the answer. These kinds of applications are based on Natural Language Processing (NLP), Natural Language Understanding (NLU) and Artificial Intelligence (AI), and can be used as intelligent dynamic mathematical tutoring tools as part of E-Learning systems, Learning Management Systems, on-line education, etc. The proposed object oriented mathematical word problem solver can solve arithmetic MWPs involving only addition-subtraction operations, and it achieved an accuracy of 94.35% on a subset of the AI2 arithmetic questions dataset. [ABSTRACT FROM AUTHOR]
- Published
2019
11. TEXT AND DATA MINING FOR BIOMEDICAL DISCOVERY.
- Author
GONZALEZ, GRACIELA, COHEN, KEVIN BRETONNEL, GREENE, CASEY S., KANN, MARICEL G., LEAMAN, ROBERT, SHAH, NIGAM, and JIEPING YE
- Subjects
DATA mining, TEXT mining, PROTEOMICS, INDIVIDUALIZED medicine, NATURAL language processing
- Published
2013
12. SOCIAL MEDIA MINING SHARED TASK WORKSHOP.
- Author
SARKER, ABEED, NIKFARJAM, AZADEH, and GONZALEZ, GRACIELA
- Subjects
SOCIAL media, DATA mining, DRUG side effects, ADULT education workshops, NATURAL language processing
- Published
2015
Discovery Service for Jio Institute Digital Library