Descriptor: "Doc2Vec" / Journal: ieee access - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Doc2Vec"' showing total 8 results

1. LogiTriBlend: A Novel Hybrid Stacking Approach for Enhanced Phishing Email Detection Using ML Models and Vectorization Approach

Author: Aqsa Khalid, Maria Hanif, Abdul Hameed, Zeeshan Ashraf, Mrim M. Alnfiai, and Salma M. Mohsen Alnefaie
Subjects: TF-IDF, Word2Vec, Doc2Vec, LogiTriBlend, SVM, XGBoost, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Email phishing remains a prevalent and sophisticated cyber threat, targeting individuals and organizations by disguising malicious intent in seemingly legitimate communications. Effective classification of phishing and legitimate emails is crucial for cybersecurity. In this study, we investigated various text vectorization techniques and machine learning models to address the challenge of email classification. We utilized three vectorization techniques: TF-IDF, Word2Vec, and Doc2Vec. These techniques were applied to traditional machine learning algorithms, and their performance was evaluated against a proposed stacking model, LogiTriBlend. The dataset comprised 501 phishing and 4090 legitimate emails, undergoing preprocessing steps like stemming, lemmatization, and noise removal. To handle the dataset’s imbalance, Synthetic Minority Over-sampling Technique (SMOTE) was employed. The model combines multiple base learners, including Support Vector Machine (SVM), Logistic Regression, Random Forest, and XGBoost, with a Logistic Regression meta-learner. The experimental results indicated that the LogiTriBlend model achieved an accuracy of 99.34% using Doc2Vec, outperforming Word2Vec and TF-IDF feature extraction methods, which obtained accuracies of 99.12% and 98.80%, respectively. The Doc2Vec method resulted in superior email classification performance. Among the models tested, the proposed stacking model, LogiTriBlend, demonstrated robust results; however, the highest accuracy was consistently achieved using Doc2Vec.
Published: 2024

2. Learning Software Project Management From Analyzing Q&A’s in the Stack Exchange

Author: Alireza Ahmadi, Fatemeh Delkhosh, Gouri Deshpande, Raymond A. Patterson, and Guenther Ruhe
Subjects: Software project management, PMBOK, stack exchange, BERT, Doc2Vec, learning, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Software Project Management (SPM) is considered the key driver for the success or failure of software projects. Project failure is caused by various factors, the most important of which is poor SPM. Thus, we investigated the needs of practitioners by focusing on Project Management Q&A communities. More precisely, we targeted Stack Exchange to identify the primary needs of software project managers. More than 5000 SPM questions were analyzed from the conceptual model given by the Project Management Body of Knowledge (PMBOK). For pre-training of the Machine Learning classifiers, we implemented Bidirectional Encoder Representations from Transformers (BERT) and Doc2Vec text embedding and compared their performance. Our results showed that BERT outperforms Doc2Vec for pre-training in almost all scenarios. Schedule management, followed by resource management, are the main PMBOK knowledge areas of concern for project managers. Among the process groups, the emphasis of the questions is on planning. We compared the findings with the learning and training status quo in 11 top Canadian universities. We analyzed 46 SPM-related courses and found that the rank correlation of PMBOK knowledge areas is 0.23 between the key content of the analyzed courses and the focus of Q&A’s knowledge areas analyzed from Stack Exchange.
Published: 2023

3. An in-Depth Analysis of the Software Features’ Impact on the Performance of Deep Learning-Based Software Defect Predictors

Author: Diana-Lucia Miholca, Vlad-Ioan Tomescu, and Gabriela Czibula
Subjects: Deep learning, Doc2vec, latent semantic indexing, software defect prediction, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Software Defects Prediction represents an essential activity during software development that contributes to continuously improving software quality and software maintenance and evolution by detecting defect-prone modules in new versions of a software system. In this paper, we are conducting an in-depth analysis on the software features’ impact on the performance of deep learning-based software defect predictors. We further extend a large-scale feature set proposed in the literature for detecting defect-proneness, by adding conceptual software features that capture the semantics of the source code, including comments. The conceptual features are automatically engineered using Doc2Vec, an artificial neural network based prediction model. A broad evaluation performed on the Calcite software system highlights a statistically significant improvement obtained by applying deep learning-based classifiers for detecting software defects when using conceptual features extracted from the source code for characterizing the software entities.
Published: 2022

4. A WeChat Official Account Reading Quantity Prediction Model Based on Text and Image Feature Extraction

Author: Zijian Bai, Shuangyi Ma, and Geng Li
Subjects: Feature extraction, neural network, WeChat official accounts, Doc2Vec, user engagement, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: This paper describes a study that built a neural network prediction model based on feature extraction, focusing on text analysis and image analysis of WeChat official accounts reading quantity. Based on the embedding method of the deep learning model, we extracted the text features in the title and the image features in the cover picture, explored the relationship between these features and the reading quantity, and built a neural network model based on these features to predict the reading quantity. The results show that there is a phenomenon of sentiment fusion in the text, and a sentence vector model based on Doc2Vec and a neural network model both had a good performance. This paper proposes a tool that can predict the reading quantity in advance and help administrators adjust the titles and images according to the predicted results.
Published: 2022

5. An Approximate Model for Event Detection From Twitter Data

Author: Aarzoo Dhiman and Durga Toshniwal
Subjects: Graph based event detection, social media data, uncertain clustering, word2vec, doc2vec, Jose twitter graph, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: The abundance and real-time availability of Twitter data have proved beneficial in detecting events in various domains such as emergency situations, crime detection, public health, place recommendations, etc. Nevertheless, two critical challenges occur while detecting events using social media data. First, the uncertainty in capturing the contextual relationship among tweets, which is the result of the limited availability of the contextual information due to the small length of tweets. Second, the high computation cost required in event detection due to massive data processing. Earlier research works, addressing these challenges, have tried to capture the contextual information by using the dense vector representations of texts leveraging deep neural word embedding generation models such as Word2Vec and GloVe. However, these models are trained on the Euclidean vector space which fails to amalgamate the directional information of the vectors with the semantic information in text, incurring high computational costs. To target both the problems simultaneously, we propose modeling Twitter data as a graph-of-sentences which retains the contextual relationships while maintaining lower computational cost. The proposed model captures contextual information using JoSE, a spherical vector representation leveraging the word-word and word-paragraph semantic co-occurrence statistics in a spherical generative model. Furthermore, the framework uses the weighted-graph model to capture all the relationships among the Twitter data efficiently. The graph is further pruned with the help of the graph component filtering approach. The graph clustering model, employed to detect the events, leverages the edge weights and the partial-k clustering approach maintaining low computation costs. The experimentation on the annotated benchmark Twitter data set and the real-world datasets show improved run-time performance up to 30% while maintaining the qualitative performance (F1-score) comparable to the state-of-the-art models.
Published: 2020

6. Investment Universe Construction Based on the Theme Keyword Search

Author: Do-Guk Kim and Bonggyun Ko
Subjects: Doc2vec, investment universe, keyword search, 10-K report, theme keywords, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Investment universe means a pool of selected assets likely to be profitable. In general, assets related to a common theme or concept are selected to form an investment universe. A well organized investment universe has a strong common theme and reduces the risk by diversifying the assets included in the universe. To form or revise an investment universe, human efforts are required by using domain knowledge about the theme, but it is hard to suggest an investment universe that reflects the latest market trends. In this paper, we propose an automated investment universe selection method based on theme keywords. The theme keywords are extracted from the news articles and the business section on the companies which are included in the S&P 500 index. After that, securities are selected to form an investment universe for each theme keywords. We employ a similarity value between the security vector and the theme keyword vector to select the related securities for the certain keywords. Regarding the vector representations, word and document embeddings are carried out using both news articles and the business section on the companies. Stock price movements of the selected securities are similar which means that the investment universes are well organized to suggest the assets tightly associated with the theme. The experimental results show that the proposed method has high future returns on the investment universes with low or high historical stock price returns.
Published: 2019
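
Illustration for result 6: the abstract says securities are selected by "a similarity value between the security vector and the theme keyword vector". The sketch below shows one way such a selection step could look. It is not the authors' implementation: cosine similarity is an assumed choice of similarity measure, and the function names, tickers, and three-dimensional vectors are hypothetical stand-ins for learned word and document embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_universe(theme_vector, security_vectors, top_k):
    """Rank securities by similarity to the theme vector and keep the top_k.

    security_vectors maps a ticker (hypothetical) to its embedding vector.
    Returns a list of (ticker, similarity) pairs, most similar first.
    """
    scored = [
        (ticker, cosine_similarity(theme_vector, vector))
        for ticker, vector in security_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up toy vectors; real embeddings would come from a trained model.
theme = [1.0, 0.0, 1.0]
securities = {
    "AAA": [0.9, 0.1, 0.8],
    "BBB": [0.0, 1.0, 0.0],
    "CCC": [0.5, 0.5, 0.5],
}
universe = select_universe(theme, securities, top_k=2)
```

In this toy data, "BBB" is orthogonal to the theme vector, so it scores zero and is excluded from the two-security universe.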

7. Semantic-Aware Visual Abstraction of Large-Scale Social Media Data With Geo-Tags

Author: Zhiguang Zhou, Xinlong Zhang, Xiaoyun Zhou, and Yuhua Liu
Subjects: Doc2vec, social media, geo-tagged, blue noise sampling, visual abstraction, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: With the rapid growth of geo-tagged social media data, it has become feasible to explore topics across different areas through text mining and geographical visualization. However, the visual elements of social media data always overlap with each other in the map view, which largely disturbs visual perception of semantic features and their geographical distribution. Thus, it is of great significance to reduce the visual clutter of large-scale social media data, and enhance the visibility of semantic features across local areas. In this paper, we utilize a doc2vec model to transform geo-tagged social media data into high-dimensional vectors, and the semantic correlation can be easily characterized in the dimensionality reduction space. Aiming at the reduction of visual clutter of geographical visualization, a dual-objective blue noise sampling model is proposed to select a subset of social media data, by means of which both the semantic correlation and spatial distribution of large scale social media data are well retained. A rich set of visual designs are implemented enabling users to evaluate the sampled results from multiple perspectives and explore the changes of semantic features across areas, such as heatmap, word cloud and text stream. The effectiveness and validity of the proposed visualization system are further demonstrated through case studies and expert reviews.
Published: 2019

8. An Approximate Model for Event Detection From Twitter Data

Author: Durga Toshniwal and Aarzoo Dhiman
Subjects: Jose twitter graph, Word embedding, General Computer Science, Computer science, doc2vec, Machine learning, computer.software_genre, social media data, uncertain clustering, General Materials Science, Word2vec, Cluster analysis, Clustering coefficient, Euclidean vector, Graph based event detection, business.industry, Event (computing), General Engineering, word2vec, Generative model, Graph (abstract data type), Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, computer, lcsh:TK1-9971
Abstract: The abundance and real-time availability of Twitter data have proved beneficial in detecting events in various domains such as emergency situations, crime detection, public health, place recommendations, etc. Nevertheless, two critical challenges occur while detecting events using social media data. First, the uncertainty in capturing the contextual relationship among tweets, which is the result of the limited availability of the contextual information due to the small length of tweets. Second, the high computation cost required in event detection due to massive data processing. Earlier research works, addressing these challenges, have tried to capture the contextual information by using the dense vector representations of texts leveraging deep neural word embedding generation models such as Word2Vec and GloVe. However, these models are trained on the Euclidean vector space which fails to amalgamate the directional information of the vectors with the semantic information in text, incurring high computational costs. To target both the problems simultaneously, we propose modeling Twitter data as a graph-of-sentences which retains the contextual relationships while maintaining lower computational cost. The proposed model captures contextual information using JoSE, a spherical vector representation leveraging the word-word and word-paragraph semantic co-occurrence statistics in a spherical generative model. Furthermore, the framework uses the weighted-graph model to capture all the relationships among the Twitter data efficiently. The graph is further pruned with the help of the graph component filtering approach. The graph clustering model, employed to detect the events, leverages the edge weights and the partial-k clustering approach maintaining low computation costs. The experimentation on the annotated benchmark Twitter data set and the real-world datasets show improved run-time performance up to 30% while maintaining the qualitative performance (F1-score) comparable to the state-of-the-art models.
Published: 2020
