20 results on '"Web mining"'
Search Results
2. Measuring Innovation in Mauritius' ICT Sector Using Unsupervised Machine Learning: A Web Mining and Topic Modeling Approach.
- Author
-
Böhmecke-Schwafert, Moritz and Dörries, Colin
- Abstract
Measuring innovation accurately and efficiently is crucial for policymakers to encourage innovation activity. However, the established indicator landscape lacks timeliness and accuracy. In this study, we focus on the country of Mauritius that is transforming its economy towards the information and communication technology (ICT) sector. We seek to extend the knowledge base on innovation activity and the status quo of innovation in Mauritius by applying an unsupervised machine learning approach. Building on previous work on new experimental innovation indicators, we combine recent advances in web mining and topic modeling and address the following research questions: What are potential areas of innovation activity in the ICT sector of Mauritius? Furthermore, do web mining and topic modeling provide sufficient indicators to understand innovation activities in emerging countries? To answer these questions, we apply the natural language processing (NLP) technique of Latent Dirichlet Allocation (LDA) to ICT companies' website text data. We then generate topic models from the scraped text data. As a result, we derive seven categories that describe the innovation activities of ICT firms in Mauritius. Albeit the model approach fulfills the requirements for innovation indicators as suggested in the Oslo Manual, it needs to be combined with additional metrics for innovation, for example, with traditional indicators such as patents, to unfold its potential. Furthermore, our approach carries methodological implications and is intended to be reproduced in similar contexts of scarce or unavailable data or where traditional metrics have demonstrated insufficient explanatory power. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. Sharing economy from the sustainable development goals perspective: a path to global prosperity
- Author
-
Aref, Mayada
- Published
- 2024
- Full Text
- View/download PDF
4. Measuring corporate digital divide through websites: insights from Italian firms
- Author
-
Leonardo Mazzoni, Fabio Pinelli, and Massimo Riccaboni
- Subjects
Digital divide, Website data, Web mining, Corporate big data, Digital business capabilities, Digital transformation, Computer applications to medicine. Medical informatics, R858-859.7
- Abstract
With the increasing pervasiveness of Information and Communication Technology (ICT) in the fabric of economic activities, the corporate digital divide has become a crucial issue for the assessment of Information Technology (IT) competencies and the digital gap between firms and territories. With little granular data available to measure the phenomenon, most studies have used survey data. To address this empirical gap, we scanned the homepages of 182,705 Italian companies and extracted ten characteristics related to their digital footprint to develop a new index for the corporate digital assessment. Our results show a significant digital divide between Italian companies according to size, sector and geographical location, opening new perspectives for monitoring and data-driven analysis.
- Published
- 2024
- Full Text
- View/download PDF
5. Comprehensive selective improvements in agri-informatics semantics.
- Author
-
Ishaq, Muhammad, Khan, Abdullah, Asim, Muhammad, Khan, Asfandyar, and Iqbal Bangash, Javed
- Subjects
-
*WORLD Wide Web, *INFORMATION technology, *SEMANTIC Web, *PROJECT finance, *WEB development
- Abstract
The advent of information technology re-innovates all sectors of bio-sciences. Researchers use Semantic Web to improve web searching, mining and integration, which alleviates the time-consuming task of finding relevant and high-quality content. Semantics is improved through ontology engineering in any domain. Amended and developed ontologies will be uploaded to existing standardised and approved biomedical repositories. The establishment of a World Wide Web Consortium (W3C) approved and standardised ontology repository is the most ambitious goal. This work will solely focus on some selected agri-ontologies. The main objective is to promote outcome-based research and transformation styles of relevant expertise sharing. The intended goal is to win project funding to train and equip students with relevant skills and expertise. Need-based and market-oriented training and professional grooming are a tangible asset for students. The majority of traditional Web development freelancers are unaware of ontology or semantic web market demand. Freelancing is another option for expert Ontology developers. However, agriculture students are used to all the research vocabulary and terminologies in their area, but they do not know how to contribute their expertise to improve the efficiency of the Semantic Web in their domain. If the improvement in relevant ontology becomes a part of the Semantic Web, then it is termed 'Real-time Web semantics enhancement'. In other words, the target ontology becomes a part of the future Web of meaning. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. Artificial intelligence trend analysis on healthcare podcasts using topic modeling and sentiment analysis: a data-driven approach.
- Author
-
Dumbach, Philipp, Schwinn, Leo, Löhr, Tim, Do, Phi Long, and Eskofier, Bjoern M.
- Abstract
Over the past few decades, the topic of artificial intelligence (AI) has gained considerable attention in both research and industry. In particular, the healthcare sector has witnessed a surge in the use of AI applications, as the maturity of these methods increased. However, as the use of machine learning (ML) in healthcare continues to grow, we believe it will become increasingly important to examine public perceptions of this trend to identify potential impediments and future directions. Current work focuses mainly on academic data sources and industrial applications of AI. However, to gain a comprehensive understanding of the increased societal interest in AI, digital media such as podcasts should be consulted, as they are accessible to a broader audience. In order to examine this hypothesis, we investigate the AI trend development in healthcare from 2015 until 2021. In this study, we propose a web mining approach to collect a novel data set consisting of 29 healthcare podcasts with 3449 episodes. We identify 102 AI-related buzzwords that were extracted from various glossaries and hype cycles. These buzzwords were used to conduct an extensive trend detection and analysis study on the collected data using machine learning-based approaches. We successfully detect an AI trend and follow its evolution in healthcare podcasts over several years. Besides the focus area of AI, we are able to detect 14 topic clusters and visualize the trending or decreasing dominant topics over the whole period under consideration. In addition, we analyze the sentiments in podcasts towards the identified topics and deliver further insights for trend detection in healthcare. Finally, the collected data set can be used for trend detection besides AI-related topics using topic clustering. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. Measuring corporate digital divide through websites: insights from Italian firms.
- Author
-
Mazzoni, Leonardo, Pinelli, Fabio, and Riccaboni, Massimo
- Subjects
INFORMATION technology, DIGITAL transformation, DIGITAL footprint, INFORMATION & communication technologies, TECHNOLOGY assessment, DIGITAL divide
- Abstract
With the increasing pervasiveness of Information and Communication Technology (ICT) in the fabric of economic activities, the corporate digital divide has become a crucial issue for the assessment of Information Technology (IT) competencies and the digital gap between firms and territories. With little granular data available to measure the phenomenon, most studies have used survey data. To address this empirical gap, we scanned the homepages of 182,705 Italian companies and extracted ten characteristics related to their digital footprint to develop a new index for the corporate digital assessment. Our results show a significant digital divide between Italian companies according to size, sector and geographical location, opening new perspectives for monitoring and data-driven analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Mixed methods examination of risk perception on vaccination intentions: The perspective of doctor–patient communication.
- Author
-
Zhou, Haichun, Zhao, Wenli, Ma, Rong, Zheng, Yishu, Guo, Yuxuan, Wei, Liangyu, and Wang, Mingyi
- Subjects
-
*RISK perception, *SOCIAL media, *NATURAL language processing, *VACCINATION, *INTENTION, *TRUST
- Abstract
From the perspective of doctor–patient communication, this research used multiple methods, combining natural language processing (NLP), a cross-sectional survey and an online experiment, to investigate how risk perception influenced people's vaccination intention. In Study 1, we used Python to crawl 335,045 comments about the COVID-19 vaccine published on the social media platform Sina Weibo (the equivalent of Twitter in China) from 31 December 2020 to 31 December 2021. Text analysis and sentiment analysis were used to examine how vaccination intention, as measured by linguistic features from the LIWC dictionary, changed with individuals' perceptions of pandemic risk. In Study 2, we adopted a cross-sectional questionnaire survey to further test the relation of risk perception, vaccination intention, trust in physicians, and perceived medical recommendations in a Chinese sample (n = 386). In Study 3, we conducted an online experiment where we recruited 127 participants with high trust in physicians and 127 participants with low trust, and subsequently randomly allocated them into one of three conditions: control, rational recommendation, or perceptual recommendation. Text and sentiment analysis revealed that the use of negative words towards the COVID-19 vaccine decreased significantly at times of high (vs. low) risk perception (Study 1). Trust in physicians mediated the effect of risk perception on vaccination intention and this effect was reinforced for participants with low (vs. high) level of perceived medical recommendation (Study 2), especially for the rational (vs. perceptual) recommendation condition (Study 3). Risk perception increased vaccination intention through the mediating effect of trust in physicians and the moderating effect of perceived medical recommendations. Rational (vs. perceptual) recommendation is more effective in increasing intention to get vaccinated in people with low trust in physicians. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. The Detection of Misstated Financial Reports Using XBRL Mining and Intelligible MLP
- Author
-
Trigueiros, Duarte, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Daimi, Kevin, editor, and Al Sadoon, Abeer, editor
- Published
- 2024
- Full Text
- View/download PDF
10. Web Scams Detection System
- Author
-
Badawi, Emad, Jourdan, Guy-Vincent, Onut, Iosif-Viorel, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Mosbah, Mohamed, editor, Sèdes, Florence, editor, Tawbi, Nadia, editor, Ahmed, Toufik, editor, Boulahia-Cuppens, Nora, editor, and Garcia-Alfaro, Joaquin, editor
- Published
- 2024
- Full Text
- View/download PDF
11. Web Mining for Estimating Regulatory Blockchain Readiness
- Author
-
Vlachos, Andreas, Iosif, Elias, Christodoulou, Klitos, van der Aalst, Wil, Series Editor, Ram, Sudha, Series Editor, Rosemann, Michael, Series Editor, Szyperski, Clemens, Series Editor, Guizzardi, Giancarlo, Series Editor, Papadaki, Maria, editor, Themistocleous, Marinos, editor, Al Marri, Khalid, editor, and Al Zarouni, Marwan, editor
- Published
- 2024
- Full Text
- View/download PDF
12. Information diffusion in referral networks: an empirical investigation of the crypto asset landscape
- Author
-
Vasudevan, Srinidhi, Piazza, Anna, and Ghinoi, Stefano
- Published
- 2024
- Full Text
- View/download PDF
13. Links Evaluation and Ranking Based on Semantic Metadata Analysis.
- Author
-
Abdulmunim, Matheel E. and Naamha, Esraa Q.
- Subjects
-
*METADATA, *WORLD Wide Web, *WEB search engines, *WEBSITES, *SEARCH engines
- Abstract
There is a vast quantity of information contained within the billions of web pages that make up the World Wide Web (WWW). Search engines carry out a variety of activities depending on their own architectures for retrieving the necessary information from the WWW. The search engine typically returns a huge number of pages in response to a user's query when the user submits one. Numerous ranking techniques have been utilized on search results to aid consumers in navigating the result list. The majority of ranking algorithms described in literature are either content-based or link-based, and they do not take user usage patterns into account. The presented study discusses web mining ranking algorithms depending on structures, contents, and usages and suggests a new ranking method to assess the significance of links with the use of semantic metadata analysis, which considers the number of links visited across nearly all regions, time periods, and related topics and queries. Furthermore, the suggested system uses the user's query to find more relevant information. The most valuable pages are thus displayed at the top of the result list based on user browsing behavior, which significantly decreases the search space. Results showed better ranking output based on different criteria, such as the number of links visited yearly, the number of links visited hourly, the number of links visited by region, the number of links visited by related topics, and the number of links visited by related queries. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Web mining and sentiment analysis of COVID-19 discourse in online forum communities.
- Author
-
Mohamad, Masurah, Masrom, Suraya, Salleh, Khairulliza Ahmad, Alfat, Lathifah, Nasucha, Muhammad, and Uddin, Nur
- Subjects
INTERNET forums, SENTIMENT analysis, DISCOURSE analysis, COVID-19, VIRTUAL communities, SUPPORT vector machines
- Abstract
Recently, various discussions, solutions, data, and methods related to coronavirus disease 2019 (COVID-19) have been posted in online forum communities. Although a vast number of posts on COVID-19 analytical projects are available in the online forum communities, many of them remain untapped due to limited overview and profiling that focuses on COVID-19 analytic techniques. Thus, it is quite challenging for information diggers and researchers to distinguish the recent trends and challenges of COVID-19 analytics for initiating different and critical studies to fight against the coronavirus. This paper presents the findings of a study that executed a web mining process on COVID-19 data analytical projects from the Stack Overflow and GitHub online community platforms for data scientists. This study provides insight into what activities can be conducted by novice researchers and others who are interested in data analysis, especially in sentiment analysis. The classification results via Naïve Bayes (NB), support vector machine (SVM) and logistic regression (LR) have returned high accuracy, indicating that the constructed model is efficient in classifying the sentiment data of COVID-19. The findings reported in this paper not only enhance the understanding of COVID-19 related content and analysis but also provide a promising framework that can be applied in diverse contexts and domains. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. Web Mining of Online Resources for German Labor Market Research and Education: Finding the Ground Truth?
- Author
-
Fischer, Andreas and Dörpinghaus, Jens
- Subjects
LABOR market, GERMAN language, NATURAL language processing, GOVERNMENT websites, EDUCATION marketing, MARKETING research
- Abstract
The labor market is highly dependent on vocational and academic education, training, retraining, and further education in order to master challenges such as advancing digitalization and sustainability. Further training is a key factor in ensuring a qualified workforce, the employability of all employees, and, thus, national competitiveness and innovation. In the contribution at hand, we explore an innovative way to derive knowledge about learning pathways by connecting the dots from different data sources of the German labor market. In particular, we focus on the web mining of online resources for German labor market research and education, such as online advertisements, information portals, and official government websites. A key question for working with different data sources is how to find the ground truth and common data structures that can be used to make the data interoperable. We discuss how to classify and summarize web data from different platforms and which methods can be used for extracting data, entities and relationships from online resources on the German labor market to build a network of educational pathways. Our proposed solution is based on the classification of occupations (KldB) and related document codes (DKZ), and combines natural language processing and knowledge graph technologies. Our research provides the foundation for further investigation into educational pathways and linked data for labor market research. While our work focuses on German data, it is also useful for other German-speaking countries and could easily be extended to other languages such as English. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. Web Mining of Online Resources for German Labor Market Research and Education: Finding the Ground Truth?
- Author
-
Andreas Fischer and Jens Dörpinghaus
- Subjects
web mining, knowledge discovery and data mining, knowledge discovery, labor market research, research and development towards society, Electronic computers. Computer science, QA75.5-76.95
- Abstract
The labor market is highly dependent on vocational and academic education, training, retraining, and further education in order to master challenges such as advancing digitalization and sustainability. Further training is a key factor in ensuring a qualified workforce, the employability of all employees, and, thus, national competitiveness and innovation. In the contribution at hand, we explore an innovative way to derive knowledge about learning pathways by connecting the dots from different data sources of the German labor market. In particular, we focus on the web mining of online resources for German labor market research and education, such as online advertisements, information portals, and official government websites. A key question for working with different data sources is how to find the ground truth and common data structures that can be used to make the data interoperable. We discuss how to classify and summarize web data from different platforms and which methods can be used for extracting data, entities and relationships from online resources on the German labor market to build a network of educational pathways. Our proposed solution is based on the classification of occupations (KldB) and related document codes (DKZ), and combines natural language processing and knowledge graph technologies. Our research provides the foundation for further investigation into educational pathways and linked data for labor market research. While our work focuses on German data, it is also useful for other German-speaking countries and could easily be extended to other languages such as English.
- Published
- 2024
- Full Text
- View/download PDF
17. Scraping Relevant Images from Web Pages without Download.
- Author
-
Uzun, Erdinç
- Subjects
WEBSITES, ERROR rates, MACHINE learning, NEWS websites
- Abstract
Automatically scraping relevant images from web pages is an error-prone and time-consuming task, leading experts to prefer manually preparing extraction patterns for a website. Existing web scraping tools are built on these patterns. However, this manual approach is laborious and requires specialized knowledge. Automatic extraction approaches, while a potential solution, require large training datasets and numerous features, including width, height, pixels, and file size, that can be difficult and time-consuming to obtain. To address these challenges, we propose a semi-automatic approach that does not require an expert, utilizes small training datasets, and has a low error rate while saving time and storage. Our approach involves clustering web pages from a website and suggesting several pages for a non-expert to annotate relevant images. The approach then uses these annotations to construct a learning model based on textual data from the HTML elements. In the experiments, we used a dataset of 635,015 images from 200 news websites, each containing 100 pages, with 22,632 relevant images. When comparing several machine learning methods for both automatic approaches and our proposed approach, the AdaBoost method yields the best performance results. When using automatic extraction approaches, the best f-Measure that can be achieved is 0.805 with a learning model constructed from a large training dataset consisting of 120 websites (12,000 web pages). In contrast, our approach achieved an average f-Measure of 0.958 for 200 websites with only six web pages annotated per website. This means that a non-expert only needs to examine 1,200 web pages to determine the relevant images for 200 websites. Our approach also saves time and storage space by not requiring the download of images and can be easily integrated into currently available web scraping tools, because it is based on textual data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Spectral clustering algorithm based web mining and quadratic support vector machine for learning style prediction in E-learning platform
- Author
-
K.N. Prashanth Kumar, B.T. Harish Kumar, and A. Bhuvanesh
- Subjects
E-Learning, Learning style prediction, Machine learning, Web mining, And quadratic SVM, Electric apparatus and materials. Electric circuits. Electric networks, TK452-454.4
- Abstract
A learning system, which is composed of a computer and the internet as the major elements, is termed an e-learning platform. It also promotes the education standard with the utilization of modern technology and equipment. Meanwhile, to enhance the standard of education significantly, it is important to predict the learning style of the users with the assistance of feedback and supervision. Nevertheless, it will avert the inherent correlation among e-learning behaviors. Hence, to predict the learning style automatically we propose a novel Spectral Clustering algorithm based Quadratic Support Vector Machine (E-SVM) approach. Our proposed approach employs two phases: (i) Utilizing the Web usage mining approach the learning secrets are extracted from the log files of learners. (ii) The classification of learning styles of learners is effectuated with the proposed approach. Experiments are conducted with a Python package and the performance is analyzed. For simulation, we have utilized a real-time dataset and compared the results with the state-of-the-art approaches. Our approach surpasses all the other approaches.
- Published
- 2024
- Full Text
- View/download PDF
19. How to obtain product green design requirements based on sentiment analysis and topic analysis: Using washing machine online reviews as an example.
- Author
-
Xuan, Yan, Zhang, Lei, Bao, Hong, and Hu, Jiaqi
- Subjects
-
*SUSTAINABLE design, *WASHING machines, *GREEN products, *PRODUCT design, *SENTIMENT analysis
- Abstract
Green design involves the entire life cycle of a product, including stages such as raw material acquisition, production and manufacturing, sales and transportation, use, recycling, and disposal. Extracting customer requirements (CRs) related to product green design (PGD) is one of the necessary conditions for achieving the dual carbon goal. However, only a few studies have evaluated CRs for PGD from a full life cycle perspective. This study obtained 20,000 online reviews of washing machines from e-commerce platforms. The customers' sentiment tendencies toward the requirements of washing machines at various stages of their life cycle are analyzed and evaluated. The CRs contained in online washing machine reviews were identified through cluster analysis. Based on the life cycle theory, the product green design requirements (PGDRs) of CRs were extracted and analyzed. This study can provide theoretical and methodological support for green product design.
• Calculated the emotion score of each sentence based on the emotion word matching algorithm and constructed a dataset consisting of negative and neutral review data.
• Topic analysis of online reviews on washing machines based on the LDA model and the CRs information contained in online reviews was mined through keywords of each topic.
• Based on the whole life cycle design theory, the explicit and implicit green design requirements information of each stage of the product life cycle contained in the CRs information was analyzed.
• Calculate the attention and satisfaction of different requirements information to obtain users' expectations and demands for product green design.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Demystifying The Hosting Infrastructure of The Free Content Web: A Security Perspective
- Author
-
Alqadhi, Mohammed
- Subjects
- Web Security, Web Mining, Geographical Distribution Analysis, Hosting Infrastructure, Free Content, Premium Content
- Abstract
This dissertation delves into the security of free content websites, a crucial internet component that presents significant security challenges due to their susceptibility to exploitation by malicious actors. While prior research has highlighted the security disparities between free and premium content websites, it has not delved into the underlying causes. This study aims to address this gap by examining the security infrastructure of free content websites. The research commences with an analysis of the content management systems (CMSs) employed by these websites and their role. Data from 1,562 websites encompassing free and premium categories is collected to identify CMS usage and its association with malicious activities. Various metrics are employed, including unpatched vulnerabilities, total vulnerabilities, malicious counts, and percentiles. The findings reveal widespread CMS usage, even among websites with custom code, underscoring the potential for a small number of unpatched vulnerabilities in popular CMSs to lead to significant maliciousness. The study further explores the global distribution of free content websites, considering factors such as hosting network scale, cloud service provider utilization, and country-level distribution. Notably, free and premium content websites are predominantly hosted in medium-scale networks, known for their high concentration of malicious websites. Moreover, the research delves into the geographical distribution of these websites and their presence in different countries. It examines the occurrence of malicious websites and their correlation with the National Cyber Security Index (NCSI), a measure of a country's cybersecurity maturity. The United States emerges as the primary host for most investigated websites, with countries exhibiting higher rates of malicious websites tending to have lower NCSI scores, primarily due to weaker privacy policy development. 
In conclusion, this dissertation uncovers correlations in the infrastructure, distribution, and geographical aspects of free content websites, offering valuable insights for mitigating their associated threats.
- Published
- 2024
Catalog
Discovery Service for Jio Institute Digital Library