Descriptor: "Web mining" - Searchworks@Jio Institute Digital Library Search Results

Your search for "Web mining" returned 7,705 results.

1. Enhancing the website usage using process mining

Author: Choudhary, Chetna, Mehrotra, Deepti, and Shrivastava, Avinash K.
Published: 2024
Full Text: View/download PDF

2. Measuring Innovation in Mauritius' ICT Sector Using Unsupervised Machine Learning: A Web Mining and Topic Modeling Approach.

Author: Böhmecke-Schwafert, Moritz and Dörries, Colin
Abstract: Measuring innovation accurately and efficiently is crucial for policymakers to encourage innovation activity. However, the established indicator landscape lacks timeliness and accuracy. In this study, we focus on the country of Mauritius that is transforming its economy towards the information and communication technology (ICT) sector. We seek to extend the knowledge base on innovation activity and the status quo of innovation in Mauritius by applying an unsupervised machine learning approach. Building on previous work on new experimental innovation indicators, we combine recent advances in web mining and topic modeling and address the following research questions: What are potential areas of innovation activity in the ICT sector of Mauritius? Furthermore, do web mining and topic modeling provide sufficient indicators to understand innovation activities in emerging countries? To answer these questions, we apply the natural language processing (NLP) technique of Latent Dirichlet Allocation (LDA) to ICT companies' website text data. We then generate topic models from the scraped text data. As a result, we derive seven categories that describe the innovation activities of ICT firms in Mauritius. Albeit the model approach fulfills the requirements for innovation indicators as suggested in the Oslo Manual, it needs to be combined with additional metrics for innovation, for example, with traditional indicators such as patents, to unfold its potential. Furthermore, our approach carries methodological implications and is intended to be reproduced in similar contexts of scarce or unavailable data or where traditional metrics have demonstrated insufficient explanatory power. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

3. Sharing economy from the sustainable development goals perspective: a path to global prosperity

Author: Aref, Mayada
Published: 2024
Full Text: View/download PDF

4. Illegal Online Gambling Site Detection using Multiple Resource-Oriented Machine Learning.

Author: Min, Moohong and Lee, Donggi Augustine
Subjects: *UNIFORM Resource Locators, *INTERNET gambling, *INTERNET security, *MACHINE learning, *IMAGE analysis
Abstract: The COVID-19 pandemic has led to faster digitalization and illegal online gambling has become popular. As illegal online gambling brings not only financial threats but also breaches in overall cyber security, this study defines the concept of absolute illegal online gambling (AIOG) using a machine-learning-driven approach with information gathered from public webpages. By analysing 11,172 sites to detect illegal online gambling, the proposed model classifies key features such as URLs (Uniform Resource Locator), WHOIS, INDEX, and landing page information. With a combination of text and image analyses with machine learning-driven approach, the proposed model offers the ensemble combination of attributes for high detection performance with the verification of common attributes from metadata in online gambling. This study suggests a strategy for dynamic resource utilization to increase the classification accuracy of the current environment. As a result, this research expands the scope of hybrid web mining through constant updating of data to achieve content-based filtering. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

5. Measuring corporate digital divide through websites: insights from Italian firms

Author: Leonardo Mazzoni, Fabio Pinelli, and Massimo Riccaboni
Subjects: Digital divide, Website data, Web mining, Corporate big data, Digital business capabilities, Digital transformation, Computer applications to medicine. Medical informatics, R858-859.7
Abstract: With the increasing pervasiveness of Information and Communication Technology (ICT) in the fabric of economic activities, the corporate digital divide has become a crucial issue for the assessment of Information Technology (IT) competencies and the digital gap between firms and territories. With little granular data available to measure the phenomenon, most studies have used survey data. To address this empirical gap, we scanned the homepages of 182,705 Italian companies and extracted ten characteristics related to their digital footprint to develop a new index for the corporate digital assessment. Our results show a significant digital divide between Italian companies according to size, sector and geographical location, opening new perspectives for monitoring and data-driven analysis.
Published: 2024
Full Text: View/download PDF

6. Comprehensive selective improvements in agri-informatics semantics.

Author: Ishaq, Muhammad, Khan, Abdullah, Asim, Muhammad, Khan, Asfandyar, and Iqbal Bangash, Javed
Subjects: *WORLD Wide Web, *INFORMATION technology, *SEMANTIC Web, *PROJECT finance, *WEB development
Abstract: The advent of information technology re-innovates all sectors of bio-sciences. Researchers use Semantic Web to improve web searching, mining and integration, which alleviates the time-consuming task of finding relevant and high-quality content. Semantics is improved through ontology engineering in any domain. Amended and developed ontologies will be uploaded to existing standardised and approved biomedical repositories. The establishment of a World Wide Web Consortium (W3C) approved and standardised ontology repository is the most ambitious goal. This work will solely focus on some selected agri-ontologies. The main objective is to promote outcome-based research and transformation styles of relevant expertise sharing. The intended goal is to win project funding to train and equip students with relevant skills and expertise. Need-based and market-oriented training and professional grooming are a tangible asset for students. The majority of traditional Web development freelancers are unaware of ontology or semantic web market demand. Freelancing is another option for expert Ontology developers. However, agriculture students are used to all the research vocabulary and terminologies in their area, but they do not know how to contribute their expertise to improve the efficiency of the Semantic Web in their domain. If the improvement in relevant ontology becomes a part of the Semantic Web, then it is termed 'Real-time Web semantics enhancement'. In other words, the target ontology becomes a part of the future Web of meaning. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

7. Artificial intelligence trend analysis on healthcare podcasts using topic modeling and sentiment analysis: a data-driven approach.

Author: Dumbach, Philipp, Schwinn, Leo, Löhr, Tim, Do, Phi Long, and Eskofier, Bjoern M.
Abstract: Over the past few decades, the topic of artificial intelligence (AI) has gained considerable attention in both research and industry. In particular, the healthcare sector has witnessed a surge in the use of AI applications, as the maturity of these methods increased. However, as the use of machine learning (ML) in healthcare continues to grow, we believe it will become increasingly important to examine public perceptions of this trend to identify potential impediments and future directions. Current work focuses mainly on academic data sources and industrial applications of AI. However, to gain a comprehensive understanding of the increased societal interest in AI, digital media such as podcasts should be consulted, as they are accessible to a broader audience. In order to examine this hypothesis, we investigate the AI trend development in healthcare from 2015 until 2021. In this study, we propose a web mining approach to collect a novel data set consisting of 29 healthcare podcasts with 3449 episodes. We identify 102 AI-related buzzwords that were extracted from various glossaries and hype cycles. These buzzwords were used to conduct an extensive trend detection and analysis study on the collected data using machine learning-based approaches. We successfully detect an AI trend and follow its evolution in healthcare podcasts over several years. Besides the focus area of AI, we are able to detect 14 topic clusters and visualize the trending or decreasing dominant topics over the whole period under consideration. In addition, we analyze the sentiments in podcasts towards the identified topics and deliver further insights for trend detection in healthcare. Finally, the collected data set can be used for trend detection besides AI-related topics using topic clustering. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

8. Measuring corporate digital divide through websites: insights from Italian firms.

Author: Mazzoni, Leonardo, Pinelli, Fabio, and Riccaboni, Massimo
Subjects: INFORMATION technology, DIGITAL transformation, DIGITAL footprint, INFORMATION & communication technologies, TECHNOLOGY assessment, DIGITAL divide
Abstract: With the increasing pervasiveness of Information and Communication Technology (ICT) in the fabric of economic activities, the corporate digital divide has become a crucial issue for the assessment of Information Technology (IT) competencies and the digital gap between firms and territories. With little granular data available to measure the phenomenon, most studies have used survey data. To address this empirical gap, we scanned the homepages of 182,705 Italian companies and extracted ten characteristics related to their digital footprint to develop a new index for the corporate digital assessment. Our results show a significant digital divide between Italian companies according to size, sector and geographical location, opening new perspectives for monitoring and data-driven analysis. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

9. Mixed methods examination of risk perception on vaccination intentions: The perspective of doctor–patient communication.

Author: Zhou, Haichun, Zhao, Wenli, Ma, Rong, Zheng, Yishu, Guo, Yuxuan, Wei, Liangyu, and Wang, Mingyi
Subjects: *RISK perception, *SOCIAL media, *NATURAL language processing, *VACCINATION, *INTENTION, *TRUST
Abstract: From the perspective of doctor–patient communication, this research combined natural language processing (NLP), a cross-sectional survey and an online experiment to investigate how risk perception influenced people's vaccination intention. In Study 1, we used Python to crawl 335,045 comments about the COVID-19 vaccine published on the social media platform Sina Weibo (the equivalent of Twitter in China) from 31 December 2020 to 31 December 2021. Text analysis and sentiment analysis were used to examine how vaccination intention, as measured by linguistic features from the LIWC dictionary, changed with individuals' perceptions of pandemic risk. In Study 2, we adopted a cross-sectional questionnaire survey to further test the relation of risk perception, vaccination intention, trust in physicians, and perceived medical recommendations in a Chinese sample (n = 386). In Study 3, we conducted an online experiment where we recruited 127 participants with high trust in physicians and 127 participants with low trust, and subsequently randomly allocated them into one of three conditions: control, rational recommendation, or perceptual recommendation. Text and sentiment analysis revealed that the use of negative words towards the COVID-19 vaccine decreased significantly during periods of high (vs. low) risk perception (Study 1). Trust in physicians mediated the effect of risk perception on vaccination intention and this effect was reinforced for participants with low (vs. high) level of perceived medical recommendation (Study 2), especially for the rational (vs. perceptual) recommendation condition (Study 3). Risk perception increased vaccination intention through the mediating effect of trust in physicians and the moderating effect of perceived medical recommendations. Rational (vs. perceptual) recommendation is more effective in increasing intention to get vaccinated in people with low trust in physicians. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

10. The Detection of Misstated Financial Reports Using XBRL Mining and Intelligible MLP

Author: Trigueiros, Duarte
Series Editor: Kacprzyk, Janusz
Advisory Editors: Gomide, Fernando, Kaynak, Okyay, Liu, Derong, Pedrycz, Witold, Polycarpou, Marios M., Rudas, Imre J., and Wang, Jun
Editors: Daimi, Kevin and Al Sadoon, Abeer
Published: 2024
Full Text: View/download PDF

11. Web Scams Detection System

Author: Badawi, Emad, Jourdan, Guy-Vincent, and Onut, Iosif-Viorel
Founding Editors: Goos, Gerhard and Hartmanis, Juris
Editorial Board Members: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti
Editors: Mosbah, Mohamed, Sèdes, Florence, Tawbi, Nadia, Ahmed, Toufik, Boulahia-Cuppens, Nora, and Garcia-Alfaro, Joaquin
Published: 2024
Full Text: View/download PDF

12. Web Mining for Estimating Regulatory Blockchain Readiness

Author: Vlachos, Andreas, Iosif, Elias, and Christodoulou, Klitos
Series Editors: van der Aalst, Wil, Ram, Sudha, Rosemann, Michael, Szyperski, Clemens, and Guizzardi, Giancarlo
Editors: Papadaki, Maria, Themistocleous, Marinos, Al Marri, Khalid, and Al Zarouni, Marwan
Published: 2024
Full Text: View/download PDF

13. Information diffusion in referral networks: an empirical investigation of the crypto asset landscape

Author: Vasudevan, Srinidhi, Piazza, Anna, and Ghinoi, Stefano
Published: 2024
Full Text: View/download PDF

14. Links Evaluation and Ranking Based on Semantic Metadata Analysis.

Author: Abdulmunim, Matheel E. and Naamha, Esraa Q.
Subjects: *METADATA, *WORLD Wide Web, *WEB search engines, *WEBSITES, *SEARCH engines
Abstract: There is a vast quantity of information contained within the billions of web pages that make up the World Wide Web (WWW). Search engines carry out a variety of activities depending on their own architectures for retrieving the necessary information from the WWW. The search engine typically returns a huge number of pages in response to a user's query when the user submits one. Numerous ranking techniques have been utilized on search results to aid consumers in navigating the result list. The majority of ranking algorithms described in literature are either content-based or link-based, and they do not take user usage patterns into account. The presented study discusses web mining ranking algorithms depending on structures, contents, and usages and suggests a new ranking method to assess the significance of links with the use of semantic metadata analysis, which considers the number of links visited across nearly all regions, time periods, and related topics and queries. Furthermore, the suggested system uses the user's query to find more relevant information. The most valuable pages are thus displayed at the top of the result list based on user browsing behavior, which significantly decreases the search space. Results showed better ranking output based on different criteria, such as the number of links visited yearly, the number of links visited hourly, the number of links visited by region, the number of links visited by related topics, and the number of links visited by related queries. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

15. Web mining and sentiment analysis of COVID-19 discourse in online forum communities.

Author: Mohamad, Masurah, Masrom, Suraya, Salleh, Khairulliza Ahmad, Alfat, Lathifah, Nasucha, Muhammad, and Uddin, Nur
Subjects: INTERNET forums, SENTIMENT analysis, DISCOURSE analysis, COVID-19, VIRTUAL communities, SUPPORT vector machines
Abstract: Recently, various discussions, solutions, data, and methods related to coronavirus disease 2019 (COVID-19) have been posted in online forum communities. Although a vast number of posts on COVID-19 analytical projects are available in the online forum communities, many of them remain untapped because of the limited overview and profiling that focus on COVID-19 analytic techniques. Thus, it is quite challenging for information diggers and researchers to distinguish the recent trends and challenges of COVID-19 analytics for initiating different and critical studies to fight against the coronavirus. This paper presents the findings of a study that executed a web mining process on COVID-19 data analytical projects from the Stack Overflow and GitHub online community platforms for data scientists. This study provides insight into which activities can be conducted by novice researchers and others who are interested in data analysis, especially in sentiment analysis. The classification results via Naïve Bayes (NB), support vector machine (SVM) and logistic regression (LR) have returned high accuracy, indicating that the constructed model is efficient in classifying the sentiment data of COVID-19. The findings reported in this paper not only enhance the understanding of COVID-19 related content and analysis but also provide a promising framework that can be applied in diverse contexts and domains. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

16. Web Mining of Online Resources for German Labor Market Research and Education: Finding the Ground Truth?

Author: Fischer, Andreas and Dörpinghaus, Jens
Subjects: LABOR market, GERMAN language, NATURAL language processing, GOVERNMENT websites, EDUCATION marketing, MARKETING research
Abstract: The labor market is highly dependent on vocational and academic education, training, retraining, and further education in order to master challenges such as advancing digitalization and sustainability. Further training is a key factor in ensuring a qualified workforce, the employability of all employees, and, thus, national competitiveness and innovation. In the contribution at hand, we explore an innovative way to derive knowledge about learning pathways by connecting the dots from different data sources of the German labor market. In particular, we focus on the web mining of online resources for German labor market research and education, such as online advertisements, information portals, and official government websites. A key question for working with different data sources is how to find the ground truth and common data structures that can be used to make the data interoperable. We discuss how to classify and summarize web data from different platforms and which methods can be used for extracting data, entities and relationships from online resources on the German labor market to build a network of educational pathways. Our proposed solution is based on the classification of occupations (KldB) and related document codes (DKZ), and combines natural language processing and knowledge graph technologies. Our research provides the foundation for further investigation into educational pathways and linked data for labor market research. While our work focuses on German data, it is also useful for other German-speaking countries and could easily be extended to other languages such as English. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

17. Web Mining of Online Resources for German Labor Market Research and Education: Finding the Ground Truth?

Author: Andreas Fischer and Jens Dörpinghaus
Subjects: web mining, knowledge discovery and data mining, knowledge discovery, labor market research, research and development towards society, Electronic computers. Computer science, QA75.5-76.95
Abstract: The labor market is highly dependent on vocational and academic education, training, retraining, and further education in order to master challenges such as advancing digitalization and sustainability. Further training is a key factor in ensuring a qualified workforce, the employability of all employees, and, thus, national competitiveness and innovation. In the contribution at hand, we explore an innovative way to derive knowledge about learning pathways by connecting the dots from different data sources of the German labor market. In particular, we focus on the web mining of online resources for German labor market research and education, such as online advertisements, information portals, and official government websites. A key question for working with different data sources is how to find the ground truth and common data structures that can be used to make the data interoperable. We discuss how to classify and summarize web data from different platforms and which methods can be used for extracting data, entities and relationships from online resources on the German labor market to build a network of educational pathways. Our proposed solution is based on the classification of occupations (KldB) and related document codes (DKZ), and combines natural language processing and knowledge graph technologies. Our research provides the foundation for further investigation into educational pathways and linked data for labor market research. While our work focuses on German data, it is also useful for other German-speaking countries and could easily be extended to other languages such as English.
Published: 2024
Full Text: View/download PDF

18. Scraping Relevant Images from Web Pages without Download.

Author: Uzun, Erdinç
Subjects: WEBSITES, ERROR rates, MACHINE learning, NEWS websites
Abstract: Automatically scraping relevant images from web pages is an error-prone and time-consuming task, leading experts to prefer manually preparing extraction patterns for a website. Existing web scraping tools are built on these patterns. However, this manual approach is laborious and requires specialized knowledge. Automatic extraction approaches, while a potential solution, require large training datasets and numerous features, including width, height, pixels, and file size, that can be difficult and time-consuming to obtain. To address these challenges, we propose a semi-automatic approach that does not require an expert, utilizes small training datasets, and has a low error rate while saving time and storage. Our approach involves clustering web pages from a website and suggesting several pages for a non-expert to annotate relevant images. The approach then uses these annotations to construct a learning model based on textual data from the HTML elements. In the experiments, we used a dataset of 635,015 images from 200 news websites, each containing 100 pages, with 22,632 relevant images. When comparing several machine learning methods for both automatic approaches and our proposed approach, the AdaBoost method yields the best performance results. When using automatic extraction approaches, the best f-Measure that can be achieved is 0.805 with a learning model constructed from a large training dataset consisting of 120 websites (12,000 web pages). In contrast, our approach achieved an average f-Measure of 0.958 for 200 websites with only six web pages annotated per website. This means that a non-expert only needs to examine 1,200 web pages to determine the relevant images for 200 websites. Our approach also saves time and storage space by not requiring the download of images and can be easily integrated into currently available web scraping tools, because it is based on textual data. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

19. The digital layer: alternative data for regional and innovation studies.

Author: Abbasiharofteh, Milad, Krüger, Miriam, Kinne, Jan, Lenz, David, and Resch, Bernd
Subjects: NATURAL language processing, HYPERLINKS
Abstract: The lack of large-scale data revealing the interactions among firms has constrained empirical studies. Utilizing relational web data has remained unexplored as a remedy for this data problem. We constructed a Digital Layer by scraping the inter-firm hyperlinks of 600,000 German firms and linked the Digital Layer with several traditional indicators. We showcase the use of this developed dataset by testing whether the Digital Layer data can replicate several theoretically motivated and empirically supported stylized facts. The results show that the intensity and quality of firms' hyperlinks are strongly associated with the innovation capabilities of firms and, to a lesser extent, with hyperlink relations to geographically distant and cognitively close firms. Finally, we discuss the implications of the Digital Layer approach for an evidence-based assessment of sectoral and place-based innovation policies. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

20. A Machine Learning Approach to Enterprise Matchmaking Using Multilabel Text Classification Based on Semi-structured Website Content

Author: Vellmer, Jan, Mandl, Peter, Bellmann, Tobias, Balluff, Maximilian, Weber, Manuel, Döschl, Alexander, and Keller, Max-Emanuel
Founding Editors: Goos, Gerhard and Hartmanis, Juris
Editorial Board Members: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti
Editors: Delir Haghighi, Pari, Pardede, Eric, Dobbie, Gillian, Yogarajan, Vithya, ER, Ngurah Agus Sanjaya, Kotsis, Gabriele, and Khalil, Ismail
Published: 2023
Full Text: View/download PDF

21. Multi-criteria-Based Page Ranking Using Metaheuristic Swarm Optimization

Author: Kumar, Santosh and Verma, Sanjai Mohan
Series Editor: Kacprzyk, Janusz
Advisory Editors: Gomide, Fernando, Kaynak, Okyay, Liu, Derong, Pedrycz, Witold, Polycarpou, Marios M., Rudas, Imre J., and Wang, Jun
Editors: Borah, Samarjeet, Gandhi, Tapan K., and Piuri, Vincenzo
Published: 2023
Full Text: View/download PDF

22. Machine Learning Approaches for Educational Data Mining

Author: Toradmal, Mahesh Bapusaheb, Mehta, Mita, and Mehendale, Smita
Series Editor: Kacprzyk, Janusz
Advisory Editors: Gomide, Fernando, Kaynak, Okyay, Liu, Derong, Pedrycz, Witold, Polycarpou, Marios M., Rudas, Imre J., and Wang, Jun
Editors: Suma, V., Lorenz, Pascal, and Baig, Zubair
Published: 2023
Full Text: View/download PDF

23. Security Risks, Fake Degrees, and Other Fraud: A Topic Modelling Approach

Author: Carmichael, Jamie J. and Eaton, Sarah Elaine
Series Editor: Eaton, Sarah Elaine
Editorial Board Members: Foltýnek, Tomáš, Glendinning, Irene, Khan, Zeenath Reza, Howard, Rebecca Moore, Israel, Mark, Parnther, Ceceilia, and Stoesz, Brenda
Editors: Carmichael, Jamie J. and Pethrick, Helen
Published: 2023
Full Text: View/download PDF

24. A Short Paper on Web Mining Using Google Techniques

Author: Gupta, Kriti and Shrivastava, Vishal
Series Editor: Kacprzyk, Janusz
Advisory Editors: Gomide, Fernando, Kaynak, Okyay, Liu, Derong, Pedrycz, Witold, Polycarpou, Marios M., Rudas, Imre J., and Wang, Jun
Editors: Joshi, Amit, Mahmud, Mufti, and Ragel, Roshan G.
Published: 2023
Full Text: View/download PDF

25. Spectral clustering algorithm based web mining and quadratic support vector machine for learning style prediction in E-learning platform

Author: K.N. Prashanth Kumar, B.T. Harish Kumar, and A. Bhuvanesh
Subjects: E-Learning, Learning style prediction, Machine learning, Web mining, Quadratic SVM, Electric apparatus and materials. Electric circuits. Electric networks, TK452-454.4
Abstract: A learning system, which is composed of a computer and the internet as the major elements, is termed an e-learning platform. It also promotes the education standard with the utilization of modern technology and equipment. Meanwhile, to enhance the standard of education significantly, it is important to predict the learning style of the users with the assistance of feedback and supervision. Nevertheless, it will avert the inherent correlation among e-learning behaviors. Hence, to predict the learning style automatically, we propose a novel Spectral Clustering algorithm based Quadratic Support Vector Machine (E-SVM) approach. Our proposed approach employs two phases: (i) Utilizing the Web usage mining approach, the learning secrets are extracted from the log files of learners. (ii) The classification of learning styles of learners is effectuated with the proposed approach. Experiments are carried out with a Python package and the performance is analyzed. For simulation, we have utilized a real-time dataset and compared the results with the state-of-the-art approaches. Our approach surpasses all the other approaches.
Published: 2024
Full Text: View/download PDF

26. Local 2-connected bow-tie structure of the Web and of social networks

Author: Perekhodko, Eugenia
Published: 2024
Full Text: View/download PDF

27. ANALYSIS OF THE REFLECTIONS OF VIOLENCE AGAINST HEALTHCARE WORKERS IN THE MEDIA USING TEXT MINING METHOD.

Author: DÖKME YAĞAR, Sema, TÜRKDOĞAN GÖRGÜN, Ceren, and AKYÜREK, Çağdaş Erkan
Subjects: *TEXT mining, *MEDICAL personnel, *NEWSPAPER circulation, *VIOLENCE, *HOSPITAL emergency services
Abstract: One of the important communication channels today, the news can inform and manipulate individuals. In order to reveal the public reflections of violence in health care, which is one of the important problems encountered in the health system in Turkey, news reports on the subject were examined in detail within this study. The scope of the research consists of 946 news reports from the top five newspapers with the highest circulation (Hürriyet, Sabah, Sözcü, Milliyet and Posta). Web mining was used to obtain the data. In the analysis of the data, word clouds, time graphs and trigrams were created using text mining methods. In addition, using manual content analysis, the news reports were classified under some basic headings. It was determined that the most frequently used common words in the news headlines, abstracts and contents were "health", "violence", "doctor" and "hospital". When examined in terms of content, it was emphasized in the news that violence in health care occurred mostly in hospitals and emergency departments, violence was committed mostly by patient relatives and patients, the group most exposed to violence was physicians, and violence usually resulted in injury. It is thought that these findings will be beneficial in terms of contributing to the accurate determination of the issues to be prioritized in policymaking processes. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

28. Improved Decision Tree Model for Prediction in Equity Market Using Heterogeneous Data.

Author: Agrawal, Lalit and Adane, Dattatraya
Subjects: *DECISION trees, *FINANCIAL markets, *PREDICTION models, *STOCK price indexes, *MARKETING forecasting
Abstract: The popularity of decision tree forest is due to its superior performance and accuracy. The accuracy of the decision tree forest algorithm depends upon the base learner and its diversification. As far as the literature is concerned, a large number of academics and researchers have proposed various methods which are mostly based on pre-filtering and post-filtering of the decision tree forest technique. In this research work, a novel technique is proposed for increasing the mixture of individual decision trees present in the forest, which will improve the final precision. In the proposed method, throughout the training, each tree of the forest is trained to use dissimilar sets of rotation spaces which are linked together to an elevated space at the parent node. After linking each rotation space, the search for the good split is done within the elevated space. The decision of selecting a rotation technique for all the succeeding nodes depends upon the placement of a good split. Conventional equity market forecasting methods are mostly based on historical data, which is used to predict the upcoming pattern. As internet information is growing in an exponential manner, a few authors have proposed work based on financial news and technical indicators for improving the prediction on the equity market. In this research work, heterogeneous information from various sources like social media, world market performance, global news, financial news and historical data has been considered for improving the prediction of Indian market indices. The performance of the proposed technique is evaluated on the Indian stock market indices with significant accuracy. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

29. IRPDP_HT2: a scalable data pre-processing method in web usage mining using Hadoop MapReduce.

Author: Srivastava, Atul Kumar and Srivastava, Mitali
Subjects: *WEB analytics, *PARALLEL algorithms, *DATA scrubbing, *PARALLEL programming, *BIG data, *DATA warehousing
Abstract: Data preparation is a vital step in the web usage mining process since it provides structured data for the subsequent stages. Hence, it is necessary to convert raw server logs into user sessions to generate structured data for pattern discovery phase. In recent decade, popular websites' server log production has risen to many terabytes to petabytes each day. As a result, server logs possess big data issues such as storage and processing. This study focuses on initial phases of web usage mining process such as data cleaning, user identification, and session identification. These phases are classified as data-intensive processes and deemed-computation intensive. In the last decade, MapReduce emerges as one of the best parallel programming frameworks for data-intensive applications. An efficient MapReduce-based data pre-processing algorithm, i.e. IRPDP_HT2, is proposed in this study. Previous parallel data pre-processing algorithms either include partial phases or lack with efficient robot detection approaches. IRPDP_HT2 algorithm uses a variety of efficient heuristics in all three phases of data pre-processing to identify both ethical and unethical robots. The suggested IRPDP_HT2 approach is found to be effective and scalable for larger datasets after various experiments on a cluster of nodes. The effectiveness of suggested heuristics is also examined during session identification phase. Three variants of IRPDP_HT2 such as PDP_HT2, IPDP_HT2, and RPDP_HT2 are also developed and tested. Impact of robots' requests and internal dummy connections' requests on session count by IRPDP_HT2 algorithm is 45.81% which is more than in PDP_HT2, IPDP_HT2, and RPDP_HT2 algorithms. Further speed-up and size-up are also analysed to demonstrate scalability of algorithm. In the presence of larger datasets, the algorithm's running time falls, while the number of data nodes grows. The size-up of IRPDP_HT2 demonstrates that even after doubling the input data, the algorithm's running time does not grow in that ratio for the fixed number of data nodes. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

30. Data-Driven Methodology for Knowledge Graph Generation Within the Tourism Domain

Author: Alessandro Chessa, Gianni Fenu, Enrico Motta, Francesco Osborne, Diego Reforgiato Recupero, Angelo Salatino, and Luca Secchi
Subjects: Knowledge graphs, ontology design, tourism ontology, web science, web mining, tourism, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: The tourism and hospitality sectors have become increasingly important in the last few years and the companies operating in this field are constantly challenged with providing new innovative services. At the same time, (big-) data has become the “new oil” of this century and Knowledge Graphs are emerging as the most natural way to collect, refine, and structure this heterogeneous information. In this paper, we present a methodology for semi-automatic generating a Tourism Knowledge Graph (TKG), which can be used for supporting a variety of intelligent services in this space, and a new ontology for modelling this domain, the Tourism Analytics Ontology (TAO). Our approach processes and integrates data from Booking.com, Airbnb, DBpedia, and GeoNames. Due to its modular structure, it can be easily extended to include new data sources or to apply new enrichment and refinement functions. We report a comprehensive evaluation of the functional, logical, and structural dimensions of TKG and TAO.
Published: 2023
Full Text: View/download PDF

31. Uncovering the impact of COVID-19 on shipping and logistics

Author: Hirata, Enna and Matsuda, Takuma
Published: 2022
Full Text: View/download PDF

32. ICIS 2023 Hyderabad, An Exploratory Study of German Higher Education Institutions Transfer Activities: New Measurements Based on Web Mining.

Author: Schmitt, Michelle, Schröder, Christian, Beck, Günter W., and Werner, Arndt
Subjects: UNIVERSITIES & colleges, KNOWLEDGE transfer, ARTIFICIAL intelligence, NATURAL language processing, RESEARCH methodology
Abstract: In recent years, higher education institutions (HEI) have expanded their involvement in diverse transfer activities (TA), extending beyond traditional teaching and research roles. These TA are often heterogeneous and informal, which makes measuring their full scope and effects challenging. In this article, we propose a new and straightforward to implement approach for mastering this task. In a first step, we theoretically derive three different dimensions of transfer, namely the transfer of knowledge, technology and personnel. For each of these categories, we develop an artificial intelligence (AI) optimized keyword list. Finally, we use these lists and apply web mining techniques and natural language processing (NLP) to measure TA from German HEI. To this end, we analyze a total of 299,229 texts from 376 German HEI websites. Our study shows that our proposed approach represents an effective and valuable tool for measuring TA from HEI and provides a foundation for further research. [ABSTRACT FROM AUTHOR]
Published: 2023

33. Artificial intelligence trend analysis in German business and politics: a web mining approach

Author: Dumbach, Philipp, Schwinn, Leo, Löhr, Tim, Elsberger, Tassilo, and Eskofier, Bjoern M.
Published: 2023
Full Text: View/download PDF

34. Identifying New Perspectives on Geothermal Energy Investments

Author: Adalı, Zafer, author, Dinçer, Hasan, author, Eti, Serkan, author, Mikhaylov, Alexey, author, and Yüksel, Serhat, author
Published: 2022
Full Text: View/download PDF

35. Event Source Page Discovery via Policy-Based RL with Multi-task Neural Sequence Model

Author: Chang, Chia-Hui, Liao, Yu-Ching, Yeh, Ting, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Chbeir, Richard, editor, Huang, Helen, editor, Silvestri, Fabrizio, editor, Manolopoulos, Yannis, editor, and Zhang, Yanchun, editor
Published: 2022
Full Text: View/download PDF

36. Knowledge Discovery in Web Usage Patterns Using Pageviews and Data Mining Association Rule

Author: Vijaiprabhu, G., Arivazhagan, B., Shunmuganathan, N., Howlett, Robert J., Series Editor, Jain, Lakhmi C., Series Editor, Karuppusamy, P., editor, García Márquez, Fausto Pedro, editor, and Nguyen, Tu N., editor
Published: 2022
Full Text: View/download PDF

37. Mining Web User Behavior: A Systematic Mapping Study

Author: Taşgetiren, Nail, Aktas, Mehmet S., Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Gervasi, Osvaldo, editor, Murgante, Beniamino, editor, Misra, Sanjay, editor, Rocha, Ana Maria A. C., editor, and Garau, Chiara, editor
Published: 2022
Full Text: View/download PDF

38. Mining Top-K Competitors by Eliminating the K-Least Items from Unstructured Dataset

Author: Pawar, Mahendra Eknath, Saini, Satish, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Saini, H. S., editor, Singh, R. K., editor, Tariq Beg, Mirza, editor, Mulaveesala, Ravibabu, editor, and Mahmood, Md Rashid, editor
Published: 2022
Full Text: View/download PDF

39. Web Usage Mining—Process, Tools and Practices

Author: Mittal, Ruchi, Malik, Varun, Singh, Jaiteg, Singh, Vikram, Mittal, Amit, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Hirche, Sandra, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Möller, Sebastian, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Zhang, Junjie James, Series Editor, Singh, Pradeep Kumar, editor, Singh, Yashwant, editor, Kolekar, Maheshkumar H., editor, Kar, Arpan Kumar, editor, and Gonçalves, Paulo J. S., editor
Published: 2022
Full Text: View/download PDF

40. Web Page Classification Based on an Accurate Technique for Key Data Extraction

Author: Lassri, Safae, Benlahmar, El Habib, Tragha, Abderrahim, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Balas, Valentina E., editor, and Ezziyyani, Mostafa, editor
Published: 2022
Full Text: View/download PDF

41. Are Rumors Always False?: Understanding Rumors Across Domains, Queries, and Ratings

Author: Chau, Xuan Truong Du, Nguyen, Thanh Tam, Jo, Jun, Nguyen, Quoc Viet Hung, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Li, Bohan, editor, Yue, Lin, editor, Jiang, Jing, editor, Chen, Weitong, editor, Li, Xue, editor, Long, Guodong, editor, Fang, Fei, editor, and Yu, Han, editor
Published: 2022
Full Text: View/download PDF

42. Forecasting the User Prediction from Weblogs Using Improved IncSpan Algorithm

Author: Om Prakash, P. G., Abdul Rahman, A., Nagaraj, J., Sivakumar, N., Xhafa, Fatos, Series Editor, Karrupusamy, P., editor, Balas, Valentina Emilia, editor, and Shi, Yong, editor
Published: 2022
Full Text: View/download PDF

43. Financial Portfolio Management and Optimization to Maximize Returns Using a Combination of HRP and Sentiment Analysis

Author: Rane, Chinmay, Pai, Siddhesh, Dani, Mithil, Dhage, Sudhir, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Gunjan, Vinit Kumar, editor, and Zurada, Jacek M., editor
Published: 2022
Full Text: View/download PDF

44. Computational Intelligence in Web Mining

Author: Singh, Dheeraj Kumar, Srivastava, Rohit, Choudhury, Tanupriya, Yadav, Anuj Kumar, Chlamtac, Imrich, Series Editor, Tomar, Ravi, editor, Hina, Manolo Dulva, editor, Zitouni, Rafik, editor, and Ramdane-Cherif, Amar, editor
Published: 2022
Full Text: View/download PDF

45. A Review: Web Content Mining Techniques

Author: Shah, Priyanka, Pandit, Hardik B., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Nanda, Priyadarsi, editor, Verma, Vivek Kumar, editor, Srivastava, Sumit, editor, Gupta, Rohit Kumar, editor, and Mazumdar, Arka Prokash, editor
Published: 2022
Full Text: View/download PDF

46. A Novel Approach for Web Mining Taxonomy for High-Performance Computing

Author: Samanta, Debabrata, Dutta, Soumi, Galety, Mohammad Gouse, Pramanik, Sabyasachi, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Tavares, João Manuel R. S., editor, Dutta, Paramartha, editor, Dutta, Soumi, editor, and Samanta, Debabrata, editor
Published: 2022
Full Text: View/download PDF

47. Text and Web Content Mining: A Systematic Review

Author: Almatrooshi, Fatima, Alhammadi, Sumayya, Salloum, Said A., Shaalan, Khaled, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Al-Emran, Mostafa, editor, Al-Sharafi, Mohammed A., editor, Al-Kabi, Mohammed N., editor, and Shaalan, Khaled, editor
Published: 2022
Full Text: View/download PDF

48. Web-based objects detection to discover key objects in human activities.

Author: Cousyn, Charles, Bouchard, Kévin, and Gaboury, Sébastien
Abstract: Aging in place has garnered a lot of interest in the past decade among researchers and politicians. It presents itself as a humane and cost-efficient solution to the worsening problem related to the financing and staffing of our fragile healthcare ecosystem. In that regard, smart environments could help monitor the daily activities of the person and provide information regarding the health status and any potential problem necessitating immediate assistance. To do so, many teams, including ours, have been working on human activity recognition from distributed sensors. Nevertheless, it is still very challenging due to the difficulty of amassing enough data to learn activity models that generalizes well across different residents and different smart environments. Moreover, whenever one wants to add a new activity to the set of recognizable activities, it requires to gather additional data with label. The whole process is prohibitively costly and time consuming. Therefore, in this paper, our team proposed to explore web mining in order to build activity model in an unsupervised fashion. More specifically, using the results of two popular image search engines with automated queries and with the help of object detection/classification models, we learned the set of key objects involved in the realization of activities of daily life. A total of 108 configurations were tested. The experiments showed that the key objects related to activities can be easily extracted with a good accuracy. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

49. A WebExtension framework for experimentation and evaluation of webpage segmentation methods

Author: Geunseong Jung and Jaehyuk Cha
Subjects: Data mining, Web mining, Web extensions, Computer software, QA76.75-76.765
Abstract: Current webpages contain areas with different functions and contents. Many studies and applications have used webpage segmentation methods to separate these areas or extract only specific areas for their purposes. Examining these methods requires laborious tasks, such as collecting many webpages, inspecting them with human participants, and applying various performance metrics to their results. Therefore, we developed a WebExtension (browser extension) framework to support the examination and analysis of webpage segmentation methods. This framework can build a WebExtension to collect webpages, curate data for labeling web documents, evaluate methods, and measure the results with various performance metrics in a web browser environment. Furthermore, researchers can use preloaded well-known methods and metrics in the framework and add more methods and metrics for their research purposes.
Published: 2023
Full Text: View/download PDF

50. Predicting Customers’ Behavior Using Web-Content Mining and Web-Usage Mining

Author: Bahareh Sheykh Abbasi, Neda Abdolvand, and Saeedeh Rajaee Harandi
Subjects: e-commerce, web personalization, web mining, web-content mining, web-usage mining., Information resources (General), ZA3040-5185, Transportation and communications, HE1-9990
Abstract: Today, e-commerce has become a competitive space for online retailers. Therefore, personalization has become a vital part of e-commerce websites’ success that is a challenge for marketers and researchers. Therefore, this study aims to provide a model for web personalization and mining user interests using a hybrid approach of web-usage and web-content mining. So, the navigational patterns of web users and the interests of each user on web pages of a Persian website were extracted through web-usage mining and topic modeling, respectively. Users were then clustered using the dependency distribution algorithm and 25 categories were extracted. To better understand the behavioral patterns of web users, they were categorized using the Support Vector Machine algorithm based on the users’ interests and navigational behaviors. The most important result of the proposed system is that the patterns of users’ navigation are understandable and the subsequent analyzes will be much simpler. https://dorl.net/dor/20.1001.1.20088302.2022.20.3.9.6
Published: 2022
