114 results
Search Results
2. Guest Editorial: Special issue on network/traffic optimisation towards 6G network.
- Author
-
Logothetis, Michael, Barraca, João Paulo, Shioda, Shigeo, and Rabie, Khaled
- Subjects
TELECOMMUNICATION systems ,TELECOMMUNICATION network management ,MOBILITY management (Mobile radio) ,AD hoc computer networks ,VEHICULAR ad hoc networks ,COMMUNICATION infrastructure ,MACHINE learning ,VIRTUAL networks ,TEXT messages - Abstract
This document is a guest editorial for a special issue on network/traffic optimization towards the 6G network. It highlights six research papers on network and traffic optimization in the context of 6G. These topics include hotspot performance in vehicular communication, flying ad-hoc networks (FANETs), cooperative relaying using MIMO-NOMA technology, network resource allocation using machine learning, cyber attacks and countermeasures in VoWi-Fi, and network management. The papers present theoretical analyses, propose novel models and schemes, and discuss practical applications and experiments. The authors express their gratitude to the researchers and reviewers involved in the publication of these papers. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
3. Editorial for the Special Issue "Data Science and Big Data in Biology, Physical Science and Engineering".
- Author
-
Mahmoud, Mohammed
- Subjects
PHYSICAL sciences ,BIG data ,DEEP learning ,ARTIFICIAL neural networks ,DATA science ,MACHINE learning ,REINFORCEMENT learning - Abstract
This document is an editorial for a special issue of the journal "Technologies" focused on data science and big data in various fields such as biology, physical science, and engineering. The editorial highlights the importance of analyzing large amounts of data generated by digital technologies and the need for data scientists to use artificial intelligence and machine learning to extract valuable knowledge. The special issue includes 12 papers covering topics such as machine learning techniques for customer churn prediction, agile program management in the U.S. Navy, deep learning for cybersecurity in Industry 5.0, self-directed learning during the COVID-19 era, decision tree-based neural networks for data classification, data-driven governance in technology companies, and more. The papers explore different approaches, models, and tools in the context of data science and big data. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
4. Data-Driven Method for Vacuum Prediction in the Underwater Pump of a Cutter Suction Dredger.
- Author
-
Chen, Hualin, Yuan, Zihao, Wang, Wangming, Chen, Shuaiqi, Jiang, Pan, and Wei, Wei
- Subjects
DREDGES ,BIG data ,MACHINE learning ,DATA mining ,RANK correlation (Statistics) ,FEATURE extraction - Abstract
Vacuum is an important parameter in cutter suction dredging operations because the equipment is underwater and can easily fail. It is necessary to analyze other parameters related to the vacuum to make real-time predictions about it, which can improve the construction efficiency of the dredger under abnormal working conditions. In this paper, a data-driven method for predicting the vacuum of the underwater pump of the cutter suction dredger (CSD) is proposed with the help of big data, machine learning, data mining, and other technologies, based on the historical data of the "Hua An Long" CSD. The method eliminates anomalous data, standardizes the data set, and then relies on theory and engineering experience to achieve feature extraction using the Spearman correlation coefficient. Six machine learning methods were then employed to train on and predict the data set, namely lasso regression (lasso), elastic net (ENet), gradient boosting decision trees (traditional GBDT, extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM)), and stacking. The comparison of the indicators obtained through multiple rounds of feature number iteration shows that the LightGBM model has high prediction accuracy, good running time, and strong generalization ability. Therefore, the methodological framework proposed in this paper can help to improve the efficiency of underwater pumps and issue timely warnings in abnormal working conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
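The feature-selection and gradient-boosting steps described in the abstract above can be illustrated with a brief, single-machine sketch. The column names, correlation threshold, and hyperparameters below are illustrative assumptions and are not taken from the paper.

```python
# Minimal sketch of Spearman-based feature selection followed by a LightGBM
# regressor, loosely mirroring the pipeline described in the abstract.
# Column names, the correlation threshold, and hyperparameters are illustrative
# assumptions, not values from the paper.
import pandas as pd
from lightgbm import LGBMRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("dredger_history.csv")          # hypothetical historical log
target = "underwater_pump_vacuum"                # hypothetical target column

# Rank candidate features by the absolute Spearman correlation with the target.
spearman = df.corr(method="spearman")[target].drop(target).abs()
selected = spearman[spearman > 0.3].index.tolist()   # illustrative threshold

X_train, X_test, y_train, y_test = train_test_split(
    df[selected], df[target], test_size=0.2, random_state=42
)

model = LGBMRegressor(n_estimators=500, learning_rate=0.05)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```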
5. Integrating Wireless Remote Sensing and Sensors for Monitoring Pesticide Pollution in Surface and Groundwater.
- Author
-
Mutunga, Titus, Sinanovic, Sinan, and Harrison, Colin S.
- Subjects
PESTICIDE pollution ,GROUNDWATER pollution ,REMOTE sensing ,WATER table ,POLLUTION monitoring ,PESTICIDES ,FECAL contamination - Abstract
Water constitutes an indispensable resource crucial for the sustenance of humanity, as it plays an integral role in various sectors such as agriculture, industrial processes, and domestic consumption. Even though water covers 71% of the Earth's surface, governments have been grappling with the challenge of ensuring the provision of safe water for domestic use. A contributing factor to this situation is the persistent contamination of available water sources, rendering them unfit for human consumption. Pesticides are a common contaminant that is not frequently tested for, despite serious effects on biodiversity. Pesticide determination in water quality assessment is a challenging task because the procedures involved in extraction and detection are complex, which reduces their popularity in many monitoring campaigns despite their harmful effects. If existing methods of pesticide analysis are adapted by leveraging new technologies, then information concerning their presence in water ecosystems can be revealed. Furthermore, beyond the advantages conferred by the integration of wireless sensor networks (WSNs), the Internet of Things (IoT), Machine Learning (ML), and big data analytics, a notable outcome is the attainment of a heightened degree of granularity in the information of water ecosystems. This paper discusses methods of pesticide detection in water, emphasizing the possible use of electrochemical sensors, biosensors, and paper-based sensors in wireless sensing. It also explores the application of WSNs in water, the IoT, computing models, ML, and big data analytics, and their potential for integration as technologies useful for pesticide monitoring in water. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. Leveraging Visualization and Machine Learning Techniques in Education: A Case Study of K-12 State Assessment Data.
- Author
-
Taylor, Loni, Gupta, Vibhuti, and Jung, Kwanghee
- Subjects
DATA-based decision making in education ,ARTIFICIAL intelligence ,DATA visualization ,MACHINE learning ,MICROSOFT Azure (Computing platform) ,INDIVIDUALIZED instruction - Abstract
As data-driven models gain importance in driving decisions and processes, it has become increasingly important to visualize data with both speed and accuracy. A massive volume of data is presently generated in the educational sphere from various learning platforms, tools, and institutions. The visual analytics of educational big data has the capability to improve student learning, develop strategies for personalized learning, and improve faculty productivity. However, the education domain has seen limited progress in data-driven decision making that leverages recent advances in machine learning. Some recent tools, such as Tableau, Power BI, the Microsoft Azure suite, and Sisense, leverage artificial intelligence and machine learning techniques to visualize data and generate insights from them; however, their applicability in educational advances is limited. This paper focuses on leveraging machine learning and visualization techniques to demonstrate their utility through a practical implementation using K-12 state assessment data compiled from the institutional websites of the States of Texas and Louisiana. Effective modeling and predictive analytics are the focus of the sample use case presented in this research. Our approach demonstrates the applicability of web technology in conjunction with machine learning to provide a cost-effective and timely solution to visualize and analyze big educational data. Additionally, ad hoc visualization provides contextual analysis in areas of concern for education agencies (EAs). [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. Can machine learning make naturalism about health truly naturalistic? A reflection on a data-driven concept of health
- Author
-
Guersenzvaig, Ariel
- Published
- 2024
- Full Text
- View/download PDF
8. Revolutionizing Cardiology through Artificial Intelligence—Big Data from Proactive Prevention to Precise Diagnostics and Cutting-Edge Treatment—A Comprehensive Review of the Past 5 Years.
- Author
-
Stamate, Elena, Piraianu, Alin-Ionut, Ciobotaru, Oana Roxana, Crassas, Rodica, Duca, Oana, Fulga, Ana, Grigore, Ionica, Vintila, Vlad, Fulga, Iuliu, and Ciobotaru, Octavian Catalin
- Subjects
MACHINE learning ,ARTIFICIAL intelligence ,PULMONARY embolism ,CARDIAC pacing ,SCIENTIFIC literature ,BIG data - Abstract
Background: Artificial intelligence (AI) can radically change almost every aspect of the human experience. In the medical field, there are numerous applications of AI and, subsequently, in a relatively short time, significant progress has been made. Cardiology is not immune to this trend, a fact supported by the exponential increase in the number of publications in which algorithms play an important role in data analysis, pattern discovery, identification of anomalies, and therapeutic decision making. Furthermore, with technological development, new models of machine learning (ML) and deep learning (DL) have appeared that are capable of exploring various applications of AI in cardiology, including areas such as prevention, cardiovascular imaging, electrophysiology, interventional cardiology, and many others. In this sense, the present article aims to provide a general vision of the current state of AI use in cardiology. Results: We identified and included a subset of 200 papers directly relevant to the current research covering a wide range of applications. Thus, this paper presents AI applications in cardiovascular imaging, arrhythmology, clinical or emergency cardiology, cardiovascular prevention, and interventional procedures in a summarized manner. Recent studies from the scientific literature demonstrate the feasibility and advantages of using AI in different branches of cardiology. Conclusions: The integration of AI in cardiology offers promising perspectives for increasing accuracy by decreasing the error rate and increasing efficiency in cardiovascular practice. From predicting the risk of sudden death or the ability to respond to cardiac resynchronization therapy to the diagnosis of pulmonary embolism or the early detection of valvular diseases, AI algorithms have shown their potential to mitigate human error and provide feasible solutions. At the same time, limits imposed by the small samples studied are highlighted alongside the challenges presented by ethical implementation; these relate to legal implications regarding responsibility and decision-making processes, ensuring patient confidentiality and data security. All these constitute future research directions that will allow the integration of AI in the progress of cardiology. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Low-shot learning and class imbalance: a survey.
- Author
-
Billion Polak, Preston, Prusa, Joseph D., and Khoshgoftaar, Taghi M.
- Subjects
LANGUAGE models ,EVIDENCE gaps ,BIG data ,DEEP learning - Abstract
The tasks of few-shot, one-shot, and zero-shot learning—or collectively "low-shot learning" (LSL)—at first glance are quite similar to the long-standing task of class imbalanced learning; specifically, they aim to learn classes for which there is little labeled data available. Motivated by this similarity, we conduct a survey to review the recent literature for works which combine these fields in one of two ways, either addressing the obstacle of class imbalance within a LSL setting, or utilizing LSL techniques or frameworks in order to combat class imbalance within other settings. In our survey of over 60 papers in a wide range of applications from January 2020 to July 2023 (inclusive), we examine and report methodologies and experimental results, find that most works report performance at or above their respective state-of-the-art, and highlight current research gaps which hold potential for future work, especially those involving the use of LSL techniques in imbalanced tasks. To this end, we emphasize the lack of works utilizing LSL approaches based on large language models or semantic data, and works using LSL for big-data imbalanced tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Robotics multi-modal recognition system via computer-based vision
- Author
-
Shahin, Mohammad, Chen, F. Frank, Hosseinzadeh, Ali, Bouzary, Hamed, and Shahin, Awni
- Published
- 2024
- Full Text
- View/download PDF
11. Bulk Power Systems Emergency Control Based on Machine Learning Algorithms and Phasor Measurement Units Data: A State-of-the-Art Review.
- Author
-
Senyuk, Mihail, Beryozkina, Svetlana, Safaraliev, Murodbek, Pazderin, Andrey, Odinaev, Ismoil, Klassen, Viktor, Savosina, Alena, and Kamalov, Firuz
- Subjects
PHASOR measurement ,MACHINE learning ,ELECTRIC power ,EXECUTIVE power ,DIGITAL control systems ,RENEWABLE energy sources - Abstract
Modern electrical power systems are characterized by a high rate of transient processes, the use of digital monitoring and control systems, and the accumulation of a large amount of technological information. The active integration of renewable energy sources contributes to reducing the inertia of power systems and changing the nature of transient processes. As a result, the effectiveness of emergency control systems decreases. Traditional emergency control systems operate based on the numerical analysis of power system dynamic models. This allows for finding the optimal set of preventive commands (solutions) in the form of disconnections of generating units, consumers, transmission lines, and other primary grid equipment. Thus, the steady-state or transient stability of a power system is provided. After the active integration of renewable sources into power systems, traditional emergency control algorithms became ineffective due to the time delay in finding the optimal set of control actions. Currently, machine learning algorithms are being developed that provide high performance and adaptability. This paper contains a meta-analysis of modern emergency control algorithms for power systems based on machine learning and synchronized phasor measurement data. It describes algorithms for detecting disturbances in the power system and for selecting control actions to maintain transient and steady-state stability, voltage stability, and frequency within limits. This study examines 53 studies on the development of methodologies for analyzing power system stability based on ML algorithms. The analysis is carried out in terms of accuracy, computational latency, and the data used in training and testing. The most frequently used test models of power systems are identified, and the most suitable ML algorithms for real-time use in the operational control loop of power systems are determined. This paper also provides an analysis of the advantages and disadvantages of existing algorithms, as well as identifies areas for further research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Big Data based approach to Network Security and intelligence.
- Author
-
Alquaifil, Mubarak, Mishra, Shailendra, and AlShehri, Mohammed
- Subjects
COMPUTER network security ,BIG data ,BACK propagation ,INTERNET security ,SITUATIONAL awareness - Abstract
Big data analysis technologies and machine learning techniques are essential for examining and forecasting the state of network security as global concerns about cyber security grow. Models for monitoring network security face a number of challenges, including resource consumption, inaccuracies, low processing efficiency, and incompatibility with real-time and large-scale scenarios. This paper proposes a novel approach to Network Security Situation Awareness (NS-SA) using Big Data (BD) analytics and machine learning. The proposed approach addresses the limitations of existing NS-SA models by leveraging data purification and simplification techniques, and by employing an updated back propagation (BP) neural network to construct an NS BD analysis model. The paper provides a comprehensive explanation of the model's structure and outlines the relevant model techniques. Extensive testing has been conducted to ensure the model's accuracy and applicability in understanding NS scenarios. This study focuses on a MATLAB- and Python-based implementation of a neural network for network security using a big data approach. The results demonstrate the potential and value of the proposed model in accurately assessing and forecasting NS conditions. The proposed approach has several advantages over existing NS-SA models: it is more efficient in terms of resource usage, more accurate in its analysis of network data, more applicable in real-time and large-scale scenarios, and more robust to noise and heterogeneity in network data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
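The back-propagation network described above can be approximated with a short stand-in: scikit-learn's MLPClassifier is trained by backpropagation, so it serves as a minimal sketch of the classification step. The feature set, labels, and layer sizes are assumptions, not the paper's actual NS-SA model.

```python
# Minimal stand-in for the back-propagation network described in the abstract,
# using scikit-learn's MLPClassifier (trained via backpropagation). The feature
# set, labels, and architecture are illustrative assumptions only.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))             # stand-in for cleaned flow/log features
y = (X[:, :3].sum(axis=1) > 0).astype(int)  # stand-in security-state label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), activation="relu",
                  solver="adam", max_iter=300, random_state=0),
)
model.fit(X_tr, y_tr)
print(classification_report(y_te, model.predict(X_te)))
```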
13. Editorial: Machine learning and big data analytics in mood disorders.
- Author
-
Lu Yang and Jun Chen
- Subjects
AFFECTIVE disorders ,MACHINE learning ,BIG data ,SUPERVISED learning ,PSYCHIATRIC research ,HYPOMANIA - Abstract
This document is an editorial published in Frontiers in Psychiatry titled "Machine learning and big data analytics in mood disorders." The editorial discusses the use of machine learning and big data analytics in psychiatric research and practice, specifically in the field of mood disorders. It highlights the potential of these technologies to improve the understanding and management of mood disorders, including refining taxonomy, personalized therapy, and population screening. The editorial also mentions several studies that have applied machine learning and big data analytics in the detection, diagnosis, and treatment of mental disorders, providing valuable insights into real-world treatment patterns and the heterogeneity of clinical states. The authors recommend collaborative teamwork and a combination of data-driven and theory-driven perspectives to advance research in this area. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
14. Ensemble learning method based on CNN for class imbalanced data
- Author
-
Zhong, Xin and Wang, Nan
- Published
- 2024
- Full Text
- View/download PDF
15. Data-Centric Solutions for Addressing Big Data Veracity with Class Imbalance, High Dimensionality, and Class Overlapping.
- Author
-
Bolívar, Armando, García, Vicente, Alejo, Roberto, Florencia-Juárez, Rogelio, and Sánchez, J. Salvador
- Subjects
SCALABILITY ,MACHINE learning ,BIG data ,EUCLIDEAN distance ,BUILDING performance ,DATA quality - Abstract
The use of machine learning algorithms is an innovative strategy for organizations to obtain value from their large datasets, allowing them to guide future strategic actions and improve their initiatives. This has led to a growing and rapid application of various machine learning algorithms with a predominant focus on building and improving the performance of these models. However, this model-centric approach ignores the fact that data quality is crucial for building robust and accurate models. Several dataset issues, such as class imbalance, high dimensionality, and class overlapping, affect data quality, introducing bias to machine learning models. Therefore, adopting a data-centric approach is essential to constructing better datasets and producing effective models. Besides data issues, Big Data imposes new challenges, such as the scalability of algorithms. This paper proposes a scalable hybrid approach to jointly addressing class imbalance, high dimensionality, and class overlapping in Big Data domains. The proposal is based on well-known data-level solutions whose main operation is calculating the nearest neighbor using the Euclidean distance as a similarity metric. However, these strategies may lose their effectiveness on datasets with high dimensionality. Hence, data quality is achieved by combining a data transformation approach using fractional norms and SMOTE to obtain a balanced and reduced dataset. Experiments carried out on nine two-class imbalanced and high-dimensional large datasets showed that our scalable methodology implemented in Spark outperforms the traditional approach. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
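A simplified, single-node sketch of the idea in the abstract above: balance an imbalanced, high-dimensional dataset with SMOTE, with a fractional (p < 1) norm shown as the alternative similarity notion. The paper's actual pipeline runs in Spark and combines the two steps more tightly; the dataset, p value, and wiring here are assumptions.

```python
# Simplified, single-node sketch of the idea described in the abstract: balance an
# imbalanced, high-dimensional dataset with SMOTE, with fractional (p < 1) norms
# illustrated as the similarity notion. The paper's real implementation runs in
# Spark; the dataset, p value, and wiring below are illustrative assumptions.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

def fractional_norm_distance(a: np.ndarray, b: np.ndarray, p: float = 0.5) -> float:
    """L_p 'distance' with 0 < p < 1, often better behaved in high dimensions."""
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))

X, y = make_classification(n_samples=2000, n_features=200, weights=[0.95, 0.05],
                           random_state=0)

# Oversample the minority class; SMOTE's default neighbour search is Euclidean,
# so the fractional norm here only illustrates the alternative similarity metric.
X_bal, y_bal = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
print("before:", np.bincount(y), "after:", np.bincount(y_bal))
print("example fractional distance:", fractional_norm_distance(X[0], X[1]))
```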
16. Advancing Hydrology through Machine Learning: Insights, Challenges, and Future Directions Using the CAMELS, Caravan, GRDC, CHIRPS, PERSIANN, NLDAS, GLDAS, and GRACE Datasets.
- Author
-
Hasan, Fahad, Medley, Paul, Drake, Jason, and Chen, Gang
- Subjects
HUMAN activity recognition ,MACHINE learning ,WATER management ,HYDROLOGY ,CAMELS ,WATER table - Abstract
Machine learning (ML) applications in hydrology are revolutionizing our understanding and prediction of hydrological processes, driven by advancements in artificial intelligence and the availability of large, high-quality datasets. This review explores the current state of ML applications in hydrology, emphasizing the utilization of extensive datasets such as CAMELS, Caravan, GRDC, CHIRPS, NLDAS, GLDAS, PERSIANN, and GRACE. These datasets provide critical data for modeling various hydrological parameters, including streamflow, precipitation, groundwater levels, and flood frequency, particularly in data-scarce regions. We discuss the type of ML methods used in hydrology and significant successes achieved through those ML models, highlighting their enhanced predictive accuracy and the integration of diverse data sources. The review also addresses the challenges inherent in hydrological ML applications, such as data heterogeneity, spatial and temporal inconsistencies, issues regarding downscaling the LSH, and the need for incorporating human activities. In addition to discussing the limitations, this article highlights the benefits of utilizing high-resolution datasets compared to traditional ones. Additionally, we examine the emerging trends and future directions, including the integration of real-time data and the quantification of uncertainties to improve model reliability. We also place a strong emphasis on incorporating citizen science and the IoT for data collection in hydrology. By synthesizing the latest research, this paper aims to guide future efforts in leveraging large datasets and ML techniques to advance hydrological science and enhance water resource management practices. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. APPROXIMATE COMPUTING BASED LOW-POWER FPGA DESIGN FOR BIG DATA ANALYTICS IN CLOUD ENVIRONMENTS.
- Author
-
DOVA, MURALI and SANDI, ANURADHA M.
- Subjects
CLOUD storage ,BIG data ,COMMUNICATION infrastructure ,GATE array circuits ,CLOUD computing ,MACHINE learning ,DATA mining - Abstract
As cloud computing continues to evolve, the demand for scalable and energy-efficient infrastructure to handle extensive applications becomes paramount. Traditional transistor scaling and microprocessor design methods no longer suffice to meet the growing scale of cloud usage. This research explores the potential of approximate computing (AC) as an innovative solution to these challenges, particularly in high-demand computational settings. AC, known for its ability to make controlled accuracy trade-offs, is identified as a key strategy for improving both the performance and energy efficiency of cloud infrastructure, with a focus on low-power Field-Programmable Gate Array (FPGA) designs. This paper introduces novel methodologies that harness the strengths of AC, emphasizing its application in neural-based and machine-learning techniques for energy-efficient solutions. By targeting the performance of AC, especially in varied application domains and complex data mining scenarios, we propose two groundbreaking approaches that significantly enhance computational speed and reduce energy consumption. Our empirical analysis demonstrates notable improvements over existing techniques, highlighting the effectiveness of AC in optimizing cloud infrastructure. The proposed FPGA-based model, deployed through cloud computing, attains a substantial performance improvement of 89% and an energy reduction of 122%. This study not only confirms the benefits of integrating AC with low-power FPGA designs for cloud environments but also sets a new benchmark for future research in achieving more sustainable and efficient cloud computing via VLSI FPGA design analysis solutions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Robust Fisher-regularized extreme learning machine with asymmetric Welsch-induced loss function for classification.
- Author
-
Xue, Zhenxia, Zhao, Chongning, Wei, Shuqing, Ma, Jun, and Lin, Shouhe
- Subjects
MACHINE learning ,QUADRATIC programming ,DISTRIBUTION (Probability theory) ,MATHEMATICAL regularization ,BIG data ,EXTREME value theory ,STATISTICS - Abstract
In general, building a robust classifier for data sets with noise or outliers is a challenging problem, and it is even more difficult for data sets with an asymmetric noise distribution. The Fisher-regularized extreme learning machine (Fisher-ELM) considers the statistical knowledge of the data; however, it ignores the impact of noise or outliers. In this paper, to reduce the negative influence of noise or outliers, we first put forward a novel asymmetric Welsch loss function named AW-loss, based on the asymmetric L2-loss function and the Welsch loss function. Based on the AW-loss function, we then present a new robust Fisher-ELM called AWFisher-ELM. The proposed AWFisher-ELM not only takes into account the statistical information of the data but also considers the impact of asymmetrically distributed noise. We utilize the concave-convex procedure (CCCP) and a dual method to solve the non-convexity of the proposed AWFisher-ELM. An algorithm for AWFisher-ELM is given, and a theorem about the convergence of the algorithm is proved. To validate the effectiveness of our algorithm, we compare AWFisher-ELM with other state-of-the-art methods on artificial data sets, UCI data sets, NDC large data sets, and image data sets with different ratios of noise. The experimental results are as follows: the accuracy of AWFisher-ELM is the highest on the artificial data sets, reaching 98.9%. For the large-scale NDC data sets and the image data sets, the accuracy of AWFisher-ELM is also the highest. For the ten UCI data sets, the accuracy and F1 value of AWFisher-ELM are the highest on most data sets except for Diabetes. In terms of training time, AWFisher-ELM takes almost the same time as RHELM and CHELM, but longer than OPT-ELM, WCS-SVM, Fisher-SVM, Pinball-FisherSVM, and Fisher-ELM, because AWFisher-ELM, RHELM, and CHELM need to solve a convex quadratic programming subproblem in each iteration. In conclusion, our method exhibits excellent generalization performance except for the longer training time. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
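The AW-loss described above combines an asymmetric L2 loss with the Welsch loss. One plausible NumPy formulation, with different bandwidths for positive and negative residuals, is sketched below; the paper's exact parameterisation may differ.

```python
# One plausible formulation of an asymmetric Welsch-type loss: the standard Welsch
# loss rho_c(r) = (c^2 / 2) * (1 - exp(-r^2 / c^2)), applied with a different
# bandwidth for positive and negative residuals. This sketches the general idea
# only; the AW-loss in the paper may be parameterised differently.
import numpy as np

def welsch(r: np.ndarray, c: float) -> np.ndarray:
    return (c ** 2 / 2.0) * (1.0 - np.exp(-(r ** 2) / c ** 2))

def asymmetric_welsch(residuals: np.ndarray,
                      c_pos: float = 1.0, c_neg: float = 2.0) -> np.ndarray:
    """Penalise positive and negative residuals with different Welsch bandwidths."""
    return np.where(residuals >= 0, welsch(residuals, c_pos), welsch(residuals, c_neg))

r = np.linspace(-4, 4, 9)
print(np.round(asymmetric_welsch(r), 3))  # bounded loss, asymmetric around zero
```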
19. An Intelligent Model and Methodology for Predicting Length of Stay and Survival in a Critical Care Hospital Unit.
- Author
-
Maldonado Belmonte, Enrique, Oton-Tortosa, Salvador, Gutierrez-Martinez, Jose-Maria, and Castillo-Martinez, Ana
- Subjects
INTENSIVE care units ,ARTIFICIAL intelligence ,DATA libraries ,DEATH forecasting ,SURVIVAL analysis (Biometry) ,MACHINE learning ,SYSTEM integration ,IDENTIFICATION - Abstract
This paper describes the design and methodology for the development and validation of an intelligent model in the healthcare domain. The generated model relies on artificial intelligence techniques, aiming to predict the length of stay and survival rate of patients admitted to a critical care hospitalization unit with better results than predictive systems using scoring. The proposed methodology is based on the following stages: preliminary data analysis, analysis of the architecture and systems integration model, the big data model approach, information structure and process development, and the application of machine learning techniques. This investigation substantiates that automated machine learning models significantly surpass traditional prediction techniques for patient outcomes within critical care settings. Specifically, the machine learning-based model attained an F1 score of 0.351 for mortality forecast and 0.615 for length of stay, in contrast to the traditional scoring model's F1 scores of 0.112 for mortality and 0.412 for length of stay. These results strongly support the advantages of integrating advanced computational techniques in critical healthcare environments. It is also shown that the use of integration architectures allows for improving the quality of the information by providing a data repository large enough to generate intelligent models. From a clinical point of view, obtaining more accurate results in the estimation of the ICU stay and survival offers the possibility of expanding the uses of the model to the identification and prioritization of patients who are candidates for admission to the ICU, as well as the management of patients with specific conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Investigation of the Global Fear Associated with COVID-19 Using Subjectivity Analysis and Deep Learning.
- Author
-
Thakur, Nirmalya, Patel, Kesha A., Poon, Audrey, Shah, Rishika, Azizi, Nazif, and Han, Changhee
- Subjects
COVID-19 ,DEEP learning ,AGE groups ,SUBJECTIVITY ,DATA analysis - Abstract
The work presented in this paper makes multiple scientific contributions related to the investigation of the global fear associated with COVID-19 by performing a comprehensive analysis of a dataset comprising survey responses of participants from 40 countries. First, the results of subjectivity analysis performed using TextBlob showed that in the responses where participants indicated their biggest concern related to COVID-19, the average subjectivity of the age group of 41–50 decreased from April 2020 to June 2020, the average subjectivity of the age group of 71–80 drastically increased from May 2020, and the age group of 11–20 indicated the least level of subjectivity between June 2020 and August 2020. Second, subjectivity analysis also revealed the percentage of highly opinionated, neutral opinionated, and least opinionated responses per age group, where the analyzed age groups were 11–20, 21–30, 31–40, 41–50, 51–60, 61–70, 71–80, and 81–90. For instance, the percentages of highly opinionated, neutral opinionated, and least opinionated responses by the age group of 11–20 were 17.92%, 16.24%, and 65.84%, respectively. Third, data analysis of responses from different age groups showed that the highest percentage of responses indicating that they were very worried about COVID-19 came from individuals in the age group of 21–30. Fourth, data analysis of the survey responses also revealed that in the context of taking precautions to prevent contracting COVID-19, the percentage of individuals in the age group of 31–40 taking precautions was higher than the percentages of individuals from the age groups of 41–50, 51–60, 61–70, 71–80, and 81–90. Fifth, a deep learning model was developed to detect whether the survey respondents were seeing or planning to see a psychologist or psychiatrist for any mental health issues related to COVID-19. The design of the deep learning model comprised 8 neurons for the input layer with the ReLU activation function, the ReLU activation function for all the hidden layers with 12 neurons each, and the sigmoid activation function for the output layer with 1 neuron. The model utilized the responses to multiple questions in the context of fear and preparedness related to COVID-19 from the dataset and achieved an accuracy of 91.62% after 500 epochs. Finally, two comparative studies with prior works in this field are presented to highlight the novelty and scientific contributions of this research work. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
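The network architecture is specified quite precisely in the abstract above (8 input features, ReLU hidden layers of 12 neurons, a 1-neuron sigmoid output, 500 epochs), so a minimal Keras reconstruction is straightforward. The number of hidden layers, optimizer, and loss are not stated and are assumed here.

```python
# Minimal Keras reconstruction of the architecture described in the abstract:
# 8 input features, ReLU hidden layers of 12 neurons each, and a 1-neuron sigmoid
# output for the binary "seeing/planning to see a psychologist" label. The number
# of hidden layers, optimizer, and loss are assumptions not stated in the abstract.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(8,)),                 # 8 survey-derived input features
    layers.Dense(12, activation="relu"),
    layers.Dense(12, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Placeholder data standing in for the encoded survey responses.
X = np.random.rand(1000, 8)
y = (np.random.rand(1000) > 0.5).astype("float32")
model.fit(X, y, epochs=500, batch_size=32, verbose=0)
print(model.evaluate(X, y, verbose=0))
```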
21. Continuous patient state attention model for addressing irregularity in electronic health records.
- Author
-
Chauhan, Vinod Kumar, Thakur, Anshul, O'Donoghue, Odhran, Rohanian, Omid, Molaei, Soheila, and Clifton, David A.
- Subjects
ELECTRONIC health records ,ORDINARY differential equations ,MACHINE learning ,BIG data ,TIME series analysis ,HOSPITAL mortality - Abstract
Background: Irregular time series (ITS) are common in healthcare because patient data are recorded in an electronic health record (EHR) system according to clinical guidelines and requirements rather than for research, and the recording frequency depends on a patient's health status. Due to this irregularity, it is challenging to develop machine learning techniques that uncover the vast intelligence hidden in EHR big data without losing performance on downstream patient outcome prediction tasks. Methods: In this paper, we propose Perceiver, a cross-attention-based transformer variant that is computationally efficient and can handle long sequences of time series in healthcare. We further develop continuous patient state attention models, using Perceiver and the transformer to deal with ITS in EHR. The continuous patient state models utilise neural ordinary differential equations to learn patient health dynamics, i.e., the patient health trajectory from observed irregular time steps, which enables them to sample the patient state at any time. Results: The proposed models' performance on the in-hospital mortality prediction task on the PhysioNet-2012 challenge and MIMIC-III datasets is examined. The Perceiver model either outperforms or performs on par with the baselines, and reduces computations by about nine times compared to the transformer model, with no significant loss of performance. Experiments examining irregularity in healthcare reveal that the continuous patient state models outperform the baselines. Moreover, the predictive uncertainty of the model is used to refer extremely uncertain cases to clinicians, which enhances the model's performance. Code is publicly available and verified at https://codeocean.com/capsule/4587224. Conclusions: Perceiver presents a computationally efficient potential alternative for processing long sequences of time series in healthcare, and the continuous patient state attention models outperform traditional and advanced techniques for handling irregularity in time series. Moreover, the predictive uncertainty of the model helps in the development of transparent and trustworthy systems, which can be utilised as per the availability of clinicians. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
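A minimal PyTorch sketch of the Perceiver-style cross-attention bottleneck mentioned above: a small set of learned latent vectors attends to the full (possibly irregular) observation sequence, so cost grows with the number of latents rather than the sequence length. The neural-ODE patient-state component is omitted and all dimensions are assumptions.

```python
# Minimal PyTorch sketch of a Perceiver-style cross-attention bottleneck: a small
# set of learned latent vectors attends to a long (possibly irregular) sequence of
# EHR observations, so cost scales with the number of latents rather than sequence
# length. The neural-ODE patient-state dynamics from the paper are omitted, and all
# dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class PerceiverBlock(nn.Module):
    def __init__(self, dim: int = 64, n_latents: int = 16, n_heads: int = 4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_latents, dim))
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.head = nn.Linear(dim, 1)  # e.g. in-hospital mortality logit

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) embedded observations (values + time encodings)
        batch = x.size(0)
        q = self.latents.unsqueeze(0).expand(batch, -1, -1)
        latents, _ = self.cross_attn(q, x, x)          # (batch, n_latents, dim)
        return self.head(latents.mean(dim=1)).squeeze(-1)

model = PerceiverBlock()
obs = torch.randn(8, 500, 64)        # 8 patients, 500 irregular time steps
print(model(obs).shape)              # torch.Size([8])
```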
22. ENHANCED FEATURE-DRIVEN MULTI-OBJECTIVE LEARNING FOR OPTIMAL CLOUD RESOURCE ALLOCATION.
- Author
-
I., UMA MAHESWARA RAO and SASTRY, J. K. R.
- Subjects
RESOURCE allocation ,DISTRIBUTED computing ,COMPUTER programming ,BIG data ,RANDOM forest algorithms ,MACHINE learning ,CLOUD computing - Abstract
Resource allocation is one of the biggest obstacles in cloud networks, especially those with distributed computing setups and data centers. It is a key area in which optimizing system performance must be balanced against affordability, operational reliability, and energy efficiency. Recognizing the importance of improving resource allocation methodologies in these complex cloud computing systems, this paper proposes "Enhanced Feature-Driven Multi-Objective Learning for Optimal Cloud Resource Allocation" (OCRA), which integrates recent machine learning techniques with traditional concepts from cloud computing research. OCRA analyzes historical records of CPU, memory, disk, and network usage, and assimilates large datasets such as compliance rates with past SLAs, workload frequencies over time, resource allocations, and patterns of service requests. An adaptive mechanism is one of the defining traits of the model: it anticipates changes in resource demand and immediately adjusts supply, responding rapidly when fluctuations arise suddenly or unexpectedly. Multi-objective random forests are at the core of OCRA; each decision tree is designed with a particular performance objective in mind, and combining these trees into a random forest ensemble increases not only the model's predictive accuracy but also its stability. Pareto optimization is used to maintain a balance among performance indicators, without an excessive focus on any single one. OCRA is evaluated empirically through experimental studies in which key performance indicators such as resource utilization rate and Quality of Service (QoS) adherence rate are taken into account. OCRA is energy-efficient without sacrificing performance, and in terms of speed, flexibility, and overall efficiency it outperforms comparable cloud resource allocation approaches, although it is not yet well suited to users without a background in computer science or programming. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
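The multi-objective forest at the core of OCRA can be approximated compactly, since scikit-learn's RandomForestRegressor handles multi-output targets natively. The features and targets below are illustrative assumptions, not OCRA's actual inputs.

```python
# Compact stand-in for the multi-objective forest described in the abstract:
# scikit-learn's RandomForestRegressor supports multi-output regression natively,
# here jointly predicting CPU and memory demand for the next interval. Feature and
# target definitions are illustrative assumptions, not OCRA's actual inputs.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Hypothetical historical features: past CPU/mem/disk/net usage, SLA compliance, request rate.
X = rng.random((3000, 6))
y = np.column_stack([
    0.6 * X[:, 0] + 0.2 * X[:, 5] + rng.normal(0, 0.05, 3000),   # next-interval CPU demand
    0.5 * X[:, 1] + 0.3 * X[:, 5] + rng.normal(0, 0.05, 3000),   # next-interval memory demand
])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
forest = RandomForestRegressor(n_estimators=200, random_state=1).fit(X_tr, y_tr)
print("averaged multi-output R^2:", forest.score(X_te, y_te))
```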
23. Dilemmas and Opportunities of Clinical Prediction Models (临床预测模型的困境与机遇).
- Author
-
谷鸿秋
- Abstract
Copyright of Chinese Journal of Stroke is the property of Chinese Journal of Stroke Editorial Office and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
24. Distributed independent vector machine for big data classification problems
- Author
-
Almaspoor, Mohammad Hassan, Safaei, Ali A., Salajegheh, Afshin, and Minaei-Bidgoli, Behrouz
- Published
- 2024
- Full Text
- View/download PDF
25. Big data resolving using Apache Spark for load forecasting and demand response in smart grid: a case study of Low Carbon London Project.
- Author
-
Ali El-Sayed Ali, Hussien, Alham, M. H., and Ibrahim, Doaa Khalil
- Subjects
BIG data ,MACHINE learning ,DEMAND forecasting ,POINT of view (Literature) ,RENEWABLE energy sources ,INFORMATION & communication technologies - Abstract
Using recent information and communication technologies for monitoring and management initiates a revolution in the smart grid. These technologies generate massive data that can only be processed using big data tools. This paper emphasizes the role of big data in resolving load forecasting, renewable energy source integration, and demand response as significant aspects of smart grids. Meter data from the Low Carbon London Project are investigated as a case study. Because of the immense stream of meter readings and the exogenous data added to load forecasting models, the problem is addressed in a big data context. Descriptive analytics are developed using Spark SQL to get insights regarding household energy consumption. Spark MLlib is utilized for predictive analytics by building scalable machine learning models accommodating streams of meter data. Multivariate polynomial regression and decision tree models are preferred here from a big data perspective and based on literature showing that they are accurate and interpretable. The results confirmed the capabilities of descriptive analytics and data visualization to provide valuable insights, guide the feature selection process, and enhance the accuracy of load forecasting models. Accordingly, proper evaluation of demand response programs and integration of renewable energy resources is accomplished using the achieved load forecasting results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
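A minimal PySpark sketch of the workflow named in the abstract above: a Spark SQL descriptive query followed by an MLlib pipeline with polynomial expansion and linear regression. The schema, column names, and polynomial degree are assumptions.

```python
# Minimal PySpark sketch combining a Spark SQL descriptive query with an MLlib
# polynomial regression, mirroring the tools named in the abstract. The schema,
# column names, and polynomial degree are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, PolynomialExpansion
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("load-forecasting").getOrCreate()
readings = spark.read.csv("smart_meter_readings.csv", header=True, inferSchema=True)
readings.createOrReplaceTempView("readings")

# Descriptive analytics: average household consumption per hour of day.
spark.sql("""
    SELECT hour, AVG(kwh) AS avg_kwh
    FROM readings
    GROUP BY hour
    ORDER BY hour
""").show()

# Predictive analytics: polynomial regression on hour, temperature, and day type.
pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["hour", "temperature", "is_weekend"], outputCol="raw"),
    PolynomialExpansion(degree=3, inputCol="raw", outputCol="features"),
    LinearRegression(featuresCol="features", labelCol="kwh"),
])
train, test = readings.randomSplit([0.8, 0.2], seed=42)
model = pipeline.fit(train)
model.transform(test).select("kwh", "prediction").show(5)
```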
26. An improved deep hashing model for image retrieval with binary code similarities.
- Author
-
Liu, Huawen, Wu, Zongda, Yin, Minghao, Yu, Donghua, Zhu, Xinzhong, and Lou, Jungang
- Subjects
BINARY codes ,CONVOLUTIONAL neural networks ,IMAGE retrieval ,MACHINE learning ,HAMMING distance ,DEEP learning - Abstract
The exponential growth of data raises an unprecedented challenge in data analysis: how to retrieve interesting information from such large-scale data. Hash learning is a promising solution to address this challenge, because it may bring many potential advantages, such as extremely high efficiency and low storage cost, after projecting high-dimensional data to compact binary codes. However, traditional hash learning algorithms often suffer from the problem of semantic inconsistency, where images with similar semantic features may have different binary codes. In this paper, we propose a novel end-to-end deep hashing method based on the similarities of binary codes, dubbed CSDH (Code Similarity-based Deep Hashing), for image retrieval. Specifically, it extracts deep features from images to capture semantic information using a pre-trained deep convolutional neural network. Additionally, a hidden and fully connected layer is attached at the end of the deep network to derive hash bits by virtue of an activation function. To preserve the semantic consistency of images, a loss function has been introduced. It takes the label similarities, as well as the Hamming embedding distances, into consideration. By doing so, CSDH can learn more compact and powerful hash codes, which not only can preserve semantic similarity but also have small Hamming distances between similar images. To verify the effectiveness of CSDH, we evaluate CSDH on two public benchmark image collections, i.e., CIFAR-10 and NUS-WIDE, with five classic shallow hashing models and six popular deep hashing ones. The experimental results show that CSDH can achieve competitive performance to the popular deep hashing algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
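The general pattern described above (pre-trained backbone, fully connected hash layer, pairwise semantic loss) can be sketched in a few lines of PyTorch. The loss below is a generic contrastive form over relaxed codes, not CSDH's exact objective, and the bit length is an assumption.

```python
# Minimal sketch of the deep-hashing pattern described in the abstract: a CNN
# backbone, a fully connected hash layer with tanh activation, and a pairwise loss
# that encourages small Hamming-like distances between same-label images. The loss
# below is a generic contrastive form, not CSDH's exact objective.
import torch
import torch.nn as nn
import torchvision

class DeepHashNet(nn.Module):
    def __init__(self, n_bits: int = 48):
        super().__init__()
        backbone = torchvision.models.resnet18()
        backbone.fc = nn.Identity()           # expose the 512-d feature vector
        self.backbone = backbone
        self.hash_layer = nn.Linear(512, n_bits)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.hash_layer(self.backbone(x)))  # relaxed codes in (-1, 1)

def pairwise_hash_loss(codes: torch.Tensor, labels: torch.Tensor, margin: float = 12.0):
    d = torch.cdist(codes, codes)             # continuous surrogate of Hamming distance
    same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    return (same * d.pow(2) + (1 - same) * torch.clamp(margin - d, min=0).pow(2)).mean()

net = DeepHashNet()
images, labels = torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,))
codes = net(images)
print(pairwise_hash_loss(codes, labels).item())
# At retrieval time the binary code is simply torch.sign(codes).
```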
27. DAPS diagrams for defining Data Science projects.
- Author
-
de Mast, Jeroen and Lokkerbol, Joran
- Subjects
SCIENCE projects ,DATA science ,BIG data ,BUSINESS analytics ,OPERATIONS research ,DATA mining - Abstract
Background: Models for structuring big-data and data-analytics projects typically start with a definition of the project's goals and the business value they are expected to create. The literature identifies proper project definition as crucial for a project's success, and also recognizes that the translation of business objectives into data-analytic problems is a difficult task. Unfortunately, common project structures, such as CRISP-DM, provide little guidance for this crucial stage when compared to subsequent project stages such as data preparation and modeling. Contribution: This paper contributes structure to the project-definition stage of data-analytic projects by proposing the Data-Analytic Problem Structure (DAPS). The diagrammatic technique facilitates the collaborative development of a consistent and precise definition of a data-analytic problem, and the articulation of how it contributes to the organization's goals. In addition, the technique helps to identify important assumptions, and to break down large ambitions in manageable subprojects. Methods: The semi-formal specification technique took other models for problem structuring — common in fields such as operations research and business analytics — as a point of departure. The proposed technique was applied in 47 real data-analytic projects and refined based on the results, following a design-science approach. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Cocrystal Prediction Based on Deep Forest Model—A Case Study of Febuxostat.
- Author
-
Chen, Jiahui, Li, Zhihui, Kang, Yanlei, and Li, Zhong
- Subjects
ARTIFICIAL neural networks ,FEBUXOSTAT ,DATABASES ,MACHINE learning ,DEEP learning ,PREDICTION models ,FORECASTING - Abstract
To aid cocrystal screening, a deep forest-based cocrystal prediction model was developed in this study using data from the Cambridge Structural Database (CSD). The positive samples in the experiment came from the CSD. The negative samples came partly from failure records in other papers, and some were randomly generated according to specific rules, resulting in a total of 8576 pairs. Compared with traditional machine learning models and simple deep neural network models, the deep forest model has better performance and faster training speed. The accuracy is about 95% on the test set. Febuxostat cocrystal screening was also tested to verify the validity of the model. Our model correctly predicted the formation of the cocrystal, showing that it is useful in practice. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. A Needle in a Cosmic Haystack: A Review of FRB Search Techniques.
- Author
-
Rajwade, Kaustubh M. and van Leeuwen, Joeri
- Subjects
PULSAR detection ,RADIO interference ,OBSERVATORIES ,BIG data ,RADIO telescopes ,SOLAR radio bursts - Abstract
Ephemeral Fast Radio Bursts (FRBs) must be powered by some of the most energetic processes in the Universe. That makes them highly interesting in their own right, and as precise probes for estimating cosmological parameters. This field thus poses a unique challenge: FRBs must be detected promptly and immediately localised and studied based only on that single millisecond-duration flash. The problem is that the burst occurrence is highly unpredictable and that their distance strongly suppresses their brightness. Since the discovery of FRBs in single-dish archival data in 2007, detection software has evolved tremendously. Pipelines now detect bursts in real time within a matter of seconds, operate on interferometers, buffer high time- and frequency-resolution data, and issue real-time alerts to other observatories for rapid multi-wavelength follow-up. In this paper, we review the components that comprise an FRB search software pipeline, we discuss the proven techniques that were adopted from pulsar searches, we highlight newer, more efficient techniques for detecting FRBs, and we conclude by discussing proposed novel future methodologies that may power the search for FRBs in the era of big data astronomy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. Navigating the Digital Landscape of Diabetes Care: Current State of the Art and Future Directions.
- Author
-
Gonçalves, Helena, Silva, Firmino, Rodrigues, Catarina, and Godinho, António
- Subjects
BLOOD sugar monitors ,CONTINUOUS glucose monitoring ,INSULIN pumps ,MACHINE learning ,GLYCEMIC control ,MEDICAL personnel ,DIABETES - Abstract
Diabetes mellitus remains a global health challenge, requiring innovative solutions for effective disease management. This paper offers a thorough analysis of diabetes technologies, highlighting their various roles in diabetes care. Through a comprehensive review of the literature and analysis of emerging trends, we explore the multifaceted impact of technology on diabetes care. We investigate the key role of continuous glucose monitoring systems, insulin pumps, and smart insulin pens in achieving optimal glycaemic control. The paper also evaluates the integration of artificial intelligence and machine learning algorithms in predictive modelling for early detection of glucose fluctuations, ultimately preventing diabetes-related complications. Additionally, it examines the potential of telemedicine and mobile applications in enhancing patient engagement and self-management. Moreover, the review covers advancements in closed-loop insulin delivery systems, offering insights into their clinical effectiveness and potential to revolutionize diabetes care. Ethical and privacy considerations related to the use of patient data in these technologies are discussed, emphasizing the importance of striking a balance between technological innovation and patient security. This paper's evidence synthesis underscores the increasing influence of diabetes technologies on patient outcomes, quality of life, and healthcare systems. It underscores the need for multidisciplinary collaboration between healthcare professionals, researchers, and technology developers to ensure the seamless integration and accessibility of these tools to patients living with diabetes. This study serves as a valuable resource for clinicians, researchers, and policymakers, providing a comprehensive view of evolving diabetes technologies and their potential in the field. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. The Development of the E-Commerce Market in Poland in the Conditions of Intensification of Migration Processes.
- Author
-
Zatonatska, Tetiana, Fareniuk, Yana, Juscius, Vytautas, Martinkiene, Jurgita, and Maksymchuk, Olena
- Subjects
RUSSIAN invasion of Ukraine, 2022- ,CONSUMER behavior ,ELECTRONIC commerce ,SALES forecasting ,CLUSTER analysis (Statistics) ,INTERNET stores - Abstract
The war in Ukraine and the resulting migration have affected the e-commerce markets of the recipient countries, presenting both opportunities, in the form of an increased consumer base, and challenges, such as the lack of a clear development vision. This paper aims to investigate the influence of migration processes on the development of e-commerce in Poland and examine the feasibility of e-commerce companies using forecasting methods under these conditions for future activity planning. To fulfill the research objective, the following tasks were addressed: investigating the current state of e-commerce development influenced by migration processes; exploring modern migration processes and their impact on global economies; assessing the impact of migration from Ukraine on the Polish market; and analyzing a Polish online store to develop a model for forecasting data and planning activities under the influence of migration processes. To achieve this goal, three models were constructed: a multiple regression model to assess the level of influence of migration processes on e-commerce; a neural network to forecast sales for a Polish e-commerce store; and a cluster analysis to identify clusters of goods most affected by migration processes. The study analyzed the nuances of modern migration processes and assessed the reverse effect of migration as a driver of e-commerce development. Migration stimulates e-commerce by altering consumer behavior and logistics routes, increasing exports and imports, and fostering the spread of digital entrepreneurship. Using data from a Polish online store, the study modeled the impact of market changes on the company's operations and identified the most significant factors. Thus, the analysis explored the impact of migration on e-business in Poland through the constructed models. Regression analysis revealed that migration processes have contributed to the growth of the Polish online store's sales, thanks to the increase in migrant consumers and rising price levels. A neural network was developed with machine learning, incorporating macroeconomic and demographic factors into its forecasting. Cluster analysis was employed to examine the online store's assortment, identifying clusters by sales volume and migrants' influence. The analysis determined that, following the onset of the migration movement, categories experiencing a surge in demand from refugees, such as baby food products, appliances, telephones, furniture, and communication devices, saw the most significant growth. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. Compact Data Learning for Machine Learning Classifications.
- Author
-
Kim, Song-Kyoo
- Subjects
MACHINE learning ,ARTIFICIAL intelligence ,STATISTICAL accuracy ,BIG data ,CLASSIFICATION ,ARRHYTHMIA - Abstract
This paper targets the area of optimizing machine learning (ML) training data by constructing compact data. Methods of optimizing ML training have improved and become part of artificial intelligence (AI) system development. Compact data learning (CDL) is an alternative practical framework for optimizing a classification system by reducing the size of the training dataset. CDL originated from compact data design, which provides the best assets without handling complex big data. CDL is a dedicated framework for improving the speed of the machine learning training phase without affecting the accuracy of the system. The performance of an ML-based arrhythmia detection system and its variants with CDL maintained the same statistical accuracy. ML training with CDL could be maximized by applying an input dataset reduced by 85%, indicating that a trained ML system could achieve the same statistical accuracy using only 15% of the original training dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
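The abstract's central claim (comparable accuracy from roughly 15% of the training data) can be probed with a simple subsampling experiment. Plain random subsampling stands in for CDL's actual compact-data design here, and the dataset and classifier are assumptions.

```python
# Simple subsampling experiment in the spirit of the abstract's claim that a model
# trained on ~15% of the data can match one trained on the full set. Plain random
# subsampling stands in for CDL's actual compact-data design, and the dataset and
# classifier are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for fraction in (1.0, 0.15):
    if fraction < 1.0:
        X_sub, _, y_sub, _ = train_test_split(X_tr, y_tr, train_size=fraction,
                                              stratify=y_tr, random_state=0)
    else:
        X_sub, y_sub = X_tr, y_tr
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_sub, y_sub)
    print(f"train fraction {fraction:.2f}: accuracy {clf.score(X_te, y_te):.3f}")
```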
33. Urban Complexity and the Dynamic Evolution of Urban Land Functions in Yiwu City: A Micro-Analysis with Multi-Source Big Data.
- Author
-
Zhou, Liangliang, Shi, Yishao, and Xie, Mengqiu
- Subjects
URBAN land use ,CITIES & towns ,MACHINE learning ,BIG data ,URBAN planning ,CLASSIFICATION ,DATA warehousing - Abstract
The diversification of business forms leads to functional and spatial complexity in cities. The efficient determination of the complexity of an urban system is the basis for the scientific monitoring of the multi-functional aggregation within cities. Previous studies on the urban spatial structure were limited by the difficulty of collecting micro-data and the high time cost, and they focused on the macro-spatial structure, lacking fine-grained investigations of the micro-spatial structure. Additionally, high-resolution remote sensing images, which mainly rely on the textural characteristics of the spectrum of ground objects, cannot detect the social and economic functions of ground objects. Thus, it is difficult to meet the actual needs of urban planning and management. The purpose of this paper is to automatically identify the spatial heterogeneity and temporal variation of urban land use functions in the context of complex urban systems. The TF-IDF (term frequency–inverse document frequency) algorithm, a machine learning classification algorithm, and other methods are applied to identify the urban functions and distribution characteristics of the main urban area based on the POI (point of interest) data and urban form data. The results show the following: (1) From 2012 to 2022, all types of land use in Yiwu city grew at different rates, with logistics and warehousing space growing the fastest, which is in line with Yiwu's goal of building a national logistics center for trade and services. (2) The residential area has a spatial structure with a dense central circle and a scattered periphery extending from northeast to southwest and from east to west. (3) The commercial service sector shows clear spatial differentiation between the core and the periphery. The commercial functional areas of Niansanli, Houzhai, and Chengxi, where the number of commercial POIs is relatively small, are located at the intersection of the administrative subdistricts near the city center, indicating that the commercial economic activities of the downtown subdistrict have a certain spillover effect on adjacent subdistricts. (4) The public facilities of each subdistrict are generally located in the core of each subdistrict, which ensures better convenience and accessibility. (5) Industrial land with a large total area that is scattered and mixed with urban residential land gradually tends to be centralized, forming an industrial belt around the city. This study comprehensively considers the aggregation relationship between urban buildings and land use and improves the accuracy of land identification and functional zoning. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
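As a rough illustration of the TF-IDF step mentioned in the abstract above, the sketch below treats each spatial unit (e.g. a grid cell) as a "document" whose words are its POI category labels and reads off the dominant function from the largest TF-IDF weight. The toy cells and categories are invented; the paper's POI taxonomy, urban form data, and downstream classification model are not reproduced.

```python
# Hedged sketch: TF-IDF over POI category labels per spatial unit.
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

cells = {
    "cell_01": "restaurant restaurant retail hotel retail",
    "cell_02": "warehouse logistics warehouse freight",
    "cell_03": "school hospital park residential residential",
}

vec = TfidfVectorizer()
tfidf = vec.fit_transform(cells.values())          # rows: cells, cols: POI categories
terms = np.array(vec.get_feature_names_out())

for cell, row in zip(cells, tfidf.toarray()):
    dominant = terms[row.argmax()]                 # category with the largest TF-IDF weight
    print(f"{cell}: dominant POI category = {dominant}")
```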
34. Special Issue on Transportation Big Data and Its Applications.
- Author
-
Ma, Xiaolei, Chen, Xinqiang, and Dai, Zhuang
- Subjects
DEEP learning ,ELECTRIC motor buses ,INTELLIGENT transportation systems ,PUBLIC transit ridership ,BIG data ,INFRASTRUCTURE (Economics) ,MACHINE learning ,TRANSPORTATION planning - Abstract
This document is a summary of a special issue on transportation big data and its applications. The issue explores various topics related to big data in transportation, including architectures, processing methods, and utilization for tasks such as traffic pattern discovery, collision identification, and operational efficiency optimization. The issue includes sixteen papers covering a range of subjects, such as maritime traffic evaluation, urban traffic intersection extraction, and risk measurement and forecasting. The authors highlight the transformative potential of big data in transportation and express gratitude to the contributors for their insights. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
35. Prediction of Agricultural Commodity Prices using Big Data Framework.
- Author
-
Rana, Humaira, Farooq, Muhammad Umer, Kazi, Abdul Karim, Baig, Mirza Adnan, and Akhtar, Muhammad Ali
- Subjects
FARM produce prices ,AGRICULTURAL prices ,PRICES ,MACHINE learning ,AGRICULTURAL forecasts ,BIG data ,EMPLOYMENT statistics ,STANDARD deviations - Abstract
The agriculture sector plays a crucial role in the economy of Pakistan, contributing significantly to the Gross Domestic Product (GDP) and the employment rate. However, this sector faces challenges such as climate change, water scarcity, and low productivity, which have a direct impact on agricultural commodity prices. Accurate forecasting of commodity prices is essential for farmers, traders, and policymakers to make informed decisions and improve economic outcomes. This paper explores the use of a big data framework for agricultural commodity price forecasting in Pakistan, using a historical dataset of commodity prices in various Pakistani cities from 2007 to 2022 and Apache Spark to preprocess and clean the data. Based on historical spinach prices in Vehari City, the machine learning models Auto-Regressive Integrated Moving Average (ARIMA), Random Forest, and Long Short-Term Memory (LSTM) were applied to the price trends, and their performance was compared using Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the squared correlation coefficient (R²). LSTM outperformed ARIMA and Random Forest with the highest R² value of 0.8 and the lowest MAE of 125.29. Such predictions can help farmers plan crop cultivation effectively and traders make well-informed decisions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
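A minimal sketch of the evaluation protocol named in the abstract above: comparing forecasts with observed prices using MAE, MSE, RMSE and R². The price series are made-up placeholders, and the Spark preprocessing and the ARIMA / Random Forest / LSTM models themselves are not reproduced.

```python
# Hedged sketch: forecast-vs-actual error metrics for a commodity price series.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

actual   = np.array([900.0, 950.0, 1010.0, 980.0, 1100.0, 1150.0])   # hypothetical spinach prices
forecast = np.array([880.0, 970.0, 1000.0, 1005.0, 1080.0, 1170.0])  # hypothetical model output

mae  = mean_absolute_error(actual, forecast)
mse  = mean_squared_error(actual, forecast)
rmse = np.sqrt(mse)
r2   = r2_score(actual, forecast)
print(f"MAE={mae:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}  R2={r2:.3f}")
```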
36. Block size estimation for data partitioning in HPC applications using machine learning techniques.
- Author
-
Cantini, Riccardo, Marozzo, Fabrizio, Orsino, Alessio, Talia, Domenico, Trunfio, Paolo, Badia, Rosa M., Ejarque, Jorge, and Vázquez-Novoa, Fernando
- Subjects
MACHINE learning ,SUPERVISED learning ,HIGH performance computing ,DISTRIBUTED computing - Abstract
The extensive use of HPC infrastructures and frameworks for running data-intensive applications has led to a growing interest in data partitioning techniques and strategies. In fact, application performance can be heavily affected by how data are partitioned, which in turn depends on the selected size for data blocks, i.e. the block size. Therefore, finding an effective partitioning, i.e. a suitable block size, is a key strategy to speed-up parallel data-intensive applications and increase scalability. This paper describes a methodology, namely BLEST-ML (BLock size ESTimation through Machine Learning), for block size estimation that relies on supervised machine learning techniques. The proposed methodology was evaluated by designing an implementation tailored to dislib, a distributed computing library highly focused on machine learning algorithms built on top of the PyCOMPSs framework. We assessed the effectiveness of the provided implementation through an extensive experimental evaluation considering different algorithms from dislib, datasets, and infrastructures, including the MareNostrum 4 supercomputer. The results we obtained show the ability of BLEST-ML to efficiently determine a suitable way to split a given dataset, thus providing a proof of its applicability to enable the efficient execution of data-parallel applications in high performance environments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
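The block-size-estimation idea above can be caricatured as a supervised regression from dataset and infrastructure descriptors to a recommended block size, as in the hedged sketch below. The feature set, training samples and regressor are illustrative assumptions, not BLEST-ML's actual design or its dislib/PyCOMPSs integration.

```python
# Hedged sketch: learn a mapping from (dataset, infrastructure) features to block size.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Columns: [rows, cols, n_workers, memory_per_worker_GB] -> target: rows per block (invented labels).
X = np.array([
    [1e5, 50, 4, 8], [1e6, 50, 4, 8], [1e6, 500, 16, 32],
    [1e7, 100, 16, 32], [1e7, 1000, 64, 64], [1e8, 100, 64, 64],
])
y = np.array([5_000, 20_000, 10_000, 50_000, 25_000, 200_000])

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print("suggested block size:", int(model.predict([[5e6, 200, 16, 32]])[0]))
```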
37. Hybrid Clustering Algorithm Based on Improved Density Peak Clustering.
- Author
-
Guo, Limin, Qin, Weijia, Cai, Zhi, and Su, Xing
- Subjects
MACHINE learning ,DENSITY ,ALGORITHMS ,BIG data - Abstract
In the era of big data, unsupervised learning algorithms such as clustering are particularly prominent, and clustering research has advanced significantly in recent years. Density peak clustering (DPC), formally Clustering by Fast Search and Find of Density Peaks, was proposed in Science in 2014 and automatically finds cluster centers. It is simple, efficient, does not require iterative computation, and is suitable for large-scale and high-dimensional data. However, DPC and most of its refinements have several drawbacks. The method primarily considers the overall structure of the data, often resulting in the oversight of many clusters. The choice of truncation distance affects the calculation of local density values, and varying dataset sizes may necessitate different computational methods, impacting the quality of clustering results. In addition, the initial assignment of labels can cause a 'chain reaction': if one data point is incorrectly labeled, it may lead to more subsequent data points being incorrectly labeled. In this paper, we propose an improved density peak clustering method, DPC-MS, which uses the mean-shift algorithm to find local density extremes, making the accuracy of the algorithm independent of the truncation-distance parameter dc. After finding the local density extreme points, the allocation strategy of the DPC algorithm is employed to assign the remaining points to appropriate local density extreme points, forming the final clusters. The robustness of this method to uncertain dataset sizes adds application value, and several experiments were conducted on synthetic and real datasets to evaluate the performance of the proposed method. The results show that the proposed method outperforms some of the more recent methods in most cases. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
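A simplified sketch of the DPC-MS idea described above: mean-shift locates local density modes (candidate cluster centres), after which the remaining points are assigned to them. Assigning each point to its nearest mode is a simplification of DPC's higher-density-neighbour allocation strategy; the data and bandwidth are illustrative.

```python
# Simplified sketch: mean-shift finds density modes, points are assigned to the nearest mode.
import numpy as np
from sklearn.cluster import MeanShift
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=600, centers=3, cluster_std=0.8, random_state=0)

ms = MeanShift(bandwidth=1.5).fit(X)          # modes of the estimated density
centres = ms.cluster_centers_

# Assign each point to the closest density mode (stand-in for DPC's allocation step).
labels = np.argmin(np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2), axis=1)
print("found", len(centres), "density modes; cluster sizes:", np.bincount(labels))
```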
38. Deep Autoencoder-Based Hybrid Network for Building Energy Consumption Forecasting.
- Author
-
Khan, Noman, Khan, Samee Ullah, and Sung Wook Baik
- Subjects
ENERGY management ,ENERGY consumption ,DEEP learning ,CONVOLUTIONAL neural networks ,NONLINEAR analysis - Abstract
Energy management systems for residential and commercial buildings must use an appropriate and efficient model to predict energy consumption accurately. To deal with the challenges in power management, short-term Power Consumption (PC) prediction for household appliances plays a vital role in improving domestic and commercial energy efficiency. Big data applications and analytics have shown that data-driven load forecasting approaches can forecast PC in commercial and residential sectors and recognize patterns of electric usage in complex conditions. However, traditional Machine Learning (ML) algorithms and their feature engineering procedures rely on inefficient and ineffective techniques, resulting in poor generalization. Additionally, different appliances in a home behave differently under distinct circumstances, making PC forecasting more challenging. To address these challenges, this paper investigates a hybrid architecture using an unsupervised learning strategy. The architecture integrates a one-dimensional Convolutional Neural Network (CNN)-based Autoencoder (AE) and an online sequential Extreme Learning Machine (ELM) for commercial and residential short-term PC forecasting. First, the load data of different buildings are collected and cleaned of various abnormalities. A subsequent step involves the AE learning a compressed representation of spatial features and sending it to the online sequential ELM to learn nonlinear relations and forecast the final load. Finally, the proposed network is demonstrated to achieve state-of-the-art (SOTA) error metrics on two benchmark PC datasets for residential and commercial buildings. The Mean Square Error (MSE) values obtained by the proposed method are 0.0147 and 0.0121 for the residential and commercial building datasets, respectively. The obtained results prove that our model is suitable for the PC prediction of different types of buildings. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
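The hybrid pipeline above can be approximated, under stated assumptions, by the sketch below: a small 1D-CNN autoencoder compresses load windows, and an extreme-learning-machine readout (fixed random hidden layer plus least-squares output weights) forecasts the next value from the compressed features. The synthetic load signal, window length, layer sizes and batch (rather than online-sequential) ELM are all simplifying assumptions, not the paper's architecture.

```python
# Hedged sketch: 1D-CNN autoencoder features + ELM-style least-squares readout.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

rng = np.random.default_rng(0)
t = np.arange(5000)
load = 1.0 + 0.5 * np.sin(2 * np.pi * t / 96) + 0.05 * rng.standard_normal(t.size)  # synthetic PC signal

win = 96
X = np.stack([load[i:i + win] for i in range(len(load) - win)])[..., None]  # (N, 96, 1) windows
y = load[win:]                                                              # next value after each window

# 1D-CNN autoencoder; the encoder output serves as the compressed feature vector.
inp = keras.Input(shape=(win, 1))
x = layers.Conv1D(8, 5, activation="relu", padding="same")(inp)
x = layers.MaxPooling1D(4)(x)
code = layers.Flatten()(x)
x = layers.Reshape((win // 4, 8))(code)
x = layers.UpSampling1D(4)(x)
out = layers.Conv1D(1, 5, activation="linear", padding="same")(x)
ae = keras.Model(inp, out)
ae.compile(optimizer="adam", loss="mse")
ae.fit(X, X, epochs=3, batch_size=64, verbose=0)

encoder = keras.Model(inp, code)
Z = encoder.predict(X, verbose=0)

# ELM readout: fixed random hidden layer, output weights solved by least squares.
H = np.tanh(Z @ rng.standard_normal((Z.shape[1], 64)))
beta, *_ = np.linalg.lstsq(H, y, rcond=None)
print("train RMSE:", np.sqrt(np.mean((H @ beta - y) ** 2)))
```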
39. Reconciling spatiotemporal conjunction with digital twin for sequential travel time prediction and intelligent routing
- Author
-
Chen, Claire Y. T., Sun, Edward W., and Lin, Yi-Bing
- Published
- 2024
- Full Text
- View/download PDF
40. Self-paced ensemble and big data identification: a classification of substantial imbalance computational analysis.
- Author
-
Bano, Shahzadi, Zhi, Weimei, Qiu, Baozhi, Raza, Muhammad, Sehito, Nabila, Kamal, Mian Muhammad, Aldehim, Ghadah, and Alruwais, Nuha
- Subjects
MACHINE learning ,SKEWNESS (Probability theory) ,CLASSIFICATION ,BIG data - Abstract
This research paper focuses on the challenges associated with learning classifiers from large-scale, highly imbalanced datasets prevalent in many real-world applications. Traditional learning algorithms often lack the performance and computational efficiency required when dealing with imbalanced data. Factors such as class imbalance, noise, and class overlap make it demanding to learn effective classifiers. In this study, we propose a novel self-paced ensemble framework for classifying imbalanced data. The framework employs under-sampling to self-harmonize data hardness and build a robust ensemble. Extensive experimental testing demonstrates promising results in handling overlapping classes and skewed distributions while maintaining computational efficiency. The self-paced ensemble method addresses the challenges of high imbalance ratios, class overlap, and noise presence in large-scale imbalanced classification problems. By incorporating knowledge of these challenges into our learning framework, we establish the concept of a classification hardness distribution; the self-paced ensemble is a revolutionary learning paradigm for massive imbalance categorization, capable of improving the performance of existing learning algorithms on imbalanced data and providing better results for future applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
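A simplified sketch of the self-paced under-sampling idea described above: each round trains a base learner on all minority samples plus a majority subset biased towards examples the current ensemble finds hard. The actual method's hardness bins and self-paced schedule are reduced here to a simple hard-example weighting; the dataset and base learner are illustrative.

```python
# Simplified sketch: hardness-biased under-sampling ensemble for imbalanced data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=20000, weights=[0.98], flip_y=0.02, random_state=0)
maj, mino = np.where(y == 0)[0], np.where(y == 1)[0]

rng = np.random.default_rng(0)
ensemble, n_rounds = [], 10
for i in range(n_rounds):
    if not ensemble:
        pick = rng.choice(maj, size=len(mino), replace=False)     # first round: plain random under-sampling
    else:
        # "Hardness" of majority points = current ensemble's probability of the minority class.
        proba = np.mean([clf.predict_proba(X[maj])[:, 1] for clf in ensemble], axis=0)
        weights = proba + 1e-3
        pick = rng.choice(maj, size=len(mino), replace=False, p=weights / weights.sum())
    idx = np.concatenate([pick, mino])
    clf = DecisionTreeClassifier(max_depth=6, random_state=i).fit(X[idx], y[idx])
    ensemble.append(clf)

score = np.mean([clf.predict_proba(X)[:, 1] for clf in ensemble], axis=0)
print("ensemble of", len(ensemble), "trees; mean minority score on true minority:",
      round(score[mino].mean(), 3))
```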
41. Research on Employee Performance Management Method Based on Big Data Improvement GWO-DELM Algorithms.
- Author
-
Zhuyu Wang and Yue Liu
- Subjects
JOB performance ,BIG data ,PERFORMANCE management ,PERSONNEL management ,DEEP learning ,OPTIMIZATION algorithms ,MACHINE learning - Abstract
INTRODUCTION: Accurate and objective human resources performance management evaluation methods support a comprehensive understanding of the real, objective situation of teachers and help identify their management, teaching and academic levels, enabling teacher managers to clearly understand the gaps and problems among teachers. OBJECTIVES: Current human resources performance management evaluation methods suffer from evaluation indexes that lack objectivity, poor precision, reliance on a single method, and other problems. METHODS: This research proposes a human resources performance management evaluation method based on a deep extreme learning machine network improved by an intelligent optimisation algorithm. (1) The problems in current human resources performance management are analysed, evaluation indexes are selected, and an evaluation system is constructed; (2) a multi-strategy grey wolf optimization algorithm is used to improve the deep learning network and build the evaluation model for human resources performance management in colleges; (3) simulation experiments verify the high precision and real-time performance of the proposed method. RESULTS: The results show that the proposed method improves the precision of the evaluation model and reduces the prediction time. CONCLUSION: This research addresses the low precision and non-objective system indicators of human resource performance management evaluation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
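The grey wolf optimization (GWO) component mentioned above is a standard population-based metaheuristic; the sketch below shows a plain GWO minimising a simple test function as a stand-in for tuning the deep extreme learning machine. The paper's multi-strategy improvements and the DELM network itself are not reproduced; the bounds, budget and objective are illustrative.

```python
# Hedged sketch: plain grey wolf optimizer on a sphere test function.
import numpy as np

def gwo(objective, dim, n_wolves=20, iters=100, lb=-5.0, ub=5.0, seed=0):
    rng = np.random.default_rng(seed)
    wolves = rng.uniform(lb, ub, size=(n_wolves, dim))
    for t in range(iters):
        fitness = np.array([objective(w) for w in wolves])
        order = np.argsort(fitness)
        alpha, beta, delta = wolves[order[0]], wolves[order[1]], wolves[order[2]]  # three best wolves
        a = 2 - 2 * t / iters                                    # coefficient decreasing from 2 to 0
        new_pos = np.zeros_like(wolves)
        for leader in (alpha, beta, delta):
            r1, r2 = rng.random(wolves.shape), rng.random(wolves.shape)
            A, C = 2 * a * r1 - a, 2 * r2
            new_pos += leader - A * np.abs(C * leader - wolves)  # move towards each leading wolf
        wolves = np.clip(new_pos / 3.0, lb, ub)
    fitness = np.array([objective(w) for w in wolves])
    return wolves[fitness.argmin()], fitness.min()

best_x, best_f = gwo(lambda x: np.sum(x ** 2), dim=5)            # sphere function as a stand-in objective
print("best value found:", round(best_f, 6))
```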
42. ICS-IDS: application of big data analysis in AI-based intrusion detection systems to identify cyberattacks in ICS networks
- Author
-
Ali, Bakht Sher, Ullah, Inam, Al Shloul, Tamara, Khan, Izhar Ahmed, Khan, Ijaz, Ghadi, Yazeed Yasin, Abdusalomov, Akmalbek, Nasimov, Rashid, Ouahada, Khmaies, and Hamam, Habib
- Published
- 2024
- Full Text
- View/download PDF
43. A Consumer Behavior Analysis Framework toward Improving Market Performance Indicators: Saudi's Retail Sector as a Case Study.
- Author
-
Alawadh, Monerah and Barnawi, Ahmed
- Subjects
CONSUMER behavior ,BEHAVIORAL assessment ,BIG data ,CONSUMERS ,TRANSACTION records ,DATA analysis - Abstract
Studying customer behavior and anticipating future trends is a challenging task, as customer behavior is complex and constantly evolving. To effectively anticipate future trends, businesses need to analyze large amounts of data, use sophisticated analytical techniques, and stay up-to-date with the latest research and industry trends. In this paper, we propose a comprehensive framework to identify trends in consumer behavior using multiple layers of processing, including clustering, classification, and association rule learning. The aim is to help a major retailer in Saudi Arabia better understand customer behavior by utilizing the power of big data analysis. The proposed framework is presented as being generalized to gain insight into the generated big data and enable data-driven decision-making in other relevant domains. We developed this framework in collaboration with a large supermarket chain in Saudi Arabia, which provided us with over 1,000,000 sales transaction records belonging to around 30,000 of their loyal customers. In this study, we apply our proposed framework to those data as a case study and present our initial results of consumer clustering and association rules for each cluster. Moreover, we analyze our findings to figure out how we can further utilize intelligence to predict customer behavior in clustered groups. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
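A hedged sketch of the layered framework above: customers are first clustered on simple behavioural features, then pairwise association rules (support and confidence) are mined from one segment's baskets. The customer features, baskets and thresholds are invented toy data, and the tiny pairwise rule miner below is a stand-in for a full apriori-style algorithm; the retailer's real transaction schema and the framework's classification layer are not reproduced.

```python
# Hedged sketch: behavioural clustering followed by simple pairwise association rules.
from itertools import combinations
from collections import Counter
import numpy as np
from sklearn.cluster import KMeans

# Layer 1: clustering on simple behavioural features (monthly spend, visits, basket size).
rng = np.random.default_rng(0)
centres = np.array([[300.0, 4.0, 12.0], [80.0, 1.0, 3.0]])          # two invented behavioural profiles
features = rng.normal(centres[:, None, :], 10.0, size=(2, 100, 3)).reshape(-1, 3)
segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print("segment sizes:", np.bincount(segments))

# Layer 2: pairwise association rules inside one segment's (toy) baskets.
baskets = [{"rice", "oil", "tea"}, {"rice", "oil"}, {"rice", "dates"}, {"oil", "tea"}, {"rice", "oil", "dates"}]
item_count = Counter(i for b in baskets for i in b)
pair_count = Counter(frozenset(p) for b in baskets for p in combinations(sorted(b), 2))

for pair, cnt in pair_count.items():
    a, b = sorted(pair)
    support = cnt / len(baskets)
    conf_ab, conf_ba = cnt / item_count[a], cnt / item_count[b]
    if support >= 0.4 and max(conf_ab, conf_ba) >= 0.7:
        print(f"{a} <-> {b}: support={support:.2f}, conf({a}->{b})={conf_ab:.2f}, conf({b}->{a})={conf_ba:.2f}")
```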
44. Deep Learning in Asset Pricing.
- Author
-
Chen, Luyang, Pelger, Markus, and Zhu, Jason
- Subjects
ARTIFICIAL neural networks ,DEEP learning ,PRICES ,RATE of return on stocks ,STOCK price forecasting ,SHARPE ratio - Abstract
We use deep neural networks to estimate an asset pricing model for individual stock returns that takes advantage of the vast amount of conditioning information, keeps a fully flexible form, and accounts for time variation. The key innovations are to use the fundamental no-arbitrage condition as criterion function to construct the most informative test assets with an adversarial approach and to extract the states of the economy from many macroeconomic time series. Our asset pricing model outperforms out-of-sample all benchmark approaches in terms of Sharpe ratio, explained variation, and pricing errors and identifies the key factors that drive asset prices. This paper was accepted by Agostino Capponi, finance. Supplemental Material: The online appendix and data are available at https://doi.org/10.1287/mnsc.2023.4695. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. Big data analytics deep learning techniques and applications: A survey.
- Author
-
Selmy, Hend A., Mohamed, Hoda K., and Medhat, Walaa
- Subjects
DEEP learning ,BIG data ,IMAGE recognition (Computer vision) ,MACHINE learning - Abstract
• This paper provides an in-depth review of the latest deep learning methods for use in big data analytics.
• Explains the importance of deep learning, its taxonomy, and big data analytics techniques.
• Explores deep learning approaches in IoT data applications, including their complexities and limitations.
• Suggests deep learning techniques for many data-intensive applications and reviews benchmarked frameworks and datasets.
• Offers a comparison of established approaches with deep learning methods in big data analytics.
Deep learning (DL), as one of the most active machine learning research fields, has achieved great success in numerous scientific and technological disciplines, including speech recognition, image classification, language processing, big data analytics, and many more. Big data analytics (BDA), where raw data is often unlabeled or uncategorized, can greatly benefit from DL because of its ability to analyze and learn from enormous amounts of unstructured data. This survey paper provides a comprehensive overview of state-of-the-art DL techniques applied in BDA. Its main target is to illustrate the significance of DL and its taxonomy and to detail the basic techniques used in BDA. It also explains the DL techniques used in big IoT data applications as well as their various complexities and challenges. The survey presents various real-world data-intensive applications where DL techniques can be applied. In particular, it concentrates on the DL techniques in accordance with the BDA type for each application domain. Additionally, the survey examines DL benchmarked frameworks used in BDA and reviews the available benchmarked datasets, besides analyzing the strengths and limitations of each DL technique and their suitable applications. Further, a comparative analysis is presented by comparing existing approaches to the DL methods used in BDA. Finally, the challenges of DL modeling and future directions are discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
46. Bulk Power Systems Emergency Control Based on Machine Learning Algorithms and Phasor Measurement Units Data: A State-of-the-Art Review
- Author
-
Mihail Senyuk, Svetlana Beryozkina, Murodbek Safaraliev, Andrey Pazderin, Ismoil Odinaev, Viktor Klassen, Alena Savosina, and Firuz Kamalov
- Subjects
power system ,big data ,machine learning ,emergency control ,synchronous generator ,small signal stability ,Technology - Abstract
Modern electrical power systems are characterized by a high rate of transient processes, the use of digital monitoring and control systems, and the accumulation of a large amount of technological information. The active integration of renewable energy sources reduces the inertia of power systems and changes the nature of transient processes. As a result, the effectiveness of emergency control systems decreases. Traditional emergency control systems operate based on the numerical analysis of power system dynamic models. This allows for finding the optimal set of preventive commands (solutions) in the form of disconnections of generating units, consumers, transmission lines, and other primary grid equipment, thereby maintaining the steady-state or transient stability of a power system. After the active integration of renewable sources into power systems, traditional emergency control algorithms became ineffective due to the time delay in finding the optimal set of control actions. Currently, machine learning algorithms are being developed that provide high performance and adaptability. This paper contains a meta-analysis of modern emergency control algorithms for power systems based on machine learning and synchronized phasor measurement data. It describes algorithms for detecting disturbances in the power system and for selecting control actions to maintain transient and steady-state stability, voltage stability, and frequency limits. The study examines 53 studies on the development of methodologies for analyzing the stability of power systems based on ML algorithms. The analysis is carried out in terms of accuracy, computational latency, and the data used in training and testing. The most frequently used test mathematical models of power systems are identified, and the ML algorithms most suitable for use in the real-time operational control loop of power systems are determined. This paper also provides an analysis of the advantages and disadvantages of existing algorithms, as well as identifying areas for further research.
- Published
- 2024
- Full Text
- View/download PDF
47. AI, Data, and Digitalization
- Author
-
Akerkar, Rajendra
- Subjects
data ,big data ,artificial intelligence ,ethical AI ,causal effects ,mobile data ,geolocation data ,social data ,machine learning ,knowledge graphs ,thema EDItEUR::U Computing and Information Technology::UK Computer hardware ,thema EDItEUR::U Computing and Information Technology::UK Computer hardware::UKN Network hardware ,thema EDItEUR::U Computing and Information Technology::UY Computer science::UYD Systems analysis and design ,thema EDItEUR::U Computing and Information Technology::UM Computer programming / software engineering::UMB Algorithms and data structures ,thema EDItEUR::G Reference, Information and Interdisciplinary subjects::GP Research and information: general::GPF Information theory ,thema EDItEUR::G Reference, Information and Interdisciplinary subjects::GP Research and information: general::GPJ Coding theory and cryptology - Abstract
This open access book constitutes the revised selected papers of the First International Symposium on AI, Data and Digitalization, SAIDD 2023, held in Sogndal, Norway, during May 9–10, 2023. The 13 full papers included in this volume were carefully reviewed and selected from 42 submissions. The papers deal with the impact of data and AI on the digital revolution and their contribution to solving societal challenges.
- Published
- 2024
- Full Text
- View/download PDF
48. Building RadiologyNET: an unsupervised approach to annotating a large-scale multimodal medical database.
- Author
-
Napravnik, Mateja, Hržić, Franko, Tschauner, Sebastian, and Štajduhar, Ivan
- Subjects
COMPUTER-aided diagnosis ,COMPUTER-assisted image analysis (Medicine) ,RADIOLOGY ,DIAGNOSTIC imaging ,MULTIMODAL user interfaces ,MACHINE learning ,MEDICAL databases - Abstract
Background: The use of machine learning in medical diagnosis and treatment has grown significantly in recent years with the development of computer-aided diagnosis systems, often based on annotated medical radiology images. However, the lack of large annotated image datasets remains a major obstacle, as the annotation process is time-consuming and costly. This study aims to overcome this challenge by proposing an automated method for annotating a large database of medical radiology images based on their semantic similarity. Results: An automated, unsupervised approach is used to create a large annotated dataset of medical radiology images originating from the Clinical Hospital Centre Rijeka, Croatia. The pipeline is built by data-mining three different types of medical data: images, DICOM metadata and narrative diagnoses. The optimal feature extractors are then integrated into a multimodal representation, which is then clustered to create an automated pipeline for labelling a precursor dataset of 1,337,926 medical images into 50 clusters of visually similar images. The quality of the clusters is assessed by examining their homogeneity and mutual information, taking into account the anatomical region and modality representation. Conclusions: The results indicate that fusing the embeddings of all three data sources together provides the best results for the task of unsupervised clustering of large-scale medical data and leads to the most concise clusters. Hence, this work marks the initial step towards building a much larger and more fine-grained annotated dataset of medical radiology images. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
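The fusion-and-clustering step described above can be sketched as follows: embeddings from three modalities are concatenated, clustered, and scored against a reference labelling with homogeneity and mutual-information measures from scikit-learn. The random embeddings, the stand-in "anatomical region" labels and the reduced cluster count are placeholders; RadiologyNET's actual image, DICOM-metadata and diagnosis-text extractors are not reproduced.

```python
# Hedged sketch: fuse multimodal embeddings, cluster, and score against a reference labelling.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import homogeneity_score, adjusted_mutual_info_score

rng = np.random.default_rng(0)
n = 2000
region = rng.integers(0, 5, size=n)                        # stand-in for anatomical region labels

# Placeholder embeddings that weakly encode the region (image, DICOM metadata, diagnosis text).
img  = rng.normal(size=(n, 64)) + region[:, None] * 0.5
meta = rng.normal(size=(n, 16)) + region[:, None] * 0.5
text = rng.normal(size=(n, 32)) + region[:, None] * 0.5

fused = np.hstack([img, meta, text])                       # multimodal representation
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(fused)

print("homogeneity vs. region:", round(homogeneity_score(region, labels), 3))
print("adjusted mutual info  :", round(adjusted_mutual_info_score(region, labels), 3))
```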
49. Analysis of Campus Catering Data Using Machine Learning.
- Author
-
Chien-Min Chen, Chen-Sheng Li, and Shih-Pang Tseng
- Subjects
REAL economy ,BIG data ,DECISION trees ,CATERING services ,DATA analysis - Abstract
At present, sensors and big data create powerful systems capable of real-time monitoring and decision-making. Big data is developing rapidly across all sectors; its technologies and applications are gradually being accepted by the public, and the data industry is maturing and entering a stage of rapid development. At the same time, the development of the Internet is making data analysis more accurate, and the two complement each other, contributing to the healthy development of big data. Machine learning technology has likewise developed rapidly and is now applied widely across industries. In this work, we collect the business data of campus restaurants in different time periods to ensure the breadth and depth of the data. The regression and decision tree algorithms in machine learning are used to integrate and analyze the collected data to reflect the demand tendencies of the general public and the relationship between cost and benefit. We analyze the catering needs of different consumer groups, develop big data applications with greater potential value, and seek long-term development for catering enterprises in the real economy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
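As a rough illustration of the analysis described above, the sketch below fits a linear regression and a decision tree to per-period sales records and compares their fit on held-out periods. The hour/weekday/price features and sales figures are invented placeholders, not the campus restaurants' actual business data.

```python
# Hedged sketch: regression vs. decision tree on synthetic per-period sales records.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hour = rng.integers(7, 21, size=500)
weekday = rng.integers(0, 7, size=500)
price = rng.uniform(3, 12, size=500)
# Synthetic sales: meal-time peaks, weekday boost, mild price sensitivity, noise.
sales = 40 + 30 * np.isin(hour, [12, 13, 18]) - 2 * price + 5 * (weekday < 5) + rng.normal(0, 5, 500)

X = np.column_stack([hour, weekday, price])
X_tr, X_te, y_tr, y_te = train_test_split(X, sales, test_size=0.3, random_state=0)

for model in (LinearRegression(), DecisionTreeRegressor(max_depth=4, random_state=0)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, "R² on held-out periods:", round(model.score(X_te, y_te), 3))
```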
50. (Re)-discovering simulation as a critical element of OM/SCM research: call for research.
- Author
-
Melnyk, Steven Alexander, Thürer, Matthias, Blome, Constantin, Schoenherr, Tobias, and Gold, Stefan
- Subjects
SUPPLY chain management ,PRODUCTION management (Manufacturing) ,ARTIFICIAL intelligence ,OPERATIONS management ,MACHINE learning - Abstract
Purpose: This study focuses on (re-)introducing computer simulation as a part of the research paradigm. Simulation is a widely applied research method in supply chain and operations management. However, leading journals, such as the International Journal of Operations and Production Management, have often been reluctant to accept simulation studies. This study provides guidelines on how to conduct simulation research that advances theory, is relevant, and matters. Design/methodology/approach: This study pooled the viewpoints of the editorial team of the International Journal of Operations and Production Management and authors of simulation studies. The authors debated their views and outlined why simulation is important and what a compelling simulation should look like. Findings: There is an increasing importance of considering uncertainty, an increasing interest in dynamic phenomena, such as the transient response(s) to disruptions, and an increasing need to consider complementary outcomes, such as sustainability, which many researchers believe can be tackled by big data and modern analytical tools. But building, elaborating, and testing theory by purposeful experimentation is the strength of computer simulation. The authors therefore argue that simulation should play an important role in supply chain and operations management research, but for this, it also has to evolve away from simply generating and analyzing data. Four types of simulation research with much promise are outlined: empirical grounded simulation, simulation that establishes causality, simulation that supplements machine learning, artificial intelligence and analytics and simulation for sensitive environments. Originality/value: This study identifies reasons why simulation is important for understanding and responding to today's business and societal challenges, it provides some guidance on how to design good simulation studies in this context and it links simulation to empirical research and theory going beyond multimethod studies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF