412 results for "data synthesis"
Search Results
2. TabSAL: Synthesizing Tabular data with Small agent Assisted Language models
- Author
- Li, Jiale, Qian, Run, Tan, Yandan, Li, Zhixin, Chen, Luyu, Liu, Sen, Wu, Jie, and Chai, Hongfeng
- Published
- 2024
- Full Text
- View/download PDF
3. User-perceptional privacy protection in NILM: A differential privacy approach
- Author
- Zhang, Jiahao, Lu, Chenbei, Yi, Hongyu, and Wu, Chenye
- Published
- 2025
- Full Text
- View/download PDF
4. Current strategies to address data scarcity in artificial intelligence-based drug discovery: A comprehensive review
- Author
- Gangwal, Amit, Ansari, Azim, Ahmad, Iqrar, Azad, Abul Kalam, and Wan Sulaiman, Wan Mohd Azizi
- Published
- 2024
- Full Text
- View/download PDF
5. A novel robust data synthesis method based on feature subspace interpolation to optimize samples with unknown noise
- Author
- Du, Yukun, Cai, Yitao, Jin, Xiao, Yu, Haiyue, Lou, Zhilong, Li, Yao, Jiang, Jiang, and Wang, Yongxiong
- Published
- 2025
- Full Text
- View/download PDF
6. Comparative Analysis of Systematic, Scoping, Umbrella, and Narrative Reviews in Clinical Research: Critical Considerations and Future Directions.
- Author
- Motevalli, Mohamad and Xie, Zhongqiu
- Abstract
Review studies play a key role in the development of clinical practice by synthesizing data and drawing conclusions from multiple scientific sources. In recent decades, there has been a significant increase in the number of review studies conducted and published by researchers. In clinical research, different types of review studies (systematic, scoping, umbrella, and narrative reviews) are conducted with different objectives and methodologies. Despite the abundance of guidelines for conducting review studies, researchers often face challenges in selecting the most appropriate review method, mainly due to their overlapping characteristics, including the complexity of matching review types to specific research questions. The aim of this article is to compare the main features of systematic, scoping, umbrella, and narrative reviews in clinical research and to address key considerations for selecting the most appropriate review approach. It also discusses future opportunities for updating their strategies based on emerging trends in clinical research. Understanding the differences between review approaches will help researchers, practitioners, journalists, and policymakers to effectively navigate the complex and evolving field of scientific research, leading to informed decisions that ultimately enhance the overall quality of healthcare practices. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
7. Synthesis of higher‐B0 CEST Z‐spectra from lower‐B0 data via deep learning and singular value decomposition.
- Author
- Yan, Mengdi, Bie, Chongxue, Jia, Wentao, Liu, Chuyu, He, Xiaowei, and Song, Xiaolei
- Subjects
- ARTIFICIAL neural networks, SINGULAR value decomposition, MAGNETIZATION transfer, DEEP learning, EGG whites
- Abstract
Chemical exchange saturation transfer (CEST) MRI at 3 T suffers from low specificity due to overlapping CEST effects from multiple metabolites, while higher field strengths (B0) allow for better separation of Z‐spectral "peaks," aiding signal interpretation and quantification. However, data acquisition at higher B0 is restricted by equipment access, field inhomogeneity and safety issues. Herein, we aim to synthesize higher‐B0 Z‐spectra from readily available data acquired with 3 T clinical scanners using a deep learning framework. Trained with simulation data using models based on Bloch–McConnell equations, this framework comprised two deep neural networks (DNNs) and a singular value decomposition (SVD) module. The first DNN identified B0 shifts in Z‐spectra and aligned them to correct frequencies. After B0 correction, the lower‐B0 Z‐spectra were streamlined to the second DNN, casting into the key feature representations of higher‐B0 Z‐spectra, obtained through SVD truncation. Finally, the complete higher‐B0 Z‐spectra were recovered from inverse SVD, given the low-rank property of Z‐spectra. This study constructed and validated two models, a phosphocreatine (PCr) model and a pseudo‐in‐vivo one. Each experimental dataset, including PCr phantoms, egg white phantoms, and in vivo rat brains, was sequentially acquired on a 3 T human and a 9.4 T animal scanner. Results demonstrated that the synthetic 9.4 T Z‐spectra were almost identical to the experimental ground truth, showing low RMSE (0.11% ± 0.0013% for seven PCr tubes, 1.8% ± 0.2% for three egg white tubes, and 0.79% ± 0.54% for three rat slices) and high R2 (>0.99). The synthesized amide and NOE contrast maps, calculated using the Lorentzian difference, were also well matched with the experiments. Additionally, the synthesis model exhibited robustness to B0 inhomogeneities, noise, and other acquisition imperfections. In conclusion, the proposed framework enables synthesis of higher‐B0 Z‐spectra from lower‐B0 ones, which may facilitate CEST MRI quantification and applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
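The SVD step in the entry above (truncating Z‐spectra to a low-rank feature representation and recovering full spectra by inverse SVD) can be illustrated with a minimal NumPy sketch. The truncation rank and the random training matrix are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Rows: training Z-spectra; columns: saturation frequency offsets.
# Random data stands in for simulated Bloch-McConnell spectra (assumption).
Z = np.random.rand(500, 64)

U, s, Vt = np.linalg.svd(Z, full_matrices=False)
k = 8  # illustrative truncation rank, exploiting the low-rank property

def compress(z):
    """Project a Z-spectrum onto the first k right-singular vectors."""
    return z @ Vt[:k].T          # shape (k,): the compact feature representation

def reconstruct(coeffs):
    """Recover a full Z-spectrum from its k SVD coefficients (inverse SVD)."""
    return coeffs @ Vt[:k]       # shape (64,)

z = Z[0]
z_hat = reconstruct(compress(z))
print(np.max(np.abs(z - z_hat)))  # small if the spectra are close to rank k
```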
8. Ursodeoxycholic acid and COVID-19 outcomes: a cohort study and data synthesis of state-of-art evidence.
- Author
- Yu, Yang, Li, Guo-Fu, Li, Jian, Han, Lu-Yao, Zhang, Zhi-Long, Liu, Tian-Shuo, Jiao, Shu-Xin, Qiao, Yu-Wei, Zhang, Na, Zhan, De-Chuan, Tang, Shao-Qiu, and Yu, Guo
- Abstract
Background: The potential of ursodeoxycholic acid (UDCA) in inhibiting angiotensin-converting enzyme 2 was demonstrated. However, conflicting evidence emerged regarding the association between UDCA and COVID-19 outcomes, prompting the need for a comprehensive investigation. Research design and methods: Patients diagnosed with COVID-19 infection were retrospectively analyzed and divided into two groups: the UDCA-treated group and the control group. Kaplan–Meier recovery analysis and Cox proportional hazards models were used to evaluate the recovery time and hazard ratios. Additionally, study-level pooled analyses for multiple clinical outcomes were performed. Results: In the 115-patient cohort, UDCA treatment was significantly associated with a reduced recovery time. The subgroup analysis suggests that the 300 mg subgroup had a significant (adjusted hazard ratio: 1.63 [95% CI, 1.01 to 2.60]) benefit with a shorter duration of fever. The results of pooled analyses also show that UDCA treatment can significantly reduce the incidence of severe/critical diseases in COVID-19 (adjusted odds ratio: 0.68 [95% CI, 0.50 to 0.94]). Conclusions: UDCA treatment notably improves the recovery time following an Omicron strain infection without observed safety concerns. These promising results advocate for UDCA as a viable treatment for COVID-19, paving the way for further large-scale and prospective research to explore the full potential of UDCA. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
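The cohort analysis in the entry above relies on Kaplan–Meier recovery curves and Cox proportional hazards models; a minimal sketch of that workflow with the `lifelines` library follows. The column names and data are hypothetical, not the study's.

```python
import pandas as pd
from lifelines import CoxPHFitter, KaplanMeierFitter

# Hypothetical cohort: days to recovery, recovery indicator, treatment flag.
df = pd.DataFrame({
    "recovery_days": [7, 10, 5, 14, 9, 6, 12, 8],
    "recovered":     [1, 1, 1, 0, 1, 1, 1, 1],
    "udca":          [1, 0, 1, 0, 1, 1, 0, 0],
    "age":           [54, 61, 47, 70, 58, 49, 66, 52],
})

# Kaplan-Meier recovery curve for the treated subgroup
km = KaplanMeierFitter()
treated = df[df["udca"] == 1]
km.fit(treated["recovery_days"], event_observed=treated["recovered"])

# Cox model: hazard ratio for UDCA adjusted for age
cph = CoxPHFitter()
cph.fit(df, duration_col="recovery_days", event_col="recovered")
cph.print_summary()  # exponentiated coefficients are the hazard ratios
```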
9. Predicting blood transfusions for coronary artery bypass graft patients using deep neural networks and synthetic data.
- Author
- Tsai, Hsiao-Tien, Wu, Jichong, Gupta, Puneet, Heinz, Eric R., and Jafari, Amir
- Subjects
- ARTIFICIAL neural networks, CORONARY artery bypass, BLOOD transfusion, CARDIAC surgery
- Abstract
Coronary Artery Bypass Graft (CABG) is a common cardiac surgery, but it continues to have many associated risks, including the need for blood transfusions. Previous research has shown that blood transfusion during CABG surgery is associated with an increased risk for infection and mortality. The current study aims to use modern techniques, such as deep neural networks and data synthesis, to develop models that can best predict the need for blood transfusion among CABG patients. Results show that neural networks with synthetic data generated by DataSynthesizer have the best performance. Implications of results and future directions are discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Not just to know more, but to also know better: How data analysis-synthesis can be woven into sport science practiced as an art of inquiry.
- Author
- Sullivan, Mark O., Vaughan, James, and Woods, Carl T.
- Subjects
- DATA analysis, SPORTS sciences, RESEARCH, INQUIRY (Theory of knowledge), LEARNING, SPORTS, SOCIOCULTURAL factors
- Abstract
Utilising novel ways of knowing, aligned with an ecological approach, the Learning in Development Research Framework (LDRF) has been introduced as a different way to guide research and practice in sport. A central feature of this framework is an appreciation of researcher embeddedness, positioned as an inhabitant who follows along with the unfolding inquiry. This positioning is integral for enriching one's understanding of the relations between socio-cultural constraints and affordances for skill learning within a sports organisation. Moreover, the notion of embeddedness foregrounds the ongoing nature of inquiry when practiced as an art of inquiry. In an effort to extend these ideas, this paper highlights how a phronetic iterative approach to data analysis-synthesis could be undertaken, while ensuring that the researcher remains 'in touch' with a phenomenon, and thus faithful to key tenets of research practiced as an art of inquiry. To illustrate this, we present a 'walk-through' from a recent LDRF study. Rather than focusing on data collection or recorded observations made from afar, this walk-through shows how a researcher, practicing an art of inquiry, can grow knowledge of and with the phenomena, enriching the evolution of practice and performance from within an ecology of relations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Rapid identification of aircraft structural damage states based on data synthesis (基于数据合成的飞行器结构损伤状态快速识别方法).
- Author
- 王浩渊, 粟华, 李鹏, and 龚春林
- Subjects
- DIGITAL twins, AIRFRAMES, STRUCTURAL health monitoring, DATABASES, PROBLEM solving
- Published
- 2024
- Full Text
- View/download PDF
12. Data-Centric Benchmarking of Neural Network Architectures for the Univariate Time Series Forecasting Task
- Author
- Philipp Schlieper, Mischa Dombrowski, An Nguyen, Dario Zanca, and Bjoern Eskofier
- Subjects
- deep learning, time series, neural networks, model selection, data synthesis, univariate forecasting, Science (General), Q1-390, Mathematics, QA1-939
- Abstract
Time series forecasting has witnessed a rapid proliferation of novel neural network approaches in recent times. However, performances in terms of benchmarking results are generally not consistent, and it is complicated to determine in which cases one approach fits better than another. Therefore, we propose adopting a data-centric perspective for benchmarking neural network architectures on time series forecasting by generating ad hoc synthetic datasets. In particular, we combine sinusoidal functions to synthesize univariate time series data for multi-input-multi-output prediction tasks. We compare the most popular architectures for time series, namely long short-term memory (LSTM) networks, convolutional neural networks (CNNs), and transformers, and directly connect their performance with different controlled data characteristics, such as the sequence length, noise and frequency, and delay length. Our findings suggest that transformers are the best architecture for dealing with different delay lengths. In contrast, for different noise and frequency levels and different sequence lengths, LSTM is the best-performing architecture by a significant amount. Based on our insights, we derive recommendations which allow machine learning (ML) practitioners to decide which architecture to apply, given the dataset’s characteristics.
- Published
- 2024
- Full Text
- View/download PDF
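A minimal sketch of the data-centric idea in the entry above: composing sinusoids with controlled frequency, noise, and delay, then windowing the series into multi-input-multi-output samples. All parameter values are illustrative, not those used in the paper.

```python
import numpy as np

def make_series(n=2000, freqs=(0.01, 0.05), noise_std=0.1, seed=0):
    """Sum of sinusoids with additive Gaussian noise (controlled characteristics)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    x = sum(np.sin(2 * np.pi * f * t) for f in freqs)
    return x + rng.normal(0, noise_std, n)

def windowize(x, in_len=64, out_len=16, delay=0):
    """Slice a series into (input window, delayed output window) pairs."""
    X, Y = [], []
    for i in range(len(x) - in_len - delay - out_len):
        X.append(x[i : i + in_len])
        Y.append(x[i + in_len + delay : i + in_len + delay + out_len])
    return np.array(X), np.array(Y)

X, Y = windowize(make_series(), delay=8)
print(X.shape, Y.shape)  # (1912, 64) (1912, 16)
```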
13. Data synthesis for biodiversity science: a database on plant diversity of the Indian Himalayan Region.
- Author
- Wani, Sajad Ahmad, Khuroo, Anzar Ahmad, Zaffar, Nowsheena, Rafiqi, Safoora, Farooq, Iram, Afzal, Shahida, and Rashid, Irfan
- Subjects
- LIFE history theory, GLOBAL environmental change, PLANT diversity, NUMBERS of species, SPECIES diversity
- Abstract
In an era of global environmental change, empirical synthesis of biodiversity data across geographic scales and taxonomic groups is urgently required. Recently, with an upsurge in data synthesis, substantial progress has been made in making massive biodiversity data available on a global scale. However, most of these databases lack sufficient geographic coverage, particularly from biodiversity hotspot regions of developing countries. Here, we present a comprehensive and curated plant database of the Indian Himalayan Region (IHR) – home to two global biodiversity hotspots. The database, currently comprising 11,743 native plant species, has been collated from an extensive quantitative synthesis of 324 floristic studies published between 1872 and 2022. Based on this database, we investigate the patterns of species richness, distribution, life-history traits, endemic and threat status of the native flora of the IHR, and the results revealed that these patterns vary considerably among the 12 states of the IHR. Sikkim harbours the highest number of plant species (5090), followed by Arunachal Pradesh (4907). We found a total of 1123 species (ca. 10%) endemic to India and 157 threatened species occurring in the IHR. The life-history traits (growth form and lifespan) were unequally represented between the Himalaya and the Indo-Burma hotspots. We found herbs as the dominant growth form across the IHR. Also, maximum species similarity was found between Jammu and Kashmir and Himachal Pradesh (Cs = 0.66), and minimum between the former and Meghalaya (Cs = 0.13). Overall, our study represents a significant step forward in filling the knowledge gaps from the global biodiversity hotspots in India, with immense management and policy implications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
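The pairwise similarity values reported in the entry above (e.g., Cs = 0.66 between Jammu and Kashmir and Himachal Pradesh) are consistent with Sørensen's coefficient, Cs = 2c/(a + b), where c is the number of shared species and a, b are the species counts of the two regions; the abstract does not name the index, so this reading is an assumption. A one-line sketch on made-up species lists:

```python
def sorensen(a: set, b: set) -> float:
    """Sorensen similarity between two species lists (assumed definition of Cs)."""
    return 2 * len(a & b) / (len(a) + len(b))

print(sorensen({"Pinus wallichiana", "Cedrus deodara", "Abies pindrow"},
               {"Cedrus deodara", "Abies pindrow", "Betula utilis"}))  # 0.666...
```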
14. Data-Centric Benchmarking of Neural Network Architectures for the Univariate Time Series Forecasting Task.
- Author
- Schlieper, Philipp, Dombrowski, Mischa, Nguyen, An, Zanca, Dario, and Eskofier, Bjoern
- Subjects
- CONVOLUTIONAL neural networks, DEEP learning, TIME series analysis, TRANSFORMER models, MACHINE learning
- Abstract
Time series forecasting has witnessed a rapid proliferation of novel neural network approaches in recent times. However, performances in terms of benchmarking results are generally not consistent, and it is complicated to determine in which cases one approach fits better than another. Therefore, we propose adopting a data-centric perspective for benchmarking neural network architectures on time series forecasting by generating ad hoc synthetic datasets. In particular, we combine sinusoidal functions to synthesize univariate time series data for multi-input-multi-output prediction tasks. We compare the most popular architectures for time series, namely long short-term memory (LSTM) networks, convolutional neural networks (CNNs), and transformers, and directly connect their performance with different controlled data characteristics, such as the sequence length, noise and frequency, and delay length. Our findings suggest that transformers are the best architecture for dealing with different delay lengths. In contrast, for different noise and frequency levels and different sequence lengths, LSTM is the best-performing architecture by a significant amount. Based on our insights, we derive recommendations which allow machine learning (ML) practitioners to decide which architecture to apply, given the dataset's characteristics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. Using machine learning to reveal seasonal nutrient dynamics and their impact on chlorophyll-a levels in lake ecosystems: A focus on nitrogen and phosphorus
- Author
- Yong Fang, Ruting Huang, and Xianyang Shi
- Subjects
- Eutrophication, Nutrients, Machine learning, Data synthesis, Seasons, Ecology, QH540-549.5
- Abstract
Chlorophyll-a (Chl-a) is a pivotal indicator of lake eutrophication. Studies examining nutrients limiting lake eutrophication at large scales have traditionally focused on summer and autumn, potentially limiting the applicability of their findings. This study encompasses 86 state-controlled points in the Eastern China Basin, spanning data collected from January 2020 to July 2023. Furthermore, we focus on the application of three machine-learning models (i.e., eXtreme Gradient Boosting, Support Vector Machines, and Naive Bayes Classifier) to analyze the seasonal nutrient dynamics in lake ecosystems. We categorized the monitoring data by season to eliminate outliers and employed adaptive synthetic sampling to address data imbalance issues. The results reveal that the direct correlations between total nitrogen (TN), total phosphorus (TP), and TP in conjunction with turbidity and Chl-a are broadly weak, possibly because of geographic variations, nutrient lag effects on algae, and differences in algal community composition. However, probabilistic analyses revealed that as TP or TN levels transitioned from oligo-mesotrophic (O) to eutrophic (E), TP exhibited a greater influence on the variation in Chl-a status than TN during spring and winter (p
- Published
- 2024
- Full Text
- View/download PDF
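The adaptive synthetic sampling step named in the entry above is commonly available as ADASYN in the `imbalanced-learn` package; a minimal sketch on toy data (not the study's monitoring data) follows.

```python
from collections import Counter
from imblearn.over_sampling import ADASYN
from sklearn.datasets import make_classification

# Toy imbalanced dataset standing in for seasonal trophic-state classes
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
print(Counter(y))            # e.g. Counter({0: 450, 1: 50})

X_res, y_res = ADASYN(random_state=0).fit_resample(X, y)
print(Counter(y_res))        # classes approximately balanced
```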
16. Spatiotemporal data fusion and deep learning for remote sensing-based sustainable urban planning
- Author
- Jadhav, Sachin, Durairaj, M., Reenadevi, R., Subbulakshmi, R., Gupta, Vaishali, and Ramesh, Janjhyam Venkata Naga
- Published
- 2024
- Full Text
- View/download PDF
17. Advanced integration of 2DCNN-GRU model for accurate identification of shockable life-threatening cardiac arrhythmias: a deep learning approach.
- Author
- Ba Mahel, Abduljabbar S., Shenghong Cao, Kaixuan Zhang, Chelloug, Samia Allaoua, Alnashwan, Rana, and Ali Muthanna, Mohammed Saleh
- Subjects
- ARRHYTHMIA, DEEP learning, VENTRICULAR tachycardia, VENTRICULAR fibrillation, CARDIAC patients, CARDIOVASCULAR diseases
- Abstract
Cardiovascular diseases remain one of the main threats to human health, significantly affecting the quality and life expectancy. Effective and prompt recognition of these diseases is crucial. This research aims to develop an effective novel hybrid method for automatically detecting dangerous arrhythmias based on cardiac patients’ short electrocardiogram (ECG) fragments. This study suggests using a continuous wavelet transform (CWT) to convert ECG signals into images (scalograms) and examining the task of categorizing short 2-s segments of ECG signals into four groups of dangerous arrhythmias that are shockable, including ventricular flutter (C1), ventricular fibrillation (C2), ventricular tachycardia torsade de pointes (C3), and high-rate ventricular tachycardia (C4). We propose developing a novel hybrid neural network with a deep learning architecture to classify dangerous arrhythmias. This work utilizes actual electrocardiogram (ECG) data obtained from the PhysioNet database, alongside artificially generated ECG data produced by the Synthetic Minority Over-sampling Technique (SMOTE) approach, to address the issue of imbalanced class distribution for obtaining an accuracy-trained model. Experimental results demonstrate that the proposed approach achieves high accuracy, sensitivity, specificity, precision, and an F1-score of 97.75%, 97.75%, 99.25%, 97.75%, and 97.75%, respectively, in classifying all the four shockable classes of arrhythmias and are superior to traditional methods. Our work possesses significant clinical value in real-life scenarios since it has the potential to significantly enhance the diagnosis and treatment of life-threatening arrhythmias in individuals with cardiac disease. Furthermore, our model also has demonstrated adaptability and generality for two other datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
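The scalogram step in the entry above (continuous wavelet transform of a short ECG segment into an image) can be sketched with the PyWavelets package; the paper does not specify a library or mother wavelet, so `pywt` and the Morlet wavelet are assumptions.

```python
import numpy as np
import pywt

fs = 250                         # assumed sampling rate (Hz)
t = np.arange(0, 2, 1 / fs)      # 2-s ECG fragment, as in the study
ecg = np.sin(2 * np.pi * 8 * t)  # synthetic stand-in for a real ECG segment

scales = np.arange(1, 64)
coeffs, freqs = pywt.cwt(ecg, scales, "morl", sampling_period=1 / fs)

scalogram = np.abs(coeffs)       # 2D time-frequency image fed to a 2D CNN
print(scalogram.shape)           # (63, 500)
```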
18. Tackling Challenges in Data Pooling: Missing Data Handling in Latent Variable Models with Continuous and Categorical Indicators.
- Author
- Chen, Lihan, Miočević, Milica, and Falk, Carl F.
- Subjects
- LATENT variables, CONFIRMATORY factor analysis, MISSING data (Statistics), DISTRIBUTION (Probability theory), RESEARCH personnel
- Abstract
Data pooling is a powerful strategy in empirical research. However, combining multiple datasets often results in a large amount of missing data, as variables that are not present in some datasets effectively contain missing values for all participants in those datasets. Furthermore, data pooling typically leads to a mix of continuous and categorical items with nonnormal multivariate distributions. We investigated two popular approaches to handle missing data in this context: (1) applying direct maximum likelihood by treating data as continuous (con-ML), and (2) applying categorical least squares using a polychoric correlation matrix computed from pairwise deletion (cat-LS). These approaches are available for free and relatively straightforward for empirical researchers to implement. Through simulation studies with confirmatory factor analysis and latent mediation analysis, we found cat-LS to be unsuitable for pooled data analysis, whereas con-ML yielded acceptable performance for the estimation of latent path coefficients barring severe nonnormality. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Efficient Wheat Head Segmentation with Minimal Annotation: A Generative Approach.
- Author
- Myers, Jaden, Najafian, Keyhan, Maleki, Farhad, and Ovens, Katie
- Subjects
- GENERATIVE adversarial networks, SUPERVISED learning, IMAGE processing, WHEAT
- Abstract
Deep learning models have been used for a variety of image processing tasks. However, most of these models are developed through supervised learning approaches, which rely heavily on the availability of large-scale annotated datasets. Developing such datasets is tedious and expensive. In the absence of an annotated dataset, synthetic data can be used for model development; however, due to the substantial differences between simulated and real data, a phenomenon referred to as domain gap, the resulting models often underperform when applied to real data. In this research, we aim to address this challenge by first computationally simulating a large-scale annotated dataset and then using a generative adversarial network (GAN) to fill the gap between simulated and real images. This approach results in a synthetic dataset that can be effectively utilized to train a deep-learning model. Using this approach, we developed a realistic annotated synthetic dataset for wheat head segmentation. This dataset was then used to develop a deep-learning model for semantic segmentation. The resulting model achieved a Dice score of 83.4% on an internal dataset and Dice scores of 79.6% and 83.6% on two external datasets from the Global Wheat Head Detection datasets. While we proposed this approach in the context of wheat head segmentation, it can be generalized to other crop types or, more broadly, to images with dense, repeated patterns such as those found in cellular imagery. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
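The Dice scores reported in the entry above follow the standard overlap definition, Dice = 2|A∩B| / (|A| + |B|); a minimal sketch for binary segmentation masks:

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient between two binary masks: 2|A intersect B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(dice_score(pred, target))  # 2*2 / (3 + 3) = 0.666...
```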
20. Enhanced Pet Behavior Prediction via S2GAN-Based Heterogeneous Data Synthesis.
- Author
- Kim, Jinah and Moon, Nammee
- Subjects
- GENERATIVE adversarial networks, PREDICTION models
- Abstract
Heterogeneous data have been used to enhance behavior prediction performance; however, it involves issues such as missing data, which need to be addressed. This paper proposes enhanced pet behavior prediction via Sensor to Skeleton Generative Adversarial Networks (S2GAN)-based heterogeneous data synthesis. The S2GAN model synthesizes the key features of video skeletons based on collected nine-axis sensor data and replaces missing data, thereby enhancing the accuracy of behavior prediction. In this study, data collected from 10 pets in a real-life-like environment were used to conduct recognition experiments on 9 commonly occurring types of indoor behavior. Experimental results confirmed that the proposed S2GAN-based synthesis method effectively resolves possible missing data issues in real environments and significantly improves the performance of the pet behavior prediction model. Additionally, by utilizing data collected under conditions similar to the real environment, the method enables more accurate and reliable behavior prediction. This research demonstrates the importance and utility of synthesizing heterogeneous data in behavior prediction, laying the groundwork for applications in various fields such as abnormal behavior detection and monitoring. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. Synthesis methods used to combine observational studies and randomised trials in published meta-analyses
- Author
- Cherifa Cheurfa, Sofia Tsokani, Katerina-Maria Kontouli, Isabelle Boutron, and Anna Chaimani
- Subjects
- Data synthesis, Non-randomised studies, Comparative effectiveness, heterogeneous designs, Medicine
- Abstract
Abstract Background This study examined the synthesis methods used in meta-analyses pooling data from observational studies (OSs) and randomised controlled trials (RCTs) from various medical disciplines. Methods We searched Medline via PubMed to identify reports of systematic reviews of interventions, including and pooling data from RCTs and OSs published in 110 high-impact factor general and specialised journals between 2015 and 2019. Screening and data extraction were performed in duplicate. To describe the synthesis methods used in the meta-analyses, we considered the first meta-analysis presented in each article. Results Overall, 132 reports were identified with a median number of included studies of 14 [9–26]. The median number of OSs was 6.5 [3–12] and that of RCTs was 3 [1–6]. The effect estimates recorded from OSs (i.e., adjusted or unadjusted) were not specified in 82% (n = 108) of the meta-analyses. An inverse-variance common-effect model was used in 2% (n = 3) of the meta-analyses, a random-effects model was used in 55% (n = 73), and both models were used in 40% (n = 53). A Poisson regression model was used in 1 meta-analysis, and 2 meta-analyses did not report the model they used. The mean total weight of OSs in the studied meta-analyses was 57.3% (standard deviation, ± 30.3%). Only 44 (33%) meta-analyses reported results stratified by study design. Of them, the results between OSs and RCTs had a consistent direction of effect in 70% (n = 31). Study design was explored as a potential source of heterogeneity in 79% of the meta-analyses, and confounding factors were investigated in only 10% (n = 13). Publication bias was assessed in 70% (n = 92) of the meta-analyses. Tau-square was reported in 32 meta-analyses with a median of 0.07 [0–0.30]. Conclusion The inclusion of OSs in a meta-analysis on interventions could provide useful information. However, considerations of several methodological and conceptual aspects of OSs, that are required to avoid misleading findings, were often absent or insufficiently reported in our sample.
- Published
- 2024
- Full Text
- View/download PDF
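The synthesis models tallied in the entry above (inverse-variance common-effect and random-effects meta-analysis) reduce to a few formulas; below is a minimal NumPy sketch of DerSimonian–Laird random-effects pooling on made-up effect estimates. The choice of the DerSimonian–Laird tau² estimator is an assumption; the reviewed meta-analyses may use other estimators.

```python
import numpy as np

# Hypothetical study effects (e.g., log odds ratios) and their variances
y = np.array([0.30, 0.10, 0.45, 0.22, 0.05])
v = np.array([0.04, 0.09, 0.06, 0.05, 0.12])

# Common-effect (fixed-effect) inverse-variance pooling
w = 1 / v
mu_fixed = np.sum(w * y) / np.sum(w)

# DerSimonian-Laird estimate of between-study variance tau^2
Q = np.sum(w * (y - mu_fixed) ** 2)
k = len(y)
tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

# Random-effects pooling: tau^2 added to each study's variance
w_star = 1 / (v + tau2)
mu_random = np.sum(w_star * y) / np.sum(w_star)
se_random = np.sqrt(1 / np.sum(w_star))
print(mu_random, mu_random - 1.96 * se_random, mu_random + 1.96 * se_random)
```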
22. Synthesis methods used to combine observational studies and randomised trials in published meta-analyses
- Author
- Cheurfa, Cherifa, Tsokani, Sofia, Kontouli, Katerina-Maria, Boutron, Isabelle, and Chaimani, Anna
- Published
- 2024
- Full Text
- View/download PDF
23. Generative Models for Synthetic Urban Mobility Data: A Systematic Literature Review.
- Author
- KAPP, ALEXANDRA, HANSMEYER, JULIA, and MIHALJEVIĆ, HELENA
- Subjects
- ARTIFICIAL neural networks, MACHINE learning, ARTIFICIAL intelligence, DATA mining, ALGORITHMIC bias, DEEP learning, TRAFFIC violations, TAXICABS
- Published
- 2024
- Full Text
- View/download PDF
24. Research on the Simulation Method of HTTP Traffic Based on GAN.
- Author
- Yang, Chenglin, Xu, Dongliang, and Ma, Xiao
- Subjects
- COMPUTER network traffic, GENERATIVE adversarial networks, TRANSFORMER models, GAUSSIAN mixture models, HTTP (Computer network protocol), EVOLUTIONARY algorithms
- Abstract
Due to the increasing severity of network security issues, training corresponding detection models requires large datasets. In this work, we propose a novel method based on generative adversarial networks to synthesize network data traffic. We introduced a network traffic data normalization method based on Gaussian mixture models (GMM), and for the first time, incorporated a generator based on the Swin Transformer structure into the field of network traffic generation. To further enhance the robustness of the model, we mapped real data through an AE (autoencoder) module and optimized the training results in the form of evolutionary algorithms. We validated the training results on four different datasets and introduced four additional models for comparative experiments in the experimental evaluation section. Our proposed SEGAN outperformed other state-of-the-art network traffic emulation methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
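The GMM-based normalization mentioned in the entry above (fitting a mixture to a continuous traffic feature and encoding each value relative to its mode, as popularized by tabular GANs such as CTGAN) can be sketched with scikit-learn; the number of components and the encoding below are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic multimodal feature, e.g. HTTP packet sizes mixing small ACKs and full frames
x = np.concatenate([rng.normal(60, 5, 500), rng.normal(1400, 60, 500)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(x)
modes = gmm.predict(x)                       # mixture component of each value
mu = gmm.means_[modes, 0]
sigma = np.sqrt(gmm.covariances_[modes, 0, 0])

x_norm = (x[:, 0] - mu) / (4 * sigma)        # mode-specific normalization into roughly [-1, 1]
print(x_norm.min(), x_norm.max())
```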
25. Advanced integration of 2DCNN-GRU model for accurate identification of shockable life-threatening cardiac arrhythmias: a deep learning approach
- Author
- Abduljabbar S. Ba Mahel, Shenghong Cao, Kaixuan Zhang, Samia Allaoua Chelloug, Rana Alnashwan, and Mohammed Saleh Ali Muthanna
- Subjects
- dangerous arrhythmias, recognition, deep learning networks, data synthesis, scalogram, Physiology, QP1-981
- Abstract
Cardiovascular diseases remain one of the main threats to human health, significantly affecting the quality and life expectancy. Effective and prompt recognition of these diseases is crucial. This research aims to develop an effective novel hybrid method for automatically detecting dangerous arrhythmias based on cardiac patients’ short electrocardiogram (ECG) fragments. This study suggests using a continuous wavelet transform (CWT) to convert ECG signals into images (scalograms) and examining the task of categorizing short 2-s segments of ECG signals into four groups of dangerous arrhythmias that are shockable, including ventricular flutter (C1), ventricular fibrillation (C2), ventricular tachycardia torsade de pointes (C3), and high-rate ventricular tachycardia (C4). We propose developing a novel hybrid neural network with a deep learning architecture to classify dangerous arrhythmias. This work utilizes actual electrocardiogram (ECG) data obtained from the PhysioNet database, alongside artificially generated ECG data produced by the Synthetic Minority Over-sampling Technique (SMOTE) approach, to address the issue of imbalanced class distribution for obtaining an accuracy-trained model. Experimental results demonstrate that the proposed approach achieves high accuracy, sensitivity, specificity, precision, and an F1-score of 97.75%, 97.75%, 99.25%, 97.75%, and 97.75%, respectively, in classifying all the four shockable classes of arrhythmias and are superior to traditional methods. Our work possesses significant clinical value in real-life scenarios since it has the potential to significantly enhance the diagnosis and treatment of life-threatening arrhythmias in individuals with cardiac disease. Furthermore, our model also has demonstrated adaptability and generality for two other datasets.
- Published
- 2024
- Full Text
- View/download PDF
26. A systematic review of deep learning data augmentation in medical imaging: Recent advances and future research directions
- Author
- Tauhidul Islam, Md. Sadman Hafiz, Jamin Rahman Jim, Md. Mohsin Kabir, and M.F. Mridha
- Subjects
- Deep learning, Data augmentation, Image transformation, Medical imaging augmentation, Data synthesis, Systematic review, Computer applications to medicine. Medical informatics, R858-859.7
- Abstract
Data augmentation involves artificially expanding a dataset by applying various transformations to the existing data. Recent developments in deep learning have advanced data augmentation, enabling more complex transformations. Especially vital in the medical domain, deep learning-based data augmentation improves model robustness by generating realistic variations in medical images, enhancing diagnostic and predictive task performance. Therefore, to assist researchers and experts in their pursuits, there is a need for an extensive and informative study that covers the latest advancements in the growing domain of deep learning-based data augmentation in medical imaging. There is a gap in the literature regarding recent advancements in deep learning-based data augmentation. This study explores the diverse applications of data augmentation in medical imaging and analyzes recent research in these areas to address this gap. The study also explores popular datasets and evaluation metrics to improve understanding. Subsequently, the study provides a short discussion of conventional data augmentation techniques along with a detailed discussion on applying deep learning algorithms in data augmentation. The study further analyzes the results and experimental details from recent state-of-the-art research to understand the advancements and progress of deep learning-based data augmentation in medical imaging. Finally, the study discusses various challenges and proposes future research directions to address these concerns. This systematic review offers a thorough overview of deep learning-based data augmentation in medical imaging, covering application domains, models, results analysis, challenges, and research directions. It provides a valuable resource for multidisciplinary studies and researchers making decisions based on recent analytics.
- Published
- 2024
- Full Text
- View/download PDF
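As a concrete counterpart to the conventional transformation-based augmentation that the review above contrasts with deep generative methods, here is a minimal torchvision sketch; the specific transforms and parameters are illustrative, not recommendations from the review.

```python
import torch
from torchvision import transforms

# A conventional augmentation pipeline: each pass yields a new realistic variant
augment = transforms.Compose([
    transforms.RandomRotation(degrees=10),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])

image = transforms.ToPILImage()(torch.rand(3, 256, 256))  # stand-in for a medical image
augmented = [augment(image) for _ in range(4)]            # four augmented copies
```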
27. Theseus Data Synthesis Approach: A Privacy-Preserving Online Data Sharing Service
- Author
- Yi-Jun Tang and Po-Wen Chi
- Subjects
- Data anonymization, data synthesis, privacy-preserving data sharing, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
With the rapid development of cloud computing services, it has become easier and more convenient for organizations and enterprises to open data on clouds. However, as the personal information in electronic data becomes more massive and detailed, balancing data opening against personal privacy has become a critical issue. In this paper, we propose the Theseus Data Synthesis Approach (TDSA), which generates synthetic data by replacing partial records until no record from the original dataset remains. Unlike other data anonymization approaches such as k-anonymity and differential privacy, which encounter limitations and challenges when applied to real-world scenarios, our work releases no real data, so personal privacy is preserved. We also analyze the quality and utility of the synthetic dataset and make comparisons with related works. We conclude that with our scheme, opening useful data on clouds and preserving personal privacy can be achieved simultaneously.
- Published
- 2024
- Full Text
- View/download PDF
28. Feature Distribution-Based Medical Data Augmentation: Enhancing Mood Disorder Classification
- Author
- Joo Hun Yoo, Ji Hyun An, and Tai-Myoung Chung
- Subjects
- Data augmentation, data synthesis, deep neural networks, mood disorder classification, multimodal analysis, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Classification models using deep or machine learning algorithms require a sufficient and balanced training dataset to improve performance. Still, they suffer from data collection due to data privacy issues. In medical research, where most data variables are sensitive information, collecting enough training data for model performance improvement is more challenging. This study presents a new medical data augmentation algorithm consisting of four steps to solve the data shortage and class imbalance issues. The main idea of the proposed algorithm is to reflect the core characteristic of the original data’s class label. The algorithm receives an original dataset as an input value to extract the feature vector and trains the individual autoencoder model. Then it verifies the augmented feature vector through a distributional equality check, and each feature vector is concatenated into one feature vector. The deep learning model inference is applied on a concatenated vector for the second verification, to finalize the augmented training dataset. Our team performed mood disorder classification using patient data to prove the presented data augmentation algorithm. With the method, the classification performance improved by 0.059 in the severity classification of major depressive disorder, 0.041 in the severity classification of anxiety disorder, and 0.073 in the subtype classification of bipolar disorder. Through this study, we proved that our algorithm can be applied to minimize model bias and improve classification performance on the medical data that are unbalanced or insufficient in number by class.
- Published
- 2024
- Full Text
- View/download PDF
29. Crack modeling via minimum-weight surfaces in 3d Voronoi diagrams
- Author
- Christian Jung and Claudia Redenbach
- Subjects
- Fracture modeling, Tessellations, Data synthesis, 3d image processing, Adaptive dilation, Mathematics, QA1-939, Industry, HD2321-4730.9
- Abstract
Abstract As the number one building material, concrete is of fundamental importance in civil engineering. Understanding its failure mechanisms is essential for designing sustainable buildings and infrastructure. Micro-computed tomography (μCT) is a well-established tool for virtually assessing crack initiation and propagation in concrete. The reconstructed 3d images can be examined via techniques from the fields of classical image processing and machine learning. Ground truths are a prerequisite for an objective evaluation of crack segmentation methods. Furthermore, they are necessary for training machine learning models. However, manual annotation of large 3d concrete images is not feasible. To tackle the problem of data scarcity, the image pairs of cracked concrete and corresponding ground truth can be synthesized. In this work we propose a novel approach to stochastically model crack structures via Voronoi diagrams. The method is based on minimum-weight surfaces, an extension of shortest paths to 3d. Within a dedicated image processing pipeline, the surfaces are then discretized and embedded into real μCT images of concrete. The method is flexible and fast, such that a variety of different crack structures can be generated in a short amount of time.
- Published
- 2023
- Full Text
- View/download PDF
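The starting point of the method in the entry above, a 3D Voronoi tessellation, can be generated with SciPy; the point density is an illustrative assumption, and the minimum-weight-surface step itself is not shown.

```python
import numpy as np
from scipy.spatial import Voronoi

rng = np.random.default_rng(42)
points = rng.uniform(0, 1, size=(200, 3))   # random seed points in a unit cube

vor = Voronoi(points)
# Facets between neighbouring cells; crack surfaces are carved from such facets
print(len(vor.ridge_points))                # number of cell-to-cell interfaces
print(vor.vertices.shape)                   # coordinates of Voronoi vertices
```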
30. Unsupervised GAN epoch selection for biomedical data synthesis
- Author
- Böhland Moritz, Bruch Roman, Löffler Katharina, and Reischl Markus
- Subjects
- generative adversarial network, data synthesis, segmentation, computer vision, Medicine
- Abstract
Supervised Neural Networks are used for segmentation in many biological and biomedical applications. To omit the time-consuming and tiring process of manual labeling, unsupervised Generative Adversarial Networks (GANs) can be used to synthesize labeled data. However, the training of GANs requires extensive computation and is often unstable. Due to the lack of established stopping criteria, GANs are usually trained multiple times for a heuristically fixed number of epochs. Early stopping and epoch selection can lead to better synthetic datasets resulting in higher downstream segmentation quality on biological or medical data. This article examines whether the Frechet Inception Distance (FID), the Kernel Inception Distance (KID), or the WeightWatcher tool can be used for early stopping or epoch selection of unsupervised GANs. The experiments show that the last trained GAN epoch is not necessarily the best one to synthesize downstream segmentation data. On complex datasets, FID and KID correlate with the downstream segmentation quality, and both can be used for epoch selection.
- Published
- 2023
- Full Text
- View/download PDF
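The Fréchet Inception Distance used for epoch selection in the entry above compares Gaussians fitted to real and synthetic feature embeddings: FID = ||μ_r − μ_s||² + Tr(Σ_r + Σ_s − 2(Σ_rΣ_s)^{1/2}). A minimal NumPy/SciPy sketch on random features (in practice the features come from an Inception network):

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feat_real: np.ndarray, feat_synth: np.ndarray) -> float:
    """Frechet Inception Distance between two sets of feature vectors."""
    mu_r, mu_s = feat_real.mean(axis=0), feat_synth.mean(axis=0)
    cov_r = np.cov(feat_real, rowvar=False)
    cov_s = np.cov(feat_synth, rowvar=False)
    covmean = sqrtm(cov_r @ cov_s)
    if np.iscomplexobj(covmean):          # numerics can yield tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu_r - mu_s) ** 2) + np.trace(cov_r + cov_s - 2 * covmean))

rng = np.random.default_rng(0)
real = rng.normal(0, 1, (1000, 64))       # stand-in for features of real images
synth = rng.normal(0.1, 1, (1000, 64))    # features of GAN samples at some epoch
print(fid(real, synth))                   # select the epoch minimizing this score
```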
31. Deep learning based classification of sheep behaviour from accelerometer data with imbalance
- Author
- Kirk E. Turner, Andrew Thompson, Ian Harris, Mark Ferguson, and Ferdous Sohel
- Subjects
- Sheep behaviour classification, Data synthesis, Class imbalance, Grazing sheep, Agriculture (General), S1-972, Information technology, T58.5-58.64
- Abstract
Classification of sheep behaviour from a sequence of tri-axial accelerometer data has the potential to enhance sheep management. Sheep behaviour is inherently imbalanced (e.g., more ruminating than walking) resulting in underperforming classification for the minority activities which hold importance. Existing works have not addressed class imbalance and use traditional machine learning techniques, e.g., Random Forest (RF). We investigated Deep Learning (DL) models, namely, Long Short Term Memory (LSTM) and Bidirectional LSTM (BLSTM), appropriate for sequential data, from imbalanced data. Two data sets were collected in normal grazing conditions using jaw-mounted and ear-mounted sensors. Novel to this study, alongside typical single classes, e.g., walking, depending on the behaviours, data samples were labelled with compound classes, e.g., walking_grazing. The number of steps a sheep performed in the observed 10 s time window was also recorded and incorporated in the models. We designed several multi-class classification studies with imbalance being addressed using synthetic data. DL models achieved superior performance to traditional ML models, especially with augmented data (e.g., 4-Class + Steps: LSTM 88.0%, RF 82.5%). DL methods showed superior generalisability on unseen sheep (i.e., F1-score: BLSTM 0.84, LSTM 0.83, RF 0.65). LSTM, BLSTM and RF achieved sub-millisecond average inference time, making them suitable for real-time applications. The results demonstrate the effectiveness of DL models for sheep behaviour classification in grazing conditions. The results also demonstrate the DL techniques can generalise across different sheep. The study presents a strong foundation of the development of such models for real-time animal monitoring.
- Published
- 2023
- Full Text
- View/download PDF
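A minimal PyTorch sketch of the bidirectional LSTM classifier family used in the entry above, taking a 10-s window of tri-axial accelerometer samples plus a step count and emitting class scores; the layer sizes, sampling rate, and the way the step count is concatenated are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class BLSTMClassifier(nn.Module):
    def __init__(self, n_features=3, hidden=64, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        # +1 for the per-window step count appended to the sequence summary
        self.head = nn.Linear(2 * hidden + 1, n_classes)

    def forward(self, x, steps):
        # x: (batch, time, 3) accelerometer window; steps: (batch, 1) step count
        out, _ = self.lstm(x)
        summary = out[:, -1, :]                   # last time step, both directions
        return self.head(torch.cat([summary, steps], dim=1))

model = BLSTMClassifier()
x = torch.randn(8, 320, 3)      # e.g. 10 s at an assumed 32 Hz
steps = torch.randint(0, 20, (8, 1)).float()
print(model(x, steps).shape)    # torch.Size([8, 4])
```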
32. The Extended Pillar Integration Process (ePIP): A Data Integration Method Allowing the Systematic Synthesis of Findings From Three Different Sources.
- Author
- Gauly, Julia, Ulahannan, Arun, and Grove, Amy L.
- Abstract
Mixed methods research requires data integration from multiple sources. Existing techniques are restricted to integrating a maximum of two data sources, do not provide step-by-step guidance or can be cumbersome where many data need to be integrated. We have solved these limitations through the development of the extended Pillar Integration Process (ePIP), a method which contributes to the field of mixed methods by being the first data integration method providing explicit steps on how to integrate data from three data sources. The ePIP provides greater transparency, validity and consistency compared to existing methods. We provide two worked examples from health sciences and automotive human factors, highlighting its value as a mixed methods integration tool. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. Ad-RuLer: A Novel Rule-Driven Data Synthesis Technique for Imbalanced Classification.
- Author
- Zhang, Xiao, Paz, Iván, Nebot, Àngela, Mugica, Francisco, and Romero, Enrique
- Subjects
- MACHINE learning, RANDOM forest algorithms, MACHINE performance, LOGISTIC regression analysis, CLASSIFICATION
- Abstract
When classifiers face imbalanced class distributions, they often misclassify minority class samples, consequently diminishing the predictive performance of machine learning models. Existing oversampling techniques predominantly rely on the selection of neighboring data via interpolation, with less emphasis on uncovering the intrinsic patterns and relationships within the data. In this research, we present the usefulness of an algorithm named RuLer to deal with the problem of classification with imbalanced data. RuLer is a learning algorithm initially designed to recognize new sound patterns within the context of the performative artistic practice known as live coding. This paper demonstrates that this algorithm, once adapted (Ad-RuLer), has great potential to address the problem of oversampling imbalanced data. An extensive comparison with other mainstream oversampling algorithms (SMOTE, ADASYN, Tomek-links, Borderline-SMOTE, and KmeansSMOTE), using different classifiers (logistic regression, random forest, and XGBoost) is performed on several real-world datasets with different degrees of data imbalance. The experiment results indicate that Ad-RuLer serves as an effective oversampling technique with extensive applicability. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
34. Crack modeling via minimum-weight surfaces in 3d Voronoi diagrams.
- Author
- Jung, Christian and Redenbach, Claudia
- Subjects
- VORONOI polygons, MACHINE learning, SUSTAINABLE architecture, THREE-dimensional imaging, CIVIL engineering
- Abstract
As the number one building material, concrete is of fundamental importance in civil engineering. Understanding its failure mechanisms is essential for designing sustainable buildings and infrastructure. Micro-computed tomography (μCT) is a well-established tool for virtually assessing crack initiation and propagation in concrete. The reconstructed 3d images can be examined via techniques from the fields of classical image processing and machine learning. Ground truths are a prerequisite for an objective evaluation of crack segmentation methods. Furthermore, they are necessary for training machine learning models. However, manual annotation of large 3d concrete images is not feasible. To tackle the problem of data scarcity, the image pairs of cracked concrete and corresponding ground truth can be synthesized. In this work we propose a novel approach to stochastically model crack structures via Voronoi diagrams. The method is based on minimum-weight surfaces, an extension of shortest paths to 3d. Within a dedicated image processing pipeline, the surfaces are then discretized and embedded into real μCT images of concrete. The method is flexible and fast, such that a variety of different crack structures can be generated in a short amount of time. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
35. A Generative Adversarial Network to Synthesize 3D Magnetohydrodynamic Distortions for Electrocardiogram Analyses Applied to Cardiac Magnetic Resonance Imaging.
- Author
- Mehri, Maroua, Calmon, Guillaume, Odille, Freddy, Oster, Julien, and Lalande, Alain
- Subjects
- CARDIAC magnetic resonance imaging, GENERATIVE adversarial networks, DATA augmentation, MAGNETIC resonance imaging, PROBABILISTIC generative models, DEEP learning
- Abstract
Recently, deep learning (DL) models have been increasingly adopted for automatic analyses of medical data, including electrocardiograms (ECGs). Large, available ECG datasets, generally of high quality, often lack specific distortions, which could be helpful for enhancing DL-based algorithms. Synthetic ECG datasets could overcome this limitation. A generative adversarial network (GAN) was used to synthesize realistic 3D magnetohydrodynamic (MHD) distortion templates, as observed during magnetic resonance imaging (MRI), and then added to available ECG recordings to produce an augmented dataset. Similarity metrics, as well as the accuracy of a DL-based R-peak detector trained with and without data augmentation, were used to evaluate the effectiveness of the synthesized data. Three-dimensional MHD distortions produced by the proposed GAN were similar to the measured ones used as input. The precision of a DL-based R-peak detector, tested on actual unseen data, was significantly enhanced by data augmentation; its recall was higher when trained with augmented data. Using synthesized MHD-distorted ECGs significantly improves the accuracy of a DL-based R-peak detector, with a good generalization capacity. This provides a simple and effective alternative to collecting new patient data. DL-based algorithms for ECG analyses can suffer from bias or gaps in training datasets. Using a GAN to synthesize new data, as well as metrics to evaluate its performance, can overcome the scarcity issue of data availability. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
36. Adapting Neural Radiance Fields (NeRF) to the 3D Scene Reconstruction Problem Under Dynamic Illumination Conditions.
- Author
- Savin, V. and Kolodiazhna, O.
- Subjects
- RADIANCE, LIGHTING, DATA augmentation
- Abstract
The problem of new image synthesis with the use of Neural Radiance Fields (NeRF) for an environment with dynamic illumination is considered. When training NeRF models, a photometric loss function is used, i.e., a pixel-by-pixel difference between intensity values of scene images and the images generated using NeRF. For reflective surfaces, image intensity depends on the viewing angle, and this effect is accounted for by using the direction of a ray as the NeRF model input parameter. For scenes with dynamic illumination, image intensity depends not only on the position and viewing direction, but also on time. It is shown that illumination change affects the learning of NeRF with a standard photometric loss function and decreases the quality of the obtained images and depth maps. To overcome this problem, we propose to introduce time as an additional NeRF input argument. Experiments performed on the ScanNet dataset demonstrate that NeRF with a modified input outperforms the original model version and generates more consistent and coherent 3D structures. The results of this study can be used to improve the quality of training data augmentation for training distance forecasting models (e.g., depth-from-stereo models allowing for depth/distance forecasts based on stereo data) for scenes with non-static illumination. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
37. Efficient Wheat Head Segmentation with Minimal Annotation: A Generative Approach
- Author
- Jaden Myers, Keyhan Najafian, Farhad Maleki, and Katie Ovens
- Subjects
- deep learning, segmentation, generative adversarial networks, data synthesis, Photography, TR1-1050, Computer applications to medicine. Medical informatics, R858-859.7, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Deep learning models have been used for a variety of image processing tasks. However, most of these models are developed through supervised learning approaches, which rely heavily on the availability of large-scale annotated datasets. Developing such datasets is tedious and expensive. In the absence of an annotated dataset, synthetic data can be used for model development; however, due to the substantial differences between simulated and real data, a phenomenon referred to as domain gap, the resulting models often underperform when applied to real data. In this research, we aim to address this challenge by first computationally simulating a large-scale annotated dataset and then using a generative adversarial network (GAN) to fill the gap between simulated and real images. This approach results in a synthetic dataset that can be effectively utilized to train a deep-learning model. Using this approach, we developed a realistic annotated synthetic dataset for wheat head segmentation. This dataset was then used to develop a deep-learning model for semantic segmentation. The resulting model achieved a Dice score of 83.4% on an internal dataset and Dice scores of 79.6% and 83.6% on two external datasets from the Global Wheat Head Detection datasets. While we proposed this approach in the context of wheat head segmentation, it can be generalized to other crop types or, more broadly, to images with dense, repeated patterns such as those found in cellular imagery.
- Published
- 2024
- Full Text
- View/download PDF
38. Enhanced Pet Behavior Prediction via S2GAN-Based Heterogeneous Data Synthesis
- Author
- Jinah Kim and Nammee Moon
- Subjects
- behavior prediction, behavior monitoring, heterogeneous data, data synthesis, generative adversarial network, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
- Abstract
Heterogeneous data have been used to enhance behavior prediction performance; however, it involves issues such as missing data, which need to be addressed. This paper proposes enhanced pet behavior prediction via Sensor to Skeleton Generative Adversarial Networks (S2GAN)-based heterogeneous data synthesis. The S2GAN model synthesizes the key features of video skeletons based on collected nine-axis sensor data and replaces missing data, thereby enhancing the accuracy of behavior prediction. In this study, data collected from 10 pets in a real-life-like environment were used to conduct recognition experiments on 9 commonly occurring types of indoor behavior. Experimental results confirmed that the proposed S2GAN-based synthesis method effectively resolves possible missing data issues in real environments and significantly improves the performance of the pet behavior prediction model. Additionally, by utilizing data collected under conditions similar to the real environment, the method enables more accurate and reliable behavior prediction. This research demonstrates the importance and utility of synthesizing heterogeneous data in behavior prediction, laying the groundwork for applications in various fields such as abnormal behavior detection and monitoring.
- Published
- 2024
- Full Text
- View/download PDF
39. CTAB-GAN+: enhancing tabular data synthesis
- Author
- Zilong Zhao, Aditya Kunar, Robert Birke, Hiek Van der Scheer, and Lydia Y. Chen
- Subjects
- GAN, data synthesis, tabular data, differential privacy, imbalanced distribution, Information technology, T58.5-58.64
- Abstract
The usage of synthetic data is gaining momentum in part due to the unavailability of original data due to privacy and legal considerations and in part due to its utility as an augmentation to the authentic data. Generative adversarial networks (GANs), a paragon of generative models, initially for images and subsequently for tabular data, has contributed many of the state-of-the-art synthesizers. As GANs improve, the synthesized data increasingly resemble the real data risking to leak privacy. Differential privacy (DP) provides theoretical guarantees on privacy loss but degrades data utility. Striking the best trade-off remains yet a challenging research question. In this study, we propose CTAB-GAN+ a novel conditional tabular GAN. CTAB-GAN+ improves upon state-of-the-art by (i) adding downstream losses to conditional GAN for higher utility synthetic data in both classification and regression domains; (ii) using Wasserstein loss with gradient penalty for better training convergence; (iii) introducing novel encoders targeting mixed continuous-categorical variables and variables with unbalanced or skewed data; and (iv) training with DP stochastic gradient descent to impose strict privacy guarantees. We extensively evaluate CTAB-GAN+ on statistical similarity and machine learning utility against state-of-the-art tabular GANs. The results show that CTAB-GAN+ synthesizes privacy-preserving data with at least 21.9% higher machine learning utility (i.e., F1-Score) across multiple datasets and learning tasks under given privacy budget.
- Published
- 2024
- Full Text
- View/download PDF
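Item (ii) in the entry above, Wasserstein loss with gradient penalty, has a standard formulation: penalize the critic so that its gradient norm at points interpolated between real and synthetic samples stays near 1. A minimal PyTorch sketch (the critic below is a placeholder, not CTAB-GAN+'s architecture):

```python
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))

def gradient_penalty(critic, real, fake):
    """WGAN-GP term: ((||grad critic(x_hat)||_2 - 1)^2), averaged over interpolates."""
    eps = torch.rand(real.size(0), 1)                 # per-sample interpolation weight
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(x_hat)
    grads, = torch.autograd.grad(scores, x_hat,
                                 grad_outputs=torch.ones_like(scores),
                                 create_graph=True)
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()

real = torch.randn(32, 16)      # batch of real (encoded) tabular rows
fake = torch.randn(32, 16)      # batch of generator outputs
loss_critic = (critic(fake).mean() - critic(real).mean()
               + 10.0 * gradient_penalty(critic, real, fake))
loss_critic.backward()
```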
40. Risky business: human-related data is lacking from Lyme disease risk models
- Author
- Erica Fellin, Mathieu Varin, and Virginie Millien
- Subjects
- blacklegged ticks, data synthesis, human-related, Lyme disease, risk assessment, risk map, Public aspects of medicine, RA1-1270
- Abstract
Used as a communicative tool for risk management, risk maps provide a service to the public, conveying information that can raise risk awareness and encourage mitigation. Several studies have utilized risk maps to determine risks associated with the distribution of Borrelia burgdorferi, the causal agent of Lyme disease in North America and Europe, as this zoonotic disease can lead to severe symptoms. This literature review focused on the use of risk maps to model distributions of B. burgdorferi and its vector, the blacklegged tick (Ixodes scapularis), in North America to compare the variables used in these spatial models. Data were compiled from the existing literature to determine which ecological, environmental, and anthropic (i.e., human focused) variables past research has considered influential to the risk level for Lyme disease. The frequency of these variables was examined and analyzed via a non-metric multidimensional scaling analysis to compare different map elements that may categorize the risk models performed. Environmental variables were found to be the most frequently used in risk spatial models, particularly temperature. It was found that there was a significantly dissimilar distribution of variables used within map elements across studies: Map Type, Map Distributions, and Map Scale. Within these map elements, few anthropic variables were considered, particularly in studies that modeled future risk, despite the objective of these models directly or indirectly focusing on public health intervention. Without including these human-related variables within risk map models, it is difficult to determine how reliable these risk maps truly are. Future researchers may be persuaded to improve disease risk models by taking this into consideration.
- Published
- 2023
- Full Text
- View/download PDF
41. SeedArc, a global archive of primary seed germination data.
- Author
-
Fernández‐Pascual, Eduardo, Carta, Angelino, Rosbakh, Sergey, Guja, Lydia, Phartyal, Shyam S., Silveira, Fernando A. O., Chen, Si‐Chong, Larson, Julie E., and Jiménez‐Alfaro, Borja
- Subjects
- *
GERMINATION , *BOTANY , *BIOTIC communities , *SEED size , *PLANT reproduction , *BIOMES , *PLANT ecology - Abstract
Keywords: data synthesis; database; germination; open science; plant reproduction; repository; seed; trait EN data synthesis database germination open science plant reproduction repository seed trait 466 470 5 09/25/23 20231015 NES 231015 Data availability The data and code used to produce this article are available at https://github.com/efernandezpascual/seedarcms. The need for a global archive of primary seed germination data The seed ecology community has recently recognized the need to synthesize knowledge, setting the research agenda for functional seed ecology (Saatkamp I et al i ., [34]). I SeedArc i compiles primary seed germination data to synthesize the seed germination spectrum at a global scale. The theory underlying the seed germination spectrum has been laid out by decades of work on seed ecology (Baskin & Baskin, [1]), but empirical studies testing major ecological hypotheses at both global and local scales remain elusive without a standardized seed germination database. [Extracted from the article]
- Published
- 2023
- Full Text
- View/download PDF
42. Distribution and trends of mercury in aquatic and terrestrial biota of New York, USA: a synthesis of 50 years of research and monitoring.
- Author
-
Adams, Evan M., Gulka, Julia E., Yang, Yang, Burton, Mark E. H., Burns, Douglas A., Buxton, Valerie, Cleckner, Lisa, DeSorbo, Christopher R., Driscoll, Charles T., Evers, David C., Fisher, Nicholas, Lane, Oksana, Mao, Huiting, Riva-Murray, Karen, Millard, Geoffrey, Razavi, N. Roxanna, Richter, Wayne, Sauer, Amy K., and Schoch, Nina
- Subjects
AQUATIC organisms ,MERCURY ,AQUATIC habitats ,LAND cover ,MERCURY vapor ,RISK exposure ,METHYLMERCURY - Abstract
Mercury (Hg) inputs have particularly impacted the northeastern United States due to its proximity to anthropogenic emissions sources and abundant habitats that efficiently convert inorganic Hg into methylmercury. Intensive research and monitoring efforts over the past 50 years in New York State, USA, have informed the assessment of the extent and impacts of Hg exposure on fishes and wildlife. By synthesizing Hg data statewide, this study quantified temporal trends of Hg exposure, spatiotemporal patterns of risk, the role that habitat and Hg deposition play in producing spatial patterns of Hg exposure in fish and other wildlife, and the effectiveness of current monitoring approaches in describing Hg trends. Most temporal trends were stable, but we found significant declines in Hg exposure over time in some long-sampled fish. The Adirondack Mountains and Long Island showed the greatest number of aquatic and terrestrial species with elevated Hg concentrations, reflecting an unequal distribution of exposure risk to fauna across the state. Persistent hotspots were detected for aquatic species in central New York and the Adirondack Mountains. Elevated Hg concentrations were associated with open water, forests, and rural, developed habitats for aquatic species, and open water and forested habitats for terrestrial species. Areas of consistently elevated Hg were found in areas driven by atmospheric and local Hg inputs, and habitat played a significant role in translating those inputs into biotic exposure. Continued long-term monitoring will be important in evaluating how these patterns continue to change in the face of changing land cover, climate, and Hg emissions. [ABSTRACT FROM AUTHOR]
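For intuition about the temporal-trend analyses described here, a generic log-linear trend fit is sketched below; expressing Hg change as percent per year is a common convention, but this is not necessarily the model used in the study, which likely adjusted for covariates such as fish size and site.

```python
import numpy as np

def hg_percent_change_per_year(years, conc):
    """Fit a log-linear (exponential) trend to tissue Hg concentrations and
    return the implied percent change per year. A generic approach for
    illustration only."""
    slope, _intercept = np.polyfit(np.asarray(years, float),
                                   np.log(np.asarray(conc, float)), 1)
    return (np.exp(slope) - 1.0) * 100.0

# Example: a slow decline of roughly 1% per year
print(hg_percent_change_per_year([2000, 2005, 2010, 2015, 2020],
                                 [1.00, 0.95, 0.91, 0.86, 0.82]))
```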
- Published
- 2023
- Full Text
- View/download PDF
43. Of causes and symptoms: using monitoring data and expert knowledge to diagnose the causes of stream degradation.
- Author
-
Rettig, Katharina, Semmler-Elpers, Renate, Brettschneider, Denise, Hering, Daniel, and Feld, Christian K.
- Subjects
WATER management ,BAYESIAN analysis ,ECOLOGICAL assessment ,WATER use ,LAND use ,FECAL contamination - Abstract
Ecological status assessment under the European Water Framework Directive (WFD) often integrates the impact of multiple stressors into a single index value. This hampers the identification of the individual stressors responsible for status deterioration. As a consequence, management measures are often disconnected from assessment results. To close this gap and to support river basin managers in diagnosing stressors, we linked numerous macroinvertebrate assessment metrics and one diatom index with potential causes of ecological deterioration through Bayesian belief networks (BBNs). The BBNs were informed by WFD monitoring data as well as regular consultation with experts, and allow the probabilities of individual degradation causes to be estimated from a selection of biological metrics. Macroinvertebrate metrics were more strongly linked to hydromorphological conditions and land use than to water quality-related parameters (e.g., thermal and nutrient pollution). The modeled probabilities also allow the potential causes of degradation to be ranked hierarchically. The comparison of assessment metrics showed that compositional and trait-based community metrics performed equally well in the diagnosis. Testing of the BBNs by experts resulted in an agreement between model output and expert opinion of 17–92% for individual stressors. Overall, the expert-based validation confirmed a good diagnostic potential of the BBNs; on average, 80% of the diagnosed causes were in agreement with expert judgement. We conclude that diagnostic BBNs can assist in identifying the causes of stream and river degradation and thereby inform the derivation of appropriate management decisions.
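For a single metric, the diagnostic logic reduces to Bayes' rule over candidate causes. The probabilities below are invented for illustration; in the paper, the BBNs were parameterized from WFD monitoring data and expert consultation.

```python
# Toy single-metric diagnosis via Bayes' rule. All numbers are invented.
priors = {"hydromorphology": 0.5, "nutrient pollution": 0.3, "thermal pollution": 0.2}
# P(macroinvertebrate metric signals degradation | cause)
likelihood = {"hydromorphology": 0.8, "nutrient pollution": 0.5, "thermal pollution": 0.3}

evidence = sum(priors[c] * likelihood[c] for c in priors)
posterior = {c: priors[c] * likelihood[c] / evidence for c in priors}

# Rank candidate causes by posterior probability, as a BBN would after
# observing a degraded metric
for cause, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{cause}: {p:.2f}")
```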
- Published
- 2023
- Full Text
- View/download PDF
44. ASIDS: A Robust Data Synthesis Method for Generating Optimal Synthetic Samples.
- Author
-
Du, Yukun, Cai, Yitao, Jin, Xiao, Wang, Hongxia, Li, Yao, and Lu, Min
- Subjects
- *
SAMPLE size (Statistics) , *INTERPOLATION - Abstract
Most existing data synthesis methods are designed to tackle problems with dataset imbalance, data anonymization, and an insufficient sample size. There is a lack of effective synthesis methods in cases where the actual datasets have a limited number of data points but a large number of features and unknown noise. Thus, in this paper we propose a data synthesis method named Adaptive Subspace Interpolation for Data Synthesis (ASIDS). The idea is to divide the original data feature space into several subspaces with an equal number of data points, and then perform interpolation on the data points in the adjacent subspaces. This method can adaptively adjust the sample size of the synthetic dataset that contains unknown noise, and the generated sample data typically contain minimal errors. Moreover, it adjusts the feature composition of the data points, which can significantly reduce the proportion of the data points with large fitting errors. Furthermore, the hyperparameters of this method have an intuitive interpretation and usually require little calibration. Analysis results obtained using simulated original data and benchmark original datasets demonstrate that ASIDS is a robust and stable method for data synthesis. [ABSTRACT FROM AUTHOR]
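A minimal sketch of the subspace-interpolation idea, under assumptions: points are ordered along their leading principal direction to form equal-sized subspaces, and synthetic points are drawn as convex combinations of points from adjacent subspaces. This illustrates the general technique only; the authors' adaptive adjustment of sample size and feature composition is not reproduced.

```python
import numpy as np

def subspace_interpolation(X, n_subspaces=4, per_pair=10, rng=None):
    """Toy feature-subspace interpolation: split the points into equal-sized
    groups along the first principal direction, then interpolate between
    randomly paired points from adjacent groups."""
    rng = np.random.default_rng(rng)
    Xc = X - X.mean(axis=0)
    pc1 = np.linalg.svd(Xc, full_matrices=False)[2][0]  # leading direction
    order = np.argsort(Xc @ pc1)
    groups = np.array_split(order, n_subspaces)
    synthetic = []
    for ga, gb in zip(groups[:-1], groups[1:]):
        for _ in range(per_pair):
            a, b = X[rng.choice(ga)], X[rng.choice(gb)]
            t = rng.uniform()
            synthetic.append(a + t * (b - a))  # convex combination
    return np.asarray(synthetic)

X = np.random.default_rng(0).normal(size=(40, 8))  # 40 points, 8 features
print(subspace_interpolation(X).shape)  # (30, 8)
```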
- Published
- 2023
- Full Text
- View/download PDF
45. Climate Evolution Through the Onset and Intensification of Northern Hemisphere Glaciation.
- Author
-
McClymont, E. L., Ho, S. L., Ford, H. L., Bailey, I., Berke, M. A., Bolton, C. T., De Schepper, S., Grant, G. R., Groeneveld, J., Inglis, G. N., Karas, C., Patterson, M. O., Swann, G. E. A., Thirumalai, K., White, S. M., Alonso‐Garcia, M., Anand, P., Hoogakker, B. A. A., Littler, K., and Petrick, B. F.
- Subjects
- *
PLIOCENE-Pleistocene boundary , *GLACIATION , *ICE sheets , *ATMOSPHERIC carbon dioxide , *PLIOCENE Epoch , *OCEAN circulation - Abstract
The Pliocene Epoch (∼5.3–2.6 million years ago, Ma) was characterized by a warmer than present climate with smaller Northern Hemisphere ice sheets, and offers an example of a climate system in long‐term equilibrium with current or predicted near‐future atmospheric CO2 concentrations (pCO2). A long‐term trend of ice‐sheet expansion led to more pronounced glacial (cold) stages by the end of the Pliocene (∼2.6 Ma), known as the "intensification of Northern Hemisphere Glaciation" (iNHG). We assessed the spatial and temporal variability of ocean temperatures and ice‐volume indicators through the late Pliocene and early Pleistocene (from 3.3 to 2.4 Ma) to determine the character of this climate transition. We identified asynchronous shifts in long‐term means and the pacing and amplitude of shorter‐term climate variability, between regions and between climate proxies. Early changes in Antarctic glaciation and Southern Hemisphere ocean properties occurred even during the mid‐Piacenzian warm period (∼3.264–3.025 Ma) which has been used as an analog for future warming. Increased climate variability subsequently developed alongside signatures of larger Northern Hemisphere ice sheets (iNHG). Yet, some regions of the ocean felt no impact of iNHG, particularly in lower latitudes. Our analysis has demonstrated the complex, non‐uniform and globally asynchronous nature of climate changes associated with the iNHG. Shifting ocean gateways and ocean circulation changes may have pre‐conditioned the later evolution of ice sheets with falling atmospheric pCO2. Further development of high‐resolution, multi‐proxy reconstructions of climate is required so that the full potential of the rich and detailed geological records can be realized. Plain Language Summary: Warm climates of the geological past provide windows into future environmental responses to elevated atmospheric CO2 concentrations, and past climate transitions identify important or sensitive regions and processes. We assessed the patterns of average ocean temperatures and indicators of ice sheet size over hundreds of thousands of years, and compared to shorter‐term variability (tens of thousands of years) during a recent transition from late Pliocene warmth (when CO2 was similar to present) to the onset of the large and repeated advances of northern hemisphere ice sheets referred to as the "ice ages." We show that different regions of the climate system changed at different times, with some changing before the ice sheets expanded. The development of larger ice sheets in the Northern Hemisphere then impacted ocean temperatures and circulation, but there were many regions where no impacts were felt. Our analysis highlights regional differences in the timing and amplitudes of change within a globally‐significant climate transition as well as in response to the current atmospheric CO2 concentrations in our climate system. Key Points: The "stable" warm late Pliocene ∼3.3–3.1 million years ago was a time of climate transition, especially in the Southern Hemisphere. Ocean temperatures and ice sheets evolved asynchronously 3.3–2.4 Ma during the onset and intensification of Northern Hemisphere Glaciation. Climate variability evolved in complex, non‐uniform ways, most strongly expressed in northern mid‐latitude sea‐surface temperature records. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
46. In silico simulation of hepatic arteries: An open‐source algorithm for efficient synthetic data generation.
- Author
-
Whitehead, Joseph F., Laeseke, Paul F., Periyasamy, Sarvesh, Speidel, Michael A., and Wagner, Martin G.
- Subjects
- *
MACHINE learning , *IMAGE reconstruction algorithms , *COST functions , *HEPATIC artery , *DEEP learning , *ALGORITHMS - Abstract
Background: In silico testing of novel image reconstruction and quantitative algorithms designed for interventional imaging requires realistic high‐resolution modeling of arterial trees with contrast dynamics. Furthermore, data synthesis for training of deep learning algorithms requires that an arterial tree generation algorithm be computationally efficient and sufficiently random. Purpose: The purpose of this paper is to provide a method for anatomically and physiologically motivated, computationally efficient, random hepatic arterial tree generation. Methods: The vessel generation algorithm uses a constrained constructive optimization approach with a volume minimization‐based cost function. The optimization is constrained by the Couinaud liver classification system to assure a main feeding artery to each Couinaud segment. An intersection check is included to guarantee non‐intersecting vasculature, and cubic polynomial fits are used to optimize bifurcation angles and to generate smoothly curved segments. Furthermore, an approach to simulate contrast dynamics and respiratory and cardiac motion is also presented. Results: The proposed algorithm can generate a synthetic hepatic arterial tree with 40 000 branches in 11 s. The high‐resolution arterial trees have realistic morphological features such as branching angles (MAD from Murray's law = 1.2 ± 1.2°), radii (median Murray deviation = 0.08), and smoothly curved, non‐intersecting vessels. Furthermore, the algorithm assures a main feeding artery to each Couinaud segment and is random (variability = 0.98 ± 0.01). Conclusions: This method facilitates the generation of large datasets of high‐resolution, unique hepatic angiograms for the training of deep learning algorithms and initial testing of novel 3D reconstruction and quantitative algorithms designed for interventional imaging. [ABSTRACT FROM AUTHOR]
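The Murray's-law constraint referenced in the results (parent and child radii at a bifurcation satisfying r_p^3 = r_1^3 + r_2^3) can be sketched directly. The flow-split parameter and the deviation definition below are assumptions for illustration, not taken from the paper's code.

```python
def murray_child_radii(r_parent, flow_split=0.5, exponent=3.0):
    """Child radii consistent with Murray's law, given the fraction of
    parent flow routed to the first child."""
    r1 = r_parent * flow_split ** (1.0 / exponent)
    r2 = r_parent * (1.0 - flow_split) ** (1.0 / exponent)
    return r1, r2

def murray_deviation(r_parent, r1, r2, exponent=3.0):
    """Relative deviation from Murray's law at one bifurcation (one plausible
    way to define the 'Murray deviation' quoted in the abstract)."""
    return abs(r_parent ** exponent - (r1 ** exponent + r2 ** exponent)) / r_parent ** exponent

# A bifurcation built with murray_child_radii has zero deviation by construction:
r1, r2 = murray_child_radii(1.0, flow_split=0.4)
print(murray_deviation(1.0, r1, r2))  # ~0.0
```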
- Published
- 2023
- Full Text
- View/download PDF
47. Assessing the exposure of UK habitats to 20th‐ and 21st‐century climate change, and its representation in ecological monitoring schemes.
- Author
-
Wilson, Oliver J. and Pescott, Oliver L.
- Subjects
- *
ENVIRONMENTAL monitoring , *WEATHER , *MEDITERRANEAN climate , *HABITATS , *LAND cover , *CLIMATE change - Abstract
Climate change is a significant driver of contemporary biodiversity change. Ecological monitoring schemes can be crucial in highlighting its consequences, but connecting and interpreting observed climatic and ecological changes demands an understanding of monitored locations' exposure to climate change. Generalising from trends in monitored sites to habitats also requires an assessment of how closely sampled locations' climate change trajectories mirror those of wider ecosystems. Such assessments are rare but vital for drawing robust ecological conclusions. Focusing on the UK, we generated a metric of climate change exposure by quantifying the change between observed historical (1901–2019) and predicted future (2021–2080, pessimistic emissions scenario) conditions. We then assessed habitat‐specific climate change exposure by overlaying the resulting data with maps of contemporary (2019) land cover. Finally, we compared patterns of climate change exposure in locations sampled by ecological monitoring schemes to random samples from wider habitats. The UK's climate changed significantly between the early 20th century and the last decade, and is predicted to undergo even greater changes (including the development of Iberian/Mediterranean climate types in places) into the 21st century. Climate change exposure is unevenly distributed: regionally, it falls more in southern, central and eastern England; locally, it is greater at higher‐elevation locations than nearby areas at lower elevations. Areas with contemporary arable and horticulture, urban, calcareous grassland and suburban land cover are predicted to experience the greatest overall climatic change, though other habitats experienced relatively greater change than these in the first half of the 20th century. The extent to which locations sampled by ecological monitoring schemes represent broader habitat‐level gradients of climate change exposure varies. Monitored sites' coverage of wider trends is heterogeneous across habitats, time periods and schemes. Policy implications. UK ecological monitoring schemes can effectively, though variably, capture the effects of climate change on habitats. To improve their performance, climate change could be explicitly included in the design of such programmes. Additionally, our findings on how effectively different datasets represent wider patterns of climate change are crucial for informing syntheses of ecological change connected to shifting atmospheric conditions. [ABSTRACT FROM AUTHOR]
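As a rough illustration of such an exposure metric, the sketch below measures climatic change per location as a standardized Euclidean distance between baseline and future climate variables. This is one plausible formulation under stated assumptions, not necessarily the metric used in the paper.

```python
import numpy as np

def climate_exposure(baseline, future):
    """Per-location exposure: Euclidean distance between baseline and future
    climate variables (rows = locations, columns = variables such as mean
    temperature and precipitation), standardized by baseline variability.
    Assumes no variable is constant across locations."""
    baseline = np.asarray(baseline, dtype=float)
    future = np.asarray(future, dtype=float)
    scale = baseline.std(axis=0, ddof=1)
    z = (future - baseline) / scale
    return np.sqrt((z ** 2).sum(axis=1))
```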
- Published
- 2023
- Full Text
- View/download PDF
48. Data for Digital Forensics: Why a Discussion on "How Realistic is Synthetic Data" is Dispensable.
- Author
-
Göbel, Thomas, Baier, Harald, and Breitinger, Frank
- Subjects
DIGITAL forensics ,DATA libraries ,FORENSIC sciences ,RESEARCH personnel - Abstract
Digital forensics depends on data sets for various purposes like concept evaluation, educational training, and tool validation. Researchers have gathered such data sets into repositories and created data simulation frameworks for producing large amounts of data. Synthetic data often face skepticism due to its perceived deviation from real-world data, raising doubts about its realism. This paper addresses this concern, arguing that there is no definitive answer. We focus on four common digital forensic use cases that rely on data. Through these, we elucidate the specifications and prerequisites of data sets within their respective contexts. Our discourse uncovers that both real-world and synthetic data are indispensable for advancing digital forensic science, software, tools, and the competence of practitioners. Additionally, we provide an overview of available data set repositories and data generation frameworks, contributing to the ongoing dialogue on digital forensic data sets' utility. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
49. Emotion recognition using facial expressions in an immersive virtual reality application.
- Author
-
Chen, Xinrun and Chen, Hengxin
- Subjects
EMOTION recognition ,FACIAL expression ,VIRTUAL reality ,HEAD-mounted displays ,INFRARED cameras ,EMOTIONS ,LIGHT sources - Abstract
Facial expression recognition (FER) is an important method for studying and distinguishing human emotions. In the virtual reality (VR) context, people's emotions are instantly and naturally triggered due to the high immersion and realism of VR. However, when people wear head-mounted display (HMD) VR equipment, the eye regions are covered. FER accuracy is reduced if the eye-region information is discarded, so it is necessary to obtain the eye-region information by other means. The main difficulty of FER in an immersive VR context is that conventional FER methods depend on public databases, in which the facial image information is complete; these methods are therefore difficult to apply directly to the VR context. To solve this problem, this paper designs and implements a solution for FER in the VR context as follows. A real facial expression database collection scheme in the VR context is implemented by adding an infrared camera and infrared light source to the HMD. A virtual database construction method is presented for FER in the VR context, which can improve the generalization of models. A deep network named the multi-region facial expression recognition model is designed for FER in the VR context. [ABSTRACT FROM AUTHOR]
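The abstract does not specify the network, so the sketch below shows one plausible reading of a "multi-region" model: a branch for the visible lower face and a branch for the HMD-occluded eye region captured in infrared, fused before classification. All layer sizes, input crops, and the fusion scheme are invented for illustration.

```python
import torch
import torch.nn as nn

class MultiRegionFER(nn.Module):
    """Toy two-branch model: one branch for the visible lower face (RGB crop),
    one for the eye region captured by the HMD's infrared camera."""
    def __init__(self, n_classes=7):
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.face = branch(3)   # RGB mouth/cheek crop
        self.eyes = branch(1)   # grayscale IR eye-region crop
        self.head = nn.Linear(32 + 32, n_classes)

    def forward(self, face_img, eye_img):
        # Fuse region features by concatenation before classification
        return self.head(torch.cat([self.face(face_img), self.eyes(eye_img)], dim=1))

model = MultiRegionFER()
logits = model(torch.randn(2, 3, 64, 64), torch.randn(2, 1, 32, 64))
print(logits.shape)  # (2, 7)
```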
- Published
- 2023
- Full Text
- View/download PDF
50. On Evaluating IoT Data Trust via Machine Learning.
- Author
-
Tadj, Timothy, Arablouei, Reza, and Dedeoglu, Volkan
- Subjects
TRUST ,SUPERVISED learning ,MACHINE learning ,INTERNET of things ,PYTHON programming language ,TAGS (Metadata) ,SECURE Sockets Layer (Computer network protocol) ,RANDOM walks ,CLUSTER analysis (Statistics) - Abstract
Data trust in IoT is crucial for safeguarding privacy, security, reliable decision-making, user acceptance, and complying with regulations. Various approaches based on supervised or unsupervised machine learning (ML) have recently been proposed for evaluating IoT data trust. However, assessing their real-world efficacy is hard mainly due to the lack of related publicly available datasets that can be used for benchmarking. Since obtaining such datasets is challenging, we propose a data synthesis method, called random walk infilling (RWI), to augment IoT time-series datasets by synthesizing untrustworthy data from existing trustworthy data. Thus, RWI enables us to create labeled datasets that can be used to develop and validate ML models for IoT data trust evaluation. We also extract new features from IoT time-series sensor data that effectively capture its autocorrelation as well as its cross-correlation with the data of the neighboring (peer) sensors. These features can be used to learn ML models for recognizing the trustworthiness of IoT sensor data. Equipped with our synthesized ground-truth-labeled datasets and informative correlation-based features, we conduct extensive experiments to critically examine various approaches to evaluating IoT data trust via ML. The results reveal that commonly used ML-based approaches to IoT data trust evaluation, which rely on unsupervised cluster analysis to assign trust labels to unlabeled data, perform poorly. This poor performance is due to the underlying assumption that clustering provides reliable labels for data trust, which is found to be untenable. The results also indicate that ML models, when trained on datasets augmented via RWI and using the proposed features, generalize well to unseen data and surpass existing related approaches. Moreover, we observe that a semi-supervised ML approach that requires only about 10% of the data labeled offers competitive performance while being practically more appealing compared to the fully supervised approaches. The related Python code and data are available online. [ABSTRACT FROM AUTHOR]
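RWI and the correlation-based features are described only at a high level in the abstract. The sketch below gives one plausible reading, assuming RWI replaces a segment of a trustworthy series with a random walk anchored at the preceding sample, with step size matched to the data; the function names and parameters are illustrative, not the authors' code.

```python
import numpy as np

def random_walk_infill(series, start, length, rng=None):
    """Replace series[start:start+length] (start >= 1) with a random walk
    anchored at the preceding sample, turning trustworthy data into
    plausibly scaled but untrustworthy data."""
    rng = np.random.default_rng(rng)
    out = np.asarray(series, dtype=float).copy()
    step_std = np.std(np.diff(out))  # match the series' typical step size
    steps = rng.normal(0.0, step_std, size=length)
    out[start:start + length] = out[start - 1] + np.cumsum(steps)
    return out

def lag1_autocorrelation(x):
    """Lag-1 autocorrelation, one simple feature of the kind derived from a
    sensor's own time series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.sum(x[:-1] * x[1:]) / np.sum(x * x))
```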
- Published
- 2023
- Full Text
- View/download PDF