Descriptor: "data synthesis" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"data synthesis"' showing total 1,972 results

51. Crack modeling via minimum-weight surfaces in 3d Voronoi diagrams

Author: Christian Jung and Claudia Redenbach
Subjects: Fracture modeling, Tessellations, Data synthesis, 3d image processing, Adaptive dilation, Mathematics, QA1-939, Industry, HD2321-4730.9
Abstract: As the number one building material, concrete is of fundamental importance in civil engineering. Understanding its failure mechanisms is essential for designing sustainable buildings and infrastructure. Micro-computed tomography (μCT) is a well-established tool for virtually assessing crack initiation and propagation in concrete. The reconstructed 3d images can be examined via techniques from the fields of classical image processing and machine learning. Ground truths are a prerequisite for an objective evaluation of crack segmentation methods. Furthermore, they are necessary for training machine learning models. However, manual annotation of large 3d concrete images is not feasible. To tackle the problem of data scarcity, the image pairs of cracked concrete and corresponding ground truth can be synthesized. In this work we propose a novel approach to stochastically model crack structures via Voronoi diagrams. The method is based on minimum-weight surfaces, an extension of shortest paths to 3d. Within a dedicated image processing pipeline, the surfaces are then discretized and embedded into real μCT images of concrete. The method is flexible and fast, such that a variety of different crack structures can be generated in a short amount of time.
Published: 2023
Full Text: View/download PDF

52. A systematic review of deep learning data augmentation in medical imaging: Recent advances and future research directions

Author: Tauhidul Islam, Md. Sadman Hafiz, Jamin Rahman Jim, Md. Mohsin Kabir, and M.F. Mridha
Subjects: Deep learning, Data augmentation, Image transformation, Medical imaging augmentation, Data synthesis, Systematic review, Computer applications to medicine. Medical informatics, R858-859.7
Abstract: Data augmentation involves artificially expanding a dataset by applying various transformations to the existing data. Recent developments in deep learning have advanced data augmentation, enabling more complex transformations. Especially vital in the medical domain, deep learning-based data augmentation improves model robustness by generating realistic variations in medical images, enhancing diagnostic and predictive task performance. Therefore, to assist researchers and experts in their pursuits, there is a need for an extensive and informative study that covers the latest advancements in the growing domain of deep learning-based data augmentation in medical imaging. There is a gap in the literature regarding recent advancements in deep learning-based data augmentation. This study explores the diverse applications of data augmentation in medical imaging and analyzes recent research in these areas to address this gap. The study also explores popular datasets and evaluation metrics to improve understanding. Subsequently, the study provides a short discussion of conventional data augmentation techniques along with a detailed discussion on applying deep learning algorithms in data augmentation. The study further analyzes the results and experimental details from recent state-of-the-art research to understand the advancements and progress of deep learning-based data augmentation in medical imaging. Finally, the study discusses various challenges and proposes future research directions to address these concerns. This systematic review offers a thorough overview of deep learning-based data augmentation in medical imaging, covering application domains, models, results analysis, challenges, and research directions. It provides a valuable resource for multidisciplinary studies and researchers making decisions based on recent analytics.
Published: 2024
Full Text: View/download PDF

53. Research on the Simulation Method of HTTP Traffic Based on GAN.

Author: Yang, Chenglin, Xu, Dongliang, and Ma, Xiao
Subjects: COMPUTER network traffic, GENERATIVE adversarial networks, TRANSFORMER models, GAUSSIAN mixture models, HTTP (Computer network protocol), EVOLUTIONARY algorithms
Abstract: Due to the increasing severity of network security issues, training corresponding detection models requires large datasets. In this work, we propose a novel method based on generative adversarial networks to synthesize network data traffic. We introduced a network traffic data normalization method based on Gaussian mixture models (GMM), and for the first time, incorporated a generator based on the Swin Transformer structure into the field of network traffic generation. To further enhance the robustness of the model, we mapped real data through an AE (autoencoder) module and optimized the training results in the form of evolutionary algorithms. We validated the training results on four different datasets and introduced four additional models for comparative experiments in the experimental evaluation section. Our proposed SEGAN outperformed other state-of-the-art network traffic emulation methods. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

54. Synthesis methods used to combine observational studies and randomised trials in published meta-analyses.

Author: Cheurfa, Cherifa, Tsokani, Sofia, Kontouli, Katerina-Maria, Boutron, Isabelle, and Chaimani, Anna
Subjects: SCIENTIFIC observation, POISSON regression, DATA integrity, EXPERIMENTAL design, STANDARD deviations
Abstract: Background: This study examined the synthesis methods used in meta-analyses pooling data from observational studies (OSs) and randomised controlled trials (RCTs) from various medical disciplines. Methods: We searched Medline via PubMed to identify reports of systematic reviews of interventions, including and pooling data from RCTs and OSs published in 110 high-impact factor general and specialised journals between 2015 and 2019. Screening and data extraction were performed in duplicate. To describe the synthesis methods used in the meta-analyses, we considered the first meta-analysis presented in each article. Results: Overall, 132 reports were identified with a median number of included studies of 14 [9–26]. The median number of OSs was 6.5 [3–12] and that of RCTs was 3 [1–6]. The effect estimates recorded from OSs (i.e., adjusted or unadjusted) were not specified in 82% (n = 108) of the meta-analyses. An inverse-variance common-effect model was used in 2% (n = 3) of the meta-analyses, a random-effects model was used in 55% (n = 73), and both models were used in 40% (n = 53). A Poisson regression model was used in 1 meta-analysis, and 2 meta-analyses did not report the model they used. The mean total weight of OSs in the studied meta-analyses was 57.3% (standard deviation, ± 30.3%). Only 44 (33%) meta-analyses reported results stratified by study design. Of them, the results between OSs and RCTs had a consistent direction of effect in 70% (n = 31). Study design was explored as a potential source of heterogeneity in 79% of the meta-analyses, and confounding factors were investigated in only 10% (n = 13). Publication bias was assessed in 70% (n = 92) of the meta-analyses. Tau-square was reported in 32 meta-analyses with a median of 0.07 [0–0.30]. Conclusion: The inclusion of OSs in a meta-analysis on interventions could provide useful information. 
However, considerations of several methodological and conceptual aspects of OSs, that are required to avoid misleading findings, were often absent or insufficiently reported in our sample. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

55. The Extended Pillar Integration Process (ePIP): A Data Integration Method Allowing the Systematic Synthesis of Findings From Three Different Sources.

Author: Gauly, Julia, Ulahannan, Arun, and Grove, Amy L.
Abstract: Mixed methods research requires data integration from multiple sources. Existing techniques are restricted to integrating a maximum of two data sources, do not provide step-by-step guidance or can be cumbersome where many data need to be integrated. We have solved these limitations through the development of the extended Pillar Integration Process (ePIP), a method which contributes to the field of mixed methods by being the first data integration method providing explicit steps on how to integrate data from three data sources. The ePIP provides greater transparency, validity and consistency compared to existing methods. We provide two worked examples from health sciences and automotive human factors, highlighting its value as a mixed methods integration tool. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

56. Domain-Adaptive Data Synthesis for Large-Scale Supermarket Product Recognition

Author: Strohmayer, Julian and Kampel, Martin
Founding Editors: Goos, Gerhard and Hartmanis, Juris
Editorial Board Members: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti
Editors: Tsapatsoulis, Nicolas, Lanitis, Andreas, Pattichis, Marios, Pattichis, Constantinos, Kyrkou, Christos, Kyriacou, Efthyvoulos, Theodosiou, Zenonas, and Panayides, Andreas
Published: 2023
Full Text: View/download PDF

57. Towards Cross Domain CSI Action Recognition Through One-Shot Bimodal Domain Adaptation

Author: Zhou, Bao, Zhou, Rui, Luo, Yue, and Cheng, Yu
Editorial Board Members: Akan, Ozgur, Bellavista, Paolo, Cao, Jiannong, Coulson, Geoffrey, Dressler, Falko, Ferrari, Domenico, Gerla, Mario, Kobayashi, Hisashi, Palazzo, Sergio, Sahni, Sartaj, Shen, Xuemin, Stan, Mircea, Jia, Xiaohua, and Zomaya, Albert Y.
Editors: Longfei, Shangguan and Bodhi, Priyantha
Published: 2023
Full Text: View/download PDF

58. Deep learning based classification of sheep behaviour from accelerometer data with imbalance

Author: Kirk E. Turner, Andrew Thompson, Ian Harris, Mark Ferguson, and Ferdous Sohel
Subjects: Sheep behaviour classification, Data synthesis, Class imbalance, Grazing sheep, Agriculture (General), S1-972, Information technology, T58.5-58.64
Abstract: Classification of sheep behaviour from a sequence of tri-axial accelerometer data has the potential to enhance sheep management. Sheep behaviour is inherently imbalanced (e.g., more ruminating than walking) resulting in underperforming classification for the minority activities which hold importance. Existing works have not addressed class imbalance and use traditional machine learning techniques, e.g., Random Forest (RF). We investigated Deep Learning (DL) models, namely, Long Short Term Memory (LSTM) and Bidirectional LSTM (BLSTM), appropriate for sequential data, from imbalanced data. Two data sets were collected in normal grazing conditions using jaw-mounted and ear-mounted sensors. Novel to this study, alongside typical single classes, e.g., walking, depending on the behaviours, data samples were labelled with compound classes, e.g., walking_grazing. The number of steps a sheep performed in the observed 10 s time window was also recorded and incorporated in the models. We designed several multi-class classification studies with imbalance being addressed using synthetic data. DL models achieved superior performance to traditional ML models, especially with augmented data (e.g., 4-Class + Steps: LSTM 88.0%, RF 82.5%). DL methods showed superior generalisability on unseen sheep (i.e., F1-score: BLSTM 0.84, LSTM 0.83, RF 0.65). LSTM, BLSTM and RF achieved sub-millisecond average inference time, making them suitable for real-time applications. The results demonstrate the effectiveness of DL models for sheep behaviour classification in grazing conditions. The results also demonstrate the DL techniques can generalise across different sheep. The study presents a strong foundation of the development of such models for real-time animal monitoring.
Published: 2023
Full Text: View/download PDF

59. Unsupervised GAN epoch selection for biomedical data synthesis

Author: Böhland Moritz, Bruch Roman, Löffler Katharina, and Reischl Markus
Subjects: generative adversarial network, data synthesis, segmentation, computer vision, Medicine
Abstract: Supervised Neural Networks are used for segmentation in many biological and biomedical applications. To omit the time-consuming and tiring process of manual labeling, unsupervised Generative Adversarial Networks (GANs) can be used to synthesize labeled data. However, the training of GANs requires extensive computation and is often unstable. Due to the lack of established stopping criteria, GANs are usually trained multiple times for a heuristically fixed number of epochs. Early stopping and epoch selection can lead to better synthetic datasets resulting in higher downstream segmentation quality on biological or medical data. This article examines whether the Frechet Inception Distance (FID), the Kernel Inception Distance (KID), or the WeightWatcher tool can be used for early stopping or epoch selection of unsupervised GANs. The experiments show that the last trained GAN epoch is not necessarily the best one to synthesize downstream segmentation data. On complex datasets, FID and KID correlate with the downstream segmentation quality, and both can be used for epoch selection.
Published: 2023
Full Text: View/download PDF

60. Drivers of biodiversity change in the Anthropocene

Author: Daskalova, Gergana Nikolaeva, Myers-Smith, Isla, Bjorkman, Anne, and Dornelas, Maria
Subjects: biodiversity, conservation, global change, ecology, data science, data synthesis, time-series, forest loss, global change drivers, rarity, species traits, biodiversity change, species richness, community composition
Abstract: Across the globe, the populations of species and the biodiversity of ecological communities are changing, including declines, gains and stable trends over time. Against a backdrop of accelerating global change, a critical research challenge is to disentangle the sources of the heterogeneous patterns of population and biodiversity change over time. In this thesis, I linked population and biodiversity change with species traits like rarity and commonness, and with global change drivers like forest loss. I synthesised global biodiversity databases with gridded driver datasets to quantify how species' populations and biodiversity are being impacted by human activities in the Anthropocene. The rise of open-access data in ecology has produced databases with millions of records which have launched large-scale syntheses of how Earth's biota is changing over time and space. However, our knowledge of biodiversity change is limited by the available data and their biases. In Chapter 1, I tested the representation of three worldwide biodiversity databases (Living Planet, BioTIME and PREDICTS) across geographic and temporal variation in global change over land and sea and across the tree of life. I found that variation in global change drivers is better captured over space than over time and in the marine realm versus on land. I provided recommendations on how to improve the use of existing data, better target future ecological monitoring and capture different combinations of global change. In Chapter 2, I tested whether vertebrate species from specific biomes, taxa or with certain species traits are more likely to increase or decrease in a time of accelerating global change. I analysed nearly 10 000 population abundance time series from over 2000 vertebrate species part of the Living Planet Database. I integrated abundance data with information on geographic range, habitat preference, taxonomic and phylogenetic relationships, and IUCN Red List Categories and threats. 
I found that 15% of populations declined, 18% increased, and 67% showed no net changes over time. Amphibians were the only taxa that experienced net declines in the analysed data, while birds, mammals and reptiles experienced net increases. Despite this variation among broad taxonomic groups, surprisingly I did not detect phylogenetic patterns in which species were more likely to decline versus increase. Population trends were poorly explained by species' rarity and global-scale threats. I found that incorporating the full spectrum of population change, including declines, gains and stable trends, will improve conservation efforts to protect global biodiversity. In Chapter 3, I explored land-use change to fill the gap in empirical evidence of how habitat transformations such as forest loss and gain are reshaping biodiversity over time. I quantified how change in forest cover has influenced temporal shifts in populations and ecological assemblages from over 6000 globally distributed time series across six taxonomic groups. I found that local-scale increases and decreases in abundance, species richness, and temporal species replacement (turnover) were intensified by as much as 48% after forest loss. Larger amounts of forest loss did not always correlate with higher population and biodiversity change across sites, highlighting the mediating effects of local context and historical baselines. Temporal lags in population- and assemblage-level shifts after forest loss extended up to 50 years and increased with species' generation time. My findings indicate that forest loss amplified population and biodiversity change, with effects on both short and long temporal scales. A mix of immediate and lagged biodiversity change following land-use change emphasises the need for temporally explicit biodiversity scenarios to accurately estimate progress towards conservation goals. 
Together, my thesis findings demonstrate the wide spectrum of population and biodiversity change happening across varying amounts of global change and different realms, taxa and species traits. These heterogeneous impacts of global change on population and biodiversity spanned temporal scales from immediate effects in a couple of years to lagged responses decades after disturbance. The links between global change drivers and shifts in species' abundance, species richness and compositional turnover depended on historical context and species' characteristics like generation time. I documented both immediate and temporally delayed effects of global change drivers on species' populations abundance and the biodiversity of ecological assemblages which highlights the importance of long-term ecological monitoring. The main implications of my thesis findings are that first, any inferences drawn from biodiversity syntheses reflect the types of species and places represented by the data and the global change that is experienced. To create accurate scenarios, we need biodiversity data that span not only different taxa and locations, but also the spectrum of global change variation around the world. Second, biodiversity predictions should incorporate both positive and negative impacts of global change drivers as well as lagged responses. Finally, ecosystems and the species within them are usually simultaneously exposed to a suite of global change drivers and a key future research step is to test the synergy and/or antagony in the effects and interactions among multiple types of environmental change on populations and biodiversity. Overall, my thesis research demonstrates that the drivers of biodiversity change in the Anthropocene have both immediate and temporally-delayed effects which depend on species' traits and the sites' historical context. 
My findings suggest that by incorporating the full spectrum of biodiversity change and the nuance around interacting global change drivers we can improve projections of future ecological shifts and enhance local and international conservation policies.
Published: 2021
Full Text: View/download PDF

61. CTAB-GAN+: enhancing tabular data synthesis

Author: Zilong Zhao, Aditya Kunar, Robert Birke, Hiek Van der Scheer, and Lydia Y. Chen
Subjects: GAN, data synthesis, tabular data, differential privacy, imbalanced distribution, Information technology, T58.5-58.64
Abstract: The usage of synthetic data is gaining momentum in part due to the unavailability of original data due to privacy and legal considerations and in part due to its utility as an augmentation to the authentic data. Generative adversarial networks (GANs), a paragon of generative models, initially for images and subsequently for tabular data, have contributed many of the state-of-the-art synthesizers. As GANs improve, the synthesized data increasingly resemble the real data, which risks leaking private information. Differential privacy (DP) provides theoretical guarantees on privacy loss but degrades data utility. Striking the best trade-off remains a challenging research question. In this study, we propose CTAB-GAN+, a novel conditional tabular GAN. CTAB-GAN+ improves upon the state of the art by (i) adding downstream losses to conditional GAN for higher utility synthetic data in both classification and regression domains; (ii) using Wasserstein loss with gradient penalty for better training convergence; (iii) introducing novel encoders targeting mixed continuous-categorical variables and variables with unbalanced or skewed data; and (iv) training with DP stochastic gradient descent to impose strict privacy guarantees. We extensively evaluate CTAB-GAN+ on statistical similarity and machine learning utility against state-of-the-art tabular GANs. The results show that CTAB-GAN+ synthesizes privacy-preserving data with at least 21.9% higher machine learning utility (i.e., F1-Score) across multiple datasets and learning tasks under a given privacy budget.
Published: 2024
Full Text: View/download PDF

62. Ad-RuLer: A Novel Rule-Driven Data Synthesis Technique for Imbalanced Classification.

Author: Zhang, Xiao, Paz, Iván, Nebot, Àngela, Mugica, Francisco, and Romero, Enrique
Subjects: MACHINE learning, RANDOM forest algorithms, MACHINE performance, LOGISTIC regression analysis, CLASSIFICATION
Abstract: When classifiers face imbalanced class distributions, they often misclassify minority class samples, consequently diminishing the predictive performance of machine learning models. Existing oversampling techniques predominantly rely on the selection of neighboring data via interpolation, with less emphasis on uncovering the intrinsic patterns and relationships within the data. In this research, we present the usefulness of an algorithm named RuLer to deal with the problem of classification with imbalanced data. RuLer is a learning algorithm initially designed to recognize new sound patterns within the context of the performative artistic practice known as live coding. This paper demonstrates that this algorithm, once adapted (Ad-RuLer), has great potential to address the problem of oversampling imbalanced data. An extensive comparison with other mainstream oversampling algorithms (SMOTE, ADASYN, Tomek-links, Borderline-SMOTE, and KmeansSMOTE), using different classifiers (logistic regression, random forest, and XGBoost) is performed on several real-world datasets with different degrees of data imbalance. The experiment results indicate that Ad-RuLer serves as an effective oversampling technique with extensive applicability. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

63. Crack modeling via minimum-weight surfaces in 3d Voronoi diagrams.

Author: Jung, Christian and Redenbach, Claudia
Subjects: VORONOI polygons, MACHINE learning, SUSTAINABLE architecture, THREE-dimensional imaging, CIVIL engineering
Abstract: As the number one building material, concrete is of fundamental importance in civil engineering. Understanding its failure mechanisms is essential for designing sustainable buildings and infrastructure. Micro-computed tomography (μCT) is a well-established tool for virtually assessing crack initiation and propagation in concrete. The reconstructed 3d images can be examined via techniques from the fields of classical image processing and machine learning. Ground truths are a prerequisite for an objective evaluation of crack segmentation methods. Furthermore, they are necessary for training machine learning models. However, manual annotation of large 3d concrete images is not feasible. To tackle the problem of data scarcity, the image pairs of cracked concrete and corresponding ground truth can be synthesized. In this work we propose a novel approach to stochastically model crack structures via Voronoi diagrams. The method is based on minimum-weight surfaces, an extension of shortest paths to 3d. Within a dedicated image processing pipeline, the surfaces are then discretized and embedded into real μCT images of concrete. The method is flexible and fast, such that a variety of different crack structures can be generated in a short amount of time. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

64. A Generative Adversarial Network to Synthesize 3D Magnetohydrodynamic Distortions for Electrocardiogram Analyses Applied to Cardiac Magnetic Resonance Imaging.

Author: Mehri, Maroua, Calmon, Guillaume, Odille, Freddy, Oster, Julien, and Lalande, Alain
Subjects: *CARDIAC magnetic resonance imaging, *GENERATIVE adversarial networks, *DATA augmentation, *MAGNETIC resonance imaging, *PROBABILISTIC generative models, *DEEP learning
Abstract: Recently, deep learning (DL) models have been increasingly adopted for automatic analyses of medical data, including electrocardiograms (ECGs). Large, available ECG datasets, generally of high quality, often lack specific distortions, which could be helpful for enhancing DL-based algorithms. Synthetic ECG datasets could overcome this limitation. A generative adversarial network (GAN) was used to synthesize realistic 3D magnetohydrodynamic (MHD) distortion templates, as observed during magnetic resonance imaging (MRI), and then added to available ECG recordings to produce an augmented dataset. Similarity metrics, as well as the accuracy of a DL-based R-peak detector trained with and without data augmentation, were used to evaluate the effectiveness of the synthesized data. Three-dimensional MHD distortions produced by the proposed GAN were similar to the measured ones used as input. The precision of a DL-based R-peak detector, tested on actual unseen data, was significantly enhanced by data augmentation; its recall was higher when trained with augmented data. Using synthesized MHD-distorted ECGs significantly improves the accuracy of a DL-based R-peak detector, with a good generalization capacity. This provides a simple and effective alternative to collecting new patient data. DL-based algorithms for ECG analyses can suffer from bias or gaps in training datasets. Using a GAN to synthesize new data, as well as metrics to evaluate its performance, can overcome the scarcity issue of data availability. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

65. Adapting Neural Radiance Fields (NeRF) to the 3D Scene Reconstruction Problem Under Dynamic Illumination Conditions.

Author: Savin, V. and Kolodiazhna, O.
Subjects: *RADIANCE, *LIGHTING, *DATA augmentation
Abstract: The problem of new image synthesis with the use of Neural Radiance Fields (NeRF) for an environment with dynamic illumination is considered. When training NeRF models, a photometric loss function is used, i.e., a pixel-by-pixel difference between intensity values of scene images and the images generated using NeRF. For reflective surfaces, image intensity depends on the viewing angle, and this effect is accounted for by using the direction of a ray as the NeRF model input parameter. For scenes with dynamic illumination, image intensity depends not only on the position and viewing direction, but also on time. It is shown that illumination change affects the learning of NeRF with a standard photometric loss function and decreases the quality of the obtained images and depth maps. To overcome this problem, we propose to introduce time as an additional NeRF input argument. Experiments performed on the ScanNet dataset demonstrate that NeRF with a modified input outperforms the original model version and generates more consistent and coherent 3D structures. The results of this study can be used to improve the quality of training data augmentation for training distance forecasting models (e.g., depth-from-stereo models allowing for depth/distance forecasts based on stereo data) for scenes with non-static illumination. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

66. SeedArc, a global archive of primary seed germination data.

Author: Fernández‐Pascual, Eduardo, Carta, Angelino, Rosbakh, Sergey, Guja, Lydia, Phartyal, Shyam S., Silveira, Fernando A. O., Chen, Si‐Chong, Larson, Julie E., and Jiménez‐Alfaro, Borja
Subjects: *GERMINATION, *BOTANY, *BIOTIC communities, *SEED size, *PLANT reproduction, *BIOMES, *PLANT ecology
Abstract: Keywords: data synthesis; database; germination; open science; plant reproduction; repository; seed; trait. Data availability: The data and code used to produce this article are available at https://github.com/efernandezpascual/seedarcms. The need for a global archive of primary seed germination data: The seed ecology community has recently recognized the need to synthesize knowledge, setting the research agenda for functional seed ecology (Saatkamp et al., [34]). SeedArc compiles primary seed germination data to synthesize the seed germination spectrum at a global scale. The theory underlying the seed germination spectrum has been laid out by decades of work on seed ecology (Baskin & Baskin, [1]), but empirical studies testing major ecological hypotheses at both global and local scales remain elusive without a standardized seed germination database. [Extracted from the article]
Published: 2023
Full Text: View/download PDF

67. Distribution and trends of mercury in aquatic and terrestrial biota of New York, USA: a synthesis of 50 years of research and monitoring.

Author: Adams, Evan M., Gulka, Julia E., Yang, Yang, Burton, Mark E. H., Burns, Douglas A., Buxton, Valerie, Cleckner, Lisa, DeSorbo, Christopher R., Driscoll, Charles T., Evers, David C., Fisher, Nicholas, Lane, Oksana, Mao, Huiting, Riva-Murray, Karen, Millard, Geoffrey, Razavi, N. Roxanna, Richter, Wayne, Sauer, Amy K., and Schoch, Nina
Subjects: AQUATIC organisms, MERCURY (Element), AQUATIC habitats, LAND cover, MERCURY vapor, RISK exposure, METHYLMERCURY
Abstract: Mercury (Hg) inputs have particularly impacted the northeastern United States due to its proximity to anthropogenic emissions sources and abundant habitats that efficiently convert inorganic Hg into methylmercury. Intensive research and monitoring efforts over the past 50 years in New York State, USA, have informed the assessment of the extent and impacts of Hg exposure on fishes and wildlife. By synthesizing Hg data statewide, this study quantified temporal trends of Hg exposure, spatiotemporal patterns of risk, the role that habitat and Hg deposition play in producing spatial patterns of Hg exposure in fish and other wildlife, and the effectiveness of current monitoring approaches in describing Hg trends. Most temporal trends were stable, but we found significant declines in Hg exposure over time in some long-sampled fish. The Adirondack Mountains and Long Island showed the greatest number of aquatic and terrestrial species with elevated Hg concentrations, reflecting an unequal distribution of exposure risk to fauna across the state. Persistent hotspots were detected for aquatic species in central New York and the Adirondack Mountains. Elevated Hg concentrations were associated with open water, forests, and rural, developed habitats for aquatic species, and open water and forested habitats for terrestrial species. Areas of consistently elevated Hg were found in areas driven by atmospheric and local Hg inputs, and habitat played a significant role in translating those inputs into biotic exposure. Continued long-term monitoring will be important in evaluating how these patterns continue to change in the face of changing land cover, climate, and Hg emissions. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

68. Of causes and symptoms: using monitoring data and expert knowledge to diagnose the causes of stream degradation.

Author: Rettig, Katharina, Semmler-Elpers, Renate, Brettschneider, Denise, Hering, Daniel, and Feld, Christian K.
Subjects: WATER management, BAYESIAN analysis, ECOLOGICAL assessment, WATER use, LAND use, FECAL contamination
Abstract: Ecological status assessment under the European Water Framework Directive (WFD) often integrates the impact of multiple stressors into a single index value. This hampers the identification of individual stressors being responsible for status deterioration. As a consequence, management measures are often disentangled from assessment results. To close this gap and to support river basin managers in the diagnosis of stressors, we linked numerous macroinvertebrate assessment metrics and one diatom index with potential causes of ecological deterioration through Bayesian belief networks (BBNs). The BBNs were informed by WFD monitoring data as well as regular consultation with experts and allow to estimate the probabilities of individual degradation causes based upon a selection of biological metrics. Macroinvertebrate metrics were shown to be stronger linked to hydromorphological conditions and land use than to water quality-related parameters (e.g., thermal and nutrient pollution). The modeled probabilities also allow to order the potential causes of degradation hierarchically. The comparison of assessment metrics showed that compositional and trait-based community metrics performed equally well in the diagnosis. The testing of the BBNs by experts resulted in an agreement between model output and expert opinion of 17–92% for individual stressors. Overall, the expert-based validation confirmed a good diagnostic potential of the BBNs; on average 80% of the diagnosed causes were in agreement with expert judgement. We conclude that diagnostic BBNs can assist the identification of causes of stream and river degradation and thereby inform the derivation of appropriate management decisions. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

69. Trends in Published Comparative and International Education Research, 2014–2020, with a Focus on Global South and Non-academic Authors

Author: Wiseman, Alexander W.
Published: 2022
Full Text: View/download PDF

70. Genetic algorithms and their applications to synthetic data generation

Author: Chen, Yingrui, Elliot, Mark, and Smith, Duncan
Subjects: Machine Learning, Data Privacy, Genetic Algorithms, Data Synthesis
Abstract: Data synthesis is a statistical disclosure control technique that prevents the leakage of personal information from survey data. Rubin, who originally proposed this technique, treated the confidential data within a dataset as missing and then replaced those data using multiple imputation [103]. Most methods in data synthesis were then developed based on this principle. However, data synthesis is a multi-objective problem that aims to maximise information utility as well as minimising disclosure risks, and these methods have no explicit mechanism for balancing the objectives. This issue is the basis for the line of enquiry embodied in this thesis. The need to optimise competing objectives suggests the possible use of iterative machine learning techniques for data synthesis, but - to date - investigations of this possibility have been limited. In the thesis, a new synthesis method using Genetic Algorithms (GAs) is introduced. GAs are evolutionary computational methods that simulate natural evolution. They allow candidates (which in this thesis are datasets) to compete, reproduce and mate in a pre-determined environment until one or more of them perfectly fits the environment (which is defined by a set of objectives). GAs were firstly used on binary strings and now they have variants that deal with different problems and data forms. In this thesis, a GA data synthesiser whose candidates are matrix and real-coded data is designed, and most of its parameters and hyper-parameters tested. A new information utility function to measure the overall divergence from synthetic data to the original data is used. The results of running the synthesiser on a real dataset are presented, which show that the GA approach successfully produced plausible synthetic data using a single utility objective and they were proved to be able to seek for a trade-off between information utility and disclosure risks during the process of synthesising. 
The overall conclusion is that GAs represent a significant opportunity for the practice of data synthesis.
Published: 2020

71. Generative neural data synthesis for autonomous systems

Author: Jegorova, Marija, Hospedales, Timothy, Mistry, Michael, and Ramamoorthy, Subramanian
Subjects: GANs, data synthesis, data augmentation
Abstract: A significant number of Machine Learning methods for automation currently rely on data-hungry training techniques. The lack of accessible training data often represents an insurmountable obstacle, especially in the fields of robotics and automation, where acquiring new data can be far from trivial. Additional data acquisition is not only often expensive and time-consuming, but occasionally is not even an option. Furthermore, the real world applications sometimes have commercial sensitivity issues associated with the distribution of the raw data. This doctoral thesis explores bypassing the aforementioned difficulties by synthesising new realistic and diverse datasets using the Generative Adversarial Network (GAN). The success of this approach is demonstrated empirically through solving a variety of case-specific data-hungry problems, via application of novel GAN-based techniques and architectures. Specifically, it starts with exploring the use of GANs for the realistic simulation of the extremely high-dimensional underwater acoustic imagery for the purpose of training both teleoperators and autonomous target recognition systems. We have developed a method capable of generating realistic sonar data of any chosen dimension by image-translation GANs with Markov principle. Following this, we apply GAN-based models to robot behavioural repertoire generation, that enables a robot manipulator to successfully overcome unforeseen impedances, such as unknown sets of obstacles and random broken joints scenarios. Finally, we consider dynamical system identification for articulated robot arms. We show how using diversity-driven GAN models to generate exploratory trajectories can allow dynamic parameters to be identified more efficiently and accurately than with conventional optimisation approaches. Together, these results show that GANs have the potential to benefit a variety of robotics learning problems where training data is currently a bottleneck.
Published: 2020
Full Text: View/download PDF

72. Risky business: human-related data is lacking from Lyme disease risk models

Author: Erica Fellin, Mathieu Varin, and Virginie Millien
Subjects: blacklegged ticks, data synthesis, human-related, Lyme disease, risk assessment, risk map, Public aspects of medicine, RA1-1270
Abstract: Used as a communicative tool for risk management, risk maps provide a service to the public, conveying information that can raise risk awareness and encourage mitigation. Several studies have utilized risk maps to determine risks associated with the distribution of Borrelia burgdorferi, the causal agent of Lyme disease in North America and Europe, as this zoonotic disease can lead to severe symptoms. This literature review focused on the use of risk maps to model distributions of B. burgdorferi and its vector, the blacklegged tick (Ixodes scapularis), in North America to compare variables used to predict these spatial models. Data were compiled from the existing literature to determine which ecological, environmental, and anthropic (i.e., human focused) variables past research has considered influential to the risk level for Lyme disease. The frequency of these variables was examined and analyzed via a non-metric multidimensional scaling analysis to compare different map elements that may categorize the risk models performed. Environmental variables were found to be the most frequently used in risk spatial models, particularly temperature. It was found that there was a significantly dissimilar distribution of variables used within map elements across studies: Map Type, Map Distributions, and Map Scale. Within these map elements, few anthropic variables were considered, particularly in studies that modeled future risk, despite the objective of these models directly or indirectly focusing on public health intervention. Without including human-related factors considering these variables within risk map models, it is difficult to determine how reliable these risk maps truly are. Future researchers may be persuaded to improve disease risk models by taking this into consideration.
Published: 2023
Full Text: View/download PDF

73. ASIDS: A Robust Data Synthesis Method for Generating Optimal Synthetic Samples.

Author: Du, Yukun, Cai, Yitao, Jin, Xiao, Wang, Hongxia, Li, Yao, and Lu, Min
Subjects: *SAMPLE size (Statistics), *INTERPOLATION
Abstract: Most existing data synthesis methods are designed to tackle problems with dataset imbalance, data anonymization, and an insufficient sample size. There is a lack of effective synthesis methods in cases where the actual datasets have a limited number of data points but a large number of features and unknown noise. Thus, in this paper we propose a data synthesis method named Adaptive Subspace Interpolation for Data Synthesis (ASIDS). The idea is to divide the original data feature space into several subspaces with an equal number of data points, and then perform interpolation on the data points in the adjacent subspaces. This method can adaptively adjust the sample size of the synthetic dataset that contains unknown noise, and the generated sample data typically contain minimal errors. Moreover, it adjusts the feature composition of the data points, which can significantly reduce the proportion of the data points with large fitting errors. Furthermore, the hyperparameters of this method have an intuitive interpretation and usually require little calibration. Analysis results obtained using simulated original data and benchmark original datasets demonstrate that ASIDS is a robust and stable method for data synthesis. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

74. Climate Evolution Through the Onset and Intensification of Northern Hemisphere Glaciation.

Author: McClymont, E. L., Ho, S. L., Ford, H. L., Bailey, I., Berke, M. A., Bolton, C. T., De Schepper, S., Grant, G. R., Groeneveld, J., Inglis, G. N., Karas, C., Patterson, M. O., Swann, G. E. A., Thirumalai, K., White, S. M., Alonso‐Garcia, M., Anand, P., Hoogakker, B. A. A., Littler, K., and Petrick, B. F.
Subjects: *PLIOCENE-Pleistocene boundary, *GLACIATION, *ICE sheets, *ATMOSPHERIC carbon dioxide, *PLIOCENE Epoch, *OCEAN circulation
Abstract: The Pliocene Epoch (∼5.3–2.6 million years ago, Ma) was characterized by a warmer than present climate with smaller Northern Hemisphere ice sheets, and offers an example of a climate system in long‐term equilibrium with current or predicted near‐future atmospheric CO2 concentrations (pCO2). A long‐term trend of ice‐sheet expansion led to more pronounced glacial (cold) stages by the end of the Pliocene (∼2.6 Ma), known as the "intensification of Northern Hemisphere Glaciation" (iNHG). We assessed the spatial and temporal variability of ocean temperatures and ice‐volume indicators through the late Pliocene and early Pleistocene (from 3.3 to 2.4 Ma) to determine the character of this climate transition. We identified asynchronous shifts in long‐term means and the pacing and amplitude of shorter‐term climate variability, between regions and between climate proxies. Early changes in Antarctic glaciation and Southern Hemisphere ocean properties occurred even during the mid‐Piacenzian warm period (∼3.264–3.025 Ma) which has been used as an analog for future warming. Increased climate variability subsequently developed alongside signatures of larger Northern Hemisphere ice sheets (iNHG). Yet, some regions of the ocean felt no impact of iNHG, particularly in lower latitudes. Our analysis has demonstrated the complex, non‐uniform and globally asynchronous nature of climate changes associated with the iNHG. Shifting ocean gateways and ocean circulation changes may have pre‐conditioned the later evolution of ice sheets with falling atmospheric pCO2. Further development of high‐resolution, multi‐proxy reconstructions of climate is required so that the full potential of the rich and detailed geological records can be realized. Plain Language Summary: Warm climates of the geological past provide windows into future environmental responses to elevated atmospheric CO2 concentrations, and past climate transitions identify important or sensitive regions and processes. We assessed the patterns of average ocean temperatures and indicators of ice sheet size over hundreds of thousands of years, and compared to shorter‐term variability (tens of thousands of years) during a recent transition from late Pliocene warmth (when CO2 was similar to present) to the onset of the large and repeated advances of northern hemisphere ice sheets referred to as the "ice ages." We show that different regions of the climate system changed at different times, with some changing before the ice sheets expanded. The development of larger ice sheets in the Northern Hemisphere then impacted ocean temperatures and circulation, but there were many regions where no impacts were felt. Our analysis highlights regional differences in the timing and amplitudes of change within a globally‐significant climate transition as well as in response to the current atmospheric CO2 concentrations in our climate system. Key Points: The "stable" warm late Pliocene ∼3.3–3.1 million years ago was a time of climate transition, especially in the southern hemisphere. Ocean temperatures and ice sheets evolved asynchronously 3.3–2.4 Ma during the onset and intensification of Northern Hemisphere Glaciation. Climate variability evolved in complex, non‐uniform ways, most strongly expressed in northern mid‐latitude sea‐surface temperature records. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

75. Assessing the exposure of UK habitats to 20th‐ and 21st‐century climate change, and its representation in ecological monitoring schemes.

Author: Wilson, Oliver J. and Pescott, Oliver L.
Subjects: *ENVIRONMENTAL monitoring, *WEATHER, *MEDITERRANEAN climate, *HABITATS, *LAND cover, *CLIMATE change
Abstract: Climate change is a significant driver of contemporary biodiversity change. Ecological monitoring schemes can be crucial in highlighting its consequences, but connecting and interpreting observed climatic and ecological changes demands an understanding of monitored locations' exposure to climate change. Generalising from trends in monitored sites to habitats also requires an assessment of how closely sampled locations' climate change trajectories mirror those of wider ecosystems. Such assessments are rare but vital for drawing robust ecological conclusions. Focusing on the UK, we generated a metric of climate change exposure by quantifying the change in observed historical (1901–2019) and predicted future (2021–2080, pessimistic emissions scenario) conditions. We then assessed habitat‐specific climate change exposure by overlaying the resulting data with maps of contemporary (2019) land cover. Finally, we compared patterns of climate change exposure in locations sampled by ecological monitoring schemes to random samples from wider habitats. The UK's climate changed significantly between the early 20th century and the last decade, and is predicted to undergo even greater changes (including the development of Iberian/Mediterranean climate types in places) into the 21st century. Climate change exposure is unevenly distributed: regionally, it falls more in southern, central and eastern England; locally, it is greater at higher‐elevation locations than nearby areas at lower elevations. Areas with contemporary arable and horticulture, urban, calcareous grassland and suburban land cover are predicted to experience the greatest overall climatic change, though other habitats experienced relatively greater change than these in the first half of the 20th century. The extent to which locations sampled by ecological monitoring schemes represent broader habitat‐level gradients of climate change exposure varies. Monitored sites' coverage of wider trends is heterogeneous across habitats, time periods and schemes. Policy implications. UK ecological monitoring schemes can effectively, though variably, capture the effects of climate change on habitats. To improve their performance, climate change could be explicitly included in the design of such programmes. Additionally, our findings on how effectively different datasets represent wider patterns of climate change are crucial for informing syntheses of ecological change connected to shifting atmospheric conditions. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

76. In silico simulation of hepatic arteries: An open‐source algorithm for efficient synthetic data generation.

Author: Whitehead, Joseph F., Laeseke, Paul F., Periyasamy, Sarvesh, Speidel, Michael A., and Wagner, Martin G.
Subjects: *MACHINE learning, *IMAGE reconstruction algorithms, *COST functions, *HEPATIC artery, *DEEP learning, *ALGORITHMS
Abstract: Background: In silico testing of novel image reconstruction and quantitative algorithms designed for interventional imaging requires realistic high‐resolution modeling of arterial trees with contrast dynamics. Furthermore, data synthesis for training of deep learning algorithms requires that an arterial tree generation algorithm be computationally efficient and sufficiently random. Purpose: The purpose of this paper is to provide a method for anatomically and physiologically motivated, computationally efficient, random hepatic arterial tree generation. Methods: The vessel generation algorithm uses a constrained constructive optimization approach with a volume minimization‐based cost function. The optimization is constrained by the Couinaud liver classification system to assure a main feeding artery to each Couinaud segment. An intersection check is included to guarantee non‐intersecting vasculature and cubic polynomial fits are used to optimize bifurcation angles and to generate smoothly curved segments. Furthermore, an approach to simulate contrast dynamics and respiratory and cardiac motion is also presented. Results: The proposed algorithm can generate a synthetic hepatic arterial tree with 40 000 branches in 11 s. The high‐resolution arterial trees have realistic morphological features such as branching angles (MAD with Murray's law = 1.2° ± 1.2°), radii (median Murray deviation = 0.08), and smoothly curved, non‐intersecting vessels. Furthermore, the algorithm assures a main feeding artery to each Couinaud segment and is random (variability = 0.98 ± 0.01). Conclusions: This method facilitates the generation of large datasets of high‐resolution, unique hepatic angiograms for the training of deep learning algorithms and initial testing of novel 3D reconstruction and quantitative algorithms designed for interventional imaging. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

77. Emotion recognition using facial expressions in an immersive virtual reality application.

Author: Chen, Xinrun and Chen, Hengxin
Subjects: EMOTION recognition, FACIAL expression, VIRTUAL reality, HEAD-mounted displays, INFRARED cameras, EMOTIONS, LIGHT sources
Abstract: Facial expression recognition (FER) is an important method to study and distinguish human emotions. In the virtual reality (VR) context, people's emotions are instantly and naturally triggered and mobilized due to the high immersion and realism of VR. However, when people are wearing head mounted display (HMD) VR equipment, the eye regions will be covered. The FER accuracy will be reduced if the eye region information is discarded. Therefore, it is necessary to obtain the information of eye regions using other methods. The main difficulty in FER in an immersive VR context is that the conventional FER methods depend on public databases. The image facial information in the public databases is complete, so these methods are difficult to directly apply to the VR context. To solve this problem, this paper designs and implements a solution for FER in the VR context as follows. A real facial expression database collection scheme in the VR context is implemented by adding an infrared camera and infrared light source to the HMD. A virtual database construction method is presented for FER in the VR context, which can improve the generalization of models. A deep network named the multi-region facial expression recognition model is designed for FER in the VR context. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

78. On Evaluating IoT Data Trust via Machine Learning.

Author: Tadj, Timothy, Arablouei, Reza, and Dedeoglu, Volkan
Subjects: TRUST, SUPERVISED learning, MACHINE learning, INTERNET of things, PYTHON programming language, TAGS (Metadata), SECURE Sockets Layer (Computer network protocol), RANDOM walks, CLUSTER analysis (Statistics)
Abstract: Data trust in IoT is crucial for safeguarding privacy, security, reliable decision-making, user acceptance, and complying with regulations. Various approaches based on supervised or unsupervised machine learning (ML) have recently been proposed for evaluating IoT data trust. However, assessing their real-world efficacy is hard mainly due to the lack of related publicly available datasets that can be used for benchmarking. Since obtaining such datasets is challenging, we propose a data synthesis method, called random walk infilling (RWI), to augment IoT time-series datasets by synthesizing untrustworthy data from existing trustworthy data. Thus, RWI enables us to create labeled datasets that can be used to develop and validate ML models for IoT data trust evaluation. We also extract new features from IoT time-series sensor data that effectively capture its autocorrelation as well as its cross-correlation with the data of the neighboring (peer) sensors. These features can be used to learn ML models for recognizing the trustworthiness of IoT sensor data. Equipped with our synthesized ground-truth-labeled datasets and informative correlation-based features, we conduct extensive experiments to critically examine various approaches to evaluating IoT data trust via ML. The results reveal that commonly used ML-based approaches to IoT data trust evaluation, which rely on unsupervised cluster analysis to assign trust labels to unlabeled data, perform poorly. This poor performance is due to the underlying assumption that clustering provides reliable labels for data trust, which is found to be untenable. The results also indicate that ML models, when trained on datasets augmented via RWI and using the proposed features, generalize well to unseen data and surpass existing related approaches. 
Moreover, we observe that a semi-supervised ML approach that requires only about 10% of the data labeled offers competitive performance while being practically more appealing compared to the fully supervised approaches. The related Python code and data are available online. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

79. Data for Digital Forensics: Why a Discussion on "How Realistic is Synthetic Data" is Dispensable.

Author: Göbel, Thomas, Baier, Harald, and Breitinger, Frank
Subjects: DIGITAL forensics, DATA libraries, FORENSIC sciences, RESEARCH personnel
Abstract: Digital forensics depends on data sets for various purposes like concept evaluation, educational training, and tool validation. Researchers have gathered such data sets into repositories and created data simulation frameworks for producing large amounts of data. Synthetic data often face skepticism due to its perceived deviation from real-world data, raising doubts about its realism. This paper addresses this concern, arguing that there is no definitive answer. We focus on four common digital forensic use cases that rely on data. Through these, we elucidate the specifications and prerequisites of data sets within their respective contexts. Our discourse uncovers that both real-world and synthetic data are indispensable for advancing digital forensic science, software, tools, and the competence of practitioners. Additionally, we provide an overview of available data set repositories and data generation frameworks, contributing to the ongoing dialogue on digital forensic data sets' utility. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

80. Genetic diversity and IUCN Red List status.

Author: Schmidt, Chloé, Hoban, Sean, Hunter, Margaret, Paz‐Vinas, Ivan, and Garroway, Colin J.
Subjects: *GENETIC variation, *GENETIC drift, *BIOLOGICAL extinction, *INBREEDING, *GENETIC correlations, *ENDANGERED species
Abstract: The International Union for Conservation of Nature (IUCN) Red List is an important and widely used tool for conservation assessment. The IUCN uses information about a species' range, population size, habitat quality and fragmentation levels, and trends in abundance to assess extinction risk. Genetic diversity is not considered, although it affects extinction risk. Declining populations are more strongly affected by genetic drift and higher rates of inbreeding, which can reduce the efficiency of selection, lead to fitness declines, and hinder species' capacities to adapt to environmental change. Given the importance of conserving genetic diversity, attempts have been made to find relationships between red‐list status and genetic diversity. Yet, there is still no consensus on whether genetic diversity is captured by the current IUCN Red List categories in a way that is informative for conservation. To assess the predictive power of correlations between genetic diversity and IUCN Red List status in vertebrates, we synthesized previous work and reanalyzed data sets based on 3 types of genetic data: mitochondrial DNA, microsatellites, and whole genomes. Consistent with previous work, species with higher extinction risk status tended to have lower genetic diversity for all marker types, but these relationships were weak and varied across taxa. Regardless of marker type, genetic diversity did not accurately identify threatened species for any taxonomic group. Our results indicate that red‐list status is not a useful metric for informing species‐specific decisions about the protection of genetic diversity and that genetic data cannot be used to identify threat status in the absence of demographic data. Thus, there is a need to develop and assess metrics specifically designed to assess genetic diversity and inform conservation policy, including policies recently adopted by the UN's Convention on Biological Diversity Kunming‐Montreal Global Biodiversity Framework. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

81. Lipid profiles and production performance responses of laying hens to dietary Moringa oleifera leaf meal: systematic review and meta-analysis.

Author: Ogbuewu, Ifeanyichukwu P. and Mbajiorgu, Christian A.
Abstract: The inclusion of Moringa oleifera leaf meal (MLM) in chicken diets especially in developing countries is on the increase due to scarcity of traditional feedstuffs. Therefore, this investigation aimed to explore the effects of MLM on lipid profiles and production characteristics of laying hens. Twenty-three publications retrieved from Web of Science, PubMed, Scopus and Google Scholar search engines were used for the analysis. Data from the 23 studies were analysed using random-effects model in OpenMEE software. Results were presented as standardised mean difference (SMD) at a 95% confidence interval. The results show significant improvement in feed conversion ratio (SMD = −0.49; p < .001), egg mass (SMD = 0.35; p = .003), Haugh unit (SMD = 0.39; p < .001), eggshell thickness (SMD = 0.63; p < .001) and eggshell weight (SMD = 0.45; p < .001) at a reduced feed intake. On the other hand, egg weight, hen-day egg production and blood high-density lipoprotein cholesterol were not statistically different from controls. Results reveal that dietary MLM enhanced blood cholesterol, low-density lipoprotein (LDL) cholesterol, triglycerides and yolk cholesterol concentrations in laying hens. There is presence of significant heterogeneity and meta-regression revealed that study country, number of hen, housing system, hen age, inclusion level and layer strains were predictors of the treatment effect. In conclusion, the results of this meta-analysis suggest that inclusion of MLM in the diet of laying hens improved feed conversion ratio, aspects of egg quality and blood/yolk cholesterol concentrations in laying hens at a reduced feed intake. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

82. Pesticide effects on soil fauna communities—A meta‐analysis.

Author: Beaumelle, Léa, Tison, Léa, Eisenhauer, Nico, Hines, Jes, Malladi, Sandhya, Pelosi, Céline, Thouvenot, Lise, and Phillips, Helen R. P.
Subjects: *SOIL animals, *PESTICIDES, *AGRICULTURAL pests, *INVERTEBRATE communities, *GROWING season, *ECOSYSTEM health
Abstract: Soil invertebrate communities represent a significant fraction of global biodiversity and play crucial roles in ecosystems. A number of human activities threaten soil communities, in particular intensive agricultural practices such as pesticide use. However, there is currently no quantitative synthesis of the impacts of pesticides on soil fauna communities. Here, using a meta‐analysis of 54 studies and 294 observations, we quantify pesticide effects on the abundance, biomass, richness and diversity of natural soil fauna communities across a wide range of environmental contexts. We also identify scenarios with the most detrimental effects on soil fauna communities by analysing the effects of different pesticides (herbicides, fungicides, insecticides, broad‐spectrum substances and multiple substances), different application rates and temporal extents (short‐ or long‐term), as well as the response of different functional groups of soil animals (body size categories, presence of exoskeleton). Pesticides overall decreased the abundance and diversity of soil fauna communities across studies (Grand mean effect size (Hedge's g) = −0.30 +/− 0.16) and had stronger effects on soil fauna diversity than abundance. The most detrimental scenarios involved multiple substances, broad‐spectrum substances and insecticides, which significantly decreased soil fauna diversity even at recommended rates. We found no evidence that pesticide effects dampen over time, as short‐term and long‐term studies exhibited similar mean effect sizes. Policy implications: Our study highlights that pesticide use has significant detrimental non‐target effects on soil biodiversity, eroding a substantial part of global biodiversity and threatening ecosystem health. This provides crucial evidence supporting recent policies, such as the European Green Deal, that aim to reduce pesticide use in agriculture to conserve biodiversity. The detrimental effects of multiple substances revealed here are particularly concerning because realistic pesticide use often combines several substances targeting different pests and diseases over the crop season. We suggest that future guidelines for pesticide registration, restrictions and banning should rely on data able to fully capture the long‐term consequences of multiple substances for multiple non‐target species in realistic conditions. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

83. Efficient Wheat Head Segmentation with Minimal Annotation: A Generative Approach

Author: Jaden Myers, Keyhan Najafian, Farhad Maleki, and Katie Ovens
Subjects: deep learning, segmentation, generative adversarial networks, data synthesis, Photography, TR1-1050, Computer applications to medicine. Medical informatics, R858-859.7, Electronic computers. Computer science, QA75.5-76.95
Abstract: Deep learning models have been used for a variety of image processing tasks. However, most of these models are developed through supervised learning approaches, which rely heavily on the availability of large-scale annotated datasets. Developing such datasets is tedious and expensive. In the absence of an annotated dataset, synthetic data can be used for model development; however, due to the substantial differences between simulated and real data, a phenomenon referred to as domain gap, the resulting models often underperform when applied to real data. In this research, we aim to address this challenge by first computationally simulating a large-scale annotated dataset and then using a generative adversarial network (GAN) to fill the gap between simulated and real images. This approach results in a synthetic dataset that can be effectively utilized to train a deep-learning model. Using this approach, we developed a realistic annotated synthetic dataset for wheat head segmentation. This dataset was then used to develop a deep-learning model for semantic segmentation. The resulting model achieved a Dice score of 83.4% on an internal dataset and Dice scores of 79.6% and 83.6% on two external datasets from the Global Wheat Head Detection datasets. While we proposed this approach in the context of wheat head segmentation, it can be generalized to other crop types or, more broadly, to images with dense, repeated patterns such as those found in cellular imagery.
Published: 2024
Full Text: View/download PDF

84. Enhanced Pet Behavior Prediction via S2GAN-Based Heterogeneous Data Synthesis

Author: Jinah Kim and Nammee Moon
Subjects: behavior prediction, behavior monitoring, heterogeneous data, data synthesis, generative adversarial network, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
Abstract: Heterogeneous data have been used to enhance behavior prediction performance; however, they involve issues such as missing data, which need to be addressed. This paper proposes enhanced pet behavior prediction via Sensor to Skeleton Generative Adversarial Networks (S2GAN)-based heterogeneous data synthesis. The S2GAN model synthesizes the key features of video skeletons based on collected nine-axis sensor data and replaces missing data, thereby enhancing the accuracy of behavior prediction. In this study, data collected from 10 pets in a real-life-like environment were used to conduct recognition experiments on 9 commonly occurring types of indoor behavior. Experimental results confirmed that the proposed S2GAN-based synthesis method effectively resolves possible missing data issues in real environments and significantly improves the performance of the pet behavior prediction model. Additionally, by utilizing data collected under conditions similar to the real environment, the method enables more accurate and reliable behavior prediction. This research demonstrates the importance and utility of synthesizing heterogeneous data in behavior prediction, laying the groundwork for applications in various fields such as abnormal behavior detection and monitoring.
Published: 2024
Full Text: View/download PDF

85. Element Information Enhancement for Diagram Question Answering with Synthetic Data

Author: Zhang, Yadong, Chen, Yang, Ren, Yupei, Lan, Man, Chen, Yuefeng, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Zhang, Ningyu, editor, Wang, Meng, editor, Wu, Tianxing, editor, Hu, Wei, editor, and Deng, Shumin, editor
Published: 2022
Full Text: View/download PDF

86. Towards Real-World HDRTV Reconstruction: A Data Synthesis-Based Approach

Author: Cheng, Zhen, Wang, Tao, Li, Yong, Song, Fenglong, Chen, Chang, Xiong, Zhiwei, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Avidan, Shai, editor, Brostow, Gabriel, editor, Cissé, Moustapha, editor, Farinella, Giovanni Maria, editor, and Hassner, Tal, editor
Published: 2022
Full Text: View/download PDF

87. BézierPalm: A Free Lunch for Palmprint Recognition

Author: Zhao, Kai, Shen, Lei, Zhang, Yingyi, Zhou, Chuhan, Wang, Tao, Zhang, Ruixin, Ding, Shouhong, Jia, Wei, Shen, Wei, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Avidan, Shai, editor, Brostow, Gabriel, editor, Cissé, Moustapha, editor, Farinella, Giovanni Maria, editor, and Hassner, Tal, editor
Published: 2022
Full Text: View/download PDF

88. Data Synthesis and Iterative Refinement for Neural Semantic Parsing without Annotated Logical Forms

Author: Wu, Shan, Chen, Bo, Han, Xianpei, Sun, Le, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Sun, Maosong, editor, Liu, Yang, editor, Che, Wanxiang, editor, Feng, Yang, editor, Qiu, Xipeng, editor, Rao, Gaoqi, editor, and Chen, Yubo, editor
Published: 2022
Full Text: View/download PDF

89. Comparing the Utility and Disclosure Risk of Synthetic Data with Samples of Microdata

Author: Little, Claire, Elliot, Mark, Allmendinger, Richard, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Domingo-Ferrer, Josep, editor, and Laurent, Maryline, editor
Published: 2022
Full Text: View/download PDF

90. 3D Reconstruction of Medical Image Based on Improved Ray Casting Algorithm

Author: Yu, Wang, Ning, Gong, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, El Yacoubi, Mounîm, editor, Granger, Eric, editor, Yuen, Pong Chi, editor, Pal, Umapada, editor, and Vincent, Nicole, editor
Published: 2022
Full Text: View/download PDF

91. Systematic Review and Evidence-Based Research in Dentistry

Author: Tabatabaei, Fahimeh and Tayebi, Lobat
Published: 2022
Full Text: View/download PDF

92. Software Application Profile: The Anchored Multiplier calculator—a Bayesian tool to synthesize population size estimates

Author: Wesson, Paul D, McFarland, Willi, Qin, Cong Charlie, and Mirzazadeh, Ali
Subjects: Mathematical Sciences, Statistics, Bayes Theorem, Female, HIV Infections, Humans, Iran, Models, Statistical, Population Density, Population Surveillance, Sex Workers, Software, Bayesian modelling, population size estimation, key populations, data synthesis, Public Health and Health Services, Epidemiology, Public health
Abstract: Estimating the number of people in hidden populations is needed for public health research, yet available methods produce highly variable and uncertain results. The Anchored Multiplier calculator uses a Bayesian framework to synthesize multiple population size estimates to generate a consensus estimate. Users submit point estimates and lower/upper bounds which are converted to beta probability distributions and combined to form a single posterior probability distribution. The Anchored Multiplier calculator is available as a web browser-based application. The software allows for unlimited empirical population size estimates to be submitted and combined according to Bayes Theorem to form a single estimate. The software returns output as a forest plot (to visually compare data inputs and the final Anchored Multiplier estimate) and a table that displays results as population percentages and counts. The web application 'Anchored Multiplier Calculator' is free software and is available at [http://globalhealthsciences.ucsf.edu/resources/tools] or directly at [http://anchoredmultiplier.ucsf.edu/].
Published: 2019

93. Effects and parameters of community-based exercise on motor symptoms in Parkinson’s disease: a meta-analysis

Author: Chun-Lan Yang, Jia-Peng Huang, Ting-Ting Wang, Ying-Chao Tan, Yin Chen, Zi-Qi Zhao, Chao-Hua Qu, and Yun Qu
Subjects: Data synthesis, Exercise, Movement, Parkinson’s disease, Prescription, Review, Neurology. Diseases of the nervous system, RC346-429
Abstract: Background: Community-based exercise is a continuation and complement to inpatient rehabilitation for Parkinson's disease and does not require a professional physical therapist or equipment. The effects, parameters, and forms of each exercise are diverse, and the effect is affected by many factors. A meta-analysis was conducted to determine the effect and the best parameters for improving motor symptoms and to explore the possible factors affecting the effect of community-based exercise. Methods: We conducted a comprehensive search of six databases: PEDro, PubMed/Medline, CENTRAL, Scopus, Embase, and WOS. Studies that compared community-based exercise with usual care were included. The intervention mainly included dance, Chinese martial arts, Nordic walking, and home-based exercise. The primary outcome measure was the Unified Parkinson’s Disease Rating Scale part III (UPDRS-III) score. The mean difference (95% CI) was used to calculate the treatment outcomes of continuous outcome variables, and the I² statistic was used to estimate the heterogeneity of the statistical analysis. We conducted subgroup analysis and meta-regression analysis to determine the optimal parameters and the most important influencing factors of the exercise effect. Results: Twenty-two studies that enrolled a total of 809 subjects were included in the analysis. Exercise had a positive effect on the UPDRS-III (MD = -5.83; 95% CI, -8.29 to -3.37), Timed Up and Go test (MD = -2.22; 95% CI -3.02 to -1.42), UPDRS (MD = -7.80; 95% CI -10.98 to -6.42), 6-Minute Walk Test (MD = 68.81; 95% CI, 32.14 to 105.48), and Berg Balance Scale (MD = 4.52; 95% CI, 2.72 to 5.78) scores. However, the heterogeneity of each included study was obvious. Weekly frequency, age, and duration of treatment were all factors that potentially influenced the effect. Conclusions: This meta-analysis suggests that community-based exercise may benefit motor function in patients with PD. The most commonly used modalities of exercise were tango and tai chi, and the most common prescription was 60 min twice a week. Future studies should consider the influence of age, duration of treatment, and weekly frequency on the effect of exercise. PROSPERO trial registration number CRD42022327162.
Published: 2022
Full Text: View/download PDF

94. Data Synthesis for Alfalfa Biomass Yield Estimation

Author: Jonathan Vance, Khaled Rasheed, Ali Missaoui, and Frederick W. Maier
Subjects: machine learning, data synthesis, generative models, alfalfa, biomass, precision agriculture, Electronic computers. Computer science, QA75.5-76.95
Abstract: Alfalfa is critical to global food security, and its data is abundant in the U.S. nationally, but often scarce locally, limiting the potential performance of machine learning (ML) models in predicting alfalfa biomass yields. Training ML models on local-only data results in very low estimation accuracy when the datasets are very small. Therefore, we explore synthesizing non-local data to estimate biomass yields labeled as high, medium, or low. One option to remedy scarce local data is to train models using non-local data; however, this only works about as well as using local data. Therefore, we propose a novel pipeline that trains models using data synthesized from non-local data to estimate local crop yields. Our pipeline, synthesized non-local training (SNLT pronounced like sunlight), achieves a gain of 42.9% accuracy over the best results from regular non-local and local training on our very small target dataset. This pipeline produced the highest accuracy of 85.7% with a decision tree classifier. From these results, we conclude that SNLT can be a useful tool in helping to estimate crop yields with ML. Furthermore, we propose a software application called Predict Your CropS (PYCS pronounced like Pisces) designed to help farmers and researchers estimate and predict crop yields based on pretrained models.
Published: 2022
Full Text: View/download PDF

95. Laparoscopic versus ultrasound-guided transversus abdominis plane block for postoperative pain management in minimally invasive colorectal surgery: a meta-analysis protocol.

Author: Wenming Yang, Tao Yuan, Zhaolun Cai, Qin Ma, Xueting Liu, Hang Zhou, Siyuan Qiu, and Lie Yang
Subjects: POSTOPERATIVE pain treatment, TRANSVERSUS abdominis muscle, MINIMALLY invasive procedures, INFLAMMATORY bowel diseases, SURGICAL site infections
Abstract: Introduction: Transversus abdominis plane block (TAPB) is now commonly administered for postoperative pain control and reduced opioid consumption in patients undergoing major colorectal surgeries, such as colorectal cancer, diverticular disease, and inflammatory bowel disease resection. However, there remain several controversies about the effectiveness and safety of laparoscopic TAPB compared to ultrasound-guided TAPB. Therefore, the aim of this study is to integrate both direct and indirect comparisons to identify a more effective and safer TAPB approach. Materials and methods: Systematic electronic literature surveillance will be performed in the PubMed, Embase, Cochrane Central Register of Controlled Trials (CENTRAL), and ClinicalTrials.gov databases for eligible studies through July 31, 2023. The Cochrane Risk of Bias version 2 (RoB 2) and Risk of Bias in Non-randomized Studies of Interventions (ROBINS-I) tools will be applied to scrutinize the methodological quality of the selected studies. The primary outcomes will include (1) opioid consumption at 24 hours postoperatively and (2) pain scores at 24 hours postoperatively both at rest and at coughing and movement according to the numerical rating scale (NRS). Additionally, the probability of TAPB-related adverse events, overall postoperative 30-day complications, postoperative 30-day ileus, postoperative 30-day surgical site infection, postoperative 7-day nausea and vomiting, and length of stay will be analyzed as secondary outcome measures. The findings will be assessed for robustness through subgroup analyses and sensitivity analyses. Data analyses will be performed using RevMan 5.4.1 and Stata 17.0. A P value of less than 0.05 will be defined as statistically significant. The certainty of evidence will be examined via the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) working group approach. Ethics and dissemination: Owing to the nature of the secondary analysis of existing data, no ethical approval will be required. Our meta-analysis will summarize all the available evidence for the effectiveness and safety of TAPB approaches for minimally invasive colorectal surgery. High-quality peer-reviewed publications and presentations at international conferences will facilitate disseminating the results of this study, which are expected to inform future clinical trials and help anesthesiologists and surgeons determine the optimal tailored clinical practice for perioperative pain management. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

96. Social Science and Consensus in Estimates of the US Jewish Population: Response to Sasson and DellaPergola.

Author: Saxe, Leonard, Tighe, Elizabeth, Magidin de Kramer, Raquel, Nussbaum, Daniel, and Parmer, Daniel
Subjects: *JUDAISM, *JEWISH children, *JEWISH communities, *JEWISH studies, *JEWISH identity, *CONSENSUS (Social sciences)
Abstract: In response to Isaac Sasson and Sergio DellaPergola's commentaries on our assessment of the validity of the Pew Research Center's 2020 estimate of 7.5 million US Jewish adults and children (Tighe et al. 2022), we address key points of agreement and contention in the validity of the estimate; in particular, how the Jewish population is identified and defined. We argue that Pew's definition of the Jewish population is consistent with major studies of American Jewry, from NJPS 1990 to recent local Jewish community studies. Applying a consistent definition that includes the growing group of "Jews of no religion" with one Jewish parent, as Pew Research Center does, allows for a faithful comparison across national and local studies and a more accurate understanding of levels of Jewish engagement and expressions of Jewish identity. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

97. According to their Numbers: Assessing the Pew Research Center's Estimate of 7.5 Million Jewish Americans.

Author: Tighe, Elizabeth, Saxe, Leonard, Parmer, Daniel, Nussbaum, Daniel, and de Kramer, Raquel Magidin
Subjects: *AMERICAN Jews, *INTERMARRIAGE, *JEWISH children, *RELIGIOUS groups, *JEWISH identity, *RESEARCH institutes
Abstract: The Pew Research Center's survey, Jewish Americans in 2020, was designed to provide estimates of the size of the US Jewish population, sociodemographic data on issues such as intermarriage, child-rearing, engagement in Jewish communal life, and a description of American Jewish attitudes. A sophisticated sample design was employed to ensure accurate and generalizable assessments of the population. Because Jews are a small sub-group and the US government does not collect census data on religious groups, creating estimates is a non-trivial task. The focus of this paper is on the validity of Pew's estimate of 7.5 million US Jewish adults and children, 2.4% of the overall US population. The estimate is an important standalone indicator and is the basis for assessments of current Jewish attitudes and behavior. This paper considers the underlying construct of Jewish identity and its operationalization by Pew and evaluates the convergent validity of Pew's findings. The efforts to define "who is a Jew" in sociodemographic surveys is described, and a set of methodological challenges to creating estimates are considered. The results of this review indicate that Pew's criteria for inclusion in the population estimate comports with long-standing views of how to assess the Jewish population. Furthermore, Pew's estimate of 7.5 million Jewish Americans is consistent with other recent demographic studies of the population. Their conclusions about a growing US Jewish population suggest a new narrative of American Jewish life that reflects the diversity of ways in which Jewish identity is expressed. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

98. Test–retest reliability of the EUROFIT test battery: a review.

Author: Grgic, Jozo
Subjects: *RELIABILITY in engineering, *TEST reliability, *STATISTICAL reliability, *EQUILIBRIUM testing, *PHYSICAL fitness testing
Abstract: Purpose: While several studies have examined the reliability of the EUROFIT test battery, the findings are conflicting. Therefore, this paper aimed to conduct a review of studies that explored the reliability of the EUROFIT test battery. Methods: Seven databases were searched to find studies that investigated the reliability of the EUROFIT test battery. From all included studies, intra-class correlation coefficients for the nine tests used in EUROFIT were extracted. The COSMIN checklist was used to evaluate the methodological quality of the studies. Results: Six excellent quality studies were included in the review. The following findings were observed in the included studies: (a) the flamingo balance test has moderate-to-good reliability; (b) plate tapping, handgrip strength, sit-ups, bent-arm hang, 10 × 5-m agility shuttle run, and the 20-m multistage shuttle run have moderate-to-excellent reliability; and (c) the sit-and-reach and standing board jump tests have good-to-excellent reliability. Conclusion: Overall, the findings of this review suggest that the EUROFIT can be used as a reliable battery of tests to assess physical fitness in research and practice. Still, as there were only six included studies, more research in different populations is needed. Future studies are also required to explore the influence of variables (e.g., familiarization with the exercise tests) that may impact the reliability of the EUROFIT test battery. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

99. P‐97: Applying a generative model to improve TFT measurement capacity performance.

Author: Park, Kyongtae and Khim, Taeyoung
Subjects: LANGUAGE models, NOISE measurement, MASS production
Abstract: AMOLED is based on current driving, so the current characteristics of TFT IV (current per voltage) have much influence. Now, as the technology of AMOLED is expanded to HOP (hybrid of Oxide and Polysilicon TFT) and UPC (Under Panel Camera), polysilicon and oxide TFT are used in combination, and TFTs of various sizes must be used at the same time. So, first, it is attempted to reduce the IV measurement points and improve them with the known interpolation method. However, there is a limit to accurate prediction due to the non‐linear characteristics of TFT and noise measurement in the off region of TFT. In this paper, we applied an asymmetric neural network structure to overcome this limitation of reduction decoding. To do this, we developed an asymmetric autoencoder to decode or reconstruct TFT IV data from small sampled IV measurements (14–35%). Then, to overcome the error estimation of generating many errors in the surrounding interpolation because noise is included in the TFT off region, a function of partially removing noise only in the off region is introduced. In addition, to overcome the performance decrease problem due to the minimal amount of abnormal data, which should be accurately predicted in a situation close to abnormal, generative models were applied to overcome it. The IV information to be generated was encoded using a pretrained large language model to reduce the dimensionality and converted into text, which was then trained on the distillate GPT‐2 model and used as a generative model. In the general interpolation method, the performance decreases as the number of measurement samples are reduced; the method proposed in this paper maintains its performance even if the sampling is under 20%. This means that it is adequate to restore only a small portion of information using the latent space information that has learned the TFT IV saturation and linear mode characteristics. This is applied to mass production by increasing the measurement capability without additional equipment investment. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

100. Exploring the utility of synthetic data to extract more value from sensitive health data assets: A focused example in perinatal epidemiology.

Author: Braddon, Amy Elise, Robinson, Suzanne, Alati, Rosa, and Betts, Kim S.
Subjects: *PRENATAL exposure, *DATA privacy, *ELECTRONIC health records, *AFFECTIVE disorders, *BIRTH weight, *LOGISTIC regression analysis
Abstract: Background: Privacy, access and security concerns can hinder the availability of health data for research. The use of synthesised data in place of de‐identified electronic health records (EHRs) presents an opportunity to conduct research while minimising privacy concerns. Objectives: To examine whether synthesised data can replicate two prenatal epidemiological associations: between prenatal smoking and lower birthweight, and between prenatal mood disorders and lower birthweight, using data synthesised from de‐identified health administrative data collections. Methods: We generated two synthetic datasets, using parametric and non‐parametric data generating methods, and examined the synthetic data for evidence of privacy concerns. Next, univariable and multivariable logistic regression was utilised to estimate the associations in both synthetic datasets, with results then compared to the real data. Results: Both synthesised datasets performed well in identifying the reduction in birthweight associated with prenatal smoking, while the non‐parametric data underestimated the reduction in birthweight associated with prenatal mood disorders. Improbable relationships between some variables were identified in the parametric synthesised data, however, these can be addressed with simple rules during data synthesis. No duplicate rows (i.e., exact copies of de‐identified data) were found in the parametric data, while only 0.6% of the rows in the non‐parametric data were duplicated. Conclusions: Both synthesised datasets performed well in replicating the statistical properties of the original data while addressing privacy issues. Data synthesis methods provide an opportunity for researchers to utilise health data while managing privacy and security concerns. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
