14,186 results on '"Synthetic data"'
Search Results
2. Advancing Handwritten Text Detection by Synthetic Text
- Author
Muth, Markus, Peer, Marco, Kleber, Florian, Sablatnig, Robert, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Antonacopoulos, Apostolos, editor, Chaudhuri, Subhasis, editor, Chellappa, Rama, editor, Liu, Cheng-Lin, editor, Bhattacharya, Saumik, editor, and Pal, Umapada, editor
- Published
- 2025
3. Adabot: An Adaptive Trading Bot Using an Ensemble of Phase-Specific Few-Shot Learners to Adapt to the Changing Market Dynamics
- Author
Upadhyay, Vishvajeet, Paul, Angshuman, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Antonacopoulos, Apostolos, editor, Chaudhuri, Subhasis, editor, Chellappa, Rama, editor, Liu, Cheng-Lin, editor, Bhattacharya, Saumik, editor, and Pal, Umapada, editor
- Published
- 2025
4. A Simple Background Augmentation Method for Object Detection with Diffusion Model
- Author
Li, Yuhang, Dong, Xin, Chen, Chen, Zhuang, Weiming, Lyu, Lingjuan, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
5. MarkedDPR: Enhancing Dense Passage Retrieval with Exact Match Signals and Synthetic Data Augmentation
- Author
Oussaidene, Smail, Said Lhadj, Lynda, Boughanem, Mohand, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Barhamgi, Mahmoud, editor, Wang, Hua, editor, and Wang, Xin, editor
- Published
- 2025
6. Semantic Bone Structure Segmentation in 2D Image Data: Towards Total Knee Arthroplasty
- Author
Neiss-Theuerkauff, Tobias, Schierbaum, Arne, Luhmann, Thomas, Sieberth, Till, Wallhoff, Frank, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Bramer, Max, editor, and Stahl, Frederic, editor
- Published
- 2025
7. Can OOD Object Detectors Learn from Foundation Models?
- Author
Liu, Jiahui, Wen, Xin, Zhao, Shizhen, Chen, Yingxian, Qi, Xiaojuan, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
8. Synthetic Data: Generate Avatar Data on Demand
- Author
Lebrun, Thomas, Béziaud, Louis, Allard, Tristan, Boutet, Antoine, Gambs, Sébastien, Maouche, Mohamed, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Barhamgi, Mahmoud, editor, Wang, Hua, editor, and Wang, Xin, editor
- Published
- 2025
9. Generating and Evolving Real-Life Like Synthetic Data for e-Government Services Without Using Real-World Raw Data
- Author
Tammisto, Maj-Annika, Pfahl, Dietmar, Shah, Faiz Ali, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Pfahl, Dietmar, editor, Gonzalez Huerta, Javier, editor, Klünder, Jil, editor, and Anwar, Hina, editor
- Published
- 2025
10. Synthetic Data Generation for Machine Learning Models with Cognitive Agent Simulations
- Author
Blythe, Jim, Tregubov, Alexey, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Mathieu, Philippe, editor, and De la Prieta, Fernando, editor
- Published
- 2025
11. Meta-TadGAN: Time Series Anomaly Detection Using TadGAN with Meta-features
- Author
Silva, Inês Oliveira e, Soares, Carlos, Cerqueira, Vitor, Rodrigues, Arlete, Bastardo, Pedro, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Santos, Manuel Filipe, editor, Machado, José, editor, Novais, Paulo, editor, Cortez, Paulo, editor, and Moreira, Pedro Miguel, editor
- Published
- 2025
12. Synthetic Data for Robust Identification of Typical and Atypical Serotonergic Neurons Using Convolutional Neural Networks
- Author
Corradetti, Daniele, Bernardi, Alessandro, Corradetti, Renato, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Santos, Manuel Filipe, editor, Machado, José, editor, Novais, Paulo, editor, Cortez, Paulo, editor, and Moreira, Pedro Miguel, editor
- Published
- 2025
13. Free-ATM: Harnessing Free Attention Masks for Representation Learning on Diffusion-Generated Images
- Author
Zhang, David Junhao, Xu, Mutian, Wu, Jay Zhangjie, Xue, Chuhui, Zhang, Wenqing, Han, Xiaoguang, Bai, Song, Shou, Mike Zheng, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
14. PatchRefiner: Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation
- Author
Li, Zhenyu, Bhat, Shariq Farooq, Wonka, Peter, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
15. Practical and Ethical Considerations for Generative AI in Medical Imaging
- Author
Jha, Debesh, Rauniyar, Ashish, Hagos, Desta Haileselassie, Sharma, Vanshali, Tomar, Nikhil Kumar, Zhang, Zheyuan, Isler, Ilkin, Durak, Gorkem, Wallace, Michael, Yazici, Cemal, Berzin, Tyler, Biswas, Koushik, Bagci, Ulas, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Puyol-Antón, Esther, editor, Zamzmi, Ghada, editor, Feragen, Aasa, editor, King, Andrew P., editor, Cheplygina, Veronika, editor, Ganz-Benjaminsen, Melanie, editor, Ferrante, Enzo, editor, Glocker, Ben, editor, Petersen, Eike, editor, Baxter, John S. H., editor, Rekik, Islem, editor, and Eagleson, Roy, editor
- Published
- 2025
16. On Differentially Private 3D Medical Image Synthesis with Controllable Latent Diffusion Models
- Author
Daum, Deniz, Osuala, Richard, Riess, Anneliese, Kaissis, Georgios, Schnabel, Julia A., Di Folco, Maxime, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Mukhopadhyay, Anirban, editor, Oksuz, Ilkay, editor, Engelhardt, Sandy, editor, Merhof, Dorit, editor, and Yuan, Yixuan, editor
- Published
- 2025
17. FISHing in Uncertainty: Synthetic Contrastive Learning for Genetic Aberration Detection
- Author
Gutwein, Simon, Kampel, Martin, Taschner-Mandl, Sabine, Licandro, Roxane, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Sudre, Carole H., editor, Mehta, Raghav, editor, Ouyang, Cheng, editor, Qin, Chen, editor, Rakic, Marianne, editor, and Wells, William M., editor
- Published
- 2025
18. Learning Domain-Invariant Spatio-Temporal Visual Cues for Video-Based Crowd Panic Detection
- Author
Calle, Javier, Unzueta, Luis, Leskovsky, Peter, García, Jorge, Akhgar, Babak, Series Editor, Gkotsis, Ilias, editor, Kavallieros, Dimitrios, editor, Stoianov, Nikolai, editor, Vrochidis, Stefanos, editor, and Diagourtas, Dimitrios, editor
- Published
- 2025
19. AFreeCA: Annotation-Free Counting for All
- Author
D’Alessandro, Adriano, Mahdavi-Amiri, Ali, Hamarneh, Ghassan, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
20. New Metrics to Benchmark and Improve BIM Visibility Within a Synthetic Image Generation Process for Computer Vision Progress Tracking
- Author
Nunez-Morales, Juan D., Hsu, Shun-Hsiang, Ibrahim, Amir, Golparvar-Fard, Mani, di Prisco, Marco, Series Editor, Chen, Sheng-Hong, Series Editor, Vayas, Ioannis, Series Editor, Kumar Shukla, Sanjay, Series Editor, Sharma, Anuj, Series Editor, Kumar, Nagesh, Series Editor, Wang, Chien Ming, Series Editor, Cui, Zhen-Dong, Series Editor, Lu, Xinzheng, Series Editor, Desjardins, Serge, editor, Poitras, Gérard J., editor, and Nik-Bakht, Mazdak, editor
- Published
- 2025
21. Private measures, random walks, and synthetic data.
- Author
Boedihardjo, March, Vershynin, Roman, and Strohmer, Thomas
- Subjects
Differential privacy, Random walks, Synthetic data
- Abstract
Differential privacy is a mathematical concept that provides an information-theoretic security guarantee. While differential privacy has emerged as a de facto standard for guaranteeing privacy in data sharing, the known mechanisms to achieve it come with some serious limitations. Utility guarantees are usually provided only for a fixed, a priori specified set of queries. Moreover, there are no utility guarantees for more complex, but very common, machine learning tasks such as clustering or classification. In this paper we overcome some of these limitations. Working with metric privacy, a powerful generalization of differential privacy, we develop a polynomial-time algorithm that creates a private measure from a data set. This private measure allows us to efficiently construct private synthetic data that are accurate for a wide range of statistical analysis tools. Moreover, we prove an asymptotically sharp min-max result for private measures and synthetic data in general compact metric spaces, for any fixed privacy budget ε bounded away from zero. A key ingredient in our construction is a new superregular random walk, whose joint distribution of steps is as regular as that of independent random variables, yet which deviates from the origin logarithmically slowly.
- Published
- 2024
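To make "private synthetic data" concrete, here is a minimal sketch of a much simpler, standard construction: an ε-differentially private histogram (Laplace mechanism) from which synthetic records are resampled. This is not the paper's superregular-random-walk or private-measure construction; the function names and the fixed, public bin set are assumptions for illustration.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_synthetic_data(data, bins, epsilon, n_out, seed=0):
    """Resample n_out synthetic records from an epsilon-DP noisy histogram (hypothetical helper).

    `bins` must be fixed in advance, not derived from `data`. Adding or removing
    one record changes at most one count by 1, so the L1 sensitivity is 1 and
    Laplace noise with scale 1/epsilon gives epsilon-DP.
    """
    rng = random.Random(seed)
    counts = Counter(data)
    noisy = [max(0.0, counts[b] + laplace_noise(1.0 / epsilon, rng)) for b in bins]
    # Clipping, normalizing and resampling are post-processing: no extra privacy cost.
    weights = noisy if sum(noisy) > 0 else [1.0] * len(bins)
    return rng.choices(bins, weights=weights, k=n_out)


if __name__ == "__main__":
    print(dp_synthetic_data(["A", "B", "B", "C"], bins=["A", "B", "C", "D"], epsilon=1.0, n_out=10))
```

The released sample inherits the ε-DP guarantee of the noisy counts, but its utility is limited by the bin resolution, which is one of the limitations the paper's approach is designed to avoid.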
22. Synthetic Data: Methods, Use Cases, and Risks
-
De Cristofaro, Emiliano
- Subjects
Information and Computing Sciences, Human-Centred Computing, Synthetic data, Data privacy, Data models, Privacy, Training, Training data, Security, Computation Theory and Mathematics, Computer Software, Data Format, Strategic, Defence & Security Studies, Cybersecurity and privacy
- Published
- 2024
23. Generating Manufacturing Distributions for Sampling-based Tolerance Analysis using Deep Learning Models.
- Author
Schaechtl, Paul, Roth, Martin, Bräu, Julian, Goetz, Stefan, Schleich, Benjamin, and Wartzack, Sandro
- Abstract
Sampling-based tolerance analysis is a powerful tool for evaluating the quality of functional products, but it requires realistic manufacturing distributions. However, determining the resulting manufacturing distributions is usually costly and time-consuming, especially for novel technologies such as Additive Manufacturing. Usually, sampling techniques are used to reproduce the original distribution of manufacturing variations based on statistical moments. In most cases, simplifying assumptions are made for this purpose, potentially leading to an inadequate representation of the correlation between machine and process parameters in the resulting distribution. In the worst case, this can distort the results of the tolerance analysis. To address this challenge, this paper presents an approach that imitates real-world manufacturing distributions using generative machine learning techniques based on deep learning and small real data sets. This enables a realistic reproduction of quasi-real manufacturing distributions and avoids the need for conventional sampling techniques. The general procedure and its applicability are shown via illustrative use cases from the tolerancing domain. [ABSTRACT FROM AUTHOR]
- Published
- 2024
24. Local genetic correlation via knockoffs reduces confounding due to cross-trait assortative mating.
- Author
Ma, Shiyang, Wang, Fan, Border, Richard, Buxbaum, Joseph, Zaitlen, Noah, and Ionita-Laza, Iuliana
- Subjects
GENETIC correlations, ASSORTATIVE mating, GENOME-wide association studies, ATTENTION-deficit hyperactivity disorder, LINKAGE disequilibrium
- Abstract
Local genetic correlation analysis is an important tool for identifying genetic loci with shared biology across traits. Recently, Border et al. have shown that the results of these analyses are confounded by cross-trait assortative mating (xAM), leading to many false-positive findings. Here, we describe LAVA-Knock, a local genetic correlation method that builds on an existing genetic correlation method, LAVA, and augments it by generating synthetic data in a way that preserves local and long-range linkage disequilibrium (LD), allowing us to reduce the confounding induced by xAM. We show in simulations based on a realistic xAM model and in genome-wide association study (GWAS) applications for 630 trait pairs that LAVA-Knock can greatly reduce the bias due to xAM relative to LAVA. Furthermore, we show a significant positive correlation between the reduction in local genetic correlations and estimates in the literature of cross-mate phenotype correlations; in particular, pairs of traits that are known to have high cross-mate phenotype correlation values have a significantly higher reduction in the number of local genetic correlations compared with other trait pairs. A few representative examples include education and intelligence, education and alcohol consumption, and attention-deficit hyperactivity disorder and depression. These results suggest that LAVA-Knock can reduce confounding due to both short-range LD and long-range LD induced by xAM. Performing local genetic correlation analysis is important for identifying genetic loci with shared biology across traits. However, results can be confounded by cross-trait assortative mating (xAM), leading to many false-positive findings. Ma et al. propose LAVA-Knock, a local genetic correlation method via knockoffs that is shown to reduce the confounding. [ABSTRACT FROM AUTHOR]
- Published
- 2024
25. Kernel‐Based Bootstrap Synthetic Data to Estimate Measurement Uncertainty in Analytical Sciences.
- Author
Feinberg, Max, Clémençon, Stephan, Rudaz, Serge, and Boccard, Julien
- Abstract
Measurement uncertainty (MU) is becoming a key figure of merit for analytical methods, and estimating MU from method validation data is cost-effective and practical. Since MU can be defined as a coverage interval of a given result, computing statistical prediction intervals is a possible approach, but the quality of the intervals is questionable when only a small amount of data is available. In this context, the bootstrap procedure constitutes an efficient strategy to increase the observed data variability. Applying the naive bootstrap to validation data raises some computational challenges, whereas a smooth bootstrap, in which synthetic data are generated using an adapted kernel density estimation algorithm, is much more attractive. MU can then be obtained directly and conveniently as an uncertainty function applicable to any unknown future measurement. This publication presents the advantages and disadvantages of this new method, illustrated using diverse in-house and interlaboratory validation data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
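The smooth bootstrap described above can be sketched in a few lines: sampling from a Gaussian kernel density estimate is equivalent to drawing an observed value at random and adding kernel-width noise. This is a minimal sketch, assuming a plain Gaussian kernel with Silverman's rule-of-thumb bandwidth and a simple percentile prediction interval; the paper's "adapted" kernel density algorithm and its uncertainty function are not reproduced, and the function names are hypothetical.

```python
import random
import statistics


def silverman_bandwidth(x):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel."""
    sd = statistics.stdev(x)
    q1, _, q3 = statistics.quantiles(x, n=4)
    iqr = q3 - q1
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * len(x) ** (-0.2)


def smooth_bootstrap(x, n_synthetic, seed=0):
    """Draw synthetic values from a Gaussian KDE fitted to x.

    Each draw picks an observed value uniformly at random and adds N(0, h^2) noise,
    which is exactly sampling from the kernel density estimate.
    """
    rng = random.Random(seed)
    h = silverman_bandwidth(x)
    return [rng.choice(x) + rng.gauss(0.0, h) for _ in range(n_synthetic)]


def prediction_interval(x, coverage=0.95, n_synthetic=10_000, seed=0):
    """Percentile interval of the smoothed synthetic sample."""
    synth = sorted(smooth_bootstrap(x, n_synthetic, seed))
    lo_idx = int((1 - coverage) / 2 * n_synthetic)
    hi_idx = int((1 + coverage) / 2 * n_synthetic) - 1
    return synth[lo_idx], synth[hi_idx]


if __name__ == "__main__":
    data = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.4, 9.7]
    print(prediction_interval(data))
```

Unlike the naive bootstrap, which can only repeat the observed values, the kernel noise lets the synthetic sample fill the gaps between them, which matters most when validation data are scarce.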
26. Enhancing visual autonomous navigation in row-based crops with effective synthetic data generation.
- Author
Martini, Mauro, Ambrosio, Marco, Navone, Alessandro, Tuberga, Brenno, and Chiaberge, Marcello
- Subjects
VISUAL perception, PRECISION farming, ROBUST control, ROBOT control systems, ROBOTICS, MOBILE robots
- Abstract
Introduction: Service robotics has recently been advancing precision agriculture, enabling many automated processes based on efficient autonomous navigation solutions. However, data generation and in-field validation campaigns hinder the progress of large-scale autonomous platforms. Simulated environments and deep visual perception are increasingly adopted as successful tools to speed up the development of robust navigation with low-cost RGB-D cameras. Materials and methods: In this context, the contribution of this work is a complete framework that fully exploits synthetic data for robust visual control of mobile robots. A large, realistic multi-crop dataset is accurately generated to train deep semantic segmentation networks, enabling robust performance in challenging real-world conditions. An automatic parametric approach enables easy customization of virtual field geometry and features for fast, reliable evaluation of navigation algorithms. Results and conclusion: The high quality of the generated synthetic dataset is demonstrated through extensive experiments on real crop images and by benchmarking the resulting robot navigation in both virtual and real fields with relevant metrics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
27. Domain Generalization via Ensemble Stacking for Face Presentation Attack Detection.
- Author
Muhammad, Usman, Laaksonen, Jorma, Romaissa Beddiar, Djamila, and Oussalah, Mourad
- Subjects
ARTIFICIAL neural networks, HUMAN facial recognition software, RECURRENT neural networks, DEEP learning, GENERALIZATION, FACE perception
- Abstract
Face presentation attack detection (PAD) plays a pivotal role in securing face recognition systems against spoofing attacks. Although great progress has been made in designing face PAD methods, developing a model that can generalize well to unseen test domains remains a significant challenge. Moreover, due to the different types of spoofing attacks, creating a dataset with a sufficient number of samples for training deep neural networks is a laborious task. This work proposes a comprehensive solution that combines synthetic data generation and deep ensemble learning to enhance the generalization capabilities of face PAD. Specifically, synthetic data is generated by blending a static image with spatiotemporal-encoded images using alpha composition and video distillation. In this way, we simulate motion blur with varying alpha values, thereby generating diverse subsets of synthetic data that contribute to a more enriched training set. Furthermore, multiple base models are trained on each subset of synthetic data using stacked ensemble learning. This allows the models to learn complementary features and representations from different synthetic subsets. The meta-features generated by the base models are used as input for a new model called the meta-model. The latter combines the predictions from the base models, leveraging their complementary information to better handle unseen target domains and enhance overall performance. Experimental results on seven datasets (WMCA, CASIA-SURF, OULU-NPU, CASIA-MFSD, Replay-Attack, MSU-MFSD, and SiW-Mv2) highlight the potential to enhance presentation attack detection by using large-scale synthetic data and a stacking-based ensemble approach. [ABSTRACT FROM AUTHOR]
- Published
- 2024
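The blending step in this abstract is, at its core, per-pixel alpha composition, out = α·A + (1 − α)·B, repeated for several α values to produce separate synthetic subsets. The sketch below assumes grayscale images stored as nested lists of floats and treats the spatiotemporal-encoded image as a given input; the video-distillation step that produces it is not reproduced, and the function names are hypothetical.

```python
def alpha_blend(foreground, background, alpha):
    """Per-pixel alpha composition: out = alpha * fg + (1 - alpha) * bg."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [
        [alpha * f + (1.0 - alpha) * b for f, b in zip(frow, brow)]
        for frow, brow in zip(foreground, background)
    ]


def make_synthetic_subsets(static_image, encoded_image, alphas):
    """Return one blended image per alpha value; each alpha defines one training subset."""
    return {a: alpha_blend(encoded_image, static_image, a) for a in alphas}


if __name__ == "__main__":
    static = [[0.0, 0.0], [1.0, 1.0]]
    encoded = [[1.0, 1.0], [1.0, 0.0]]  # stand-in for a spatiotemporal-encoded frame
    for a, img in make_synthetic_subsets(static, encoded, [0.25, 0.5, 0.75]).items():
        print(a, img)
```

In the stacked-ensemble setting described above, each subset would then train its own base model.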
28. An overview of proposals towards the privacy-preserving publication of trajectory data.
- Author
Miranda-Pascual, Àlex, Guerra-Balboa, Patricia, Parra-Arnau, Javier, Forné, Jordi, and Strufe, Thorsten
- Subjects
DATA privacy, TRAFFIC engineering, EVALUATION methodology, PRIVACY, DEEP learning, CLASSIFICATION
- Abstract
The privacy risks of processing human locations and their trajectories have been demonstrated by a large number of studies and real-world incidents. As a result, many efforts aim to make human location trajectories available for processing while protecting the privacy of individuals. A majority of these, however, are based on concepts and evaluation methodologies that do not always provide convincing results or obvious guarantees. The processing of locations and trajectories yields benefits in numerous domains, from municipal development and traffic engineering to personalized navigation and recommendations. It can also enable a variety of promising, entirely new applications, and is, therefore, the focus of many ongoing projects. In this article, we describe common trajectory types and representations, give a classification of meaningful utility measures, describe risks and attacks, and systematize previously published privacy notions. We then survey the field of protection mechanisms, classifying them into syntactic privacy approaches, masking for differential privacy (DP), and generative approaches with DP for synthetic data. Key insights are that syntactic notions have serious drawbacks, especially for trajectory data, and that a large part of the literature claiming DP guarantees is considerably flawed. We also gather evidence that there may be hidden potential in the development of synthetic data generators, particularly those using deep learning with DP, since the utility of synthetic data has not been very satisfactory so far. [ABSTRACT FROM AUTHOR]
- Published
- 2024
29. CNN-based transfer learning for forest aboveground biomass prediction from ALS point cloud tomography.
- Author
Schäfer, Jannika, Winiwarter, Lukas, Weiser, Hannah, Höfle, Bernhard, Schmidtlein, Sebastian, Novotný, Jan, Krok, Grzegorz, Stereńczak, Krzysztof, Hollaus, Markus, and Fassnacht, Fabian Ewald
- Abstract
This study presents a new approach for predicting forest aboveground biomass (AGB) from airborne laser scanning (ALS) data: AGB is predicted from sequences of images depicting vertical cross-sections through the ALS point clouds. A 3D version of the VGG16 convolutional neural network (CNN) with initial weights transferred from pre-training on the ImageNet dataset was used. The approach was tested on datasets from Canada, Poland, and the Czech Republic. To analyse the effect of training sample size on model performance, different-sized samples ranging from 10 to 375 ground plots were used. The CNNs were compared with random forest models (RFs) trained on point cloud metrics. At the maximum number of training samples, the difference in RMSE between observed and predicted AGB of CNNs and RFs ranged from −2 t/ha to 5 t/ha, and the difference in squared Pearson correlation coefficient ranged from −0.05 to 0.06. Additional pre-training on synthetic data derived from virtual laser scanning of simulated forest stands improved the prediction performance of the CNNs only when few real training samples (10–40) were available. While 3D CNNs trained on cross-section images derived from real data showed promising results, RFs remain a competitive alternative. [ABSTRACT FROM AUTHOR]
- Published
- 2024
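The model comparison above is reported as differences in RMSE (t/ha) and in the squared Pearson correlation coefficient between observed and predicted AGB. A minimal sketch of those two metrics follows; the "CNN minus RF" sign convention, the function names, and the example values are assumptions for illustration, not the study's data.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the units of the inputs (e.g. t/ha)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r_squared_pearson(observed, predicted):
    """Squared Pearson correlation coefficient between observations and predictions."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


def metric_differences(observed, pred_a, pred_b):
    """(RMSE_a - RMSE_b, r2_a - r2_b), e.g. with a = CNN and b = RF (assumed convention)."""
    return (
        rmse(observed, pred_a) - rmse(observed, pred_b),
        r_squared_pearson(observed, pred_a) - r_squared_pearson(observed, pred_b),
    )


if __name__ == "__main__":
    obs = [100.0, 150.0, 200.0, 250.0]      # hypothetical plot-level AGB, t/ha
    cnn = [110.0, 140.0, 210.0, 240.0]
    rf = [95.0, 160.0, 190.0, 255.0]
    print(metric_differences(obs, cnn, rf))
```

Under this convention, a negative RMSE difference and a positive r² difference would both favour the first model.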
30. Deep-learning-based image reconstruction with limited data: generating synthetic raw data using deep learning.
- Author
Zijlstra, Frank and While, Peter Thomas
- Subjects
MAGNETIC resonance imaging, DEEP learning
- Abstract
Object: Deep learning has shown great promise for fast reconstruction of accelerated MRI acquisitions by learning from large amounts of raw data. However, raw data are not always available in sufficient quantities. This study investigates synthetic data generation to complement small datasets and improve reconstruction quality. Materials and methods: An adversarial auto-encoder was trained to generate phase and coil sensitivity maps from magnitude images, which were combined into synthetic raw data. On a fourfold accelerated MR reconstruction task, deep-learning-based reconstruction networks were trained with varying amounts of training data (20 to 160 scans). Test set performance was compared between baseline experiments and experiments that incorporated synthetic training data. Results: Training with synthetic raw data showed decreasing reconstruction errors with increasing amounts of training data; importantly, these synthetic raw data were derived from magnitude-only data rather than from real raw data. For small training sets, training with synthetic data decreased the mean absolute error (MAE) by up to 7.5%, whereas for larger training sets the MAE increased by up to 2.6%. Discussion: Synthetic raw data generation improved reconstruction quality in scenarios with limited training data. A major advantage of synthetic data generation is that it allows the reuse of magnitude-only datasets, which are more readily available than raw datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
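The combination step in the methods (magnitude image, phase map, and coil sensitivity maps merged into synthetic raw data) can be sketched with NumPy: form a complex image, weight it by each coil's sensitivity, and Fourier-transform it to k-space. This minimal sketch assumes the phase and coil maps are already available as arrays (in the paper they come from an adversarial auto-encoder, which is not reproduced) and uses a hypothetical equispaced 4x undersampling mask with a fully sampled centre; the function names are illustrative.

```python
import numpy as np


def synthesize_raw_kspace(magnitude, phase, coil_maps):
    """Combine magnitude (H, W), phase (H, W, radians) and coil maps (C, H, W) into multi-coil k-space."""
    complex_image = magnitude * np.exp(1j * phase)            # (H, W) complex image
    coil_images = coil_maps * complex_image[None, :, :]       # (C, H, W) per-coil images
    # Centred 2D FFT over the spatial axes of every coil image
    return np.fft.fftshift(
        np.fft.fft2(np.fft.ifftshift(coil_images, axes=(-2, -1)), axes=(-2, -1)),
        axes=(-2, -1),
    )


def undersample(kspace, acceleration=4, center_lines=8):
    """Keep every `acceleration`-th phase-encode line plus a fully sampled centre (hypothetical mask)."""
    h = kspace.shape[-2]
    mask = np.zeros(h, dtype=bool)
    mask[::acceleration] = True
    c = h // 2
    mask[c - center_lines // 2 : c + center_lines // 2] = True
    return kspace * mask[None, :, None], mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mag = rng.random((32, 32))
    ph = rng.uniform(-np.pi, np.pi, (32, 32))
    coils = rng.random((4, 32, 32)) * np.exp(1j * rng.uniform(-np.pi, np.pi, (4, 32, 32)))
    k_under, mask = undersample(synthesize_raw_kspace(mag, ph, coils))
    print(k_under.shape, int(mask.sum()))
```

Pairs of fully sampled and undersampled k-space built this way could then serve as training data for a reconstruction network.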
31. Robust high frequency seismic bandwidth extension with a deep neural network trained using synthetic data.
- Author
Zwartjes, Paul and Yoo, Jewoo
- Subjects
ARTIFICIAL intelligence, MACHINE learning, CONVOLUTIONAL neural networks, GEOPHYSICISTS, INFORMATION storage & retrieval systems
- Abstract
Geophysicists interpreting seismic reflection data aim for the highest possible resolution, as this facilitates the interpretation and discrimination of subtle geological features. Various deterministic methods based on Wiener filtering exist to increase the temporal frequency bandwidth and compress the seismic wavelet in a process called spectral shaping. Autoencoder neural networks with convolutional layers have been applied to this problem with encouraging results, but generalization to unseen data remains a problem. Most published works have used supervised learning with training data constructed from field seismic data, or from synthetic seismic data generated from measured well logs or seismic wavefield modelling. This leads to satisfactory results on datasets similar to the training data but requires re-training of the networks for unseen data with different characteristics. In this work we seek to improve generalization not by experimenting with the network architecture (we use a conventional U-net with some small modifications), but by adopting a different approach to creating the training data for the supervised learning process. Although the network is important, at this stage of development we see more improvement in prediction results from altering the design of the training data than from architectural changes. Our approach is to create synthetic training data consisting of simple geometric shapes convolved with a seismic wavelet. We created a very diverse training dataset of 9000 seismic images, each containing between 5 and 300 seismic events that resemble seismic reflections and have geophysically motivated perturbations in shape and character. The trained 2D U-net can robustly and recursively boost the dominant frequency by 50%. We demonstrate this on unseen field data with different bandwidths and signal-to-noise ratios. Additionally, this 2D U-net can handle non-stationary wavelets and overlapping events of different bandwidths without creating excessive ringing. It is also robust in the presence of noise. The significance of this result is that it simplifies the effort of bandwidth extension and demonstrates the usefulness of autoencoder neural networks for geophysical data processing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
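The training-data recipe above (simple events convolved with a seismic wavelet) can be illustrated in one dimension: place random reflection spikes in a trace, convolve them with a Ricker wavelet at a base peak frequency to get the network input, and with a 1.5× higher peak frequency to get a target with a 50% higher dominant frequency. This is a minimal sketch under those assumptions. The paper uses 2D images of geometric shapes with perturbations, so the 1D traces, the Ricker wavelet choice, and the function names here are illustrative only.

```python
import math
import random


def ricker(freq_hz, dt, half_len_s=0.1):
    """Ricker wavelet (1 - 2a) exp(-a), a = (pi f t)^2, sampled every dt seconds."""
    n = int(half_len_s / dt)
    out = []
    for i in range(-n, n + 1):
        a = (math.pi * freq_hz * i * dt) ** 2
        out.append((1.0 - 2.0 * a) * math.exp(-a))
    return out


def convolve_same(signal, kernel):
    """Discrete convolution trimmed to len(signal), centred on the kernel midpoint."""
    half = len(kernel) // 2
    out = [0.0] * len(signal)
    for i in range(len(signal)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = i - (j - half)
            if 0 <= idx < len(signal):
                acc += signal[idx] * k
        out[i] = acc
    return out


def make_training_pair(n_samples=500, dt=0.002, n_events=(5, 300), f0=25.0, boost=1.5, seed=0):
    """Input trace at peak frequency f0 and target trace at f0 * boost, sharing one reflectivity series."""
    rng = random.Random(seed)
    reflectivity = [0.0] * n_samples
    for _ in range(rng.randint(*n_events)):
        reflectivity[rng.randrange(n_samples)] += rng.uniform(-1.0, 1.0)
    x = convolve_same(reflectivity, ricker(f0, dt))
    y = convolve_same(reflectivity, ricker(f0 * boost, dt))
    return x, y
```

Because input and target share the same reflectivity, the network can learn only the wavelet change, which is what a bandwidth-extension model is meant to capture.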
32. Development of a cerebellar ataxia diagnosis model using conditional GAN-based synthetic data generation for visuomotor adaptation task.
- Author
Kim, Jinah, Woo, Sung-Ho, Kim, Taekyung, Yoon, Won Tae, Shin, Jung Hwan, Lee, Jee-Young, and Ryu, Jeh-Kwang
- Subjects
GENERATIVE adversarial networks, CEREBELLAR ataxia, DEEP learning, DIGITAL health, EARLY diagnosis
- Abstract
This study proposes a synthetic data generation model to create a classification framework for cerebellar ataxia patients using trajectory data from the visuomotor adaptation task. The classification targets are patients with cerebellar ataxia, age-matched normal individuals, and young healthy subjects. Synthetic data for the three classes are generated from class conditions and random noise by combining conditional generative adversarial networks with reconstruction networks. These synthetic data, alongside real data, are used to train the patient classification model to improve classification accuracy. The fidelity of the synthetic data is assessed qualitatively, by visual inspection of the validity and diversity of the generated data, and quantitatively, by evaluating their distributional similarity to real data. Furthermore, the clinical efficacy of the patient classification model employing synthetic data is demonstrated by a comparative analysis showing improved classification accuracy when both real and synthetic data are used, relative to using real data alone. This methodological approach holds promise for addressing data insufficiency in the digital healthcare domain, applying deep learning methodologies, and developing early disease diagnosis tools. [ABSTRACT FROM AUTHOR]
- Published
- 2024
33. Fractals as Pre-Training Datasets for Anomaly Detection and Localization.
- Author
Ugwu, Cynthia I., Caruso, Emanuele, and Lanz, Oswald
- Subjects
DATA security, MACHINE learning, FRACTALS, PRIVACY, POSSIBILITY
- Abstract
Anomaly detection is crucial in large-scale industrial manufacturing, as it helps to detect and localize defective parts. Pre-training feature extractors on large-scale datasets is a popular approach for this task. Stringent data security and privacy regulations, high costs, and long acquisition times hinder the development of large-scale datasets for training and benchmarking. Recent work has focused primarily on developing new anomaly detection methods based on such extractors, while little attention has been paid to the importance of the data used for pre-training. This study compares representative models pre-trained with fractal images against those pre-trained with ImageNet, without subsequent task-specific fine-tuning. We evaluated the performance of eleven state-of-the-art methods on MVTecAD, MVTec LOCO AD, and VisA, well-known benchmark datasets inspired by real-world industrial inspection scenarios. Further, we propose a novel method that creates a "Multi-Formula" dataset by combining dynamically generated fractal images. Although pre-training with ImageNet leads to better results, fractals can achieve performance close to that of ImageNet under proper parametrization. This opens up a new research direction in which feature extractors could be trained on synthetically generated abstract datasets, mitigating the ever-increasing demand for data in machine learning while circumventing privacy and security concerns. [ABSTRACT FROM AUTHOR]
- Published
- 2024
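The abstract does not say how the fractal images are rendered. Fractal pre-training images are commonly drawn from an iterated function system (IFS) using the "chaos game", and the sketch below assumes that approach. The three affine maps are the textbook Sierpinski-triangle system, chosen only because its attractor is known; `ifs_points` and `rasterize` are hypothetical helpers, not the authors' code or their Multi-Formula procedure.

```python
import random

# Each affine map (a, b, c, d, e, f) sends (x, y) to (a*x + b*y + e, c*x + d*y + f).
SIERPINSKI = [
    (0.5, 0.0, 0.0, 0.5, 0.0, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.5, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.25, 0.5),
]


def ifs_points(maps, weights, n_points, burn_in=20, seed=0):
    """Chaos game: repeatedly apply a randomly chosen map and record the orbit."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    points = []
    for step in range(n_points + burn_in):
        a, b, c, d, e, f = rng.choices(maps, weights=weights)[0]
        x, y = a * x + b * y + e, c * x + d * y + f
        if step >= burn_in:  # discard early iterates that may lie off the attractor
            points.append((x, y))
    return points


def rasterize(points, size):
    """Min-max normalise the points and mark the pixels they fall in (binary image)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    img = [[0] * size for _ in range(size)]
    for x, y in points:
        col = min(size - 1, int((x - x0) / (x1 - x0 + 1e-12) * size))
        row = min(size - 1, int((y - y0) / (y1 - y0 + 1e-12) * size))
        img[row][col] = 1
    return img


points = ifs_points(SIERPINSKI, [1, 1, 1], n_points=5000)
image = rasterize(points, size=32)
fill_rate = sum(map(sum, image)) / (32 * 32)
```

Under this assumption, a Multi-Formula-style dataset would combine images rendered from several different map sets; the abstract does not give the authors' actual combination rule.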
34. Yet Another Discriminant Analysis (YADA): A Probabilistic Model for Machine Learning Applications.
- Author
Field Jr., Richard V., Smith, Michael R., Wuest, Ellery J., and Ingram, Joe B.
- Subjects
DISTRIBUTION (Probability theory), MARGINAL distributions, DEEP learning, DISCRIMINANT analysis, DECISION making
- Abstract
This paper presents a probabilistic model for various machine learning (ML) applications. While deep learning (DL) has produced state-of-the-art results in many domains, DL models are complex and over-parameterized, which leads to high uncertainty about what the model has learned and about its decision process. Further, DL models are not probabilistic, which makes reasoning about their output challenging. In contrast, the proposed model, referred to as Yet Another Discriminant Analysis (YADA), is less complex than other methods, rests on a mathematically rigorous foundation, and can be used for a wide variety of ML tasks, including classification, explainability, and uncertainty quantification. YADA is competitive in most cases with many state-of-the-art DL models. Ideally, a probabilistic model would represent the full joint probability distribution of its features, but doing so is often computationally expensive and intractable. Hence, many probabilistic models assume that the features are normally distributed, mutually independent, or both, which can severely limit their performance. YADA is an intermediate model that (1) captures the marginal distribution of each variable and the pairwise correlations between variables and (2) explicitly maps features to the space of multivariate Gaussian variables. Numerous mathematical properties of the YADA model can be derived, strengthening the theoretical underpinnings of ML. The model can be statistically validated on new or held-out data using native properties of YADA. However, we enumerate several engineering and practical challenges that must be addressed to make YADA more useful. [ABSTRACT FROM AUTHOR]
- Published
- 2024
35. Synthetic Data for Deep Learning in Computer Vision & Medical Imaging: A Means to Reduce Data Bias.
- Author
Paproki, Anthony, Salvado, Olivier, and Fookes, Clinton
- Subjects
ARTIFICIAL neural networks, ALGORITHMIC bias, PATTERN recognition systems, SUPERVISED learning, ARTIFICIAL intelligence, DEEP learning, LUNGS, HUMAN skin color
- Published
- 2024
36. Noninvasive Deep Learning Analysis for Smith–Magenis Syndrome Classification.
- Author
Núñez-Vidal, Esther, Fernández-Ruiz, Raúl, Álvarez-Marquina, Agustín, Hidalgo-delaGuía, Irene, Garayzábal-Heinze, Elena, Hristov-Kalamov, Nikola, Domínguez-Mateos, Francisco, Conde, Cristina, and Martínez-Olalla, Rafael
- Subjects
ARTIFICIAL neural networks, VOICE analysis, SPEECH synthesis, RARE diseases, GENETIC testing
- Abstract
Smith–Magenis syndrome (SMS) is a rare condition that is underdiagnosed due to limited public awareness of genetic testing and a lengthy diagnostic process. Voice analysis can be a noninvasive tool for monitoring and detecting SMS. In this paper, the cepstral peak prominence and mel-frequency cepstral coefficients are used as disease monitoring and detection metrics. In addition, an efficient neural network incorporating synthetic data processes was used to detect SMS in a cohort of individuals with the disease. Three study cases were conducted with a set of 19 SMS patients and 292 controls. They employed various oversampling and undersampling techniques, including SMOTE, random oversampling, NearMiss, random undersampling, and 16 additional methods, resulting in balanced accuracies ranging from 69% to 92%. This is the first study to use a neural network model to focus on a rare genetic syndrome using phonation analysis data. By using synthetic data (oversampling and undersampling) and a CNN, it was possible to detect SMS with high accuracy. Voice analysis and deep learning techniques have proven to be a useful, noninvasive method. This finding may help in the complex identification of this syndrome as well as other rare diseases. [ABSTRACT FROM AUTHOR]
- Published
- 2024
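SMOTE, one of the oversampling methods listed, creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a minimal pure-Python sketch of that idea; the abstract does not give the authors' settings, so `k`, the seed, and the `smote` function itself are illustrative assumptions (libraries such as imbalanced-learn provide tested implementations).

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic


new_points = smote([[0.0, 0.0], [2.0, 4.0]], n_new=20, seed=1)
```

With the study's 19 patients and 292 controls, fully balancing the classes this way would mean generating 292 - 19 = 273 synthetic patient samples.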
37. Towards objective and systematic evaluation of bias in artificial intelligence for medical imaging.
- Author
Stanley, Emma A M, Souza, Raissa, Winder, Anthony J, Gulve, Vedant, Amador, Kimberly, Wilms, Matthias, and Forkert, Nils D
- Abstract
Objective: Artificial intelligence (AI) models trained using medical images for clinical tasks often exhibit bias in the form of subgroup performance disparities. However, since not all sources of bias in real-world medical imaging data are easily identifiable, it is challenging to comprehensively assess their impacts. In this article, we introduce an analysis framework for systematically and objectively investigating the impact of biases in medical images on AI models. Materials and Methods: Our framework utilizes synthetic neuroimages with known disease effects and sources of bias. We evaluated the impact of bias effects and the efficacy of 3 bias mitigation strategies in counterfactual data scenarios on a convolutional neural network (CNN) classifier. Results: The analysis revealed that training a CNN model on the datasets containing bias effects resulted in expected subgroup performance disparities. Moreover, reweighing was the most successful bias mitigation strategy for this setup. Finally, we demonstrated that explainable AI methods can aid in investigating the manifestation of bias in the model using this framework. Discussion: The value of this framework is showcased in our findings on the impact of bias scenarios and efficacy of bias mitigation in a deep learning model pipeline. This systematic analysis can be easily expanded to conduct further controlled in silico trials in other investigations of bias in medical imaging AI. Conclusion: Our novel methodology for objectively studying bias in medical imaging AI can help support the development of clinical decision-support tools that are robust and responsible. [ABSTRACT FROM AUTHOR]
- Published
- 2024
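The abstract reports that reweighing was the most successful mitigation strategy but does not define it. A common formulation, usually attributed to Kamiran and Calders, gives each sample the weight P(group) · P(label) / P(group, label), so that group membership and label become statistically independent in the weighted training set. The sketch below assumes that formulation; `reweighing_weights` is a hypothetical helper and the toy groups and labels are made up.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight each sample by P(group) * P(label) / P(group, label), estimated from counts."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    # In counts: (n_g / n) * (n_y / n) / (n_gy / n) = n_g * n_y / (n * n_gy)
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


# Toy data: every positive label falls in group "A".
groups = ["A", "A", "A", "B"]
labels = [1, 1, 0, 0]
weights = reweighing_weights(groups, labels)  # [0.75, 0.75, 1.5, 0.5]
```

These weights would then be passed to the classifier's loss as per-sample weights; after weighting, the positive rate within group A equals the overall positive rate of 0.5.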
38. Completing 3D point clouds of individual trees using deep learning.
- Author
Bornand, Aline, Abegg, Meinrad, Morsdorf, Felix, and Rehush, Nataliia
- Subjects
POINT cloud, DECIDUOUS plants, ARTIFICIAL intelligence, TREE height, CLOUD forests
- Abstract
In close‐range remote sensing data collected in a forest, occlusion often causes incomplete or sparse point cloud representations of individual trees, impeding accurate 3D reconstruction of tree architecture and estimation of tree height and volume. Recent developments in deep learning (DL) for 3D data have produced approaches for point cloud completion, which could potentially be applied to trees. We explored the potential of a DL approach to fill gaps in dense point clouds representing the main structures of deciduous trees by applying an existing transformer‐based completion model (PoinTr). Complete point clouds are required as training data, but even dense terrestrial laser scanning (TLS) data sets contain gaps caused by occlusion, making it nearly impossible to acquire such data. We therefore investigated the ability of point cloud completion models trained on a range of synthetic data sets to handle occlusion patterns in real‐world point clouds. Despite the limited data set, we successfully fine‐tuned a general pre‐trained completion model to fill gaps within 1 m³ segments of tree point clouds. Fine‐tuning on synthetic tree data improved the model's ability to complete tree objects compared with training on diverse artificial objects. However, the quality of the predictions was influenced by the level of sophistication of the synthetic data. Our results demonstrate that incorporating even limited real‐world TLS data during training can considerably improve completion results but may introduce additional noise in the predictions. 3D point cloud completion with DL has the potential to improve and fill gaps in point clouds of individual trees, facilitating further steps in the processing and analysis of 3D forest data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
39. Synthetic data for privacy-preserving clinical risk prediction.
- Author
Qian, Zhaozhi, Callender, Thomas, Cebere, Bogdan, Janes, Sam M., Navani, Neal, and van der Schaar, Mihaela
- Subjects
DATA release, PROGNOSTIC models, MACHINE learning, LUNG cancer, INFORMATION sharing
- Abstract
Synthetic data promise privacy-preserving data sharing for healthcare research and development. Compared with other privacy-enhancing approaches—such as federated learning—analyses performed on synthetic data can be applied downstream without modification, such that synthetic data can act in place of real data for a wide range of use cases. However, the role that synthetic data might play in all aspects of clinical model development remains unknown. In this work, we used state-of-the-art generators explicitly designed for privacy preservation to create a synthetic version of ever-smokers in the UK Biobank before building prognostic models for lung cancer under several data release assumptions. We demonstrate that synthetic data can be effectively used throughout the medical prognostic modeling pipeline even without eventual access to the real data. Furthermore, we show the implications of different data release approaches on how synthetic biobank data could be deployed within the healthcare system. [ABSTRACT FROM AUTHOR]
- Published
- 2024
40. Method for Enhancing AI Accuracy in Pressure Injury Detection Using Real and Synthetic Datasets.
- Author
Kim, Jaeseung, Kim, Mujung, Youn, Heejun, Lee, Seunghyun, Kwon, Soonchul, and Park, Kyung Hee
- Subjects
PRESSURE ulcers, STABLE Diffusion, ARTIFICIAL intelligence, IMAGE recognition (Computer vision), NOSOLOGY, CHATBOTS
- Abstract
Pressure injuries pose significant health risks, especially for the elderly, immobile individuals, and those with sensory impairments. These injuries can rapidly become chronic, making early diagnosis important. Because it is difficult to transport patients from local health facilities to higher-level general hospitals for treatment, telemedicine tools such as chatbots are essential for ensuring rapid initial diagnosis. Recent advances in artificial intelligence have demonstrated potential for medical imaging and disease classification, and ongoing research on dermatological diseases focuses on disease classification. However, the assessment accuracy of artificial intelligence is often limited by unequal class distributions and insufficient dataset sizes. In this study, we aim to improve the accuracy of artificial intelligence models by generating synthetic datasets. Specifically, we focused on training models for pressure injury (PI) assessment using both real and synthetic datasets. We used PI data from a domestic medical university. As part of our supplementary research, we established a chatbot system to facilitate the assessment of pressure injuries. Using both the constructed real dataset and synthetic data, we achieved a top-1 accuracy of 92.03%. The experimental results demonstrate that combining real and synthetic data significantly improves model accuracy. These findings suggest that synthetic datasets can be effectively used to address the limitations of small-scale datasets in medical applications. Future research should explore diverse synthetic data generation methods and validate model performance on a variety of datasets to enhance the generalization and robustness of AI models for PI assessment. [ABSTRACT FROM AUTHOR]
- Published
- 2024
41. Insulin Resistance and Impaired Insulin Secretion Predict Incident Diabetes: A Statistical Matching Application to the Two Korean Nationwide, Population-Representative Cohorts.
- Author
Hyemin Jo, Soyeon Ahn, Jung Hun Ohn, Cheol Min Shin, Eunjeong Ji, Donggil Kim, Sung Jae Jung, and Joongyub Lee
- Subjects
DATA privacy, STATISTICAL matching, INSULIN resistance, NATIONAL health insurance, MEDICAL screening
- Abstract
Background: To evaluate whether insulin resistance and impaired insulin secretion are useful predictors of incident diabetes in Koreans, using nationwide population-representative data while enhancing data privacy. Methods: This study analyzed the data of individuals without diabetes aged >40 years from the Korea National Health and Nutrition Examination Survey (KNHANES) 2007–2010 and 2015 and the National Health Insurance Service-National Health Screening Cohort (NHIS-HEALS). Owing to privacy concerns, these databases cannot be linked using direct identifiers. Therefore, we generated 10 synthetic datasets and then statistically matched them with the NHIS-HEALS. Homeostasis model assessment of insulin resistance (HOMA-IR) and homeostasis model assessment of β-cell function (HOMA-β) were used as indicators of insulin resistance and insulin secretory function, respectively, and diabetes onset was captured in NHIS-HEALS. Results: A median of 4,580 (range, 4,463 to 4,761) adults were included in the analyses after statistical matching of the 10 synthetic KNHANES datasets with NHIS-HEALS. During a mean follow-up of 5.8 years, a median of 4.7% (range, 4.3% to 5.0%) of the participants developed diabetes. Compared with the reference low–HOMA-IR/high–HOMA-β group, the high–HOMA-IR/low–HOMA-β group had the highest risk of diabetes, followed by the high–HOMA-IR/high–HOMA-β group and the low–HOMA-IR/low–HOMA-β group (median adjusted hazard ratio [ranges]: 3.36 [1.86 to 6.05], 1.81 [1.01 to 3.22], and 1.68 [0.93 to 3.04], respectively). Conclusion: Insulin resistance and impaired insulin secretion are robust predictors of diabetes in the Korean population. A retrospective cohort constructed by combining cross-sectional synthetic data and longitudinal claims-based cohort data through statistical matching may be a reliable resource for studying the natural history of diabetes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
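HOMA-IR and HOMA-β are usually computed from fasting glucose and insulin with the original HOMA approximations: HOMA-IR = glucose (mmol/L) × insulin (µU/mL) / 22.5, and HOMA-β (%) = 20 × insulin / (glucose - 3.5). The abstract does not state the cutoffs that separate its "high" and "low" groups, so the sketch below takes them as parameters; the function names and the example cutoffs are hypothetical.

```python
def homa_ir(glucose_mmol_l, insulin_uu_ml):
    """HOMA-IR = fasting glucose (mmol/L) * fasting insulin (uU/mL) / 22.5."""
    return glucose_mmol_l * insulin_uu_ml / 22.5


def homa_beta(glucose_mmol_l, insulin_uu_ml):
    """HOMA-beta (%) = 20 * fasting insulin (uU/mL) / (fasting glucose (mmol/L) - 3.5)."""
    if glucose_mmol_l <= 3.5:
        raise ValueError("HOMA-beta formula is undefined for glucose <= 3.5 mmol/L")
    return 20 * insulin_uu_ml / (glucose_mmol_l - 3.5)


def homa_group(ir, beta, ir_cutoff, beta_cutoff):
    """Assign one of the four HOMA-IR/HOMA-beta groups compared in the study."""
    ir_level = "high" if ir >= ir_cutoff else "low"
    beta_level = "high" if beta >= beta_cutoff else "low"
    return f"{ir_level}-HOMA-IR/{beta_level}-HOMA-beta"


ir = homa_ir(5.0, 9.0)      # 2.0
beta = homa_beta(5.0, 9.0)  # 120.0
label = homa_group(ir, beta, ir_cutoff=2.5, beta_cutoff=100.0)  # hypothetical cutoffs
```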
42. An Image-Based Sensor System for Low-Cost Airborne Particle Detection in Citizen Science Air Quality Monitoring.
- Author
Ali Shah, Syed Mohsin, Casado-Mansilla, Diego, and López-de-Ipiña, Diego
- Subjects
IMAGE processing, ENVIRONMENTAL monitoring, AIR pollution, COMMUNITY involvement, RANDOM noise theory, AIR quality monitoring
- Abstract
Air pollution poses significant public health risks, necessitating accurate and efficient monitoring of particulate matter (PM). PM may be released from natural sources such as trees and vegetation, as well as from anthropogenic (human-made) sources, including industrial activities and motor vehicle emissions. Measuring PM concentrations is therefore paramount to understanding people's exposure to pollutants. This paper introduces a novel image processing technique that uses photographs of Do-it-Yourself (DiY) sensors to detect and quantify PM10 particles, enhancing community involvement and data collection accuracy in Citizen Science (CS) projects. To validate the image processing technique, a synthetic data generation algorithm was developed to overcome the data scarcity commonly associated with citizen-based data collection. This algorithm generates images by precisely defining parameters such as image resolution, image dimensions, and PM airborne particle density. To ensure that these synthetic images mimic real-world conditions, variations such as Gaussian noise, focus blur, white balance adjustments, and combinations of these were introduced, simulating the environmental and technical factors that affect image quality in typical smartphone cameras. The detection algorithm for PM10 particles demonstrates robust performance across varying levels of noise, remaining effective under realistic mobile imaging conditions. The methodology therefore retains sufficient accuracy, suggesting its practical applicability for environmental monitoring in diverse real-world conditions using mobile devices. [ABSTRACT FROM AUTHOR]
- Published
- 2024
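The abstract lists the generator's parameters (resolution, dimensions, particle density) and degradations (Gaussian noise, focus blur, white balance) but not their implementation. The sketch below assumes a grayscale image with dark round particles on a bright background, density expressed as particles per pixel, additive Gaussian noise, and a 3×3 mean filter as a crude stand-in for focus blur; white balance is omitted because it only applies to colour images. All function names are hypothetical.

```python
import random


def particle_image(width, height, density, radius=1, seed=0):
    """Bright background (1.0) with round dark particles (0.0); density = particles per pixel."""
    rng = random.Random(seed)
    img = [[1.0] * width for _ in range(height)]
    centres = []
    for _ in range(round(density * width * height)):
        cx, cy = rng.randrange(width), rng.randrange(height)
        centres.append((cx, cy))
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                    img[y][x] = 0.0
    return img, centres


def add_gaussian_noise(img, sigma, seed=0):
    """Add zero-mean Gaussian noise and clip intensities back to [0, 1]."""
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row] for row in img]


def box_blur(img):
    """3x3 mean filter; border pixels average only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            out[y][x] = sum(window) / len(window)
    return out


clean, centres = particle_image(32, 32, density=0.01, radius=1, seed=3)
degraded = box_blur(add_gaussian_noise(clean, sigma=0.1, seed=4))
```

Keeping the particle centres alongside the image gives ground truth for checking a detector on the degraded copy, which is the kind of validation the abstract describes.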
43. Synthetic Data for Video Surveillance Applications of Computer Vision: A Review.
- Author
Delussu, Rita, Putzu, Lorenzo, and Fumera, Giorgio
- Subjects
OBJECT recognition (Computer vision), COMPUTER vision, IMAGE analysis, BEHAVIORAL assessment, APPLICATION software, VIDEO surveillance, DEEP learning
- Abstract
In recent years, there has been growing interest in synthetic data for several computer vision applications, such as automotive, detection and tracking, surveillance, medical image analysis, and robotics. Early use of synthetic data was aimed at performing controlled experiments under the analysis-by-synthesis approach. Currently, synthetic data are mainly used for training computer vision models, especially deep learning ones, to address well-known issues of real data, such as manual annotation effort, data imbalance and bias, and privacy-related restrictions. In this work, we survey the use of synthetic training data, focusing on applications related to video surveillance, whose relevance has rapidly increased in the past few years due to their connection to security: crowd counting, object and pedestrian detection and tracking, behaviour analysis, person re-identification, and face recognition. Synthetic training data are of even greater interest in this kind of application, where they can address further specific issues arising, e.g., from typically unconstrained image or video acquisition conditions and cross-scene application scenarios. We categorise and discuss the existing methods for creating synthetic data, analyse the synthetic data sets proposed in the literature for each of the considered applications, and provide an overview of their effectiveness as training data. We finally discuss whether and to what extent the existing synthetic data sets mitigate the issues of real data, highlight existing open issues, and suggest future research directions in this field. [ABSTRACT FROM AUTHOR]
- Published
- 2024
44. Complete blood count as a biomarker for preeclampsia with severe features diagnosis: a machine learning approach.
- Author
Araújo, Daniella Castro, de Macedo, Alexandre Afonso, Veloso, Adriano Alonso, Alpoim, Patricia Nessralla, Gomes, Karina Braga, Carvalho, Maria das Graças, and Dusse, Luci Maria SantAna
- Subjects
MACHINE learning, BLOOD cell count, DATA augmentation, ARTIFICIAL intelligence, STATISTICAL smoothing
- Abstract
Objective: This study introduces the complete blood count (CBC), a standard prenatal screening test, as a biomarker for diagnosing preeclampsia with severe features (sPE), employing machine learning models. Methods: We used a boosting machine learning model fed with synthetic data generated through a new methodology called DAS (Data Augmentation and Smoothing). Using data from a Brazilian study including 132 pregnant women, we generated 3,552 synthetic samples for model training. To improve interpretability, we also provided a ridge regression model. Results: Our boosting model obtained an AUROC of 0.90±0.10, sensitivity of 0.95, and specificity of 0.79 in differentiating sPE from non-PE pregnant women, using the CBC parameters neutrophil count, mean corpuscular hemoglobin (MCH), and the aggregate index of systemic inflammation (AISI). In addition, we provided a ridge regression equation using the same three CBC parameters, which is fully interpretable and achieved an AUROC of 0.79±0.10 in differentiating the two groups. Moreover, we showed that a monocyte count lower than 490/mm³ yielded a sensitivity of 0.71 and specificity of 0.72. Conclusion: Our study showed that ML-powered CBC could be used as a biomarker to support sPE diagnosis. In addition, we showed that a low monocyte count alone could be an indicator of sPE. Significance: Although preeclampsia has been extensively studied, no laboratory biomarker with favorable cost-effectiveness has been proposed. Using artificial intelligence, we propose using the CBC, a low-cost, fast, and widely available blood test, as a biomarker for sPE. [ABSTRACT FROM AUTHOR]
- Published
- 2024
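The monocyte result is a single-threshold rule. The sketch below shows how sensitivity and specificity are computed for a rule of the form "count below the threshold predicts sPE"; the values are made-up toy numbers, not data from the study, and `threshold_rule_metrics` is a hypothetical helper.

```python
def threshold_rule_metrics(values, is_case, threshold):
    """Sensitivity and specificity of the rule 'value < threshold => predicted case'."""
    tp = sum(1 for v, c in zip(values, is_case) if c and v < threshold)
    fn = sum(1 for v, c in zip(values, is_case) if c and v >= threshold)
    tn = sum(1 for v, c in zip(values, is_case) if not c and v >= threshold)
    fp = sum(1 for v, c in zip(values, is_case) if not c and v < threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up monocyte counts (per mm^3) and case labels, for illustration only.
monocytes = [350, 420, 510, 450, 600, 380, 700, 520]
has_spe = [True, True, True, True, False, False, False, False]
sensitivity, specificity = threshold_rule_metrics(monocytes, has_spe, threshold=490)
```

In this toy set, three of four cases fall below 490 and three of four controls do not, so both metrics come out at 0.75.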
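45. Benchmarking SAR Image Detection and Identification Networks and Analyzing Degradation Caused by Phase Errors (SAR 영상 탐지식별 네트워크 벤치마크 및 위상 오차에 의한 열화 분석).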
- Author
백민영, 옥재우, and 신희섭
- Abstract
SAR images are more difficult to analyze and acquire than optical images; therefore, high-quality datasets for deep learning models are scarce. In this study, a new detection dataset was generated based on the MSTAR dataset to train detection networks for ground military targets. To verify the dataset and analyze the performance of ground target detection networks on SAR images, the detection ability of various models was compared, and overall performance was good, with an mAP of 0.8 or higher. In addition, we analyzed how performance changes when phase errors are included in the test images. In terms of image degradation and detection performance degradation, we determined an acceptable range of phase error that does not significantly reduce detection performance. This is expected to contribute to building more robust networks when creating SAR image datasets and models in the future. [ABSTRACT FROM AUTHOR]
- Published
- 2024
46. Bias Mitigation via Synthetic Data Generation: A Review.
- Author
Shahul Hameed, Mohamed Ashik, Qureshi, Asifa Mehmood, and Kaushik, Abhishek
- Subjects
ARTIFICIAL intelligence, HEALTH equity, DATA quality, FAIRNESS, FORECASTING
- Abstract
Artificial intelligence (AI) is widely used in healthcare applications to perform various tasks. Although these models have great potential to improve the healthcare system, they have also raised significant ethical concerns, including biases that increase the risk of health disparities in medical applications. Under-representation of a specific group can bias a dataset, and that bias is then replicated in AI models trained on it. Disadvantaged groups are disproportionately affected by such bias because algorithmic forecasts for them may be less accurate or may underestimate their need for treatment. One solution for eliminating bias is to use synthetic samples, or artificially generated data, to balance datasets. Therefore, the purpose of this study is to review and evaluate how synthetic data can be generated and used to mitigate biases, with a specific focus on the medical domain. We explored high-quality peer-reviewed articles focused on synthetic data generation to eliminate bias. These studies were selected based on our defined inclusion and exclusion criteria and on the quality of their content. The findings reveal that generated synthetic data can help improve accuracy, precision, and fairness. However, the effectiveness of synthetic data depends closely on the quality of the data generation process and of the initial datasets used. The study also highlights the need for continuous improvement in synthetic data generation techniques and the importance of fairness evaluation metrics for AI models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
47. Synthetic data at scale: a development model to efficiently leverage machine learning in agriculture.
- Author
Klein, Jonathan, Waller, Rebekah, Pirk, Sören, Pałubicki, Wojtek, Tester, Mark, and Michels, Dominik L.
- Subjects
ARTIFICIAL intelligence, MACHINE learning, AGRICULTURE, TOMATOES, MODELS & modelmaking
- Abstract
The rise of artificial intelligence (AI), and in particular modern machine learning (ML) algorithms, during the last decade has been met with great interest in the agricultural industry. While undisputedly powerful, these algorithms have one main drawback: the need for sufficient and diverse training data. The collection of real datasets and their annotation are the main cost drivers of ML development, and while promising results have been shown with synthetically generated training data, generating such data poses difficulties of its own. In this paper, we present a development model for the iterative, cost-efficient generation of synthetic training data. Its application is demonstrated by developing a low-cost early disease detector for tomato plants (Solanum lycopersicum) using synthetic training data. A neural classifier is trained exclusively on synthetic images, whose generation process is iteratively refined to obtain optimal performance. In contrast to other approaches that rely on a human assessment of the similarity between real and synthetic data, we introduce a structured, quantitative approach. Our evaluation shows superior generalization compared with using non-task-specific real training data, and higher development cost efficiency compared with traditional synthetic training data. We believe that our approach will help to reduce the cost of synthetic data generation in future applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
48. A Novel Taxonomy for Navigating and Classifying Synthetic Data in Healthcare Applications.
- Author
van Dijk, Bram, ul Islam, Saif, Achterberg, Jim, Muhammad Waseem, Hafiz, Gallos, Parisis, Epiphaniou, Gregory, Maple, Carsten, Haas, Marcel, and Spruit, Marco
- Abstract
Data-driven technologies have improved the efficiency, reliability and effectiveness of healthcare services, but they come with an increasing demand for data, which is challenging to meet because of privacy-related constraints on sharing data in healthcare contexts. Synthetic data has recently gained popularity as a potential solution, but in the flurry of current research it can be hard to survey its potential. This paper proposes a novel taxonomy of synthetic data in healthcare to navigate the landscape in terms of three main varieties. Data Proportion comprises different ratios of synthetic data in a dataset and their associated pros and cons. Data Modality refers to the different data formats amenable to synthesis and their format-specific challenges. Data Transformation concerns improving specific aspects of a dataset, such as its utility or privacy, with synthetic data. Our taxonomy aims to help healthcare researchers interested in synthetic data to grasp what types of datasets, data modalities, and transformations are possible with synthetic data, and where the challenges and overlaps between the varieties lie. [ABSTRACT FROM AUTHOR]
- Published
- 2024
49. Investigating the Sim-to-Real Generalizability of Deep Learning Object Detection Models.
- Author
Rüter, Joachim, Durak, Umut, and Dauer, Johann C.
- Subjects
OBJECT recognition (Computer vision), COMPUTER vision, AIRPLANE air refueling, DEEP learning, SIMULATION methods & models
- Abstract
State-of-the-art object detection models need large and diverse datasets for training. As these are hard to acquire for many practical applications, training images from simulation environments are gaining more and more attention. A problem arises because deep learning models trained on simulation images usually generalize poorly to real-world images, as shown by a sharp performance drop. The definitive causes of and influences on this performance drop have not yet been identified. While previous work has mostly investigated the influence of the data as well as the use of domain adaptation, this work provides a novel perspective by investigating the influence of the object detection model itself. Against this background, first, a corresponding measure called sim-to-real generalizability is defined, describing the capability of an object detection model to generalize from simulation training images to real-world evaluation images. Second, 12 different deep learning-based object detection models are trained and their sim-to-real generalizability is evaluated. The models are trained with a variation of hyperparameters, resulting in a total of 144 trained and evaluated versions. The results show a clear influence of the feature extractor and offer further insights and correlations. They open up future research on investigating influences on the sim-to-real generalizability of deep learning-based object detection models as well as on developing feature extractors with better sim-to-real generalizability. [ABSTRACT FROM AUTHOR]
- Published
- 2024
50. Validation Assessment of Privacy‐Preserving Synthetic Electronic Health Record Data: Comparison of Original Versus Synthetic Data on Real‐World COVID‐19 Vaccine Effectiveness.
- Author
Wang, Echo, Mott, Katrina, Zhang, Hongtao, Gazit, Sivan, Chodick, Gabriel, and Burcu, Mehmet
- Abstract
Purpose: To assess the validity of privacy‐preserving synthetic data by comparing results from synthetic versus original EHR data analysis. Methods: A published retrospective cohort study on the real‐world effectiveness of COVID‐19 vaccines by Maccabi Healthcare Services in Israel was replicated using synthetic data generated from the same source, and the results were compared between the synthetic and original datasets. The endpoints included COVID‐19 infection, symptomatic COVID‐19 infection, and hospitalization due to infection, and were also assessed in several demographic and clinical subgroups. Several metrics were used to compare synthetic versus original data estimates: standardized mean differences (SMD), decision agreement, estimate agreement, confidence interval overlap, and the Wald test. Synthetic data were generated five times to assess the stability of the results. Results: The distributions of demographic and clinical characteristics showed very small differences (SMD < 0.01). In the comparison of vaccine effectiveness, assessed as relative risk reduction, between synthetic and original data, there was 100% decision agreement, 100% estimate agreement, and a high level of confidence interval overlap (88.7%–99.7%) in all five replicates across all subgroups. Similar findings were obtained in the assessment of vaccine effectiveness against symptomatic COVID‐19 infection. In the comparison of hazard ratios for COVID‐19‐related hospitalization and odds ratios for symptomatic COVID‐19 infection, the Wald tests suggested no significant difference between the respective effect estimates in all five replicates for all patient subgroups, but there were disagreements in the estimate and decision metrics in some subgroups and replicates. Conclusions: Overall, the comparison of synthetic versus original real‐world data demonstrated good validity and reliability. Transparency about the process used to generate high-fidelity synthetic data and assurances of patient privacy are warranted. [ABSTRACT FROM AUTHOR]
- Published
- 2024
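Of the comparison metrics listed, the standardized mean difference is the simplest to compute. A common definition for a continuous variable divides the difference in means by the pooled standard deviation sqrt((s_a² + s_b²) / 2); for a proportion, each variance is p(1 - p). The sketch below assumes these definitions (the abstract does not state which variant was used), and the function names and toy ages are hypothetical. A frequently used rule of thumb treats |SMD| < 0.1 as a negligible imbalance.

```python
import math
from statistics import mean, stdev


def smd_continuous(a, b):
    """(mean_a - mean_b) / sqrt((s_a^2 + s_b^2) / 2), using sample standard deviations."""
    pooled_sd = math.sqrt((stdev(a) ** 2 + stdev(b) ** 2) / 2)
    return (mean(a) - mean(b)) / pooled_sd


def smd_binary(p_a, p_b):
    """SMD for a binary characteristic with prevalences p_a and p_b."""
    pooled_sd = math.sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)
    return (p_a - p_b) / pooled_sd


# Toy "original" and "synthetic" ages (made-up values with identical means).
original_ages = [34, 45, 52, 61, 38, 47]
synthetic_ages = [35, 44, 53, 60, 39, 46]
age_smd = smd_continuous(original_ages, synthetic_ages)  # 0.0
```

Both functions divide by the pooled standard deviation, so they fail if both samples are constant (or both prevalences are 0 or 1); a production version would need to handle that case.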