105 results for "Artur Dubrawski"
Search Results
2. Continuous ECG monitoring should be the heart of bedside AI-based predictive analytics monitoring for early detection of clinical deterioration
- Author
-
Oliver J. Monfredi, Christopher C. Moore, Brynne A. Sullivan, Jessica Keim-Malpass, Karen D. Fairchild, Tyler J. Loftus, Azra Bihorac, Katherine N. Krahn, Artur Dubrawski, Douglas E. Lake, J. Randall Moorman, and Gilles Clermont
- Subjects
Cardiology and Cardiovascular Medicine, Article - Abstract
The idea that we can detect subacute, potentially catastrophic illness earlier by using statistical models trained on clinical data is now well-established. We review evidence that supports the role of continuous cardiorespiratory monitoring in these predictive analytics monitoring tools. In particular, we review how continuous ECG monitoring reflects the patient and not the clinician, is less likely to be biased, is unaffected by changes in practice patterns, captures signatures of illnesses that are interpretable by clinicians, and is an underappreciated and underutilized source of detailed information for new mathematical methods to reveal.
- Published
- 2023
- Full Text
- View/download PDF
3. Explosion Discrimination Using Seismic Gradiometry and Spectrally Filtered Principal Components: Controlled Field Experiments
- Author
-
Cristian Challu, Christian Poppeliers, Predrag Punoševac, and Artur Dubrawski
- Subjects
Geophysics, Geochemistry and Petrology - Abstract
Spectrally filtered principal component analysis (SFPCA) is a method we developed to discriminate between seismic source types. It is based on the well-known principal component analysis but applied to seismic gradiometric data. In this article, we build on our previous efforts by testing the method on data collected in a small-scale field experiment using two source types generated by manually striking the ground at various source–receiver distances (source type A) and orientations relative to the ground surface (source type B). Using the SFPCA method that we originally developed in Challu et al. (2021), we found that we can achieve good discrimination performance for a wide range of experimental geometries and noise conditions. In addition to testing the SFPCA method using a supervised learning approach, we present an SFPCA-based discrimination method using an anomaly detection paradigm. Specifically, given a population of event-specific data (e.g., source type A), we demonstrate that an event with source type B will fall outside the accepted population range of source type A. Thus, SFPCA may have value as a seismic discriminant in the form of an anomaly detector, which may be useful if a sufficient training dataset is not available.
- Published
- 2022
- Full Text
- View/download PDF
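The anomaly-detection variant of SFPCA described in the abstract above can be sketched generically: fit principal components on the population of one source type, then flag events whose distance from that principal subspace falls outside the population's accepted range. The sketch below uses plain PCA on synthetic data; the actual method operates on spectrally filtered seismic gradiometric records, so every signal, dimension, and threshold here is an illustrative assumption.

```python
import numpy as np

# Hypothetical sketch of the anomaly-detection paradigm described above:
# fit principal components on events of one source type, then flag events
# whose reconstruction error falls outside that population's range.
rng = np.random.default_rng(0)

# Synthetic stand-ins: type-A events live near a 2-D subspace of 10-D space.
basis = rng.normal(size=(2, 10))
type_a = rng.normal(size=(200, 2)) @ basis + 0.05 * rng.normal(size=(200, 10))
type_b = rng.normal(size=(50, 10))  # type-B events lack that structure

# PCA via SVD on the centered type-A population.
mean = type_a.mean(axis=0)
_, _, vt = np.linalg.svd(type_a - mean, full_matrices=False)
components = vt[:2]  # keep the top two principal components

def reconstruction_error(x):
    """Distance of each event from the type-A principal subspace."""
    centered = x - mean
    projected = centered @ components.T @ components
    return np.linalg.norm(centered - projected, axis=1)

# Accept anything within the 99th percentile of type-A errors.
threshold = np.percentile(reconstruction_error(type_a), 99)
flags = reconstruction_error(type_b) > threshold
print(f"{flags.mean():.0%} of type-B events flagged as anomalous")
```

As in the abstract, no type-B training data is needed: the detector only models the accepted range of type A.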
4. Analyzing the Performance of Bayesian Aggregation Under Erroneous Environmental Beliefs
- Author
-
Dan Howarth, James Kyle Miller, and Artur Dubrawski
- Subjects
Nuclear and High Energy Physics, Nuclear Energy and Engineering, Electrical and Electronic Engineering - Published
- 2022
- Full Text
- View/download PDF
5. Forecasting imminent atrial fibrillation in long-term ECG recordings
- Author
-
Sydney Rooney, Roman Kaufman, Salah Al-Zaiti, Artur Dubrawski, Gilles Clermont, and J. Kyle Miller
- Subjects
Cardiology and Cardiovascular Medicine - Published
- 2023
- Full Text
- View/download PDF
6. Incorporation of machine learning and signal quality indicators can significantly suppress false respiratory alerts during in-hospital bedside monitoring
- Author
-
Vedant Sanil, Artur Dubrawski, Gus Welter, Kyle Miller, Joo Heung Yoon, Theodore Lagattuta, Michael R. Pinsky, Marilyn Hravnak, Gilles Clermont, and Salah Al-Zaiti
- Subjects
Cardiology and Cardiovascular Medicine - Published
- 2023
- Full Text
- View/download PDF
7. Empirical genomic methods for tracking plasmid spread among healthcare-associated bacteria
- Author
-
Daniel Evans, Alexander Sundermann, Marissa Griffith, Vatsala Srinivasa, Mustapha Mustapha, Jieshi Chen, Artur Dubrawski, Vaughn Cooper, Lee Harrison, and Daria Van Tyne
- Abstract
Summary
Background: Healthcare-associated bacterial pathogens frequently carry plasmids that contribute to antibiotic resistance and virulence. The horizontal transfer of plasmids in healthcare settings has been previously documented, but genomic and epidemiologic methods to study this phenomenon remain underdeveloped. The objectives of this study were to develop a method to systematically resolve and track plasmids circulating in a single hospital, and to identify epidemiologic links that indicated likely horizontal plasmid transfer.
Methods: We derived empirical thresholds of plasmid sequence similarity from comparisons of plasmids carried by bacterial isolates infecting individual patients over time, or involved in hospital outbreaks. We then applied those metrics to perform a systematic screen of 3,074 genomes of nosocomial bacterial isolates from a single hospital for the presence of 89 plasmids. We also collected and reviewed data from electronic health records for evidence of geotemporal associations between patients infected with bacteria encoding plasmids of interest.
Findings: Our analyses determined that 95% of analyzed genomes maintained roughly 95% of their plasmid genetic content at a nucleotide identity of at least 99.985%. Applying these similarity thresholds to identify horizontal plasmid transfer revealed 45 plasmids circulating among clinical isolates. Ten plasmids met criteria for geotemporal links associated with horizontal transfer. Several plasmids with shared backbones also encoded different additional mobile genetic element content, and these elements were variably present among the sampled clinical isolate genomes.
Interpretation: The horizontal transfer of plasmids among nosocomial bacterial pathogens is frequent within hospitals and can be monitored with whole-genome sequencing and comparative genomics approaches. These approaches should incorporate both nucleotide identity and reference sequence coverage to study the dynamics of plasmid transfer in the hospital.
Funding: This research was supported by the US National Institute of Allergy and Infectious Diseases (NIAID) and the University of Pittsburgh School of Medicine.
Research in context
Evidence before this study: A search of PubMed for research articles containing the search terms "plasmid", "transfer", "epidemiology", "hospital", and "patients" identified 115 peer-reviewed manuscripts published before 01 January 2022. Twenty-four manuscripts documented the dissemination of one or more plasmids by horizontal transfer in a hospital setting. Most of these prior studies focused on a single plasmid, outbreak, antibiotic resistance gene, or pathogen species, and none established an a priori approach to identify plasmids circulating among non-clonal bacterial genomes. While prior studies have quantified plasmid preservation and nucleotide identity, similarity thresholds to infer horizontal transfer were neither uniform across studies nor systematically derived from empirical data.
Added value of this study: This study advances the field of genomic epidemiology by proposing and demonstrating the utility of empirically derived thresholds of plasmid sequence similarity for inferring horizontal transfer in healthcare settings. It also advances the field by tracking horizontal plasmid transfer within a single hospital at a hitherto unprecedented scale, examining the evidence of horizontal transfer of 89 plasmids among thousands of clinical bacterial isolates sampled from a single medical center. Our systematic review of patient healthcare data related to horizontal transfer also occurred at a breadth not previously undertaken in hospital epidemiology.
Implications of all the available evidence: When successfully integrated into contemporary methods for surveillance of nosocomial pathogens, comparative genomics can be used to track and intervene directly against the dissemination of plasmids that exacerbate virulence and antimicrobial resistance in healthcare-associated bacterial infections. Standardized thresholds of plasmid identity benefit epidemiologic investigations of horizontal transfer much as uniform thresholds of genome identity benefit investigations of bacterial transmission.
- Published
- 2022
- Full Text
- View/download PDF
8. Explosion Discrimination Using Seismic Gradiometry and Spectral Filtering of Data
- Author
-
Cristian Challu, Artur Dubrawski, Predrag Punoševac, and Christian Poppeliers
- Subjects
Geophysics, Geochemistry and Petrology, Spectral filtering, Geology, Remote sensing - Abstract
We present a new method to discriminate between earthquakes and buried explosions using observed seismic data. The method differs from previous seismic discrimination algorithms in two main ways. First, we use seismic spatial gradients, as well as the wave attributes estimated from them (referred to as gradiometric attributes), rather than the conventional three-component seismograms recorded on a distributed array. The primary advantage of this is that a gradiometer is only a fraction of a wavelength in aperture compared with a conventional seismic array or network. Second, we use the gradiometric attributes as input data into a machine learning algorithm. The resulting discrimination algorithm uses the norms of truncated principal components obtained from the gradiometric data to distinguish the two classes of seismic events. Using high-fidelity synthetic data, we show that the data and gradiometric attributes recorded by a single seismic gradiometer perform as well as a conventional distributed array at the event-type discrimination task.
- Published
- 2021
- Full Text
- View/download PDF
9. Identification of patients with stable coronary artery disease who benefit from ACE inhibitors using Cox mixture model for heterogeneous treatment effects
- Author
-
Van Le, Chirag Nagpal, and Artur Dubrawski
- Subjects
Critical Care and Intensive Care Medicine - Published
- 2023
- Full Text
- View/download PDF
10. Gamma-Ray Source Detection Under Occlusions and Position Errors in Cluttered Urban Scenes
- Author
-
Dan Howarth, Artur Dubrawski, Kyle Miller, Ian Fawaz, and Jack H. Good
- Subjects
Nuclear and High Energy Physics, Nuclear Energy and Engineering, Electrical and Electronic Engineering, Computer science, Bayesian probability, Positioning technology, Global Positioning System, Computer vision, Artificial intelligence - Abstract
Spatial aggregation using Bayesian aggregation (BA) is effective in combining multiple measurements to detect weak sources of gamma radiation in mobile source detection applications. To perform spatial aggregation of evidence, the position of the sensor must be estimated over time, in synchronization with gamma-ray measurements. Prevalent low-cost position estimation approaches often suffer from inaccuracies on a scale which can affect aggregation performance. Due to the presence of large buildings, positioning technology modalities such as GPS can show higher levels of error in urban environments. Additionally, urban environments can have highly varying structures, be crowded and dynamic, causing time-varying occlusions in the line-of-sight from the sensor to the source. Both occlusions and errors in sensor position estimation can degrade the source detection performance. We use asymptotic analysis to characterize the magnitude of this degradation. Natural maximum-likelihood and marginalization-based extensions to a BA framework are then used to improve robustness of aggregation in the face of these issues. The proposed approach shows substantially improved robustness to positioning errors and dynamic occlusions when compared to the baseline BA.
- Published
- 2020
- Full Text
- View/download PDF
11. Parsimony of Hemodynamic Monitoring Data Sufficient for the Detection of Hemorrhage
- Author
-
Anthony Wertz, Michael R. Pinsky, Gilles Clermont, and Artur Dubrawski
- Subjects
Data Analysis, Cardiac output, Mean arterial pressure, Swine, Hemodynamics, Hemorrhage, Arterial Pressure, Monitoring (Physiologic), Receiver operating characteristic, Hemodynamic Monitoring, Anesthesiology and Pain Medicine - Abstract
BACKGROUND Individualized hemodynamic monitoring approaches are not well validated. Thus, we evaluated the discriminative performance improvement that might occur when moving from noninvasive monitoring (NIM) to invasive monitoring, and with increasing levels of featurization associated with increasing sampling frequency and referencing to a stable baseline, to identify bleeding during surgery in a porcine model. METHODS We collected physiologic waveform (WF) data (250 Hz) from NIM, central venous (CVC), arterial (ART), and pulmonary arterial (PAC) catheters, plus mixed venous O2 saturation and cardiac output, from 38 anesthetized Yorkshire pigs bled at 20 mL/min until a mean arterial pressure of 30 mm Hg following a 30-minute baseline period. Prebleed physiologic data defined a personal stable baseline for each subject independently. Nested models were evaluated using simple hemodynamic metrics (SM) averaged over 20-second windows and sampled every minute, beat to beat (B2B), and WF, using Random Forest classification models to identify bleeding with or without normalization to the personal stable baseline, with leave-one-pig-out cross-validation to minimize model overfitting. Model hyperparameters were tuned to detect stable or bleeding states. Bleeding models were compared using both each subject's personal baseline and a grouped-average (universal) baseline. Timeliness of bleed onset detection was evaluated by comparing the tradeoff between a low false-positive rate (FPR) and the shortest time to bleed detection. Predictive performance was evaluated using a variant of the receiver operating characteristic, focusing on minimizing the FPR and false-negative rate (FNR). RESULTS In general, referencing models to a personal baseline resulted in better bleed detection performance for all catheters than using universal baseline data. Increasing granularity from SM to B2B and WF progressively improved bleeding detection. All invasive monitoring outperformed NIM for both time to bleeding detection and low FPR and FNR. In that regard, when referenced to personal baseline with SM analysis, PAC and ART + PAC performed best; for B2B, CVC, PAC, and ART + PAC performed best; and for WF, PAC, CVC, ART + CVC, and ART + PAC performed equally well and better than other monitoring approaches. Without personal baseline, NIM performed poorly at all levels, while all catheters performed similarly for SM; for B2B, PAC and ART + PAC performed best; and for WF, PAC, ART, ART + CVC, and ART + PAC performed equally well and better than the other monitoring approaches. CONCLUSIONS Increasing hemodynamic monitoring featurization by increasing sampling frequency and referencing to a personal baseline markedly improves the ability of invasive monitoring to detect bleeding.
- Published
- 2020
- Full Text
- View/download PDF
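The leave-one-pig-out scheme in the abstract above is an instance of leave-one-group-out cross-validation: every sample from one subject is held out together, so a model is never evaluated on an individual it trained on. A minimal sketch with synthetic data; the features, labels, and subject counts here are invented stand-ins, not the study's data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut

# Leave-one-subject-out cross-validation on synthetic "hemodynamic" data.
rng = np.random.default_rng(1)
n_subjects, samples_per_subject = 6, 50

X = rng.normal(size=(n_subjects * samples_per_subject, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=len(X)) > 0).astype(int)  # stand-in "bleeding" label
groups = np.repeat(np.arange(n_subjects), samples_per_subject)  # subject IDs

scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    # Train on all subjects except one; test on the held-out subject.
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

print(f"mean held-out-subject accuracy: {np.mean(scores):.2f}")
```

Grouping by subject is what keeps the evaluation honest here: random row-wise splits would leak each pig's idiosyncratic baseline into both folds.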
12. Discriminating Cognitive Disequilibrium and Flow in Problem Solving: A Semi-Supervised Approach Using Involuntary Dynamic Behavioral Signals
- Author
-
Artur Dubrawski, Lujie Chen, and Mononito Goswami
- Subjects
Facial expression, 21st century skills, Computer science, Supervised learning, Psychological intervention, Cognition, General Medicine, Affect (psychology), Coaching, Cognitive load, Cognitive psychology - Abstract
Problem solving is one of the most important 21st century skills. However, effectively coaching young students in problem solving is challenging because teachers must continuously monitor their cognitive and affective states, and make real-time pedagogical interventions to maximize their learning outcomes. It is an even more challenging task in social environments with limited human coaching resources. To lessen the cognitive load on a teacher and enable affect-sensitive intelligent tutoring, many researchers have investigated automated cognitive and affective detection methods. However, most of the studies use culturally sensitive indices of affect that are prone to social editing, such as facial expressions, and only a few studies have explored involuntary dynamic behavioral signals such as gross body movements. In addition, most current methods rely on expensive labelled data from trained annotators for supervised learning. In this paper, we explore a semi-supervised learning framework that can learn low-dimensional representations of involuntary dynamic behavioral signals (mainly gross-body movements) from a modest number of short time series segments. Experiments on a real-world dataset reveal a significant advantage of these representations in discriminating cognitive disequilibrium and flow, as compared to traditional complexity measures from the dynamical systems literature, and demonstrate their potential in transferring learned models to previously unseen subjects.
- Published
- 2020
- Full Text
- View/download PDF
13. Weak Supervision for Affordable Modeling of Electrocardiogram Data
- Author
-
Mononito Goswami, Benedikt Boecking, and Artur Dubrawski
- Subjects
Machine Learning, Electrocardiography, Pattern Recognition, Heart Diseases, Heart Rate, Humans, Articles - Abstract
Analysing electrocardiograms (ECGs) is an inexpensive and non-invasive, yet powerful way to diagnose heart disease. ECG studies using Machine Learning to automatically detect abnormal heartbeats so far depend on large, manually annotated datasets. While collecting vast amounts of unlabeled data can be straightforward, the point-by-point annotation of abnormal heartbeats is tedious and expensive. We explore the use of multiple weak supervision sources to learn diagnostic models of abnormal heartbeats via human designed heuristics, without using ground truth labels on individual data points. Our work is among the first to define weak supervision sources directly on time series data. Results show that with as few as six intuitive time series heuristics, we are able to infer high quality probabilistic label estimates for over 100,000 heartbeats with little human effort, and use the estimated labels to train competitive classifiers evaluated on held out test data.
- Published
- 2022
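The weak-supervision workflow in entry 13 can be sketched as follows: a handful of hand-written heuristics each vote "abnormal", "normal", or abstain on a heartbeat, and the votes are pooled into probabilistic labels with no point-by-point annotation. The heuristics, signals, and thresholds below are hypothetical illustrations, not the paper's actual labeling functions, and real label models are usually learned rather than simple vote averages:

```python
import numpy as np

# Hypothetical ECG heuristics voting 1 (abnormal), 0 (normal), or -1 (abstain).
rng = np.random.default_rng(2)
n_beats = 1000
rr_interval = rng.normal(0.8, 0.15, n_beats)   # seconds between beats (synthetic)
qrs_width = rng.normal(0.09, 0.03, n_beats)    # QRS duration in seconds (synthetic)

def lf_tachycardia(rr):
    return 1 if rr < 0.6 else -1        # fast rhythm looks abnormal, else abstain

def lf_wide_qrs(width):
    return 1 if width > 0.12 else -1    # wide QRS looks abnormal, else abstain

def lf_normal_rhythm(rr, width):
    return 0 if 0.7 < rr < 1.0 and width < 0.11 else -1

votes = np.array([
    [lf_tachycardia(r), lf_wide_qrs(w), lf_normal_rhythm(r, w)]
    for r, w in zip(rr_interval, qrs_width)
])

def soft_label(row):
    """Probabilistic label: mean of non-abstaining votes; 0.5 if all abstain."""
    active = row[row != -1]
    return active.mean() if len(active) else 0.5

labels = np.array([soft_label(row) for row in votes])
print(f"estimated abnormal fraction: {(labels > 0.5).mean():.2f}")
```

The resulting soft labels can then train an ordinary classifier, which is the "affordable modeling" step the abstract describes.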
14. Using Machine Learning to Support Transfer of Best Practices in Healthcare
- Author
-
Sebastian Caldas, Jieshi Chen, and Artur Dubrawski
- Subjects
Machine Learning, Humans, Health Facilities, Articles, Delivery of Health Care - Abstract
The adoption of best practices has been shown to increase performance in healthcare institutions and is consistently demanded by patients, payers, and external overseers alike. Nevertheless, transferring practices between healthcare organizations is a challenging and underexplored task. In this paper, we take a step towards enabling the transfer of best practices by identifying likely beneficial opportunities for such transfer. Specifically, we analyze the output of machine learning models trained at different organizations with the aims of (i) detecting the opportunity for the transfer of best practices, and (ii) providing a stop-gap solution while the actual transfer process takes place. We show the benefits of this methodology on a dataset of medical inpatient claims, demonstrating our ability to identify practice gaps and to support the transfer processes that address these gaps.
- Published
- 2022
15. Whole-genome sequencing surveillance and machine learning for healthcare outbreak detection and investigation: A systematic review and summary
- Author
-
Alexander J. Sundermann, Jieshi Chen, James K. Miller, Elise M. Martin, Graham M. Snyder, Daria Van Tyne, Jane W. Marsh, Artur Dubrawski, and Lee H. Harrison
- Abstract
Background: Whole-genome sequencing (WGS) has traditionally been used in infection prevention to confirm or refute the presence of an outbreak after it has occurred. Due to decreasing costs of WGS, an increasing number of institutions have been utilizing WGS-based surveillance. Additionally, machine learning or statistical modeling to supplement infection prevention practice have also been used. We systematically reviewed the use of WGS surveillance and machine learning to detect and investigate outbreaks in healthcare settings. Methods: We performed a PubMed search using separate terms for WGS surveillance and/or machine-learning technologies for infection prevention through March 15, 2021. Results: Of 767 studies returned using the WGS search terms, 42 articles were included for review. Only 2 studies (4.8%) were performed in real time, and 39 (92.9%) studied only 1 pathogen. Nearly all studies (n = 41, 97.6%) found genetic relatedness between some isolates collected. Across all studies, 525 outbreaks were detected among 2,837 related isolates (average, 5.4 isolates per outbreak). Also, 35 studies (83.3%) only utilized geotemporal clustering to identify outbreak transmission routes. Of 21 studies identified using the machine-learning search terms, 4 were included for review. In each study, machine learning aided outbreak investigations by complementing methods to gather epidemiologic data and automating identification of transmission pathways. Conclusions: WGS surveillance is an emerging method that can enhance outbreak detection. Machine learning has the potential to identify novel routes of pathogen transmission. Broader incorporation of WGS surveillance into infection prevention practice has the potential to transform the detection and control of healthcare outbreaks.
- Published
- 2022
- Full Text
- View/download PDF
16. Constrained Clustering and Multiple Kernel Learning without Pairwise Constraint Relaxation
- Author
-
Artur Dubrawski, Vincent Jeanselme, and Benedikt Boecking
- Subjects
Statistics and Probability, Applied Mathematics, Computer Science Applications, Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Machine Learning (stat.ML) - Abstract
Clustering under pairwise constraints is an important knowledge discovery tool that enables the learning of appropriate kernels or distance metrics to improve clustering performance. These pairwise constraints, which come in the form of must-link and cannot-link pairs, arise naturally in many applications and are intuitive for users to provide. However, the common practice of relaxing discrete constraints to a continuous domain to ease optimization when learning kernels or metrics can harm generalization, as information that encodes only linkage is transformed into information about distances. We introduce a new constrained clustering algorithm that jointly clusters data and learns a kernel in accordance with the available pairwise constraints. To generalize well, our method is designed to maximize constraint satisfaction without relaxing pairwise constraints to a continuous domain where they would inform distances. We show that the proposed method outperforms existing approaches on a large number of diverse publicly available datasets, and we discuss how our method can scale to large datasets.
- Published
- 2022
- Full Text
- View/download PDF
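The discrete constraint-satisfaction objective that entry 16 argues for, as opposed to relaxing must-link/cannot-link pairs into continuous distances, can be evaluated directly on a cluster assignment. A minimal sketch; the assignment and constraint pairs are illustrative stand-ins:

```python
# Discrete constraint satisfaction: must-link pairs should share a cluster,
# cannot-link pairs should not. No relaxation of linkage into distances.

def constraint_satisfaction(assignment, must_link, cannot_link):
    """Fraction of pairwise constraints a cluster assignment satisfies."""
    satisfied = sum(assignment[i] == assignment[j] for i, j in must_link)
    satisfied += sum(assignment[i] != assignment[j] for i, j in cannot_link)
    total = len(must_link) + len(cannot_link)
    return satisfied / total

assignment = [0, 0, 1, 1, 2]            # cluster label per point
must_link = [(0, 1), (2, 3)]            # pairs that belong together
cannot_link = [(0, 2), (1, 4), (3, 4)]  # pairs that must be separated

print(constraint_satisfaction(assignment, must_link, cannot_link))  # → 1.0
```

The paper's contribution is to optimize clusterings and kernels against exactly this kind of discrete criterion; the helper above only scores a given assignment.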
17. Interpretable Treatment Prioritization Rule Defines Diabetic Patients That Benefit From Prompt Coronary Revascularization
- Author
-
Chirag Nagpal and Artur Dubrawski
- Subjects
Cardiology and Cardiovascular Medicine - Published
- 2023
- Full Text
- View/download PDF
18. Actionable Model-Centric Explanations (Student Abstract)
- Author
-
Cecilia G. Morales, Nicholas Gisolfi, Robert Edman, James K. Miller, and Artur Dubrawski
- Subjects
General Medicine - Abstract
We recommend using a model-centric, Boolean Satisfiability (SAT) formalism to obtain useful explanations of trained model behavior, different from and complementary to what can be gleaned from LIME and SHAP, popular data-centric explanation tools in Artificial Intelligence (AI). We compare and contrast these methods, and show that data-centric methods may yield brittle explanations of limited practical utility. The model-centric framework, however, can offer actionable insights into the risks of using AI models in practice. For critical applications of AI, split-second decision making is best informed by robust explanations that are invariant to properties of data, a capability that model-centric frameworks offer.
- Published
- 2022
- Full Text
- View/download PDF
19. Forecasting emergence of COVID-19 variants of concern
- Author
-
James Kyle Miller, Kimberly Elenberg, and Artur Dubrawski
- Subjects
Multidisciplinary, Population Genetics, SARS-CoV-2, COVID-19, Humans, Epidemiological Models, Genetic Fitness, Pandemics, Forecasting - Abstract
We consider whether one can forecast the emergence of variants of concern in the SARS-CoV-2 outbreak and similar pandemics. We explore methods of population genetics and identify key relevant principles in both deterministic and stochastic models of the spread of infectious disease. Finally, we demonstrate that fitness variation, defined as a trait for which an increase in its value is associated with an increase in net Darwinian fitness when the values of other traits are held constant, is a strong indicator of imminent transition in the viral population.
- Published
- 2021
20. Deep Survival Machines: Fully Parametric Survival Regression and Representation Learning for Censored Data With Competing Risks
- Author
-
Artur Dubrawski, Xinyu Li, and Chirag Nagpal
- Subjects
Risk, Statistical Models, Computer science, Competing risks, Censoring (statistics), Survival Analysis, Regression, Computer Science Applications, Health Information Management, Covariate, Parametric estimation, Econometrics, Hazard model, Humans, Electrical and Electronic Engineering, Feature learning, Biotechnology, Parametric statistics, Proportional Hazards Models - Abstract
We describe a new approach to estimating relative risks in time-to-event prediction problems with censored data in a fully parametric manner. Our approach does not require making strong assumptions of constant proportional hazards of the underlying survival distribution, as required by the Cox-proportional hazard model. By jointly learning deep nonlinear representations of the input covariates, we demonstrate the benefits of our approach when used to estimate survival risks through extensive experimentation on multiple real world datasets with different levels of censoring. We further demonstrate advantages of our model in the competing risks scenario. To the best of our knowledge, this is the first work involving fully parametric estimation of survival times with competing risks in the presence of censoring.
- Published
- 2021
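The core ingredient of fully parametric survival estimation with censoring, as in entry 20, is a likelihood in which observed failures contribute the log density log f(t) and right-censored subjects contribute the log survival function log S(t). The sketch below fits a single Weibull distribution by grid search on synthetic data; the paper's model instead learns mixtures of such primitives under a deep representation, so this is illustration only:

```python
import numpy as np

def weibull_censored_loglik(times, events, shape, scale):
    """Log-likelihood of (time, event) pairs under Weibull(shape, scale).
    event=1: observed failure, contributes log f(t).
    event=0: right-censored, contributes log S(t)."""
    z = times / scale
    log_survival = -z**shape                     # log S(t) = -(t/scale)^shape
    log_density = (np.log(shape / scale)
                   + (shape - 1) * np.log(z)
                   + log_survival)               # log f(t) = log h(t) + log S(t)
    return np.sum(np.where(events == 1, log_density, log_survival))

# Synthetic right-censored survival data with known parameters.
rng = np.random.default_rng(3)
true_shape, true_scale = 1.5, 10.0
t = true_scale * rng.weibull(true_shape, size=500)
censor = rng.uniform(0, 20, size=500)
times = np.minimum(t, censor)
events = (t <= censor).astype(int)

# Crude grid search: the maximum-likelihood point should land near the truth.
shapes = np.linspace(0.5, 3.0, 26)
scales = np.linspace(5.0, 15.0, 21)
grid = [(weibull_censored_loglik(times, events, k, s), k, s)
        for k in shapes for s in scales]
best = max(grid)
print(f"best shape={best[1]:.2f}, scale={best[2]:.2f}")
```

Because the likelihood is fully parametric, no proportional-hazards assumption is needed, which is the advantage the abstract emphasizes over Cox-style models.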
21. Neural basis expansion analysis with exogenous variables: Forecasting electricity prices with NBEATSx
- Author
-
Kin G. Olivares, Cristian Challu, Grzegorz Marcjasz, Rafał Weron, and Artur Dubrawski
- Subjects
Business and International Management, Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Machine Learning (stat.ML) - Abstract
We extend the neural basis expansion analysis (NBEATS) to incorporate exogenous factors. The resulting method, called NBEATSx, improves on a well-performing deep learning model, extending its capabilities by including exogenous variables and allowing it to integrate multiple sources of useful information. To showcase the utility of the NBEATSx model, we conduct a comprehensive study of its application to electricity price forecasting (EPF) tasks across a broad range of years and markets. We observe state-of-the-art performance, significantly improving the forecast accuracy by nearly 20% over the original NBEATS model, and by up to 5% over other well-established statistical and machine learning methods specialized for these tasks. Additionally, the proposed neural network has an interpretable configuration that can structurally decompose time series, visualizing the relative impact of trend and seasonal components and revealing the modeled processes' interactions with exogenous factors. To assist related work, we made the code available at https://github.com/cchallu/nbeatsx.
- Published
- 2021
- Full Text
- View/download PDF
22. Outbreak of Pseudomonas aeruginosa Infections from a Contaminated Gastroscope Detected by Whole Genome Sequencing Surveillance
- Author
-
Graham M. Snyder, Alexander J. Sundermann, Vaughn S. Cooper, Artur Dubrawski, Daria Van Tyne, Vatsala R. Srinivasa, Kady Waggle, James K. Miller, Marissa P. Griffith, Ashley M Ayres, Kathleen A. Shutt, A. William Pasculle, Praveen Kumar, Mustapha M. Mustapha, Jieshi Chen, Jane W. Marsh, Chinelo Ezeonwuka, Lee H. Harrison, and Melissa Saul
- Subjects
Microbiology (medical), Healthcare-associated infections, Pseudomonas aeruginosa Infections, Disease Outbreaks, Electronic health record, Humans, Pseudomonas Infections, Retrospective Studies, Whole Genome Sequencing, Cross Infection, Transmission (medicine), Outbreak, Infectious Diseases, Gastroscopes, Online Only Articles - Abstract
Background Traditional methods of outbreak investigation utilize reactive whole genome sequencing (WGS) to confirm or refute the outbreak. We have implemented WGS surveillance and a machine learning (ML) algorithm for the electronic health record (EHR) to retrospectively detect previously unidentified outbreaks and to determine the responsible transmission routes. Methods We performed WGS surveillance to identify and characterize clusters of genetically related Pseudomonas aeruginosa infections during a 24-month period. ML of the EHR was used to identify potential transmission routes. A manual review of the EHR was performed by an infection preventionist to determine the most likely route, and results were compared to the ML algorithm. Results We identified a cluster of 6 genetically related P. aeruginosa cases that occurred during a 7-month period. The ML algorithm identified gastroscopy as a potential transmission route for 4 of the 6 patients. Manual EHR review confirmed gastroscopy as the most likely route for 5 patients. This transmission route was confirmed by identification of a genetically related P. aeruginosa incidentally cultured from a gastroscope used on 4 of the 5 patients. Three infections, 2 of which were bloodstream infections, could have been prevented if the ML algorithm had been running in real time. Conclusions WGS surveillance combined with an ML algorithm of the EHR identified a previously undetected outbreak of gastroscope-associated P. aeruginosa infections. These results underscore the value of WGS surveillance and ML of the EHR for enhancing outbreak detection in hospitals and preventing serious infections.
- Published
- 2020
23. Prediction of Hypotension Events with Physiologic Vital Sign Signatures in The Intensive Care Unit
- Author
-
Michael R. Pinsky, Joo Heung Yoon, Gilles Clermont, Vincent Jeanselme, Marilyn Hravnak, and Artur Dubrawski
- Subjects
Male ,Artificial intelligence ,medicine.medical_specialty ,Vital signs ,Medical information ,Critical Care and Intensive Care Medicine ,Risk Assessment ,law.invention ,Machine Learning ,03 medical and health sciences ,0302 clinical medicine ,030202 anesthesiology ,law ,Time windows ,Internal medicine ,Intensive care ,medicine ,Humans ,Precision Medicine ,Aged ,Monitoring, Physiologic ,Framingham Risk Score ,Receiver operating characteristic ,Vital Signs ,business.industry ,Research ,030208 emergency & critical care medicine ,Middle Aged ,Intensive care unit ,Highly sensitive ,Intensive Care Units ,ROC Curve ,Area Under Curve ,Cardiology ,Female ,Hypotension ,Prediction ,business - Abstract
Background Even brief hypotension is associated with increased morbidity and mortality. We developed a machine learning model to predict the initial hypotension event among intensive care unit (ICU) patients and designed an alert system for bedside implementation. Materials and methods From the Medical Information Mart for Intensive Care III (MIMIC-III) dataset, minute-by-minute vital signs were extracted. A hypotension event was defined as at least five measurements within a 10-min period of systolic blood pressure ≤ 90 mmHg and mean arterial pressure ≤ 60 mmHg. Using time series data from 30-min overlapping time windows, a random forest (RF) classifier was used to predict risk of hypotension every minute. Chronologically, the first half of extracted data was used to train the model, and the second half was used to validate the trained model. The model’s performance was measured with area under the receiver operating characteristic curve (AUROC) and area under the precision recall curve (AUPRC). Hypotension alerts were generated from the risk score time series using a stacked RF model. A lockout time was applied for real-life implementation. Results We identified 1307 subjects (1580 ICU stays) as the hypotension group and 1619 subjects (2279 ICU stays) as the non-hypotension group. The RF model showed AUROC of 0.93 and 0.88 at 15 and 60 min before hypotension, respectively, and AUPRC of 0.77 at 60 min before. Risk score trajectories revealed that 80% and >60% of hypotension events were predicted 15 and 60 min in advance, respectively. The stacked model with 15-min lockout produced on average 0.79 alerts/subject/hour (sensitivity 92.4%). Conclusion Clinically significant hypotension events in the ICU can be predicted at least 1 h before the initial hypotension episode. With a highly sensitive and reliable practical alert system, the vast majority of future hypotension events could be captured, suggesting potential real-life utility.
- Published
- 2020
- Full Text
- View/download PDF
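The hypotension-event definition quoted in the abstract above (at least five measurements within a 10-min period with systolic BP ≤ 90 mmHg and mean arterial pressure ≤ 60 mmHg) can be sketched directly. The function names and the simple list-based representation below are our own illustrative assumptions, not the authors' code:

```python
def is_hypotensive(sbp, mean_ap):
    """One minute-level measurement meeting both pressure criteria."""
    return sbp <= 90 and mean_ap <= 60

def first_hypotension_event(sbp_series, map_series, window=10, min_hits=5):
    """Return the index (minute) of the first 10-min window containing at
    least five hypotensive measurements, or None if no window qualifies."""
    hits = [is_hypotensive(s, m) for s, m in zip(sbp_series, map_series)]
    for start in range(0, len(hits) - window + 1):
        if sum(hits[start:start + window]) >= min_hits:
            return start
    return None

# Twelve minutes of vitals; minutes 3-8 are hypotensive (six hits).
sbp = [110, 108, 105, 88, 86, 85, 87, 89, 90, 112, 115, 118]
mapp = [75, 74, 72, 58, 56, 55, 57, 59, 60, 78, 80, 82]
print(first_hypotension_event(sbp, mapp))  # -> 0 (window starting at minute 0)
```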
24. A Case for Federated Learning: Enabling and Leveraging Inter-Hospital Collaboration
- Author
-
Vincent Jeanselme, S. Caldas, Artur Dubrawski, Michael R. Pinsky, and Gilles Clermont
- Subjects
World Wide Web ,Computer science ,Federated learning - Published
- 2020
- Full Text
- View/download PDF
25. High Resolution Diffuse Optical Tomography using Short Range Indirect Subsurface Imaging
- Author
-
Ashutosh Sabharwal, Akash Kumar Maity, Artur Dubrawski, Srinivasa G. Narasimhan, and Chao Liu
- Subjects
Materials science ,medicine.diagnostic_test ,Scattering ,business.industry ,Physics::Medical Physics ,Resolution (electron density) ,020207 software engineering ,02 engineering and technology ,01 natural sciences ,Diffuse optical imaging ,Light scattering ,Convolution ,010309 optics ,Computational photography ,Optics ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Radiative transfer ,medicine ,Optical tomography ,business - Abstract
Diffuse optical tomography (DOT) is an approach to recover subsurface structures beneath the skin by measuring light propagation beneath the surface. The method is based on optimizing the difference between the images collected and a forward model that accurately represents diffuse photon propagation within a heterogeneous scattering medium. However, to date, most works have used only a few source-detector pairs and recovered the medium at very low resolution, and increasing the resolution requires prohibitive computation and storage. In this work, we present a fast imaging system and algorithm for high resolution diffuse optical tomography with a line imaging and illumination system. Key to our approach is a convolution approximation of the forward heterogeneous scattering model that can be inverted to recover structures deeper beneath the surface than previously possible. We show that our proposed method can detect reasonably accurate boundaries and relative depth of heterogeneous structures up to a depth of 8 mm below a highly scattering medium such as milk. This work can extend the potential of DOT to recover more intricate structures (vessels, tissue, tumors, etc.) beneath the skin for diagnosing many dermatological and cardiovascular conditions.
- Published
- 2020
- Full Text
- View/download PDF
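The key idea in the abstract above, approximating the forward scattering model with a convolution, can be illustrated with a minimal 1-D discrete convolution. The kernel here is an arbitrary stand-in for the (unspecified) scattering blur, not the paper's actual forward model:

```python
def convolve1d(signal, kernel):
    """Full discrete convolution: the measured image is a sum of shifted,
    scaled copies of a spatially invariant blur kernel, one per absorber
    in the subsurface map."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, w in enumerate(kernel):
            out[i + j] += s * w
    return out

# A single absorber reproduces the blur kernel at its own location.
print(convolve1d([1.0, 0.0, 0.0], [0.2, 0.6, 0.2]))
```

The convolutional structure is what makes inversion tractable at high resolution: a shift-invariant kernel replaces a dense, location-specific forward operator.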
26. Machine Learning for the Developing World
- Author
-
Artur Dubrawski, William Herlands, Daniel B. Neill, and Maria De-Arteaga
- Subjects
General Computer Science ,business.industry ,Computer science ,Best practice ,Developing country ,02 engineering and technology ,Machine learning ,computer.software_genre ,Field (computer science) ,Management Information Systems ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Research questions ,Artificial intelligence ,business ,International development ,computer - Abstract
Researchers from across the social and computer sciences are increasingly using machine learning to study and address global development challenges. This article examines the burgeoning field of machine learning for the developing world (ML4D). First, we present a review of prominent literature. Next, we suggest best practices drawn from the literature for ensuring that ML4D projects are relevant to the advancement of development objectives. Finally, we discuss how developing world challenges can motivate the design of novel machine learning methodologies. This article provides insights into systematic differences between ML4D and more traditional machine learning applications. It also discusses how technical complications of ML4D can be treated as novel research questions, how ML4D can motivate new research directions, and where machine learning can be most useful.
- Published
- 2018
- Full Text
- View/download PDF
27. Quantifying the Relationship between Large Public Events and Escort Advertising Behavior
- Author
-
Kyle Miller, Emily J. Kennedy, Benedikt Boecking, and Artur Dubrawski
- Subjects
Sociology and Political Science ,Injury control ,Accident prevention ,Computer science ,05 social sciences ,Poison control ,Transportation ,Advertising ,03 medical and health sciences ,0302 clinical medicine ,Anthropology ,0502 economics and business ,Anomaly detection ,030212 general & internal medicine ,Law ,ComputingMilieux_MISCELLANEOUS ,050212 sport, leisure & tourism ,Demography - Abstract
We study online escort advertisement responses to large scale public events using a time series anomaly detection framework. We analyze advertisement volume, approximations of advertiser vo...
- Published
- 2018
- Full Text
- View/download PDF
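The abstract above is truncated, but a time-series anomaly detection framework over daily advertisement volume can be illustrated with a trailing-window z-score detector. The window length and threshold below are arbitrary illustrative choices, not the authors':

```python
def zscore_anomalies(counts, window=7, z_thresh=3.0):
    """Flag indices whose value deviates strongly (in trailing-window
    z-score) from recent history, e.g. daily ad-volume spikes coinciding
    with a large public event."""
    anomalies = []
    for i in range(window, len(counts)):
        hist = counts[i - window:i]
        mean = sum(hist) / window
        sd = (sum((v - mean) ** 2 for v in hist) / window) ** 0.5 or 1.0
        if abs(counts[i] - mean) / sd >= z_thresh:
            anomalies.append(i)
    return anomalies

# A steady baseline followed by a one-day spike in advertisement volume.
print(zscore_anomalies([10, 11, 9, 10, 10, 11, 10, 50]))  # -> [7]
```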
28. Gamma-Ray Source Detection With Small Sensors
- Author
-
Artur Dubrawski and Kyle Miller
- Subjects
Nuclear and High Energy Physics ,010308 nuclear & particles physics ,Computer science ,business.industry ,Detector ,Gamma ray ,Radiation ,01 natural sciences ,Upper and lower bounds ,010104 statistics & probability ,Nuclear Energy and Engineering ,Relative utility ,0103 physical sciences ,Key (cryptography) ,0101 mathematics ,Electrical and Electronic Engineering ,Photonics ,business ,Algorithm - Abstract
Large detectors can give better background characterization and can detect radiation sources at larger standoff distances. Small detectors, on the other hand, are less expensive, can often get closer to source materials, and can access places that large detectors cannot (e.g., indoor environments). We systematically quantify the impact of detector size and number on source detection in area search applications. We analyze theoretical upper bounds on source detectability and establish first-order approximations thereof. We demonstrate that these approximations give good comparisons of different choices of sensor size and number for practical source detection algorithms using semisynthetic data. Our results indicate that multiple small detectors, in conjunction, can offer superior overall operational utility compared with a single large detector. Finally, we identify differences in detector response and likelihood of a close approach as key determinants of the relative utility of small sensors.
- Published
- 2018
- Full Text
- View/download PDF
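A first-order approximation in the spirit of the analysis above: detected counts scale with detector area and the inverse square of standoff distance, while the significance of a source excess scales with the square root of the Poisson background. All numbers below are hypothetical, chosen only to illustrate why a small detector with a close approach can beat a much larger one far away:

```python
import math

def detection_snr(source_cps_at_1m, distance_m, area_cm2, bkg_cps_per_cm2, dwell_s):
    """First-order significance of a source count excess over background:
    signal ~ area / r^2, noise ~ sqrt(background counts)."""
    signal = source_cps_at_1m * area_cm2 * dwell_s / distance_m ** 2
    background = bkg_cps_per_cm2 * area_cm2 * dwell_s
    return signal / math.sqrt(background)

# A 100 cm^2 detector at 1 m versus a 1000 cm^2 detector stuck at 5 m.
small_close = detection_snr(10, 1, 100, 1, 10)
large_far = detection_snr(10, 5, 1000, 1, 10)
print(small_close > large_far)  # -> True
```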
29. Sex differences in post cardiac arrest discharge locations
- Author
-
Vincent Jeanselme, Maria De-Arteaga, Jonathan Elmer, Sarah M. Perman, and Artur Dubrawski
- Subjects
RC581-951 ,Bias ,Specialties of internal medicine ,Short Paper ,Sex ,Cardiac arrest ,Earth-Surface Processes - Abstract
Background: We explored sex-based differences in discharge location after resuscitation from cardiac arrest. Methods: We performed a single-center retrospective cohort study including patients hospitalized after resuscitation from cardiac arrest from January 2010 to May 2020. We identified patients from a prospective registry, from which we extracted standard demographic and clinical variables. We explored favorable discharge location, defined as discharge to home or acute rehabilitation for survivors to hospital discharge. We tested the association of sex with the residuals of a multivariable logistic regression built using bidirectional selection to control for clinically relevant covariates. Results: We included 2,278 patients. Mean age was 59 (SD 16), 40% were women, and 77% were admitted after out-of-hospital cardiac arrest. A total of 970 patients (43%) survived to discharge; of those, 607 (63% of survivors) had a favorable discharge location. Female sex showed a weak independent association with unfavorable discharge location (adjusted OR 0.94, 95% CI 0.89–0.99). Conclusions: Our results suggest a possible sex-based disparity in discharge location after cardiac arrest.
- Published
- 2021
- Full Text
- View/download PDF
30. Risk for Cardiorespiratory Instability Following Transfer to a Monitored Step-Down Unit
- Author
-
Michael R. Pinsky, Lujie Chen, Gilles Clermont, Eliezer Bose, Marilyn Hravnak, Dianxu Ren, Artur Dubrawski, and Leslie A. Hoffman
- Subjects
Adult ,Male ,Patient Transfer ,Pulmonary and Respiratory Medicine ,endocrine system ,medicine.medical_specialty ,Pediatrics ,Respiratory rate ,Hospitalized patients ,030204 cardiovascular system & hematology ,Critical Care and Intensive Care Medicine ,03 medical and health sciences ,0302 clinical medicine ,Pulmonary Heart Disease ,Risk Factors ,Internal medicine ,Transfer (computing) ,medicine ,Humans ,Time to onset ,Aged ,Monitoring, Physiologic ,Original Research ,Physiologic monitoring ,business.industry ,030208 emergency & critical care medicine ,Cardiorespiratory fitness ,General Medicine ,Length of Stay ,Middle Aged ,Cardiology ,Female ,Respiratory Insufficiency ,business ,Hospital Units ,Hospital stay - Abstract
BACKGROUND: Hospitalized patients who develop at least one instance of cardiorespiratory instability (CRI) have poorer outcomes. We sought to describe the admission characteristics, drivers, and time to onset of initial CRI events in monitored step-down unit (SDU) patients. METHODS: Admission characteristics and continuous monitoring data (frequency 1/20 Hz) were recorded in 307 subjects. Vital sign deviations beyond local instability trigger threshold criteria, with a tolerance of 40 s and cumulative duration of 4 of 5 min, were classified as CRI events. The CRI driver was defined as the first vital sign to cross a threshold and meet persistence criteria. Time to onset of initial CRI was the number of days from SDU admission to initial CRI, and duration was length of the initial CRI epoch. RESULTS: Subjects transferred to the SDU from units with higher monitoring capability were more likely to develop CRI (CRI n = 133 [44%] vs no CRI n = 174 [31%] P = .042). Time to onset varied according to the CRI driver. Subjects with at least one CRI event had a longer hospital stay (CRI 11.3 ± 10.2 d vs no CRI 7.8 ± 9.2 d, P < .001) and SDU stay (CRI 6.1 ± 4.9 d vs no CRI 3.5 ± 2.9 d, P < .001). First events were more often due to SpO2, whereas breathing frequency was the most common driver of all CRI. CONCLUSIONS: Initial CRI most commonly occurred due to SpO2 and was associated with prolonged SDU and hospital stay. Findings suggest the need for clinicians to more closely monitor SDU patients transferred from an ICU and parameters (SpO2, breathing frequency) that more commonly precede CRI events.
- Published
- 2017
- Full Text
- View/download PDF
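The persistence criterion described above (a vital sign beyond its threshold for a cumulative 4 of 5 minutes, on data sampled at 1/20 Hz, i.e. one sample every 20 s) can be sketched as follows. The 40 s tolerance rule is omitted for simplicity; function names and the example thresholds are our own assumptions:

```python
SAMPLE_PERIOD_S = 20  # 1/20 Hz monitoring frequency from the paper

def cri_epoch_start(values, low, high, window_s=300, min_cum_s=240):
    """Index of the first sample at which cumulative out-of-range time
    within the trailing 5-min window reaches 4 min, else None."""
    win = window_s // SAMPLE_PERIOD_S      # samples per 5-min window
    need = min_cum_s // SAMPLE_PERIOD_S    # samples per 4 cumulative min
    out = [not (low <= v <= high) for v in values]
    for end in range(win - 1, len(out)):
        if sum(out[end - win + 1:end + 1]) >= need:
            return end
    return None

# SpO2 stable at 95%, then a sustained desaturation to 85% (range 90-100).
spo2 = [95] * 20 + [85] * 15
print(cri_epoch_start(spo2, 90, 100))  # -> 31
```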
31. Robust Multi-View Representation Learning (Student Abstract)
- Author
-
Artur Dubrawski, James K. Miller, and Sibi Venkatesan
- Subjects
business.industry ,Computer science ,General Medicine ,Extension (predicate logic) ,Machine learning ,computer.software_genre ,Autoencoder ,Simple (abstract algebra) ,Artificial intelligence ,business ,Canonical correlation ,Heuristics ,computer ,Feature learning - Abstract
Multi-view data has become ubiquitous, especially with multi-sensor systems like self-driving cars or medical patient-side monitors. We propose two methods to approach robust multi-view representation learning with the aim of leveraging local relationships between views. The first is an extension of Canonical Correlation Analysis (CCA) where we consider multiple one-vs-rest CCA problems, one for each view. We use a group-sparsity penalty to encourage finding local relationships. The second method is a straightforward extension of a multi-view AutoEncoder with view-level drop-out. We demonstrate the effectiveness of these methods in simple synthetic experiments. We also describe heuristics and extensions to improve and/or expand on these methods.
- Published
- 2020
- Full Text
- View/download PDF
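The second method above (a multi-view AutoEncoder with view-level drop-out) hinges on randomly masking entire views during training. The masking step alone can be sketched as below; the autoencoder itself is omitted, and the function name is our own:

```python
import random

def view_dropout(views, p=0.3, rng=None):
    """Zero out entire views with probability p (view-level drop-out), so a
    multi-view autoencoder must reconstruct missing views from the rest."""
    rng = rng or random.Random()
    return [[0.0] * len(v) if rng.random() < p else list(v) for v in views]

# Three views of one sample; with this seeded RNG the third view is masked.
print(view_dropout([[1.0, 2.0], [3.0], [4.0, 5.0]], p=0.5, rng=random.Random(0)))
```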
32. Modeling Involuntary Dynamic Behaviors to Support Intelligent Tutoring (Student Abstract)
- Author
-
Chufan Gao, Lujie Chen, Artur Dubrawski, and Mononito Goswami
- Subjects
Facial expression ,21st century skills ,Computer science ,business.industry ,Supervised learning ,Psychological intervention ,Cognition ,General Medicine ,Affect (psychology) ,Coaching ,Task (project management) ,business ,Cognitive load ,Cognitive psychology - Abstract
Problem solving is one of the most important 21st century skills. However, effectively coaching young students in problem solving is challenging because teachers must continuously monitor their cognitive and affective states and make real-time pedagogical interventions to maximize students' learning outcomes. It is an even more challenging task in social environments with limited human coaching resources. To lessen the cognitive load on a teacher and enable affect-sensitive intelligent tutoring, many researchers have investigated automated cognitive and affective detection methods. However, most of the studies use culturally sensitive indices of affect that are prone to social editing, such as facial expressions, and only a few studies have explored involuntary dynamic behavioral signals such as gross body movements. In addition, most current methods rely on expensive labelled data from trained annotators for supervised learning. In this paper, we explore a semi-supervised learning framework that can learn low-dimensional representations of involuntary dynamic behavioral signals (mainly gross-body movements) from a modest number of short time series segments. Experiments on a real-world dataset reveal a significant utility of these representations in discriminating cognitive disequilibrium and flow and demonstrate their potential in transferring learned models to previously unseen subjects.
- Published
- 2020
- Full Text
- View/download PDF
33. Prognostication of Neurological Recovery by Analyzing Structural Breaks in EEG Data
- Author
-
Oliver Grothe, Jonathan Elmer, Artur Dubrawski, Jieshi Chen, and David Bethge
- Subjects
Resuscitation ,Multivariate statistics ,medicine.medical_specialty ,medicine.diagnostic_test ,business.industry ,Structural break ,030208 emergency & critical care medicine ,Electroencephalography ,Outcome (probability) ,03 medical and health sciences ,0302 clinical medicine ,Physical medicine and rehabilitation ,Eeg data ,Early prediction ,Hospital admission ,medicine ,business ,030217 neurology & neurosurgery - Abstract
We describe an approach for unsupervised, multivariate yet interpretable structural break testing of rich electroencephalographic (EEG) data time series to perform early prediction of patient outcome after resuscitation from cardiac arrest. Few models exist that reliably determine prognosis among comatose post-arrest patients within hours of hospital admission. We present an efficient method designed to detect anomalous patterns in streaming EEG data that combines scan statistics with multiple structural break tests. Some patterns of change show non-trivial power in prognosticating patient outcomes at clinically relevant prediction horizons. Empirical evaluation of the proposed method shows its potential utility in determining cardiac arrest patient outcomes earlier and more confidently than existing alternatives.
- Published
- 2019
- Full Text
- View/download PDF
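The abstract above combines scan statistics with multiple structural break tests; as a minimal, generic illustration (not the authors' method), a single mean-shift break in a series can be located with a CUSUM-type statistic:

```python
def cusum_break_point(series):
    """Index maximizing the CUSUM statistic |S_k - (k/n) * S_n|, a classic
    single structural-break test for a shift in the mean of a series."""
    n, total = len(series), sum(series)
    best_k, best_stat, prefix = 1, -1.0, 0.0
    for k in range(1, n):
        prefix += series[k - 1]
        stat = abs(prefix - total * k / n)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k

# A mean shift from 0 to 5 halfway through is located at the true break.
print(cusum_break_point([0.0] * 10 + [5.0] * 10))  # -> 10
```

Applied over rolling windows of EEG-derived features, a statistic like this yields candidate change points whose timing and magnitude can then feed a prognostic model.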
34. Social-Affiliation Networks: Patterns and the SOAR Model
- Author
-
Dhivya Eswaran, Christos Faloutsos, Reihaneh Rabbany, and Artur Dubrawski
- Subjects
Structure (mathematical logic) ,Theoretical computer science ,Recursion ,Relation (database) ,Computer science ,Context (language use) ,02 engineering and technology ,020204 information systems ,Friendship graph ,0202 electrical engineering, electronic engineering, information engineering ,Code (cryptography) ,020201 artificial intelligence & image processing ,Soar ,Generator (mathematics) - Abstract
Given a social-affiliation network – a friendship graph where users have many, binary attributes, e.g., check-ins, page likes or group memberships – what rules do its structural properties such as edge or triangle counts follow, in relation to its attributes? More challengingly, how can we synthetically generate networks which provably satisfy those rules or patterns? Our work attempts to answer these closely related questions in the context of the increasingly prevalent social-affiliation graphs. Our contributions are two-fold: (a) Patterns: we discover three new rules (power laws) in the properties of attribute-induced subgraphs, substructures which connect the friendship structure to affiliations; (b) Model: we propose SOAR (short for SOcial-Affiliation graphs via Recursion), a stochastic model based on recursion and self-similarity, to provably generate graphs obeying the observed patterns. Experiments show that: (i) the discovered rules are useful in detecting deviations as anomalies and (ii) SOAR is fast and scales linearly with network size, producing graphs with millions of edges and attributes in only a few seconds. Code related to this paper is available at: www.github.com/dhivyaeswaran/soar.
- Published
- 2019
- Full Text
- View/download PDF
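SOAR's exact construction is not given in the abstract above; recursion and self-similarity in graph generation are commonly illustrated with a Kronecker power of a small seed adjacency matrix, sketched below as a related (not identical) construction:

```python
def kron(a, b):
    """Kronecker product of two 0/1 adjacency matrices: each edge of `a`
    is replaced by a full copy of `b`, yielding a self-similar graph."""
    return [[av * bv for av in arow for bv in brow]
            for arow in a for brow in b]

seed = [[1, 1],
        [1, 0]]
g2 = kron(seed, seed)     # one recursion step: a 4-node self-similar graph
print(sum(map(sum, g2)))  # -> 9, i.e. (edge count of seed) squared
```

Each recursion step multiplies edge counts, which is how such models produce power-law-like growth in graph properties.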
35. The critical care data exchange format: a proposed flexible data standard for combining clinical and high-frequency physiologic data in critical care
- Author
-
Gilles Clermont, Douglas E. Lake, Joo Heung Yoon, Amanda E Zimmet, Michael R. Pinsky, David M. Maslove, Ryan Bobko, Alexander Hamilton, J. Randall Moorman, Anthony Wertz, Philip Laird, Gus Welter, and Artur Dubrawski
- Subjects
Critical Care ,Physiology ,Computer science ,0206 medical engineering ,Interoperability ,Biomedical Engineering ,Biophysics ,02 engineering and technology ,Hierarchical Data Format ,Hierarchical database model ,03 medical and health sciences ,0302 clinical medicine ,Physiology (medical) ,Humans ,Data collection ,LOINC ,Genomics ,computer.file_format ,020601 biomedical engineering ,Data science ,Data sharing ,Data Standard ,Intensive Care Units ,Data exchange ,computer ,030217 neurology & neurosurgery - Abstract
Objective. To develop a standardized format for exchanging clinical and physiologic data generated in the intensive care unit. Our goal was to develop a format that would accommodate the data collection pipelines of various sites but would not require dataset-specific schemas or ad-hoc tools for decoding and analysis. Approach. A number of centers had independently developed solutions for storing clinical and physiologic data using Hierarchical Data Format-Version 5 (HDF5), a well-supported standard already in use in multiple other fields. These individual solutions involved design choices that made the data difficult to share despite the underlying common framework. A collaborative process was used to form the basis of a proposed standard that would allow for interoperability and data sharing with common analysis tools. Main Results. We developed the HDF5-based critical care data exchange format, which stores multiparametric data in an efficient, self-describing, hierarchical structure and supports real-time streaming and compression. In addition to cardiorespiratory and laboratory data, the format can, in future, accommodate other large datasets such as imaging and genomics. We demonstrated the feasibility of a standardized format by converting data from three sites as well as the MIMIC-III dataset. Significance. Individual approaches to the storage of multiparametric clinical data are proliferating, representing both a duplication of effort and a missed opportunity for collaboration. Adoption of a standardized format for clinical data exchange will enable the development of a digital biobank, facilitate the external validation of machine learning models, and be a powerful tool for sharing multiparametric, high-frequency patient-level data in multisite clinical trials.
Our proposed solution focuses on supporting standardized ontologies such as LOINC, allowing data to be read easily regardless of source, and in so doing provides a useful way to integrate large amounts of existing data.
- Published
- 2021
- Full Text
- View/download PDF
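The format described above is built on HDF5 (typically accessed via libraries such as h5py); purely to illustrate the self-describing hierarchical layout without that dependency, a toy group/dataset structure might look like the following. The paths, attribute names, and class names are invented for illustration, not part of the actual specification:

```python
class Group(dict):
    """Toy stand-in for an HDF5 group: named children plus metadata attrs."""
    def __init__(self, **attrs):
        super().__init__()
        self.attrs = dict(attrs)

def put(root, path, data):
    """Store a dataset at a /-separated path, creating intermediate groups
    as needed, mirroring how a hierarchical format can organize waveforms,
    numerics, and clinical data side by side."""
    *groups, leaf = path.strip("/").split("/")
    node = root
    for name in groups:
        node = node.setdefault(name, Group())
    node[leaf] = data

root = Group(format="CCDEF-like toy", version="0.1")
put(root, "/waveforms/ecg_ii", [0.01, 0.02, 0.05])  # high-frequency signal
put(root, "/numerics/hr", [72, 74, 73])             # lower-frequency vitals
put(root, "/clinical/labs/lactate", [1.8])          # clinical measurements
```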
36. 772: PREDICTING DELAYED CEREBRAL ISCHEMIA USING SEQUENTIAL PATTERNS IN PLETHYSMOGRAPHY DATA
- Author
-
Xinyu Li, Marilyn Hravnak, Michael R. Pinsky, and Artur Dubrawski
- Subjects
medicine.medical_specialty ,business.industry ,Internal medicine ,Ischemia ,medicine ,Cardiology ,Plethysmograph ,Critical Care and Intensive Care Medicine ,medicine.disease ,business - Published
- 2020
- Full Text
- View/download PDF
37. Interactive Linear Regression with Pairwise Comparisons
- Author
-
Aarti Singh, Sivaraman Balakrishnan, Artur Dubrawski, and Yichong Xu
- Subjects
Computer science ,business.industry ,Estimator ,0102 computer and information sciences ,010501 environmental sciences ,Machine learning ,computer.software_genre ,Minimax ,01 natural sciences ,Interactive Learning ,010201 computation theory & mathematics ,Face (geometry) ,Linear regression ,Pairwise comparison ,Artificial intelligence ,business ,computer ,0105 earth and related environmental sciences - Abstract
A general goal of interactive learning is to investigate broad ways of leveraging human feedback, and understand the benefits of learning from potentially complex feedback. We study a special case of linear regression with access to comparisons between pairs of samples. Learning from such queries is motivated by several important applications, where obtaining comparisons can be much easier than direct labels, and/or when comparisons can be more reliable. We develop an interactive algorithm that utilizes both labels and comparisons to obtain a linear estimator, and show that it only requires a very small amount of direct labels to achieve low error. We also give minimax lower bounds for the problem, showing that our algorithm is optimal up to log factors. Finally, experiments show that our algorithm outperforms label-only algorithms when labels are scarce, and it can be practical for real-world applications.
- Published
- 2018
- Full Text
- View/download PDF
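As a simplified illustration of the idea above (not the paper's estimator): pairwise comparisons can totally order the samples, after which a handful of direct labels pins the scale. The function names and the rank-interpolation shortcut are our own assumptions:

```python
from functools import cmp_to_key

def fit_with_comparisons(xs, compare, labeled):
    """Order all samples using pairwise comparisons, then use a few direct
    labels to pin the scale: unlabeled points receive values interpolated
    linearly in rank between the extreme labeled points."""
    order = sorted(range(len(xs)),
                   key=cmp_to_key(lambda i, j: compare(xs[i], xs[j])))
    rank = {i: r for r, i in enumerate(order)}
    pts = [(rank[i], y) for i, y in labeled.items()]
    (r0, y0), (r1, y1) = min(pts), max(pts)
    slope = (y1 - y0) / (r1 - r0)
    return [y0 + slope * (rank[i] - r0) for i in range(len(xs))]

# Hidden target y = 2x: comparisons order the samples; two labels set scale.
xs = [3, 1, 4, 2, 5]
compare = lambda a, b: (a > b) - (a < b)
print(fit_with_comparisons(xs, compare, {1: 2.0, 4: 10.0}))  # -> [6.0, 2.0, 8.0, 4.0, 10.0]
```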
38. Active Search of Connections for Case Building and Combating Human Trafficking
- Author
-
Artur Dubrawski, David Bayani, and Reihaneh Rabbany
- Subjects
World Wide Web ,Focus (computing) ,Computer science ,Active learning (machine learning) ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Human trafficking ,Relevance (information retrieval) ,02 engineering and technology ,16. Peace & justice - Abstract
How can we help an investigator to efficiently connect the dots and uncover the network of individuals involved in a criminal activity based on the evidence of their connections, such as visiting the same address, or transacting with the same bank account? We formulate this problem as Active Search of Connections, which finds target entities that share evidence of different types with a given lead, where their relevance to the case is queried interactively from the investigator. We present RedThread, an efficient solution for inferring related and relevant nodes while incorporating the user's feedback to guide the inference. Our experiments focus on case building for combating human trafficking, where the investigator follows leads to expose organized activities, i.e. different escort advertisements that are connected and possibly orchestrated. RedThread is a local algorithm and enables online case building when mining millions of ads posted in one of the largest classified advertising websites. The results of RedThread are interpretable, as they explain how the results are connected to the initial lead. We experimentally show that RedThread learns the importance of the different types and different pieces of evidence, while the former could be transferred between cases.
- Published
- 2018
- Full Text
- View/download PDF
39. Accelerated apprenticeship
- Author
-
Lujie Chen and Artur Dubrawski
- Subjects
Computer science ,Process (engineering) ,Operational Problem ,05 social sciences ,050301 education ,Data science ,050105 experimental psychology ,Graduate level ,Scale (social sciences) ,Scalability ,ComputingMilieux_COMPUTERSANDEDUCATION ,0501 psychology and cognitive sciences ,Apprenticeship ,0503 education - Abstract
It often takes years of hands-on practice for a data scientist to build the operational problem-solving skills needed to competently tackle real-world problems. In this research, we explore a new scalable technology-enhanced learning (TEL) platform that accelerates the apprenticeship process via a repository of caselets: small but focused case studies with scaffolding questions and feedback. In this paper, we report the rationale behind the design, the caselet authoring process, and a planned experiment with cohorts of students who will use caselets while taking graduate-level data science courses.
- Published
- 2018
- Full Text
- View/download PDF
40. Automatic state discovery for unstructured audio scene classification
- Author
-
Artur Dubrawski, Sajid M. Siddiqi, Julian Ramos, Abhishek Sharma, and Geoffrey J. Gordon
- Subjects
business.industry ,Computer science ,Pattern recognition ,Overfitting ,Viterbi algorithm ,computer.software_genre ,Machine learning ,FOS: Psychology ,symbols.namesake ,Robustness (computer science) ,Expectation–maximization algorithm ,symbols ,Artificial intelligence ,170203 Knowledge Representation and Machine Learning ,Hidden Markov model ,business ,Audio signal processing ,computer - Abstract
In this paper we present a novel scheme for unstructured audio scene classification that possesses three highly desirable and powerful features: autonomy, scalability, and robustness. Our scheme is based on our recently introduced machine learning algorithm called Simultaneous Temporal And Contextual Splitting (STACS) that discovers the appropriate number of states and efficiently learns accurate Hidden Markov Model (HMM) parameters for the given data. STACS-based algorithms train HMMs up to five times faster than Baum-Welch, avoid the overfitting problem commonly encountered in learning large state-space HMMs using Expectation Maximization (EM) methods such as Baum-Welch, and achieve superior classification results on a very diverse dataset with minimal pre-processing. Furthermore, our scheme has proven to be highly effective for building real-world applications and has been integrated into a commercial surveillance system as an event detection component.
- Published
- 2018
- Full Text
- View/download PDF
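The HMM-based pipeline above ultimately assigns scene labels by decoding likely hidden states; a minimal Viterbi decoder (the standard algorithm, with toy states, observations, and parameters entirely of our own invention) looks like:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for a discrete-observation HMM."""
    # layer[s] = (probability of best path ending in s, that path)
    layer = {s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        nxt = {}
        for s in states:
            p, path = max((layer[r][0] * trans_p[r][s], layer[r][1])
                          for r in states)
            nxt[s] = (p * emit_p[s][o], path + [s])
        layer = nxt
    return max(layer.values())[1]

states = ["quiet", "event"]
start = {"quiet": 0.6, "event": 0.4}
trans = {"quiet": {"quiet": 0.7, "event": 0.3},
         "event": {"quiet": 0.4, "event": 0.6}}
emit = {"quiet": {"low": 0.9, "high": 0.1},
        "event": {"low": 0.2, "high": 0.8}}
print(viterbi(["low", "low", "high", "high"], states, start, trans, emit))
# -> ['quiet', 'quiet', 'event', 'event']
```

What STACS adds on top of this standard machinery, per the abstract, is discovering the appropriate number of states during training rather than fixing it in advance.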
41. Real alerts and artifact classification in archived multi-signal vital sign monitoring data: implications for mining big data
- Author
-
Eliezer Bose, Artur Dubrawski, Marilyn Hravnak, Michael R. Pinsky, Gilles Clermont, and Lujie Chen
- Subjects
Artifact (error) ,Data stream mining ,Computer science ,Feature extraction ,Health Informatics ,Critical Care and Intensive Care Medicine ,computer.software_genre ,Set (abstract data type) ,03 medical and health sciences ,0302 clinical medicine ,Anesthesiology and Pain Medicine ,Knowledge extraction ,Test set ,030212 general & internal medicine ,Data mining ,computer ,030217 neurology & neurosurgery ,Block (data storage) ,Test data - Abstract
Huge hospital information system databases can be mined for knowledge discovery and decision support, but artifact in stored non-invasive vital sign (VS) high-frequency data streams limits their use. We used machine-learning (ML) algorithms trained on expert-labeled VS data streams to automatically classify VS alerts as real or artifact, thereby “cleaning” such data for future modeling. 634 admissions to a step-down unit had recorded continuous noninvasive VS monitoring data [heart rate (HR), respiratory rate (RR), peripheral arterial oxygen saturation (SpO2) at 1/20 Hz, and noninvasive oscillometric blood pressure (BP)]. Periods when data crossed stability thresholds defined VS event epochs. Data were divided into Block 1, the ML training/cross-validation set, and Block 2, the test set. Expert clinicians annotated Block 1 events as perceived real or artifact. After feature extraction, ML algorithms were trained to create and validate models automatically classifying events as real or artifact. The models were then tested on Block 2. Block 1 yielded 812 VS events, with 214 (26 %) judged by experts as artifact (RR 43 %, SpO2 40 %, BP 15 %, HR 2 %). ML algorithms applied to the Block 1 training/cross-validation set (tenfold cross-validation) gave area under the curve (AUC) scores of 0.97 RR, 0.91 BP and 0.76 SpO2. Performance when applied to Block 2 test data was AUC 0.94 RR, 0.84 BP and 0.72 SpO2. ML-defined algorithms applied to archived multi-signal continuous VS monitoring data allowed accurate automated classification of VS alerts as real or artifact, and could support data mining for future model building.
- Published
- 2015
- Full Text
- View/download PDF
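The paper's feature set is not enumerated in the abstract above; as an illustrative sketch only, simple shape features over an alert epoch could be extracted like this (physiologically implausible jumps between consecutive samples often indicate artifact, e.g. probe disconnection, rather than a real patient event):

```python
def alert_features(samples):
    """Simple shape features over a vital-sign alert epoch, usable as
    inputs to a real-vs-artifact classifier."""
    diffs = [abs(b - a) for a, b in zip(samples, samples[1:])]
    return {
        "min": min(samples),
        "max": max(samples),
        "range": max(samples) - min(samples),
        "max_jump": max(diffs) if diffs else 0,
    }

# An SpO2 trace that plunges abruptly and recovers within one sample.
print(alert_features([95, 94, 40, 95])["max_jump"])  # -> 55
```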
42. Leveraging Publicly Available Data to Discern Patterns of Human-Trafficking Activity
- Author
-
Kyle Miller, Benedikt Boecking, Matt Barnes, Emily J. Kennedy, and Artur Dubrawski
- Subjects
World Wide Web ,Engineering ,Sociology and Political Science ,business.industry ,Anthropology ,Data analysis ,Transportation ,Human trafficking ,The Internet ,business ,Law ,Data science ,Demography - Abstract
We present a few data analysis methods that can be used to process advertisements for escort services available in public areas of the Internet. These data provide readily available proxy evidence for modeling and discerning human-trafficking activity. We show how they can be used to identify advertisements that likely involve such activity. We demonstrate their utility in identifying and tracking entities in Web-advertisement data even when strongly identifiable features are sparse. We also show a few possible ways to perform community- and population-level analyses, including behavioral summaries stratified by various types of activity and detection of emerging trends and patterns.
- Published
- 2015
- Full Text
- View/download PDF
43. Semi-Supervised Prediction of Comorbid Rare Conditions Using Medical Claims Data
- Author
-
Michael R. Pinsky, Gilles Clermont, Kyle Miller, Tiffany Pellathy, Marilyn Hravnak, Chirag Nagpal, and Artur Dubrawski
- Subjects
Computer science ,business.industry ,02 engineering and technology ,Semi-supervised learning ,medicine.disease ,Machine learning ,computer.software_genre ,Comorbidity ,Medical insurance ,Task (project management) ,03 medical and health sciences ,0302 clinical medicine ,030228 respiratory system ,Economic indicator ,020204 information systems ,Claims data ,Health care ,0202 electrical engineering, electronic engineering, information engineering ,medicine ,Artificial intelligence ,Medical diagnosis ,business ,computer - Abstract
Medical insurance claims data offer a coarse view of a patient's medical profile, including information about previous diagnoses and procedures performed. These data have been exploited in the past to predict the presence of unmanifested conditions. Rarer conditions, however, provide an extremely limited amount of ground truth to train supervised models, yet predicting relevant comorbidities can help reduce failure to rescue from a treatable but potentially life-threatening condition. In this paper, we take on the formidable task of improving models built to predict comorbidity of rare conditions that emerge during hospitalization, and present PreCoRC, a novel approach that leverages the hierarchical structures of diagnosis and procedure codes to alleviate the relatively low prevalence of specific types of Failure to Rescue (FTR) incidents. It can be applied post hoc over previously learnt predictive models, and used to discover the parts of the underlying hierarchies that contribute to the task. Our experimental results demonstrate that PreCoRC carries promise for operational utility in clinical settings, and offer insights into potential leading indicators of life-threatening complications.
- Published
- 2017
- Full Text
- View/download PDF
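The hierarchy-leveraging idea behind this abstract can be illustrated generically. The sketch below is not PreCoRC itself; it shows one common way to exploit the hierarchy of diagnosis codes — aggregating sparse leaf codes up to coarser ancestors by prefix truncation, as is routinely done with ICD-9 codes. The specific codes are illustrative.

```python
# Hedged sketch (not PreCoRC): roll sparse ICD-9 leaf codes up to their
# 3-digit category so that rare conditions share statistical strength
# with their ancestors in the code hierarchy.
def rollup(icd9_code: str, level: int = 3) -> str:
    """Map a dotted ICD-9 code to its ancestor at the given prefix length."""
    return icd9_code.replace(".", "")[:level]

claims = ["428.21", "428.23", "427.31", "584.9"]
categories = sorted({rollup(c) for c in claims})
print(categories)  # → ['427', '428', '584']
```

Two rare leaves such as 428.21 and 428.23 collapse to the same category (428), which is the kind of pooling that helps when leaf-level ground truth is scarce.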
44. Learning to Extract Actionable Evidence from Medical Insurance Claims Data
- Author
-
Jieshi Chen and Artur Dubrawski
- Subjects
Actuarial science ,Claims data ,Business ,Medical insurance - Published
- 2017
- Full Text
- View/download PDF
45. Learning from learning curves
- Author
-
Lujie Chen and Artur Dubrawski
- Subjects
Population level ,business.industry ,Computer science ,Online tutoring ,Standardized test ,Machine learning ,computer.software_genre ,Data-driven ,Preliminary analysis ,Learning curve ,Unsupervised learning ,Artificial intelligence ,business ,Cluster analysis ,computer - Abstract
We propose a data-driven method for decomposing population-level learning curve models into mutually exclusive, distinctive groups, each consisting of similar learning trajectories. We validate this method on six knowledge components from the log data of the ASSISTments online tutoring system. Preliminary analysis reveals interpretable patterns of "skill growth" that correlate with students' performance on subsequently administered state standardized tests.
- Published
- 2017
- Full Text
- View/download PDF
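The decomposition this abstract describes — grouping similar learning trajectories out of a population — can be sketched with ordinary k-means clustering. This is an illustrative assumption, not the paper's method; the synthetic trajectories and the choice of two clusters are made up.

```python
# Hedged sketch (not the paper's method): cluster student learning
# trajectories (success probability per practice opportunity) into groups
# with similar "skill growth" shapes.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
n_opportunities = 10
t = np.arange(n_opportunities)

# Simulate two latent groups: fast learners and slow learners
fast = 1 - 0.6 * np.exp(-0.8 * t)   # success probability rises quickly
slow = 1 - 0.6 * np.exp(-0.1 * t)   # success probability rises slowly
curves = np.vstack([fast] * 100 + [slow] * 100)
observed = curves + rng.normal(scale=0.05, size=curves.shape)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(observed)
# Each cluster centroid is an interpretable average growth pattern
for centroid in km.cluster_centers_:
    print(np.round(centroid, 2))
```

On real tutoring logs the recovered centroids, rather than these simulated ones, would be the interpretable "skill growth" patterns.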
46. Scaling Active Search using Linear Similarity Functions
- Author
-
Jeff Schneider, Artur Dubrawski, James K. Miller, and Sibi Venkatesan
- Subjects
FOS: Computer and information sciences ,0209 industrial biotechnology ,Computer science ,Feature vector ,Scale (descriptive set theory) ,Machine Learning (stat.ML) ,02 engineering and technology ,Function (mathematics) ,16. Peace & justice ,Machine Learning (cs.LG) ,Computer Science - Learning ,020901 industrial engineering & automation ,Data point ,Similarity (network science) ,Statistics - Machine Learning ,Kernel (statistics) ,Scalability ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Adjacency matrix ,Algorithm - Abstract
Active Search has become an increasingly useful tool in information retrieval problems where the goal is to discover as many target elements as possible using only limited label queries. With the advent of big data, there is a growing emphasis on the scalability of such techniques to handle very large and very complex datasets. In this paper, we consider the problem of Active Search where we are given a similarity function between data points. We look at an algorithm introduced by Wang et al. [2013] for Active Search over graphs and propose crucial modifications which allow it to scale significantly. Their approach selects points by minimizing an energy function over the graph induced by the similarity function on the data. Our modifications require the similarity function to be a dot-product between feature vectors of data points, equivalent to having a linear kernel for the adjacency matrix. With this, we are able to scale tremendously: for $n$ data points, the original algorithm runs in $O(n^2)$ time per iteration while ours runs in only $O(nr + r^2)$ given $r$-dimensional features. We also describe a simple alternate approach using a weighted-neighbor predictor which also scales well. In our experiments, we show that our method is competitive with existing semi-supervised approaches. We also briefly discuss conditions under which our algorithm performs well. (To be published as a conference paper at IJCAI 2017; 11 pages, 2 figures.)
- Published
- 2017
- Full Text
- View/download PDF
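The core scaling idea stated in the abstract — a linear-kernel adjacency $A = XX^\top$ lets you apply $A$ to a vector without ever materializing the $n \times n$ matrix — can be sketched as below. This shows only that reassociation trick, not the authors' full algorithm.

```python
# Hedged sketch of the scaling idea only (not the full Active Search
# algorithm): with a linear-kernel adjacency A = X @ X.T, the product A @ v
# costs O(nr) via X @ (X.T @ v) instead of O(n^2) via the dense matrix.
import numpy as np

rng = np.random.default_rng(2)
n, r = 1000, 10          # n data points, r-dimensional features
X = rng.normal(size=(n, r))
v = rng.normal(size=n)

# O(n^2): materialize the adjacency matrix, then multiply
dense = (X @ X.T) @ v
# O(nr): reassociate the product; the n x n matrix is never formed
fast = X @ (X.T @ v)

assert np.allclose(dense, fast)
```

The same reassociation applies to every matrix-vector product inside the energy minimization, which is where the per-iteration cost drops from $O(n^2)$ to $O(nr + r^2)$.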
47. Beyond Assortativity: Proclivity Index for Attributed Networks (ProNe)
- Author
-
Reihaneh Rabbany, Artur Dubrawski, Christos Faloutsos, and Dhivya Eswaran
- Subjects
Theoretical computer science ,Computer science ,Assortativity ,Privacy protection ,Network size ,02 engineering and technology ,Homophily ,Heterophily ,Personalization ,Correlation ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Imputation (statistics) - Abstract
If Alice is majoring in Computer Science, can we guess the major of her friend Bob? Even harder, can we determine Bob’s age or sexual orientation? Attributed graphs are ubiquitous, occurring in a wide variety of domains; yet there is limited literature on the study of the interplay between the attributes associated with nodes and the edges connecting them. Our work bridges this gap by addressing the following questions: given the network structure, (i) which attributes and (ii) which pairs of attributes show correlation? Prior work has focused on the first part, under the name of assortativity (closely related to homophily). In this paper, we propose ProNe, the first measure to handle pairs of attributes (e.g., major and age). The proposed ProNe is (a) thorough, handling both homophily and heterophily; (b) general, quantifying the correlation of a single attribute or a pair of attributes; and (c) consistent, yielding a zero score in the absence of any structural correlation. Furthermore, ProNe can be computed fast, in time linear in the network size, and is highly useful, with applications in data imputation, marketing, personalization and privacy protection.
- Published
- 2017
- Full Text
- View/download PDF
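The kind of edge-linear attribute-correlation computation this abstract describes can be illustrated with plain single-attribute homophily — the prior-work baseline, not ProNe itself. The toy graph and attribute values are made up.

```python
# Hedged sketch (plain attribute homophily, not ProNe): one pass over the
# edge list — time linear in network size — measures whether same-attribute
# endpoints co-occur more often than an attribute-random baseline.
from collections import Counter

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (4, 5)]
major = {0: "CS", 1: "CS", 2: "CS", 3: "Bio", 4: "Bio", 5: "Bio"}

same = sum(major[u] == major[v] for u, v in edges)
observed = same / len(edges)

# Expected same-attribute fraction if endpoints were attribute-random
counts = Counter(major.values())
n = len(major)
expected = sum((c / n) ** 2 for c in counts.values())

homophily = observed - expected  # > 0: homophily, < 0: heterophily, 0: none
print(round(homophily, 3))  # → 0.167
```

A zero score when edges ignore the attribute is the "consistent" property item (c) above requires; handling attribute *pairs* (e.g. major and age jointly) is what ProNe adds beyond this baseline.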
48. Data-Driven Classification of Screwdriving Operations
- Author
-
Mathieu Guillame-Bert, Zhenzhong Jia, Artur Dubrawski, Matthew T. Mason, Reuben M. Aronson, David Alan Bourne, and Ankit Bhatia
- Subjects
0209 industrial biotechnology ,Computer science ,business.industry ,Control engineering ,02 engineering and technology ,Automation ,Data-driven ,020901 industrial engineering & automation ,Robustness (computer science) ,0202 electrical engineering, electronic engineering, information engineering ,Torque ,020201 artificial intelligence & image processing ,Electronics ,business - Abstract
Consumer electronic devices are made by the millions, and automating their production is a key manufacturing challenge. Fastening machine screws is among the most difficult components of this challenge. To accomplish this task with sufficient robustness for industry, detecting and recovering from failure is essential. We have built a robotic screwdriving system to collect data on this process. Using it, we collected data on 1862 screwdriving runs, each consisting of force, torque, motor current and speed, and video. Each run is also hand-labeled with the stages of screwdriving and the result of the run. We identify several distinct stages through which the system transitions and relate sequences of stages to characteristic failure modes. In addition, we explore several techniques for automatic result classification, including standard maximum angle/torque methods and machine learning time series techniques.
- Published
- 2017
- Full Text
- View/download PDF
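The "standard maximum angle/torque" baseline this abstract compares against can be sketched as a simple peak-torque window check. The thresholds, signals, and failure labels below are illustrative assumptions, not values from the authors' dataset.

```python
# Hedged sketch of a maximum-torque baseline classifier (thresholds and
# signals are illustrative): a run succeeds if the peak fastening torque
# lands inside an acceptable seating window.
import numpy as np

def classify_run(torque: np.ndarray, lo: float = 0.8, hi: float = 1.5) -> str:
    """Label a screwdriving run from its torque time series."""
    peak = float(torque.max())
    if peak < lo:
        return "failure: never seated (low torque)"
    if peak > hi:
        return "failure: cross-thread or jam (high torque)"
    return "success"

t = np.linspace(0, 1, 200)
good_run = 1.2 * np.clip(t * 2, 0, 1)   # torque ramps up and seats at ~1.2
stripped = 0.5 * np.clip(t * 2, 0, 1)   # never reaches seating torque
print(classify_run(good_run))   # → success
print(classify_run(stripped))   # → failure: never seated (low torque)
```

The machine learning time-series techniques in the paper are meant to outperform exactly this kind of single-feature threshold rule.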
49. Gleaning Knowledge from Data in the Intensive Care Unit
- Author
-
Michael R. Pinsky and Artur Dubrawski
- Subjects
Pulmonary and Respiratory Medicine ,medicine.medical_specialty ,Information Dissemination ,business.industry ,Big data ,Hemodynamics ,Stability (learning theory) ,Shock ,Cardiorespiratory fitness ,Disease ,Models, Theoretical ,Critical Care and Intensive Care Medicine ,Intensive care unit ,law.invention ,Task (project management) ,Intensive Care Units ,Identification (information) ,law ,medicine ,Humans ,Intensive care medicine ,business ,Set (psychology) ,Monitoring, Physiologic ,Critical Care Perspective - Abstract
It is often difficult to accurately predict when, why, and which patients develop shock, because signs of shock often occur late, once organ injury is already present. Three levels of aggregation of information can be used to aid the bedside clinician in this task: analysis of derived parameters of existing measured physiologic variables using simple bedside calculations (functional hemodynamic monitoring); prior physiologic data of similar subjects during periods of stability and disease to define quantitative metrics of level of severity; and libraries of responses across large and comprehensive collections of records of diverse subjects whose diagnoses, therapies, and courses are already known, to predict not only disease severity but also the subsequent behavior of the subject if left untreated or treated with one of the many therapeutic options. The problem is in defining the minimal monitoring data set needed to initially identify those patients across all possible processes, and then specifically monitor their responses to targeted therapies known to improve outcome. To address these issues, multivariable models using machine learning data-driven classification techniques can be used to parsimoniously predict cardiorespiratory insufficiency. We briefly describe how these machine learning approaches are presently applied to enable earlier identification of cardiorespiratory insufficiency and to direct focused, patient-specific management.
- Published
- 2014
- Full Text
- View/download PDF
50. Machine learning of physiological waveforms and electronic health record data to predict, diagnose and treat haemodynamic instability in surgical patients: protocol for a retrospective study
- Author
-
Michael R. Pinsky, Kathirvel Subramaniam, Artur Dubrawski, Maxime Cannesson, Joseph Rinehart, Pierre Baldi, Christine Lee, and Ira Hofer
- Subjects
Decision Support Systems ,030204 cardiovascular system & hematology ,computer.software_genre ,California ,law.invention ,Anaesthesia ,Machine Learning ,surgery ,Postoperative Complications ,0302 clinical medicine ,030202 anesthesiology ,law ,Protocol ,Electronic Health Records ,Medicine ,blood pressure ,General Medicine ,Institutional review board ,Intensive care unit ,3. Good health ,Test (assessment) ,Intensive Care Units ,Research Design ,Public Health and Health Services ,Patient Safety ,User interface ,safety ,Clinical Sciences ,Machine learning ,Clinical decision support system ,7.3 Management and decision making ,Clinical ,03 medical and health sciences ,Clinical Research ,Intensive care ,Humans ,Retrospective Studies ,Protocol (science) ,haemodynamics ,Other Medical and Health Sciences ,business.industry ,Hemodynamics ,Retrospective cohort study ,Decision Support Systems, Clinical ,Good Health and Well Being ,physiology ,Management of diseases and conditions ,Artificial intelligence ,business ,computer - Abstract
Introduction: About 42 million surgeries are performed annually in the USA. While postoperative mortality is less than 2%, the 12% of all patients in the high-risk surgery group account for 80% of postoperative deaths. New onset of haemodynamic instability is common in surgical patients, and its delayed treatment leads to increased morbidity and mortality. The goal of this proposal is to develop, validate and test real-time intraoperative risk prediction tools based on clinical data and high-fidelity physiological waveforms to predict haemodynamic instability during surgery.
Methods and analysis: We will initiate our work using an existing annotated intraoperative database from the University of California Irvine, including clinical and high-fidelity waveform data. These data will be used for the training and development of the machine learning model (Carnegie Mellon University), which will then be tested on a prospectively collected database (University of California Los Angeles). Simultaneously, we will use existing knowledge of haemodynamic instability patterns derived from our intensive care unit cohorts, Medical Information Mart for Intensive Care II (MIMIC-II) data, University of California Irvine data and animal studies to create smart alarms and a graphical user interface for clinical decision support. Using machine learning, we will extract a core dataset which characterises the signatures of normal intraoperative variability, various haemodynamic instability aetiologies and variable responses to resuscitation. We will then employ clinician-driven iterative design to create a clinical decision support user interface, and evaluate its effect in simulated high-risk surgeries.
Ethics and dissemination: We will publish the results in a peer-reviewed publication and will present this work at professional conferences for the anaesthesiology and computer science communities. Patient-level data will be made available within 6 months after publication of the primary manuscript.
The study has been approved by the University of California, Los Angeles Institutional Review Board (IRB #19-000354).
- Published
- 2019
- Full Text
- View/download PDF