16 results on '"Song, Hyebin"'
Search Results
2. Multivariate moment least-squares variance estimators for reversible Markov chains.
- Author
-
Song, Hyebin and Berg, Stephen
- Subjects
-
*MARKOV chain Monte Carlo, *MARKOV processes, *LEAST squares, *DISTRIBUTION (Probability theory), *SAMPLE size (Statistics)
- Abstract
Markov chain Monte Carlo (MCMC) is a commonly used method for approximating expectations with respect to probability distributions. Uncertainty assessment for MCMC estimators is essential in practical applications. Moreover, for multivariate functions of a Markov chain, it is important to estimate not only the auto-correlation for each component but also to estimate cross-correlations, in order to better assess sample quality, improve estimates of effective sample size, and use more effective stopping rules. Berg and Song (2023) introduced the moment least squares (momentLS) estimator, a shape-constrained estimator for the autocovariance sequence from a reversible Markov chain, for univariate functions of the Markov chain. Based on this sequence estimator, they proposed an estimator of the asymptotic variance of the sample mean from MCMC samples. In this study, we propose novel autocovariance sequence and asymptotic variance estimators for Markov chain functions with multiple components, based on the univariate momentLS estimators from Berg and Song (2023). We demonstrate strong consistency of the proposed auto(cross)-covariance sequence and asymptotic variance matrix estimators. We conduct empirical comparisons of our method with other state-of-the-art approaches on simulated and real-data examples, using popular samplers including the random-walk Metropolis sampler and the No-U-Turn sampler from STAN. Supplemental materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. Tyrosine Kinase Inhibitor Dosing Patterns in Elderly Patients With Chronic Myeloid Leukemia
- Author
-
Seo, Hee Yeon, Ko, Tae Hwa, Hyun, Shin Young, Song, Hyebin, Lim, Seung Taek, Shim, Kwang Yong, Lee, Jong In, and Kong, Jee Hyun
- Published
- 2019
- Full Text
- View/download PDF
4. Efficient shape-constrained inference for the autocovariance sequence from a reversible Markov chain
- Author
-
Berg, Stephen and Song, Hyebin
- Subjects
Methodology (stat.ME), FOS: Computer and information sciences, 62G05, 60J05 (Primary) 60J22 (Secondary), FOS: Mathematics, Mathematics - Statistics Theory, Statistics Theory (math.ST), Statistics - Methodology
- Abstract
In this paper, we study the problem of estimating the autocovariance sequence resulting from a reversible Markov chain. A motivating application for studying this problem is the estimation of the asymptotic variance in central limit theorems for Markov chains. The asymptotic variance quantifies uncertainties in averages of the form $M^{-1}\sum_{t=0}^{M-1}g(X_t)$, where $X_0,X_1,...$ are iterates from a Markov chain. It is well known that the autocovariances from reversible Markov chains can be represented as the moments of a unique positive measure supported on $[-1,1]$. We propose a novel shape-constrained estimator of the autocovariance sequence. Our approach is based on the key observation that the representability of the autocovariance sequence as a moment sequence imposes certain shape constraints, which we can exploit in the estimation procedure. We examine the theoretical properties of the proposed estimator and provide strong consistency guarantees for our estimator. In particular, for reversible Markov chains satisfying a geometric drift condition, we show that our estimator is strongly consistent for the true autocovariance sequence with respect to an $\ell_2$ distance, and that our estimator leads to strongly consistent estimates of the asymptotic variance. Finally, we perform empirical studies to illustrate the theoretical properties of the proposed estimator as well as to demonstrate the effectiveness of our estimator in comparison with other current state-of-the-art methods for Markov chain Monte Carlo variance estimation, including batch means, spectral variance estimators, and the initial convex sequence estimator.
- Published
- 2022
5. Flexible Organic Photodetectors with Mechanically Robust Zinc Oxide Nanoparticle Thin Films.
- Author
-
Byeon, Huikyeong, Kim, Boyun, Hwang, Hyejee, Kim, Minji, Yoo, Hyeonjin, Song, Hyebin, Lee, Seoung Ho, and Lee, Byoung Hoon
- Published
- 2023
- Full Text
- View/download PDF
6. Inferring protein fitness landscapes from laboratory evolution experiments.
- Author
-
D'Costa, Sameer, Hinds, Emily C., Freschlin, Chase R., Song, Hyebin, and Romero, Philip A.
- Subjects
TETRAHYDROFOLATE dehydrogenase, AMINO acid sequence, SUPERVISED learning, PROTEIN structure, PROTEIN engineering, STATISTICAL learning, SEQUENCE spaces, INTERNET servers
- Abstract
Directed laboratory evolution applies iterative rounds of mutation and selection to explore the protein fitness landscape and provides rich information regarding the underlying relationships between protein sequence, structure, and function. Laboratory evolution data consist of protein sequences sampled from evolving populations over multiple generations and this data type does not fit into established supervised and unsupervised machine learning approaches. We develop a statistical learning framework that models the evolutionary process and can infer the protein fitness landscape from multiple snapshots along an evolutionary trajectory. We apply our modeling approach to dihydrofolate reductase (DHFR) laboratory evolution data and the resulting landscape parameters capture important aspects of DHFR structure and function. We use the resulting model to understand the structure of the fitness landscape and find numerous examples of epistasis but an overall global peak that is evolutionarily accessible from most starting sequences. Finally, we use the model to perform an in silico extrapolation of the DHFR laboratory evolution trajectory and computationally design proteins from future evolutionary rounds.
Author summary: Laboratory evolution has revolutionized our understanding of protein structure, function, and evolution, and has generated countless useful proteins with broad applications in medicine, biocatalysis, and biotechnology. These experiments explore protein sequence space through iterative rounds of mutation and selection and can provide rich data of populations traversing the fitness landscape. In this paper, we present a statistical learning framework that models the evolutionary process and can infer the structure of the underlying protein fitness landscape from multiple snapshots along a laboratory evolution trajectory. We generate a dihydrofolate reductase (DHFR) laboratory evolution data set and apply our modeling approach to infer the landscape parameters. The estimated parameters pinpoint key residues that dictate DHFR structure and function. We use the resulting model to understand the local and global structure of the fitness landscape and to perform in silico directed evolution for protein engineering. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
7. Efficient Imidazolium‐Biomolecule Interaction‐Assisted Amplified Quenching for Ultrasensitive Detection of Heparin.
- Author
-
Lee, Seung Yeob, Song, Hyebin, Lee, Sun Woo, Han, Minwoo, Choi, Haemin, and Lee, Seoung Ho
- Subjects
-
*FLUORESCENCE quenching, *PROTON-proton interactions, *HEPARIN, *EXONUCLEASES
- Abstract
Detection of heparin (HP) under physiological conditions is difficult due to the presence of biological obstructions including proteins and lipids. Thus, it is highly challenging to selectively detect HP and to increase its sensitivity in complex systems. Here, we report the detection of HP at nanomolar levels via efficient imidazolium‐HP interaction‐assisted fluorescence quenching amplification. The self‐assembled pyrenyl aggregates are devised as a conduit for efficient exciton transport, which induces amplified fluorescence quenching for HP detection. This amplified quenching is enhanced by introducing an imidazolium receptor designed to have a high affinity to HP via electrostatic and/or additional interactions with C2 protons, resulting in a very high Stern‐Volmer quenching constant of approximately 1.17×10⁸ M⁻¹. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
8. Graph-based regularization for regression problems with alignment and highly-correlated designs
- Author
-
Li, Yuan, Mark, Benjamin, Raskutti, Garvesh, Willett, Rebecca, Song, Hyebin, and Neiman, David
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Statistics::Methodology, Machine Learning (stat.ML), Machine Learning (cs.LG)
- Abstract
Sparse models for high-dimensional linear regression and machine learning have received substantial attention over the past two decades. Model selection, or determining which features or covariates are the best explanatory variables, is critical to the interpretability of a learned model. Much of the current literature assumes that covariates are only mildly correlated. However, in many modern applications covariates are highly correlated and do not exhibit key properties (such as the restricted eigenvalue condition, restricted isometry property, or other related assumptions). This work considers a high-dimensional regression setting in which a graph governs both correlations among the covariates and the similarity among regression coefficients, meaning there is alignment between the covariates and regression coefficients. Using side information about the strength of correlations among features, we form a graph with edge weights corresponding to pairwise covariances. This graph is used to define a graph total variation regularizer that promotes similar weights for correlated features. This work shows how the proposed graph-based regularization yields mean-squared error guarantees for a broad range of covariance graph structures. These guarantees are optimal for many specific covariance graphs, including block and lattice graphs. Our proposed approach outperforms other methods for highly-correlated design in a variety of experiments on synthetic data and real biochemistry data.
- Published
- 2018
9. PUlasso: High-Dimensional Variable Selection With Presence-Only Data.
- Author
-
Song, Hyebin and Raskutti, Garvesh
- Subjects
-
*CHEBYSHEV approximation, *GENERALIZATION, *ALGORITHMS
- Abstract
In various real-world problems, we are presented with classification problems with positive and unlabeled data, referred to as presence-only responses. In this article we study variable selection in the context of presence-only responses where the number of features or covariates p is large. The combination of presence-only responses and high dimensionality presents both statistical and computational challenges. In this article, we develop the PUlasso algorithm for variable selection and classification with positive and unlabeled responses. Our algorithm involves using the majorization-minimization framework, which is a generalization of the well-known expectation-maximization (EM) algorithm. In particular, to make our algorithm scalable, we provide two computational speed-ups to the standard EM algorithm. We provide a theoretical guarantee where we first show that our algorithm converges to a stationary point, and then prove that any stationary point within a local neighborhood of the true parameter achieves the minimax optimal mean-squared error under both strict sparsity and group sparsity assumptions. We also demonstrate through simulations that our algorithm outperforms state-of-the-art algorithms in the moderate p settings in terms of classification performance. Finally, we demonstrate that our PUlasso algorithm performs well on a biochemistry example. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
10. Front Cover: Efficient Imidazolium‐Biomolecule Interaction‐Assisted Amplified Quenching for Ultrasensitive Detection of Heparin (Chem. Asian J. 18/2022).
- Author
-
Lee, Seung Yeob, Song, Hyebin, Lee, Sun Woo, Han, Minwoo, Choi, Haemin, and Lee, Seoung Ho
- Subjects
-
*HEPARIN, *FLUORESCENCE quenching
- Abstract
Keywords: self-assembly; heparin; amplified quenching; imidazolium; fluorescent sensors. A micelle-based efficient sensing approach, exhibiting stronger amplified fluorescence quenching by introducing imidazolium capable of enhancing affinity for heparin (HP), is described. A micelle-based fluorescent sensor with these features detects HP more sensitively than conventional fluorescent sensors with aliphatic ammonium. The pyrene aggregates in the micelle provide an efficient sensing platform for amplified quenching, and the C2 proton of imidazolium on the micelle surface attracts HP very effectively, as if black holes exert pressure on their environment. [Extracted from the article]
- Published
- 2022
- Full Text
- View/download PDF
11. Genetic modifiers and ascertainment drive variable expressivity of complex disorders.
- Author
-
Jensen M, Smolen C, Tyryshkina A, Pizzo L, Banerjee D, Oetjens M, Shimelis H, Taylor CM, Pounraja VK, Song H, Rohan L, Huber E, El Khattabi L, van de Laar I, Tadros R, Bezzina C, van Slegtenhorst M, Kammeraad J, Prontera P, Caberg JH, Fraser H, Banka S, Van Dijck A, Schwartz C, Voorhoeve E, Callier P, Mosca-Boidron AL, Marle N, Lefebvre M, Pope K, Snell P, Boys A, Lockhart PJ, Ashfaq M, McCready E, Nowacyzk M, Castiglia L, Galesi O, Avola E, Mattina T, Fichera M, Bruccheri MG, Mandarà GML, Mari F, Privitera F, Longo I, Curró A, Renieri A, Keren B, Charles P, Cuinat S, Nizon M, Pichon O, Bénéteau C, Stoeva R, Martin-Coignard D, Blesson S, Le Caignec C, Mercier S, Vincent M, Martin C, Mannik K, Reymond A, Faivre L, Sistermans E, Kooy RF, Amor DJ, Romano C, Andrieux J, and Girirajan S
- Abstract
Variable expressivity of disease-associated variants implies a role for secondary variants that modify clinical features. We assessed the effects of modifier variants towards clinical outcomes of 2,252 individuals with primary variants. Among 132 families with the 16p12.1 deletion, distinct rare and common variant classes conferred risk for specific developmental features, including short tandem repeats for neurological defects and SNVs for microcephaly, while additional disease-associated variants conferred multiple genetic diagnoses. Within disease and population cohorts of 773 individuals with the 16p12.1 deletion, we found opposing effects of secondary variants towards clinical features across ascertainments. Additional analysis of 1,479 probands with other primary variants, such as 16p11.2 deletion and CHD8 variants, and 1,084 without primary variants, showed that phenotypic associations differed by primary variant context and were influenced by synergistic interactions between primary and secondary variants. Our study provides a paradigm to dissect the genomic architecture of complex disorders towards personalized treatment. Competing Interests: The authors declare no competing interests.
- Published
- 2024
- Full Text
- View/download PDF
12. Micelle-based fluorogenic sensing of trypsin: a sensitive method in pancreatic disease diagnosis.
- Author
-
Song H, Choi H, Kim YS, and Lee SH
- Subjects
- Humans, Spectrometry, Fluorescence, Protamines analysis, Micelles, Trypsin metabolism, Trypsin urine, Fluorescent Dyes chemistry, Fluorescent Dyes chemical synthesis, Pancreatic Diseases diagnosis
- Abstract
Protamine-mediated micellar aggregates, featuring an AIE-based fluorescent sensor, facilitate efficient detection of trypsin activity. This method enables the detection of trypsin at exceptionally low concentrations (0.01-0.1 μg mL⁻¹) in urine, demonstrating its potential for early clinical diagnosis of trypsin-related pancreatic diseases.
- Published
- 2024
- Full Text
- View/download PDF
13. Inferring Protein Sequence-Function Relationships with Large-Scale Positive-Unlabeled Learning.
- Author
-
Song H, Bremer BJ, Hinds EC, Raskutti G, and Romero PA
- Subjects
- Amino Acid Sequence, Machine Learning, Proteins
- Abstract
Machine learning can infer how protein sequence maps to function without requiring a detailed understanding of the underlying physical or biological mechanisms. It is challenging to apply existing supervised learning frameworks to large-scale experimental data generated by deep mutational scanning (DMS) and related methods. DMS data often contain high-dimensional and correlated sequence variables, experimental sampling error and bias, and the presence of missing data. Notably, most DMS data do not contain examples of negative sequences, making it challenging to directly estimate how sequence affects function. Here, we develop a positive-unlabeled (PU) learning framework to infer sequence-function relationships from large-scale DMS data. Our PU learning method displays excellent predictive performance across ten large-scale sequence-function datasets, representing proteins of different folds, functions, and library types. The estimated parameters pinpoint key residues that dictate protein structure and function. Finally, we apply our statistical sequence-function model to design highly stabilized enzymes. Competing Interests: The authors declare no competing interests. (Copyright © 2020 Elsevier Inc. All rights reserved.)
- Published
- 2021
- Full Text
- View/download PDF
14. Graph-based regularization for regression problems with alignment and highly-correlated designs.
- Author
-
Li Y, Mark B, Raskutti G, Willett R, Song H, and Neiman D
- Abstract
Sparse models for high-dimensional linear regression and machine learning have received substantial attention over the past two decades. Model selection, or determining which features or covariates are the best explanatory variables, is critical to the interpretability of a learned model. Much of the current literature assumes that covariates are only mildly correlated. However, in many modern applications covariates are highly correlated and do not exhibit key properties (such as the restricted eigenvalue condition, restricted isometry property, or other related assumptions). This work considers a high-dimensional regression setting in which a graph governs both correlations among the covariates and the similarity among regression coefficients - meaning there is alignment between the covariates and regression coefficients. Using side information about the strength of correlations among features, we form a graph with edge weights corresponding to pairwise covariances. This graph is used to define a graph total variation regularizer that promotes similar weights for correlated features. This work shows how the proposed graph-based regularization yields mean-squared error guarantees for a broad range of covariance graph structures. These guarantees are optimal for many specific covariance graphs, including block and lattice graphs. Our proposed approach outperforms other methods for highly-correlated design in a variety of experiments on synthetic data and real biochemistry data.
- Published
- 2020
- Full Text
- View/download PDF
15. The bias of isotonic regression.
- Author
-
Dai R, Song H, Barber RF, and Raskutti G
- Abstract
We study the bias of the isotonic regression estimator. While there is extensive work characterizing the mean squared error of the isotonic regression estimator, relatively little is known about the bias. In this paper, we provide a sharp characterization, proving that the bias scales as $O(n^{-\beta/3})$ up to log factors, where $1 \le \beta \le 2$ is the exponent corresponding to Hölder smoothness of the underlying mean. Importantly, this result only requires a strictly monotone mean and that the noise distribution has subexponential tails, without relying on symmetric noise or other restrictive assumptions.
- Published
- 2020
- Full Text
- View/download PDF
16. PUlasso: High-Dimensional Variable Selection With Presence-Only Data.
- Author
-
Song H and Raskutti G
- Abstract
In various real-world problems, we are presented with classification problems with positive and unlabeled data, referred to as presence-only responses. In this article we study variable selection in the context of presence-only responses where the number of features or covariates p is large. The combination of presence-only responses and high dimensionality presents both statistical and computational challenges. In this article, we develop the PUlasso algorithm for variable selection and classification with positive and unlabeled responses. Our algorithm involves using the majorization-minimization framework, which is a generalization of the well-known expectation-maximization (EM) algorithm. In particular, to make our algorithm scalable, we provide two computational speed-ups to the standard EM algorithm. We provide a theoretical guarantee where we first show that our algorithm converges to a stationary point, and then prove that any stationary point within a local neighborhood of the true parameter achieves the minimax optimal mean-squared error under both strict sparsity and group sparsity assumptions. We also demonstrate through simulations that our algorithm outperforms state-of-the-art algorithms in the moderate p settings in terms of classification performance. Finally, we demonstrate that our PUlasso algorithm performs well on a biochemistry example. Supplementary materials for this article are available online.
- Published
- 2019
- Full Text
- View/download PDF
Discovery Service for Jio Institute Digital Library