Author: "Lyu, Xin" / Topic: computer science - data structures and algorithms - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Lyu, Xin"' showing total 11 results

1. The Cost of Parallelizing Boosting

Author: Lyu, Xin, Wu, Hongxun, and Yang, Junzhao
Subjects: Computer Science - Machine Learning, Computer Science - Data Structures and Algorithms
Abstract: We study the cost of parallelizing weak-to-strong boosting algorithms for learning, following the recent work of Karbasi and Larsen. Our main results are two-fold. First, we prove a tight lower bound, showing that even "slight" parallelization of boosting requires an exponential blow-up in the complexity of training. Specifically, let $\gamma$ be the weak learner's advantage over random guessing. The famous AdaBoost algorithm produces an accurate hypothesis by interacting with the weak learner for $\tilde{O}(1 / \gamma^2)$ rounds where each round runs in polynomial time. Karbasi and Larsen showed that "significant" parallelization must incur exponential blow-up: any boosting algorithm either interacts with the weak learner for $\Omega(1 / \gamma)$ rounds or incurs an $\exp(d / \gamma)$ blow-up in the complexity of training, where $d$ is the VC dimension of the hypothesis class. We close the gap by showing that any boosting algorithm either has $\Omega(1 / \gamma^2)$ rounds of interaction or incurs a smaller exponential blow-up of $\exp(d)$. Second, complementing our lower bound, we show that there exists a boosting algorithm using $\tilde{O}(1/(t \gamma^2))$ rounds that suffers a blow-up of only $\exp(d \cdot t^2)$. Plugging in $t = \omega(1)$, this shows that the smaller blow-up in our lower bound is tight. More interestingly, this provides the first trade-off between the parallelism and the total work required for boosting.
Comment: Appeared in SODA 2024
Published: 2024

2. Hot PATE: Private Aggregation of Distributions for Diverse Task

Author: Cohen, Edith, Cohen-Wang, Benjamin, Lyu, Xin, Nelson, Jelani, Sarlos, Tamas, and Stemmer, Uri
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security, Computer Science - Data Structures and Algorithms
Abstract: The Private Aggregation of Teacher Ensembles (PATE) framework is a versatile approach to privacy-preserving machine learning. In PATE, teacher models that are not privacy-preserving are trained on distinct portions of sensitive data. Privacy-preserving knowledge transfer to a student model is then facilitated by privately aggregating teachers' predictions on new examples. Employing PATE with generative auto-regressive models presents both challenges and opportunities. These models excel in open ended \emph{diverse} (aka hot) tasks with multiple valid responses. Moreover, the knowledge of models is often encapsulated in the response distribution itself and preserving this diversity is critical for fluid and effective knowledge transfer from teachers to student. In all prior designs, higher diversity resulted in lower teacher agreement and thus -- a tradeoff between diversity and privacy. Prior works with PATE thus focused on non-diverse settings or limiting diversity to improve utility. We propose \emph{hot PATE}, a design tailored for the diverse setting. In hot PATE, each teacher model produces a response distribution that can be highly diverse. We mathematically model the notion of \emph{preserving diversity} and propose an aggregation method, \emph{coordinated ensembles}, that preserves privacy and transfers diversity with \emph{no penalty} to privacy or efficiency. We demonstrate empirically the benefits of hot PATE for in-context learning via prompts and potential to unleash more of the capabilities of generative models.
Published: 2023

3. The Target-Charging Technique for Privacy Accounting across Interactive Computations

Author: Cohen, Edith and Lyu, Xin
Subjects: Computer Science - Data Structures and Algorithms
Abstract: We propose the \emph{Target Charging Technique} (TCT), a unified privacy analysis framework for interactive settings where a sensitive dataset is accessed multiple times using differentially private algorithms. Unlike traditional composition, where privacy guarantees deteriorate quickly with the number of accesses, TCT allows computations that don't hit a specified \emph{target}, often the vast majority, to be essentially free (while incurring instead a small overhead on those that do hit their targets). TCT generalizes tools such as the sparse vector technique and top-$k$ selection from private candidates and extends their remarkable privacy enhancement benefits from noisy Lipschitz functions to general private algorithms.
Published: 2023
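
For readers unfamiliar with the sparse vector technique that the abstract above says TCT generalizes, the following is a minimal sketch of the classical AboveThreshold variant. It is not the Target Charging Technique itself. The function names, the assumption that every query has sensitivity 1, and the noise scales $2/\varepsilon$ and $4/\varepsilon$ follow the common textbook presentation and are assumptions of this example, not details taken from the paper.

```python
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def above_threshold(queries, dataset, threshold, epsilon, rng=None):
    """Return the index of the first query whose noisy answer reaches the noisy threshold.

    Assumes each query is a function of the dataset with sensitivity 1.
    Returns None if no query crosses the threshold.
    """
    rng = rng or random.Random()
    # The threshold is perturbed once, up front.
    noisy_threshold = threshold + laplace(2.0 / epsilon, rng)
    for i, query in enumerate(queries):
        # Each query answer gets fresh noise; answers below the threshold reveal nothing more.
        if query(dataset) + laplace(4.0 / epsilon, rng) >= noisy_threshold:
            return i  # halt at the first "hit"
    return None
```

In this classical mechanism only the single above-threshold answer is charged to the privacy budget, however many below-threshold queries precede it; this is the kind of "essentially free" miss that the abstract describes for general private algorithms.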

4. Improved Pseudorandom Generators for $\mathsf{AC}^0$ Circuits

Author: Lyu, Xin
Subjects: Computer Science - Computational Complexity, Computer Science - Data Structures and Algorithms
Abstract: We show a new PRG construction fooling depth-$d$, size-$m$ $\mathsf{AC}^0$ circuits within error $\varepsilon$, which has seed length $O(\log^{d-1}(m)\log(m/\varepsilon)\log\log(m))$. Our PRG improves on previous work (Trevisan and Xue 2013, Servedio and Tan 2019, Kelley 2021) in several respects. It has optimal dependence on $\frac{1}{\varepsilon}$ and is only one "$\log\log(m)$" away from the lower bound barrier. For the case of $d=2$, the seed length tightly matches the best-known PRG for CNFs (De et al. 2010, Tal 2017). There are two technical ingredients behind our new result; both of them might be of independent interest. First, we use a partitioning-based approach to construct PRGs based on restriction lemmas for $\mathsf{AC}^0$, which follows and extends the seminal work of Ajtai and Wigderson (1989). Second, improving and extending prior works (Trevisan and Xue 2013, Servedio and Tan 2019, Kelley 2021), we prove a full derandomization of the powerful multi-switching lemma for a family of DNFs (Håstad 2014).
Comment: The conference version appeared in CCC 2022
Published: 2023

5. Generalized Private Selection and Testing with High Confidence

Author: Cohen, Edith, Lyu, Xin, Nelson, Jelani, Sarlós, Tamás, and Stemmer, Uri
Subjects: Computer Science - Cryptography and Security, Computer Science - Data Structures and Algorithms
Abstract: Composition theorems are general and powerful tools that facilitate privacy accounting across multiple data accesses from per-access privacy bounds. However, they often result in weaker bounds compared with end-to-end analysis. Two popular tools that mitigate this are the exponential mechanism (or report noisy max) and the sparse vector technique. They were generalized in a couple of recent private selection/test frameworks, including the work by Liu and Talwar (STOC 2019), and Papernot and Steinke (ICLR 2022). In this work, we first present an alternative framework for private selection and testing with a simpler privacy proof and equally good utility guarantee. Second, we observe that the private selection framework (both previous ones and ours) can be applied to improve the accuracy/confidence trade-off for many fundamental privacy-preserving data-analysis tasks, including query releasing, top-$k$ selection, and stable selection. Finally, for online settings, we apply the private testing to design a mechanism for adaptive query releasing, which improves the sample complexity dependence on the confidence parameter for the celebrated private multiplicative weights algorithm of Hardt and Rothblum (FOCS 2010).
Comment: Appeared in ITCS 2023; This version: revised introduction and related works sections
Published: 2022

6. Õptimal Differentially Private Learning of Thresholds and Quasi-Concave Optimization

Author: Cohen, Edith, Lyu, Xin, Nelson, Jelani, Sarlós, Tamás, and Stemmer, Uri
Subjects: Computer Science - Machine Learning, Computer Science - Cryptography and Security, Computer Science - Data Structures and Algorithms
Abstract: The problem of learning threshold functions is a fundamental one in machine learning. Classical learning theory implies sample complexity of $O(\xi^{-1} \log(1/\beta))$ (for generalization error $\xi$ with confidence $1-\beta$). The private version of the problem, however, is more challenging and in particular, the sample complexity must depend on the size $|X|$ of the domain. Progress on quantifying this dependence, via lower and upper bounds, was made in a line of works over the past decade. In this paper, we finally close the gap for approximate-DP and provide a nearly tight upper bound of $\tilde{O}(\log^* |X|)$, which matches a lower bound by Alon et al (that applies even with improper learning) and improves over a prior upper bound of $\tilde{O}((\log^* |X|)^{1.5})$ by Kaplan et al. We also provide matching upper and lower bounds of $\tilde{\Theta}(2^{\log^*|X|})$ for the additive error of private quasi-concave optimization (a related and more general problem). Our improvement is achieved via the novel Reorder-Slice-Compute paradigm for private data analysis which we believe will have further applications.
Published: 2022

7. Time-Space Tradeoffs for Element Distinctness and Set Intersection via Pseudorandomness

Author: Lyu, Xin and Zhu, Weihao
Subjects: Computer Science - Data Structures and Algorithms, Computer Science - Computational Complexity
Abstract: In the Element Distinctness problem, one is given an array $a_1,\dots, a_n$ of integers from $[poly(n)]$ and is tasked with deciding whether $\{a_i\}$ are mutually distinct. Beame, Clifford and Machmouchi (FOCS 2013) gave a low-space algorithm for this problem running in space $S(n)$ and time $T(n)$ where $T(n) \le \widetilde{O}(n^{3/2}/S(n)^{1/2})$, assuming a random oracle (i.e., random access to polynomially many random bits). A recent breakthrough by Chen, Jin, Williams and Wu (SODA 2022) showed how to remove the random oracle assumption in the regime $S(n) = polylog(n)$ and $T(n) = \widetilde{O}(n^{3/2})$. They designed the first truly $polylog(n)$-space, $\widetilde{O}(n^{3/2})$-time algorithm by constructing a small family of hash functions $\mathcal{H} \subseteq \{h | h:[poly(n)]\to [n]\}$ with a certain pseudorandom property. In this paper, we give a significantly simplified analysis of the pseudorandom hash family by Chen et al. Our analysis clearly identifies the key pseudorandom property required to fool the BCM algorithm, allowing us to explore the full potential of this construction. As our main result, we show a time-space tradeoff for Element Distinctness without a random oracle. Namely, for every $S(n),T(n)$ such that $T\approx \widetilde{O}(n^{3/2}/S(n)^{1/2})$, our algorithm can solve the problem in space $S(n)$ and time $T(n)$. Our algorithm also works for a related problem, Set Intersection, for which this tradeoff is tight due to a matching lower bound by Dinur (Eurocrypt 2020). As two additional contributions, we show a more general pseudorandom property of the hash family, and slightly improve the seed length to sample the pseudorandom hash function.
Comment: To appear in SODA 2023. Abstract shortened to fit into the requirement of arXiv
Published: 2022

8. Composition Theorems for Interactive Differential Privacy

Author: Lyu, Xin
Subjects: Computer Science - Cryptography and Security, Computer Science - Data Structures and Algorithms, Computer Science - Information Theory
Abstract: An interactive mechanism is an algorithm that stores a data set and answers adaptively chosen queries to it. The mechanism is called differentially private if no adversary can distinguish whether a specific individual is in the data set by interacting with the mechanism. We study composition properties of differential privacy in concurrent compositions. In this setting, an adversary interacts with $k$ interactive mechanisms in parallel and can interleave its queries to the mechanisms arbitrarily. Previously, Vadhan and Wang [2021] proved an optimal concurrent composition theorem for pure differential privacy. We significantly generalize and extend their results. Namely, we prove optimal parallel composition properties for several major notions of differential privacy in the literature, including approximate DP, Rényi DP, and zero-concentrated DP. Our results demonstrate that the adversary gains no advantage by interleaving its queries to independently running mechanisms. Hence, interactivity is a feature that differential privacy grants us for free. Concurrently and independently of our work, Vadhan and Zhang [2022] proved an optimal concurrent composition theorem for f-DP [Dong et al., 2022], which implies our result for the approximate DP case.
Comment: To appear in NeurIPS 2022; Revised according to reviewers' feedback; Mentioned a concurrent and independent work
Published: 2022

9. On the Robustness of CountSketch to Adaptive Inputs

Author: Cohen, Edith, Lyu, Xin, Nelson, Jelani, Sarlós, Tamás, Shechner, Moshe, and Stemmer, Uri
Subjects: Computer Science - Data Structures and Algorithms, Computer Science - Machine Learning
Abstract: CountSketch is a popular dimensionality reduction technique that maps vectors to a lower dimension using randomized linear measurements. The sketch supports recovering $\ell_2$-heavy hitters of a vector (entries with $v[i]^2 \geq \frac{1}{k}\|\boldsymbol{v}\|^2_2$). We study the robustness of the sketch in adaptive settings where input vectors may depend on the output from prior inputs. Adaptive settings arise in processes with feedback or with adversarial attacks. We show that the classic estimator is not robust, and can be attacked with a number of queries of the order of the sketch size. We propose a robust estimator (for a slightly modified sketch) that allows for quadratic number of queries in the sketch size, which is an improvement factor of $\sqrt{k}$ (for $k$ heavy hitters) over prior work.
Published: 2022
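
As background for the abstract above, here is a minimal sketch of the classic (non-robust) CountSketch with the median estimator. It illustrates the randomized linear measurements and is not the robust estimator the paper proposes. The class name, parameter names, and the use of a seeded `random.Random` as a stand-in for the hash and sign functions are assumptions of this example; a real implementation would typically use limited-independence hash families.

```python
import random
from statistics import median


class CountSketch:
    """Classic CountSketch: `rows` independent repetitions, each with `width` buckets."""

    def __init__(self, rows, width, seed=0):
        self.rows = rows
        self.width = width
        self.seed = seed
        self.table = [[0.0] * width for _ in range(rows)]

    def _bucket_and_sign(self, row, index):
        # Stand-in for a hash function and a sign function: a generator seeded
        # by (seed, row, index) always returns the same bucket and sign.
        rng = random.Random(f"{self.seed}:{row}:{index}")
        return rng.randrange(self.width), rng.choice((-1, 1))

    def update(self, index, delta):
        """Apply v[index] += delta to the sketched vector (a linear update)."""
        for r in range(self.rows):
            bucket, sign = self._bucket_and_sign(r, index)
            self.table[r][bucket] += sign * delta

    def estimate(self, index):
        """Median over rows of the signed bucket value: the classic estimator."""
        values = []
        for r in range(self.rows):
            bucket, sign = self._bucket_and_sign(r, index)
            values.append(sign * self.table[r][bucket])
        return median(values)
```

Because the estimator is a deterministic function of the fixed hash randomness, an adaptive sequence of queries can learn about that randomness from the outputs, which is the setting the paper studies.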

10. Generalized Private Selection and Testing with High Confidence

Author: Cohen, Edith, Lyu, Xin, Nelson, Jelani, Sarlós, Tamás, and Stemmer, Uri
Subjects: FOS: Computer and information sciences, Computer Science - Cryptography and Security, differential privacy, Theory of computation → Design and analysis of algorithms, Computer Science - Data Structures and Algorithms, Data Structures and Algorithms (cs.DS), sparse vector technique, Cryptography and Security (cs.CR), adaptive data analysis
Abstract: Composition theorems are general and powerful tools that facilitate privacy accounting across multiple data accesses from per-access privacy bounds. However, they often result in weaker bounds compared with end-to-end analysis. Two popular tools that mitigate this are the exponential mechanism (or report noisy max) and the sparse vector technique. They were generalized in a couple of recent private selection/test frameworks, including the work by Liu and Talwar (STOC 2019), and Papernot and Steinke (ICLR 2022). In this work, we first present an alternative framework for private selection and testing with a simpler privacy proof and equally good utility guarantee. Second, we observe that the private selection framework (both previous ones and ours) can be applied to improve the accuracy/confidence trade-off for many fundamental privacy-preserving data-analysis tasks, including query releasing, top-$k$ selection, and stable selection. Finally, for online settings, we apply the private testing to design a mechanism for adaptive query releasing, which improves the sample complexity dependence on the confidence parameter for the celebrated private multiplicative weights algorithm of Hardt and Rothblum (FOCS 2010).
Comment: Appeared in ITCS 2023; This version: revised introduction and related works sections
Published: 2023
Full Text: View/download PDF

11. Õptimal Differentially Private Learning of Thresholds and Quasi-Concave Optimization

Author: Cohen, Edith, Lyu, Xin, Nelson, Jelani, Sarlós, Tamás, and Stemmer, Uri
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Cryptography and Security, Computer Science - Data Structures and Algorithms, Data Structures and Algorithms (cs.DS), Cryptography and Security (cs.CR), Machine Learning (cs.LG)
Abstract: The problem of learning threshold functions is a fundamental one in machine learning. Classical learning theory implies sample complexity of $O(\xi^{-1} \log(1/\beta))$ (for generalization error $\xi$ with confidence $1-\beta$). The private version of the problem, however, is more challenging and in particular, the sample complexity must depend on the size $|X|$ of the domain. Progress on quantifying this dependence, via lower and upper bounds, was made in a line of works over the past decade. In this paper, we finally close the gap for approximate-DP and provide a nearly tight upper bound of $\tilde{O}(\log^* |X|)$, which matches a lower bound by Alon et al (that applies even with improper learning) and improves over a prior upper bound of $\tilde{O}((\log^* |X|)^{1.5})$ by Kaplan et al. We also provide matching upper and lower bounds of $\tilde{\Theta}(2^{\log^*|X|})$ for the additive error of private quasi-concave optimization (a related and more general problem). Our improvement is achieved via the novel Reorder-Slice-Compute paradigm for private data analysis which we believe will have further applications.
Published: 2022
Full Text: View/download PDF
