Author: "Dan Feldman" / Publisher: mdpi ag - Searchworks@Jio Institute Digital Library Search Results

Search keyword "Dan Feldman": 8 results in total

1. Coresets for the Average Case Error for Finite Query Sets

Author: Alaa Maalouf, Ibrahim Jubran, Murad Tukan, and Dan Feldman
Subjects: coreset, average case analysis, big data, sparsification, dimensionality reduction, approximation algorithms, Chemical technology, TP1-1185
Abstract: A coreset is usually a small weighted subset of an input set of items that provably approximates their loss function for a given set of queries (models, classifiers, hypotheses). That is, the maximum (worst-case) error over all queries is bounded. To obtain smaller coresets, we suggest a natural relaxation: coresets whose average error over the given set of queries is bounded. We provide both deterministic and randomized (generic) algorithms for computing such a coreset for any finite set of queries. Unlike most corresponding coresets for the worst-case error, the size of the coreset in this work is independent of both the input size and its Vapnik–Chervonenkis (VC) dimension. The main technique is to reduce the average-case coreset problem to the vector summarization problem, where the goal is to compute a weighted subset of the n input vectors that approximates their sum. We then suggest the first algorithm for computing this weighted subset in time linear in the input size, for $n \gg 1/\varepsilon$, where $\varepsilon$ is the approximation error, improving, e.g., both [ICML’17] and applications for principal component analysis (PCA) [NIPS’16]. Experimental results also show significant and consistent improvement in practice. Open-source code is provided.
Published: 2021
Full Text: View/download PDF
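The reduction above ends in vector summarization: choose a few weighted input vectors whose weighted sum approximates the sum of all n vectors. The sketch below is not the paper's algorithm; it swaps in the classic Frank–Wolfe (greedy) method over the convex hull of the vectors, which adds at most one new vector per iteration. NumPy is assumed, and the function name `frank_wolfe_sum_coreset` and its `iters` parameter are hypothetical.

```python
import numpy as np


def frank_wolfe_sum_coreset(P, iters):
    """Hypothetical helper (Frank-Wolfe, not the paper's method).

    Returns non-negative weights w summing to 1, with at most `iters`
    non-zero entries, such that w @ P approximates the mean of the rows
    of P; len(P) * (w @ P) then approximates their sum.
    """
    P = np.asarray(P, dtype=float)
    mu = P.mean(axis=0)
    w = np.zeros(len(P))
    # Start from the input vector closest to the mean.
    i0 = int(np.argmin(np.linalg.norm(P - mu, axis=1)))
    w[i0] = 1.0
    x = P[i0].copy()  # invariant: x == w @ P
    for _ in range(iters - 1):
        grad = x - mu                 # gradient of 0.5 * ||x - mu||^2
        j = int(np.argmin(P @ grad))  # hull vertex minimizing the linearization
        d = P[j] - x
        dd = float(d @ d)
        if dd == 0.0:
            break
        # Exact line search on the quadratic, clipped to [0, 1] to stay in the hull.
        step = float(np.clip(-(grad @ d) / dd, 0.0, 1.0))
        if step == 0.0:
            break  # no descent direction left: x is optimal over the hull
        w *= 1.0 - step
        w[j] += step
        x = x + step * d
    return w
```

For an n × d array `P`, `len(P) * (w @ P)` with `w = frank_wolfe_sum_coreset(P, 50)` approximates `P.sum(axis=0)` using at most 50 input vectors. The exact line search guarantees that the error never increases between iterations, but the convergence rate of this sketch is not the bound claimed in the paper.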

2. No Fine-Tuning, No Cry: Robust SVD for Compressing Deep Networks

Author: Murad Tukan, Alaa Maalouf, Matan Weksler, and Dan Feldman
Subjects: matrix factorization, neural networks compression, robust low rank approximation, Löwner ellipsoid, Chemical technology, TP1-1185
Abstract: A common technique for compressing a neural network is to compute, via SVD, the rank-k $\ell_2$ approximation $A_k$ of the matrix $A \in \mathbb{R}^{n \times d}$ that corresponds to a fully connected (or embedding) layer. Here, d is the number of input neurons in the layer, n is the number of neurons in the next layer, and $A_k$ is stored in $O((n+d)k)$ memory instead of $O(nd)$. A fine-tuning step is then used to improve this initial compression. However, end users may not have the computational resources, time, or budget required to run this fine-tuning stage. Furthermore, the original training set may not be available. In this paper, we provide an algorithm for compressing neural networks whose initial compression time is similar to that of common techniques, but without the fine-tuning step. The main idea is to replace the rank-k $\ell_2$ approximation with an $\ell_p$ approximation, for $p \in [1, 2]$, which is known to be less sensitive to outliers but is much harder to compute. Our main technical result is a practical and provable approximation algorithm to compute it for any $p \ge 1$, based on modern techniques in computational geometry. Extensive experimental results on the GLUE benchmark for compressing the networks BERT, DistilBERT, XLNet, and RoBERTa confirm this theoretical advantage.
Published: 2021
Full Text: View/download PDF
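The baseline that the abstract starts from, storing a fully connected layer as a rank-k SVD factorization with (n + d)k numbers instead of nd, can be sketched in NumPy as follows. Only the ℓ2 (SVD) step is shown; the paper's ℓp approximation is not implemented, and the helper names are hypothetical.

```python
import numpy as np


def compress_dense_layer(A, k):
    """Hypothetical helper: rank-k l2 (SVD) factorization of an n x d weight matrix.

    Returns (L, R) with L of shape (n, k) and R of shape (k, d), so that
    A is approximated by L @ R using (n + d) * k numbers instead of n * d.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    L = U[:, :k] * s[:k]  # fold the top-k singular values into the left factor
    R = Vt[:k, :]
    return L, R


def forward_compressed(L, R, X):
    """Apply the factorized layer to inputs X of shape (batch, d).

    Computes X @ (L @ R).T as two thin products, without rebuilding A.
    """
    return (X @ R.T) @ L.T
```

By the Eckart–Young theorem, the Frobenius error of `L @ R` equals the square root of the sum of the squared singular values that were discarded, which is what makes the ℓ2 version cheap to compute.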

3. k-Means+++: Outliers-Resistant Clustering

Author: Adiel Statman, Liat Rozenberg, and Dan Feldman
Subjects: clustering, approximation, outliers, Industrial engineering. Management engineering, T55.4-60.8, Electronic computers. Computer science, QA75.5-76.95
Abstract: The k-means problem is to compute a set of k centers (points) that minimizes the sum of squared distances to a given set of n points in a metric space. Arguably, the most common algorithm to solve it is k-means++, which is easy to implement and provides a provably small approximation error in time linear in n. We generalize k-means++ to support outliers in two senses (simultaneously): (i) nonmetric spaces, e.g., M-estimators, where the distance $\mathrm{dist}(p,x)$ between a point p and a center x is replaced by $\min\{\mathrm{dist}(p,x), c\}$ for an appropriate constant c that may depend on the scale of the input; and (ii) k-means clustering with $m \ge 1$ outliers, i.e., where the m farthest points from any given k centers are excluded from the total sum of distances. This is achieved by a simple reduction to $(k+m)$-means clustering (with no outliers).
Published: 2020
Full Text: View/download PDF
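The ingredients named in the abstract, k-means++ (D²) seeding, a truncated M-estimator-style loss, and a cost that ignores the m farthest points, are sketched below with NumPy. This is an illustration under stated assumptions, not the authors' algorithm: the truncation `c` is applied to squared Euclidean distances as one example of an M-estimator, seeding k + m centers only mirrors the shape of the (k+m)-means reduction, and all function names are hypothetical.

```python
import numpy as np


def sq_dists(P, x, c=np.inf):
    """Squared Euclidean distances from each row of P to center x, truncated
    at c: min(||p - x||^2, c), one example of an M-estimator-style loss."""
    return np.minimum(((P - x) ** 2).sum(axis=1), c)


def kmeanspp_seeds(P, k, c=np.inf, seed=None):
    """k-means++ (D^2) seeding: each new center is sampled with probability
    proportional to its current (truncated) distance to the chosen centers."""
    rng = np.random.default_rng(seed)
    n = len(P)
    first = rng.integers(n)
    centers = [P[first]]
    D = sq_dists(P, P[first], c)
    for _ in range(1, k):
        total = D.sum()
        # Fall back to uniform sampling if every point is already a center.
        idx = rng.integers(n) if total == 0 else rng.choice(n, p=D / total)
        centers.append(P[idx])
        D = np.minimum(D, sq_dists(P, P[idx], c))
    return np.array(centers)


def cost_with_outliers(P, Q, m=0):
    """k-means cost of P w.r.t. centers Q, ignoring the m points that are
    farthest from their nearest center."""
    d = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=2).min(axis=1)
    return float(np.sort(d)[: max(len(d) - m, 0)].sum())
```

For example, `Q = kmeanspp_seeds(P, k + m, seed=0)` followed by `cost_with_outliers(P, Q, m)` evaluates the seeded centers while discarding the m worst-served points.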

4. Sphere Fitting with Applications to Machine Tracking

Author: Dror Epstein and Dan Feldman
Subjects: sphere fitting, coresets, sampling methodologies, geometric approximation algorithms, Industrial engineering. Management engineering, T55.4-60.8, Electronic computers. Computer science, QA75.5-76.95
Abstract: We suggest a provable and practical approximation algorithm for fitting a set P of n points in $\mathbb{R}^d$ to a sphere. Here, a sphere is represented by its center $x \in \mathbb{R}^d$ and radius $r > 0$. The goal is to minimize the sum of distances $\sum_{p \in P} \bigl|\, \|p - x\| - r \,\bigr|$ from the points to the sphere, up to a multiplicative factor of $1 \pm \varepsilon$ for a given constant $\varepsilon > 0$, over every such r and x. Our main technical result is a data summarization of the input set, called a coreset, that approximates the above sum of distances on the original (big) set P for every sphere. An accurate sphere can then be extracted quickly by running an exhaustive search, which would be inefficient on P, on the small coreset. Most articles focus mainly on sphere identification (e.g., circles in a 2D image) rather than finding the exact match (in the sense of extent measures), and do not provide approximation guarantees. We implement our algorithm and provide extensive experimental results on both synthetic and real-world data. We then integrate our algorithm into a mechanical pressure control system whose main bottleneck is tracking a falling ball. Full open-source code is also provided.
Published: 2020
Full Text: View/download PDF
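The fitting loss and the final exhaustive-search step can be sketched as follows. The sketch assumes NumPy and an unweighted point set `S` (coreset weights are omitted), uses a user-supplied list of candidate centers as a hypothetical stand-in for the paper's search space, and does not construct the coreset itself. It relies on a standard fact: for a fixed center x, a sum of |‖p − x‖ − r| terms is minimized by a median of the distances ‖p − x‖.

```python
import numpy as np


def sphere_cost(P, x, r):
    """Sum over p in P of | ||p - x|| - r |, the loss in the abstract."""
    return float(np.abs(np.linalg.norm(P - x, axis=1) - r).sum())


def best_radius(P, x):
    """For a fixed center x, a median of the distances ||p - x|| minimizes
    the sum of absolute deviations, hence the sphere loss."""
    return float(np.median(np.linalg.norm(P - x, axis=1)))


def exhaustive_fit(S, candidate_centers):
    """Hypothetical brute-force search over candidate centers on a small set S,
    pairing each center with its optimal radius. Returns (cost, x, r)."""
    best = None
    for x in candidate_centers:
        x = np.asarray(x, dtype=float)
        r = best_radius(S, x)
        cost = sphere_cost(S, x, r)
        if best is None or cost < best[0]:
            best = (cost, x, r)
    return best
```

Running `exhaustive_fit` on all of P is exactly the slow search that the coreset is meant to avoid; in the paper's setting it would run on the small summary instead.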

5. Autonomous Toy Drone via Coresets for Pose Estimation

Author: Soliman Nasser, Ibrahim Jubran, and Dan Feldman
Subjects: pose estimation, localization, indoor navigation and mapping, autonomous sensors for micro drones, coresets, caratheodory, Chemical technology, TP1-1185
Abstract: A coreset of a dataset is a small weighted set such that querying the coreset provably yields a $(1+\varepsilon)$-factor approximation to the original (full) dataset, for a given family of queries. This paper suggests accurate coresets ($\varepsilon = 0$) that are subsets of the input for fundamental optimization problems. These coresets enabled us to implement a “Guardian Angel” system that computes pose estimation at a rate of more than 20 frames per second. It tracks a toy quadcopter that guides guests in a supermarket, hospital, mall, airport, and so on. We prove that any set of n matrices in $\mathbb{R}^{d \times d}$ whose sum is a matrix S of rank r has a coreset whose sum has the same left and right singular vectors as S, and which consists of $O(dr) = O(d^2)$ matrices, independent of n. This implies the first (exact, weighted subset) coreset of $O(d^2)$ points for problems such as linear regression, PCA/SVD, and Wahba’s problem, with corresponding streaming, dynamic, and distributed versions. Our main tool is a novel usage of the Carathéodory Theorem for coresets: an algorithm that computes its set in time linear in its cardinality. Extensive experimental results on both synthetic and real data, a companion video of our system, and open code are provided.
Published: 2020
Full Text: View/download PDF
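The Carathéodory Theorem states that a point in the convex hull of n points in ℝ^d is a convex combination of at most d + 1 of them. The textbook constructive proof, sketched below with NumPy, repeatedly finds an affine dependence among the remaining points and shifts weight along it until one weight reaches zero. This is the classic version for points, not the authors' faster algorithm or their matrix result; the function name and the `tol` threshold are hypothetical.

```python
import numpy as np


def caratheodory(P, w, tol=1e-12):
    """Classic Caratheodory reduction (textbook version, not the paper's).

    P: (n, d) points; w: positive weights summing to 1.
    Returns (idx, weights) with at most d + 1 indices such that
    weights @ P[idx] equals w @ P up to round-off.
    """
    P = np.asarray(P, dtype=float)
    w = np.asarray(w, dtype=float).copy()
    d = P.shape[1]
    idx = np.flatnonzero(w > tol)
    while len(idx) > d + 1:
        Q = P[idx]
        A = (Q[1:] - Q[0]).T                 # d x (m - 1) with m - 1 > d
        u = np.linalg.svd(A)[2][-1]          # a null-space vector of A
        v = np.concatenate(([-u.sum()], u))  # sum(v) == 0 and v @ Q == 0
        pos = v > tol                        # v != 0 sums to 0, so some entry is positive
        ratios = w[idx[pos]] / v[pos]
        alpha = ratios.min()
        # Moving along v keeps the weighted sum and total weight unchanged.
        w[idx] -= alpha * v
        w[idx[pos][np.argmin(ratios)]] = 0.0  # drop one point exactly
        idx = idx[w[idx] > tol]
    return idx, w[idx]
```

The result keeps the weighted mean `w @ P` with at most d + 1 points. Flattening each d × d matrix into a vector in ℝ^{d²} would, by the same argument, preserve the mean of n matrices with at most d² + 1 of them, but this sketch does not reach the paper's O(dr) bound or its running time.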

6. Deterministic Coresets for k-Means of Big Sparse Data

Author: Artem Barger and Dan Feldman
Subjects: coreset, clustering, KMeans, big data, streaming, Industrial engineering. Management engineering, T55.4-60.8, Electronic computers. Computer science, QA75.5-76.95
Abstract: Let P be a set of n points in $\mathbb{R}^d$, let $k \ge 1$ be an integer, and let $\varepsilon \in (0,1)$ be a constant. An $\varepsilon$-coreset is a subset $C \subseteq P$ with appropriate non-negative weights (scalars) that approximates P for any given set $Q \subseteq \mathbb{R}^d$ of k centers. That is, the sum of squared distances from every point in P to its closest point in Q equals, up to a factor of $1 \pm \varepsilon$, the corresponding weighted sum for C with respect to the same k centers. If the coreset is small, we can solve problems such as k-means clustering or its variants (e.g., discrete k-means, where the centers are restricted to be in P or in other restricted zones) on the small coreset to get faster provable approximations. Moreover, it is known that such coresets support streaming, dynamic, and distributed data using the classic merge-reduce trees. The fact that the coreset is a subset implies that it preserves the sparsity of the data. However, existing coresets of this kind are randomized, and their size depends at least linearly on the dimension d. We suggest the first such coreset whose size is independent of d. This is also the first deterministic coreset construction whose resulting size is not exponential in d. Extensive experimental results and benchmarks are provided on public datasets, including the first coreset of the English Wikipedia using Amazon’s cloud.
Published: 2020
Full Text: View/download PDF
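The ε-coreset condition in the abstract can be made concrete with a small check. The NumPy sketch below computes the (weighted) k-means cost and tests the 1 ± ε condition for a single set of centers Q. It is only a sanity check: a real coreset must satisfy the condition for every Q, and the sketch does not construct the paper's deterministic coreset. Function names are hypothetical.

```python
import numpy as np


def kmeans_cost(P, Q, weights=None):
    """(Weighted) sum of squared distances from each row of P to its nearest center in Q."""
    d = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=2).min(axis=1)
    return float(d.sum() if weights is None else weights @ d)


def is_eps_coreset_for(P, C, w, Q, eps):
    """Check the epsilon-coreset condition for ONE query Q:
    (1 - eps) * cost(P, Q) <= cost_w(C, Q) <= (1 + eps) * cost(P, Q)."""
    full = kmeans_cost(P, Q)
    approx = kmeans_cost(C, Q, w)
    return (1 - eps) * full <= approx <= (1 + eps) * full
```

Duplicated points give the simplest exact example: if every point of P appears twice, keeping one copy of each with weight 2 reproduces the cost for every Q.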

7. k-Means+++: Outliers-Resistant Clustering

Author: Dan Feldman, Liat Rozenberg, and Adiel Statman
Subjects: 0209 industrial biotechnology, lcsh:T55.4-60.8, Scale (descriptive set theory), 02 engineering and technology, numerical_analysis_optimization, lcsh:QA75.5-76.95, Theoretical Computer Science, Combinatorics, Reduction (complexity), 020901 industrial engineering & automation, Approximation error, 0202 electrical engineering, electronic engineering, information engineering, lcsh:Industrial engineering. Management engineering, Cluster analysis, approximation, Mathematics, Numerical Analysis, outliers, k-means clustering, algebra_number_theory, Computational Mathematics, Metric space, Computational Theory and Mathematics, Outlier, 020201 artificial intelligence & image processing, lcsh:Electronic computers. Computer science, Constant (mathematics), clustering
Abstract: The k-means problem is to compute a set of k centers (points) that minimizes the sum of squared distances to a given set of n points in a metric space. Arguably, the most common algorithm to solve it is k-means++, which is easy to implement and provides a provably small approximation error in time linear in n. We generalize k-means++ to support outliers in two senses (simultaneously): (i) nonmetric spaces, e.g., M-estimators, where the distance $\mathrm{dist}(p,x)$ between a point p and a center x is replaced by $\min\{\mathrm{dist}(p,x), c\}$ for an appropriate constant c that may depend on the scale of the input; and (ii) k-means clustering with $m \ge 1$ outliers, i.e., where the m farthest points from any given k centers are excluded from the total sum of distances. This is achieved by a simple reduction to $(k+m)$-means clustering (with no outliers).
Published: 2020

8. Finding Patterns in Signals Using Lossy Text Compression

Author: Sagi Lotan, Liat Rozenberg, and Dan Feldman
Subjects: Reverse engineering, Computer science, 02 engineering and technology, Lossy compression, computer.software_genre, run-length, Theoretical Computer Science, Encoding (memory), 0202 electrical engineering, electronic engineering, information engineering, Code (cryptography), Pattern matching, data compression, robotics, Lossless compression, Numerical Analysis, signals, 020206 networking & telecommunications, Computational Mathematics, Computational Theory and Mathematics, periods, ComputerSystemsOrganization_MISCELLANEOUS, 020201 artificial intelligence & image processing, RRLE, Communications protocol, computer, Algorithm, Data compression
Abstract: Whether the source is an autonomous car, a robotic vacuum cleaner, or a quadcopter, signals from sensors tend to have hidden patterns that repeat themselves. For example, typical GPS traces from a smartphone contain periodic trajectories such as “home, work, home, work, ⋯”. Our goal in this study was to automatically reverse engineer such signals, identify their periodicity, and then use it to compress and de-noise these signals. To do so, we present a novel method of using algorithms from the fields of pattern matching and text compression to represent the “language” in such signals. Common text compression algorithms are less suited to handling such strings. Moreover, they are lossless and cannot be used to recover noisy signals. To this end, we define the recursive run-length encoding (RRLE) method, which is a generalization of the well-known run-length encoding (RLE) method. We then suggest lossy and lossless algorithms to compress and de-noise such signals. Unlike previous results, running time and optimality guarantees are proved for each algorithm. Experimental results on synthetic and real data sets are provided. We demonstrate our system by showing how it can be used to turn commercial micro air-vehicles into autonomous robots. This is done by reverse engineering their unpublished communication protocols and using a laptop or on-board micro-computer to control them. Our open-source code may be useful both for the community of millions of toy-robot users and for researchers who may extend it to further protocols.
Published: 2019
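Classic run-length encoding, which RRLE generalizes, and a simple way to detect the period of a repeating token sequence are sketched below in plain Python. Neither is the paper's RRLE algorithm: period detection here uses the standard KMP prefix function and only recognizes exact, noise-free repetitions, and the function names are hypothetical.

```python
def rle_encode(seq):
    """Classic run-length encoding: consecutive repeats become (symbol, count) pairs."""
    runs = []
    for s in seq:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    return [(s, n) for s, n in runs]


def rle_decode(runs):
    """Inverse of rle_encode: expand each (symbol, count) pair."""
    return [s for s, n in runs for _ in range(n)]


def smallest_period(seq):
    """Length of the shortest block whose exact repetition generates seq,
    computed with the KMP prefix function; len(seq) if no such block exists."""
    n = len(seq)
    if n == 0:
        return 0
    pi = [0] * n  # pi[i]: length of the longest proper border of seq[:i + 1]
    for i in range(1, n):
        k = pi[i - 1]
        while k and seq[i] != seq[k]:
            k = pi[k - 1]
        if seq[i] == seq[k]:
            k += 1
        pi[i] = k
    p = n - pi[-1]
    return p if n % p == 0 else n
```

For instance, `smallest_period(["home", "work"] * 4)` returns 2, while `rle_encode("aaabcc")` returns `[("a", 3), ("b", 1), ("c", 2)]`.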
