Author: "Miles, Roy" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Miles, Roy"' showing total 16 results


1. VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections

Author: Miles, Roy, Reddy, Pradyumna, Elezi, Ismail, and Deng, Jiankang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Large language models (LLMs) have recently emerged as powerful tools for tackling many language-processing tasks. Despite their success, training and fine-tuning these models is still far too computationally and memory intensive. In this paper, we identify and characterise the important components needed for effective model convergence using gradient descent. In doing so we find that the intermediate activations used to implement backpropagation can be excessively compressed without incurring any degradation in performance. This result leads us to a cheap and memory-efficient algorithm for both fine-tuning and pre-training LLMs. The proposed algorithm simply divides the tokens up into smaller sub-tokens before projecting them onto a fixed 1-dimensional subspace during the forward pass. These features are then coarsely reconstructed during the backward pass to implement the update rules. We confirm the effectiveness of our algorithm as being complementary to many state-of-the-art PEFT methods on the VTAB-1k fine-tuning benchmark. Furthermore, we outperform QLoRA for fine-tuning LLaMA and show competitive performance against other memory-efficient pre-training methods on the large-scale C4 dataset.
Comment: NeurIPS 2024. Code available at https://github.com/roymiles/VeLoRA
Published: 2024

2. Learning to Project for Cross-Task Knowledge Distillation

Author: Auty, Dylan, Miles, Roy, Kolbeinsson, Benedikt, and Mikolajczyk, Krystian
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Traditional knowledge distillation (KD) relies on a proficient teacher trained on the target task, which is not always available. In this setting, cross-task distillation can be used, enabling the use of any teacher model trained on a different task. However, many KD methods prove ineffective when applied to this cross-task setting. To address this limitation, we propose a simple modification: the use of an inverted projection. We show that this drop-in replacement for a standard projector is effective by learning to disregard any task-specific features which might degrade the student's performance. We find that this simple modification is sufficient for extending many KD methods to the cross-task setting, where the teacher and student tasks can be very different. In doing so, we obtain up to a 1.9% improvement in the cross-task setting compared to the traditional projection, at no additional cost. Our method can obtain significant performance improvements (up to 7%) when using even a randomly-initialised teacher on various tasks such as depth estimation, image translation, and semantic segmentation, despite the lack of any learned knowledge to transfer. To provide conceptual and analytical insights into this result, we show that using an inverted projection allows the distillation loss to be decomposed into a knowledge transfer and a spectral regularisation component. Through this analysis we are additionally able to propose a novel regularisation loss that allows teacher-free distillation, enabling performance improvements of up to 8.57% on ImageNet with no additional training costs.
Published: 2024

3. $V_kD:$ Improving Knowledge Distillation using Orthogonal Projections

Author: Miles, Roy, Elezi, Ismail, and Deng, Jiankang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Knowledge distillation is an effective method for training small and efficient deep learning models. However, the efficacy of a single method can degenerate when transferring to other tasks, modalities, or even other architectures. To address this limitation, we propose a novel constrained feature distillation method. This method is derived from a small set of core principles, which results in two emerging components: an orthogonal projection and a task-specific normalisation. Equipped with both of these components, our transformer models can outperform all previous methods on ImageNet and reach up to a 4.4% relative improvement over the previous state-of-the-art methods. To further demonstrate the generality of our method, we apply it to object detection and image generation, whereby we obtain consistent and substantial performance improvements over state-of-the-art. Code and models are publicly available: https://github.com/roymiles/vkd
Comment: CVPR 2024. Code available at https://github.com/roymiles/vkd
Published: 2024

4. Understanding the Role of the Projector in Knowledge Distillation

Author: Miles, Roy and Mikolajczyk, Krystian
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: In this paper we revisit the efficacy of knowledge distillation as a function matching and metric learning problem. In doing so we verify three important design decisions, namely the normalisation, soft maximum function, and projection layers as key ingredients. We theoretically show that the projector implicitly encodes information on past examples, enabling relational gradients for the student. We then show that the normalisation of representations is tightly coupled with the training dynamics of this projector, which can have a large impact on the student's performance. Finally, we show that a simple soft maximum function can be used to address any significant capacity gap problems. Experimental results on various benchmark datasets demonstrate that using these insights can lead to superior or comparable performance to state-of-the-art knowledge distillation techniques, despite being much more computationally efficient. In particular, we obtain these results across image classification (CIFAR100 and ImageNet), object detection (COCO2017), and on more difficult distillation objectives, such as training data efficient transformers, whereby we attain a 77.2% top-1 accuracy with DeiT-Ti on ImageNet. Code and models are publicly available.
Comment: AAAI 2024. Code available at https://github.com/roymiles/Simple-Recipe-Distillation
Published: 2023

5. MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation

Author: Miles, Roy, Yucel, Mehmet Kerim, Manganelli, Bruno, and Saa-Garriga, Albert
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper tackles the problem of semi-supervised video object segmentation on resource-constrained devices, such as mobile phones. We formulate this problem as a distillation task, whereby we demonstrate that small space-time-memory networks with finite memory can achieve competitive results with state of the art, but at a fraction of the computational cost (32 milliseconds per frame on a Samsung Galaxy S22). Specifically, we provide a theoretically grounded framework that unifies knowledge distillation with supervised contrastive representation learning. These models are able to jointly benefit from both pixel-wise contrastive learning and distillation from a pre-trained teacher. We validate this loss by achieving competitive J&F to state of the art on both the standard DAVIS and YouTube benchmarks, despite running up to 5x faster, and with 32x fewer parameters.
Comment: CVPR 2023
Published: 2023

6. Information Theoretic Representation Distillation

Author: Miles, Roy, Rodriguez, Adrian Lopez, and Mikolajczyk, Krystian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Despite the empirical success of knowledge distillation, current state-of-the-art methods are computationally expensive to train, which makes them difficult to adopt in practice. To address this problem, we introduce two distinct complementary losses inspired by a cheap entropy-like estimator. These losses aim to maximise the correlation and mutual information between the student and teacher representations. Our method incurs significantly less training overheads than other approaches and achieves competitive performance to the state-of-the-art on the knowledge distillation and cross-model transfer tasks. We further demonstrate the effectiveness of our method on a binary distillation task, whereby it leads to a new state-of-the-art for binary quantisation and approaches the performance of a full precision model. Code: www.github.com/roymiles/ITRD
Comment: BMVC 2022
Published: 2021

7. Reconstructing Pruned Filters using Cheap Spatial Transformations

Author: Miles, Roy and Mikolajczyk, Krystian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present an efficient alternative to the convolutional layer using cheap spatial transformations. This construction exploits an inherent spatial redundancy of the learned convolutional filters to enable a much greater parameter efficiency, while maintaining the top-end accuracy of their dense counterparts. Training these networks is modelled as a generalised pruning problem, whereby the pruned filters are replaced with cheap transformations from the set of non-pruned filters. We provide an efficient implementation of the proposed layer, followed by two natural extensions to avoid excessive feature compression and to improve the expressivity of the transformed features. We show that these networks can achieve comparable or improved performance to state-of-the-art pruning models across both the CIFAR-10 and ImageNet-1K datasets.
Comment: ICCV 2023 Workshop on Resource Efficient Deep Learning for Computer Vision
Published: 2021

8. Cascaded channel pruning using hierarchical self-distillation

Author: Miles, Roy and Mikolajczyk, Krystian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we propose an approach for filter-level pruning with hierarchical knowledge distillation based on the teacher, teaching-assistant, and student framework. Our method makes use of teaching assistants at intermediate pruning levels that share the same architecture and weights as the target student. We propose to prune each model independently using the gradient information from its corresponding teacher. By considering the relative sizes of each student-teacher pair, this formulation provides a natural trade-off between the capacity gap for knowledge distillation and the bias of the filter saliency updates. Our results show improvements in the attainable accuracy and model compression across the CIFAR10 and ImageNet classification tasks using the VGG16 and ResNet50 architectures. We provide an extensive evaluation that demonstrates the benefits of using a varying number of teaching assistant models at different sizes.
Comment: BMVC 2020
Published: 2020

9. Compression of descriptor models for mobile applications

Author: Miles, Roy and Mikolajczyk, Krystian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep neural networks have demonstrated state-of-the-art performance for feature-based image matching through the advent of new large and diverse datasets. However, there has been little work on evaluating the computational cost, model size, and matching accuracy tradeoffs for these models. This paper explicitly addresses these practical metrics by considering the state-of-the-art HardNet model. We observe a significant redundancy in the learned weights, which we exploit through the use of depthwise separable layers and an efficient Tucker decomposition. We demonstrate that a combination of these methods is very effective, but still sacrifices the top-end accuracy. To resolve this, we propose the Convolution-Depthwise-Pointwise (CDP) layer, which provides a means of interpolating between the standard and depthwise separable convolutions. With this proposed layer, we can achieve an 8 times reduction in the number of parameters on the HardNet model, 13 times reduction in the computational complexity, while sacrificing less than 1% on the overall accuracy across the HPatches benchmarks. To further demonstrate the generalisation of this approach, we apply it to the state-of-the-art SuperPoint model, where we can significantly reduce the number of parameters and floating-point operations, with minimal degradation in the matching accuracy.
Comment: ICASSP 2021
Published: 2020

10. Reconstructing Pruned Filters using Cheap Spatial Transformations

Author: Miles, Roy, primary and Mikolajczyk, Krystian, additional
Published: 2023

11. MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation

Author: Miles, Roy, primary, Yucel, Mehmet Kerim, additional, Manganelli, Bruno, additional, and Saà-Garriga, Albert, additional
Published: 2023

12. A closer look at the training dynamics of knowledge distillation

Author: Miles, Roy and Mikolajczyk, Krystian
Subjects: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper we revisit the efficacy of knowledge distillation as a function matching and metric learning problem. In doing so we verify three important design decisions, namely the normalisation, soft maximum function, and projection layers as key ingredients. We theoretically show that the projector implicitly encodes information on past examples, enabling relational gradients for the student. We then show that the normalisation of representations is tightly coupled with the training dynamics of this projector, which can have a large impact on the student's performance. Finally, we show that a simple soft maximum function can be used to address any significant capacity gap problems. Experimental results on various benchmark datasets demonstrate that using these insights can lead to superior or comparable performance to state-of-the-art knowledge distillation techniques, despite being much more computationally efficient. In particular, we obtain these results across image classification (CIFAR100 and ImageNet), object detection (COCO2017), and on more difficult distillation objectives, such as training data efficient transformers, whereby we attain a 77.2% top-1 accuracy with DeiT-Ti on ImageNet.
Published: 2023

13. Network compression and faster inference using spatial basis filters

Author: Miles, Roy and Mikolajczyk, Krystian
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: We present an efficient alternative to the convolutional layer through utilising spatial basis filters (SBF). SBF layers exploit the spatial redundancy in the convolutional filters across the depth to achieve overall model compression, while maintaining the top-end accuracy of their dense counterparts. Training SBF-Nets is modelled as a simple pruning problem, but instead of zeroing out the pruned channels, they are replaced with inexpensive transformations from the set of non-pruned features. To enable an adoption of these SBF layers, we provide a flexible training pipeline and an efficient implementation in CUDA with low latency. To further demonstrate the effective capacity of these models, we apply semi-supervised knowledge distillation that leads to significant performance improvements over the baseline networks. Our experiments show that SBF-Nets are effective and achieve comparable or improved performance to state-of-the-art across CIFAR10, CIFAR100, Tiny-ImageNet, and ILSVRC-2012.
Published: 2021

14. Compressing Local Descriptor Models for Mobile Applications

Author: Miles, Roy, primary and Mikolajczyk, Krystian, additional
Published: 2021

15. The effect of behavioral and non-behavioral objectives on achievement in introductory college geology

Author: Miles, Roy Gene and Community College
Subjects: LD5655.V856 1976.M54, education
Abstract: The primary purpose of this study was to test the relative effectiveness of behavioral and non-behavioral objectives. The behavioral objectives attempted to specify to the student what was to be learned and how such learning was to be demonstrated. The non-behavioral form, called outline objectives, consisted of listings of terms and concepts in hierarchical groups. A second purpose was to assess the attitudes and preferences of the subjects with respect to types of objectives. Sixty-two students in four introductory geology classes were used as subjects. Two intact classes were randomly assigned to an experimental group (32 students) and two to a comparison group (30 students). Multivariate Analysis of seven demographic variables (age, high school rank, geology pretest, I.Q. score, vocabulary, comprehension, and reading rate) was used to establish the equivalence of both groups. Throughout the first quarter the experimental group received weekly units of behavioral objectives whereas the comparison group received weekly units of outline objectives. Subjects were tested at approximately two-week intervals and were given a comprehensive final exam. Analysis of Variance showed that the experimental group achieved significantly higher on two tests. Differences on all other tests were in favor of the experimental group but were below the 0.05 level of significance. Multivariate Analysis indicated that the overall achievement of the experimental group was significantly higher than that of the comparison group. During the second quarter, all students were exposed to both types of objectives. An attitude scale administered to assess student preferences revealed almost unanimous support for the use of objectives. Given a choice of behavioral or outline objectives, a majority of subjects indicated a preference for the outline form and appeared to view the outline form as significantly more useful. 
It was concluded that the use of behavioral objectives was supported by this study but that students seemed to prefer the outline form. Further research should focus on the development of more effective variations of objectives and on case studies to determine which individuals make most efficient use of each type of objective. Ed. D.
Published: 1976

16. Beginners' Russian.

Author: MILES, ROY
Subjects: RUSSIAN art, NONFICTION
Abstract: The article reviews the book "Contemporary Russian Art," by Matthew Cullerne Bown.
Published: 1989
