Author: "Ahmadian, Mona" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Ahmadian, Mona"' showing total 10 results


1. FILS: Self-Supervised Video Feature Prediction In Semantic Language Space

Author: Ahmadian, Mona, Guerin, Frank, and Gilbert, Andrew
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: This paper demonstrates a self-supervised approach for learning semantic video representations. Recent vision studies show that masking strategies and natural language supervision have both contributed to developing transferable visual pretraining. Our goal is to achieve a more semantic video representation by leveraging the text related to the video content during pretraining in a fully self-supervised manner. To this end, we present FILS, a novel self-supervised method for video Feature prediction In semantic Language Space. The vision model can capture valuable structured information by correctly predicting masked feature semantics in language space. It is learned using a patch-wise video-text contrastive strategy, in which the text representations act as prototypes for transforming vision features into a language space; the transformed features are then used as targets for semantically meaningful feature prediction using our masked encoder-decoder structure. FILS demonstrates remarkable transferability on downstream action recognition tasks, achieving state-of-the-art results on challenging egocentric datasets, like Epic-Kitchens, Something-SomethingV2, Charades-Ego, and EGTEA, using ViT-Base. Our efficient method requires less computation and smaller batches compared to previous works.
Published: 2024

2. MOFO: MOtion FOcused Self-Supervision for Video Understanding

Author: Ahmadian, Mona, Guerin, Frank, and Gilbert, Andrew
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Self-supervised learning (SSL) techniques have recently produced outstanding results in learning visual representations from unlabeled videos. Despite the importance of motion in supervised learning techniques for action recognition, SSL methods often do not explicitly consider motion information in videos. To address this issue, we propose MOFO (MOtion FOcused), a novel SSL method for focusing representation learning on the motion area of a video, for action recognition. MOFO automatically detects motion areas in videos and uses these to guide the self-supervision task. We use a masked autoencoder which randomly masks out a high proportion of the input sequence; we force a specified percentage of the inside of the motion area to be masked and the remainder from outside. We further incorporate motion information into the finetuning step to emphasise motion in the downstream task. We demonstrate that our motion-focused innovations can significantly boost the performance of the currently leading SSL method (VideoMAE) for action recognition. Our method improves the recent self-supervised Vision Transformer (ViT), VideoMAE, by achieving +2.6%, +2.1%, +1.3% accuracy on Epic-Kitchens verb, noun and action classification, respectively, and +4.7% accuracy on Something-Something V2 action classification. Our proposed approach significantly improves the performance of the current SSL method for action recognition, indicating the importance of explicitly encoding motion in SSL.
Comment: Accepted at the NeurIPS 2023 Workshop: Self-Supervised Learning - Theory and Practice
Published: 2023
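
Note: The masking step described in the MOFO abstract above can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `motion_focused_mask`, its parameters, and the default ratios are hypothetical, and it assumes that "a specified percentage of the inside of the motion area" means a fraction of the motion-area patches, with the rest of the overall mask budget drawn from patches outside that area.

```python
import random


def motion_focused_mask(num_patches, motion_patches, mask_ratio=0.9,
                        inside_ratio=0.75, seed=None):
    """Pick patch indices to mask, biased toward a motion area.

    Hypothetical illustration only: the names and default ratios are
    assumptions, not values taken from the MOFO paper.
    """
    rng = random.Random(seed)
    motion = sorted(set(motion_patches))
    motion_set = set(motion)
    outside = [i for i in range(num_patches) if i not in motion_set]

    # Overall mask budget (a high proportion of all patches).
    total_masked = round(num_patches * mask_ratio)
    # Mask the requested fraction of the motion area, capped by the budget.
    n_inside = min(round(len(motion) * inside_ratio), total_masked)
    # Draw the remainder from outside the motion area, capped by availability.
    n_outside = min(total_masked - n_inside, len(outside))

    masked = rng.sample(motion, n_inside) + rng.sample(outside, n_outside)
    return sorted(masked)


# Example: 100 patches, with patches 20..39 flagged as the motion area.
masked = motion_focused_mask(100, range(20, 40), seed=0)
```

With these settings the call masks 90 patches in total, 15 of them inside the motion area. How the motion area is detected is outside the scope of this sketch.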

3. Heterogeneous Graph Learning for Acoustic Event Classification

Author: Shirian, Amir, Ahmadian, Mona, Somandepalli, Krishna, and Guha, Tanaya
Subjects: Computer Science - Sound, Computer Science - Machine Learning, Computer Science - Multimedia, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Heterogeneous graphs provide a compact, efficient, and scalable way to model data involving multiple disparate modalities. This makes modeling audiovisual data using heterogeneous graphs an attractive option. However, graph structure does not appear naturally in audiovisual data. Graphs for audiovisual data are constructed manually which is both difficult and sub-optimal. In this work, we address this problem by (i) proposing a parametric graph construction strategy for the intra-modal edges, and (ii) learning the crossmodal edges. To this end, we develop a new model, heterogeneous graph crossmodal network (HGCN) that learns the crossmodal edges. Our proposed model can adapt to various spatial and temporal scales owing to its parametric construction, while the learnable crossmodal edges effectively connect the relevant nodes across modalities. Experiments on a large benchmark dataset (AudioSet) show that our model is state-of-the-art (0.53 mean average precision), outperforming transformer-based models and other graph-based models.
Comment: arXiv admin note: text overlap with arXiv:2207.07935
Published: 2023

4. Heterogeneous Graph Learning for Acoustic Event Classification

Author: Shirian, Amir, primary, Ahmadian, Mona, additional, Somandepalli, Krishna, additional, and Guha, Tanaya, additional
Published: 2023

5. Future Image Prediction of Plantar Pressure During Gait Using Spatio-temporal Transformer

Author: Ahmadian, Mona, primary, Rahmani-Boldaji, Sadegh, additional, and Shirian, Amir, additional
Published: 2022

6. Unsupervised Generative Adversarial Network for Plantar Pressure Image-to-Image Translation

Author: Ahmadian, Mona, primary, Beheshti, Mohammad TH., additional, Kalhor, Ahmad, additional, and Shirian, Amir, additional
Published: 2021

7. The Differential Impact of Linguistic Experience on the Discrimination and Categorization of Non-Native Sounds in Foreign Language Learners

Author: Nemati, Fatemeh, Ahmadian, Mona, and Abbasi, Abbas
Abstract: Speech perception has been extensively shown to be modulated by exposure to native language. As the perception of non-native sounds is predicted to be influenced by the phonological experiences of learners, it is worthwhile to study the perception of non-native speech sounds by learners who share the same language learning ecology but possess different linguistic repertoires. This study focuses on the perception of English dental fricatives, [θ] and [ð], by Persian and Arabic-Persian EFL learners, with the former lacking these sounds in their L1 and the latter having them in their L1 (Arabic) phonological system. To examine the perception of these sounds by both groups and their perceptual substitutes, 90 Iranian EFL learners – 32 Arabic-Persian bilinguals and 58 Persian monolinguals – completed a discrimination and an identification task. Although the results indicated a significant difference only in the identification of [θ], the trend showed that Arabic-Persian learners were more successful in the two tasks, presumably due to activating perceptual routines from their L1. The dominant substitution made in the two tasks by both groups reveals the prominence of acoustic features rather than articulatory similarity in the perception of the dental fricatives.
Keywords: English, Persian, Arabic, dental fricatives, non-native speech perception, discrimination, identification, perceptual substitution, monolinguals, bilinguals
Source: Grazer Linguistische Studien 92 (Autumn 2020), pp. 21-43
Published: 2020

8. Grazer Linguistische Studien / The Differential Impact of Linguistic Experience on the Discrimination and Categorization of Non-Native Sounds in Foreign Language Learners

Author: Nemati, Fatemeh, Ahmadian, Mona, and Abbasi, Abbas
Published: 2020

9. Future Image Prediction of Plantar Pressure During Gait Using Spatio-temporal Transformer.

Author: Ahmadian M, Rahmani-Boldaji S, and Shirian A
Subjects: Electric Power Supplies, Gait, Humans, Quality of Life, Exoskeleton Device, Robotics
Abstract: Gait is one of the most frequently used forms of human movement during daily activities. The majority of works focus on exploring the dynamic factors during gait. Different from previous works, we adapt an image prediction task for anticipating the next frame in the process of gait. In this work, we present a novel framework for human gait plantar pressure prediction using a Spatio-temporal Transformer. We train the model to predict the next plantar pressure image in an image series while also learning frame feature encoders that predict the features of subsequent frames in the sequence. We propose two new components in our loss function to account for temporality as well as smaller values in the image. Our model achieves superior results over several competitive baselines on the CAD WALK database. Clinical Relevance: This work can be used in robotic exoskeleton devices, intelligent systems designed to improve the wearer's gait performance and quality of life, which are being used to assist the recovery of walking ability in patients with disorders.
Published: 2022

10. Unsupervised Generative Adversarial Network for Plantar Pressure Image-to-Image Translation.

Author: Ahmadian M, Beheshti MT, Kalhor A, and Shirian A
Subjects: Humans, Image Processing, Computer-Assisted
Abstract: Analyzing human gait from plantar pressure is critical for human health. The majority of works focus on classifying healthy plantar patterns from unhealthy ones. Different from previous works, we adopt a generative adversarial network to produce a healthy plantar pressure image for individual patients. In this work, we do not have pairs of images for training, so we cast the problem as an unsupervised generative adversarial learning task. Our network benefits from multiple components: an encoder-decoder generator, a convolution-based discriminator, a convolution-based evaluation network, and a new term in the loss function to preserve the person's gait style. Our method achieves high performance (99.8%) on the CAD WALK databases which have patients with hallux valgus disease.
Published: 2021
