Author: "Farajtabar, Mehrdad" / Topic: machine learning (cs.lg) - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Farajtabar, Mehrdad"' showing total 22 results


1. Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement

Author: Faghri, Fartash, Pouransari, Hadi, Mehta, Sachin, Farajtabar, Mehrdad, Farhadi, Ali, Rastegari, Mohammad, and Tuzel, Oncel
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Machine Learning (cs.LG)
Abstract: We propose Dataset Reinforcement, a strategy to improve a dataset once such that the accuracy of any model architecture trained on the reinforced dataset is improved at no additional training cost for users. We propose a Dataset Reinforcement strategy based on data augmentation and knowledge distillation. Our generic strategy is designed based on extensive analysis across CNN- and transformer-based models and performing large-scale study of distillation with state-of-the-art models with various data augmentations. We create a reinforced version of the ImageNet training dataset, called ImageNet+, as well as reinforced datasets CIFAR-100+, Flowers-102+, and Food-101+. Models trained with ImageNet+ are more accurate, robust, and calibrated, and transfer well to downstream tasks (e.g., segmentation and detection). As an example, the accuracy of ResNet-50 improves by 1.7% on the ImageNet validation set, 3.5% on ImageNetV2, and 10.0% on ImageNet-R. Expected Calibration Error (ECE) on the ImageNet validation set is also reduced by 9.9%. Using this backbone with Mask-RCNN for object detection on MS-COCO, the mean average precision improves by 0.8%. We reach similar gains for MobileNets, ViTs, and Swin-Transformers. For MobileNetV3 and Swin-Tiny we observe significant improvements on ImageNet-R/A/C of up to 10% improved robustness. Models pretrained on ImageNet+ and fine-tuned on CIFAR-100+, Flowers-102+, and Food-101+, reach up to 3.4% improved accuracy.
Published: 2023
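
The abstract above reports a reduction in Expected Calibration Error (ECE). As a rough illustration of that metric, the sketch below computes ECE with equal-width confidence bins. The binning scheme, the function name `expected_calibration_error`, and the default of 10 bins are assumptions made for this example and are not taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| per bin.

    confidences: predicted top-class probabilities in [0, 1]
    correct:     booleans, True where the prediction was right
    n_bins:      number of equal-width bins (an assumed default)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to a confidence bin; 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        # Each bin contributes in proportion to the share of samples it holds.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

A model that is 90% confident on four predictions but right on only three of them would score an ECE of about 0.15 under this definition.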

2. Continual Learning Beyond a Single Model

Author: Doan, Thang, Mirzadeh, Seyed Iman, and Farajtabar, Mehrdad
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Machine Learning (cs.LG)
Abstract: A growing body of research in continual learning focuses on the catastrophic forgetting problem. While many attempts have been made to alleviate this problem, the majority of the methods assume a single model in the continual learning setup. In this work, we question this assumption and show that employing ensemble models can be a simple yet effective method to improve continual performance. However, ensembles' training and inference costs can increase significantly as the number of models grows. Motivated by this limitation, we study different ensemble models to understand their benefits and drawbacks in continual learning scenarios. Finally, to overcome the high compute cost of ensembles, we leverage recent advances in neural network subspace to propose a computationally cheap algorithm with similar runtime to a single model yet enjoying the performance benefits of ensembles.
Comment: Accepted to 2nd Conference on Lifelong Learning Agents (CoLLAs 2023); Keywords: continual learning, neural network subspaces, ensemble models, computationally efficient training
Published: 2022

3. An Empirical Study of Implicit Regularization in Deep Offline RL

Author: Gulcehre, Caglar, Srinivasan, Srivatsan, Sygnowski, Jakub, Ostrovski, Georg, Farajtabar, Mehrdad, Hoffman, Matt, Pascanu, Razvan, and Doucet, Arnaud
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Machine Learning (cs.LG)
Abstract: Deep neural networks are the most commonly used function approximators in offline reinforcement learning. Prior works have shown that neural nets trained with TD-learning and gradient descent can exhibit implicit regularization that can be characterized by under-parameterization of these networks. Specifically, the rank of the penultimate feature layer, also called the effective rank, has been observed to drastically collapse during the training. In turn, this collapse has been argued to reduce the model's ability to further adapt in later stages of learning, leading to diminished final performance. Such an association between the effective rank and performance makes effective rank compelling for offline RL, primarily for offline policy evaluation. In this work, we conduct a careful empirical study on the relation between effective rank and performance on three offline RL datasets: bsuite, Atari, and DeepMind Lab. We observe that a direct association exists only in restricted settings and disappears in the more extensive hyperparameter sweeps. Also, we empirically identify three phases of learning that explain the impact of implicit regularization on the learning dynamics and find that bootstrapping alone is insufficient to explain the collapse of the effective rank. Further, we show that several other factors could confound the relationship between effective rank and performance and conclude that studying this association under simplistic assumptions could be highly misleading.
Comment: 40 pages, 37 figures, 2 tables
Published: 2022

4. Wide Neural Networks Forget Less Catastrophically

Author: Mirzadeh, Seyed Iman, Chaudhry, Arslan, Yin, Dong, Hu, Huiyi, Pascanu, Razvan, Gorur, Dilan, and Farajtabar, Mehrdad
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Machine Learning (cs.LG)
Abstract: A primary focus area in continual learning research is alleviating the "catastrophic forgetting" problem in neural networks by designing new algorithms that are more robust to the distribution shifts. While the recent progress in continual learning literature is encouraging, our understanding of what properties of neural networks contribute to catastrophic forgetting is still limited. To address this, instead of focusing on continual learning algorithms, in this work, we focus on the model itself and study the impact of "width" of the neural network architecture on catastrophic forgetting, and show that width has a surprisingly significant effect on forgetting. To explain this effect, we study the learning dynamics of the network from various perspectives such as gradient orthogonality, sparsity, and lazy training regime. We provide potential explanations that are consistent with the empirical results across different architectures and continual learning benchmarks.
Comment: ICML 2022
Published: 2021

5. Task-agnostic Continual Learning with Hybrid Probabilistic Models

Author: Kirichenko, Polina, Farajtabar, Mehrdad, Rao, Dushyant, Lakshminarayanan, Balaji, Levine, Nir, Li, Ang, Hu, Huiyi, Wilson, Andrew Gordon, and Pascanu, Razvan
Subjects: FOS: Computer and information sciences, Computer Science::Machine Learning, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: Learning new tasks continuously without forgetting on a constantly changing data distribution is essential for real-world problems but extremely challenging for modern deep learning. In this work we propose HCL, a Hybrid generative-discriminative approach to Continual Learning for classification. We model the distribution of each task and each class with a normalizing flow. The flow is used to learn the data distribution, perform classification, identify task changes, and avoid forgetting, all leveraging the invertibility and exact likelihood which are uniquely enabled by the normalizing flow model. We use the generative capabilities of the flow to avoid catastrophic forgetting through generative replay and a novel functional regularization technique. For task identification, we use state-of-the-art anomaly detection techniques based on measuring the typicality of the model's statistics. We demonstrate the strong performance of HCL on a range of continual learning benchmarks such as split-MNIST, split-CIFAR, and SVHN-MNIST.
Published: 2021

6. Linear Mode Connectivity in Multitask and Continual Learning

Author: Mirzadeh, Seyed Iman, Farajtabar, Mehrdad, Gorur, Dilan, Pascanu, Razvan, and Ghasemzadeh, Hassan
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Machine Learning (cs.LG)
Abstract: Continual (sequential) training and multitask (simultaneous) training are often attempting to solve the same overall objective: to find a solution that performs well on all considered tasks. The main difference is in the training regimes, where continual learning can only have access to one task at a time, which for neural networks typically leads to catastrophic forgetting. That is, the solution found for a subsequent task does not perform well on the previous ones anymore. However, the relationship between the different minima that the two training regimes arrive at is not well understood. What sets them apart? Is there a local structure that could explain the difference in performance achieved by the two different schemes? Motivated by recent work showing that different minima of the same task are typically connected by very simple curves of low error, we investigate whether multitask and continual solutions are similarly connected. We empirically find that indeed such connectivity can be reliably achieved and, more interestingly, it can be done by a linear path, conditioned on having the same initialization for both. We thoroughly analyze this observation and discuss its significance for the continual learning process. Furthermore, we exploit this finding to propose an effective algorithm that constrains the sequentially learned minima to behave as the multitask solution. We show that our method outperforms several state of the art continual learning algorithms on various vision benchmarks.
Published: 2020
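
The abstract above asks whether two solutions are connected by a linear path of low error. One common way to check this is to evaluate the loss at evenly spaced points on the straight line between the two weight vectors and measure how far it rises above the endpoints. The sketch below does that for flat lists of weights; the names `interpolate`, `loss_along_path`, and `barrier`, and the barrier definition itself, are assumptions for illustration rather than the paper's procedure.

```python
def interpolate(w_a, w_b, alpha):
    """Point on the line segment between w_a (alpha=0) and w_b (alpha=1)."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(w_a, w_b)]


def loss_along_path(loss_fn, w_a, w_b, num_points=11):
    """Evaluate loss_fn at evenly spaced points between w_a and w_b."""
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    alphas = [i / (num_points - 1) for i in range(num_points)]
    return [(alpha, loss_fn(interpolate(w_a, w_b, alpha))) for alpha in alphas]


def barrier(path_losses):
    """Highest loss on the path minus the larger of the two endpoint losses.

    A value near zero suggests the endpoints are linearly connected
    at the sampled resolution.
    """
    losses = [loss for _, loss in path_losses]
    return max(losses) - max(losses[0], losses[-1])
```

With a toy double-well loss such as `lambda w: (w[0] ** 2 - 1) ** 2`, the two minima at -1 and +1 are separated by a barrier, whereas a convex loss gives a barrier of zero.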

7. The Effectiveness of Memory Replay in Large Scale Continual Learning

Author: Balaji, Yogesh, Farajtabar, Mehrdad, Yin, Dong, Mott, Alex, and Li, Ang
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Machine Learning (cs.LG)
Abstract: We study continual learning in the large scale setting where tasks in the input sequence are not limited to classification, and the outputs can be of high dimension. Among multiple state-of-the-art methods, we found vanilla experience replay (ER) still very competitive in terms of both performance and scalability, despite its simplicity. However, a degraded performance is observed for ER with small memory. A further visualization of the feature space reveals that the intermediate representation undergoes a distributional drift. While existing methods usually replay only the input-output pairs, we hypothesize that their regularization effect is inadequate for complex deep models and diverse tasks with small replay buffer size. Following this observation, we propose to replay the activation of the intermediate layers in addition to the input-output pairs. Considering that saving raw activation maps can dramatically increase memory and compute cost, we propose the Compressed Activation Replay technique, where compressed representations of layer activation are saved to the replay buffer. We show that this approach can achieve superior regularization effect while adding negligible memory overhead to replay method. Experiments on both the large-scale Taskonomy benchmark with a diverse set of tasks and standard common datasets (Split-CIFAR and Split-miniImageNet) demonstrate the effectiveness of the proposed method.
Comment: 15 pages
Published: 2020

8. Orthogonal Gradient Descent for Continual Learning

Author: Farajtabar, Mehrdad, Azizan, Navid, Mott, Alex, and Li, Ang
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: Neural networks are achieving state of the art and sometimes super-human performance on learning tasks across a variety of domains. Whenever these problems require learning in a continual or sequential manner, however, neural networks suffer from the problem of catastrophic forgetting; they forget how to solve previous tasks after being trained on a new task, despite having the essential capacity to solve both tasks if they were trained on both simultaneously. In this paper, we propose to address this issue from a parameter space perspective and study an approach to restrict the direction of the gradient updates to avoid forgetting previously-learned data. We present the Orthogonal Gradient Descent (OGD) method, which accomplishes this goal by projecting the gradients from new tasks onto a subspace in which the neural network output on previous task does not change and the projected gradient is still in a useful direction for learning the new task. Our approach utilizes the high capacity of a neural network more efficiently and does not require storing the previously learned data that might raise privacy concerns. Experiments on common benchmarks reveal the effectiveness of the proposed OGD method.
Published: 2020
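
The abstract above describes projecting new-task gradients so that they do not disturb what was learned earlier. A minimal sketch of such a projection follows, assuming a set of stored direction vectors from previous tasks is already available as plain lists of floats. How those directions are collected is not covered here, and the helper names (`orthonormalize`, `project_orthogonal`) are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors, eps=1e-12):
    """Gram-Schmidt: build an orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip directions already covered by the basis
            basis.append([wi / norm for wi in w])
    return basis


def project_orthogonal(grad, basis):
    """Remove from `grad` every component lying in the span of `basis`."""
    g = list(grad)
    for b in basis:
        coeff = dot(g, b)
        g = [gi - coeff * bi for gi, bi in zip(g, b)]
    return g


# Usage: stored directions from earlier tasks, then a new-task gradient.
stored = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
basis = orthonormalize(stored)
update = project_orthogonal([3.0, 4.0, 5.0], basis)
```

The resulting `update` has zero inner product with every stored direction, so a small step along it leaves those directions untouched to first order.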

9. Optimization and Generalization of Regularization-Based Continual Learning: a Loss Approximation Viewpoint

Author: Yin, Dong, Farajtabar, Mehrdad, Li, Ang, Levine, Nir, and Mott, Alex
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: Neural networks have achieved remarkable success in many cognitive tasks. However, when they are trained sequentially on multiple tasks without access to old data, their performance on early tasks tends to drop significantly. This problem is often referred to as catastrophic forgetting, a key challenge in continual learning of neural networks. The regularization-based approach is one of the primary classes of methods to alleviate catastrophic forgetting. In this paper, we provide a novel viewpoint of regularization-based continual learning by formulating it as a second-order Taylor approximation of the loss function of each task. This viewpoint leads to a unified framework that can be instantiated to derive many existing algorithms such as Elastic Weight Consolidation and Kronecker factored Laplace approximation. Based on this viewpoint, we study the optimization aspects (i.e., convergence) as well as generalization properties (i.e., finite-sample guarantees) of regularization-based continual learning. Our theoretical results indicate the importance of accurate approximation of the Hessian matrix. The experimental results on several benchmarks provide empirical validation of our theoretical findings.
Comment: Preliminary version with a different title presented at ICML Workshop on Continual Learning, 2020 (spotlight)
Published: 2020
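
For orientation, a generic second-order Taylor approximation of a task loss around a previously found solution can be written as below. The notation is assumed for this note and is not taken from the paper: $\mathcal{L}_k$ is the loss of task $k$, $\theta_k^*$ is the parameter vector reached after training on task $k$, and $H_k$ is the Hessian at that point.

```latex
\mathcal{L}_k(\theta) \approx \mathcal{L}_k(\theta_k^*)
  + \nabla \mathcal{L}_k(\theta_k^*)^\top (\theta - \theta_k^*)
  + \tfrac{1}{2} (\theta - \theta_k^*)^\top H_k \, (\theta - \theta_k^*),
\qquad H_k = \nabla^2 \mathcal{L}_k(\theta_k^*)
```

Under this reading, different regularization-based methods would correspond to different approximations of $H_k$; Elastic Weight Consolidation, for instance, is usually described as using a diagonal Fisher information estimate in its place.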

10. A maximum-entropy approach to off-policy evaluation in average-reward MDPs

Author: Lazic, Nevena, Yin, Dong, Farajtabar, Mehrdad, Levine, Nir, Gorur, Dilan, Harris, Chris, and Schuurmans, Dale
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Machine Learning (cs.LG)
Abstract: This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs). For MDPs that are ergodic and linear (i.e. where rewards and dynamics are linear in some known features), we provide the first finite-sample OPE error bound, extending existing results beyond the episodic and discounted cases. In a more general setting, when the feature dynamics are approximately linear and for arbitrary rewards, we propose a new approach for estimating stationary distributions with function approximation. We formulate this problem as finding the maximum-entropy distribution subject to matching feature expectations under empirical dynamics. We show that this results in an exponential-family distribution whose sufficient statistics are the features, paralleling maximum-entropy approaches in supervised learning. We demonstrate the effectiveness of the proposed OPE approaches in multiple environments.
Published: 2020

11. Understanding the Role of Training Regimes in Continual Learning

Author: Mirzadeh, Seyed Iman, Farajtabar, Mehrdad, Pascanu, Razvan, and Ghasemzadeh, Hassan
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Computer Science - Neural and Evolutionary Computing, Machine Learning (stat.ML), Neural and Evolutionary Computing (cs.NE), Machine Learning (cs.LG)
Abstract: Catastrophic forgetting affects the training of neural networks, limiting their ability to learn multiple tasks sequentially. From the perspective of the well-established plasticity-stability dilemma, neural networks tend to be overly plastic, lacking the stability necessary to prevent the forgetting of previous knowledge, which means that as learning progresses, networks tend to forget previously seen tasks. This phenomenon, coined in the continual learning literature, has attracted much attention lately, and several families of approaches have been proposed with different degrees of success. However, there has been limited prior work extensively analyzing the impact that different training regimes (learning rate, batch size, regularization method) can have on forgetting. In this work, we depart from the typical approach of altering the learning algorithm to improve stability. Instead, we hypothesize that the geometrical properties of the local minima found for each task play an important role in the overall degree of forgetting. In particular, we study the effect of dropout, learning rate decay, and batch size on forming training regimes that widen the tasks' local minima and, consequently, on helping the network avoid catastrophic forgetting. Our study provides practical insights to improve stability via simple yet effective techniques that outperform alternative baselines.
Published: 2020

12. Learning to Incentivize Other Learning Agents

Author: Yang, Jiachen, Li, Ang, Farajtabar, Mehrdad, Sunehag, Peter, Hughes, Edward, and Zha, Hongyuan
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computer Science and Game Theory, Statistics - Machine Learning, Machine Learning (stat.ML), Computer Science - Multiagent Systems, ComputingMethodologies_ARTIFICIALINTELLIGENCE, Machine Learning (cs.LG), Computer Science and Game Theory (cs.GT), Multiagent Systems (cs.MA)
Abstract: The challenge of developing powerful and general Reinforcement Learning (RL) agents has received increasing attention in recent years. Much of this effort has focused on the single-agent setting, in which an agent maximizes a predefined extrinsic reward function. However, a long-term question inevitably arises: how will such independent agents cooperate when they are continually learning and acting in a shared multi-agent environment? Observing that humans often provide incentives to influence others' behavior, we propose to equip each RL agent in a multi-agent environment with the ability to give rewards directly to other agents, using a learned incentive function. Each agent learns its own incentive function by explicitly accounting for its impact on the learning of recipients and, through them, the impact on its own extrinsic objective. We demonstrate in experiments that such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games, often by finding a near-optimal division of labor. Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
Comment: 20 pages, 11 figures. To appear in 34th Conference on Neural Information Processing Systems (NeurIPS 2020)
Published: 2020

13. Self-Distillation Amplifies Regularization in Hilbert Space

Author: Mobahi, Hossein, Farajtabar, Mehrdad, and Bartlett, Peter L.
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Quantum Physics, Machine Learning (cs.LG)
Abstract: Knowledge distillation introduced in the deep learning context is a method to transfer knowledge from one architecture to another. In particular, when the architectures are identical, this is called self-distillation. The idea is to feed in predictions of the trained model as new target values for retraining (and iterate this loop possibly a few times). It has been empirically observed that the self-distilled model often achieves higher accuracy on held out data. Why this happens, however, has been a mystery: the self-distillation dynamics does not receive any new information about the task and solely evolves by looping over training. To the best of our knowledge, there is no rigorous understanding of this phenomenon. This work provides the first theoretical analysis of self-distillation. We focus on fitting a nonlinear function to training data, where the model space is Hilbert space and fitting is subject to $\ell_2$ regularization in this function space. We show that self-distillation iterations modify regularization by progressively limiting the number of basis functions that can be used to represent the solution. This implies (as we also verify empirically) that while a few rounds of self-distillation may reduce over-fitting, further rounds may lead to under-fitting and thus worse performance.
Published: 2020

14. Balance Regularized Neural Network Models for Causal Effect Estimation

Author: Farajtabar, Mehrdad, Lee, Andrew, Feng, Yuanjian, Gupta, Vishal, Dolan, Peter, Chandran, Harish, and Szummer, Martin
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: Estimating individual and average treatment effects from observational data is an important problem in many domains such as healthcare and e-commerce. In this paper, we advocate balance regularization of multi-head neural network architectures. Our work is motivated by representation learning techniques to reduce differences between treated and untreated distributions that potentially arise due to confounding factors. We further regularize the model by encouraging it to predict control outcomes for individuals in the treatment group that are similar to control outcomes in the control group. We empirically study the bias-variance trade-off between different weightings of the regularizers, as well as between inductive and transductive inference.
Comment: Causal Discovery & Causality-Inspired Machine Learning Workshop at Neural Information Processing Systems, 2020
Published: 2020

15. Adapting Auxiliary Losses Using Gradient Similarity

Author: Du, Yunshu, Czarnecki, Wojciech M., Jayakumar, Siddhant M., Farajtabar, Mehrdad, Pascanu, Razvan, and Lakshminarayanan, Balaji
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: One approach to deal with the statistical inefficiency of neural networks is to rely on auxiliary losses that help to build useful representations. However, it is not always trivial to know if an auxiliary task will be helpful for the main task and when it could start hurting. We propose to use the cosine similarity between gradients of tasks as an adaptive weight to detect when an auxiliary loss is helpful to the main loss. We show that our approach is guaranteed to converge to critical points of the main task and demonstrate the practical usefulness of the proposed algorithm in a few domains: multi-task supervised learning on subsets of ImageNet, reinforcement learning on gridworld, and reinforcement learning on Atari games.
Published: 2018
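
The abstract above proposes using the cosine similarity between task gradients as an adaptive weight for the auxiliary loss. One plausible reading, sketched below, clips negative similarity to zero so that the auxiliary gradient is ignored whenever it points against the main-task gradient. The clipping rule and the function names are assumptions for this example; the paper may define the weighting differently.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is zero."""
    inner = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return inner / (norm_u * norm_v)


def combine_gradients(grad_main, grad_aux):
    """Main gradient plus the auxiliary gradient scaled by max(0, cosine).

    Assumed rule: an auxiliary gradient that opposes the main gradient
    (negative cosine) gets weight zero and is therefore dropped.
    """
    weight = max(0.0, cosine_similarity(grad_main, grad_aux))
    return [gm + weight * ga for gm, ga in zip(grad_main, grad_aux)]
```

Aligned gradients are added in full, orthogonal or opposing ones contribute nothing, and everything in between is scaled smoothly.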

16. Representation Learning over Dynamic Graphs

Author: Trivedi, Rakshit, Farajtabar, Mehrdad, Biswal, Prasenjeet, and Zha, Hongyuan
Subjects: FOS: Computer and information sciences, Computer Science - Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: How can we effectively encode evolving information over dynamic graphs into low-dimensional representations? In this paper, we propose DyRep, an inductive deep representation learning framework that learns a set of functions to efficiently produce low-dimensional node embeddings that evolve over time. The learned embeddings drive the dynamics of two key processes, namely communication and association between nodes in dynamic graphs. These processes exhibit complex nonlinear dynamics that evolve at different time scales and subsequently contribute to the update of node embeddings. We employ a time-scale dependent multivariate point process model to capture these dynamics. We devise an efficient unsupervised learning procedure and demonstrate that our approach significantly outperforms representative baselines on two real-world datasets for the problem of dynamic link prediction and event time prediction.
Published: 2018

17. Wasserstein Learning of Deep Generative Point Process Models

Author: Xiao, Shuai, Farajtabar, Mehrdad, Ye, Xiaojing, Yan, Junchi, Song, Le, and Zha, Hongyuan
Subjects: FOS: Computer and information sciences, Computer Science - Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: Point processes are becoming very popular in modeling asynchronous sequential data due to their sound mathematical foundation and strength in modeling a variety of real-world phenomena. Currently, they are often characterized via an intensity function, which limits the model's expressiveness due to unrealistic assumptions on its parametric form used in practice. Furthermore, they are learned via a maximum likelihood approach, which is prone to failure in multi-modal distributions of sequences. In this paper, we propose an intensity-free approach for point process modeling that transforms nuisance processes to a target one. Furthermore, we train the model using a likelihood-free approach that leverages the Wasserstein distance between point processes. Experiments on various synthetic and real-world data substantiate the superiority of the proposed point process model over conventional ones.
Published: 2017

18. Joint Modeling of Event Sequence and Time Series with Attentional Twin Recurrent Neural Networks

Author: Xiao, Shuai, Yan, Junchi, Farajtabar, Mehrdad, Song, Le, Yang, Xiaokang, and Zha, Hongyuan
Subjects: FOS: Computer and information sciences, Computer Science - Learning, Machine Learning (cs.LG)
Abstract: A variety of real-world processes (over networks) produce sequences of data whose complex temporal dynamics need to be studied. In particular, the event timestamps can carry important information about the underlying network dynamics, which otherwise are not available from the time series evenly sampled from continuous signals. Moreover, in most complex processes, event sequences and evenly sampled time series data can interact with each other, which renders joint modeling of those two sources of data necessary. To tackle the above problems, in this paper, we utilize the rich framework of (temporal) point processes to model event data and update its intensity function in a timely manner using synergistic twin Recurrent Neural Networks (RNNs). In the proposed architecture, the intensity function is synergistically modulated by one RNN with asynchronous events as input and another RNN with time series as input. Furthermore, to enhance the interpretability of the model, the attention mechanism for the neural point process is introduced. The whole model with event type and timestamp prediction output layers can be trained end-to-end and allows a black-box treatment for modeling the intensity. We substantiate the superiority of our model on synthetic data and three real-world benchmark datasets.
Comment: 14 pages
Published: 2017

19. Fake News Mitigation via Point Process Based Intervention

Author: Farajtabar, Mehrdad, Yang, Jiachen, Ye, Xiaojing, Xu, Huan, Trivedi, Rakshit, Khalil, Elias, Li, Shuang, Song, Le, and Zha, Hongyuan
Subjects: Social and Information Networks (cs.SI), FOS: Computer and information sciences, Computer Science - Learning, Computer Science - Social and Information Networks, Machine Learning (cs.LG)
Abstract: We propose the first multistage intervention framework that tackles fake news in social networks by combining reinforcement learning with a point process network activity model. The spread of fake news and mitigation events within the network is modeled by a multivariate Hawkes process with additional exogenous control terms. By choosing a feature representation of states, defining mitigation actions and constructing reward functions to measure the effectiveness of mitigation activities, we map the problem of fake news mitigation into the reinforcement learning framework. We develop a policy iteration method unique to the multivariate networked point process, with the goal of optimizing the actions for maximal total reward under budget constraints. Our method shows promising performance in real-time intervention experiments on a Twitter network to mitigate a surrogate fake news campaign, and outperforms alternatives on synthetic datasets.
Comment: Point Process, Hawkes Process, Social Networks, Intervention and Control, Reinforcement Learning, ICML 2017
Published: 2017

20. Detecting weak changes in dynamic events over networks

Author: Li, Shuang, Xie, Yao, Farajtabar, Mehrdad, Verma, Apurv, and Song, Le
Subjects: FOS: Computer and information sciences, Computer Science - Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: Large volumes of networked streaming event data are becoming increasingly available in a wide variety of applications, such as social network analysis, Internet traffic monitoring and healthcare analytics. Streaming event data are discrete observations occurring in continuous time, and the precise time interval between two events carries a great deal of information about the dynamics of the underlying systems. How can changes in these dynamic systems be detected promptly using these streaming event data? In this paper, we propose a novel change-point detection framework for multi-dimensional event data over networks. We cast the problem as a sequential hypothesis test, and derive the likelihood ratios for point processes, which are computed efficiently via an EM-like algorithm that is parameter-free and can be computed in a distributed fashion. We derive a highly accurate theoretical characterization of the false-alarm rate, and show that it can achieve weak signal detection by aggregating local statistics over time and networks. Finally, we demonstrate the good performance of our algorithm on numerical examples and real-world datasets from Twitter and Memetracker.
Published: 2016

21. Learning Granger Causality for Hawkes Processes

Author: Xu, Hongteng, Farajtabar, Mehrdad, and Zha, Hongyuan
Subjects: FOS: Computer and information sciences, Computer Science - Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: Learning Granger causality for general point processes is a very challenging task. In this paper, we propose an effective method for learning Granger causality for a special but significant type of point process: the Hawkes process. We reveal the relationship between a Hawkes process's impact functions and its Granger causality graph. Specifically, our model represents impact functions using a series of basis functions and recovers the Granger causality graph via group sparsity of the impact functions' coefficients. We propose an effective learning algorithm combining a maximum likelihood estimator (MLE) with a sparse-group-lasso (SGL) regularizer. Additionally, the flexibility of our model allows the clustering structure of event types to be incorporated into the learning framework. We analyze our learning algorithm and propose an adaptive procedure to select basis functions. Experiments on both synthetic and real-world data show that our method can learn the Granger causality graph and the triggering patterns of the Hawkes processes simultaneously. International Conference on Machine Learning, 2016
Published: 2016

22. A Continuous-time Mutually-Exciting Point Process Framework for Prioritizing Events in Social Media

Author: Farajtabar, Mehrdad, Yousefi, Safoora, Tran, Long Q., Song, Le, and Zha, Hongyuan
Subjects: Social and Information Networks (cs.SI), FOS: Computer and information sciences, Computer Science - Learning, Computer Science - Social and Information Networks, Machine Learning (cs.LG)
Abstract: The overwhelming amount and rate of information update in online social media is making it increasingly difficult for users to allocate their attention to their topics of interest, so there is a strong need for prioritizing news feeds. The attractiveness of a post to a user depends on many complex contextual and temporal features of the post. For instance, the contents of the post, the responsiveness of a third user, and the age of the post may all have an impact. So far, these static and dynamic features have not been incorporated in a unified framework to tackle the post prioritization problem. In this paper, we propose a novel approach for prioritizing posts based on a feature-modulated multi-dimensional point process. Our model is able to simultaneously capture textual and sentiment features, and temporal features such as self-excitation, mutual excitation and the bursty nature of social interaction. As an evaluation, we also curated a real-world conversational benchmark dataset crawled from Facebook. In our experiments, we demonstrate that our algorithm is able to achieve state-of-the-art performance in terms of analyzing, predicting, and prioritizing events. In terms of interpretability of our method, we observe that features indicating individual user profile and linguistic characteristics of the events work best for prediction and prioritization of new events.
Published: 2015
Full Text: View/download PDF
