Information mining and knowledge learning from sequential data is of growing importance in both industry and academia. Sequential data is the natural representation of information flow in many applications. It usually carries a large amount of information and can help researchers gain insights for tasks such as airport threat detection, cyber-attack detection, recommender systems, point-of-interest (POI) prediction, and citation forecasting. This dissertation develops methods for sequential data-driven applications and evolutionary dynamics characterization, covering topics such as transit service disruption detection, early event detection on social media, technology opportunity discovery, and traffic incident impact analysis. In particular, four applications are studied, each with a proposed novel method: a spatiotemporal feature learning framework for transit service disruption detection, a multi-task learning framework for cybersecurity event detection, citation dynamics modeling via multi-context attentional recurrent neural networks, and traffic incident impact forecasting via hierarchical spatiotemporal graph neural networks.

For the first method, existing transit service disruption detection methods usually suffer from two significant shortcomings: 1) failing to model the sparsity of the social media feature domain, i.e., only a few important "particles" among the massive volume of data generated every day are actually related to service disruptions, and 2) ignoring the real-world geographical connections of transit networks as well as the semantic consistency existing in the problem space.
This work makes three contributions: 1) developing a spatiotemporal learning framework for metro disruption detection using open-source data, 2) modeling semantic similarity and spatial connectivity among metro lines in the feature space, and 3) developing an optimization algorithm that efficiently solves the multi-convex and non-smooth objective function.

For the second method, conventional studies in cybersecurity detection suffer from the following shortcomings: 1) inability to capture weak signals generated by cyber-attacks on small organizations or individual accounts, 2) lack of generalization across distinct types of security incidents, and 3) failure to consider the relatedness across different types of cyber-attacks in the feature domain. This work makes three contributions: 1) formulating social media-based cyber-attack detection as a multi-task learning problem, 2) modeling multi-type task relatedness in the feature space, and 3) developing an efficient algorithm to solve the non-smooth model with inequality constraints.

For the third method, conventional citation forecasting methods rely on the traditional temporal point process and suffer from two drawbacks: 1) they cannot predict the technological categories of citing documents and are therefore incapable of assessing technological diversity, and 2) they require prior domain knowledge and are therefore hard to extend to different research areas. This work makes two contributions: 1) formulating a novel framework that provides long-term citation predictions in an end-to-end fashion by integrating the learning of intensity function representations with the prediction of future citations, and 2) designing two novel temporal attention mechanisms that improve the model's ability to capture complicated temporal dependencies and allow the model to dynamically combine the observation and prediction sides during learning.
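
To make the notion of an intensity function in a temporal point process concrete, the sketch below evaluates the conditional intensity of a self-exciting (Hawkes) process with an exponential kernel. This is only one common instance of a temporal point process and is not claimed to be the model used in the dissertation; the function name `hawkes_intensity`, the parameters `mu`, `alpha`, and `beta`, and the citation times are all assumptions made for illustration.

```python
import math


def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity of a Hawkes process with an exponential kernel.

    lambda(t) = mu + alpha * sum over past events t_i < t of exp(-beta * (t - t_i))

    mu    -- baseline rate (assumed value)
    alpha -- jump added to the intensity by each past event (assumed value)
    beta  -- decay rate of each event's influence (assumed value)
    """
    excitation = sum(
        math.exp(-beta * (t - t_i)) for t_i in event_times if t_i < t
    )
    return mu + alpha * excitation


# Made-up citation times (e.g., years since publication), for illustration only.
citation_times = [0.5, 1.0, 1.2, 3.0]

print(hawkes_intensity(0.0, citation_times))   # no past events: baseline only
print(hawkes_intensity(1.3, citation_times))   # shortly after a burst of citations
print(hawkes_intensity(10.0, citation_times))  # long after the last citation
```

In this hand-specified form, the kernel shape and its parameters must be chosen in advance, which is one way the need for prior domain knowledge mentioned above can arise.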

For the fourth method, previous work treats traffic sensor readings as features and views incident duration prediction as a feature-driven regression, which typically suffers from three drawbacks: 1) ignoring the road-sensor hierarchical structure of the real-world traffic network, 2) inability to learn and capture the hidden temporal patterns in the sensor readings, and 3) lack of consideration of the spatial connectivity between arterial roads and traffic sensors. This work makes three significant contributions: 1) designing a hierarchical graph convolutional network architecture for modeling the road-sensor hierarchy, 2) proposing a novel spatiotemporal attention mechanism on the sensor- and road-level features for representation learning, and 3) presenting a graph convolutional network-based method for incident representation learning via spatial connectivity modeling and traffic characteristics modulation.

Doctor of Philosophy
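
The graph convolution that the fourth method builds on can be illustrated with a single layer applied to a toy sensor graph. The sketch below uses the widely known propagation rule ReLU(D^{-1/2} (A + I) D^{-1/2} H W) and plain Python lists to stay dependency-free. It is not the hierarchical architecture proposed in the dissertation; the three-sensor graph, the speed readings, the weight matrix, and the function names are all hypothetical.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own signal.
    a_hat = [
        [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    deg = [sum(row) for row in a_hat]
    return [
        [a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    propagated = matmul(normalize_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weights)]


# Hypothetical toy network: three sensors connected in a line, 0 - 1 - 2.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[60.0], [20.0], [55.0]]  # made-up speed readings, one feature per sensor
weights = [[1.0]]                    # hand-set 1x1 weight matrix (assumption)

smoothed = gcn_layer(adj, features, weights)
print(smoothed)
```

Each output row mixes a sensor's own reading with those of its neighbors, which is the basic sense in which spatial connectivity enters the learned representation. In practice the weights would be learned rather than set by hand.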