Descriptor: "sequence mining" / Publication Type: Academic Journals - Searchworks@Jio Institute Digital Library Search Results

1. Dropout is not always a failure! Exploration on the prior knowledge and learning behaviors of MOOC learners

Author: Matcha, Wannisa, Natthaphatwirata, Rusada, Uzir, Nora’ayu Ahmad, and Gašević, Dragan
Published: 2024
Full Text: View/download PDF

2. AirPollutionViz: visual analytics for understanding the spatio-temporal evolution of air pollution.

Author: Yue, Xiaoqi, Feng, Dan, Sun, Desheng, Liu, Chao, Qin, Hongxing, and Hu, Haibo
Abstract: Spatio-temporal evolution analysis has been a critical topic of air pollution research. However, there are still several difficulties caused by the large scale and dimensionality of the data. First, traditional methods deal with such data by simplifying and abstracting, resulting in information loss. Second, most existing visualizations, generally focusing on overall evolution, ignore the exploration of multiple time scales and pattern transitions between subsequences. This paper presents AirPollutionViz, a visual analytics system that enables analysis of the spatio-temporal evolution in two ways: sequence mining and clustering analysis. Concretely, we propose sequence merging to shorten the sequence length and construct a weighted directed graph structure, which promotes efficient querying of sequence patterns by combination with dynamic time warping. We design a novel summary view to display the overview of pollution level changes, together with the improved node-link chart, to support the analysis of air pollution spatio-temporal evolution patterns. We also apply K-means clustering to pollutants, and a scatter plot and map reflect the spatial distribution aggregation. The system supports users' free exploration across multiple time scales with rich interactions. Case studies with three domain experts and a user study with ten users demonstrate the usefulness and effectiveness of AirPollutionViz. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
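
The AirPollutionViz abstract above mentions querying sequence patterns with dynamic time warping (DTW). As a rough illustration of that general technique only, here is a minimal sketch of the classic DTW distance between two numeric sequences. The function name `dtw_distance` and the absolute-difference local cost are assumptions made for this example; the abstract does not describe how the system itself implements DTW.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with |x - y| as the local cost.

    Illustrative sketch only; not taken from the AirPollutionViz system.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because DTW lets one element align with several neighbours, two sequences that differ only by a repeated value, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, are at distance zero.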

3. Sky-signatures: detecting and characterizing recurrent behavior in sequential data.

Author: Gautrais, Clément, Cellier, Peggy, Guyet, Thomas, Quiniou, René, and Termier, Alexandre
Subjects: NATURAL language processing, DATA mining, POLITICAL oratory, RECURRENT neural networks
Abstract: This paper proposes the sky-signature model, an extension of the signature model of Gautrais et al. (in: Proceedings of the Pacific-Asia conference on knowledge discovery and data mining (PAKDD), Springer, 2017b) to multi-objective optimization. The signature approach considers a sequence of itemsets, and given a number k it returns a segmentation of the sequence in k segments such that the number of items occurring in all segments is maximized. The limitation of this approach is that it requires k to be set manually, and thus fixes the temporal granularity at which the data is analyzed. The sky-signature model proposed in this paper removes this requirement, and allows the results to be examined at multiple levels of granularity, while keeping a compact output. This paper also proposes efficient algorithms to mine sky-signatures, as well as an experimental validation on real data from both the retail domain and natural language processing (political speeches). [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
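
The Sky-signatures abstract above describes the base signature problem: split a sequence of itemsets into k segments so that as many items as possible occur in every segment. The sketch below is a brute-force reading of that description, assuming that segments are contiguous and non-empty and that an item "occurs in a segment" when it appears in at least one of the segment's itemsets. The function name `best_signature` is hypothetical, and this is neither the authors' algorithm nor the sky-signature extension, which removes the need to fix k.

```python
from itertools import combinations


def best_signature(sequence, k):
    """Brute-force search over all ways to cut `sequence` (a list of sets)
    into k contiguous, non-empty segments.

    Returns (items present in every segment, cut positions) for the first
    segmentation that maximizes the number of shared items.
    Illustrative sketch only; exponential in k, so suitable for tiny inputs.
    """
    n = len(sequence)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(sequence)")
    best_items, best_cuts = set(), None
    # Choose k-1 cut positions among the n-1 gaps between itemsets.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        # Each segment is summarized by the union of its itemsets.
        segments = [set().union(*sequence[s:e]) for s, e in zip(bounds, bounds[1:])]
        common = set.intersection(*segments)
        if best_cuts is None or len(common) > len(best_items):
            best_items, best_cuts = common, cuts
    return best_items, best_cuts
```

For example, with the sequence `[{"a", "b"}, {"a"}, {"b", "c"}, {"a", "b"}]` and k = 3, cutting after positions 1 and 3 keeps both `a` and `b` in every segment.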

4. Efficient Frequent Chronicle Mining Algorithms: Application to Sleep Disorder

Author: Hareth Zmezm, Jose Maria Luna, Eduardo Almeda, and Sebastian Ventura
Subjects: Frequent event graphs, chronicle mining, sequence mining, temporal data mining, sleep disorder, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Sequential pattern mining is a dynamic and thriving research field that aims to extract recurring sequences of events from complex datasets. Traditionally, focusing solely on the order of events often falls short of providing precise insights. Consequently, incorporating the temporal intervals between events has emerged as a vital necessity across various domains, e.g. medicine. Analyzing temporal event sequences within patients’ clinical histories, drug prescriptions, and monitoring alarms exemplifies this critical need. This paper presents innovative and efficient methodologies for mining frequent chronicles from temporal data. The mined graphs offer a significantly more expressive representation than mere event sequences, capturing intricate details of a series of events in a factual manner. The experimental stage includes a series of analyses of diverse databases with distinct characteristics. The proposed approaches were also applied to real-world data comprising information about subjects suffering from sleep disorders. Alluring frequent complete event graphs were obtained on patients who were under the effect of sleep medication.
Published: 2024
Full Text: View/download PDF

5. Capturing temporal pathways of collaborative roles: A multilayered analytical approach using community of inquiry

Author: Elmoazen, Ramy, Saqr, Mohammed, Hirsto, Laura, and Tedre, Matti
Published: 2024
Full Text: View/download PDF

6. A learning analytics perspective on educational escape rooms.

Author: López-Pernas, Sonsoles, Saqr, Mohammed, Gordillo, Aldo, and Barra, Enrique
Subjects: *DATA encryption, *LEARNING, *KNOWLEDGE acquisition (Expert systems), *DATA analysis, *PSYCHOLOGY of students
Abstract: Learning analytics methods have proven useful in providing insights from the increasingly available digital data about students in a variety of learning environments, including serious games. However, such methods have not been applied to the specific context of educational escape rooms and therefore little is known about students' behavior while playing. The present work aims to fill the gap in the existing literature by showcasing the power of learning analytics methods to reveal and represent students' behavior when participating in a computer-supported educational escape room. Specifically, we make use of sequence mining methods to analyze the temporal and sequential aspects of the activities carried out by students during these novel educational games. We further use clustering to identify different player profiles according to the sequential unfolding of students' actions and analyze how these profiles relate to knowledge acquisition. Our results show that students' behavior differed significantly in their use of hints in the escape room and resulted in differences in their knowledge acquisition levels. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

7. Exploration of Latent Structure in Test Revision and Review Log Data.

Author: Zhang, Susu, Li, Anqi, and Wang, Shiyu
Subjects: *DATA logging, *STATISTICAL learning, *COMPUTER adaptive testing, *ACQUISITION of data
Abstract: In computer‐based tests allowing revision and reviews, examinees' sequence of visits and answer changes to questions can be recorded. The variable‐length revision log data introduce new complexities to the collected data but, at the same time, provide additional information on examinees' test‐taking behavior, which can inform test development and instructions. In the current study, we used recently proposed statistical learning methods for sequence data to provide an exploratory analysis of item‐level revision and review log data. Based on the revision log data collected from computer‐based classroom assessments, common prototypes of revisit and review behavior were identified. The relationship between revision behavior and various item, test, and individual covariates was further explored under a Bayesian multivariate generalized linear mixed model. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

8. Natural Exponent Inertia Weight-based Particle Swarm Optimization for Mining Serial Episode Rules from Event Sequences.

Author: Poongodi, K. and Kumar, Dhananjay
Subjects: *PARTICLE swarm optimization, *DATA structures, *EXPONENTS
Abstract: Episode rule mining extracts useful and important patterns, or episodes, from large event sequences, and an episode rule represents the temporal implication that associates the antecedent and consequent episodes. The existing technique for mining precise-positioning episode rules from event sequences mines serial episodes, resulting in enormous memory consumption. To resolve this issue, the proposed work ensures the generation of fixed-gap episodes and parameter settings through the use of a Particle Swarm Optimization mechanism. Fixed-gap episodes are generated using the Natural Exponent Inertia Weight-based Particle Swarm Optimization algorithm. In this paper, a new technique called Mining Serial Episode Rules (MSER) is proposed, which utilizes the correlation between episodes and the generation of parameter selection where the occurrence time of an event is specified in the consequent. Further, a trie-based data structure to mine MSER along with a pruning technique is incorporated in the proposed methodology to improve the performance. The efficiency of the proposed algorithm MSER is evaluated on three benchmark data sets, Retail, Kosarak, and MSNBC, where it outperforms the existing methods with respect to memory usage and execution time. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

9. The application of advertising logo color design for big data and visual communication technology.

Author: Tian, Huan
Subjects: *TELECOMMUNICATION, *BIG data, *DATA transmission systems, *VISUAL communication, *LOGO design, *COLOR in design
Abstract: Color is one of the three major elements of print advertising, and different color combinations can trigger different emotional experiences of human beings. At present, the application of color in advertising in China is relatively mature, but it is limited to the traditional application method and has not been combined with big data technology. From the perspective of business needs, this research analyzes the process of visual creativity from the perspective of business value-added, and analyzes the role of big data in it. Then it introduces the semantics of common colors and how to incorporate color semantics into advertising design. And a sequence mining-based advertising click-through rate prediction model is proposed. The Criteo dataset is used as the training set. The AUC value of the model is 0.702 and the loss value is 0.415. Compared with other models, AUC values increased by 10.16%, 4.70%, 2.69% and 2.30%, respectively. Losses decreased by 10.17%, 9.19%, 6.11% and 7.57%, respectively. Finally, the online shopping data of 20 consumers was used as the test set to predict their color preferences, and the prediction accuracy was about 70%. Among them, the prediction accuracy of the group with stable shopping habits was 72.76%, and that of the group who liked to try new things was 70.60%, both meeting the expectation. Through experiments, it is concluded that the model has good performance and stability, and can more accurately judge consumers' consumption preferences. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

10. Mining Frequent Serial Positioning Episode Rules with Forward and Backward Search Technique from Event Sequences.

Author: Poongodi, K. and Kumar, Dhananjay
Subjects: *SEARCH algorithms, *RANK correlation (Statistics), *DATABASES, *STATISTICAL correlation
Abstract: A large event sequence can generate episode rules, which are patterns that help to identify the possible dependencies existing among event types. Frequent episodes occurring in a simple sequence of events are commonly used for mining the episodes from a sequential database. Mining serial positioning episode rules (MSPER) using a fixed-gap episode occurrence suffers from unsatisfactory scalability with complex sequences when testing whether an episode occurs in a sequence. A large number of redundant nodes is generated in the MSPER-trie-based data structure. In this paper, a forward and backward search algorithm (FBSA) is proposed to detect minimal occurrences of frequent peak episodes. An extensive correlation of parameter settings and the generating procedure of fixed-gap episodes are carried out. To generate a fixed-gap episode and estimate the variance that decides the parameter selection in event sequences, Spearman's correlation coefficient is used for verifying the sequence of occurrences of the episodes. MFSPER with FBSA is developed to eliminate the frequent sequence scans and redundant event sets. The MFSPER–FBSA stores the minimal occurrences of frequent peak episodes from the event sequences. The experimental evaluation on benchmark datasets shows that the proposed technique outperforms the existing methods with respect to memory, execution time, recall and precision. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

11. Robust IoT Malware Detection and Classification Using Opcode Category Features on Machine Learning

Author: Hyunjong Lee, Sooin Kim, Dongheon Baek, Donghoon Kim, and Doosung Hwang
Subjects: IoT malware, machine learning, opcode category, sequence mining, visualization, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Technology advancements have led to the use of millions of IoT devices. However, IoT devices are being exploited as an entry point due to security flaws caused by resource constraints. IoT malware is being discovered in a variety of types. The purpose of this study is to investigate whether IoT malware can be distinguished from benign files and whether various malware family types can be classified. We propose fixed-length and low-dimensional features using opcode category information on ML models. The binary IoT dataset for this study is converted into opcodes to create features. The opcodes are grouped into 6 or 11 categories according to their functionality. Features are created using a sequence of opcode categories and the entropy values of opcode categories. These features can be visualized by using a 2D image in order to observe patterns. We evaluate our proposed features on various ML models (5-NN, SVM, Decision Tree, and Random Forest) and MLP with various performance metrics, such as Accuracy, Precision, Recall, F1-score, MCC, AUC-ROC, and AUC-PR. The performance results for malware detection and classification have an accuracy over 98.0%. The experiments have demonstrated that the features we’ve proposed are effective and robust for identifying different types of IoT malware and benign files.
Published: 2023
Full Text: View/download PDF
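
The IoT malware abstract above builds features from opcode categories and their entropy values. A minimal sketch of one such quantity, the Shannon entropy of the opcode-category distribution, might look as follows. The `CATEGORY` mapping is a made-up placeholder (the paper's 6- and 11-category schemes are not given in the abstract), and `category_entropy` is a hypothetical name chosen for this example.

```python
import math
from collections import Counter

# Hypothetical opcode-to-category mapping, for illustration only.
# The abstract does not list the actual categories used in the paper.
CATEGORY = {
    "mov": "data", "push": "data",
    "add": "arith", "sub": "arith",
    "jmp": "control", "call": "control",
}


def category_entropy(opcodes):
    """Shannon entropy, in bits, of the opcode-category distribution.

    Opcodes missing from CATEGORY are counted under a catch-all "other".
    """
    cats = [CATEGORY.get(op, "other") for op in opcodes]
    total = len(cats)
    if total == 0:
        return 0.0
    counts = Counter(cats)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A sequence that uses a single category has entropy 0, while one split evenly across two categories has entropy 1 bit, so the value summarizes how mixed the instruction types are regardless of the length of the binary.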

12. Transferring effective learning strategies across learning contexts matters: A study in problem-based learning.

Author: Saqr, Mohammed, Matcha, Wannisa, Uzir, Nora'ayu Ahmad, Jovanović, Jelena, Gašević, Dragan, and López-Pernas, Sonsoles
Abstract: Learning strategies are important catalysts of students' learning. Research has shown that students with effective learning strategies are more likely to have better academic achievement. This study aimed to investigate students' adoption of learning strategies in different course implementations, the transfer of learning strategies between courses and relationship to performance. We took advantage of recent advances in learning analytics methods, namely sequence and process mining as well as statistical methods and visualisations to study how students regulate their online learning through learning strategies. The study included 81,739 log traces of students' learning related activities from two different problem-based learning medical courses. The results revealed that students who applied deep learning strategies were more likely to score high grades, and students who applied surface learning strategies were more likely to score lower grades in either course. More importantly, students who were able to transfer deep learning strategies or continue to use effective strategies between courses obtained higher scores, and were less likely to adopt surface strategies in the subsequent course. These results highlight the need for supporting the development of effective learning strategies in problem-based learning curricula so that students adopt and transfer effective strategies as they advance through the programme. Implications for practice or policy: • Teachers need to help students develop and transfer deep learning as they are directly related to success. • Students who continue to use light strategies are more at risk of low achievement and need to be supported. • Technology-supported problem-based learning requires more active scaffolding and teachers' support beyond "guide on the side" as in face-to-face. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

13. The temporal dynamics of online problem-based learning: Why and when sequence matters.

Author: Saqr, Mohammed and López-Pernas, Sonsoles
Subjects: PROBLEM-based learning, ONLINE education, GROUP dynamics, VIRTUAL communities, SOCIAL groups, SOCIAL interaction
Abstract: Early research on online PBL explored student satisfaction, effectiveness, and design. The temporal aspect of online PBL has rarely been addressed. Thus, a gap exists in our knowledge regarding how online PBL unfolds: when and for how long a group engages in collaborative discussions. Similarly, little is known about whether and what sequence of interactions could predict higher achievement. This study aims to bridge such a gap by implementing the latest advances in temporal learning analytics to analyze the sequential and temporal aspects of online PBL across a large sample (n = 204 students) of qualitatively coded interactions (8,009 interactions). We analyzed interactions at the group level to understand the group dynamics across whole problem discussions, and at the student level to understand the students' contribution dynamics across different episodes. We followed such analyses by examining the association of interaction types and the sequences thereof with students' performance using multilevel linear regression models. The analysis of the interactions reflected that the scripted PBL process followed a logical sequence, yet often lacked enough depth. When cognitive interactions (e.g., arguments, questions, and evaluations) occurred, they kindled high cognitive interactions; when low cognitive and social interactions dominated, they kindled low cognitive interactions. The order and sequence of interactions were more predictive of performance, and with a higher explanatory power as compared to frequencies. Starting or initiating interactions (even with low cognitive content) showed the highest association with performance, pointing to the importance of initiative and sequencing. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

14. A Sequence Mining-Based Novel Architecture for Detecting Fraudulent Transactions in Healthcare Systems

Author: Irum Matloob, Shoab Ahmed Khan, Rukaiya Rukaiya, Muazzam A. Khan Khattak, and Arslan Munir
Subjects: Fraudsters, health insurance, healthcare, medical benefits, premium, sequence mining, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: With the exponential rise in government and private health-supported schemes, the number of fraudulent billing cases is also increasing. Detection of fraudulent transactions in healthcare systems is an exigent task due to intricate relationships among dynamic elements, including doctors, patients, and services. Hence, to introduce transparency in health support programs, there is a need to develop intelligent fraud detection models for tracing the loopholes in existing procedures, so that the fraudulent medical billing cases can be accurately identified. Moreover, there is also a need to optimize both the cost burden for the service provider and medical benefits for the client. This paper presents a novel process-based fraud detection methodology to detect insurance claim-related frauds in the healthcare system using sequence mining concepts. Recent literature focuses on the amount-based analysis or medication versus disease sequential analysis rather than detecting frauds using sequence generation of services within each specialty. The proposed methodology generates frequent sequences with different pattern lengths. The confidence values and confidence level are computed for each sequence. The sequence rule engine generates frequent sequences along with confidence values for each hospital’s specialty and compares them with the actual patient values. This identifies anomalies as both sequences would not be compliant with the rule engine’s sequences. The process-based fraud detection methodology is validated using the last five years of a local hospital’s transactional data that includes many reported cases of fraudulent activities.
Published: 2022
Full Text: View/download PDF
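
The healthcare fraud abstract above refers to frequent sequences with confidence values that are compared against actual patient sequences. One common way to define such values is sketched below, under stated assumptions: support is taken as the fraction of patient sequences containing a pattern as a contiguous run of services, and confidence as the support of the extended pattern divided by the support of its antecedent. The abstract does not give the paper's exact definitions, and the function names and example services are hypothetical.

```python
def contains(seq, pattern):
    """True if `pattern` occurs as a contiguous run inside `seq`."""
    k = len(pattern)
    return any(tuple(seq[i:i + k]) == tuple(pattern) for i in range(len(seq) - k + 1))


def support(sequences, pattern):
    """Fraction of sequences that contain `pattern` (assumed definition)."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)


def confidence(sequences, antecedent, consequent):
    """support(antecedent followed by consequent) / support(antecedent)."""
    base = support(sequences, antecedent)
    if base == 0:
        return 0.0
    return support(sequences, list(antecedent) + list(consequent)) / base


# Made-up service sequences for one specialty, for illustration only.
visits = [
    ["consult", "xray", "cast"],
    ["consult", "xray", "surgery"],
    ["consult", "bloodtest"],
    ["xray", "cast"],
]
```

With this data, `confidence(visits, ["consult", "xray"], ["cast"])` is 0.5, since half of the sequences that contain a consultation followed by an X-ray continue with a cast. A patient sequence whose continuation has very low confidence could then be flagged for review.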

15. nTreeClus: A tree-based sequence encoder for clustering categorical series.

Author: Jahanshahi, Hadi and Baydogan, Mustafa Gokce
Subjects: *TIME series analysis, *AUTOREGRESSIVE models, *AMINO acid sequence, *DECISION trees, *CHANNEL coding, *NOMOGRAPHY (Mathematics)
Abstract: • A novel model-based clustering approach for sequential data, nTreeClus, is proposed. • nTreeClus introduces a Decision Tree Path encoder in an autoregressive manner. • The method's robustness to its only parameter (window size) has been examined. • nTreeClus shows competitive results compared to existing methods in sequence mining. The overwhelming presence of categorical/sequential data in diverse domains emphasizes the importance of sequence mining. The challenging nature of sequences proves the need for continuing research to find a more accurate and faster approach providing a better understanding of their (dis)similarities. This paper proposes a new model-based approach for clustering sequence data, namely nTreeClus. The proposed method deploys tree-based learners, k-mers, and autoregressive models for categorical time series, culminating with a novel numerical representation of the categorical sequences. Adopting this new representation, we cluster sequences, considering the inherent patterns in categorical time series. Accordingly, the model showed robustness to its parameter. Under different simulated scenarios, nTreeClus improved on the baseline methods for various internal and external cluster validation metrics by up to 10.7% and 2.7%, respectively. The empirical evaluation using synthetic and real datasets, protein sequences, and categorical time series showed that nTreeClus is competitive or superior to most state-of-the-art algorithms. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

16. A High-Level Representation of the Navigation Behavior of Website Visitors.

Author: Huidobro, Alicia, Monroy, Raúl, and Cervantes, Bárbara
Subjects: TASK analysis, NAVIGATION, BLOGS, WEB analytics
Abstract: Knowing how visitors navigate a website can lead to different applications, for example, providing a personalized navigation experience or identifying website failures. In this paper, we present a method for representing the navigation behavior of an entire class of website visitors in a moderately small graph, aiming to ease the task of web analysis, especially in marketing areas. Current solutions are mainly oriented to a detailed page-by-page analysis. Thus, obtaining a high-level abstraction of an entire class of visitors may involve the analysis of large amounts of data and become an overwhelming task. Our approach extracts the navigation behavior that is common among a certain class of visitors to create a graph that summarizes class navigation behavior and enables a contrast of classes. The method works by representing website sessions as the sequence of visited pages. Sub-sequences of visited pages of common occurrence are identified as "rules". Then, we replace those rules with a symbol that is given a representative name and use it to obtain a shrunken representation of a session. Finally, this shrunken representation is used to create a graph of the navigation behavior of a visitor class (group of visitors relevant to the desired analysis). Our results show that a few rules are enough to capture a visitor class. Since each class is associated with a conversion, a marketing expert can easily find out what makes classes different. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
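
The website navigation abstract above describes replacing frequent sub-sequences of visited pages ("rules") with named symbols to shorten a session. The sketch below illustrates that substitution step only, assuming the rules are already known and that matching is greedy, left to right, preferring the longest rule. The function name `shrink_session`, the matching strategy, and the example rule names are assumptions for this example rather than details from the paper.

```python
def shrink_session(session, rules):
    """Replace each occurrence of a rule (a tuple of pages) with its symbol.

    `rules` maps a tuple of consecutive pages to a symbol name.
    Matching is greedy: scan left to right and try longer rules first.
    Illustrative sketch only.
    """
    # Ignore empty patterns so the scan always advances.
    ordered = sorted((p for p in rules if p), key=len, reverse=True)
    out, i = [], 0
    while i < len(session):
        for pattern in ordered:
            if tuple(session[i:i + len(pattern)]) == pattern:
                out.append(rules[pattern])
                i += len(pattern)
                break
        else:
            # No rule starts here; keep the page as it is.
            out.append(session[i])
            i += 1
    return out


# Hypothetical rules and session, for illustration only.
rules = {
    ("home", "search", "product"): "BROWSE",
    ("cart", "checkout"): "PURCHASE",
}
session = ["home", "search", "product", "cart", "checkout", "home"]
```

Here `shrink_session(session, rules)` reduces the six-page session to three symbols, `BROWSE`, `PURCHASE`, and `home`, which is the kind of compact sequence from which a class-level graph could then be built.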

17. Students' active cognitive engagement with instructional videos predicts STEM learning.

Author: Kuhlmann, Shelbi L., Plumley, Robert, Evans, Zoe, Bernacki, Matthew L., Greene, Jeffrey A., Hogan, Kelly A., Berro, Michael, Gates, Kathleen, and Panter, Abigail
Abstract: The efficacy of well-designed instructional videos for STEM learning is largely reliant on how actively students cognitively engage with them. Students' ability to actively engage with videos likely depends upon individual characteristics like their prior knowledge. In this study, we investigated how digital trace data could be used as indicators of students' cognitive engagement with instructional videos, how such engagement predicted learning, and how prior knowledge moderated that relationship. One hundred twenty-eight biology undergraduate students learned with a series of instructional videos and took a biology unit exam one week later. We conducted sequence mining on the digital events of students' video-watching behaviors to capture the most commonly occurring sequences. Twenty-six sequences emerged and were aggregated into four groups indicative of cognitive engagement: repeated scrubbing, speed watching, extended scrubbing, and rewinding. Results indicated more active engagement via speed watching and rewinding behaviors positively predicted unit exam scores, but only for students with lower prior knowledge. These findings suggest that the ways students cognitively engage with videos predict how they will learn from them, that these relations are dependent upon their prior knowledge, and that researchers can measure students' cognitive engagement with instructional videos via mining digital log data. This research emphasizes the importance of active cognitive engagement with video interface tools and the need for students to accurately calibrate their learning behaviors in relation to their prior knowledge when learning from videos. • Log data was mined for behavioral sequences reflective of cognitive engagement. • Engagement categories: speed watching, rewinding, frequent and extended scrubbing. • Benefit of speed watching on learning decreases for higher prior knowledge students. • Benefit of rewinding on learning increases for lower prior knowledge students. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

18. Bringing Synchrony and Clarity to Complex Multi-Channel Data: A Learning Analytics Study in Programming Education

Author: Sonsoles Lopez-Pernas and Mohammed Saqr
Subjects: Learning analytics, programming, computer science education, sequence mining, Hidden Markov Models, automated assessment, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Supporting teaching and learning programming with learning analytics is an active area of inquiry. Most data used for learning analytics research comes from learning management systems. However, such systems were not developed to support learning programming. Therefore, educators have to resort to other systems that support the programming process, which can pose a challenge when it comes to understanding students’ learning since it takes place in different contexts. Methods that support the combination of different data sources are needed. Such methods would ideally account for the time-ordered sequence of students’ learning actions. In this article, we use a novel method (multi-channel sequence mining with Hidden Markov Models, HMMs) that allows the combination of multiple data sources, accounts for the temporal nature of students’ learning actions, and maps the transitions between different learning tactics. Our study included 291 students enrolled in a higher education programming course. Students’ trace-log data were collected from the learning management system and from a programming automated assessment tool. Data were analyzed using multi-channel sequence mining and HMM. The results reveal different patterns of students’ approaches to learning programming. High achievers start earlier to work on the programming assignments, use more independent strategies and consume learning resources more frequently, while the low achievers procrastinate early in the course and rely on help forums. Our findings demonstrate the potentials of multi-channel sequence mining and how this method can be analyzed using HMM. Furthermore, the results obtained can be of use for educators to understand students’ strategies when learning programming.
Published: 2021
Full Text: View/download PDF

19. Using Sequence Mining Techniques for Understanding Incorrect Behavioral Patterns on Interactive Tasks.

Author: Ulitzsch, Esther, He, Qiwei, and Pohl, Steffi
Subjects: TASK analysis, TASKS, MINES & mineral resources, EDUCATIONAL evaluation
Abstract: Interactive tasks designed to elicit real-life problem-solving behavior are rapidly becoming more widely used in educational assessment. Incorrect responses to such tasks can occur for a variety of different reasons such as low proficiency levels, low metacognitive strategies, or motivational issues. We demonstrate how behavioral patterns associated with incorrect responses can, in part, be understood, supporting insights into the different sources of failure on a task. To this end, we make use of sequence mining techniques that leverage the information contained in time-stamped action sequences commonly logged in assessments with interactive tasks for (a) investigating what distinguishes incorrect behavioral patterns from correct ones and (b) identifying subgroups of examinees with similar incorrect behavioral patterns. Analyzing a task from the Programme for the International Assessment of Adult Competencies 2012 assessment, we find incorrect behavioral patterns to be more heterogeneous than correct ones. We identify multiple subgroups of incorrect behavioral patterns, which point toward different levels of effort and lack of different subskills needed for solving the task. Albeit focusing on a single task, meaningful patterns of major differences in how examinees approach a given task that generalize across multiple tasks are uncovered. Implications for the construction and analysis of interactive tasks as well as the design of interventions for complex problem-solving skills are derived. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

20. Weighted frequent sequential pattern mining.

Author: Islam, Md Ashraful, Rafi, Mahfuzur Rahman, Azad, Al-amin, and Ovi, Jesan Ahammed
Subjects: SEQUENTIAL pattern mining, DATA mining, MINES & mineral resources, PATTERNS (Mathematics)
Abstract: Trillions of bytes of data are generated every day in different forms, and extracting useful information from that massive amount of data is the study of data mining. Sequential pattern mining is a major branch of data mining that deals with mining frequent sequential patterns from sequence databases. Due to items having different importance in real-life scenarios, they cannot be treated uniformly. With today's datasets, the use of weights in sequential pattern mining is much more feasible. In most cases, as in real-life datasets, pushing weights will give a better understanding of the dataset, as it will also measure the importance of an item inside a pattern rather than treating all the items equally. Many techniques have been introduced to mine weighted sequential patterns, but typically these algorithms generate a massive number of candidate patterns and take a long time to execute. This work aims to introduce a new pruning technique and a complete framework that takes much less time and generates a small number of candidate sequences without compromising completeness. Performance evaluation on real-life datasets shows that our proposed approach can mine weighted patterns substantially faster than other existing approaches. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

21. Sequence Mining and Prediction-Based Healthcare Fraud Detection Methodology

Author: Irum Matloob, Shoab Ahmed Khan, and Habib Ur Rahman
Subjects: Anomaly, fraudsters, sequence mining, sequence prediction, probability, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: This article presents a novel methodology to detect insurance-claim-related fraud in the healthcare system using concepts of sequence mining and sequence prediction. Fraud detection in healthcare is a non-trivial task due to the heterogeneous nature of healthcare records. Fraudsters behave as normal patients and, over time, keep changing the way they commit fraud; hence, there is a need to develop fraud detection models. Sequence generation has not been part of previous research, which mostly focuses on amount-based analysis or medication-versus-disease sequential analysis. The proposed methodology generates sequences of services availed or prescribed by each specialty and analyses them via two cascaded checks for the detection of insurance-claim-related fraud. The methodology addresses these challenges and self-learns from historical medical records. It is based on two modules, namely a “sequence rule engine” and a “prediction-based engine”. The sequence rule engine generates frequent sequences and probabilities of rare sequences for each specialty of the hospital. Comparing these sequences with the actual patient sequences identifies anomalies, namely patient sequences that are not compliant with the sequences of the rule engine. The system then performs a more detailed analysis of all non-compliant sequences in the prediction-based engine. The proposed methodology is validated by generating patient sequences from the last five years of transactional data of a local hospital, and it identifies patterns of service procedures administered to patients using the PrefixSpan algorithm and a compact prediction tree. Various experiments have been performed to validate the applicability of the developed methodology, and the results demonstrate that it is pertinent to detecting healthcare fraud and provides 85% accuracy on average. It can thus help prevent fraudulent claims and provide better insight into how to improve patient management and treatment procedures.
Published: 2020
Full Text: View/download PDF

22. Automated gadget discovery in the quantum domain

Author: Lea M Trenkwalder, Andrea López-Incera, Hendrik Poulsen Nautrup, Fulvio Flamini, and Hans J Briegel
Subjects: reinforcement learning, machine learning, sequence mining, quantum optics, quantum information, Computer engineering. Computer hardware, TK7885-7895, Electronic computers. Computer science, QA75.5-76.95
Abstract: In recent years, reinforcement learning (RL) has become increasingly successful in its application to the quantum domain and the process of scientific discovery in general. However, while RL algorithms learn to solve increasingly complex problems, interpreting the solutions they provide becomes ever more challenging. In this work, we gain insights into an RL agent’s learned behavior through a post-hoc analysis based on sequence mining and clustering. Specifically, frequent and compact subroutines, used by the agent to solve a given task, are distilled as gadgets and then grouped by various metrics. This process of gadget discovery develops in three stages: First, we use an RL agent to generate data, then, we employ a mining algorithm to extract gadgets and finally, the obtained gadgets are grouped by a density-based clustering algorithm. We demonstrate our method by applying it to two quantum-inspired RL environments. First, we consider simulated quantum optics experiments for the design of high-dimensional multipartite entangled states where the algorithm finds gadgets that correspond to modern interferometer setups. Second, we consider a circuit-based quantum computing environment where the algorithm discovers various gadgets for quantum information processing, such as quantum teleportation. This approach for analyzing the policy of a learned agent is agent and environment agnostic and can yield interesting insights into any agent’s policy.
Published: 2023
Full Text: View/download PDF

23. Comparative analysis of real-world data of frequent treatment sequences in metastatic prostate cancer.

Author: Jaipuria J, Kaur I, Doja MN, Ahmad T, Singh A, Rawal SK, Talwar V, and Sharma G
Abstract: Background: The incidence of prostate cancer is increasing worldwide. A significant proportion of patients develop metastatic disease and are initially prescribed androgen deprivation therapy (ADT). However, subsequent sequences of treatments in real-world settings that may improve overall survival remain an area of active investigation. Materials and Methods: Data were collected from 384 patients presenting with de novo metastatic prostate cancer from 2011 to 2015 at a tertiary cancer center. Patients were categorized into surviving (n = 232) and deceased (n = 152) groups at the end of 3 years. Modified sequence pattern mining techniques (Generalized Sequential Pattern Mining and Sequential Pattern Discovery using Equivalence Classes) were applied to determine the exact order of the most frequent sets of treatments in each group. Results: Degarelix, as the initial form of ADT, was found uniquely in the surviving group. The sequence of ADT followed by abiraterone and docetaxel was uniquely associated with a higher 3-year overall survival. Orchiectomy followed by fosfestrol was found to have a unique niche among surviving patients with a long duration of response to the initial ADT. Patients who received chemotherapy followed by radiotherapy and those who received radiotherapy followed by chemotherapy were found more frequently in the deceased group. Conclusions: We identified unique treatment sequences among surviving and deceased patients at the end of 3 years. Degarelix should be the preferred form of ADT. Patients who received ADT followed by abiraterone and chemotherapy showed better results. Patients requiring palliative radiation and chemotherapy in any sequence were significantly more frequent in the deceased group, identifying the need to offer such patients the most efficacious agents and to target them in clinical trial design. Competing Interests: No conflict of interest has been declared by the author. (Copyright © 2023 The Authors. Published by Wolters Kluwer Health, Inc.)
Published: 2024
Full Text: View/download PDF

24. The relational, co-temporal, contemporaneous, and longitudinal dynamics of self-regulation for academic writing.

Author: Saqr, Mohammed, Peeters, Ward, and Viberg, Olga
Subjects: ACADEMIC discourse, SELF regulation, ENGLISH as a foreign language, HYPERSONIC aerodynamics, WRITING processes
Abstract: Writing in an academic context often requires students in higher education to acquire a new set of skills while familiarising themselves with the goals, objectives and requirements of the new learning environment. Students' ability to continuously self-regulate their writing process, therefore, is seen as a determining factor in their learning success. In order to study students' self-regulated learning (SRL) behaviour, research has increasingly been tapping into learning analytics (LA) methods in recent years, making use of multimodal trace data that can be obtained from students writing and working online. Nevertheless, little is still known about the ways students apply and govern SRL processes for academic writing online, and about how their SRL behaviour might change over time. To provide new perspectives on the use of LA approaches to examine SRL, this study applied a range of methods to investigate what they could tell us about the evolution of SRL tactics and strategies on a relational, co-temporal, contemporaneous and longitudinal level. The data originates from a case study in which a private Facebook group served as an online collaboration space in a first-year academic writing course for foreign language majors of English. The findings show that learners use a range of SRL tactics to manage their writing tasks and that different tactics can take up key positions in this process over time. Several shifts could be observed in students' behaviour, from mainly addressing content-specific topics to more form-specific and social ones. Our results have also demonstrated that different methods can be used to study the relational, co-temporal, contemporaneous, and longitudinal dynamics of self-regulation in this regard, demonstrating the wealth of insights LA methods can bring to the table. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

25. Abnormal Event Correlation and Detection Based on Network Big Data Analysis.

Author: Zhichao Hu, Xiangzhan Yu, Jiantao Shi, and Lin Ye
Subjects: DATA analysis, DATA mining, SECURITY systems, BIG data, CRYPTOCURRENCY mining, ALARMS
Abstract: With the continuous development of network technology, various large-scale cyber-attacks continue to emerge. These attacks pose a severe threat to the security of systems, networks, and data. Therefore, how to mine attack patterns from massive data and detect attacks are urgent problems. In this paper, an approach for attack mining and detection is proposed that performs tasks of alarm correlation, false-positive elimination, attack mining, and attack prediction. Based on the idea of CluStream, the proposed approach implements a flow clustering method and a two-step algorithm that guarantees efficient streaming and clustering. The context of an alarm in the attack chain is analyzed and the LightGBM method is used to perform false-positive recognition with high accuracy. To accelerate the search for the filtered alarm sequence data to mine attack patterns, the PrefixSpan algorithm is also updated in the store strategy. The updated PrefixSpan increases the processing efficiency and achieves a better result than the original one in experiments. With Bayesian theory, the transition probability for the sequence pattern string is calculated and the alarm transition probability table is constructed to draw the attack graph. Finally, a long short-term memory network and a word-vector embedding method are used to perform online prediction. Results of numerical experiments show that the method proposed in this paper has a strong practical value for attack detection and prediction. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

26. A High-Level Representation of the Navigation Behavior of Website Visitors

Author: Alicia Huidobro, Raúl Monroy, and Bárbara Cervantes
Subjects: web analytics, web log mining, clickstream analysis, sequence mining, sequitur, graph techniques, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
Abstract: Knowing how visitors navigate a website can lead to different applications, for example providing a personalized navigation experience or identifying website failures. In this paper, we present a method for representing the navigation behavior of an entire class of website visitors in a moderately small graph, aiming to ease the task of web analysis, especially in marketing areas. Current solutions are mainly oriented to a detailed page-by-page analysis. Thus, obtaining a high-level abstraction of an entire class of visitors may involve the analysis of large amounts of data and become an overwhelming task. Our approach extracts the navigation behavior that is common among a certain class of visitors to create a graph that summarizes class navigation behavior and enables a contrast of classes. The method works by representing website sessions as the sequence of visited pages. Sub-sequences of visited pages of common occurrence are identified as “rules”. Then, we replace those rules with a symbol that is given a representative name and use it to obtain a shrunken representation of a session. Finally, this shrunken representation is used to create a graph of the navigation behavior of a visitor class (group of visitors relevant to the desired analysis). Our results show that a few rules are enough to capture a visitor class. Since each class is associated with a conversion, a marketing expert can easily find out what makes classes different.
Published: 2022
Full Text: View/download PDF

27. MIMVOGUE: modeling Indian music using a variable order gapped HMM.

Author: Mor, Bhavya, Garhwal, Sunita, and Kumar, Ajay
Subjects: HIDDEN Markov models, MUSICAL composition
Abstract: Computer-assisted music composition has been an active research area since the mid-1900s. In this paper, we apply the VOGUE model to design musical sequences of bandish notations of raga Bhairav, from classical Indian music. The Variable Order and Gapped hidden Markov model for unstructured elements (VOGUE) can capture variable-length dependencies with variable gaps in sequential data. In most ragas, a particular pattern repeats itself, with the repetitions separated by variable-length gaps. VOGUE mines the frequent patterns in a raga that have gaps of different lengths. These mined patterns are used to build the VOGUE model for Indian music ragas. Furthermore, we analyze the benefits of the VOGUE model over the standard HMM. To the best of the authors' knowledge, this is the very first attempt to model Indian classical music with a variable order gapped HMM. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

28. What’s Next? A Recommendation System for Industrial Training

Author: Rajiv Srivastava, Girish Keshav Palshikar, Saheb Chaurasia, and Arati Dixit
Subjects: Personalized recommendation, Sequence matching, Sequence mining, Industrial training, Information technology, T58.5-58.64, Electronic computers. Computer science, QA75.5-76.95
Abstract: Continuous training is crucial for creating and maintaining the right skill-profile for the industrial organization’s workforce. There is a tremendous variety in the available trainings within an organization: technical, project management, quality, leadership, domain-specific, soft-skills, etc. Hence it is important to assist the employee in choosing the best trainings, which perfectly suit her background, project needs and career goals. In this paper, we focus on algorithms for training recommendation in an industrial setting. We formalize the problem of next training recommendation, taking into account the employee’s training and work history. We present several new unsupervised sequence mining algorithms to mine the past trainings data from the organization for arriving at personalized next training recommendation. Using the real-life data about trainings of 118,587 employees over 5019 distinct trainings from a large multi-national IT organization, we show that these algorithms outperform several standard recommendation engine algorithms as well as those based on standard sequence mining algorithms.
Published: 2018
Full Text: View/download PDF

29. ST Sequence Miner: visualization and mining of spatio-temporal event sequences.

Author: Koseoglu, Baran, Kaya, Erdem, Balcisoy, Selim, and Bozkaya, Burcin
Subjects: *SEQUENTIAL pattern mining, *VISUAL analytics, *SEQUENCE analysis, *DATA visualization, *ALGORITHMS
Abstract: As a promising field of research, event sequence analysis seems to assist in facilitating clear reasoning behind human decisions by mining reality behind the sequential actions. Mining frequent patterns from event sequences has proved to be promising in extracting actionable insights, which plays an important role in many application domains. Much of the related work challenges the problem solely from the temporal perspective omitting the information that could be gained from the spatial part. This could be in part due to the fact that analysis of event sequences with references to both time and space is attributed as a challenging task due to the additional variance in the data introduced by the spatial aspect. We propose a visual analytics approach that incorporates spatio-temporal pattern extraction leveraging an extended sequential pattern mining algorithm and a pattern discovery guidance mechanism operating on geographic query and selection capabilities. As an implementation of our approach, we introduce a visual analytics tool, namely ST Sequence Miner, enabling event pattern exploration in time-location space. We evaluate our approach over a credit card transaction dataset by adopting case study methodology. Our study unveils that patterns mined from event sequences can better explain possible relationships with proper visualization of time-location data. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

30. Efficient Mining of Outlying Sequence Patterns for Analyzing Outlierness of Sequence Data.

Author: TINGTING WANG, LEI DUAN, GUOZHU DONG, and ZHIFENG BAO
Subjects: SEQUENTIAL pattern mining, HEURISTIC, DATA
Abstract: Recently, a lot of research work has been proposed in different domains to detect outliers and analyze the outlierness of outliers for relational data. However, while sequence data is ubiquitous in real life, analyzing the outlierness for sequence data has not received enough attention. In this article, we study the problem of mining outlying sequence patterns in sequence data addressing the question: given a query sequence s in a sequence dataset D, the objective is to discover sequence patterns that will indicate the most unusualness (i.e., outlierness) of s compared against other sequences. Technically, we use the rank defined by the average probabilistic strength (aps) of a sequence pattern in a sequence to measure the outlierness of the sequence. Then a minimal sequence pattern where the query sequence is ranked the highest is defined as an outlying sequence pattern. To address the above problem, we present OSPMiner, a heuristic method that computes aps by incorporating several pruning techniques. Our empirical study using both real and synthetic data demonstrates that OSPMiner is effective and efficient. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

31. Hybridization of population-based ant colony optimization via data mining.

Author: Adak, Zeynep and Demiriz, Ayhan
Subjects: *DATA mining, *QUADRATIC assignment problem, *TRAVELING salesman problem, *ASSOCIATION rule mining, *PATTERNS (Mathematics)
Abstract: We propose a hybrid application of Population Based Ant Colony Optimization that uses a data mining procedure to wisely initialize the pheromone entries. Hybridization of metaheuristics with data mining techniques has been studied by several researchers in recent years. In this line of research, frequent patterns in a number of initial high-quality solutions are extracted to guide the subsequent iterations of an algorithm, which results in an improvement in solution quality and computational time. Our proposal possesses certain differences from and contributions to existing literature. Instead of one single run that incorporates both the main metaheuristic and the data mining module inside, we propose to carry out independent runs and collect elite sets over these trials. Another contribution is the way we use the knowledge gained from the application of the data mining module. The extracted knowledge is used to initialize the memory model in the algorithm rather than to construct new initial solutions. One additional contribution is the use of a path mining algorithm (a specific sequence mining algorithm) rather than Apriori-like association mining algorithms. Computational experiments, conducted both on symmetric Travelling Salesman Problem and symmetric/asymmetric Quadratic Assignment Problem instances, showed that our proposal produces significantly better results, and is more robust than pure applications of population-based ant colony optimization. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

32. Phylogenetic review of the comb-tooth blenny genus Hypleurochilus in the northwest Atlantic and Gulf of Mexico.

Author: Carter, Joshua E., Sporre, Megan A., and Eytan, Ron I.
Subjects: *NUMBERS of species, *REEF fishes, *BIOGEOGRAPHY, *BIOMASS, *PHYLOGENY, *DRILLING platforms, *CHLOROPLAST DNA
Abstract: Highlights: • Hypleurochilus delimited into two clades and one lineage. • The phylogenetic relationships within Hypleurochilus reflect biogeographic breaks. • Single-locus delimitation fails to resolve recently diverged species. • First report of sister relationship between H. caudovittatus and H. multifilis. • Mined sequence data supports the hypothesis of a range expansion of H. aequipinnis. As some of the smallest vertebrates, yet largest producers of consumed reef biomass, cryptobenthic reef fishes serve a disproportionate role in reef ecosystems and are one of the most poorly understood groups of fish. The blenny genera Hypleurochilus and Parablennius are currently considered paraphyletic and the interrelationships of Parablennius have been the focus of recent phylogenetic studies. However, the interrelationships of Hypleurochilus remain understudied. This genus is transatlantically distributed and comprises 11 species with a convoluted taxonomic history. In this study, relationships for ten Hypleurochilus species are resolved using multi-locus nuclear and mtDNA sequence data, morphological data, and mined COI barcode data. Mitochondrial and nuclear sequence data from 61 individuals collected from the western Atlantic and northern Gulf of Mexico (N. GoM) delimit seven species into a temperate clade, a tropical clade, and a third distinct lineage. This lineage, herein referred to as H. cf. aequipinnis, may represent a species of Hypleurochilus whose range has expanded into the N. GoM. Inclusion of publicly available COI sequence for an additional three species provides further phylogenetic resolution. H. bananensis forms a new eastern Atlantic clade with H. cf. aequipinnis, providing further evidence for a western Atlantic range expansion. Single marker COI delimitation was unable to elucidate the relationships between H. springeri/H. pseudoaequipinnis and between H. multifilis/H. caudovittatus due to incomplete lineage sorting. Mitochondrial data are also unable to accurately resolve the placement of H. bermudensis. However, a comprehensive approach using multi-locus phylogenetic and species delimitation methods was able to resolve these relationships. While mining publicly available sequence data allowed for the inclusion of an increased number of species in the analysis and a more comprehensive phylogeny, it was not without drawbacks, as a handful of sequences are potentially mis-identified. Overall, we find that the recent divergence of some species within this genus and potential introgression events confound the results of single locus delimitation methods, yet a combination of single and multi-locus analyses has allowed for insights into the biogeography of this genus and uncovered a potential transatlantic range expansion. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

33. microTaboo: a general and practical solution to the k-disjoint problem

Author: Mohammed Al-Jaff, Eric Sandström, and Manfred Grabherr
Subjects: k-disjoint problem, Software, Sequence mining, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Background: A common challenge in bioinformatics is to identify short sub-sequences that are unique in a set of genomes or reference sequences, which can efficiently be achieved by k-mer (k consecutive nucleotides) counting. However, there are several areas that would benefit from a more stringent definition of “unique”, requiring that these sub-sequences of length W differ by more than k mismatches (i.e. a Hamming distance greater than k) from any other sub-sequence, which we term the k-disjoint problem. Examples include finding sequences unique to a pathogen for probe-based infection diagnostics; reducing off-target hits for re-sequencing or genome editing; detecting sequence (e.g. phage or viral) insertions; and multiple substitution mutations. Since both sensitivity and specificity are critical, an exhaustive, yet efficient solution is desirable. Results: We present microTaboo, a method that allows for efficient and extensive sequence mining of unique (k-disjoint) sequences of up to 100 nucleotides in length. On a number of simulated and real data sets ranging from microbe- to mammalian-size genomes, we show that microTaboo is able to efficiently find all sub-sequences of a specified length W that do not occur within a threshold of k mismatches in any other sub-sequence. We exemplify that microTaboo has many practical applications, including point substitution detection, sequence insertion detection, padlock probe target search, and candidate CRISPR target mining. Conclusions: microTaboo implements a solution to the k-disjoint problem in an alignment- and assembly-free manner. microTaboo is available for Windows, Mac OS X, and Linux, running Java 7 and higher, under the GNU GPLv3 license, at: https://MohammedAlJaff.github.io/microTaboo
Published: 2017
Full Text: View/download PDF
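
Illustrative note on record 33: the k-disjoint definition in the abstract above (a window of length W counts as unique only if its Hamming distance to every other window exceeds k) can be sketched with a brute-force check. This is a minimal sketch and not microTaboo's algorithm: the function names are hypothetical, and as a simplifying assumption the target is compared only against windows of the other sequences, not against other windows of the target itself.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def k_disjoint_windows(target: str, others: List[str], w: int, k: int) -> List[str]:
    """Return the length-w windows of `target` whose Hamming distance to
    every length-w window of every sequence in `others` is greater than k.

    Brute force: O(len(target) * total_len(others) * w). Hypothetical helper
    for illustration only.
    """
    unique_windows = []
    for i in range(len(target) - w + 1):
        window = target[i:i + w]
        is_unique = True
        for seq in others:
            for j in range(len(seq) - w + 1):
                if hamming(window, seq[j:j + w]) <= k:
                    is_unique = False
                    break
            if not is_unique:
                break
        if is_unique:
            unique_windows.append(window)
    return unique_windows


if __name__ == "__main__":
    # Windows of the target that differ from all windows of the reference
    # by more than one mismatch.
    print(k_disjoint_windows("AAAATTTT", ["AAAACCCC"], w=4, k=1))
```

Raising k makes the uniqueness criterion stricter, so fewer windows qualify; the quadratic cost of this naive scan is the reason an efficient exhaustive method is of interest.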

34. A pattern growth-based sequential pattern mining algorithm called prefixSuffixSpan

Author: Kenmogne Edith Belise, Tadmon Calvin, and Nkambou Roger
Subjects: sequence mining, sequential pattern, pattern-growth direction, pattern-growth ordering, search space, pruning, partitioning, Management information systems, T58.6-58.62
Abstract: Sequential pattern mining is an important data mining problem widely addressed by the data mining community, with a very large field of applications. Sequential pattern mining aims at extracting a set of attributes, shared across time among a large number of objects in a given database. The work presented in this paper is directed towards the general theoretical foundations of the pattern-growth approach. It supports an in-depth understanding of the pattern-growth approach, the current status of provided solutions, and the direction of research in this area. The study is carried out on a particular class of pattern-growth algorithms in which patterns are grown by extending either the current pattern prefix or the current pattern suffix from the same position at each growth step. This study leads to a new algorithm called prefixSuffixSpan. Its correctness is proven and experiments are performed.
Published: 2017
Full Text: View/download PDF
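
Illustrative note on record 34: the pattern-growth idea discussed in the abstract above can be sketched with plain prefix growth in the style of PrefixSpan. This is a minimal sketch under stated assumptions, not the prefixSuffixSpan algorithm of the paper: it grows prefixes only, each sequence element is a single item rather than an itemset, and the function name `prefix_growth` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def prefix_growth(db: Sequence[Sequence[str]], min_support: int) -> Dict[Tuple[str, ...], int]:
    """Mine frequent sequential patterns by growing prefixes.

    A projected database is stored as (sequence index, start position)
    pairs, with at most one pair per sequence, so its length equals the
    support of the current prefix.
    """
    results: Dict[Tuple[str, ...], int] = {}

    def mine(prefix: Tuple[str, ...], projected: List[Tuple[int, int]]) -> None:
        # For each item, record where the remaining suffix starts in each
        # sequence after the first occurrence of that item.
        occurrences: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
        for sid, start in projected:
            seen = set()
            for pos in range(start, len(db[sid])):
                item = db[sid][pos]
                if item not in seen:
                    seen.add(item)
                    occurrences[item].append((sid, pos + 1))
        for item in sorted(occurrences):
            new_projected = occurrences[item]
            if len(new_projected) >= min_support:
                new_prefix = prefix + (item,)
                results[new_prefix] = len(new_projected)
                mine(new_prefix, new_projected)  # grow the prefix further

    mine((), [(i, 0) for i in range(len(db))])
    return results


if __name__ == "__main__":
    database = [list("abc"), list("ac"), list("bc")]
    for pattern, support in prefix_growth(database, min_support=2).items():
        print(pattern, support)
```

Each recursive call scans only the suffixes that follow the current prefix, which is what lets pattern-growth methods avoid generating candidates that never occur in the data.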

35. Efficient model selection for predictive pattern mining model by safe pattern pruning.

Author: Yoshida T, Hanada H, Nakagawa K, Taji K, Tsuda K, and Takeuchi I
Abstract: Predictive pattern mining is an approach used to construct prediction models when the input is represented by structured data, such as sets, graphs, and sequences. The main idea behind predictive pattern mining is to build a prediction model by considering sub-structures, such as subsets, subgraphs, and subsequences (referred to as patterns), present in the structured data as features of the model. The primary challenge in predictive pattern mining lies in the exponential growth of the number of patterns with the complexity of the structured data. In this study, we propose the safe pattern pruning method to address the explosion of pattern numbers in predictive pattern mining. We also discuss how it can be effectively employed throughout the entire model building process in practical data analysis. To demonstrate the effectiveness of the proposed method, we conduct numerical experiments on regression and classification problems involving sets, graphs, and sequences. Competing Interests: The authors declare no competing interests. (© 2023 The Author(s).)
Published: 2023
Full Text: View/download PDF

36. Improving Customer Behaviour Prediction with the Item2Item model in Recommender Systems

Author: T. Nguyen and P. Do
Subjects: recommender systems, sequence mining, item2item, Computer engineering. Computer hardware, TK7885-7895, Systems engineering, TA168
Abstract: Recommender Systems are the most well-known applications in E-commerce sites. However, the trade-off between runtime and the accuracy in making recommendations is a big challenge. This work combines several traditional techniques to reduce the limitation of each single technique and exploits the Item2Item model to improve the prediction accuracy. As a case study, this paper focuses on user behaviour prediction in restaurant recommender systems and uses a public dataset including restaurant information and user sessions. Within this dataset, user behaviour can be discovered for the collaborative filtering, and restaurant information is extracted for the content-based filtering. The idea of the pre-trained word embedding in Natural Language Processing is utilized in the item-based collaborative filtering to find the similarity between restaurants based on user sessions. Experimental results have shown that the combination of these techniques makes valuable recommendations.
Published: 2018
Full Text: View/download PDF

37. Hybrid ASP-based Approach to Pattern Mining.

Author: PARAMONOV, SERGEY, STEPANOVA, DARIA, and MIETTINEN, PAULI
Subjects: LOGIC programming, DATA mining, AUTOMATIC extracting (Information science), RULE-based programming, SEQUENTIAL pattern mining
Abstract: Detecting small sets of relevant patterns from a given data set is a central challenge in data mining. The relevance of a pattern is based on user-provided criteria; typically, all patterns that satisfy certain criteria are considered relevant. Rule-based languages like answer set programming (ASP) seem well suited for specifying such criteria in a form of constraints. Although progress has been made, on the one hand, on solving individual mining problems and, on the other hand, developing generic mining systems, the existing methods focus either on scalability or on generality. In this paper, we make steps toward combining local (frequency, size, and cost) and global (various condensed representations like maximal, closed, and skyline) constraints in a generic and efficient way. We present a hybrid approach for itemset, sequence, and graph mining which exploits dedicated highly optimized mining systems to detect frequent patterns and then filters the results using declarative ASP. To further demonstrate the generic nature of our hybrid framework, we apply it to a problem of approximately tiling a database. Experiments on real-world data sets show the effectiveness of the proposed method and computational gains for itemset, sequence, and graph mining, as well as approximate tiling. Under consideration in Theory and Practice of Logic Programming. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

38. An efficient pixel clustering-based method for mining spatial sequential patterns from serial remote sensing images.

Author: Wu, Xiaozhu and Zhang, Ximei
Subjects: *REMOTE sensing, *PIXELS, *SEQUENTIAL analysis, *URBAN growth, *RUN-length encoding
Abstract: The accumulation of serial remote sensing images provides plentiful data for discovering sequential spatial patterns in various fields such as agricultural monitoring, urban development, and vegetation cover. However, traditional sequential pattern-mining algorithms cannot be directly or efficiently applied to remote sensing images. In this study, we propose a pixel clustering-based method to improve the efficiency of mining spatial sequential patterns from raster serial remote sensing images (SRSI). Firstly, the images are compressed by using the Run-Length coding schema. Then, pixels with identical sequences are clustered by means of the Run-length code-based spatial overlay operation. Finally, a pruning strategy is proposed, to extend the prefixSpan algorithm to skip unnecessary database scanning when mining from pixel groups. The experimental results indicate that the method presented in this paper could extract spatial sequential patterns from SRSI efficiently. Although accurate support rates for the patterns may not be obtained, our method could ensure that all patterns are extracted with a lower time cost. Highlights: • A Run-Length Coding based pixel clustering method is proposed. • A pruning strategy to extend the prefixSpan algorithm is proposed. • The proposed method is efficient for mining spatial sequential patterns. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF
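Two ideas from the abstract of item 38 can be shown in a few lines: run-length coding of a raster row, and grouping pixels whose label sequences over time are identical so that each distinct sequence is mined once with a weight. The sketch below is a naive per-pixel version, not the run-length overlay operation the authors describe. The function names and the tiny two-image example (labels "F" and "U") are assumptions made for illustration.

```python
from collections import Counter
from itertools import groupby
from typing import List, Sequence, Tuple


def run_length_encode(row: Sequence[str]) -> List[Tuple[str, int]]:
    """Compress a raster row into (value, run length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(row)]


def group_identical_sequences(images: List[List[List[str]]]) -> Counter:
    """Map each distinct per-pixel label sequence to the number of pixels sharing it.

    `images` holds one equally sized 2-D grid per time step.
    """
    counts: Counter = Counter()
    rows, cols = len(images[0]), len(images[0][0])
    for r in range(rows):
        for c in range(cols):
            counts[tuple(img[r][c] for img in images)] += 1
    return counts


# Hypothetical 2x2 land-cover grids at two time steps.
t1 = [["F", "F"], ["F", "U"]]
t2 = [["F", "U"], ["F", "U"]]

groups = group_identical_sequences([t1, t2])
encoded = run_length_encode(["F", "F", "U", "U", "U"])
```

A sequential pattern miner can then scan the three distinct sequences in `groups` instead of all four pixels, using the counts as weights.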

39. Estimating the selectivity of LIKE queries using pattern-based histograms.

Author: AYTİMUR, Mehmet and ÇAKMAK, Ali
Subjects: *HISTOGRAMS, *COST estimates, *DATABASE management, *SQL, *DATA mining
Abstract: Accurate cost and time estimation of a query is one of the major success indicators for database management systems. SQL allows the expression of flexible queries on text-formatted data. The LIKE operator is used to search for a specified pattern (e.g., LIKE "luck%") in a string database. It is vital to estimate the selectivity of such flexible predicates for the query optimizer to choose an efficient execution plan. In this paper, we study the problem of estimating the selectivity of a LIKE query predicate over a bag of strings. We propose a new type of pattern-based histogram structure to summarize the data distribution in a particular column. More specifically, we first mine sequential patterns over a given string database and then construct a special histogram out of the mined patterns. During query optimization time, pattern-based histograms are exploited to estimate the selectivity of a LIKE predicate. The experimental results on a real dataset from DBLP show that the proposed technique outperforms the state of the art for generic LIKE queries like %s1%s2%...%sn% where si represents one or more characters. What is more, the proposed histogram structure requires more than two orders of magnitude smaller memory space, and the estimation time is almost an order of magnitude less in comparison to the state of the art. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF
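To make the quantity in item 39 concrete, the sketch below computes the exact selectivity of a LIKE predicate over a bag of strings, which is the ground truth that a histogram-based estimator approximates. It is not the pattern-based histogram itself. It assumes only the `%` and `_` wildcards, with no ESCAPE clause and case-sensitive matching. The function names and the sample strings are hypothetical.

```python
import re
from typing import Iterable


def like_to_regex(pattern: str) -> "re.Pattern":
    """Translate a SQL LIKE pattern into an equivalent compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # any run of characters, including none
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like_selectivity(strings: Iterable[str], pattern: str) -> float:
    """Fraction of strings that satisfy `LIKE pattern`."""
    rows = list(strings)
    if not rows:
        return 0.0
    regex = like_to_regex(pattern)
    return sum(1 for s in rows if regex.fullmatch(s)) / len(rows)


column = ["lucky", "luck", "unlucky", "duck"]
prefix_selectivity = like_selectivity(column, "luck%")
generic_selectivity = like_selectivity(column, "%l%k%")
```

The second call uses the generic `%s1%s2%` shape mentioned in the abstract. Computing it exactly requires a full scan, which is the cost an estimator is meant to avoid.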

40. Applying Learning Analytics to Detect Sequences of Actions and Common Errors in a Geometry Game

Author: Manuel J. Gomez, José A. Ruipérez-Valiente, Pedro A. Martínez, and Yoon Jeon Kim
Subjects: educational games, learning analytics, game-based assessment, sequence mining, visualization dashboard, Chemical technology, TP1-1185
Abstract: Games have become one of the most popular activities across cultures and ages. There is ample evidence that supports the benefits of using games for learning and assessment. However, incorporating game activities as part of the curriculum in schools remains limited. Among the barriers to broader adoption in classrooms are the lack of actionable assessment data, the fact that teachers often do not have a clear sense of how students are interacting with the game, and uncertainty about whether the gameplay is leading to productive learning. To address this gap, we seek to provide sequence and process mining metrics to teachers that are easily interpretable and actionable. More specifically, we build our work on top of Shadowspect, a three-dimensional geometry game that has been developed to measure geometry skills as well as other cognitive and noncognitive skills. We use data from its implementation across schools in the U.S. to implement two sequence and process mining metrics in an interactive dashboard for teachers. The final objective is to help teachers understand the sequences of actions and common errors of students using Shadowspect so they can better understand the process, make proper assessments, and conduct personalized interventions when appropriate.
Published: 2021
Full Text: View/download PDF

41. EMOSS: An Efficient Algorithm to Hide Sequential Patterns

Author: Olya Sadat Behbahani, Mir Mohsen Pedram, Kambiz Badie, and Babak Rahbarinia
Subjects: data mining, sequence mining, knowledge hiding, sequential pattern, Information technology, T58.5-58.64, Telecommunication, TK5101-6720, Electronic computers. Computer science, QA75.5-76.95
Abstract: Data mining extracts hidden knowledge from raw data, and sequence mining aims to find sequential patterns that are frequent in a database; publishing such data may therefore lead to the disclosure of private information about organizations or individuals. Knowledge hiding is the process of hiding sensitive knowledge previously extracted from the database, to ensure that it cannot be abused. This paper addresses the problem of sequential pattern hiding and proposes an efficient algorithm that uses a multi-objective approach to hide sequences while maintaining database fidelity as much as possible. It also shows that the proposed algorithm outperforms existing methods in terms of both speed and memory usage.
Published: 2015

42. Discovering Human Activities from Binary Data in Smart Homes

Author: Mohamed Eldib, Wilfried Philips, and Hamid Aghajan
Subjects: human activity discovery, smart homes, health monitoring, clustering, unsupervised learning, sequence mining, Chemical technology, TP1-1185
Abstract: With the rapid development of sensing technology, data mining, and machine learning for human health monitoring, it has become possible to monitor personal motion and vital signs in a manner that minimizes the disruption of an individual’s daily routine and helps individuals who have difficulty living independently at home. A primary difficulty that researchers confront is acquiring an adequate amount of labeled data for model training and validation purposes. Activity discovery therefore addresses the absence of activity labels using approaches based on sequence mining and clustering. In this paper, we introduce an unsupervised method for discovering activities from a network of motion detectors in a smart home setting. First, we present an intra-day clustering algorithm to find frequent sequential patterns within a day. As a second step, we present an inter-day clustering algorithm to find the common frequent patterns between days. Furthermore, we refine the patterns to have more compressed and defined cluster characterizations. Finally, we track the occurrences of various regular routines to monitor the functional health in an individual’s patterns and lifestyle. We evaluate our methods on two public data sets captured in real-life settings from two apartments during seven-month and three-month periods.
Published: 2020
Full Text: View/download PDF

43. Finding sequential patterns with TF-IDF metrics in health-care databases

Author: Kardkovács Zsolt T. and Kovács Gábor
Subjects: sequence mining, frequent sequential pattern, tf-idf, health care database, Electronic computers. Computer science, QA75.5-76.95
Abstract: Finding frequent sequential patterns has been defined as finding ordered lists of items that occur more times in a database than a user-defined threshold. For big and dense databases that contain very long sequences and large itemsets, such as medical case histories, algorithms based on this idea of counting occurrences output an enormous number of highly redundant frequent sequences, and are therefore simply impractical. Hence, there is a need for algorithms that perform frequent pattern search and prefiltering simultaneously. In this paper, we propose an algorithm that reinterprets the term support on a text mining basis. Experiments show that our method not only eliminates redundancy among the output sequences, but also scales much better with huge input data sizes. We apply our algorithm to mining medical databases to find which diagnoses are likely to lead to a certain future health condition.
Published: 2014
Full Text: View/download PDF
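The classical definition that item 43 starts from (a pattern is frequent when it occurs in more sequences than a user-defined threshold) can be sketched as follows. This shows only the baseline support count, not the TF-IDF reinterpretation the authors propose. The assumptions are that each sequence element is a single item and that a pattern may match with gaps. The strict ">" follows the wording of the abstract. The database is a hypothetical toy example.

```python
from typing import List, Sequence


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if `pattern` occurs in `sequence` in order, gaps allowed."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is enforced.
    return all(item in remaining for item in pattern)


def support(pattern: Sequence[str], database: List[Sequence[str]]) -> int:
    """Number of database sequences that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def is_frequent(pattern: Sequence[str], database: List[Sequence[str]],
                threshold: int) -> bool:
    """Frequent means occurring in more sequences than the threshold."""
    return support(pattern, database) > threshold


# Hypothetical toy database of three sequences.
database = [["A", "B", "C"], ["A", "C"], ["B", "A"]]
ac_support = support(["A", "C"], database)
```

On long, dense sequences almost every short pattern passes such a count, which is the redundancy problem the abstract describes.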

44. What’s Next? A Recommendation System for Industrial Training.

Author: Srivastava, Rajiv, Palshikar, Girish Keshav, Chaurasia, Saheb, and Dixit, Arati
Subjects: ONLINE education, INDUSTRIAL organization (Economic theory), PROJECT management, INFORMATION economy, K-nearest neighbor classification
Abstract: Continuous training is crucial for creating and maintaining the right skill-profile for the industrial organization’s workforce. There is a tremendous variety in the available trainings within an organization: technical, project management, quality, leadership, domain-specific, soft-skills, etc. Hence it is important to assist the employee in choosing the trainings that best suit her background, project needs and career goals. In this paper, we focus on algorithms for training recommendation in an industrial setting. We formalize the problem of next training recommendation, taking into account the employee’s training and work history. We present several new unsupervised sequence mining algorithms that mine the organization’s past training data to arrive at personalized next training recommendations. Using real-life data about trainings of 118,587 employees over 5019 distinct trainings from a large multi-national IT organization, we show that these algorithms outperform several standard recommendation engine algorithms as well as those based on standard sequence mining algorithms. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

45. Expert recommendation for trouble ticket routing.

Author: Xu, Jian and He, Rouying
Subjects: *EXPERT systems, *ROUTING algorithms, *DATA mining, *PROBLEM solving, *MACHINE learning
Abstract: A trouble ticket is an important information carrier in system maintenance, which records problem symptoms, the resolving process, and resolutions. A critical challenge for the ticket management system is how to quickly assign a proper expert to deal with trouble tickets and fix problems. Thousands of tickets bouncing among multiple experts before being fixed will consume limited system maintenance resources and may also violate the service level agreement (SLA). Thus, for an incoming ticket, an expert should be recommended as quickly as possible in order to reduce the processing delay. In this paper, to address the challenge in the expert assignment, we exploit an expert collaboration network model by combining expertise profiles and social profiles learned from problem descriptions and resolution sequences of the historical resolved tickets, and develop several two-stage expert recommendation algorithms to determine a resolver to solve the problem. To evaluate the effectiveness of expert recommendation algorithms, we conduct extensive experiments on a real ticket data set. The experimental results show that the proposed algorithms can effectively shorten the mean number of steps to resolve (MSTR) with a high recommendation precision, especially for the long routing sequences generated from manual assignments. The proposed model and algorithms have the potential of being used in a ticket routing recommendation engine to greatly reduce human intervention in the routing process. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

46. Trouble Ticket Routing Models and Their Applications.

Author: Xu, Jian, He, Rouying, Zhou, Wubai, and Li, Tao
Abstract: A trouble ticket is an important information carrier in system maintenance, which records problem symptoms, the resolving process, and resolutions. A critical challenge for the ticket management system is how to quickly deal with trouble tickets and fix problems. Thousands of tickets, bouncing among multiple expert groups before being fixed, will consume limited system maintenance resources and may also violate the service level agreement. Thus, trouble tickets should be routed to the right expert group as quickly as possible in order to reduce the processing delay. In this paper, to address the challenge in ticket routing, we exploit three different routing models by mining the combination of problem descriptions and resolution sequences from the historical resolved tickets, and develop the corresponding routing recommendation algorithms to determine the next expert group to solve the problem. To evaluate the performance of routing recommendation algorithms, we conduct extensive experiments on a real ticket data set. The experimental results show that the proposed models and algorithms can effectively shorten the mean number of steps to resolve with a high ratio of the number of successfully resolved tickets to the total number of tickets, especially for the long routing sequences generated from manual assignments. These models and algorithms have the potential of being used in a ticket routing recommendation engine to greatly reduce human intervention in the routing process. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

47. Using sequence mining to reveal the efficiency in scientific reasoning during STEM learning with a game-based learning environment.

Author: Taub, Michelle, Azevedo, Roger, Bradbury, Amanda E., Millar, Garrett C., and Lester, James
Subjects: *STEM education, *SEQUENTIAL pattern mining, *EDUCATIONAL games, *METACOGNITION, *REASONING
Abstract: The goal of this study was to assess how metacognitive monitoring and scientific reasoning impacted the efficiency of game completion during learning with Crystal Island, a game-based learning environment that fosters self-regulated learning and scientific reasoning by having participants solve the mystery of what illness impacted inhabitants of the island. We conducted sequential pattern mining and differential sequence mining on 64 undergraduate participants’ hypothesis testing behavior. Patterns were coded based on the relevancy of what items were being tested for, and the items themselves. Results revealed that participants who were more efficient at solving the mystery tested significantly fewer partially-relevant and irrelevant items than less efficient participants. Additionally, more efficient participants had fewer sequences of testing items overall, and significantly lower instance support values of the PartiallyRelevant -- Relevant to Relevant -- Relevant and PartiallyRelevant -- PartiallyRelevant to Relevant--Partially Relevant sequences compared to less efficient participants. These findings have implications for designing adaptive GBLEs that scaffold participants based on in-game behaviors. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

48. Discovering activity patterns in office environment using a network of low-resolution visual sensors.

Author: Eldib, Mohamed, Deboeverie, Francis, Philips, Wilfried, and Aghajan, Hamid
Abstract: Understanding activity patterns in office environments is important in order to increase workers’ comfort and productivity. This paper proposes an automated system for discovering activity patterns of multiple persons in a work environment using a network of cheap low-resolution visual sensors (900 pixels). Firstly, the users’ locations are obtained from a robust people tracker based on recursive maximum likelihood principles. Secondly, based on the users’ mobility tracks, the high density positions are found using a bivariate kernel density estimation. Then, the hotspots are detected using a confidence region estimation. Thirdly, we analyze the individual’s tracks to find the starting and ending hotspots. The starting and ending hotspots form an observation sequence, where the user’s presence and absence are detected using three powerful Probabilistic Graphical Models (PGMs). We describe two approaches to identify the user’s status: a single model approach and a two-model mining approach. We evaluate both approaches on video sequences captured in a real work environment, where the persons’ daily routines are recorded over 5 months. We show that the second approach achieves better performance than the first. Routines dominating the entire group’s activities are identified with a methodology based on the Latent Dirichlet Allocation topic model. We also detect routines that are characteristic of individual persons. More specifically, we perform various analyses to determine regions with high variations, which may correspond to specific events. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

49. Mining Precise-Positioning Episode Rules from Event Sequences.

Author: Ao, Xiang, Luo, Ping, Wang, Jin, Zhuang, Fuzhen, and He, Qing
Subjects: *DATA mining, *DATA analysis, *ALGORITHMS, *NUMERICAL analysis, *MATHEMATICAL analysis
Abstract: Episode Rule Mining is a popular framework for discovering sequential rules from event sequential data. However, traditional episode rule mining methods only tell that the consequent event is likely to happen within a given time interval after the occurrence of the antecedent events. As a result, they cannot satisfy the requirement of many time sensitive applications, such as program security trading and intelligent transportation management due to the lack of fine-grained response time. In this study, we come up with the concept of fixed-gap episode to address this problem. A fixed-gap episode consists of an ordered set of events where the elapsed time between any two consecutive events is a constant. Based on this concept, we formulate the problem of mining precise-positioning episode rules in which the occurrence time of each event in the consequent is clearly specified. In addition, we develop a trie-based data structure to mine such precise-positioning episode rules with several pruning strategies incorporated for improving the performance as well as reducing memory consumption. Experimental results on real datasets show the superiority of our proposed algorithms. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF
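The fixed-gap episode defined in item 49 (an ordered set of events with constant elapsed time between consecutive events) can be illustrated with a small occurrence finder. The sketch reads "a constant" as a single gap shared by every consecutive pair, which is an assumption; the paper may allow a different fixed gap per pair. It also assumes integer timestamps. The function name and the event list are hypothetical, and this is not the trie-based structure the authors develop.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def fixed_gap_occurrences(events: Sequence[Tuple[int, str]],
                          episode: Sequence[str],
                          gap: int) -> List[int]:
    """Return start times t such that episode[i] occurs at time t + i * gap."""
    events_at: Dict[int, Set[str]] = defaultdict(set)
    for time, event in events:
        events_at[time].add(event)

    # Candidate start times are those where the first episode event occurs.
    starts = sorted({time for time, event in events if event == episode[0]})
    return [
        start
        for start in starts
        if all(event in events_at.get(start + i * gap, ())
               for i, event in enumerate(episode))
    ]


# Hypothetical event sequence as (timestamp, event type) pairs.
events = [(1, "A"), (2, "B"), (3, "A"), (4, "C"), (5, "B"), (7, "B")]

gap_two = fixed_gap_occurrences(events, ["A", "B"], gap=2)
gap_one = fixed_gap_occurrences(events, ["A", "B"], gap=1)
```

Because every event time is pinned relative to the start, a rule built on such an episode states exactly when the consequent is expected, rather than only that it falls within an interval.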

50. Sports analytics for professional speed skating.

Author: Knobbe, Arno, Orie, Jac, Hofman, Nico, Burgh, Benjamin, and Cachucho, Ricardo
Subjects: SPEED skating, PROFESSIONAL sports, ATHLETES, DATA extraction, LINEAR statistical models
Abstract: In elite sports, training schedules are becoming increasingly complex, and a large number of parameters of such schedules need to be tuned to the specific physique of a given athlete. In this paper, we describe how extensive analysis of historical data can help optimise these parameters, and how possible pitfalls of under- and overtraining in the past can be avoided in future schedules. We treat the series of exercises an athlete undergoes as a discrete sequence of attributed events that can be aggregated in various ways, to capture the many ways in which an athlete can prepare for an important test event. We report on a cooperation with the elite speed skating team LottoNL-Jumbo, who have recorded detailed training data over the last 15 years. The aim of the project was to analyse this potential source of knowledge, and extract actionable and interpretable patterns that can provide input to future improvements in training. We present two alternative techniques to aggregate sequences of exercises into a combined, long-term training effect, one based on a sliding window and one based on a physiological model of how the body responds to exercise. Next, we use both linear modelling and Subgroup Discovery to extract meaningful models of the data. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF
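The sliding-window aggregation mentioned in item 50 can be sketched as a trailing sum over a sequence of per-day training loads. The abstract does not say which attributes are aggregated or how wide the window is, so the single numeric load per day, the window width, and the sample values below are all assumptions made for illustration.

```python
from typing import List, Sequence


def sliding_window_load(daily_load: Sequence[float], window: int) -> List[float]:
    """Sum of the most recent `window` days of load, for each day in turn."""
    if window <= 0:
        raise ValueError("window must be positive")
    totals: List[float] = []
    running = 0.0
    for day, load in enumerate(daily_load):
        running += load
        if day >= window:
            # Drop the day that has just left the window.
            running -= daily_load[day - window]
        totals.append(running)
    return totals


# Hypothetical training load per day.
loads = [2, 0, 3, 1, 4]
three_day = sliding_window_load(loads, window=3)
```

Each value in `three_day` summarises recent training as one feature, which is the kind of aggregate a linear model or subgroup discovery step can then relate to test results.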

168 results on '"sequence mining"'
