Author: "Lior Rokach" / Journal: expert systems with applications - Searchworks@Jio Institute Digital Library Search Results

10 results on '"Lior Rokach"'

1. Constraint learning based gradient boosting trees

Author: Asaf Shabtai, Abraham Israeli, and Lior Rokach
Subjects: Industrial biotechnology, Constraint learning, Boosting (machine learning), Computer science, General Engineering, Intelligent decision support system, Engineering and technology, Machine learning, Computer Science Applications, Industrial engineering & automation, Artificial Intelligence, Electrical engineering, electronic engineering, information engineering, Artificial intelligence & image processing, Gradient boosting
Abstract: Predictive regression models aim to find the most accurate solution to a given problem, often without any constraints related to the model’s predicted values. Such constraints have been used in prior research where they have been applied to a subpopulation within the training dataset which is of greater interest and importance. In this research we introduce a new setting of regression problems, in which each instance can be assigned a different constraint, defined based on the value of the target (predicted) attribute. The new use of constraints is taken into account and incorporated into the learning process, and is also considered when evaluating the induced model. We propose two algorithms which are modifications to the regression boosting method. There are two advantages of the proposed algorithms: they are not dependent on the base learner used during the learning process, and they can be adopted by any boosting technique. We implemented the algorithms by modifying the gradient boosting trees (GBT) model, and we also introduced two measures for evaluating the models that were trained to solve the constraint problems. We compared the proposed algorithms to three baseline algorithms using four real-life datasets. Due to the algorithms’ focus on satisfying the constraints, in most cases the results showed significant improvement in the constraint-related measures, with just a minimal effect on the general prediction error. The main impact of the proposed approach is in its ability to derive a model with a higher level of assurance for specific cases of interest (i.e., the constrained cases). 
This is extremely important and has great significance in various use cases and expert and intelligent systems, particularly critical systems, such as critical healthcare systems (e.g., when predicting blood pressure or blood sugar level), safety systems (e.g., when aiming to estimate the distance of cars or airplanes from other objects), or critical industrial systems (e.g., when estimating their usability over time). In each of these cases, there is a subpopulation of all instances that is of greater interest to the expert or system, and the sensitivity of the model’s error changes according to the real value of the predicted feature. For example, for a subpopulation of patients (e.g., patients under the age of eight, or patients known to be at risk), physicians often require a sensitive model that accurately predicts blood pressure values.
Published: 2019

2. A deep learning framework for predicting burglaries based on multiple contextual factors

Author: Adir Solomon, Mor Kertis, Bracha Shapira, and Lior Rokach
Subjects: Artificial Intelligence, General Engineering, Computer Science Applications
Published: 2022

3. Explaining anomalies detected by autoencoders using Shapley Additive Explanations

Author: Liat Antwarg, Bracha Shapira, Lior Rokach, and Ronnie Mindlin Miller
Subjects: Ground truth, Computer science, Deep learning, Anomaly (natural sciences), Supervised learning, General Engineering, Machine learning, Autoencoder, Computer Science Applications, Kernel (image processing), Artificial Intelligence, Outlier, Anomaly detection
Abstract: Deep learning algorithms for anomaly detection, such as autoencoders, point out the outliers, saving experts the time-consuming task of examining normal cases in order to find anomalies. Most outlier detection algorithms output a score for each instance in the database. The top-k most intense outliers are returned to the user for further inspection; however, the manual validation of results becomes challenging without justification or additional clues. An explanation of why an instance is anomalous enables the experts to focus their investigation on the most important anomalies and may increase their trust in the algorithm. Recently, a game theory-based framework known as SHapley Additive exPlanations (SHAP) was shown to be effective in explaining various supervised learning models. In this paper, we propose a method that uses Kernel SHAP to explain anomalies detected by an autoencoder, which is an unsupervised model. The proposed explanation method aims to provide a comprehensive explanation to the experts by focusing on the connection between the features with high reconstruction error and the features that are most important in terms of their effect on the reconstruction error. We propose a black-box explanation method, because it has the advantage of being able to explain any autoencoder without being aware of the exact architecture of the autoencoder model. The proposed explanation method extracts and visually depicts both features that contribute the most to the anomaly and those that offset it. An expert evaluation using real-world data demonstrates the usefulness of the proposed method in helping domain experts better understand the anomalies.
Our evaluation of the explanation method, in which a “perfect” autoencoder is used as the ground truth, shows that the proposed method explains anomalies correctly, using the exact features, and evaluation on real data demonstrates that (1) our explanation model, which uses SHAP, is more robust than the Local Interpretable Model-agnostic Explanations (LIME) method, and (2) the explanations our method provides are more effective at reducing the anomaly score than other methods.
Published: 2021

4. A hybrid approach for improving unsupervised fault detection for robotic systems

Author: Eliahu Khalastchi, Meir Kalech, and Lior Rokach
Subjects: Industrial biotechnology, Computer science, Supervised learning, General Engineering, Intelligent decision support system, Engineering and technology, Machine learning, Fault (power engineering), Flight simulator, Fault detection and isolation, Computer Science Applications, Domain (software engineering), Industrial engineering & automation, Artificial Intelligence, Information systems, Electrical engineering, electronic engineering, information engineering, Robot
Abstract: From unsupervised to supervised learning of a fault detection model (for robots). Insights into why and when it becomes more accurate. Theoretical analysis and a prediction tool. Empirical results on 3 real-world domains that back these insights. The use of robots in our daily lives is increasing. As we rely more on robots, it becomes more important that they continue their missions successfully. Unfortunately, these sophisticated, and sometimes very expensive, machines are susceptible to different kinds of faults. It becomes important to apply a Fault Detection (FD) mechanism which is suitable for the domain of robots. Two important requirements of such a mechanism are high accuracy and low computational load during operation (online). Supervised learning can potentially produce very accurate FD models, and if the learning takes place offline then the online computational load can be reduced. Yet, the domain of robots is characterized by the absence of labeled data (e.g., faulty, normal) required by supervised approaches, and consequently, unsupervised approaches are being used. In this paper we propose a hybrid approach: an unsupervised approach labels a data set, with a low degree of inaccuracy, and then the labeled data set is used offline by a supervised approach to produce an online FD model. Now, we are faced with a choice: should we use the unsupervised or the hybrid fault detector? Seemingly, there is no way to validate the choice due to the absence of (a priori) labeled data. In this paper we give an insight into why, and a tool to predict when, the hybrid approach is more accurate. In particular, the main impacts of our work are (1) we theoretically analyze the conditions under which the hybrid approach is expected to be more accurate. (2) Our theoretical findings are backed with empirical analysis.
We use data sets of three different robotic domains: a high fidelity flight simulator, a laboratory robot, and a commercial Unmanned Aerial Vehicle (UAV). (3) We analyze how different unsupervised FD approaches are improved by the hybrid technique and (4) how well this improvement fits our prediction tool. The significance of the hybrid approach and the prediction tool is the potential benefit to expert and intelligent systems in which labeled data is absent or expensive to create.
Published: 2017

5. An ensemble method for top-N recommendations from the SVD

Author: Lior Rokach, David Ben-Shimon, and Bracha Shapira
Subjects: General Engineering, Decision tree, Dot product, Engineering and technology, Recommender system, Machine learning, Ensemble learning, Computer Science Applications, Matrix decomposition, Set (abstract data type), Tree (data structure), Artificial Intelligence, Information systems, Singular value decomposition, Electrical engineering, electronic engineering, information engineering, Artificial intelligence & image processing, Data mining, Mathematics
Abstract: SVD suffers from computational limitation when delivering top-N items online. An ensemble algorithm for getting top-N items from the SVD results is proposed. The algorithm maps the items to the leaves of multiple compact trees offline. Users are assigned online to one leaf in each tree for obtaining their top-N items. The algorithm delivers faster and more accurate top-N items than the base SVD. Matrix factorization methods such as the singular value decomposition technique have become very popular in the area of recommender systems. Given a rating matrix as input, these techniques output two matrices with lower dimensional space that represent the user and item features. The relevance of item i to user u is revealed by the score of the dot product between the feature vector of u and the feature vector of i. High scores indicate greater relevance. In order to deliver the best recommendations for a given user based on these latent features, one must obtain the list of scores of all the items for the given user and sort the resulting list. When the size of the catalogue is large, this phase consumes a large amount of computational time and cannot be done online. Another drawback with this approach is that once such a list is computed for a given user, it remains finite and it is impossible to incorporate within it new activities of the user. Hence, the use of such techniques is limited online. In this paper we propose an ensemble method for building a forest of trees offline, where each leaf in each tree is holding a unique set of item vectors. Once a user is engaged with the system, its vector is classified to one leaf in each one of the trees in the forest for conducting a dot product with the corresponding items.
By using this method we compute online only a small number of dot products for a given user vector allowing us to quickly retrieve dynamic recommendations from the SVD, thereby presenting an alternative to the existing method which computes and caches all of the dot products among the items and users. The method maps the items to the leaves of multiple compact trees offline, each tree is a weak recommendation model, creating a forest of decision trees algorithm in which users that are assigned to these leaves online are likely to produce high dot product scores with the items that are already in the leaves. We demonstrate the effectiveness of the suggested ensemble method by applying it to three public datasets and comparing it to a state-of-the-art algorithm aimed at solving the problem.
Published: 2016
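
The abstract of item 5 describes a baseline in which every item is scored by a dot product with the user's feature vector and the scores are then ranked. A minimal sketch of that baseline step only, assuming made-up two-dimensional latent vectors; the function name `top_n_items` and the data are hypothetical, and the paper's tree-based ensemble is not reproduced here:

```python
import heapq

def top_n_items(user_vec, item_vecs, n):
    """Return the n item ids with the highest dot-product score for one user."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Every item in the catalogue is scored, which is the cost the abstract
    # identifies as the online bottleneck for large catalogues.
    scores = {item_id: dot(user_vec, vec) for item_id, vec in item_vecs.items()}
    return heapq.nlargest(n, scores, key=scores.get)

# Hypothetical latent factors, for illustration only
user = [1.0, 0.5]
items = {"a": [0.9, 0.1], "b": [0.2, 0.8], "c": [1.0, 1.0], "d": [-0.5, 0.3]}
top2 = top_n_items(user, items, 2)
print(top2)
```

The cost grows linearly with the catalogue size for each request, which is why the abstract proposes restricting the dot products to the items held in a few tree leaves.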

6. SFEM: Structural feature extraction methodology for the detection of malicious office documents using machine learning methods

Author: Nir Nissim, Yuval Elovici, Aviad Cohen, and Lior Rokach
Subjects: Advanced persistent threat, Computer science, Feature extraction, General Engineering, Feature selection, Engineering and technology, Static analysis, Machine learning, Computer Science Applications, Artificial Intelligence, Information systems, Electrical engineering, electronic engineering, information engineering, Ransomware, Malware, Artificial intelligence & image processing, Executable, XML
Abstract: Office documents are used extensively by individuals and organizations. Most users consider these documents safe for use. Unfortunately, Office documents can contain malicious components and perform harmful operations. Attackers increasingly take advantage of naive users and leverage Office documents in order to launch sophisticated advanced persistent threat (APT) and ransomware attacks. Recently, targeted cyber-attacks against organizations have been initiated with emails containing malicious attachments. Since most email servers do not allow the attachment of executable files to emails, attackers prefer to use non-executable files (e.g., documents) for malicious purposes. Existing anti-virus engines primarily use signature-based detection methods, and therefore fail to detect new unknown malicious code which has been embedded in an Office document. Machine learning methods have been shown to be effective at detecting known and unknown malware in various domains; however, to the best of our knowledge, machine learning methods have not been used for the detection of malicious XML-based Office documents (*.docx, *.xlsx, *.pptx, *.odt, *.ods, etc.). In this paper we present a novel structural feature extraction methodology (SFEM) for XML-based Office documents. SFEM extracts discriminative features from documents, based on their structure. We leveraged SFEM’s features with machine learning algorithms for effective detection of malicious *.docx documents. We extensively evaluated SFEM with machine learning classifiers using a representative collection (16,938 *.docx documents collected "from the wild") which contains ∼4.9% malicious and ∼95.1% benign documents. We examined 1,600 unique configurations based on different combinations of feature extraction, feature selection, feature representation, top-feature selection methods, and machine learning classifiers.
The results show that machine learning algorithms trained on features provided by SFEM successfully detect new unknown malicious *.docx documents. The Random Forest classifier achieves the highest detection rates, with an AUC of 99.12% and true positive rate (TPR) of 97% that is accompanied by a false positive rate (FPR) of 4.9%. In comparison, the best anti-virus engine achieves a TPR which is ∼25% lower.
Published: 2016

7. Reducing preference elicitation in group decision making

Author: Lior Rokach, Bracha Shapira, Lihi Naamani-Dery, and Meir Kalech
Subjects: Preference learning, Operations research, Computer science, General Engineering, Engineering and technology, Recommender system, Some confidence, Expert system, Computer Science Applications, Group decision-making, Artificial Intelligence, Information systems, Computational social choice, Electrical engineering, electronic engineering, information engineering, Artificial intelligence & image processing, Preference elicitation, Data mining
Abstract: Reducing preference elicitation when selecting a winner item. Computing approximate winners with some confidence level. Terminating preference elicitation sooner by returning k alternatives. Two preference aggregation strategies: Least Misery and Majority. A user study on collected data from a group recommender system. Groups may need assistance in reaching a joint decision. Elections can reveal the winning item, but this means the group members need to vote on, or at least consider, all available items. Our challenge is to minimize the number of preferences that need to be elicited and thus reduce the effort required from the group members. We present a model that offers a few innovations. First, rather than offering a single winner, we propose to offer the group the best top-k alternatives. This can be beneficial if a certain item suddenly becomes unavailable, or if the group wishes to choose manually from a few selected items. Secondly, rather than offering a definite winning item, we suggest approximating the item or the top-k items that best suit the group, according to a predefined confidence level. We study the tradeoff between the accuracy of the proposed winner item and the amount of preference elicitation required. Lastly, we offer to consider different preference aggregation strategies. These strategies differ in their emphasis: towards the individual users (Least Misery Strategy) or towards the majority of the group (Majority Based Strategy). We evaluate our findings on data collected in a user study as well as on real world and simulated datasets and show that selecting the suitable aggregation strategy and relaxing the termination condition can reduce communication cost up to 90%. Furthermore, the commonly used Majority strategy does not always outperform the Least Misery strategy. Addressing these three challenges contributes to the minimization of preference elicitation in expert systems.
Published: 2016
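
The abstract of item 7 contrasts two aggregation strategies: one that emphasizes individual users (Least Misery) and one that emphasizes the majority of the group. A rough sketch of how the two can disagree, assuming fully known ratings; the definitions used here (maximize the lowest rating; plurality vote over each member's top item) are common textbook readings and may differ from the paper's exact formulation, and the names and data are hypothetical:

```python
from collections import Counter

def least_misery_winner(ratings):
    """ratings: {member: {item: score}}. Pick the item whose lowest rating is highest."""
    items = next(iter(ratings.values())).keys()
    return max(items, key=lambda item: min(r[item] for r in ratings.values()))

def majority_winner(ratings):
    """Pick the item that is the top choice of the most members (plurality vote)."""
    votes = Counter(max(r, key=r.get) for r in ratings.values())
    return votes.most_common(1)[0][0]

# Hypothetical group ratings, for illustration only
group = {
    "ann": {"pizza": 5, "sushi": 1, "salad": 4},
    "bob": {"pizza": 5, "sushi": 2, "salad": 4},
    "cat": {"pizza": 1, "sushi": 5, "salad": 4},
}
print(least_misery_winner(group), majority_winner(group))
```

The sketch assumes every preference has already been elicited; the paper's contribution is reducing how many of these ratings need to be asked for, which is not modeled here.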

8. Keystroke dynamics obfuscation using key grouping

Author: Lior Rokach, Itay Hazan, and Oded Margalit
Subjects: Password, Industrial biotechnology, Computer science, General Engineering, Engineering and technology, Encryption, Machine learning, Keystroke logging, Session (web analytics), Computer Science Applications, Industrial engineering & automation, Keystroke dynamics, Artificial Intelligence, Identity theft, Obfuscation, Electrical engineering, electronic engineering, information engineering, Key (lock), Artificial intelligence & image processing
Abstract: Keystroke dynamics is one of the most widely adopted identity verification techniques in remote systems. It is based on modeling users’ specific patterns of typing on the keyboard. When utilized in conjunction with the commonly used passwords, the use of keystroke dynamics can dramatically increase the level of security without interfering with the user experience. However, aspects of keystroke dynamics that are applied to passwords, such as processing keystroke events and storing feature vectors or user models, can expose users to identity theft and a new set of privacy risks, thus questioning the added value of keystroke dynamics. In addition, common encryption techniques will be unable to mitigate these threats, since the user's behavior changes from one session to another. In this paper, we suggest key grouping as an obfuscation method to ensure keystroke dynamics privacy. When applied to the keystroke events, the key grouping dramatically reduces the possibility of password theft. To perform the key grouping optimally, we present a novel method which produces groups that can be integrated with any keystroke dynamics algorithm. Our method divides the keys into groups using hierarchical clustering with a dedicated statistical heuristics algorithm. We tested our method's key grouping output on five keystroke dynamics algorithms using a public dataset and managed to show a consistent improvement of up to 7% in the AUC over other, more intuitive key groupings and random key groupings.
Published: 2020
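
The key-grouping idea in the abstract of item 8 can be pictured as replacing each key identity with a coarser group label before the timing data is processed or stored. A minimal sketch under that reading; the grouping by keyboard row is a hypothetical stand-in (the paper derives its groups with hierarchical clustering, which is not reproduced here), and the event format `(key, press_ms, release_ms)` is an assumption:

```python
# Hypothetical fixed grouping by keyboard row, used only for illustration.
ROWS = {"top": "qwertyuiop", "home": "asdfghjkl", "bottom": "zxcvbnm"}
KEY_TO_GROUP = {key: group for group, keys in ROWS.items() for key in keys}

def group_events(events):
    """Replace each key with its group label, keeping press/release timestamps."""
    return [(KEY_TO_GROUP.get(key, "other"), down, up) for key, down, up in events]

# Assumed event format: (key, press time in ms, release time in ms)
events = [("p", 0, 80), ("a", 120, 190), ("s", 230, 310), ("5", 350, 420)]
grouped = group_events(events)
print(grouped)
```

The timing features a keystroke dynamics algorithm relies on are preserved, while the stored events no longer spell out the typed characters.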

9. Local-shapelets for fast classification of spectrographic measurements

Author: Daniel Gordon, Aryeh Kontorovich, Lior Rokach, and Danny Hendler
Subjects: Series (mathematics), Artificial Intelligence, Computer science, Decision tree learning, General Engineering, Data mining, Throughput (business), Class (biology), Computer Science Applications
Abstract: We present an algorithm for classifying spectrographic measurements. The concept of locality is introduced into an established time series algorithm. A technique for estimating a tolerance parameter is presented. Learning and classification times are reduced by two orders of magnitude. Accuracy levels are retained. Spectroscopy is widely used in the food industry as a time-efficient alternative to chemical testing. Lightning-monitoring systems also employ spectroscopic measurements. The latter application is important as it can help predict the occurrence of severe storms, such as tornadoes. The shapelet based classification method is particularly well-suited for spectroscopic data sets. This technique for classifying time series extracts patterns unique to each class. A significant downside of this approach is the time required to build the classification tree. In addition, for high throughput applications the classification time of long time series is prohibitive. Although some progress has been made in terms of reducing the time complexity of building shapelet based models, the problem of reducing classification time has remained an open challenge. We address this challenge by introducing local-shapelets. This variant of the shapelet method restricts the search for a match between shapelets and time series to the vicinity of the location from which each shapelet was extracted. This significantly reduces the time required to examine each shapelet during both the learning and classification phases. Classification based on local-shapelets is well-suited for spectroscopic data sets as these are typically very tightly aligned. Our experimental results on such data sets demonstrate that the new approach reduces learning and classification time by two orders of magnitude while retaining the accuracy of regular (non-local) shapelets-based classification. In addition, we provide some theoretical justification for local-shapelets.
Published: 2015
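
The abstract of item 9 says local-shapelets restrict the search for a match to the vicinity of the location from which the shapelet was extracted. A small sketch contrasting a search over all offsets with a search limited to a window around the origin; the Euclidean distance, the function names, and the `origin` and `tolerance` parameters are assumptions for illustration rather than the paper's exact definitions:

```python
import math

def subsequence_distance(shapelet, series, start):
    """Euclidean distance between the shapelet and series[start:start+len(shapelet)]."""
    return math.sqrt(sum((s - series[start + i]) ** 2 for i, s in enumerate(shapelet)))

def global_match(shapelet, series):
    """Regular shapelet matching: best distance over every offset in the series."""
    last = len(series) - len(shapelet)
    return min(subsequence_distance(shapelet, series, k) for k in range(last + 1))

def local_match(shapelet, series, origin, tolerance):
    """Local variant: only offsets within `tolerance` of the shapelet's origin."""
    last = len(series) - len(shapelet)
    lo = max(0, origin - tolerance)
    hi = min(last, origin + tolerance)
    return min(subsequence_distance(shapelet, series, k) for k in range(lo, hi + 1))

# Hypothetical series with the same peak at offsets 2 and 8
series = [0, 0, 1, 3, 1, 0, 0, 0, 1, 3, 1, 0]
shapelet = [1, 3, 1]
print(global_match(shapelet, series), local_match(shapelet, series, 2, 1))
```

With a tolerance of 1, the local search evaluates at most three offsets instead of all ten, which illustrates where the time saving comes from when series are tightly aligned.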

10. Novel active learning methods for enhanced PC malware detection in windows OS

Author: Nir Nissim, Yuval Elovici, Robert Moskovitch, and Lior Rokach
Subjects: Point (typography), Computer science, Active learning (machine learning), General Engineering, Information security, Machine learning, Computer Science Applications, Support vector machine, Task (computing), Artificial Intelligence, Microsoft Windows, Malware, Data mining, Suspect
Abstract: The formation of new malwares every day poses a significant challenge to anti-virus vendors since antivirus tools, using manually crafted signatures, are only capable of identifying known malware instances and their relatively similar variants. To identify new and unknown malwares for updating their anti-virus signature repository, anti-virus vendors must daily collect new, suspicious files that need to be analyzed manually by information security experts who then label them as malware or benign. Analyzing suspected files is a time-consuming task and it is impossible to manually analyze all of them. Consequently, anti-virus vendors use machine learning algorithms and heuristics in order to reduce the number of suspect files that must be inspected manually. These techniques, however, lack an essential element: they cannot be updated daily. In this work we introduce a solution for this updatability gap. We present an active learning (AL) framework and introduce two new AL methods that will assist anti-virus vendors to focus their analytical efforts by acquiring those files that are most probably malicious. Those new AL methods are designed and oriented towards new malware acquisition. To test the capability of our methods for acquiring new malwares from a stream of unknown files, we conducted a series of experiments over a ten-day period. A comparison of our methods to existing high performance AL methods and to random selection, which is the naive method, indicates that the AL methods outperformed random selection for all performance measures. Our AL methods outperformed the existing AL method in two respects, both related to the number of new malwares acquired daily, the core measure in this study. First, our best performing AL method, termed “Exploitation”, acquired on the 9th day of the experiment about 2.6 times more malwares than the existing AL method and 7.8 times more than random selection.
Secondly, while the existing AL method showed a decrease in the number of new malwares acquired over 10 days, our AL methods showed an increase and a daily improvement in the number of new malwares acquired. Both results point towards increased efficiency that can possibly assist anti-virus vendors.
Published: 2014
