6,982 results
Search Results
2. Unsupervised Clustering of Hyperspectral Paper Data Using t-SNE
- Author
-
Binu Melit Devassy, Peter Nussbaum, and Sony George
- Subjects
Clustering high-dimensional data, forensic paper analysis, hyperspectral unsupervised clustering, hyperspectral dimensionality reduction, Dimensionality reduction, Cluster analysis, Hyperspectral imaging, Pattern recognition, t-SNE, Visualization, forensic document analysis, Embedding, Computer Vision and Pattern Recognition, Artificial intelligence
- Abstract
For a suspected forgery that involves the falsification of a document or its contents, the investigator will primarily analyze the document's paper and ink in order to establish the authenticity of the subject under investigation. As a non-destructive and contactless technique, Hyperspectral Imaging (HSI) is gaining popularity in the field of forensic document analysis. HSI returns more information compared to conventional three channel imaging systems due to the vast number of narrowband images recorded across the electromagnetic spectrum. As a result, HSI can provide better classification results. In this publication, we present results of an approach known as the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm, which we have applied to HSI paper data analysis. Even though t-SNE has been widely accepted as a method for dimensionality reduction and visualization of high dimensional data, its usefulness has not yet been evaluated for the classification of paper data. In this research, we present a hyperspectral dataset of paper samples, and evaluate the clustering quality of the proposed method both visually and quantitatively. The t-SNE algorithm shows exceptional discrimination power when compared to traditional PCA with k-means clustering, in both visual and quantitative evaluations.
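As an informal illustration of the comparison this abstract describes, the sketch below (not the authors' code, dataset, or parameter choices) contrasts PCA followed by k-means with a t-SNE embedding followed by k-means on synthetic stand-in spectra, scoring both with the silhouette index.

```python
# Illustrative sketch only: PCA + k-means vs. t-SNE + k-means on synthetic
# "hyperspectral" pixel spectra, compared with the silhouette index.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
n_classes, n_per_class, n_bands = 4, 200, 120   # hypothetical paper types / spectral bands
centers = rng.uniform(0.2, 0.8, size=(n_classes, n_bands))
X = np.vstack([c + 0.02 * rng.standard_normal((n_per_class, n_bands)) for c in centers])

# Baseline: PCA to 2 components, then k-means.
pca_emb = PCA(n_components=2).fit_transform(X)
pca_labels = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit_predict(pca_emb)

# t-SNE embedding, then k-means on the 2-D embedding.
tsne_emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
tsne_labels = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit_predict(tsne_emb)

print("PCA + k-means silhouette :", silhouette_score(pca_emb, pca_labels))
print("t-SNE + k-means silhouette:", silhouette_score(tsne_emb, tsne_labels))
```

On real hyperspectral cubes, each pixel spectrum would replace a row of X, and the t-SNE perplexity would normally need tuning.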
- Published
- 2020
3. Analysis on Research Paper Publication Recommender System with Authors - Conferences Matrix
- Author
-
Htay Htay Win
- Subjects
Topic model, Theoretical computer science, Computer science, Dimensionality reduction, Recommender system, Correspondence analysis, Linear map, Matrix (mathematics)
- Abstract
For years, achievements and discoveries made by researchers have been made known through research papers published in appropriate journals or conferences. Established researchers, and especially new authors, are often caught in the predicament of choosing an appropriate conference for their work. Every scientific conference and journal is inclined towards a particular field of research, and there is an extensive group of them for any particular field. Choosing an appropriate venue is needed as it helps in reaching out to the right audience and also furthers one's chances of getting the paper published. In this work, we address the problem of recommending appropriate conferences to authors to increase their chances of acceptance. We present three different approaches to this problem, involving the use of the authors' social network and the content of the paper in the settings of dimensionality reduction and topic modelling. In all these approaches, we apply Correspondence Analysis (CA) to obtain appropriate relationships between the entities in question, such as conferences and papers. Our models show promising results when compared with existing methods such as content-based filtering, collaborative filtering and hybrid filtering.
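For readers unfamiliar with Correspondence Analysis, the toy sketch below (hypothetical counts; none of the paper's three approaches or its data are reproduced) shows plain CA via an SVD of the standardized residual matrix, placing author and conference coordinates in the same space so that a nearest conference can serve as a naive recommendation.

```python
# Hedged sketch of plain correspondence analysis (CA) on a toy author-conference matrix.
import numpy as np

N = np.array([[5, 1, 0],      # hypothetical counts: rows = authors, cols = conferences
              [0, 4, 2],
              [1, 0, 6],
              [3, 2, 1]], dtype=float)

P = N / N.sum()                      # correspondence matrix
r, c = P.sum(axis=1), P.sum(axis=0)  # row and column masses
S = np.diag(r**-0.5) @ (P - np.outer(r, c)) @ np.diag(c**-0.5)  # standardized residuals

U, sv, Vt = np.linalg.svd(S, full_matrices=False)
row_coords = np.diag(r**-0.5) @ U * sv          # principal coordinates of authors
col_coords = np.diag(c**-0.5) @ Vt.T * sv       # principal coordinates of conferences

# Nearest conference in the shared CA plane as a naive recommendation for author 0.
dists = np.linalg.norm(col_coords[:, :2] - row_coords[0, :2], axis=1)
print("recommended conference index for author 0:", int(dists.argmin()))
```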
- Published
- 2020
4. The effect of COVID-19 on the Egyptian exchange using principal component analysis
- Author
-
Ezzat, Heba M.
- Published
- 2023
- Full Text
- View/download PDF
5. Visual Characterization of Paper Using Isomap and Local Binary Patterns
- Author
-
Matti Pietikäinen, Markus Turtinen, and Olli Silven
- Subjects
Training set, Local binary patterns, Computer science, Dimensionality reduction, Visualization, Computer vision, Pattern matching, Artificial intelligence, Isomap, Classifier, Software
- Abstract
In this paper, we study how multidimensional local binary pattern (LBP) texture feature data can be visually explored and analyzed. The goal is to determine how true paper properties can be characterized with local texture features computed from visible-light images. We apply isometric feature mapping (Isomap) to the LBP texture feature data to perform non-linear dimensionality reduction. The resulting 2D projections are then visualized together with the original images to study the properties of the data. Visualization is used to select texture models for unlabeled data and to analyze feature performance when building a training set for a classifier. The approach is evaluated with simulated image data illustrating different paper properties and with on-line transilluminated paper images taken from a running paper web in a paper mill. The simulated image set is used to acquire quantitative figures on performance, while the analysis of the real-world data serves as an example of semi-supervised learning.
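A minimal sketch of the two building blocks named here, under assumed synthetic textures rather than the paper-web images used in the study: uniform LBP histograms as texture features, followed by a 2-D Isomap projection of the feature vectors.

```python
# Minimal sketch, not the paper's pipeline: uniform LBP histograms per image patch,
# then a 2-D Isomap embedding of the texture-feature vectors.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.manifold import Isomap

rng = np.random.default_rng(1)
P_pts, R = 8, 1.0
n_bins = P_pts + 2                      # number of uniform LBP codes for P = 8

def lbp_histogram(img):
    codes = local_binary_pattern(img, P_pts, R, method="uniform")
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
    return hist

# Two synthetic "paper" textures: smooth vs. noisy patches (stand-ins for real images).
smooth = [rng.normal(0.5, 0.01, (64, 64)) for _ in range(30)]
rough  = [rng.normal(0.5, 0.20, (64, 64)) for _ in range(30)]
X = np.array([lbp_histogram(im) for im in smooth + rough])

emb = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(emb.shape)   # (60, 2) -- coordinates that could be plotted next to the source images
```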
- Published
- 2006
6. Input variable selection in time-critical knowledge integration applications: A review, analysis, and recommendation paper
- Author
-
Alireza Mousavi, Stefan Poslad, and Siamak Tavakoli
- Subjects
Input variable selection, Computer science, Feature selection, Supervisory control and data acquisition (SCADA), Data acquisition, Artificial Intelligence, Knowledge integration, Sensitivity analysis, Dimensionality reduction, Time-critical control, Data mining, Information Systems
- Abstract
The purpose of this research is twofold: first, to undertake a thorough appraisal of existing Input Variable Selection (IVS) methods within the context of time-critical and computation resource-limited dimensionality reduction problems; second, to demonstrate improvements to, and the application of, a recently proposed time-critical sensitivity analysis method called EventTracker to an environmental science industrial use-case, i.e., sub-surface drilling. Producing time-critical accurate knowledge about the state of a system (effect) under computational and data acquisition (cause) constraints is a major challenge, especially if the knowledge required is critical to the system operation where the safety of operators or integrity of costly equipment is at stake. Understanding and interpreting a chain of interrelated events, predicted or unpredicted, that may or may not result in a specific state of the system, is the core challenge of this research. The main objective is then to identify which set of input data signals has a significant impact on the set of system state information (i.e. output). Through a cause-effect analysis technique, the proposed technique supports the filtering of unsolicited data that can otherwise clog up the communication and computational capabilities of a standard supervisory control and data acquisition system. The paper analyzes the performance of input variable selection techniques from a series of perspectives. It then expands the categorization and assessment of sensitivity analysis methods in a structured framework that takes into account the relationship between inputs and outputs, the nature of their time series, and the computational effort required. The outcome of this analysis is that established methods have a limited suitability for use by time-critical variable selection applications. By way of a geological drilling monitoring scenario, the suitability of the proposed EventTracker Sensitivity Analysis method for use in high volume and time critical input variable selection problems is demonstrated.
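EventTracker itself is not reproduced here; as a loosely related, generic illustration of the input variable selection problem, the sketch below ranks candidate input signals by mutual information with a system-state output, using purely synthetic signals.

```python
# Generic illustration only -- NOT the EventTracker method: rank candidate input
# signals by mutual information with an output state variable, as a simple
# stand-in for input variable selection under time/compute constraints.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(2)
n_samples, n_inputs = 2000, 8
X = rng.standard_normal((n_samples, n_inputs))          # candidate sensor signals
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.standard_normal(n_samples)  # system state

scores = mutual_info_regression(X, y, random_state=0)
ranking = np.argsort(scores)[::-1]
print("inputs ranked by relevance:", ranking)            # inputs 0 and 3 should lead
```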
- Published
- 2013
- Full Text
- View/download PDF
7. Aspects of wind energy characteristics in transmission related optimisation models: Invited panel discussion paper
- Author
-
Daniel J. Burke and Mark O'Malley
- Subjects
Power transmission, Mathematical optimization, Electric power system, Engineering, Wind power, Power system simulation, Transmission (telecommunications), Dimensionality reduction
- Abstract
This invited panel paper discussion will outline a number of aspects of wind energy characteristics relevant to the optimal wind/transmission model formulation task. Optimal placement of wind capacity on a constrained transmission network is a typical example of this type of problem. In particular the relevance of advanced and computationally intensive stochastic unit commitment to the model formulation will be debated. Optimization constraint matrix structure and techniques to exploit it will be shown to be of considerable importance for this type of problem. The relative merits of different model dimensionality reduction schemes, either through multivariate component analysis and probability discretisation or indeed scenario reduction, will be discussed. A pragmatic acceptance of the imprecise impact of long-term power system uncertainties will be maintained throughout, and wherever possible generality to different types of power systems will be considered.
- Published
- 2011
8. A Frequency Domain Kernel Function-Based Manifold Dimensionality Reduction and Its Application for Graph-Based Semi-Supervised Classification.
- Author
-
Liang, Zexiao, Gong, Ruyi, Tan, Guoliang, Ji, Shiyin, and Zhan, Ruidian
- Subjects
FEATURE extraction, IMAGE processing, CLASSIFICATION, IMAGE recognition (Computer vision), KERNEL functions, CLASSIFICATION algorithms, SUPERVISED learning
- Abstract
With the increasing demand for high-resolution images, handling high-dimensional image data has become a key aspect of intelligence algorithms. One effective approach is to preserve the high-dimensional manifold structure of the data and find the accurate mappings in a lower-dimensional space. However, various non-sparse, high-energy occlusions in real-world images can lead to erroneous calculations of sample relationships, invalidating the existing distance-based manifold dimensionality reduction techniques. Many types of noise are difficult to capture and filter in the original domain but can be effectively separated in the frequency domain. Inspired by this idea, a novel approach is proposed in this paper, which obtains the high-dimensional manifold structure according to the correlations between data points in the frequency domain and accurately maps it to a lower-dimensional space, named Frequency domain-based Manifold Dimensionality Reduction (FMDR). In FMDR, samples are first transformed into the frequency domain. Then, interference is filtered based on the distribution in the frequency domain, thereby emphasizing discriminative features. Subsequently, an innovative kernel function is proposed for measuring the similarities between samples according to their correlations in the frequency domain. With the assistance of these correlations, a graph structure can be constructed and utilized to find the mapping in a low-dimensional space. To further demonstrate the effectiveness of the proposed algorithm, FMDR is employed for semi-supervised classification problems in this paper. Experiments using public image datasets indicate that, compared to baseline algorithms and state-of-the-art methods, our approach achieves superior recognition performance. Even with very few labeled data, the advantages of FMDR are still maintained. The effectiveness of FMDR in dimensionality reduction and feature extraction of images makes it widely applicable in fields such as image processing and image recognition. [ABSTRACT FROM AUTHOR]
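The FMDR kernel itself is not specified in this abstract, so the sketch below is only a rough, assumption-laden analogue of the general idea: frequency-domain features from an FFT, a correlation-based affinity between samples, and a graph (spectral) embedding, all on synthetic images.

```python
# Hedged sketch inspired by the frequency-domain idea (not the authors' implementation):
# compare images in the frequency domain, build a similarity graph, and embed it.
import numpy as np
from sklearn.manifold import SpectralEmbedding

rng = np.random.default_rng(3)
imgs = rng.random((40, 32, 32))
imgs[20:] += 0.5 * np.sin(np.linspace(0, 8 * np.pi, 32))   # second group with added structure

def freq_features(img, keep=8):
    # Low-frequency magnitude spectrum as a crude surrogate for the paper's filtering step.
    F = np.fft.fftshift(np.abs(np.fft.fft2(img)))
    c = F.shape[0] // 2
    return F[c - keep:c + keep, c - keep:c + keep].ravel()

Z = np.array([freq_features(im) for im in imgs])
Z = (Z - Z.mean(0)) / (Z.std(0) + 1e-12)

A = np.corrcoef(Z)                      # correlation-based affinity between samples
A = np.clip(A, 0, None)                 # keep non-negative similarities
np.fill_diagonal(A, 0.0)
emb = SpectralEmbedding(n_components=2, affinity="precomputed").fit_transform(A)
print(emb.shape)
```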
- Published
- 2024
- Full Text
- View/download PDF
9. A Study on Dimensionality Reduction and Parameters for Hyperspectral Imagery Based on Manifold Learning.
- Author
-
Song, Wenhui, Zhang, Xin, Yang, Guozhu, Chen, Yijin, Wang, Lianchao, and Xu, Hanghang
- Subjects
HYPERSPECTRAL imaging systems, FISHER discriminant analysis, FEATURE extraction, REMOTE sensing, K-nearest neighbor classification, SURFACE of the earth, PRINCIPAL components analysis, MULTIDIMENSIONAL scaling
- Abstract
With the rapid advancement of remote-sensing technology, the spectral information obtained from hyperspectral remote-sensing imagery has become increasingly rich, facilitating detailed spectral analysis of Earth's surface objects. However, the abundance of spectral information presents certain challenges for data processing, such as the "curse of dimensionality" leading to the "Hughes phenomenon", "strong correlation" due to high resolution, and "nonlinear characteristics" caused by varying surface reflectances. Consequently, dimensionality reduction of hyperspectral data emerges as a critical task. This paper begins by elucidating the principles and processes of hyperspectral image dimensionality reduction based on manifold theory and learning methods, in light of the nonlinear structures and features present in hyperspectral remote-sensing data, and formulates a dimensionality reduction process based on manifold learning. Subsequently, this study explores the capabilities of feature extraction and low-dimensional embedding for hyperspectral imagery using manifold learning approaches, including principal components analysis (PCA), multidimensional scaling (MDS), and linear discriminant analysis (LDA) for linear methods; and isometric mapping (Isomap), locally linear embedding (LLE), Laplacian eigenmaps (LE), Hessian locally linear embedding (HLLE), local tangent space alignment (LTSA), and maximum variance unfolding (MVU) for nonlinear methods, based on the Indian Pines hyperspectral dataset and Pavia University dataset. Furthermore, the paper investigates the optimal neighborhood computation time and overall algorithm runtime for feature extraction in hyperspectral imagery, varying by the choice of neighborhood k and intrinsic dimensionality d values across different manifold learning methods. Based on the outcomes of feature extraction, the study examines the classification experiments of various manifold learning methods, comparing and analyzing the variations in classification accuracy and Kappa coefficient with different selections of neighborhood k and intrinsic dimensionality d values. Building on this, the impact of selecting different bandwidths t for the Gaussian kernel in the LE method and different Lagrange multipliers λ for the MVU method on classification accuracy, given varying choices of neighborhood k and intrinsic dimensionality d, is explored. Through these experiments, the paper investigates the capability and effectiveness of different manifold learning methods in feature extraction and dimensionality reduction within hyperspectral imagery, as influenced by the selection of neighborhood k and intrinsic dimensionality d values, identifying the optimal neighborhood k and intrinsic dimensionality d value for each method. A comparison of classification accuracies reveals that the LTSA method yields superior classification results compared to other manifold learning approaches. The study demonstrates the advantages of manifold learning methods in processing hyperspectral image data, providing an experimental reference for subsequent research on hyperspectral image dimensionality reduction using manifold learning methods. [ABSTRACT FROM AUTHOR]
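A schematic version of this comparison, using a synthetic dataset in place of the Indian Pines and Pavia University imagery: each scikit-learn manifold learner reduces the data to an assumed intrinsic dimensionality d with neighborhood size k, and a k-NN classifier is then scored with overall accuracy and Cohen's kappa, mirroring the evaluation described above.

```python
# Schematic comparison on synthetic stand-in data (not the hyperspectral datasets).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, LocallyLinearEmbedding, SpectralEmbedding
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score

X, y = make_classification(n_samples=600, n_features=100, n_informative=20,
                           n_classes=5, n_clusters_per_class=1, random_state=0)
k, d = 12, 10   # neighborhood size and intrinsic dimensionality, the paper's two knobs

methods = {
    "PCA":    PCA(n_components=d),
    "Isomap": Isomap(n_neighbors=k, n_components=d),
    "LLE":    LocallyLinearEmbedding(n_neighbors=k, n_components=d, method="standard"),
    "LTSA":   LocallyLinearEmbedding(n_neighbors=k, n_components=d, method="ltsa"),
    "LE":     SpectralEmbedding(n_components=d, n_neighbors=k),
}

for name, reducer in methods.items():
    Z = reducer.fit_transform(X)        # note: embeds the whole set (no out-of-sample step)
    Ztr, Zte, ytr, yte = train_test_split(Z, y, test_size=0.3, random_state=0)
    pred = KNeighborsClassifier(n_neighbors=5).fit(Ztr, ytr).predict(Zte)
    print(f"{name:6s} acc={accuracy_score(yte, pred):.3f} "
          f"kappa={cohen_kappa_score(yte, pred):.3f}")
```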
- Published
- 2024
- Full Text
- View/download PDF
10. Dimensionality reduction model based on integer planning for the analysis of key indicators affecting life expectancy.
- Author
-
Cui, Wei, Xu, Zhiqiang, and Mu, Ren
- Subjects
LIFE expectancy, INTEGER programming, DATA reduction, DATA mining, DATA visualization, WORLD health
- Abstract
Exploring a dimensionality reduction model that can adeptly eliminate outliers and select the appropriate number of clusters is of profound theoretical and practical importance. Additionally, the interpretability of these models presents a persistent challenge. This paper proposes two innovative dimensionality reduction models based on integer programming (DRMBIP). These models assess compactness through the correlation of each indicator with its class center, while separation is evaluated by the correlation between different class centers. In contrast to DRMBIP-p, DRMBIP-v treats the threshold parameter as a variable, aiming to optimally balance compactness and separation. Using data from the Global Health Observatory (GHO), this study investigates 141 indicators that influence life expectancy. The findings reveal that DRMBIP-p effectively reduces the dimensionality of the data while ensuring compactness, and it remains compatible with other models. Additionally, DRMBIP-v finds the optimal result, showing exceptional separation. Visualization of the results reveals that all classes have high compactness. DRMBIP-p requires the correlation threshold parameter as an input, which plays a pivotal role in the effectiveness of the final dimensionality reduction results. In DRMBIP-v, changing the threshold parameter into a variable potentially emphasizes either separation or compactness, which necessitates an artificial adjustment to the overflow component within the objective function. The DRMBIP presented in this paper is adept at uncovering the primary geometric structures within high-dimensional indicators. Validated on life expectancy data, it demonstrates potential to assist data miners with the reduction of data dimensions. To our knowledge, this is the first time that integer programming has been used to build a dimensionality reduction model with indicator filtering. It not only has applications in life expectancy, but also has obvious advantages in data mining work that requires precise class centers. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
11. Visualizing and Analyzing Voting Records from Historical Documents
- Author
-
Cantareira, Gabriel Dias, Cole, Nicholas, Abdul-Rahman, Alfie, Scholger, Walter, Vogeler, Georg, Tasovac, Toma, Baillot, Anne, Raunig, Elisabeth, Scholger, Martina, Steiner, Elisabeth, Centre for Information Modelling, and Helling, Patrick
- Subjects
Paper, History, Long Presentation, Humanities computing, Voting, Interface design, development, and analysis, Dimensionality Reduction, Timelines
- Abstract
In this submission, we present a set of tools for analyzing historical voting records from constitutional conventions, extending the Quill research platform, and we discuss challenges in interacting with these data, focusing on selecting appropriate sampling, dealing with incomplete data, and building an analytical model.
- Published
- 2023
- Full Text
- View/download PDF
12. Gradient-based explanation for non-linear non-parametric dimensionality reduction
- Author
-
Corbugy, Sacha, Marion, Rebecca, and Frénay, Benoît
- Published
- 2024
- Full Text
- View/download PDF
13. Interpretable linear dimensionality reduction based on bias-variance analysis
- Author
-
Bonetti, Paolo, Metelli, Alberto Maria, and Restelli, Marcello
- Published
- 2024
- Full Text
- View/download PDF
14. Application of Dimensionality Reduction and Machine Learning Methods for the Interpretation of Gas Sensor Array Readouts from Mold-Threatened Buildings.
- Author
-
Łagód, Grzegorz, Piłat-Rożek, Magdalena, Majerek, Dariusz, Łazuka, Ewa, Suchorab, Zbigniew, Guz, Łukasz, Kočí, Václav, and Černý, Robert
- Subjects
SENSOR arrays, GAS detectors, MACHINE learning, ELECTRONIC noses, SICK building syndrome, MULTILAYER perceptrons, SUPERVISED learning, CONSTRUCTION materials
- Abstract
Featured Application: The solutions presented in the work, based on the use of a multi-sensor matrix and the analysis of multidimensional data, can be used in practice for assessing the mycological risk of buildings, detecting the presence of mold in rooms and evaluating the risk of sick building syndrome. This paper is within the scope of moisture-related problems connected with the threat of mold in buildings and sick building syndrome (SBS), as well as the application of an electronic nose for the evaluation of different building envelopes and building materials. The machine learning methods used to analyze multidimensional signals are important components of the e-nose system. These multidimensional signals are derived from a gas sensor array, which, together with instrumentation, constitutes the hardware of this system. The accuracy and correctness of the classification of mold threat in buildings largely depend on the appropriate selection of the data analysis methods used. This paper proposes a method of data analysis using Principal Component Analysis, metric multidimensional scaling and Kohonen self-organizing maps, which are unsupervised machine learning methods, to visualize and reduce the dimensionality of the data. For the final classification of observations and the identification of datasets from gas sensor arrays analyzing air from buildings threatened by mold, as well as from other reference materials, supervised learning methods such as hierarchical cluster analysis, an MLP neural network and the random forest method were used. [ABSTRACT FROM AUTHOR]
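The two stages described (unsupervised projection for visualization, then supervised classification) can be sketched as follows on synthetic sensor-array responses; the real e-nose measurements and the SOM/MDS/MLP variants are not reproduced.

```python
# Minimal two-stage sketch on synthetic 16-sensor responses: PCA for a 2-D view,
# then a random-forest classifier for the final supervised decision.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
clean = rng.normal(0.0, 1.0, (60, 16))        # sensor responses, clean rooms
moldy = rng.normal(0.8, 1.0, (60, 16))        # shifted responses, mold-threatened rooms
X = np.vstack([clean, moldy])
y = np.array([0] * 60 + [1] * 60)

pca = PCA(n_components=2).fit(X)              # unsupervised stage: 2-D visualization space
X2 = pca.transform(X)
print("2-D view shape:", X2.shape, "variance explained:", pca.explained_variance_ratio_.sum())

scores = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0), X, y, cv=5)
print("random-forest CV accuracy:", scores.mean())
```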
- Published
- 2023
- Full Text
- View/download PDF
15. Predictive Modeling of Delay in an LTE Network by Optimizing the Number of Predictors Using Dimensionality Reduction Techniques.
- Author
-
Stojčić, Mirko, Banjanin, Milorad K., Vasiljević, Milan, Nedić, Dragana, Stjepanović, Aleksandar, Danilović, Dejan, and Puzić, Goran
- Subjects
OPTIMIZATION algorithms, LONG-Term Evolution (Telecommunications), PREDICTION models, SUPPORT vector machines, K-nearest neighbor classification, PARETO principle, MULTILAYER perceptrons
- Abstract
Delay in data transmission is one of the key performance indicators (KPIs) of a network. The planning and design value of delay in network management is of crucial importance for the optimal allocation of network resources and their performance focuses. To create optimal solutions, predictive models, which are currently most often based on machine learning (ML), are used. This paper aims to investigate the training, testing and selection of the best predictive delay model for a VoIP service in a Long Term Evolution (LTE) network using three ML techniques: Multilayer Perceptron (MLP), Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN). The space of model input variables is optimized by dimensionality reduction techniques: RReliefF algorithm, Backward selection via the recursive feature elimination algorithm and the Pareto 80/20 rule. A three-segment road in the geo-space between the cities of Banja Luka (BL) and Doboj (Db) in the Republic of Srpska (RS), Bosnia and Herzegovina (BiH), covered by the cellular network (LTE) of the M:tel BL operator was chosen for the case study. The results show that the k-NN model has been selected as the best solution in all three optimization approaches. For the RReliefF optimization algorithm, the best model has six inputs and the minimum relative error (RE) RE = 0.109. For the Backward selection via the recursive feature elimination algorithm, the best model has four inputs and RE = 0.041. Finally, for the Pareto 80/20 rule, the best model has 11 inputs and RE = 0.049. The comparative analysis of the results concludes that, according to observed criteria for the selection of the final model, the best solution is an approach to optimizing the number of predictors based on the Backward selection via the recursive feature elimination algorithm. [ABSTRACT FROM AUTHOR]
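A rough sketch of the backward-elimination variant only, under stated assumptions (synthetic regression data, a random forest driving the elimination, which differs from the paper's exact procedure), followed by a k-NN regressor evaluated on the selected predictors:

```python
# Rough sketch: recursive feature elimination to 4 predictors, then k-NN regression.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=15, n_informative=4, noise=5.0,
                       random_state=0)    # stand-in for the network/route delay predictors

selector = RFE(RandomForestRegressor(n_estimators=100, random_state=0),
               n_features_to_select=4).fit(X, y)
X_sel = X[:, selector.support_]

knn = KNeighborsRegressor(n_neighbors=5)
print("selected inputs:", np.flatnonzero(selector.support_))
print("k-NN CV R^2 on selected inputs:", cross_val_score(knn, X_sel, y, cv=5).mean())
```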
- Published
- 2023
- Full Text
- View/download PDF
16. Time Estimation Algorithm of Single-Phase-to-Ground Fault Based on Two-Step Dimensionality Reduction.
- Author
-
Lin, Xin, Chen, Haoran, Xu, Kai, and Xu, Jianyuan
- Subjects
TIME perception, DIMENSIONAL reduction algorithms, HILBERT-Huang transform, ALGORITHMS, PRINCIPAL components analysis
- Abstract
The fault detection time identified by relying on the over-voltage criterion of zero-sequence voltage often lags behind the actual occurrence time of ground faults, which may cause fault protection methods based on transient quantity principles to miss fault characteristics and lose their protection capability. To accurately estimate the time of occurrence of a single-phase-to-ground fault, this paper proposes a two-step dimensionality reduction algorithm for estimating the time of occurrence of a single-phase-to-ground fault in a distribution network. This algorithm constructs a filter based on Empirical Mode Decomposition (EMD) to establish a high-dimensional feature dataset based on the zero-sequence current of all feeders. After Principal Component Analysis and Hilbert Mapping Algorithm, the high-dimensional data are reduced to two dimensions to construct a two-dimensional feature dataset. The density-based clustering method is used to adaptively divide the data into two categories, fault data and non-fault data, so as to estimate the time of occurrence of the fault. The paper designs 11 sets of experiments including 7 common high-resistance grounding mediums to verify the accuracy of the fault time recognition of this algorithm. The accuracy of this algorithm is within 7.3 ms and it exhibits better detection performance compared to the threshold detection method. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
17. Exploring Multidimensional Embeddings for Decision Support Using Advanced Visualization Techniques.
- Author
-
Kurasova, Olga, Budžys, Arnoldas, and Medvedev, Viktor
- Subjects
DEEP learning, DECISION support systems, DATA visualization, ARTIFICIAL intelligence, DATA structures
- Abstract
As artificial intelligence has evolved, deep learning models have become important in extracting and interpreting complex patterns from raw multidimensional data. These models produce multidimensional embeddings that, while containing a lot of information, are often not directly understandable. Dimensionality reduction techniques play an important role in transforming multidimensional data into interpretable formats for decision support systems. To address this problem, the paper presents an analysis of dimensionality reduction and visualization techniques that embrace complex data representations and are useful inferences for decision systems. A novel framework is proposed, utilizing a Siamese neural network with a triplet loss function to analyze multidimensional data encoded into images, thus transforming these data into multidimensional embeddings. This approach uses dimensionality reduction techniques to transform these embeddings into a lower-dimensional space. This transformation not only improves interpretability but also maintains the integrity of the complex data structures. The efficacy of this approach is demonstrated using a keystroke dynamics dataset. The results support the integration of these visualization techniques into decision support systems. The visualization process not only simplifies the complexity of the data, but also reveals deep patterns and relationships hidden in the embeddings. Thus, a comprehensive framework for visualizing and interpreting complex keystroke dynamics is described, making a significant contribution to the field of user authentication. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Next-Gen Dynamic Hand Gesture Recognition: MediaPipe, Inception-v3 and LSTM-Based Enhanced Deep Learning Model.
- Author
-
Yaseen, Kwon, Oh-Jin, Kim, Jaeho, Jamil, Sonain, Lee, Jinhee, and Ullah, Faiz
- Abstract
Gesture recognition is crucial in computer vision-based applications, such as drone control, gaming, virtual and augmented reality (VR/AR), and security, especially in human–computer interaction (HCI)-based systems. There are two types of gesture recognition systems, i.e., static and dynamic. However, our focus in this paper is on dynamic gesture recognition. In dynamic hand gesture recognition systems, the sequences of frames, i.e., temporal data, pose significant processing challenges and reduce efficiency compared to static gestures. These data become multi-dimensional compared to static images because spatial and temporal data are being processed, which demands complex deep learning (DL) models with increased computational costs. This article presents a novel triple-layer algorithm that efficiently reduces the 3D feature map into 1D row vectors and enhances the overall performance. First, we process the individual images in a given sequence using the MediaPipe framework and extract the regions of interest (ROI). The processed cropped image is then passed to the Inception-v3 for the 2D feature extractor. Finally, a long short-term memory (LSTM) network is used as a temporal feature extractor and classifier. Our proposed method achieves an average accuracy of more than 89.7%. The experimental results also show that the proposed framework outperforms existing state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Regularized linear discriminant analysis based on generalized capped l2,q-norm.
- Author
-
Li, Chun-Na, Ren, Pei-Wei, Guo, Yan-Ru, Ye, Ya-Fen, and Shao, Yuan-Hai
- Subjects
FISHER discriminant analysis, EIGENVALUES, GENERALIZATION
- Abstract
Aiming to improve the robustness and adaptiveness of the recently investigated capped norm linear discriminant analysis (CLDA), this paper proposes a regularized linear discriminant analysis based on the generalized capped l2,q-norm (GCLDA). Compared to CLDA, there are two improvements in GCLDA. Firstly, GCLDA uses the capped l2,q-norm rather than the capped l2,1-norm to measure the within-class and between-class distances for arbitrary q > 0. By selecting an appropriate q, GCLDA is adaptive to different data, and also removes extreme outliers and suppresses the effect of noise more effectively. Secondly, by taking into account a regularization term, GCLDA not only improves its generalization ability but also avoids singularity. GCLDA is solved through a series of generalized eigenvalue problems. Experiments on an artificial dataset, some real-world datasets and a high-dimensional dataset demonstrate the effectiveness of GCLDA. [ABSTRACT FROM AUTHOR]
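The generalized eigenvalue formulation mentioned at the end is standard; the sketch below solves ordinary regularized LDA that way on the Iris data, omitting the capped l2,q-norm weighting that is the paper's actual contribution.

```python
# Simplified sketch: regularized LDA as a generalized eigenvalue problem (no capping).
import numpy as np
from scipy.linalg import eigh
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
mean_all = X.mean(axis=0)
Sw = np.zeros((X.shape[1], X.shape[1]))   # within-class scatter
Sb = np.zeros_like(Sw)                    # between-class scatter
for c in np.unique(y):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    Sw += (Xc - mc).T @ (Xc - mc)
    Sb += len(Xc) * np.outer(mc - mean_all, mc - mean_all)

reg = 1e-3 * np.eye(X.shape[1])            # regularization term avoids a singular Sw
evals, evecs = eigh(Sb, Sw + reg)          # generalized problem  Sb v = lambda (Sw + reg) v
W = evecs[:, np.argsort(evals)[::-1][:2]]  # top-2 discriminant directions
print("projected data shape:", (X @ W).shape)
```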
- Published
- 2024
- Full Text
- View/download PDF
20. A Computing System for Complex Cases of Major Recurrent Depression Based on Latent Semantic Analysis: Relationship between Life Themes and Symptoms.
- Author
-
Sumedrea, Alin Gilbert, Sumedrea, Cristian Mihai, and Săvulescu, Florin
- Abstract
The paper presents a computing procedure with the goal of suggesting applicable solutions to improve complex cases of major recurrent depression. The focus is on identifying the patients' illness patterns and on finding solutions for alleviating problematic symptoms. The illness patterns synthesize the outcomes of the relationship between the patients' life themes and symptoms. The testing of the effectiveness of illness improvement solutions was conducted by evaluating and comparing the Beck scores of patients after each psychotherapy session. In addition to latent semantic analysis used to identify semantic relationships between life themes and symptoms, the research also employed the correlation method to find life themes/symptoms that are experienced undistortedly and associations between life themes that amplify latent symptoms. The computing system was applied to eleven patients with severe forms of depression and their progress was monitored for six months. The results obtained following the application of the computing system demonstrated its ability to describe personalized illness patterns and to significantly improve, through its suggestions, the illness of all patients. These findings recommend the use of the computing system in severe cases of major recurrent depression. [ABSTRACT FROM AUTHOR]
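Latent semantic analysis itself is easy to demonstrate; the sketch below uses toy "life theme" and "symptom" sentences (purely hypothetical, no clinical data) with TF-IDF, truncated SVD, and cosine similarity in the reduced semantic space.

```python
# Small illustrative sketch of LSA only; the clinical computing system, patient texts,
# and Beck-score evaluation described above are not reproduced.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "persistent feelings of failure at work",        # life-theme statements (toy examples)
    "conflict and loss within the family",
    "loss of interest and pleasure in activities",   # symptom descriptions (toy examples)
    "fatigue and difficulty concentrating at work",
]
X = TfidfVectorizer().fit_transform(docs)
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

sims = cosine_similarity(Z[:2], Z[2:])   # themes (rows) vs. symptoms (columns)
print(sims)
```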
- Published
- 2024
- Full Text
- View/download PDF
21. Multi-Wavelength Computational Ghost Imaging Based on Feature Dimensionality Reduction.
- Author
-
Wang, Hong, Wang, Xiaoqian, Gao, Chao, Wang, Yu, Zhao, Huan, and Yao, Zhihai
- Abstract
Multi-wavelength ghost imaging usually involves extensive data processing and faces challenges such as poor reconstructed image quality. In this paper, we propose a multi-wavelength computational ghost imaging method based on feature dimensionality reduction. This method not only reconstructs high-quality color images with fewer measurements but also achieves low-complexity computation and storage. First, we utilize singular value decomposition to optimize the multi-scale measurement matrices of red, green, and blue components as illumination speckles. Subsequently, each component image of the target object is reconstructed using the second-order correlation function. Next, we apply principal component analysis to perform feature dimensionality reduction on these reconstructed images. Finally, we successfully recover a high-quality color reconstructed image. Simulation and experimental results show that our method not only improves the quality of the reconstructed images but also effectively reduces the computational and storage burden. When extended to multiple wavelengths, our method demonstrates greater advantages, making it more feasible to handle large-scale data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Bifurcation analysis on the reduced dopamine neuronal model.
- Author
-
Jiang, Xiaofang, Zhou, Hui, Wang, Feifei, Zheng, Bingxin, and Lu, Bo
- Subjects
BIFURCATION theory, DOPAMINERGIC neurons, DIMENSION reduction (Statistics), MATHEMATICAL variables, NONLINEAR theories
- Abstract
Bursting is a crucial form of firing in neurons, laden with substantial information. Studying it can aid in understanding the neural coding to identify human behavioral characteristics conducted by these neurons. However, the high-dimensionality of many neuron models imposes a difficult challenge in studying the generative mechanisms of bursting. On account of the high complexity and nonlinearity characteristic of these models, it becomes nearly impossible to theoretically study and analyze them. Thus, this paper proposed to address these issues by focusing on the midbrain dopamine neurons, serving as the central neuron model for the investigation of the bursting mechanisms and bifurcation behaviors exhibited by the neuron. In this study, we considered the dimensionality reduction of a high-dimensional neuronal model and analyzed the dynamical properties of the reduced system. To begin, for the original thirteen-dimensional model, using the correlation between variables, we reduced its dimensionality and obtained a simplified three-dimensional system. Then, we discussed the changing characteristics of the number of spikes within a burst by simultaneously varying two parameters. Finally, we studied the co-dimension-2 bifurcation in the reduced system and presented the bifurcation behavior near the Bogdanov-Takens bifurcation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. Performance Study on the Use of Genetic Algorithm for Reducing Feature Dimensionality in an Embedded Intrusion Detection System.
- Author
-
Silva, João Lobo, Fernandes, Rui, and Lopes, Nuno
- Subjects
COMPUTER network traffic, MACHINE learning, FEATURE selection, GENETIC algorithms, NUMBER theory
- Abstract
Intrusion Detection Systems play a crucial role in a network. They can detect different network attacks and raise warnings on them. Machine Learning-based IDSs are trained on datasets that, due to the context, are inherently large, since they can contain network traffic from different time periods and often include a large number of features. In this paper, we present two contributions: the study of the importance of Feature Selection when using an IDS dataset, while striking a balance between performance and the number of features; and the study of the feasibility of using a low-capacity device, the Nvidia Jetson Nano, to implement an IDS. The results, comparing the GA with other well-known techniques in Feature Selection and Dimensionality Reduction, show that the GA has the best F1-score of 76%, among all feature/dimension sizes. Although the processing time to find the optimal set of features surpasses other methods, we observed that the reduction in the number of features decreases the GA processing time without a significant impact on the F1-score. The Jetson Nano allows the classification of network traffic with an overhead of 10 times in comparison to a traditional server, paving the way to a near real-time GA-based embedded IDS. [ABSTRACT FROM AUTHOR]
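A compact, self-contained sketch of GA-driven feature selection in this spirit, under assumed simplifications (synthetic imbalanced data instead of an IDS dataset; truncation selection, one-point crossover, and bit-flip mutation as the operators; fitness is the F1-score of a random forest trained on the selected features):

```python
# Hedged sketch of genetic-algorithm feature selection (not the paper's code or data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=30, n_informative=8,
                           weights=[0.8, 0.2], random_state=0)   # imbalanced, IDS-like
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(Xtr[:, mask], ytr)
    return f1_score(yte, clf.predict(Xte[:, mask]))

pop = rng.integers(0, 2, size=(20, X.shape[1])).astype(bool)      # initial population
for generation in range(10):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]                   # truncation selection
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        cut = rng.integers(1, X.shape[1])                          # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(X.shape[1]) < 0.05                       # bit-flip mutation
        children.append(np.where(flip, ~child, child))
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("features kept:", int(best.sum()), "F1:", round(fitness(best), 3))
```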
- Published
- 2024
- Full Text
- View/download PDF
24. Toward Unbiased High-Quality Portraits through Latent-Space Evaluation.
- Author
-
Almhaithawi, Doaa, Bellini, Alessandro, and Cerquitelli, Tania
- Subjects
CATEGORIZATION (Psychology), VECTOR spaces, INDUSTRIAL surveys, RACE, DEPERSONALIZATION, YOUNG women
- Abstract
Images, texts, voices, and signals can be synthesized by latent spaces in a multidimensional vector, which can be explored without the hurdles of noise or other interfering factors. In this paper, we present a practical use case that demonstrates the power of latent space in exploring complex realities such as image space. We focus on DaVinciFace, an AI-based system that explores the StyleGAN2 space to create a high-quality portrait for anyone in the style of the Renaissance genius Leonardo da Vinci. The user enters one of their portraits and receives the corresponding Da Vinci-style portrait as an output. Since most of Da Vinci's artworks depict young and beautiful women (e.g., "La Belle Ferroniere", "Beatrice de' Benci"), we investigate the ability of DaVinciFace to account for other social categorizations, including gender, race, and age. The experimental results evaluate the effectiveness of our methodology on 1158 portraits acting on the vector representations of the latent space to produce high-quality portraits that retain the facial features of the subject's social categories, and conclude that sparser vectors have a greater effect on these features. To objectively evaluate and quantify our results, we solicited human feedback via a crowd-sourcing campaign. Analysis of the human feedback showed a high tolerance for the loss of important identity features in the resulting portraits when the Da Vinci style is more pronounced, with some exceptions, including Africanized individuals. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. Machine learning methods in sport injury prediction and prevention: a systematic review
- Author
-
Thomas Tischer, Romain Seil, Hans Van Eetvelde, Christophe Ley, and Luciana De Michelis Mendonça
- Subjects
Technology and Engineering, Machine learning, Sport injury, Injury prediction, Injury prevention, Feature selection, Dimensionality reduction, Hyperparameter, Ensemble learning, Support vector machine, Artificial intelligence, Medicine, Orthopedics and Sports Medicine, Orthopedic surgery, Review Paper
- Abstract
Purpose Injuries are common in sports and can have significant physical, psychological and financial consequences. Machine learning (ML) methods could be used to improve injury prediction and allow proper approaches to injury prevention. The aim of our study was therefore to perform a systematic review of ML methods in sport injury prediction and prevention. Methods A search of the PubMed database was performed on March 24th 2020. Eligible articles included original studies investigating the role of ML for sport injury prediction and prevention. Two independent reviewers screened articles, assessed eligibility, risk of bias and extracted data. Methodological quality and risk of bias were determined by the Newcastle–Ottawa Scale. Study quality was evaluated using the GRADE working group methodology. Results Eleven out of 249 studies met inclusion/exclusion criteria. Different ML methods were used (tree-based ensemble methods (n = 9), Support Vector Machines (n = 4), Artificial Neural Networks (n = 2)). The classification methods were facilitated by preprocessing steps (n = 5) and optimized using over- and undersampling methods (n = 6), hyperparameter tuning (n = 4), feature selection (n = 3) and dimensionality reduction (n = 1). Injury predictive performance ranged from poor (Accuracy = 52%, AUC = 0.52) to strong (AUC = 0.87, f1-score = 85%). Conclusions Current ML methods can be used to identify athletes at high injury risk and be helpful to detect the most important injury risk factors. Methodological quality of the analyses was sufficient in general, but could be further improved. More effort should be put in the interpretation of the ML models.
- Published
- 2021
26. Comparison of dimensionality reduction and clustering methods for SARS-CoV-2 genome
- Author
-
Untari N. Wisesty and Tati Rajab Mengko
- Subjects
SARS-CoV-2, Principal component analysis, Genome clustering, Dimensionality reduction
- Abstract
This paper analyzes SARS-CoV-2 genome variation by comparing the results of genome clustering using several clustering algorithms and the distribution of sequences in each cluster. The clustering algorithms used are K-means, Gaussian mixture models, agglomerative hierarchical clustering, mean-shift clustering, and DBSCAN. However, clustering algorithms struggle to group data of very high dimensionality, such as genome data, so a dimensionality reduction step is needed. In this research, dimensionality reduction was carried out using principal component analysis (PCA) and an autoencoder with three models that produce 2, 10, and 50 features. The main contributions are the dimensionality reduction and clustering scheme for SARS-CoV-2 sequence data and the performance analysis of each experiment for each scheme and of the hyperparameters of each method. Based on the experiments conducted, PCA and the DBSCAN algorithm achieve the highest silhouette score of 0.8770 with three clusters when using two features. However, dimensionality reduction using the autoencoder needs more iterations to converge. In testing with Indonesian sequence data, more than half of the sequences fall into one cluster and the rest are distributed between the other two clusters.
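The best-performing scheme reported above (PCA to two features, then DBSCAN, scored by silhouette) can be sketched as follows on random stand-ins for k-mer count vectors; the actual SARS-CoV-2 sequences and their preprocessing are not reproduced.

```python
# Hedged sketch: PCA to 2 features, DBSCAN clustering, silhouette evaluation.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(5)
# Three synthetic "variant groups" in a 256-dimensional feature space (e.g., 4-mer counts).
groups = [rng.normal(m, 0.05, size=(100, 256)) for m in (0.1, 0.5, 0.9)]
X = np.vstack(groups)

Z = PCA(n_components=2).fit_transform(X)
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(Z)

valid = labels != -1                      # silhouette is computed on non-noise points
print("clusters:", len(set(labels[valid])))
print("silhouette:", silhouette_score(Z[valid], labels[valid]))
```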
- Published
- 2021
27. Applications and Techniques of Machine Learning in Cancer Classification: A Systematic Review.
- Author
-
Yaqoob, Abrar, Musheer Aziz, Rabia, and verma, Navneet Kumar
- Subjects
MACHINE learning, TUMOR grading, SYSTEMATIC reviews, COMPUTATIONAL linguistics, REINFORCEMENT learning
- Abstract
The domain of machine learning has experienced substantial advancement and development, recently showcasing a broad spectrum of uses such as computational linguistics, image identification, and autonomous systems. With the increasing demand for intelligent systems, it has become crucial to understand the different categories of machine learning systems along with their applications in the present world. This paper presents actual use cases of machine learning, including cancer classification, and how machine learning algorithms have been implemented on medical data to categorize diverse forms of cancer and anticipate their outcomes. The paper also discusses supervised, unsupervised, and reinforcement learning, highlighting the benefits and disadvantages of each category of computational intelligence system. The conclusions of this systematic study on machine learning methods and applications in cancer classification have numerous implications. The main lesson is that through accurate classification of cancer types, patient outcome prediction, and identification of possible therapeutic targets, machine learning holds enormous potential for improving cancer diagnosis and therapy. This review offers readers a broad understanding of the present advancements in machine learning applied to cancer classification, empowering them to decide for themselves whether to use these methods in clinical settings. Lastly, the paper wraps up by discussing the future of machine learning, including the potential for new types of systems to be developed as the field advances. Overall, the information included in this survey article is useful for scholars, practitioners, and individuals interested in gaining knowledge about the fundamentals of machine learning and its various applications in different areas of activity. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
28. A novel robust adaptive subspace learning framework for dimensionality reduction.
- Author
-
Xiong, Weizhi, Yu, Guolin, Ma, Jun, and Liu, Sheng
- Subjects
RANDOM forest algorithms, GAUSSIAN distribution, STATISTICS, NOISE
- Abstract
High-dimensional data is characterized by its sparsity and noise, which can increase the likelihood of overfitting and compromise the model's generalization performance. In this paper, a novel robust subspace learning method based on stable adaptive spectral clustering is put forward for dimensionality reduction. Firstly, a robust estimator is used to distinguish the role of normal and abnormal samples in constructing the model, so that small values are assigned to the outliers and the influence of outliers on the construction of the learning models is reduced. Secondly, the p-order L2-norm distance is applied as the distance metric, replacing the squared L2-norm distance metric. The L2,p-norm often commendably tolerates the biases caused by the outliers in sample data, especially when the outliers are far from the normal data distributions. Thirdly, adaptive stable spectral clustering based on the L2,p-norm is proposed to construct the similarity matrix of the novel robust subspace and to carry out reflexive embedding learning that learns the local and global features of the raw data. In the subspace, the data is reconstructed to reduce the influence of noise and outliers, and the similarity matrix is structured by the new features, which is more conducive to subspace learning. The three main roles of the objective function of our model are: (1) preserving the consistency between the original data and the estimation; (2) achieving a "clean" subspace and further removing the outliers; (3) avoiding the trivial solution for each node in the graph. Finally, the random forest algorithm is used to classify and predict the learned subspace with different feature selections. Experimental results show that the proposed method is superior to other subspace learning methods in classification performance. The results of the noise experiment and statistical analysis demonstrate the effectiveness of the proposed method once again. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Computational methods for predicting 3D genomic organization from high-resolution chromosome conformation capture data
- Author
-
Kimberly MacKay and Anthony Kusalik
- Subjects
Molecular Conformation, Computational biology, Network theory, Biology, Genome, Chromosomes, Chromosome conformation capture, Hi-C, 5C, genome organization, Genomic organization, high-resolution chromosome conformation capture data, Dimensionality reduction, 3D genome reconstruction problem, 3D genome prediction, Computational Biology, Statistical model, Genomics, Chromatin, Graph (abstract data type), Algorithms, Review Paper
- Abstract
The advent of high-resolution chromosome conformation capture assays (such as 5C, Hi-C and Pore-C) has allowed for unprecedented sequence-level investigations into the structure–function relationship of the genome. In order to comprehensively understand this relationship, computational tools are required that utilize data generated from these assays to predict 3D genome organization (the 3D genome reconstruction problem). Many computational tools have been developed that answer this need, but a comprehensive comparison of their underlying algorithmic approaches has not been conducted. This manuscript provides a comprehensive review of the existing computational tools (from November 2006 to September 2019, inclusive) that can be used to predict 3D genome organizations from high-resolution chromosome conformation capture data. Overall, existing tools were found to use a relatively small set of algorithms from one or more of the following categories: dimensionality reduction, graph/network theory, maximum likelihood estimation (MLE) and statistical modeling. Solutions in each category are far from maturity, and the breadth and depth of various algorithmic categories have not been fully explored. While the tools for predicting 3D structure for a genomic region or single chromosome are diverse, there is a general lack of algorithmic diversity among computational tools for predicting the complete 3D genome organization from high-resolution chromosome conformation capture data.
- Published
- 2020
30. Resolving single-cell heterogeneity from hundreds of thousands of cells through sequential hybrid clustering and NMF
- Author
-
Meenakshi Venkatasubramanian, Nathan Salomonis, Daniel Schnell, Gowtham Atluri, and Kashish Chetal
- Subjects
Statistics and Probability, Computer science, Gene Expression, Biochemistry, Non-negative matrix factorization, Matrix decomposition, Cluster analysis, Molecular Biology, Sequence Analysis (RNA), Gene Expression Profiling, Dimensionality reduction, Support vector machine, Single-Cell Analysis, Algorithms
- Abstract
Motivation The rapid proliferation of single-cell RNA-sequencing (scRNA-Seq) technologies has spurred the development of diverse computational approaches to detect transcriptionally coherent populations. While the complexity of the algorithms for detecting heterogeneity has increased, most require significant user-tuning, are heavily reliant on dimension reduction techniques and are not scalable to ultra-large datasets. We previously described a multi-step algorithm, Iterative Clustering and Guide-gene Selection (ICGS), which applies intra-gene correlation and hybrid clustering to uniquely resolve novel transcriptionally coherent cell populations from an intuitive graphical user interface. Results We describe a new iteration of ICGS that outperforms state-of-the-art scRNA-Seq detection workflows when applied to well-established benchmarks. This approach combines multiple complementary subtype detection methods (HOPACH, sparse non-negative matrix factorization, cluster ‘fitness’, support vector machine) to resolve rare and common cell-states, while minimizing differences due to donor or batch effects. Using data from multiple cell atlases, we show that the PageRank algorithm effectively downsamples ultra-large scRNA-Seq datasets, without losing extremely rare or transcriptionally similar yet distinct cell types and while recovering novel transcriptionally distinct cell populations. We believe this new approach holds tremendous promise in reproducibly resolving hidden cell populations in complex datasets. Availability and implementation ICGS2 is implemented in Python. The source code and documentation are available at http://altanalyze.org. Supplementary information Supplementary data are available at Bioinformatics online.
- Published
- 2020
31. Hidden Variable Models in Text Classification and Sentiment Analysis.
- Author
-
Koochemeshkian, Pantea, Ihou Koffi, Eddy, and Bouguila, Nizar
- Subjects
SENTIMENT analysis, GIBBS sampling, PRINCIPAL components analysis, GAUSSIAN distribution, CLASSIFICATION, BAYESIAN field theory
- Abstract
In this paper, we are proposing extensions to the multinomial principal component analysis (MPCA) framework, which is a Dirichlet (Dir)-based model widely used in text document analysis. The MPCA is a discrete analogue to the standard PCA (it operates on continuous data using Gaussian distributions). With the extensive use of count data in modeling nowadays, the current limitations of the Dir prior (independent assumption within its components and very restricted covariance structure) tend to prevent efficient processing. As a result, we are proposing some alternatives with flexible priors such as generalized Dirichlet (GD) and Beta-Liouville (BL), leading to GDMPCA and BLMPCA models, respectively. Besides using these priors as they generalize the Dir, importantly, we also implement a deterministic method that uses variational Bayesian inference for the fast convergence of the proposed algorithms. Additionally, we use collapsed Gibbs sampling to estimate the model parameters, providing a computationally efficient method for inference. These two variational models offer higher flexibility while assigning each observation to a distinct cluster. We create several multitopic models and evaluate their strengths and weaknesses using real-world applications such as text classification and sentiment analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. Overcoming Dimensionality Constraints: A Gershgorin Circle Theorem-Based Feature Extraction for Weighted Laplacian Matrices in Computer Vision Applications.
- Author
-
Patel, Sahaj Anilbhai and Yildirim, Abidin
- Subjects
LAPLACIAN matrices ,COMPUTER vision ,FEATURE extraction ,APPLICATION software ,GRAPH theory ,CONVOLUTIONAL neural networks - Abstract
In graph theory, the weighted Laplacian matrix is the most widely used tool for interpreting the local and global properties of a complex graph structure in computer vision applications. However, as the number of graph nodes grows, the dimensionality of the Laplacian matrix grows accordingly, bringing the usual "curse of dimensionality". In response to this challenge, this paper introduces a new approach to reducing the dimensionality of the weighted Laplacian matrix: the Gershgorin circle theorem is applied by transforming the weighted Laplacian into a strictly diagonally dominant form and then estimating rough eigenvalue inclusion regions of the matrix. The estimated inclusions are used as reduced features, termed GC features. The proposed Gershgorin circle feature extraction (GCFE) method was evaluated using three publicly accessible computer vision datasets, varying image patch sizes, and three different graph types, and was compared with eight other studies. The GCFE demonstrated a notably positive Z-score compared with other feature extraction methods such as I-PCA, kernel PCA, and spectral embedding, achieving an average Z-score of 6.953 with the 2D grid graph type and 4.473 with the pairwise graph type, particularly on the E_Balanced dataset. Furthermore, while the accuracy of most major feature extraction methods declined with smaller image patch sizes, the GCFE maintained consistent accuracy across all tested patch sizes. When applied to the E_MNSIT dataset using the K-NN graph type, the GCFE confirmed its consistent accuracy, evidenced by a low standard deviation (SD) of 0.305, notably lower than other methods such as Isomap (SD 1.665) and LLE (SD 1.325). The GCFE outperformed most feature extraction methods in terms of classification accuracy and computational efficiency, and it also requires fewer training parameters for deep-learning models than the traditional weighted Laplacian approach, establishing its potential for more effective and efficient feature extraction in computer vision tasks. [ABSTRACT FROM AUTHOR]
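As a rough, hedged illustration of the eigenvalue-inclusion idea behind GCFE (not the authors' code), the sketch below builds a weighted Laplacian from an adjacency matrix and uses each row's Gershgorin disc centre and radius as a reduced feature vector; the function name and the toy graph are assumptions.

# Hedged sketch of Gershgorin-circle-style features from a weighted graph Laplacian:
# per-row disc centre (diagonal entry) and radius (sum of off-diagonal magnitudes)
# give a crude eigenvalue-inclusion summary used here as reduced features.
import numpy as np

def gershgorin_features(W):
    """W: symmetric non-negative weight matrix of a graph (n x n)."""
    degrees = W.sum(axis=1)
    L = np.diag(degrees) - W                           # weighted Laplacian
    centers = np.diag(L)                               # Gershgorin disc centres
    radii = np.abs(L).sum(axis=1) - np.abs(centers)    # disc radii
    # Interval endpoints [centre - radius, centre + radius] as a 2n-length feature vector
    return np.concatenate([centers - radii, centers + radii])

# Toy 4-node weighted graph
W = np.array([[0, 1, 0, 2],
              [1, 0, 3, 0],
              [0, 3, 0, 1],
              [2, 0, 1, 0]], dtype=float)
features = gershgorin_features(W)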
- Published
- 2024
- Full Text
- View/download PDF
33. Is Unsupervised Dimensionality Reduction Sufficient to Decode the Complexities of Electrochemical Impedance Spectra?
- Author
-
Makogon, Aleksei, Kanoufi, Frederic, and Shkirskiy, Viacheslav
- Subjects
MACHINE learning ,ELECTRONIC data processing ,CAPACITORS ,AUTOMATION ,SUCCESSIVE approximation analog-to-digital converters - Abstract
As electrochemical research undergoes rapid technological progression, the acquisition of substantial amounts of electrochemical impedance spectra (EIS) becomes increasingly feasible. Yet, this advancement introduces intricate challenges in data processing, automation, and interpretation. This paper delves into the sufficiency of unsupervised machine learning (ML) and in particular dimensionality reduction methods in decoding EIS complexities, examining its strengths, limitations, and potential pathways for optimization. As we navigated the intricacies of non‐linear dimensionality reduction, spotlighting t‐distributed stochastic neighbor embedding (t‐SNE) and uniform manifold approximation and projection (UMAP) algorithms, a pattern emerged: these techniques excel at categorizing divergent impedance spectra but show limitations when faced with analogous circuit configurations, especially those substituting a capacitor with a constant phase element. This observation not only underscores a limitation but also accentuates that unsupervised ML approaches, alone, may not fully unravel the nuances of EIS spectra. In the concluding section of our manuscript, we discuss the implications of this finding from a practical standpoint, particularly for electrochemists seeking to apply these methods in their work. [ABSTRACT FROM AUTHOR]
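A minimal, hedged sketch of the unsupervised workflow discussed above: flatten each impedance spectrum into a feature vector (log-magnitude and phase per frequency) and embed the collection with t-SNE, optionally UMAP. The featurisation, parameter values, and the umap-learn dependency are assumptions, and the data here are random placeholders rather than real EIS measurements.

# Hedged sketch: embedding a batch of impedance spectra with t-SNE (and optionally UMAP).
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Z: complex impedance, shape (n_spectra, n_frequencies) -- random placeholder data
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 60)) + 1j * rng.normal(size=(200, 60))
X = np.hstack([np.log10(np.abs(Z)), np.angle(Z)])    # simple EIS featurisation
X = StandardScaler().fit_transform(X)

emb_tsne = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(X)

try:
    import umap                                       # optional: pip install umap-learn
    emb_umap = umap.UMAP(n_components=2, n_neighbors=15).fit_transform(X)
except ImportError:
    emb_umap = None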
- Published
- 2024
- Full Text
- View/download PDF
34. Using Recurrent Neural Network to Optimize Electronic Nose System with Dimensionality Reduction.
- Author
-
Zou, Yanan and Lv, Jianhui
- Subjects
ELECTRONIC noses ,RECURRENT neural networks ,ELECTRONIC systems ,PATTERN recognition systems ,OLFACTORY perception ,ELECTRONIC paper - Abstract
An electronic nose is an electronic olfactory system that simulates the biological olfactory mechanism and mainly comprises gas sensors, data pre-processing, and pattern recognition. In recent years, electronic-nose proposals have been developed widely, confirming that the electronic nose is an important tool. However, most recent studies concentrate on applications of the electronic nose and tend to neglect improvement of its underlying techniques; the proposals that do address technique improvement usually focus on the gas sensor module and rarely consider the last two modules. This paper therefore optimizes the electronic nose system from the perspective of data pre-processing and pattern recognition. A recurrent neural network (RNN) is used for pattern recognition to guarantee accuracy and stability, and locally linear embedding (LLE) is used for dimensionality reduction of the high-dimensional pre-processed data. Experiments on a real sensor-drift dataset show that the proposed optimization not only achieves higher accuracy and stability but also lower response time than the three baselines. In addition, the experimental results show the efficiency of the RNN model in terms of recall, precision, and F1 value. [ABSTRACT FROM AUTHOR]
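The LLE reduction step mentioned above can be sketched as follows, with a simple classifier standing in for the paper's RNN recognition stage; dataset shapes, neighbor counts, and the logistic-regression stand-in are assumptions.

# Hedged sketch of LLE-based dimensionality reduction for e-nose features.
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression   # stand-in for the RNN classifier

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 128))          # e.g. 128 steady-state sensor features per sample
y = rng.integers(0, 6, size=600)         # 6 gas classes (synthetic labels)

lle = LocallyLinearEmbedding(n_components=10, n_neighbors=12)
X_low = lle.fit_transform(X)             # reduced representation fed to the recogniser

X_tr, X_te, y_tr, y_te = train_test_split(X_low, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("accuracy on synthetic data:", clf.score(X_te, y_te))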
- Published
- 2020
- Full Text
- View/download PDF
35. A Dimension Reduction Approach for Energy Landscape: Identifying Intermediate States in Metabolism-EMT Network
- Author
-
Xin Kang and Chunhe Li
- Subjects
Dynamical systems theory ,Computer science ,dimension reduction ,General Chemical Engineering ,Science ,Gene regulatory network ,General Physics and Astronomy ,Medicine (miscellaneous) ,02 engineering and technology ,010402 general chemistry ,Dynamical system ,01 natural sciences ,Biochemistry, Genetics and Molecular Biology (miscellaneous) ,Stability (probability) ,epithelial‐mesenchymal transitions ,General Materials Science ,transition paths ,gene regulatory networks ,Full Paper ,Dimensionality reduction ,energy landscape ,General Engineering ,Energy landscape ,State (functional analysis) ,Full Papers ,021001 nanoscience & nanotechnology ,0104 chemical sciences ,0210 nano-technology ,Biological system ,Biological network - Abstract
Dimension reduction is a challenging problem in complex dynamical systems. Here, a dimension reduction approach of landscape (DRL) for complex dynamical systems is proposed, by mapping a high-dimensional system onto a low-dimensional energy landscape. The DRL approach is applied to three biological networks, which validates that the new reduced dimensions preserve the major information on stability and transitions of the original high-dimensional systems. The consistency of barrier heights calculated from the low-dimensional landscape with transition actions calculated from the high-dimensional system further shows that the landscape after dimension reduction can quantify the global stability of the system. The epithelial-mesenchymal transition (EMT) and abnormal metabolism are two hallmarks of cancer. With the DRL approach, a quadrastable landscape for the metabolism-EMT network is identified, including epithelial (E), abnormal metabolic (A), hybrid E/M (H), and mesenchymal (M) cell states. The quantified energy landscape and kinetic transition paths suggest that, in the EMT process, cells at the E state first change their metabolism and then enter the M state. The work proposes a general framework for the dimension reduction of a stochastic dynamical system and advances the mechanistic understanding of the relationship between EMT and cellular metabolism.
- Published
- 2020
36. Estimating the Number of Clusters in High-Dimensional Large Datasets.
- Author
-
Zhu, Xutong and Li, Lingli
- Subjects
DOCUMENT clustering ,ALGORITHMS ,DATA analysis ,DIMENSION reduction (Statistics) ,EXPLORATORY factor analysis - Abstract
Clustering is a basic primitive of exploratory data analysis. To obtain valuable results, the key parameter of a clustering algorithm, the number of clusters, must be set appropriately. Existing methods for determining the number of clusters perform well on low-dimensional small datasets, but effectively determining the optimal number of clusters on large high-dimensional datasets remains a challenging problem. In this paper, the authors design a method for estimating the optimal number of clusters on large-scale high-dimensional datasets that overcomes the shortcomings of existing estimation methods and produces accurate estimates quickly. Extensive experiments show that it (1) outperforms existing estimation methods in accuracy and efficiency, (2) generalizes across different datasets, and (3) is suitable for high-dimensional large datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
37. An Enhanced Trace Ratio Linear Discriminant Analysis for Fault Diagnosis: An Illustrated Example Using HDD Data.
- Author
-
Yang, Ang, Wang, Yu, Zi, Yanyang, and Chow, Tommy W. S.
- Subjects
FISHER discriminant analysis ,FAULT diagnosis ,DATA disk drives ,HILBERT-Huang transform ,CARDBOARD ,DIMENSION reduction (Statistics) - Abstract
This paper proposes a pattern recognition method for the fault diagnosis of machinery based on an enhanced trace ratio linear discriminant analysis (ETR-LDA) algorithm. It extends TR-LDA, which seeks to maximize the between-class distance and minimize the within-class distance. Because TR-LDA focuses on the total between-class distance and pays no attention to local features, it may perform poorly when classifying specific patterns or classes, a deficiency referred to as the "short board" in this paper. To cope with this limitation, the proposed ETR-LDA reconsiders the relationship between the total between-class distance and global separability. More specifically, a new objective function is derived in ETR-LDA by taking the smallest between-class distance into account. The optimal solution of this objective function is proved to improve the smallest between-class distance without changing the convergence or global optimality of the algorithm, so the separability of different classes is ultimately increased. Both synthetic data and hard disk drive wear experimental data are employed to verify the efficiency of the proposed method. The results show that ETR-LDA is able to distinguish the fault categories and outperforms TR-LDA and other dimensionality reduction methods. [ABSTRACT FROM AUTHOR]
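For orientation, here is a hedged baseline showing the kind of supervised projection that TR-LDA and ETR-LDA refine: ordinary linear discriminant analysis on synthetic multi-class data. The trace-ratio objective and the smallest-between-class-distance term of ETR-LDA are not reproduced; data and class counts are placeholders.

# Hedged sketch: a plain LDA projection and classifier as a fault-diagnosis baseline.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_per_class, n_features, n_classes = 100, 40, 4
X = np.vstack([rng.normal(loc=c, size=(n_per_class, n_features)) for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)

lda = LinearDiscriminantAnalysis(n_components=n_classes - 1)
X_proj = lda.fit_transform(X, y)                      # discriminant subspace (dim = classes - 1)
print("CV accuracy:", cross_val_score(lda, X, y, cv=5).mean())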
- Published
- 2019
- Full Text
- View/download PDF
38. Partition: a surjective mapping approach for dimensionality reduction
- Author
-
Sebastian Stintzing, Malcolm Barrett, Wu Zhang, Heinz-Josef Lenz, Volker Heinemann, Francesca Battaglin, Shu Cao, and Joshua Millstein
- Subjects
Statistics and Probability ,Genome ,business.industry ,Computer science ,Dimensionality reduction ,Genomics ,computer.software_genre ,Original Papers ,Biochemistry ,Partition (database) ,Computer Science Applications ,Matrix decomposition ,Surjective function ,Computational Mathematics ,Software ,Computational Theory and Mathematics ,Neoplasms ,Principal component analysis ,Humans ,Data mining ,business ,Molecular Biology ,computer ,Algorithms - Abstract
Motivation Large amounts of information generated by genomic technologies are accompanied by statistical and computational challenges due to redundancy, badly behaved data and noise. Dimensionality reduction (DR) methods have been developed to mitigate these challenges. However, many approaches are not scalable to large dimensions or result in excessive information loss. Results The proposed approach partitions data into subsets of related features and summarizes each into one and only one new feature, thus defining a surjective mapping. A constraint on information loss determines the size of the reduced dataset. Simulation studies demonstrate that when multiple related features are associated with a response, this approach can substantially increase the number of true associations detected as compared to principal components analysis, non-negative matrix factorization or no DR. This increase in true discoveries is explained both by a reduced multiple-testing challenge and a reduction in extraneous noise. In an application to real data collected from metastatic colorectal cancer tumors, more associations between gene expression features and progression free survival and response to treatment were detected in the reduced than in the full untransformed dataset. Availability and implementation Freely available R package from CRAN, https://cran.r-project.org/package=partition. Supplementary information Supplementary data are available at Bioinformatics online.
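A hedged Python sketch of the general idea follows (the actual method is distributed as the R package `partition` cited above): cluster correlated features and summarise each cluster by its mean, yielding a surjective feature-to-reduced-feature mapping. The correlation-distance cutoff below is only a stand-in for the package's information-loss constraint, and all names are assumptions.

# Hedged sketch of a partition-like reduction: group correlated features, average each group.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def partition_like_reduce(X, max_dist=0.4):
    corr = np.corrcoef(X, rowvar=False)
    dist = 1.0 - np.abs(corr)                       # correlation distance between features
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=max_dist, criterion="distance")      # feature -> block id
    reduced = np.column_stack([X[:, labels == k].mean(axis=1)
                               for k in np.unique(labels)])
    return reduced, labels

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 50))
X_reduced, mapping = partition_like_reduce(X)       # surjective map: each feature in exactly one block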
- Published
- 2019
39. Atlas-based methods for efficient characterization of patient-specific ventricular activation patterns
- Author
-
Andrew D. McCulloch, Jeffrey H. Omens, Sachin Govil, James C. Perry, Jake M Joblon, Nickolas Forsch, and Kevin P. Vincent
- Subjects
Heart Ventricles ,030204 cardiovascular system & hematology ,030218 nuclear medicine & medical imaging ,03 medical and health sciences ,0302 clinical medicine ,Physiology (medical) ,Medicine ,Humans ,Sensitivity (control systems) ,Vectorcardiography ,Spatial analysis ,medicine.diagnostic_test ,business.industry ,Atlas (topology) ,Dimensionality reduction ,Pattern recognition ,Arrhythmias, Cardiac ,Heart ,Patient specific ,Magnetic Resonance Imaging ,Ventricular activation ,Supplement Papers ,Principal component analysis ,Artificial intelligence ,Cardiology and Cardiovascular Medicine ,business - Abstract
Aims Ventricular activation patterns can aid clinical decision-making directly by providing spatial information on cardiac electrical activation or indirectly through derived clinical indices. The aim of this work was to derive an atlas of the major modes of variation of ventricular activation from model-predicted 3D bi-ventricular activation time distributions and to relate these modes to corresponding vectorcardiograms (VCGs). We investigated how the resulting dimensionality reduction can improve and accelerate the estimation of activation patterns from surface electrogram measurements. Methods and results Atlases of activation time (AT) and VCGs were derived using principal component analysis on a dataset of electrophysiology simulations computed on eight patient-specific bi-ventricular geometries. The atlases provided significant dimensionality reduction, and the modes of variation in the two atlases described similar features. Utility of the atlases was assessed by resolving clinical waveforms against them, and the VCG atlas was able to accurately reconstruct the patient VCGs with fewer than 10 modes. A sensitivity analysis between the two atlases was performed by calculating a compact Jacobian. Finally, VCGs generated by varying AT atlas modes were compared with clinical VCGs to estimate patient-specific activation maps, and the resulting errors between the clinical and atlas-based VCGs were lower than those from a more computationally expensive method. Conclusion Atlases of activation and VCGs represent a new method of identifying and relating the features of these high-dimensional signals that captures the major sources of variation between patients and may aid in identifying novel clinical indices of arrhythmia risk or therapeutic outcome.
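The atlas construction rests on principal component analysis over a population of signals; a minimal, hedged sketch with synthetic flattened VCG-like traces is shown below. The signal sizes, the 10-mode cutoff, and the use of scikit-learn are assumptions, not the clinical pipeline described above.

# Hedged sketch: PCA "atlas" of flattened signals, with reconstruction from a few modes.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
signals = rng.normal(size=(40, 3 * 500))       # 40 subjects, 3-lead x 500-sample traces, flattened

atlas = PCA(n_components=10).fit(signals)       # modes of variation across subjects
scores = atlas.transform(signals)               # per-subject mode weights
reconstructed = atlas.inverse_transform(scores) # reconstruction from the first 10 modes
explained = atlas.explained_variance_ratio_.cumsum()
print("variance captured by 10 modes:", round(float(explained[-1]), 3))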
- Published
- 2020
40. qSNE: quadratic rate t-SNE optimizer with automatic parameter tuning for large datasets
- Author
-
Eleonora Petrucci, Katja Kaipio, Juha Koiranen, Oskari Lehtonen, Antti Häkkinen, Luca Pasquini, Olli Carpén, Sakari Hietanen, Johanna Hynninen, Rainer Lehtonen, Mauro Biffoni, Julia Casado, Sampsa Hautaniemi, Sampsa Hautaniemi / Principal Investigator, Research Program in Systems Oncology, Faculty of Medicine, Research Programs Unit, HUSLAB, Precision Cancer Pathology, Department of Pathology, Olli Mikael Carpen / Principal Investigator, and Bioinformatics
- Subjects
Statistics and Probability ,Source code ,Perplexity ,AcademicSubjects/SCI01060 ,Computer science ,media_common.quotation_subject ,education ,computer.software_genre ,Biochemistry ,Upsampling ,03 medical and health sciences ,0302 clinical medicine ,Quadratic equation ,Molecular Biology ,030304 developmental biology ,media_common ,0303 health sciences ,Dimensionality reduction ,113 Computer and information sciences ,Original Papers ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,Rate of convergence ,030220 oncology & carcinogenesis ,Embedding ,Data mining ,Data and Text Mining ,computer ,Software - Abstract
Motivation Non-parametric dimensionality reduction techniques, such as t-distributed stochastic neighbor embedding (t-SNE), are the most frequently used methods in the exploratory analysis of single-cell datasets. Current implementations scale poorly to massive datasets and often require downsampling or interpolative approximations, which can leave less-frequent populations undiscovered and much information unexploited. Results We implemented a fast t-SNE package, qSNE, which uses a quasi-Newton optimizer, allowing a quadratic convergence rate, and includes an automatic perplexity (level of detail) optimizer. Our results show that these improvements make qSNE significantly faster than regular t-SNE packages and enable full analysis of large datasets, such as mass cytometry data, without downsampling. Availability and implementation Source code and documentation are openly available at https://bitbucket.org/anthakki/qsne/. Supplementary information Supplementary data are available at Bioinformatics online.
- Published
- 2020
41. Dimensionality reduction and machine learning based model of software cost estimation.
- Author
-
Zhang, Wei, Cheng, Haixin, Zhan, Siyu, Luo, Ming, Wang, Feng, Huang, Zhan, Bai, Changxin, and Hu, Dekun
- Subjects
MACHINE learning ,RANDOM forest algorithms ,REGRESSION trees ,COMPUTER software - Abstract
Software Cost Estimation (SCE) is one of the research priorities and challenges in the construction of cyber-physical-social systems (CPSSs). In a CPSS, it is urgent to process environmental and social information accurately and use it to guide social practice. In response to the problems of low prediction accuracy, poor robustness, and poor interpretability in SCE, this paper proposes an SCE model based on an autoencoder and a random forest. First, the project data are preprocessed: outliers are removed and regression trees are built to fill in missing attributes. Second, an autoencoder is constructed to reduce the dimensionality of the factors that affect software cost. The model is then trained and validated using the XGBoost framework on three datasets, COCOMO81, Albrecht, and Desharnais, and compared with common cost prediction models. The experimental results show that the MMRE, MdMRE, and PRED (0.25) values of the proposed model on the COCOMO81 dataset reach 0.21, 0.16, and 0.71, respectively. Compared with other models, the proposed model achieves significant improvements in accuracy and robustness. [ABSTRACT FROM AUTHOR]
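A hedged sketch of this kind of pipeline is given below: a small PyTorch autoencoder compresses synthetic cost-driver features, and a random-forest regressor is fitted on the learned codes. Network sizes, training length, and the use of RandomForestRegressor in place of the paper's XGBoost setup are assumptions.

# Hedged sketch: autoencoder for dimensionality reduction, then an ensemble regressor for effort.
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 17)).astype(np.float32)    # e.g. COCOMO-style cost drivers (synthetic)
y = rng.normal(size=300)                             # effort values (synthetic)

class AE(nn.Module):
    def __init__(self, d_in, d_code=5):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 12), nn.ReLU(), nn.Linear(12, d_code))
        self.dec = nn.Sequential(nn.Linear(d_code, 12), nn.ReLU(), nn.Linear(12, d_in))
    def forward(self, x):
        return self.dec(self.enc(x))

ae = AE(X.shape[1])
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
xt = torch.from_numpy(X)
for _ in range(200):                                 # short reconstruction-loss training loop
    opt.zero_grad()
    loss = nn.functional.mse_loss(ae(xt), xt)
    loss.backward()
    opt.step()

with torch.no_grad():
    codes = ae.enc(xt).numpy()                       # reduced cost drivers

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(codes, y)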
- Published
- 2024
- Full Text
- View/download PDF
42. Two-Stage Dimensionality Reduction for Social Media Engagement Classification.
- Author
-
Vieira Sobrinho, Jose Luis, Teles Vieira, Flavio Henrique, and Assis Cardoso, Alisson
- Subjects
MACHINE learning ,CLASSIFICATION ,SOCIAL media - Abstract
The high dimensionality of real-life datasets is one of the biggest challenges in the machine learning field. Due to the increased need for computational resources, the higher the dimension of the input data is, the more difficult the learning task will be—a phenomenon commonly referred to as the curse of dimensionality. Laying the paper's foundation based on this premise, we propose a two-stage dimensionality reduction (TSDR) method for data classification. The first stage extracts high-quality features to a new subset by maximizing the pairwise separation probability, with the aim of avoiding overlap between individuals from different classes that are close to one another, also known as the class masking problem. The second stage takes the previous resulting subset and transforms it into a reduced final space in a way that maximizes the distance between the cluster centers of different classes while also minimizing the dispersion of instances within the same class. Hence, the second stage aims to improve the accuracy of the succeeding classifier by lowering its sensitivity to an imbalanced distribution of instances between different classes. Experiments on benchmark and social media datasets show how promising the proposed method is over some well-established algorithms, especially regarding social media engagement classification. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
43. Improved Fault Detection in Chemical Engineering Processes via Non-Parametric Kolmogorov–Smirnov-Based Monitoring Strategy.
- Author
-
Kini, K. Ramakrishna, Madakyaru, Muddu, Harrou, Fouzi, Menon, Mukund Kumar, and Sun, Ying
- Subjects
CHEMICAL engineering ,CHEMICAL processes ,CHEMICAL engineers ,TUBULAR reactors ,FEATURE extraction ,PRINCIPAL components analysis - Abstract
Fault detection is crucial in maintaining reliability, safety, and consistent product quality in chemical engineering processes. Accurate fault detection allows for identifying anomalies, signaling deviations from the system's nominal behavior, ensuring the system operates within desired performance parameters, and minimizing potential losses. This paper presents a novel semi-supervised data-based monitoring technique for fault detection in multivariate processes. To this end, the proposed approach merges the capabilities of Principal Component Analysis (PCA) for dimensionality reduction and feature extraction with the Kolmogorov–Smirnov (KS)-based scheme for fault detection. The KS indicator is computed between the two distributions in a moving window of fixed length, allowing it to capture sensitive details that enhance the detection of faults. Moreover, no labeling is required when using this fault detection approach, making it flexible in practice. The performance of the proposed PCA–KS strategy is assessed for different sensor faults on benchmark processes, specifically the Plug Flow Reactor (PFR) process and the benchmark Tennessee Eastman (TE) process. Different sensor faults, including bias, intermittent, and aging faults, are considered in this study to evaluate the proposed fault detection scheme. The results demonstrate that the proposed approach surpasses traditional PCA-based methods. Specifically, when applied to PFR data, it achieves a high average detection rate of 98.31% and a low false alarm rate of 0.25%. Similarly, when applied to the TE process, it provides a good average detection rate of 97.27% and a false alarm rate of 6.32%. These results underscore the efficacy of the proposed PCA–KS approach in enhancing the fault detection of high-dimensional processes. [ABSTRACT FROM AUTHOR]
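A compact, hedged sketch of the PCA–KS monitoring idea described above: fit PCA on fault-free data, monitor the squared prediction error, and run a two-sample Kolmogorov–Smirnov test between a reference sample and a moving window. The window length, alarm threshold, injected fault, and synthetic data are all illustrative assumptions.

# Hedged sketch: PCA residual monitoring with a moving-window two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
X_train = rng.normal(size=(2000, 20))                 # nominal (fault-free) operation
X_test = rng.normal(size=(1000, 20))
X_test[500:, 3] += 2.0                                # injected sensor bias fault on variable 3

pca = PCA(n_components=5).fit(X_train)

def spe(X):                                           # squared prediction error (PCA residual)
    resid = X - pca.inverse_transform(pca.transform(X))
    return (resid ** 2).sum(axis=1)

ref = spe(X_train)
monitored = spe(X_test)

window = 50
alarms = []
for t in range(window, len(monitored)):
    stat, p_value = ks_2samp(ref, monitored[t - window:t])
    alarms.append(p_value < 0.01)                     # flag a distributional shift
if any(alarms):
    print("first alarm at sample index:", window + alarms.index(True))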
- Published
- 2024
- Full Text
- View/download PDF
44. Fusion of Coherent and Non-Coherent Pol-SAR Features for Land Cover Classification.
- Author
-
Karachristos, Konstantinos, Koukiou, Georgia, and Anastassopoulos, Vassilis
- Subjects
FISHER discriminant analysis ,SURFACE of the earth ,LAND cover ,REMOTE sensing ,MULTISENSOR data fusion - Abstract
Remote Sensing plays a fundamental role in acquiring crucial information about the Earth's surface from a distance, especially through fully polarimetric data, which offers a rich source of information for diverse applications. However, extracting meaningful insights from this intricate data necessitates sophisticated techniques. In addressing this challenge, one predominant trend that has emerged is known as target decomposition techniques. These techniques can be broadly classified into coherent and non-coherent methods. Each of these methods provides high-quality information using different procedures. In this context, this paper introduces innovative feature fusion techniques, amalgamating coherent and non-coherent information. While coherent techniques excel in detailed exploration and specific feature extraction, non-coherent methods offer a broader perspective. Our feature fusion techniques aim to harness the strengths of both approaches, providing a comprehensive and high-quality fusion of information. In the first approach, features derived from Pauli coherent decomposition, Freeman–Durden non-coherent technique, and the Symmetry criterion from Cameron's stepwise algorithm are combined to construct a sophisticated feature vector. This fusion is achieved using the well-established Fisher Linear Discriminant Analysis algorithm. In the second approach, the Symmetry criterion serves as the basis for fusing coherent and non-coherent coefficients, resulting in the creation of a new feature vector. Both approaches aim to exploit information simultaneously extracted from coherent and non-coherent methods in feature extraction from Remote Sensing data through fusion at the feature level. To evaluate the effectiveness of the feature generated by the proposed fusion techniques, we employ a land cover classification procedure. This involves utilizing a basic classifier, achieving overall accuracies of approximately 82% and 86% for each of the two proposed techniques. Furthermore, the accuracy in individual classes surpasses 92%. The evaluation aims to gauge the effectiveness of the fusion methods in enhancing feature extraction from fully polarimetric data and opens avenues for further exploration in the integration of coherent and non-coherent features for remote sensing applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. Probabilistic prediction of wind power based on improved Bayesian neural network.
- Author
-
Zhiguang Deng, Xu Zhang, Zhengming Li, Jinghua Yang, Xin Lv, Qian Wu, Biwei Zhu, Hittawe, Mazen, and Dorbane, Abdelhakim
- Subjects
WIND power ,BAYESIAN analysis ,FORECASTING ,TRANSFORMER models ,ENTROPY (Information theory) - Abstract
Deterministic wind power prediction can be used for long-time-scale optimization of power dispatching systems, but it cannot provide the probability or fluctuation range of the prediction results. A Bayesian LSTM neural network (BNN-LSTM) is constructed by placing prior distributions over the weight parameters of the LSTM network layers. First, a temporal convolutional neural network (TCNN) processes the historical wind power time series to extract correlation features and learn trend changes. Then, the mutual information entropy method is used to analyze the meteorological dataset, eliminating weakly correlated variables and reducing the dimensionality of the meteorological data, which simplifies the overall structure of the prediction model. At the same time, an Embedding structure is used to learn temporal classification features of wind power. Finally, the time series data processed by the TCNN, the dimension-reduced meteorological data, and the temporal classification features are fed together into the BNN-LSTM prediction model. Compared with a Bayesian neural network, the continuous interval method, and the Temporal Fusion Transformer (TFT), one of the most advanced time series prediction networks, the improved BNN-LSTM responds more accurately to wind power fluctuations and yields better prediction results. Its pinball loss, a comprehensive index of probabilistic prediction, is 53.2%, 24.4%, and 11.3% smaller than those of the other three methods, and its Winkler index is 3.5%, 34.6%, and 8.2% smaller, respectively. [ABSTRACT FROM AUTHOR]
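The mutual-information-based reduction of the meteorological inputs mentioned above can be sketched as follows (hedged): rank weather variables by mutual information with the power signal and keep the top few. The synthetic data, the cut-off of four variables, and the scikit-learn estimator are assumptions; the Bayesian LSTM itself is not shown.

# Hedged sketch: mutual-information ranking of meteorological variables before forecasting.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(7)
weather = rng.normal(size=(5000, 12))                # 12 meteorological variables (synthetic)
power = 0.8 * weather[:, 0] - 0.5 * weather[:, 3] + 0.1 * rng.normal(size=5000)

mi = mutual_info_regression(weather, power, random_state=0)
keep = np.argsort(mi)[::-1][:4]                      # retain the 4 most informative variables
weather_reduced = weather[:, keep]
print("selected variable indices:", keep, "MI scores:", np.round(mi[keep], 3))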
- Published
- 2024
- Full Text
- View/download PDF
46. Context-flexible cartography with Siamese topological neural networks.
- Author
-
Hartono, Pitoyo
- Subjects
CARTOGRAPHY ,SELF-organizing maps - Abstract
Cartography is a technique for creating maps, which are graphical representations of spatial information. Traditional cartography involves the creation of geographical data, such as locations of countries, geographical features of mountains, rivers, and oceans, and celestial objects. However, cartography has recently been utilized to display various data, such as antigenic signatures, graphically. Hence, it is natural to consider a new cartography that can flexibly deal with various data types. This study proposes a model of Siamese topological neural networks consisting of a pair of hierarchical neural networks, each with a low-dimensional internal layer for creating context-flexible maps. The proposed Siamese topological neural network transfers high-dimensional data with various contexts into their low-dimensional spatial representations on a map that humans can use to gain insights from the data. Here, it is enough to define a metric of difference between an arbitrary pair of data instances for training the proposed neural network. As the metric can be arbitrarily defined, the proposed neural network realizes context-flexible cartography useful for visual data analysis. This paper applies the proposed network for visualizing various demographic data. [ABSTRACT FROM AUTHOR]
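A minimal, hedged sketch of the core training idea: a weight-sharing (Siamese) embedder maps items to 2-D map coordinates, and the embedded distance of each pair is regressed onto an arbitrary user-defined dissimilarity. The architecture, loss, and random data below are assumptions, not the author's network.

# Hedged sketch: Siamese-style training of a 2-D "map" embedding against a pairwise metric.
import torch
import torch.nn as nn

class Embedder(nn.Module):
    def __init__(self, d_in, d_map=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, 32), nn.Tanh(), nn.Linear(32, d_map))
    def forward(self, x):
        return self.net(x)

def siamese_loss(f, a, b, target_dist):
    """Penalise mismatch between embedded distance and the given dissimilarity."""
    d = torch.norm(f(a) - f(b), dim=1)
    return ((d - target_dist) ** 2).mean()

torch.manual_seed(0)
X = torch.randn(500, 20)                     # high-dimensional items (synthetic)
f = Embedder(20)
opt = torch.optim.Adam(f.parameters(), lr=1e-3)
for _ in range(300):
    i, j = torch.randint(0, 500, (64,)), torch.randint(0, 500, (64,))
    target = torch.norm(X[i] - X[j], dim=1)  # any context-specific metric can be plugged in here
    opt.zero_grad()
    loss = siamese_loss(f, X[i], X[j], target)
    loss.backward()
    opt.step()
map_coords = f(X).detach()                   # 2-D coordinates for drawing the "map"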
- Published
- 2024
- Full Text
- View/download PDF
47. Nonlinear dimensionality reduction based visualization of single-cell RNA sequencing data.
- Author
-
Yousuff, Mohamed, Babu, Rajasekhara, and Rathinam, Anand
- Subjects
RNA sequencing ,NUCLEOTIDE sequencing ,CYTOLOGY ,ORGANS (Anatomy) ,DATA visualization - Abstract
Single-cell multi-omics technology has catalyzed a transformative shift in contemporary cell biology, illuminating the nuanced relationship between genotype and phenotype. This paradigm shift hinges on the understanding that, while genomic structures remain uniform across cells within an organism, expression patterns dictate physiological traits. Leveraging high-throughput sequencing, single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool, enabling comprehensive transcriptomic analysis at unprecedented resolution. This paper navigates a landscape of dimensionality reduction techniques essential for distilling meaningful insights from scRNA-seq datasets. Notably, while foundational, Principal Component Analysis may fall short of capturing the intricacies of diverse cell types. In response, nonlinear techniques have garnered traction, offering a more nuanced portrayal of cellular relationships. Among these, Pairwise Controlled Manifold Approximation Projection (PaCMAP) stands out for its capacity to preserve local and global structures. We present an augmented iteration, Compactness Preservation Pairwise Controlled Manifold Approximation Projection (CP-PaCMAP), a novel advancement for scRNA-seq data visualization. Employing benchmark datasets from critical human organs, we demonstrate the superior efficacy of CP-PaCMAP in preserving compactness, offering a pivotal breakthrough for enhanced classification and clustering in scRNA-seq analysis. A comprehensive suite of metrics, including Trustworthiness, Continuity, Matthews Correlation Coefficient, and the Mantel test, collectively validates the fidelity and utility of the proposed and existing techniques. These metrics provide a multi-dimensional evaluation, elucidating the performance of CP-PaCMAP compared with other dimensionality reduction techniques. [ABSTRACT FROM AUTHOR]
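For context, a brief, hedged example of embedding an expression-like matrix with the publicly available `pacmap` package, on which the CP-PaCMAP variant above builds; the compactness-preserving modification itself is not shown, and the synthetic counts and parameter values are assumptions.

# Hedged sketch: 2-D PaCMAP embedding of a synthetic cells-by-genes matrix.
import numpy as np
import pacmap                                    # pip install pacmap

rng = np.random.default_rng(8)
X = rng.poisson(1.0, size=(3000, 2000)).astype(float)   # synthetic count matrix (cells x genes)
X = np.log1p(X)                                          # simple normalisation

reducer = pacmap.PaCMAP(n_components=2, n_neighbors=10)
embedding = reducer.fit_transform(X)                     # 2-D coordinates per cell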
- Published
- 2024
- Full Text
- View/download PDF
48. Geometric multidimensional scaling: efficient approach for data dimensionality reduction.
- Author
-
Dzemyda, Gintautas and Sabaliauskas, Martynas
- Subjects
MULTIDIMENSIONAL scaling ,DATA reduction ,GLOBAL optimization ,PARALLEL algorithms - Abstract
Multidimensional scaling (MDS) is an often-used method for nonlinearly reducing the dimensionality of multidimensional data and presenting the data visually. MDS minimizes a stress function whose variables are the coordinates of points in the projected lower-dimensional space. Recently, the so-called Geometric MDS has been developed, in which the stress function and multidimensional scaling in general are considered from a geometric point of view. Using the ideas of Geometric MDS, it is possible to construct an iterative stress-minimization procedure in which the coordinates of an individual point of the projected space are moved to a new position defined analytically. In this paper, we discover and prove the main advantage of Geometric MDS theoretically: changing the positions of all the points of the projected space simultaneously (independently of each other), in the directions and with the step sizes defined analytically by the Geometric MDS strategy for a separate point, decreases the MDS stress. Moreover, the analytical updating of the coordinates of projected points in each iteration has a simple geometric interpretation. New properties of Geometric MDS have been discovered. The obtained results allow for the future development of a class of new sequential and parallel algorithms, and ideas for global optimization of the stress are highlighted. [ABSTRACT FROM AUTHOR]
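The simultaneous point update discussed above can be contrasted with a plain gradient step on the raw stress; the hedged sketch below implements ordinary gradient descent on S(Y) = sum over i<j of (d_ij - ||y_i - y_j||)^2, updating all projected points at once. It is not the authors' analytic Geometric MDS step; the learning rate and data are placeholders.

# Hedged sketch: one simultaneous gradient-descent update of all points under raw MDS stress.
import numpy as np
from scipy.spatial.distance import cdist

def mds_stress_step(Y, D, lr=0.01, eps=1e-9):
    """Move all low-dimensional points Y one gradient step toward lower stress."""
    E = cdist(Y, Y) + eps                          # current pairwise distances (eps avoids /0)
    F = (E - D) / E                                # per-pair residual scaling
    np.fill_diagonal(F, 0.0)
    grad = 2 * (F.sum(axis=1, keepdims=True) * Y - F @ Y)   # d(stress)/dY
    return Y - lr * grad

rng = np.random.default_rng(9)
X = rng.normal(size=(100, 10))
D = cdist(X, X)                                    # target high-dimensional distances
Y = rng.normal(size=(100, 2))                      # initial 2-D configuration
for _ in range(500):
    Y = mds_stress_step(Y, D)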
- Published
- 2024
- Full Text
- View/download PDF
49. Dimensionality Reduction for Hierarchical Multi-Label Classification: A Systematic Mapping Study.
- Author
-
Osvaldo Vieira, Raimundo and Bronoski Borges, Helyane
- Abstract
Hierarchical multi-label classification problems typically deal with datasets that have many attributes and labels, which can negatively impact classifier performance. The application of dimensionality reduction methods can significantly improve classifier performance. Dimensionality reduction can be performed by feature extraction or feature selection, according to the problem domain and dataset characteristics. This work carried out a systematic literature mapping to identify the dimensionality reduction approaches and techniques that have been used in hierarchical multi-label classification tasks. Searches were performed on 7 important databases in the Computer Science field. From a list of 184 retrieved papers, 12 were selected for analysis, from which it was possible to form a general overview of studies conducted from 2010 to 2022. It was identified that feature selection was the most frequent reduction method, with the filter approach standing out. In addition, it was found that most of the works used a tree hierarchical structure. As its main outcome, this paper presents the state of the art of the dimensionality reduction problem for hierarchical multi-label classification, indicating trends and research issues in the field. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
50. Research on folding diversity in statistical learning methods for RNA secondary structure prediction
- Author
-
Yi-Ping Phoebe Chen, YiZhou Li, Yu Zhu, ZhaoYang Xie, and Min Zhu
- Subjects
0301 basic medicine ,RNA Folding ,Computer science ,Machine learning ,computer.software_genre ,Applied Microbiology and Biotechnology ,Nucleic acid secondary structure ,03 medical and health sciences ,RNA secondary structure prediction ,Cluster analysis ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,Selection (genetic algorithm) ,Uncategorized ,Structure (mathematical logic) ,Models, Statistical ,business.industry ,Dimensionality reduction ,RNA ,stochastic context-free grammar ,Cell Biology ,Folding (DSP implementation) ,statistical learning model ,folding diversity ,030104 developmental biology ,Stochastic context-free grammar ,Nucleic Acid Conformation ,Artificial intelligence ,business ,computer ,Algorithms ,Software ,Developmental Biology ,Research Paper - Abstract
How to improve the prediction accuracy of RNA secondary structure is currently a hot topic. Existing prediction methods for a single sequence do not fully consider the folding diversity that may occur among RNAs with different functions or sources. This paper explores the relationship between folding diversity and prediction accuracy and puts forward a new method to improve the prediction accuracy of RNA secondary structure. Our research investigates the following: 1. A folding feature based on stochastic context-free grammar is proposed; using dimension reduction and clustering techniques, several public datasets are analyzed, and the results show significant folding diversity among different RNA families. 2. To assign folding rules to RNAs without structural information, a classification method based on production probability is proposed; the experimental results show that this method can effectively classify RNAs of unknown structure. 3. Based on existing prediction methods built on statistical learning models, an RNA secondary structure prediction framework is proposed, namely "Cluster - Training - Parameter Selection - Prediction". The results show that, with information on folding diversity, prediction accuracy can be significantly improved.
- Published
- 2017