1,233 results
Search Results
2. Factors Forming Community Resilience Affected By Floods.
- Author
-
Suhaeb, Firdaus W., Rasyid, Sri Jayanti, Wahda, Muhammad Aksha, Ramli, Mauliadi, and Kaseng, Ernawati S.
- Subjects
HAZARD mitigation, FLOODS, RIVER conservation, DATA reduction, CONFERENCE papers, JOB stress - Abstract
This conference paper uses descriptive-qualitative research to describe and analyze the factors that form the resilience of communities affected by floods. Informants were selected purposively from flood-affected communities, using criteria aligned with the purpose of the study. Primary data were obtained from in-depth observations and interviews, while secondary data were obtained from library sources and other relevant material. Data were collected through observation, interviews, and documentation, and analyzed descriptive-qualitatively in several stages, namely data reduction, data presentation, and drawing conclusions. The results show that the factors forming community resilience to flood disasters are: (1) value factors that have existed in the community for many years on the flood-prone land, namely mutual assistance; (2) economic factors, namely finding alternative jobs or coping strategies; (3) social factors, namely knowledge and skills for adapting to flood disasters, gained through non-formal training and counselling on disasters and disaster mitigation, from experience, or through social media and mass media such as television and radio; (4) institutional factors, namely socialization of early flood warnings and flood disaster mitigation, appeals prohibiting the dumping of garbage in rivers, and essential food assistance from the relevant government before and during floods; and (5) infrastructure factors, including the construction of facilities and infrastructure such as river dredging, drainage, and riverbank protection walls. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. A low‐cost method for testing and analyzing the cervical range of motion.
- Author
-
Zhang, Xun, Xu, Guanghua, Li, Zejin, Teng, Zhicheng, Zhang, Xin, and Zhang, Sicong
- Subjects
RANGE of motion of joints, ROTATIONAL motion, TEST methods, SPONDYLOSIS, DECOMPOSITION method, DATA reduction - Abstract
Measurement of the cervical range of motion (CROM) is important for the early diagnosis of cervical spondylosis and for determining the severity of the disease. To enable convenient, continuous measurement of CROM while reducing the impact of data fluctuations, this paper proposes a low-cost method for testing and analyzing CROM. The paper analyzes the correspondence between smartphone orientation-sensor angles and CROM, acquires smartphone orientation data, applies the extreme-point symmetric mode decomposition method (using the energy of the difference value) to extract an adaptive global mean curve of the CROM data, and calculates the cervical ROM. A statistical analysis of the CROM test results is carried out. Results show that the proposed method can capture CROM, including flexion and extension, lateral flexion, and rotation, in a single measurement, with the advantages of continuous measurement and low cost. [ABSTRACT FROM AUTHOR]
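The core of the measurement is simple once the orientation stream is in hand. A minimal sketch (the moving average stands in for the paper's ESMD step, and all sampling parameters here are invented):

```python
import numpy as np

def range_of_motion(angles_deg, window=25):
    """Estimate a range of motion from a stream of orientation angles.

    A minimal stand-in for the paper's pipeline: the authors suppress
    fluctuations with extreme-point symmetric mode decomposition (ESMD);
    here a simple moving average plays that role.
    """
    kernel = np.ones(window) / window
    smoothed = np.convolve(angles_deg, kernel, mode="valid")  # denoised curve
    return smoothed.max() - smoothed.min()                    # ROM in degrees

# Synthetic flexion-extension sweep, as if sampled at 100 Hz from a
# smartphone's orientation (pitch) sensor
t = np.linspace(0, 10, 1000)
pitch = 45 * np.sin(2 * np.pi * 0.2 * t) + np.random.normal(0, 1.0, t.size)
print(f"Estimated flexion-extension ROM: {range_of_motion(pitch):.1f} deg")
```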
- Published
- 2024
- Full Text
- View/download PDF
4. Energy function-based and norm-free event-triggering for scheduling control data transmissions.
- Author
-
Kurtoglu, Deniz, Yucelen, Tansel, and Muse, Jonathan A.
- Subjects
DATA transmission systems, STATE feedback (Feedback control systems), ENERGY function, SCHEDULING, DATA reduction - Abstract
This paper studies the problem of scheduling control data transmissions from embedded processors to physical systems. For this problem, we propose novel state feedback and output feedback control architectures predicated on energy function-based and norm-free event-triggering conditions, where the embedded processor broadcasts a sampled value of its control signal through a zero-order-hold operator to the physical system when the left side of the event-triggering condition equals its right side. In this context, the energy function-based feature means that the right sides of the proposed event-triggering conditions involve an energy function as well as its time-derivative, making the selection of these right sides user-adjustable. The norm-free feature means that the left sides of the proposed event-triggering conditions do not depend on signal norms, allowing for greater reduction of control data transmissions. System-theoretical analyses of our event-triggered state feedback and output feedback control architectures are carried out using the same energy functions and time-derivatives that appear in the proposed event-triggering conditions, and illustrative numerical examples demonstrate the efficacy of our contributions. To the best of our knowledge, this paper is the first to link event-triggering conditions not only to energy functions but also to their time-derivatives. [ABSTRACT FROM AUTHOR]
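To make the scheduling idea concrete, here is a generic event-triggered transmission loop. It is only a stand-in: the plant, gain, and the squared-error trigger below are invented for illustration, whereas the paper's conditions are energy-function-based and norm-free:

```python
import numpy as np

# Hypothetical scalar plant x' = a*x + b*u with state-feedback gain k.
a, b, k, dt = 0.5, 1.0, -2.0, 1e-3
x, u_held = 1.0, 0.0          # u_held is the zero-order-hold value at the plant
transmissions = 0

def trigger(u_new, u_held, t):
    # Generic event trigger: transmit when the held control has drifted from
    # the freshly computed one by more than a user-set, time-varying
    # threshold. (The paper's condition compares different, energy-function-
    # based quantities; this squared-error test is only a stand-in.)
    threshold = 0.05 * np.exp(-0.1 * t) + 1e-4
    return (u_new - u_held) ** 2 >= threshold

for step in range(10_000):
    t = step * dt
    u_new = k * x
    if trigger(u_new, u_held, t):
        u_held = u_new        # broadcast through the zero-order hold
        transmissions += 1
    x += (a * x + b * u_held) * dt  # forward-Euler plant update

print(f"{transmissions} transmissions instead of 10000 periodic updates")
```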
- Published
- 2024
- Full Text
- View/download PDF
5. Low-Power Preprocessing System at MCU-Based Application Nodes for Reducing Data Transmission.
- Author
-
Kim, Donguk, Roh, Chanhwi, Baek, Donkyu, and Choi, Seong-gon
- Subjects
COMPUTER hardware description languages, DATA transmission systems, LOGIC design, EDGE computing, DATA reduction - Abstract
Edge computing enables prompt responses in IoT environments, such as the operation of autonomous vehicles and unmanned aerial vehicles. However, with the increase in sensor nodes, the computational burden on the computing node also increases. Specifically, data filtering and reduction at application nodes add to the energy burden for battery-operated devices. In this paper, we propose a preprocessing system at the application node that requires low power consumption for data transmission reduction. Based on our simulations, we identify the minimum data size needed to preserve the signal. We first design the preprocessing system using a hardware description language to evaluate its performance. Then, we implement the open-library-based MCU system, including the proposed preprocessing IP, to assess its operation and overhead. Our implementation of the preprocessing system reduces data transmission by 50% with acceptable information loss. Additionally, the area and power consumption after the logic synthesis of the preprocessing IP within the entire MCU system are evaluated at only 3.6% and 13.1%, respectively. By performing preprocessing using the MCU and proposed IP, nearly 74.4% power reduction is achieved compared to using the existing MCU core. [ABSTRACT FROM AUTHOR]
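A common software analogue of such transmission-reducing preprocessing is send-on-delta filtering. The sketch below is illustrative only; the paper implements its preprocessing as a hardware IP block alongside the MCU core:

```python
import numpy as np

def send_on_delta(samples, delta):
    """Keep a sample only when it differs from the last transmitted one
    by more than `delta` -- a common preprocessing scheme for cutting
    sensor-node traffic. (Illustrative stand-in; not the paper's IP.)
    """
    kept, last = [], None
    for i, s in enumerate(samples):
        if last is None or abs(s - last) > delta:
            kept.append((i, s))
            last = s
    return kept

signal = np.sin(np.linspace(0, 4 * np.pi, 400)) + np.random.normal(0, 0.01, 400)
kept = send_on_delta(signal, delta=0.05)
print(f"transmitted {len(kept)}/{len(signal)} samples "
      f"({100 * (1 - len(kept) / len(signal)):.0f}% reduction)")
```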
- Published
- 2024
- Full Text
- View/download PDF
6. A novel ensemble deep reinforcement learning model for short‐term load forecasting based on Q‐learning dynamic model selection.
- Author
-
He, Xin, Zhao, Wenlu, Zhang, Licheng, Zhang, Qiushi, and Li, Xinyu
- Subjects
REINFORCEMENT learning, LOAD forecasting (Electric power systems), RECURRENT neural networks, ARTIFICIAL intelligence, DATA reduction - Abstract
Short-term load forecasting is critical for power system planning and operations, and ensemble forecasting methods for electricity loads have been shown to be effective in obtaining accurate forecasts. However, the weights in ensemble prediction models are usually preset based on overall performance after training, which prevents the model from adapting to different scenarios and limits improvements in prediction performance. To further improve the accuracy and validity of the ensemble prediction method, this paper proposes an ensemble deep reinforcement learning approach that uses Q-learning dynamic weight assignment to account for local behaviours caused by changes in the external environment. Firstly, variational mode decomposition is used to reduce the non-stationarity of the original data by decomposing the load sequence. Then, a recurrent neural network (RNN), long short-term memory (LSTM), and a gated recurrent unit (GRU) are selected as the base power load predictors. Finally, the predictions of the three sub-predictors are combined using optimal weights generated by the Q-learning algorithm to obtain the final result. The results show that the forecasting capability of the proposed method outperforms all sub-models and several baseline ensemble forecasting methods. [ABSTRACT FROM AUTHOR]
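The weighting step can be pictured as follows. This sketch assumes a Q-table row scoring the three sub-models in the current state and converts it to weights with a softmax; the predictions and Q-values are placeholders, and the paper's exact state and reward design is not reproduced here:

```python
import numpy as np

def softmax(q):
    e = np.exp(q - q.max())
    return e / e.sum()

# Hypothetical one-step predictions from three trained sub-models
preds = {"RNN": 102.0, "LSTM": 98.5, "GRU": 100.3}   # load in MW, made up

# One row of a Q-table: the learned value of trusting each sub-model in the
# current "state" (e.g., a bucket of recent forecast errors). The paper
# learns these values with Q-learning; the numbers here are placeholders.
q_row = np.array([0.8, 1.4, 1.1])

weights = softmax(q_row)                 # dynamic, state-dependent weights
forecast = float(np.dot(weights, list(preds.values())))
print(dict(zip(preds, weights.round(3))), f"-> ensemble forecast {forecast:.1f} MW")
```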
- Published
- 2024
- Full Text
- View/download PDF
7. Online evolution of a phased array for ultrasonic imaging by a novel adaptive data acquisition method.
- Author
-
Lukacs, Peter, Stratoudaki, Theodosia, Davis, Geo, and Gachagan, Anthony
- Subjects
PHASED array antennas, ULTRASONIC arrays, ACQUISITION of data, ULTRASONIC imaging, DATA reduction, TIME measurements - Abstract
Ultrasonic imaging, using ultrasonic phased arrays, has an enormous impact on science, medicine and society, and is a widely used modality in many application fields. The maximum amount of information that an array can capture is provided by the data acquisition method that records the complete data set of signals from all possible combinations of ultrasonic generation and detection elements of a dense array. However, capturing this complete data set requires a long acquisition time and a large number of array elements and transmit channels, and produces a large volume of data. These constraints make such data acquisition infeasible with existing phased array technology, or inapplicable to cases requiring fast measurement times. This paper introduces the concept of an adaptive data acquisition process, Selective Matrix Capture (SMC), which can adapt dynamically to specific imaging requirements for efficient ultrasonic imaging. SMC is realised experimentally using Laser Induced Phased Arrays (LIPAs), which use lasers to generate and detect ultrasound. The flexibility and reconfigurability of LIPAs enable the array configuration to evolve on-the-fly. The SMC methodology consists of two stages: one for detecting and localising regions of interest, by iteratively synthesising a sparse array, and a second for optimising the array to the region of interest. Delay-and-sum is used as the imaging algorithm, and the experimental results are compared to images produced using the complete generation-detection data set. It is shown that SMC, without a priori knowledge of the test sample, achieves comparable results while performing ∼10 times faster data acquisition and a ∼10 times reduction in data size. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. The Spanish CMS Analysis Facility at CIEMAT.
- Author
-
Cárdenas-Montes, M., Delgado Peris, A., Flix, J., Hernández, J.M., León Holgado, J., Morcillo Pérez, C., Pérez-Calero Yzquierdo, A., and Rodríguez Calonge, F.J.
- Subjects
LARGE Hadron Collider, COMPACT muon solenoid experiment, DATA reduction, GRID computing, DATA analysis - Abstract
The increasingly large data volumes that the LHC experiments will accumulate in the coming years, especially in the High-Luminosity LHC era, call for a paradigm shift in the way experimental datasets are accessed and analyzed. The current model, based on data reduction on the Grid infrastructure followed by interactive data analysis of manageable-size samples on the physicists' individual computers, will be superseded by the adoption of Analysis Facilities. This rapidly evolving concept is converging to include dedicated hardware infrastructures and computing services optimized for the effective analysis of large HEP data samples. This paper describes the actual implementation of this new analysis facility model at the CIEMAT institute, in Spain, to support the local CMS experiment community. Our work details the deployment of dedicated high-performance hardware, the operation of data staging and caching services ensuring prompt and efficient access to CMS physics analysis datasets, and the integration and optimization of a custom analysis framework based on ROOT's RDataFrame and the CMS NanoAOD format. Finally, performance results obtained by benchmarking the deployed infrastructure and software against a CMS analysis workflow are summarized. [ABSTRACT FROM AUTHOR]
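For orientation, the building blocks named here look roughly as follows in a minimal PyROOT RDataFrame snippet over a NanoAOD-style file. The file name is hypothetical; "Events", nMuon, and Muon_pt are standard NanoAOD names, but the facility's actual framework is far more elaborate:

```python
import ROOT

# Minimal RDataFrame sketch in the spirit of the facility's analysis
# framework; "nanoaod_sample.root" is a placeholder file name.
df = ROOT.RDataFrame("Events", "nanoaod_sample.root")

h = (df.Filter("nMuon >= 2", "at least two muons")
       .Define("leading_mu_pt", "Muon_pt[0]")
       .Histo1D(("mu_pt", "Leading muon p_{T};p_{T} [GeV];events",
                 50, 0.0, 200.0), "leading_mu_pt"))

canvas = ROOT.TCanvas()
h.Draw()                      # triggers the lazy event loop
canvas.SaveAs("leading_mu_pt.png")
```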
- Published
- 2024
- Full Text
- View/download PDF
9. Methods for motion artifact reduction in online brain-computer interface experiments: a systematic review.
- Author
-
Schmoigl-Tonis, Mathias, Schranz, Christoph, and Müller-Putz, Gernot R.
- Subjects
BRAIN-computer interfaces, OPEN access publishing, EVIDENCE gaps, VIRTUAL communities, ELECTROMAGNETIC induction, DATA reduction - Abstract
Brain-computer interfaces (BCIs) have emerged as a promising technology for enhancing communication between the human brain and external devices. Electroencephalography (EEG) is particularly promising in this regard because it has high temporal resolution and can be easily worn on the head in everyday life. However, motion artifacts caused by muscle activity, fasciculation, cable swings, or magnetic induction pose significant challenges in real-world BCI applications. In this paper, we present a systematic review of methods for motion artifact reduction in online BCI experiments. Using the PRISMA filter method, we conducted a comprehensive literature search on PubMed, focusing on open access publications from 1966 to 2022. We evaluated 2,333 publications based on predefined filtering rules to identify existing methods and pipelines for motion artifact reduction in EEG data. We present a lookup table of all papers that passed the defined filters, all used methods, and pipelines and compare their overall performance and suitability for online BCI experiments. We summarize suitable methods, algorithms, and concepts for motion artifact reduction in online BCI applications, highlight potential research gaps, and discuss existing community consensus. This review aims to provide a comprehensive overview of the current state of the field and guide researchers in selecting appropriate methods for motion artifact reduction in online BCI experiments. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
10. Development of Basic Knowledge Construction Technique to Reduce The Volume of High-Dimensional Big Data.
- Author
-
Karya, Gede, Sitohang, Benhard, Akbar, Saiful, and Moertini, Veronica S.
- Subjects
INFORMATION & communication technologies, EXTRACTION techniques, DATA warehousing, DATA reduction, HIGH technology, BIG data - Abstract
Big data has the characteristics of high volume, velocity, and variety (3V) and continues to grow exponentially with the world's use of information and communication technology. The main problem with big data is the data deluge: the storage and processing capacity needed to keep pace with the exponential growth rate is potentially limitless, so the technology requirements also increase exponentially. The weakness of previous big-data analysis approaches (batch and online real-time processing) is that they demand substantial resources (large storage, memory, and processing power). This paper proposes a new approach to big-data analysis that separates the construction of basic knowledge (BK) from the original data, at a much smaller volume (volume reduction), and then analyzes the BK into final knowledge, so that smaller and simpler analysis technology suffices. The proposals include formulating the definition and representation of BK, developing methods for constructing BK from source data, and analyzing BK into final knowledge. We propose a BK construction method based on a knowledge-extraction technique that uses the BIRCH clustering algorithm for instance reduction and handles high-dimensional problems by parallelizing the distance calculations between instances. We use the Adjusted Rand Index (ARI) to measure the similarity of the final knowledge produced by the baseline and proposed methods. First, modifying the BIRCH baseline by parallelizing the calculations increased speed by 17% to 25%. Next, splitting the parallel BIRCH (PBIRCH) baseline into BK construction and BK analysis reduced volume by 96% or more and increased speed by 43.50%, with identical final knowledge (ARI = 1). Based on these results, we conclude that constructing BK and analyzing it into final knowledge for high-dimensional big data significantly reduces volume and speeds up the analytical process without reducing the quality of the final knowledge. [ABSTRACT FROM AUTHOR]
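The instance-reduction step can be reproduced with off-the-shelf tools. A minimal sketch using scikit-learn's BIRCH and the ARI, with synthetic data and arbitrary parameters (the paper's parallelized, high-dimensional variant is not shown):

```python
from sklearn.cluster import Birch, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Instance reduction in the spirit of the paper's BK construction:
# BIRCH compresses the raw instances into far fewer subcluster centers
# ("basic knowledge"), which are then analyzed instead of the raw data.
X, _ = make_blobs(n_samples=20_000, centers=5, n_features=8, random_state=0)

birch = Birch(threshold=1.5, n_clusters=None).fit(X)
bk = birch.subcluster_centers_                  # the reduced representation
print(f"volume reduction: {len(X)} -> {len(bk)} instances")

# Final knowledge from the reduced set vs. from the full data
labels_full = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
km_bk = KMeans(n_clusters=5, n_init=10, random_state=0).fit(bk)
labels_via_bk = km_bk.predict(X)
print("ARI:", adjusted_rand_score(labels_full, labels_via_bk))
```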
- Published
- 2024
- Full Text
- View/download PDF
11. Acoustic logging array signal denoising using U-net and a case study in a TangGu oil field.
- Author
-
Fu, Xin, Gou, Yang, and Wei, Fuqiang
- Subjects
SIGNAL denoising, CONVOLUTIONAL neural networks, ARTIFICIAL neural networks, NOISE control, OIL fields, ACOUSTIC emission testing, DATA reduction - Abstract
This study developed a noise-reduction method for acoustic logging array signals using a deep neural network algorithm in the time-frequency domain. Initially, we derived analytical solutions for the received waveforms when the acoustic logging tool was positioned either at the centre or eccentrically within the borehole. To simulate the received waveforms across various formations, we developed a real-axis integration algorithm. Subsequently, we devised a noise-reduction algorithm workflow based on a convolutional neural network and configured the structure and parameters of the U-net using TensorFlow. To address the scarcity of open datasets, we established both signal and noise datasets. The signal dataset was generated using theoretical simulation encompassing various model parameters, while the noise dataset was collected during tool testing and downhole operations. The trained model demonstrated substantial noise-reduction capabilities during validation. To validate the effectiveness of the algorithm, we applied noise reduction to actual data collected during downhole operations in a TangGu oil field, yielding impressive results across different types of noisy data. Therefore, the U-net-based time-domain noise-reduction algorithm proposed in this paper holds the potential to significantly improve the quality of acoustic logging array signals. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. ROME/REA: Three-year, Tri-color Timeseries Photometry of the Galactic Bulge.
- Author
-
Street, R. A., Bachelet, E., Tsapras, Y., Hundertmark, M. P. G., Bozza, V., Bramich, D. M., Cassan, A., Dominik, M., Figuera Jaimes, R., Horne, K., Mao, S., Saha, A., Wambsganss, J., and Zang, Weicheng
- Subjects
PHOTOMETRY, GALACTIC bulges, VERY large array telescopes, DATA release, IMAGE analysis, DATA reduction - Abstract
The Robotic Observations of Microlensing Events/Reactive Event Assessment Survey was a Key Project at Las Cumbres Observatory (hereafter LCO) which continuously monitored 20 selected fields (3.76 sq. deg) in the Galactic Bulge throughout their seasonal visibility window over a three-year period, between 2017 March and 2020 March. Observations were made in three optical passbands (SDSS g′, r′, i′), and LCO's multi-site telescope network enabled the survey to achieve a typical cadence of ∼10 hr in i′ and ∼15 hr in g′ and r′. In addition, intervals of higher-cadence (<1 hr) data were obtained during monitoring of key microlensing events within the fields. This paper describes the Difference Image Analysis data reduction pipeline developed to process these data, and the process for combining the photometry from LCO's three observing sites in the Southern Hemisphere. The full timeseries photometry for all ∼8 million stars, down to a limiting magnitude of i ∼ 18 mag, is provided in the data release accompanying this paper, and samples of the data are presented for exemplar microlensing events, illustrating how the tri-band data are used to derive constraints on the microlensing source star parameters, a necessary step in determining the physical properties of the lensing object. The timeseries data also enable a wealth of additional science, for example in characterizing long-timescale stellar variability, and a few examples of the data for known variables are presented. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. Tensor eigenvectors for projection pursuit.
- Author
-
Loperfido, Nicola
- Abstract
Tensor eigenvectors naturally generalize matrix eigenvectors to multi-way arrays: eigenvectors of symmetric tensors of order k and dimension p are stationary points of polynomials of degree k in p variables on the unit sphere. Dominant eigenvectors of symmetric tensors maximize polynomials in several variables on the unit sphere, while base eigenvectors are roots of polynomials in several variables. In this paper, we focus on skewness-based projection pursuit and on third-order tensor eigenvectors, which provide the simplest, yet relevant connections between tensor eigenvectors and projection pursuit. Skewness-based projection pursuit finds interesting data projections using the dominant eigenvector of the sample third standardized cumulant to maximize skewness. Skewness-based projection pursuit also uses base eigenvectors of the sample third cumulant to remove skewness and facilitate the search for interesting data features other than skewness. Our contribution to the literature on tensor eigenvectors and on projection pursuit is twofold. Firstly, we show how skewness-based projection pursuit might be helpful in sequential cluster detection. Secondly, we show some asymptotic results regarding both dominant and base tensor eigenvectors of sample third cumulants. The practical relevance of the theoretical results is assessed with six well-known data sets. [ABSTRACT FROM AUTHOR]
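For orientation, the standard definitions behind this abstract can be stated compactly (the notation below is assumed, not quoted from the paper):

```latex
% For a symmetric third-order tensor C (the sample third standardized
% cumulant of a standardized random vector Z), skewness-based projection
% pursuit seeks the direction maximizing the skewness of the projection:
\max_{\|u\|=1} \; \mathbb{E}\!\left[(u^{\top} Z)^{3}\right]
  \;=\; \max_{\|u\|=1} \; C \times_{1} u \times_{2} u \times_{3} u ,
% whose maximizer is the dominant tensor eigenvector, i.e. a solution of
C \times_{2} u \times_{3} u \;=\; \lambda u , \qquad \|u\| = 1 .
% Base eigenvectors are solutions with \lambda = 0: projecting onto them
% yields zero skewness, which is how skewness is removed before searching
% for other interesting data features.
```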
- Published
- 2024
- Full Text
- View/download PDF
14. Efficiently approaching vertical federated learning by combining data reduction and conditional computation techniques.
- Author
-
Folino, Francesco, Folino, Gianluigi, Pisani, Francesco Sergio, Pontieri, Luigi, and Sabatino, Pietro
- Subjects
FEDERATED learning, DATA reduction, DEEP learning, INTERNET security - Abstract
In this paper, a framework based on a sparse Mixture of Experts (MoE) architecture is proposed for the federated learning and application of a distributed classification model in domains (like cybersecurity and healthcare) where different parties of the federation store different subsets of features for a number of data instances. The framework is designed to limit the risk of information leakage and the computation/communication costs of both model training (through data sampling) and application (leveraging the conditional-computation abilities of sparse MoEs). Experiments on real data have shown that the proposed approach ensures a better balance between efficiency and model accuracy than other VFL-based solutions. Notably, in a real-life cybersecurity case study focused on malware classification (the KronoDroid dataset), the proposed method surpasses its competitors even though it uses only 50% or 75% of the training set, which the other approaches use in full. It reduces the rate of false positives by 16.9% and 18.2%, respectively, and also delivers satisfactory results on the other evaluation metrics. These results showcase the framework's potential to significantly enhance cybersecurity threat detection and prevention in a collaborative yet secure manner. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. The Research on Deep Learning-Driven Dimensionality Reduction and Strain Prediction Techniques Based on Flight Parameter Data.
- Author
-
Huang, Wenbo, Wang, Rui, Zhang, Mengchuang, and Yin, Zhiping
- Subjects
DEEP learning, STRUCTURAL health monitoring, DATA reduction, FORECASTING - Abstract
Loads and strains in critical areas play a crucial role in aircraft structural health monitoring, the tracking of individual aircraft lifespans, and the compilation of load spectra. Direct measurement of actual flight loads is challenging; the usual process maps measured flight parameters to loads at critical locations using load-strain stiffness matrices derived from ground calibration tests. Rapidly developing deep learning methods offer new perspectives for this task. This paper explores the potential of deep learning models for predicting loads and strains from flight parameters, integrating flight-parameter preprocessing techniques, flight maneuver recognition (FMR), virtual ground calibration tests for wings, dimensionality reduction of flight data through an Autoencoder (AE) network, and the application of a Long Short-Term Memory (LSTM) network to predict strains. These efforts contribute to predicting strains in critical areas from flight parameters, thereby enabling real-time assessment of aircraft damage. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. Multistage Compressor Design Based on Dimensional Zooming.
- Author
-
Li, Hefei, Zheng, Qun, and Jiang, Bin
- Subjects
COMPRESSOR performance, COMPRESSORS, DATA reduction - Abstract
This paper proposes an axial multistage compressor aerodynamic design methodology based on dimensional zooming. Two-dimensional design parameters are acquired through the dimensionality reduction of three-dimensional data, avoiding dependence on empirical loss models. The zooming design method is studied on a self-designed five-stage compressor. The method accounts for the three-dimensional end-wall viscosity loss and modifies the blade profiles and blade stacking, providing a path for improving blade design based on computational fluid dynamics analysis. The aerodynamic performance and compressor characteristics of the prototype and of the zooming design are investigated and compared. The results show that the performance of the zooming-design compressor is improved over that of the prototype. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. Prospective Elementary School Teachers' Environmental Literacy: What, Why, and How?
- Author
-
Suratmi, Suratmi, Supriatna, Nana, Sopandi, Wahyu, and Wulan, Ana Ratna
- Subjects
ELEMENTARY school teachers, ENVIRONMENTAL literacy, DATA reduction, HEALTH literacy, REFERENCE sources - Abstract
Environmental literacy is one of the main subjects and themes in developing 21st-century skills, and it can serve as an effort to overcome problems related to environmental issues. This paper aims to provide an overview of the role of universities in producing prospective elementary school teachers who have good environmental literacy and are expected to implement it in their teaching. For this purpose, the paper addresses three questions: 1) What are the components for measuring the environmental literacy of prospective elementary school teachers? 2) Why is it necessary to develop environmental literacy in prospective elementary school teachers? 3) How can environmental literacy be trained and improved? This study uses a descriptive method with a qualitative approach. Data were collected from literature sources such as international articles, national articles, and relevant books. The data analysis technique comprises four stages, namely (1) data collection, (2) data presentation, (3) data reduction and data inventory, and (4) data conclusion. The results take the form of information on the components for measuring the environmental literacy of prospective elementary school teachers, the reasons why it is necessary to develop such literacy, and alternative learning models that can be used to instill environmental literacy in prospective elementary school teachers. The results can be used as reference material in training and developing the environmental literacy of students, especially in universities. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Analyzing Data Reduction Techniques: An Experimental Perspective.
- Author
-
Fernandes, Vítor, Carvalho, Gonçalo, Pereira, Vasco, and Bernardino, Jorge
- Subjects
DATA reduction, DIGITAL technology, ENERGY consumption, DATA compression, BIG data - Abstract
The exponential growth in data generation has become a ubiquitous phenomenon in today's rapidly evolving digital landscape, driven by technological advances and the growing number of connected devices. This growth presents challenges across different architectures, particularly in terms of inefficient energy consumption, suboptimal bandwidth utilization, and the rapid increase in data stored in cloud environments. Data reduction techniques are therefore crucial for reducing the amount of data transferred and stored. This paper provides a comprehensive review of various data reduction techniques and introduces a taxonomy to classify these methods based on the type of data loss. The experiments conducted in this study cover distinct data types, assessing the performance and applicability of these techniques across different datasets. [ABSTRACT FROM AUTHOR]
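As a toy illustration of the loss-based taxonomy, the snippet below contrasts one lossless and one lossy reduction of the same signal; the techniques and figures are arbitrary examples, not the survey's benchmarks:

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
signal = np.cumsum(rng.normal(0, 0.1, 100_000)).astype(np.float32)
raw = signal.tobytes()

# Lossless: generic byte-level compression, fully reversible
lossless = zlib.compress(raw, 9)

# Lossy: keep every 4th sample (reconstruction would interpolate)
lossy = signal[::4].tobytes()

print(f"raw      : {len(raw):>7} B")
print(f"lossless : {len(lossless):>7} B ({len(raw) / len(lossless):.1f}x)")
print(f"lossy    : {len(lossy):>7} B ({len(raw) / len(lossy):.1f}x, information lost)")
```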
- Published
- 2024
- Full Text
- View/download PDF
19. Analysis of the adhesively bonded composite double cantilever beam specimen with emphasis on bondline constraint, adherend through-thickness flexibility and fracture process zone relative size.
- Author
-
de Morais, A. B.
- Subjects
ADHESIVE joints, COMPOSITE construction, DOUBLE bonds, CANTILEVERS, LAMINATED composite beams, FINITE element method, MATERIAL plasticity, DATA reduction - Abstract
The double cantilever beam (DCB) specimen is widely used to characterise the mode I fracture of adhesive joints. This paper analyses some particular characteristics of adhesively bonded composite DCB specimens which could affect test results. Three-dimensional (3D) and two-dimensional (2D) finite element analyses (FEA) were conducted in order to evaluate the effects bondline constraint and adherend through-thickness flexibility on the specimen response. Since beam theory-based data reduction schemes are widespread, beam models were also employed to analyze the effects of adherend through-thickness flexibility and fracture process zone relative size. It is shown that, although composite adherends are usually thinner and have much lower transverse moduli than metal adherends, the level of bondline constraint is similarly high. This may: limit the level of adhesive plastic deformations in the fracture process zone; generate high bondline tractions that increase the likelihood of interface failure and interlaminar damage in the composite adherends. The present analyses also show relevant effects of adherend through-thickness flexibility in the adhesive elastic loading stage. Finally, smaller fracture process zones relative to metal adherend DCB specimens were predicted by a beam cohesive zone model. This may explain lower fracture energy values reported with composite adherends in some studies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Data reduction for SVM training using density-based border identification.
- Author
-
Shalaby, Mohammed, Farouk, Mohamed, and Khater, Hatem A.
- Subjects
DATA reduction, SUPPORT vector machines, QUADRATIC programming, DATA extraction, STATISTICAL decision making - Abstract
Numerous classification and regression problems have extensively used Support Vector Machines (SVMs). However, the SVM approach is less practical for large datasets because of its processing cost, which stems primarily from the quadratic programming problem that must be optimized to determine the decision boundary during training. As a result, methods have been developed to shrink the training data by selecting the instances most likely to be chosen as support vectors by the SVM algorithm. This paper presents a density-based method, called Density-based Border Identification (DBI), together with four variations of the method, for reducing the SVM training data by extracting a layer of border instances. For higher-dimensional datasets, the extraction is performed on lower-dimensional embeddings obtained by Uniform Manifold Approximation and Projection (UMAP), and the resulting subset can be reused repeatedly for SVM training in higher dimensions. Experimental findings on different datasets, such as Banana, USPS, and Adult9a, show that the best-performing variations of the proposed method effectively reduce the size of the training data and achieve acceptable training and prediction speedups while maintaining adequate classification accuracy compared to training on the original dataset. These results, together with comparisons to related state-of-the-art methods such as Border Point extraction based on Locality-Sensitive Hashing (BPLSH), Clustering-Based Convex Hull (CBCH), and Shell Extraction (SE), suggest that the proposed methods are effective and potentially useful. [ABSTRACT FROM AUTHOR]
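The idea of keeping only border instances can be sketched with a simple density proxy. The selection rule below (k-NN radius per class) is an illustrative stand-in, not the DBI algorithm itself, and all sizes and thresholds are arbitrary:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

# Points whose k-NN radius is large relative to their class sit in sparse
# regions and are more likely to lie near the class border.
X, y = make_classification(n_samples=5000, n_features=10, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

keep = []
for c in np.unique(ytr):
    idx = np.where(ytr == c)[0]
    nn = NearestNeighbors(n_neighbors=11).fit(Xtr[idx])
    dist, _ = nn.kneighbors(Xtr[idx])          # first neighbour is the point itself
    radius = dist[:, -1]                       # distance to the 10th neighbour
    keep.append(idx[np.argsort(radius)[-len(idx) // 5:]])  # sparsest 20%
keep = np.concatenate(keep)

full = SVC().fit(Xtr, ytr).score(Xte, yte)
reduced = SVC().fit(Xtr[keep], ytr[keep]).score(Xte, yte)
print(f"train size {len(Xtr)} -> {len(keep)}; accuracy {full:.3f} -> {reduced:.3f}")
```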
- Published
- 2024
- Full Text
- View/download PDF
21. SLM based Circular (6, 2) mapping scheme with improved SER performance for PAPR reduction in OCDM without side information.
- Author
-
Singh, Mohit Kumar and Goel, Ashish
- Subjects
RAYLEIGH fading channels, ADDITIVE white Gaussian noise channels, RADIO transmitter fading, DATA reduction, WIRELESS channels - Abstract
Like OFDM, the OCDM signal also suffers from high PAPR. SLM is an attractive PAPR-reduction method, but it must transmit side information (SI) about the phase sequence to the receiver, which reduces spectral efficiency (SE) and data rate. Various SLM-based SI-free and SI-embedding schemes are available in the literature. In this paper, different possible quaternary to 8-QAM mapping schemes for SI-free SLM-based PAPR reduction in an OCDM system are discussed. Moreover, a new mapping scheme that eliminates the SI requirement is also presented for SLM-based PAPR reduction in OCDM. The proposed Circular (6, 2) mapping scheme does not require SI at the receiving end, which increases SE and data rate compared to the standard SLM technique. Random sequences with phase factors {1, −1} are used to generate the phase sequences for the proposed mapping scheme. The analytical expression for the SER of the Circular (6, 2) mapping scheme over an AWGN channel is derived, and the mathematical expressions for the SER of the other possible quaternary to 8-QAM mapping schemes are also evaluated. Computer simulations in MATLAB investigate the performance of all these mapping schemes in terms of PAPR reduction and SER over AWGN and multipath Rayleigh fading channels. The proposed scheme achieves the same PAPR reduction and improved SER performance relative to the other six schemes considered. [ABSTRACT FROM AUTHOR]
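The SLM mechanics referred to here are easy to sketch. The snippet below uses an IFFT in place of the OCDM Fresnel transform (an assumed simplification) together with random {1, −1} phase sequences, as in the abstract:

```python
import numpy as np

rng = np.random.default_rng(1)
N, U = 256, 8          # subcarriers/chips, number of SLM candidates

def papr_db(x):
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())

# QPSK data block; the IFFT stands in for the OCDM Fresnel transform,
# which exhibits the same high-PAPR behaviour (illustrative substitution).
data = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)

candidates = []
for _ in range(U):
    phases = rng.choice([1, -1], N)          # phase factors {1, -1}, as in SLM
    candidates.append(np.fft.ifft(data * phases) * np.sqrt(N))

best = min(candidates, key=papr_db)          # transmit the lowest-PAPR candidate
print(f"original PAPR: {papr_db(np.fft.ifft(data) * np.sqrt(N)):.2f} dB")
print(f"SLM-selected : {papr_db(best):.2f} dB")
```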
- Published
- 2024
- Full Text
- View/download PDF
22. Measurement of Diffusion Coefficients in Binary Mixtures and Solutions by the Taylor Dispersion Method.
- Author
-
Martin Trusler, J. P.
- Subjects
DIFFUSION measurements, DIFFUSION coefficients, DISPERSION (Chemistry), BINARY mixtures, DATA reduction, ACQUISITION of data - Abstract
The theory and application of the Taylor dispersion technique for measuring diffusion coefficients in binary systems are reviewed. The theory discussed in this paper includes both the ideal Taylor–Aris model and the estimation of the corrections required to account for small deviations from this ideal in a practical apparatus. Based on the theoretical treatment, recommendations are given for the design of practical instruments, together with suggestions for calibration, data acquisition and reduction, and the rigorous estimation of uncertainties. The analysis indicates that relative uncertainties on the order of 1% are achievable in practice. [ABSTRACT FROM AUTHOR]
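The ideal model referred to above has a compact textbook form, reproduced here for orientation (notation assumed, not the paper's):

```latex
% Taylor-Aris result: for laminar flow of mean speed u in a tube of
% radius a, a solute pulse disperses axially with effective coefficient
K \;=\; D_{12} \;+\; \frac{a^{2} u^{2}}{48\, D_{12}} ,
% so the binary diffusion coefficient D_{12} is recovered from the
% measured temporal variance of the eluted peak, e.g. via
\sigma_{t}^{2} \;\approx\; \frac{2 K L}{u^{3}} ,
% where L is the dispersion-tube length (ideal model, before the
% apparatus corrections discussed in the paper).
```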
- Published
- 2024
- Full Text
- View/download PDF
23. Supervised maximum variance unfolding.
- Author
-
Yang, Deliang and Qi, Hou-Duo
- Subjects
MULTIDIMENSIONAL scaling, DATA structures, EUCLIDEAN distance, DATA reduction, DATA visualization - Abstract
Maximum Variance Unfolding (MVU) is among the first methods in nonlinear dimensionality reduction for data visualization and classification. It aims to preserve local data structure while making the variance among the data as large as possible. However, MVU in general remains a computationally challenging problem, which may explain why it is less popular than other leading methods such as Isomap and t-SNE. In this paper, based on the key observation that the structure-preserving term in MVU is actually the squared stress in Multi-Dimensional Scaling (MDS), we replace that term with the stress function from MDS, resulting in a usable model. This usability guarantees that the "crowding phenomenon" will not occur in the dimension-reduced results. The new model also allows us to incorporate label information, and hence we call it supervised MVU (SMVU). We then develop a fast algorithm based on Euclidean distance matrix optimization. By using the majorization-minimization technique, the algorithm at each iteration solves a number of one-dimensional optimization problems, each with a closed-form solution. This strategy significantly speeds up the computation. We demonstrate the advantage of SMVU on some standard data sets against a few leading algorithms, including Isomap and t-SNE. [ABSTRACT FROM AUTHOR]
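The substitution at the heart of SMVU can be written out using standard formulations (notation assumed, not quoted from the paper):

```latex
% MVU maximizes variance while preserving local squared distances:
\max_{X} \;\sum_{i} \|x_i\|^{2}
\quad \text{s.t.} \quad
\sum_{i} x_i = 0, \qquad
\|x_i - x_j\|^{2} = \delta_{ij}^{2} \;\; \text{for neighbours } (i,j).
% SMVU replaces the structure-preserving term with the MDS stress
\sigma(X) \;=\; \sum_{(i,j) \in \mathcal{N}} w_{ij}\,
   \bigl(\|x_i - x_j\| - \delta_{ij}\bigr)^{2},
% penalizing deviations of distances (not squared distances) from the
% targets \delta_{ij}, which avoids crowding in the reduced space.
```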
- Published
- 2024
- Full Text
- View/download PDF
24. A Globally Convergent Inertial First-Order Optimization Method for Multidimensional Scaling.
- Author
-
Ram, Noga and Sabach, Shoham
- Subjects
MULTIDIMENSIONAL scaling, NONSMOOTH optimization, DATA reduction, DATA visualization, ALGORITHMS - Abstract
Multidimensional scaling (MDS) is a popular tool for dimensionality reduction and data visualization. Given distances between data points and a target low dimension, the MDS problem seeks a configuration of these points in the low-dimensional space such that the inter-point distances are preserved as well as possible. We focus on the most common formulation of the MDS problem, known as stress minimization, which results in a challenging non-smooth and non-convex optimization problem. In this paper, we propose an inertial version of the well-known SMACOF algorithm, which we call AI-SMACOF. This algorithm is proven to be globally convergent, and to the best of our knowledge this is the first result of its kind for algorithms aimed at solving the stress-based MDS minimization. In addition to the theoretical findings, numerical experiments provide further evidence of the superiority of the proposed algorithm. [ABSTRACT FROM AUTHOR]
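For context, the non-inertial SMACOF baseline is available off the shelf; AI-SMACOF's inertial step is not in standard libraries, so the sketch below shows only the classical algorithm:

```python
from sklearn.datasets import load_iris
from sklearn.manifold import smacof
from sklearn.metrics import euclidean_distances

# Classical SMACOF (majorization of the stress function) via scikit-learn.
X = load_iris().data
D = euclidean_distances(X)                 # input dissimilarities

emb, stress = smacof(D, n_components=2, n_init=4, random_state=0)
print(f"embedded {X.shape} -> {emb.shape}, final stress {stress:.1f}")
```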
- Published
- 2024
- Full Text
- View/download PDF
25. A novel semi data dimension reduction type weighting scheme of the multi-model ensemble for accurate assessment of twenty-first century drought.
- Author
-
Mukhtar, Alina, Ali, Zulfiqar, Nazeer, Amna, Dhahbi, Sami, Kartal, Veysi, and Deebani, Wejdan
- Subjects
CLIMATE change models, DROUGHT management, MARKOV processes, TWENTY-first century, DATA reduction - Abstract
Accurately and reliably predicting droughts across multiple Global Climate Models (GCMs) is a challenging task. To address this challenge, the Multimodel Ensemble (MME) method has become a valuable tool for merging multiple models and producing more accurate forecasts. This paper aims to enhance drought monitoring modules for the twenty-first century using multiple GCMs. To achieve this goal, the research introduces a new weighting paradigm called the Multimodel Homo-min Pertinence-max Hybrid Weighted Average (MHmPmHWAR) for the accurate aggregation of multiple GCMs. Secondly, the research proposes a new drought index called the Condensed Multimodal Multi-Scalar Standardized Drought Index (CMMSDI). To assess the effectiveness of MHmPmHWAR, its findings were compared with the Simple Model Average (SMA). In the application, eighteen GCMs from the Coupled Model Intercomparison Project Phase 6 (CMIP6) were considered at thirty-two grid points of the Tibet Plateau region. Mann–Kendall (MK) test statistics and Steady States Probabilities (SSPs) of the Markov chain were used to assess the long-term trend in drought and its classes. The trend analysis indicated that, in terms of spatial coverage, significantly more grid points showed an upward trend than a downward trend at the 0.05 significance level. Under scenario SSP1-2.6, the probabilities of the moderately wet and normal classes were greater than those of the other categories at nearly all temporal scales. The outcomes for SSP2-4.5 showed that the likelihoods of moderate drought and normal drought were higher than those of other classifications, and the results for SSP5-8.5 were comparable to those for SSP2-4.5, underscoring the importance of taking effective actions to alleviate future drought impacts. The results demonstrate the effectiveness of the MHmPmHWAR and CMMSDI approaches in predicting droughts under multiple GCMs, which can contribute to effective drought monitoring and management. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. Demonstration of neutron time‐of‐flight diffraction with an event‐mode imaging detector.
- Author
-
Jäger, Tim T., Losko, Adrian S., Wolfertz, Alexander, Schmidt, Søren, Bertelsen, Mads, Khaplanov, Anton, Agnew, Sean R., Funama, Fumiaki, Morgano, Manuel, Roth, Markus, Gochanour, Jason R., Long, Alexander M., Lutterotti, Luca, and Vogel, Sven C.
- Subjects
NEUTRON counters, DATA acquisition systems, IMAGE converters, RIETVELD refinement, DATA reduction, SCINTILLATORS, NEUTRON diffraction - Abstract
Neutron diffraction beamlines have traditionally relied on deploying large detector arrays of 3He tubes or neutron‐sensitive scintillators coupled with photomultipliers to efficiently probe crystallographic and microstructure information of a given material. Given the large upfront cost of custom‐made data acquisition systems and the recent scarcity of 3He, new diffraction beamlines or upgrades to existing ones demand innovative approaches. This paper introduces a novel Timepix3‐based event‐mode imaging neutron diffraction detector system as well as first results of a silicon powder diffraction measurement made at the HIPPO neutron powder diffractometer at the Los Alamos Neutron Science Center. Notably, these initial measurements were conducted simultaneously with the 3He array on HIPPO, enabling direct comparison. Data reduction for this type of data was implemented in the MAUD code, enabling Rietveld analysis. Results from the Timepix3‐based setup and HIPPO were benchmarked against McStas simulations, showing good agreement for peak resolution. With further development, systems such as the one presented here may substantially reduce the cost of detector systems for new neutron instrumentation as well as for upgrades of existing beamlines. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. A method for predicting photovoltaic output power based on PCC-GRA-PCA meteorological elements dimensionality reduction method.
- Author
-
Yang, Lingsheng, Cui, Xiangyu, and Li, Wei
- Subjects
DIMENSION reduction (Statistics), PEARSON correlation (Statistics), PRINCIPAL components analysis, PHOTOVOLTAIC power generation, DATA reduction, PHOTOVOLTAIC power systems - Abstract
Photovoltaic (PV) power generation forecasting models require large amounts of meteorological data, and as the volume of data grows, the dataset is likely to contain a significant amount of irrelevant and redundant information. This paper proposes a dimensionality reduction method based on the PCC-GRA-PCA approach, which aims to simplify the model and reduce computational complexity. Firstly, the method analyzes the feature importance of the various meteorological elements using the Pearson Correlation Coefficient (PCC) and Grey Relation Analysis (GRA), achieving a preliminary dimension reduction by selecting the most relevant features. Next, the data are processed with Principal Component Analysis (PCA) to achieve a secondary dimension reduction through feature transformation. Finally, a photovoltaic power prediction model is established using the OVMD-tSSA-LSSVM algorithm. After analysis, the prediction model showed improvements in R², MAE, RMSE, and MAPE with PCC-GRA-PCA dimensionality reduction, compared to the model without dimensionality reduction and to models using LDA or PCA alone. This demonstrates the effectiveness of the proposed data dimensionality reduction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
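The two-stage screening described in entry 27 above can be sketched with standard tools. Column names, the correlation threshold, and the omission of the GRA stage are all simplifying assumptions, not the paper's choices:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Stage 1 (PCC filter standing in for the PCC+GRA screening), Stage 2 (PCA).
rng = np.random.default_rng(0)
met = pd.DataFrame(rng.normal(size=(1000, 6)),
                   columns=["irradiance", "temp", "humidity",
                            "wind", "pressure", "cloud"])
power = 0.8 * met["irradiance"] + 0.1 * met["temp"] + rng.normal(0, 0.1, 1000)

# Keep features whose |Pearson r| with PV power exceeds a threshold
r = met.corrwith(power).abs()
selected = met[r[r > 0.05].index]
print("kept after PCC filter:", list(selected.columns))

# PCA feature transformation, retaining 95% of the variance
Z = PCA(n_components=0.95).fit_transform(selected)
print("shape after PCA:", Z.shape)
```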
28. Asteroid spin and shape properties from Gaia DR3 photometry.
- Author
-
Cellino, A., Tanga, P., Muinonen, K., and Mignard, F.
- Subjects
SMALL solar system bodies, SOLAR system, DATA release, ELECTRONIC data processing, DATA reduction - Abstract
Context. The third data release of Gaia, in June 2022, included the first large sample of sparse photometric data for more than 150 000 Solar System objects (SSOs), mainly asteroids. Aims. The SSO photometric data can be processed to derive information on the physical properties for a large number of objects, including spin properties, surface photometric behaviour in a variety of illumination conditions, and overall shape. Methods. After selecting a set of 22 815 objects for which an adequate number of accurate photometric measurements had been obtained by Gaia, we applied the 'genetic' algorithm of photometric inversion developed by the Gaia Data Processing and Analysis Consortium to process SSO photometric data. Given the need to minimise the required data processing time, the algorithm was set to adopt a simple triaxial ellipsoid shape model. Results. Our results show that in spite of the limited variety of observing circumstances and the limited numbers of measurements per object at present (in the majority of cases no greater than 40 and still far from the number expected at the end of the mission of about 60–70), the proportion of correct determinations for the spin period among the observed targets is about 85%. This percentage is based on a comparison with reliable literature data following a moderate filtering procedure developed to remove dubious solutions. Conclusions. The analysis performed in this paper is important in the context of developing further improvements to the adopted data reduction procedure. This includes the possible development of better solution filtering procedures that take into account, for each object, the possible presence of multiple, equivalent spin period solutions that have not been systematically investigated in this preliminary application. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Implementation of the Directorate General of Treasury's Electronic-Office Online Application at the Makassar State Treasury Service Office.
- Author
-
Nasrullah, Muh., Siraj, Muhammad Luthfi, Didin, and Wardah, Siti Syarifah Wafikah
- Subjects
RESEARCH methodology, GOVERNMENT policy, OUTREACH programs, DATA reduction - Abstract
This study aims to identify and analyze the implementation of the Directorate General of Treasury's application at the Makassar State Treasury Service Office from a communication perspective. The research is qualitative, with a policy approach used to analyze government policies; data were collected through interviews, observation, and documentation, and analyzed with an interactive qualitative model comprising four steps: data collection, data reduction, data presentation, and drawing conclusions. The results show that the implementation of the e-DJPb Online Application at the Makassar State Treasury Service Office is considered quite effective and efficient. This is marked by staged inter-agency communication from the Directorate General of Treasury, from the central office down to the regions, informing offices of the regulations governing the e-DJPb application, which has been transformed into an online-based service. In addition, the Makassar State Treasury Service Office has provided clear and transparent information to the public through regular outreach activities, and local officials have conveyed this information to the communities they serve. This distribution of information by local employees makes it easier for the community to handle tax administration online and creates paper and cost efficiencies, so that the local government no longer spends its budget on administrative purposes that have been integrated into the online system. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
30. Dimensionality reduction model based on integer planning for the analysis of key indicators affecting life expectancy.
- Author
-
Cui, Wei, Xu, Zhiqiang, and Mu, Ren
- Subjects
LIFE expectancy, INTEGER programming, DATA reduction, DATA mining, DATA visualization, WORLD health - Abstract
Exploring a dimensionality reduction model that can adeptly eliminate outliers and select the appropriate number of clusters is of profound theoretical and practical importance; the interpretability of such models also presents a persistent challenge. This paper proposes two innovative dimensionality reduction models based on integer programming (DRMBIP). These models assess compactness through the correlation of each indicator with its class center, while separation is evaluated by the correlation between different class centers. In contrast to DRMBIP-p, DRMBIP-v treats the threshold parameter as a variable, aiming to optimally balance compactness and separation. Using data from the Global Health Observatory (GHO), this study investigates 141 indicators that influence life expectancy. The findings reveal that DRMBIP-p effectively reduces the dimensionality of the data while ensuring compactness, and it remains compatible with other models. DRMBIP-v finds the optimal result, showing exceptional separation. Visualization of the results reveals that all classes have high compactness. DRMBIP-p requires a correlation threshold parameter as input, which plays a pivotal role in the effectiveness of the final dimensionality reduction. In DRMBIP-v, making the threshold parameter a variable potentially emphasizes either separation or compactness, which necessitates an artificial adjustment of the overflow component within the objective function. The DRMBIP presented in this paper is adept at uncovering the primary geometric structures within high-dimensional indicators. Validated on life expectancy data, it shows potential to assist data miners with the reduction of data dimensions. To our knowledge, this is the first time integer programming has been used to build a dimensionality reduction model with indicator filtering. It not only has applications in life expectancy but also offers clear advantages in data mining work that requires precise class centers. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
31. The principled principal: the case of Australian Steiner schools.
- Author
-
Eacott, Scott
- Subjects
SCHOOL administration, DATA reduction, LEGAL authorities, DECISION making - Abstract
Purpose: Steiner schools represent a natural experiment in the provision of schooling. With a history dating back more than 100 years, leadership, leaders and the principal do not sit easily with Steiner educators. The contemporary regulatory environment requires a "principal" or legal authority at the school-building level, creating a tension for Steiner schools. This makes Steiner schools an ideal case study for understanding the contemporary role of the principal. Design/methodology/approach: This paper is based on an interview-based study with 24 heads of Australian Steiner schools. Conducted on Microsoft Teams, all by the principal investigator, the interviews generated a 171,742-word corpus subjected to an inductive analytical approach. Data reduction led to four themes, and this paper focuses on one (principles not prescription) and its implications for the principalship and school governance. Findings: Embedding the principalship in a philosophy (or theory) of education re-couples school administration with schooling and bases decision-making in principles rather than individuals. It also alters the role of data and evidence from accountability to justifying principles. Research limitations/implications: Rather than a focus on individuals or roles, this paper argues that the underlying principles of organisational decision-making should be the central focus of research. Practical implications: Ensuring organisational coherence, by balancing the diversity of positions on core principles is the core task of the contemporary principal. Originality/value: Exploiting natural experiments in the provision of schooling makes it possible to argue for how schooling, and specifically the principalship, can be different. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
32. Low-Cost Data-Driven Robot Collision Localization Using a Sparse Modular Point Matrix.
- Author
-
Lin, Haoyu, Quan, Pengkun, Liang, Zhuo, Wei, Dongbo, and Di, Shichun
- Subjects
CONVOLUTIONAL neural networks, ELECTRIC charge, SUPPORT vector machines, MOBILE robots, FEATURE extraction, ROBOTS, DATA reduction - Abstract
In the context of automatic charging for electric vehicles, collision localization for the end-effector of robots not only serves as a crucial visual complement but also provides essential foundations for subsequent response design. In this scenario, data-driven collision localization methods are considered an ideal choice. However, due to the typically high demands on the data scale associated with such methods, they may significantly increase the construction cost of models. To mitigate this issue to some extent, in this paper, we propose a novel approach for robot collision localization based on a sparse modular point matrix (SMPM) in the context of automatic charging for electric vehicles. This method, building upon the use of collision point matrix templates, strategically introduces sparsity to the sub-regions of the templates, aiming to reduce the scale of data collection. Additionally, we delve into the exploration of data-driven models adapted to SMPMs. We design a feature extractor that combines a convolutional neural network (CNN) with an echo state network (ESN) to perform adaptive feature extraction on collision vibration signals. Simultaneously, by incorporating a support vector machine (SVM) as a classifier, the model is capable of accurately estimating the specific region in which the collision occurs. The experimental results demonstrate that the proposed collision localization method maintains a collision localization accuracy of 91.27% and a collision localization RMSE of 1.46 mm, despite a 48.15% reduction in data scale. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. DATA-DRIVEN CONSTRUCTION OF HIERARCHICAL MATRICES WITH NESTED BASES.
- Author
-
Cai, Difeng, Huang, Hua, Chow, Edmond, and Xi, Yuanzhe
- Subjects
KERNEL functions, FAST multipole method, MATRICES (Mathematics), DATA reduction, COMPUTATIONAL complexity - Abstract
Hierarchical matrices provide a powerful representation for significantly reducing the computational complexity associated with dense kernel matrices. For example, the fast multipole method (FMM) and its variants are highly efficient when the kernel function is related to fundamental solutions of classical elliptic PDEs. For general kernel functions, interpolation-based methods are widely used for the efficient construction of hierarchical matrices. In this paper, we present a fast hierarchical data reduction (HiDR) procedure with O(n) complexity for the memory-efficient construction of hierarchical matrices with nested bases, where n is the number of data points. HiDR reduces the given data hierarchically so as to obtain O(1) representations for all nearfield and farfield interactions. Based on HiDR, a linear-complexity H² matrix construction algorithm is proposed. The use of data-driven methods enables better efficiency than other general-purpose methods and flexible computation without accessing the kernel function. Experiments demonstrate significantly improved memory efficiency of the proposed data-driven method compared to interpolation-based methods over a wide range of kernels. For the Coulomb kernel, the proposed general-purpose algorithm offers competitive performance compared to FMM and its variants, such as PVFMM. The data-driven approach not only works for general kernels but also leads to much smaller precomputation costs compared to PVFMM. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
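As a rough illustration of why far-field kernel blocks compress so well (the property HiDR exploits), the sketch below builds a skeleton (CUR-type) factorization of a well-separated Coulomb block from a handful of sampled points. This is a generic low-rank demonstration, not the paper's HiDR algorithm; the cluster geometry and skeleton count k are assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.uniform(0, 1, size=(500, 3))            # source cluster
    Y = rng.uniform(5, 6, size=(400, 3))            # well-separated target cluster

    def kernel(A, B):
        """Coulomb-like kernel 1/r between two point sets."""
        d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
        return 1.0 / d

    k = 12                                          # skeletons per cluster (assumed)
    ix = rng.choice(len(X), k, replace=False)       # data-driven skeleton selection
    iy = rng.choice(len(Y), k, replace=False)       # (here: plain random sampling)

    K_full = kernel(X, Y)                           # dense block: 500 x 400
    U = kernel(X, Y[iy])                            # 500 x k
    S = np.linalg.pinv(kernel(X[ix], Y[iy]))        # k x k core
    V = kernel(X[ix], Y)                            # k x 400
    err = np.linalg.norm(U @ S @ V - K_full) / np.linalg.norm(K_full)
    print(f"relative error with {k} skeletons per side: {err:.2e}")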
34. A Filter-Based Improved Multi-Objective Equilibrium Optimizer for Single-Label and Multi-Label Feature Selection Problem.
- Author
-
Wang, Wendong, Li, Yu, Liu, Jingsen, and Zhou, Huan
- Subjects
FEATURE selection ,DATA reduction ,EQUILIBRIUM ,BASE pairs ,BIG data - Abstract
Effectively reducing the dimensionality of big data while retaining its key information has been a research challenge. As an important step in data pre-processing, feature selection plays a critical role in reducing data size and increasing the overall value of the data. Many previous studies have focused on single-label feature selection; however, with the increasing variety of data types, the need for feature selection on multi-label data has also arisen. Unlike single-label data, multi-label data, with far more possible label combinations, places higher demands on the capabilities of feature selection algorithms. In this paper, we propose a filter-based Multi-Objective Equilibrium Optimizer algorithm (MOEO-Smp) to solve the feature selection problem for both single-label and multi-label data. MOEO-Smp rates the optimization results of solutions and features based on four pairs of optimization principles, and builds three equilibrium pools to guide exploration and exploitation based on the total scores of solutions and features and on the ranking of objective fitness values, respectively. Seven UCI single-label datasets, two Mulan multi-label datasets, and one COVID-19 multi-label dataset are used to test the feature selection capability of MOEO-Smp; the results are compared with those of 10 other state-of-the-art algorithms and evaluated using three and seven different metrics, respectively. The feature selection experiments and comparisons with results in other studies show that MOEO-Smp not only achieves the highest classification accuracy and excellent dimensionality reduction on single-label data, but also performs better on multi-label data in terms of Hamming loss, accuracy, dimensionality reduction, and other measures. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
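To make the two competing objectives concrete, here is a generic filter-style sketch: random feature masks are scored on (validation error, subset size) and the Pareto-optimal masks are kept. It illustrates the problem MOEO-Smp solves, not the MOEO-Smp algorithm itself; the dataset, classifier, and candidate count are arbitrary choices.

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(2)
    X, y = load_breast_cancer(return_X_y=True)
    n_features = X.shape[1]

    def objectives(mask):
        """(error rate, fraction of features kept) for a binary feature mask."""
        if not mask.any():
            return (1.0, 0.0)
        acc = cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()
        return (1.0 - acc, mask.sum() / n_features)

    candidates = [rng.random(n_features) < rng.uniform(0.1, 0.9) for _ in range(40)]
    scores = [objectives(m) for m in candidates]

    def dominated(a, b):
        """True if b is at least as good as a everywhere and better somewhere."""
        return (all(bi <= ai for ai, bi in zip(a, b))
                and any(bi < ai for ai, bi in zip(a, b)))

    pareto = [i for i, s in enumerate(scores)
              if not any(dominated(s, t) for j, t in enumerate(scores) if j != i)]
    for i in pareto:
        print(f"error={scores[i][0]:.3f}  features={candidates[i].sum()}")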
35. Compact Data Learning for Machine Learning Classifications.
- Author
-
Kim, Song-Kyoo
- Subjects
MACHINE learning ,ARTIFICIAL intelligence ,STATISTICAL accuracy ,BIG data ,CLASSIFICATION ,ARRHYTHMIA - Abstract
This paper targets the optimization of machine learning (ML) training data by constructing compact data. Methods for optimizing ML training have improved and become part of artificial intelligence (AI) system development. Compact data learning (CDL) is a practical alternative framework for optimizing a classification system by reducing the size of the training dataset. CDL originated from compact data design, which provides the best assets without handling complex big data. CDL is a dedicated framework for improving the speed of the ML training phase without affecting the accuracy of the system. An ML-based arrhythmia detection system and its variants with CDL maintained the same statistical accuracy. In experiments, CDL allowed the input dataset to be reduced by 85%, indicating that a trained ML system can reach the same statistical accuracy using only 15% of the original training data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
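A minimal experiment in the spirit of the claim above: train one classifier on the full training set and another on a 15% subset, then compare accuracy. CDL selects its compact subset analytically; plain stratified subsampling is used here purely as a stand-in.

    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    # 15% compact subset of the training data (stratified to keep class balance).
    X_cdl, _, y_cdl, _ = train_test_split(X_tr, y_tr, train_size=0.15,
                                          stratify=y_tr, random_state=0)

    full = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    compact = RandomForestClassifier(random_state=0).fit(X_cdl, y_cdl)
    print("full-data accuracy: ", round(full.score(X_te, y_te), 3))
    print("15%-subset accuracy:", round(compact.score(X_te, y_te), 3))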
36. Single-Frame-Based Data Compression for CAN Security.
- Author
-
Jin, Shi-Yi, Seo, Dong-Hyun, Kim, Yeon-Jin, Kim, Yong-Eun, Woo, Samuel, and Chung, Jin-Gyun
- Subjects
DATA compression ,MESSAGE authentication codes ,KIA automobiles ,LOSSLESS data compression - Abstract
To authenticate a controller area network (CAN) data frame, a message authentication code (MAC) must be sent along with the CAN frame, but there is no space reserved for the MAC in the CAN frame. Recently, difference-based compression (DBC) algorithms have been used to create a space inside the frame. DBC has the advantage of being very efficient, but its drawback is that, if an error occurs in one frame, the effects of that error propagate to subsequent frames. In this paper, a CAN data compression algorithm is proposed that compresses the current frame without relying on previous frames. Therefore, an error generated in one frame cannot be propagated to subsequent frames. In addition, a CAN signal grouping technique is proposed based on entropy analysis. To efficiently authenticate CAN frames, the length of the compressed data must be 4 bytes or less (4BL). Simulation shows that the 4BL-compression ratio of a Kia Sorento vehicle is 99.36% in the DBC method, but 100% in the proposed method. In an LS Mtron tractor, the 4BL-compression ratio is 98.58% in the DBC method, but 100% in the proposed method. In addition, the execution time of the proposed compression algorithm is only 27.39% of that of the DBC algorithm. The results show that the proposed algorithm has better compression characteristics for CAN security than the DBC algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
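A toy single-frame compressor conveys the key property claimed above: each signal is encoded independently as a 5-bit length header plus its significant bits, so no frame depends on a previous one and an error cannot propagate across frames. The signal values and widths are invented, and the paper's entropy-based grouping is not modeled.

    def compress_frame(values):
        """Encode each signal as a 5-bit length header plus significant bits."""
        bits = ""
        for v in values:
            sig = v.bit_length()                  # significant bits only
            bits += format(sig, "05b")            # header: how many bits follow
            bits += format(v, f"0{sig}b") if sig else ""
        return bits

    def decompress_frame(bits, n_signals):
        values, pos = [], 0
        for _ in range(n_signals):
            sig = int(bits[pos:pos + 5], 2); pos += 5
            values.append(int(bits[pos:pos + sig], 2) if sig else 0)
            pos += sig
        return values

    frame = [3, 0, 17, 1]          # slowly varying sensor readings (invented)
    widths = [16, 8, 16, 8]        # raw signal widths: 48 bits in total
    packed = compress_frame(frame)
    print(f"{len(packed)} bits after compression (raw frame: {sum(widths)} bits)")
    assert decompress_frame(packed, len(frame)) == frame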
37. A Hierarchical Security Event Correlation Model for Real-Time Threat Detection and Response.
- Author
-
Maosa, Herbert, Ouazzane, Karim, and Ghanem, Mohamed Chahine
- Subjects
INTRUSION detection systems (Computer security) ,FIREWALLS (Computer security) ,DATA security failures ,CLUSTER analysis (Statistics) ,DATA reduction - Abstract
An intrusion detection system (IDS) performs post-compromise detection of security breaches whenever preventive measures such as firewalls do not avert an attack. However, these systems raise a vast number of alerts that must be analyzed and triaged by security analysts, a process that is largely manual, tedious, and time-consuming. Alert correlation is a technique that reduces the number of intrusion alerts by aggregating alerts that are similar in some way. However, the correlation is typically performed outside the IDS through third-party systems and tools, after the IDS has already generated a high volume of alerts; these third-party systems add to the complexity of security operations. In this paper, we build on the highly researched area of alert and event correlation by developing a novel hierarchical event correlation model that promises to reduce the number of alerts issued by an intrusion detection system. This is achieved by correlating the events before the IDS classifies them. The proposed model takes the best features from similarity- and graph-based correlation techniques to deliver an ensemble capability not possible with either approach separately. Further, we propose a correlation process for events rather than alerts, as is the case in the current art, and we develop our own correlation and clustering algorithm, tailor-made for network event data. The model is implemented as a proof of concept, with experiments run on standard intrusion detection datasets. The correlation achieves an 87% data reduction through aggregation, producing nearly 21,000 clusters in about 30 s. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
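The aggregation step can be pictured in a few lines of Python: events that agree on key attributes within a time window are merged into one cluster before any classification. This is a generic similarity-based illustration, not the paper's hierarchical model; the attributes and the 60 s window are assumptions.

    from collections import defaultdict

    events = [  # (timestamp, src_ip, dst_ip, event_type) -- synthetic examples
        (0.0, "10.0.0.5", "10.0.0.9", "scan"),
        (1.2, "10.0.0.5", "10.0.0.9", "scan"),
        (2.8, "10.0.0.5", "10.0.0.9", "scan"),
        (3.0, "10.0.0.7", "10.0.0.9", "login_fail"),
        (90., "10.0.0.5", "10.0.0.9", "scan"),   # outside the window: new cluster
    ]

    WINDOW = 60.0                                # seconds (assumed)
    clusters = defaultdict(list)                 # key -> list of clusters
    for ev in sorted(events):
        ts, src, dst, kind = ev
        bucket = clusters[(src, dst, kind)]
        # bucket[-1][-1][0] is the timestamp of the newest event in the
        # newest cluster for this attribute key.
        if bucket and ts - bucket[-1][-1][0] <= WINDOW:
            bucket[-1].append(ev)                # same attributes, close in time
        else:
            bucket.append([ev])                  # start a new cluster

    n_clusters = sum(len(v) for v in clusters.values())
    print(f"{len(events)} events -> {n_clusters} clusters "
          f"({1 - n_clusters / len(events):.0%} reduction)")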
38. Filling the Gaps: Using Synthetic Low-Altitude Aerial Images to Increase Operational Design Domain Coverage.
- Author
-
Rüter, Joachim, Maienschein, Theresa, Schirmer, Sebastian, Schopferer, Simon, and Torens, Christoph
- Subjects
OBJECT recognition (Computer vision) ,DATA reduction ,MACHINE learning ,DRONE aircraft ,DATA management ,VIDEO surveillance - Abstract
A key necessity for the safe and autonomous flight of Unmanned Aircraft Systems (UAS) is their reliable perception of the environment, for example, to assess the safety of a landing site. For visual perception, Machine Learning (ML) provides state-of-the-art results in terms of performance, but the path to aviation certification has yet to be determined as current regulation and standard documents are not applicable to ML-based components due to their data-defined properties. However, the European Union Aviation Safety Agency (EASA) published the first usable guidance documents that take ML-specific challenges, such as data management and learning assurance, into account. In this paper, an important concept in this context is addressed, namely the Operational Design Domain (ODD) that defines the limitations under which a given ML-based system is designed to operate and function correctly. We investigated whether synthetic data can be used to complement a real-world training dataset which does not cover the whole ODD of an ML-based system component for visual object detection. The use-case in focus is the detection of humans on the ground to assess the safety of landing sites. Synthetic data are generated using the methods proposed in the EASA documents, namely augmentations, stitching and simulation environments. These data are used to augment a real-world dataset to increase ODD coverage during the training of Faster R-CNN object detection models. Our results give insights into the generation techniques and usefulness of synthetic data in the context of increasing ODD coverage. They indicate that the different types of synthetic images vary in their suitability but that augmentations seem to be particularly promising when there is not enough real-world data to cover the whole ODD. By doing so, our results contribute towards the adoption of ML technology in aviation and the reduction of data requirements for ML perception systems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
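Of the three generation routes named above, augmentation is the easiest to sketch. The Pillow snippet below derives a few ODD-widening variants (lighting, viewpoint) from one real aerial image; the file names and transform parameters are placeholders.

    from PIL import Image, ImageEnhance

    def augment(img):
        """Yield a few ODD-widening variants of one real image."""
        yield img.transpose(Image.Transpose.FLIP_LEFT_RIGHT)  # viewpoint change
        yield ImageEnhance.Brightness(img).enhance(0.5)       # dusk-like lighting
        yield ImageEnhance.Contrast(img).enhance(1.4)         # harsher sun
        yield img.rotate(15, expand=True)                     # banked-turn view

    img = Image.open("landing_site.jpg")                      # placeholder path
    for i, variant in enumerate(augment(img)):
        variant.save(f"landing_site_aug_{i}.jpg")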
39. Geometric multidimensional scaling: efficient approach for data dimensionality reduction.
- Author
-
Dzemyda, Gintautas and Sabaliauskas, Martynas
- Subjects
MULTIDIMENSIONAL scaling ,DATA reduction ,GLOBAL optimization ,PARALLEL algorithms - Abstract
Multidimensional scaling (MDS) is an often-used method for nonlinearly reducing the dimensionality of multidimensional data and presenting the data visually. MDS minimizes a stress function whose variables are the coordinates of points in the projected lower-dimensional space. Recently, the so-called Geometric MDS has been developed, in which the stress function and multidimensional scaling in general are considered from a geometric point of view. Using the ideas of Geometric MDS, it is possible to construct an iterative stress-minimization procedure in which the coordinates of an individual point in the projected space are moved to a new position defined analytically. In this paper, we discover and prove the main advantage of Geometric MDS theoretically: changing the positions of all points in the projected space simultaneously (independently of each other), in the directions and with the step sizes defined analytically by the Geometric MDS strategy for a single point, decreases the MDS stress. Moreover, the analytical updating of the coordinates of projected points in each iteration has a simple geometric interpretation. New properties of Geometric MDS have been discovered. The results allow for the future development of a class of new sequential and parallel algorithms. Ideas for global optimization of the stress are highlighted. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
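A simplified reading of the analytic step, applied to all points simultaneously as the paper advocates: every other point i "votes" for the position that would satisfy the target distance d_ij exactly (a point on the sphere of radius d_ij around y_i), and point j moves to the average of those votes. The data, sizes, and iteration count below are assumptions; consult the paper for the exact update.

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.normal(size=(50, 5))                       # high-dimensional data
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)   # target distances
    Y = rng.normal(size=(50, 2))                       # initial 2-D projection
    n = len(Y)

    def stress(Y):
        E = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
        return np.sum(np.triu(D - E, 1) ** 2)

    print("initial stress:", round(stress(Y), 2))
    for _ in range(100):
        diff = Y[None, :, :] - Y[:, None, :]           # diff[i, j] = y_j - y_i
        E = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(E, 1.0)                       # avoid 0/0 on the diagonal
        votes = Y[:, None, :] + D[..., None] * diff / E[..., None]
        Y = (votes.sum(axis=0) - Y) / (n - 1)          # average of the n-1 votes
    print("final stress:  ", round(stress(Y), 2))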
40. The problem of dimensionality reduction: Principal Component Analysis (PCA).
- Author
-
Pernice, Sergio A.
- Subjects
PRINCIPAL components analysis ,DATA reduction ,MACHINE learning - Abstract
Copyright of Revista Mutis is the property of Revista Mutis and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
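For reference alongside this entry, textbook PCA in a few lines of NumPy: center the data, take the SVD, and project onto the leading right singular vectors (the principal components). The synthetic data and component count are placeholders.

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))  # correlated data

    Xc = X - X.mean(axis=0)                  # 1. center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # variance fraction per component
    k = 2
    Z = Xc @ Vt[:k].T                        # 2. project onto the top-k components

    print("explained variance of 2 components:", explained[:2].sum().round(3))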
41. Meeting Diverse Learning Needs: Exploring Effective Sociology Teacher Strategies in Differentiated Learning.
- Author
-
Tenri Awaru, A. Octamaya, Said Ahmad, M. Ridwan, Sadriani, Andi, and Fajri Maulana, Muh.
- Subjects
PROBLEM-based learning ,LEARNING strategies ,TEACHERS ,SOCIOLOGY ,GROUP formation ,LEARNING ,DATA reduction - Abstract
A major challenge for sociology teachers is to teach students complex sociology subject matter based on their learning needs. The purpose of this study is to explore the strategies used by sociology teachers in applying differentiated learning at the high school level in Makassar City. This is a qualitative study that uses teachers and students as informants. Data were collected through interviews, observations, and documentation, and analyzed through the stages of data collection, data reduction, and data presentation. Results show that sociology teachers in Makassar City have implemented effective strategies for differentiated learning. These include the use of diverse learning resources other than textbooks, namely audiovisual materials in the form of videos and presentation slides; visual materials in the form of infographics, images, and maps; relevant articles from blogs and Conference Papers; and initial assessments to identify students' individual needs in the form of questionnaires, diagnostic tests, and pretests. Group formation is flexible, based on students' abilities, and group members are rotated at each meeting; teachers apply various learning methods and models, such as problem-based learning, combined with cooperative learning models. The teachers also provide additional support by holding discussions with students, answering their questions, assigning differentiated tasks, and providing additional material. The findings are expected to contribute to the development of differentiated learning approaches, especially in sociology subjects. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
42. Dual-Graph-Regularization Constrained Nonnegative Matrix Factorization with Label Discrimination for Data Clustering.
- Author
-
Li, Jie, Li, Yaotang, and Li, Chaoqian
- Subjects
MATRIX decomposition ,NONNEGATIVE matrices ,DATA reduction ,MACHINE learning ,DATA visualization - Abstract
Nonnegative matrix factorization (NMF) is an effective technique for dimensionality reduction of high-dimensional data for tasks such as machine learning and data visualization. However, for practical clustering tasks, traditional NMF ignores the manifold information of both the data space and the feature space, as well as the discriminative information of the data. In this paper, we propose a semisupervised NMF called dual-graph-regularization-constrained nonnegative matrix factorization with label discrimination (DCNMFLD). DCNMFLD combines dual graph regularization and prior label information as additional constraints, making full use of the intrinsic geometric and discriminative structure of the data, and can efficiently enhance the discriminative and exclusionary nature of clustering and improve clustering performance. The evaluation of the clustering results on four benchmark datasets demonstrates the effectiveness of the new algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
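For background, plain NMF with the classical Lee-Seung multiplicative updates is sketched below; the dual-graph and label-discrimination terms that define DCNMFLD are not included. The data, rank, and iteration count are arbitrary.

    import numpy as np

    rng = np.random.default_rng(5)
    X = np.abs(rng.normal(size=(100, 40)))       # nonnegative data matrix
    r = 5                                        # factorization rank (assumed)
    W = np.abs(rng.normal(size=(100, r)))
    H = np.abs(rng.normal(size=(r, 40)))

    eps = 1e-9                                   # guards against division by zero
    for _ in range(200):
        H *= (W.T @ X) / (W.T @ W @ H + eps)     # multiplicative update for H
        W *= (X @ H.T) / (W @ H @ H.T + eps)     # multiplicative update for W

    print("reconstruction error:",
          np.linalg.norm(X - W @ H) / np.linalg.norm(X))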
43. Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline Pre-Training with Model-Based Augmentation †.
- Author
-
Macaluso, Girolamo, Sestini, Alessandro, and Bagdanov, Andrew D.
- Subjects
REINFORCEMENT learning ,DATA augmentation ,STOCHASTIC convergence ,DATA reduction ,AUTOMATIC data collection systems - Abstract
Offline reinforcement learning leverages pre-collected datasets of transitions to train policies. It can serve as an effective initialization for online algorithms, enhancing sample efficiency and speeding up convergence. However, when such datasets are limited in size and quality, offline pre-training can produce sub-optimal policies and degrade subsequent online reinforcement learning performance. In this paper, we propose a model-based data augmentation strategy to maximize the benefits of offline reinforcement learning pre-training and reduce the scale of data needed to be effective. Our approach leverages a world model of the environment, trained on the offline dataset, to augment states during offline pre-training. We evaluate our approach on a variety of MuJoCo robotic tasks, and our results show that it can jumpstart online fine-tuning and substantially reduce, in some cases by an order of magnitude, the required number of environment interactions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
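A schematic of model-based augmentation for offline pre-training, under heavy assumptions (toy linear dynamics, a small MLP as the world model): the learned one-step model generates synthetic transitions from dataset states with perturbed actions, enlarging the buffer before policy training. No claim is made that this matches the paper's architecture.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(6)

    def true_env(s, a):                          # unknown to the agent
        s2 = 0.9 * s + 0.01 * rng.normal(size=s.shape)
        s2[:, :2] += 0.1 * a
        return s2

    # Small offline dataset of (state, action, next_state) transitions.
    S = rng.normal(size=(300, 4)); A = rng.normal(size=(300, 2))
    S2 = true_env(S, A)

    # Fit the world model on the offline data.
    model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=3000, random_state=0)
    model.fit(np.hstack([S, A]), S2)

    # Augment: replay dataset states with perturbed actions through the model.
    A_new = A + 0.1 * rng.normal(size=A.shape)
    S2_new = model.predict(np.hstack([S, A_new]))
    S_aug = np.vstack([S, S]); A_aug = np.vstack([A, A_new])
    S2_aug = np.vstack([S2, S2_new])
    print("buffer size before/after augmentation:", len(S), "/", len(S_aug))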
44. Feature reduction of unbalanced data classification based on density clustering.
- Author
-
Wang, Zhen-Fei, Yuan, Pei-Yao, Cao, Zhong-Ya, and Zhang, Li-Ying
- Subjects
FEATURE selection ,DATA reduction ,CLASSIFICATION algorithms ,BIG data - Abstract
With the development of big data, the problem of imbalanced datasets is becoming more and more serious. When dealing with high-dimensional imbalanced datasets, traditional classification algorithms usually favor the majority class and ignore the minority class, which results in poor classification performance. In this paper, we study the classification of high-dimensional imbalanced datasets and propose a feature selection algorithm based on density clustering and an importance measure (DBIM). DBIM first constructs multiple balanced subsets by randomly under-sampling the majority class to the same number of samples as the minority class, and uses DBSCAN as the base classifier. This process quickly discovers density-based characteristics of the feature distribution and generates the initial feature subspace. To select features that discriminate strongly among class labels, we rank and select features from the generated initial feature subspace according to their importance. To avoid redundancy between features and generate high-quality feature subsets, we further design a new class-distribution-based weight index, combined with a redundancy evaluation index in the DBIM algorithm, to calculate the redundancy between features. Experimental results on eight publicly available datasets show that the proposed DBIM algorithm can generate feature subsets with high relevance and low redundancy, effectively reduce the dimensionality of high-dimensional imbalanced datasets, and improve classification performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
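The balanced-subset step described above can be sketched directly: the majority class is randomly under-sampled to the minority-class size several times and DBSCAN is run on each balanced subset. DBIM's importance and redundancy scoring is beyond this illustration; the dataset and DBSCAN parameters are arbitrary.

    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=1000, n_features=10,
                               weights=[0.9, 0.1], random_state=0)
    rng = np.random.default_rng(7)
    minority = np.where(y == 1)[0]
    majority = np.where(y == 0)[0]

    for k in range(3):                           # several balanced subsets
        picked = rng.choice(majority, size=len(minority), replace=False)
        subset = np.concatenate([minority, picked])
        labels = DBSCAN(eps=3.0, min_samples=5).fit_predict(X[subset])
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        print(f"subset {k}: {len(subset)} samples, {n_clusters} density clusters")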
45. Potential pre-processing techniques of data mining on mutual funds.
- Author
-
Singla, Shikha, Gupta, Gaurav, and Bhathal, Gurjit Singh
- Subjects
DATA mining ,MUTUAL funds ,NET Asset Value ,DATA integration ,DATA scrubbing ,DATA reduction - Abstract
Data pre-processing is an essential and requisite step in the data mining process because raw data collected from different sources may be imperfect, inconsistent, and noisy. The quality of the data plays a very important role in the analysis process, since the results primarily depend on the quality of the input. Before the knowledge discovery process can be performed, data pre-processing is the main step applied to the raw data. This paper is based on two main steps: data collection and data pre-processing. Data collection gathers raw data from different sources according to the needs of the research analysis. Data pre-processing is the transformation of raw data into a structured and understandable format; it not only transforms the data but also reduces the size of the original data. The process is mainly divided into four parts: data integration, data cleaning, data transformation, and data reduction. In this paper, fifteen years of Net Asset Value (NAV) data for twenty mutual funds are taken for analysis. The paper explains several techniques for converting raw data into an understandable format during the pre-processing stage. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
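The four pre-processing parts named above, applied to a toy NAV table with pandas. The column names, sources, and values are placeholders, not the paper's data.

    import pandas as pd

    # Integration: combine NAV series collected from two hypothetical sources.
    a = pd.DataFrame({"date": pd.date_range("2020-01-01", periods=5),
                      "fund": "F1", "nav": [10.0, None, 10.4, 10.3, 10.8]})
    b = pd.DataFrame({"date": pd.date_range("2020-01-01", periods=5),
                      "fund": "F2", "nav": [21.0, 21.2, None, 21.9, 22.1]})
    df = pd.concat([a, b], ignore_index=True)

    # Cleaning: fill missing NAVs by carrying the last value forward per fund.
    df["nav"] = df.groupby("fund")["nav"].ffill()

    # Transformation: min-max normalise each fund's NAV to [0, 1].
    df["nav_scaled"] = df.groupby("fund")["nav"].transform(
        lambda s: (s - s.min()) / (s.max() - s.min()))

    # Reduction: keep one weekly observation per fund instead of daily rows.
    weekly = (df.set_index("date").groupby("fund")["nav"]
                .resample("W").last().reset_index())
    print(weekly)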
46. Malware detection for IoT devices using hybrid system of whitelist and machine learning based on lightweight flow data.
- Author
-
Nakahara, Masataka, Okui, Norihiro, Kobayashi, Yasuaki, Miyake, Yutaka, and Kubota, Ayumu
- Subjects
HYBRID systems ,MACHINE learning ,INTERNET of things ,DATA reduction ,MALWARE ,DATA transmission systems - Abstract
Because IoT deployments generally involve large numbers and many types of devices, it is important to collect data efficiently and detect threats in a lightweight way. In this paper, we propose an architecture for malware detection, a method for detecting malware using flow information, and a method for decreasing the amount of data transmitted between the servers in this architecture. We evaluate the malware detection performance and the amount of data before and after the data reduction, and show that detection performance is maintained even though the amount of data is reduced. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
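One way to picture the hybrid design: flows matching a per-device whitelist of known benign destinations are dropped early, and only the remainder reaches a lightweight classifier. The whitelist entries, flow features, and labels below are invented for illustration.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    WHITELIST = {("camera-01", "203.0.113.10", 443),    # known firmware server
                 ("camera-01", "203.0.113.11", 123)}    # known NTP server

    flows = [  # (device, dst_ip, dst_port, [duration, bytes, packets], label)
        ("camera-01", "203.0.113.10", 443, [1.0, 5e3, 12], 0),
        ("camera-01", "198.51.100.7", 23, [9.0, 2e5, 900], 1),   # telnet scan
        ("camera-01", "203.0.113.11", 123, [0.1, 1e2, 2], 0),
        ("camera-01", "203.0.113.20", 443, [1.1, 6e3, 14], 0),   # new but benign
        ("camera-01", "198.51.100.9", 80, [4.0, 8e4, 300], 1),
    ]

    suspect = [f for f in flows if (f[0], f[1], f[2]) not in WHITELIST]
    print(f"{len(flows) - len(suspect)} of {len(flows)} flows filtered by whitelist")

    X = np.array([f[3] for f in suspect]); y = np.array([f[4] for f in suspect])
    clf = RandomForestClassifier(random_state=0).fit(X, y)   # toy training
    print("classifier labels for remaining flows:", clf.predict(X))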
47. Virtual Integration of Satellite and In-situ Observation Networks (VISION) v1.0: In-Situ Observations Simulator.
- Author
-
Russo, Maria Rosa, Bartholomew, Sadie L., Hassell, David, Mason, Alex M., Neininger, Erica, Perman, A. James, Sproson, David A. J., Watson-Parris, Duncan, and Abraham, Nathan Luke
- Subjects
BUOYS ,RESEARCH aircraft ,DATA reduction ,DATA modeling ,ATMOSPHERIC models ,MODELS & modelmaking ,FLIGHT simulators - Abstract
This work presents the first step in the development of the VISION toolkit, a set of Python tools that allows for easy, efficient, and more meaningful comparison between global atmospheric models and observational data. Whilst observational data and modelling capabilities are expanding in parallel, there are still barriers preventing these two data sources from being used in synergy. This arises from differences in spatial and temporal sampling between models and observational platforms: observational data from a research aircraft, for example, are sampled along specified flight trajectories at very high temporal resolution. Proper comparison with model data requires generating, storing, and handling a large number of highly temporally resolved model files, a process that is data-, labour-, and time-intensive. In this paper we focus on the comparison between model data and in-situ observations (from aircraft, ships, buoys, sondes, etc.). A stand-alone code, the In-Situ Observation simulator, or ISO_simulator for short, is described here: this software reads modelled variables and observational data files and outputs model data interpolated in space and time to match the observations. The interpolated model data are then written to NetCDF files that can be efficiently archived, due to their small sizes, and directly compared to observations. This method achieves a large reduction in the size of the model data produced for comparison with flight and other in-situ data. By interpolating global, gridded, hourly files onto observation locations, we reduce the data output for a typical climate-resolution run from ~3 GB per model variable per month to ~15 MB per model variable per month (a 200-fold reduction in data volume). The VISION toolkit is fast and easy to use, enabling large observational datasets spanning decades to be exploited for large-scale model evaluation. Although the code has initially been tested within the Unified Model (UM) framework, which is shared by the UK Earth System Model (UKESM), it was written as a flexible tool and can be extended to work with other models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
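The core operation such a simulator performs can be reproduced with xarray: pointwise interpolation of a gridded model field onto an observation track, so only track-sized output needs to be stored. The grid, the synthetic "flight", and the variable name are placeholders; this is not the ISO_simulator code itself.

    import numpy as np
    import xarray as xr

    # Hourly gridded model field (time x lat x lon).
    model = xr.DataArray(
        np.random.rand(24, 90, 180),
        dims=("time", "lat", "lon"),
        coords={"time": np.arange(24.0),
                "lat": np.linspace(-89, 89, 90),
                "lon": np.linspace(-179, 179, 180)},
        name="temperature")

    # High-frequency observation track (e.g., an aircraft transect).
    n_obs = 1000
    track = {
        "time": xr.DataArray(np.linspace(2, 8, n_obs), dims="obs"),
        "lat": xr.DataArray(np.linspace(10, 35, n_obs), dims="obs"),
        "lon": xr.DataArray(np.linspace(-40, 5, n_obs), dims="obs"),
    }

    # Sharing the "obs" dimension makes xarray interpolate point-by-point along
    # the track instead of building a full 3-D product grid.
    on_track = model.interp(**track)
    print(on_track.sizes)          # {'obs': 1000} -- track-sized, not grid-sized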
48. Wavelet-based 3D Data Cube Denoising Using Three Scales of Dependency.
- Author
-
Chen, Guang Yi and Krzyzak, Adam
- Subjects
NOISE control ,CUBES ,WAVELET transforms ,DATA reduction ,THRESHOLDING algorithms ,NOISE - Abstract
In this paper, we propose a novel method for denoising 3D data cubes corrupted by noise with spatially varying noise levels. We apply the 3D dual-tree complex wavelet transform (DTCWT) to the data cube and then perform wavelet-based thresholding that exploits three scales of dependency among the wavelet coefficients. Instead of using a global noise level, we estimate the noise levels locally, which improves the denoising results substantially. We then apply the inverse DTCWT to obtain the noise-reduced data cube. Experiments demonstrate that the proposed method significantly outperforms block matching and 3D filtering, video block matching and 3D filtering, 2D bivariate shrinkage, and 3D bivariate shrinkage for noise reduction of 3D data cubes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
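A simplified stand-in for the method: wavelet shrinkage on a 3D cube using PyWavelets, with a plain orthogonal 3D DWT and a single global threshold in place of the paper's dual-tree complex transform, local noise estimates, and three-scale dependency model.

    import numpy as np
    import pywt

    rng = np.random.default_rng(8)
    clean = np.zeros((32, 32, 32)); clean[8:24, 8:24, 8:24] = 1.0
    noisy = clean + 0.2 * rng.normal(size=clean.shape)

    coeffs = pywt.wavedecn(noisy, "db2", level=2)     # 3-D multilevel DWT
    sigma = 0.2                                       # noise level (assumed known)
    thr = sigma * np.sqrt(2 * np.log(noisy.size))     # universal threshold
    for level in coeffs[1:]:                          # detail coefficients only
        for key in level:
            level[key] = pywt.threshold(level[key], thr, mode="soft")
    denoised = pywt.waverecn(coeffs, "db2")

    print("noisy RMSE:   ", np.sqrt(np.mean((noisy - clean) ** 2)).round(4))
    print("denoised RMSE:", np.sqrt(np.mean((denoised - clean) ** 2)).round(4))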
49. Immediate traceability of ash based on laser-induced breakdown spectroscopy and machine learning.
- Author
-
Cai, Yuyao, Wan, Enlai, Cai, Jinzhu, Zhang, Xinyang, Ye, Yanpeng, Yin, Yijun, and Liu, Yuzhu
- Subjects
LASER-induced breakdown spectroscopy ,MACHINE learning ,FISHER discriminant analysis ,RANDOM forest algorithms ,ENVIRONMENTAL monitoring ,DATA reduction - Abstract
This article reports on an advanced physics laboratory experiment designed for undergraduate education and for scholars who need specialized training in using and interpreting laser-induced breakdown spectroscopy (LIBS). The technical principle, experimental operation, and sample preparation of LIBS are introduced in detail, with emphasis on a presentation and discussion of its use in tracing the provenance of four common samples. Combining LIBS with machine learning, two distinct datasets are constructed through the extraction of spectral features. Dimensionality reduction of the spectral data is performed using linear discriminant analysis, while a random forest model is employed for provenance classification. Finally, the interpretability of the random forest model is leveraged to explore the contributions of different spectral elements to provenance tracing. Results demonstrate the system's effectiveness not only in accurately identifying ash types but also in elucidating the influential chemical components, offering significant implications for material analysis and environmental monitoring. From an educational standpoint, this paper allows readers, in particular undergraduate and graduate students, to gain a better understanding of the theory and practice of laser-induced breakdown spectroscopy and machine learning. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
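The entry's analysis chain, reproduced on stand-in data with scikit-learn: reduce spectral features with linear discriminant analysis, classify provenance with a random forest, and inspect the importances. The synthetic "spectra" and their dimensions are assumptions.

    from sklearn.datasets import make_classification
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Stand-in for LIBS spectra: 4 ash provenances, 300 spectral features.
    X, y = make_classification(n_samples=400, n_features=300, n_informative=20,
                               n_classes=4, n_clusters_per_class=1,
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    lda = LinearDiscriminantAnalysis(n_components=3).fit(X_tr, y_tr)  # <= C-1
    rf = RandomForestClassifier(random_state=0).fit(lda.transform(X_tr), y_tr)

    print("provenance accuracy:", rf.score(lda.transform(X_te), y_te).round(3))
    print("importance per LDA component:", rf.feature_importances_.round(3))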
50. Distributed data filtering and modeling for fog and networked manufacturing.
- Author
-
Li, Yifu, Wang, Lening, Chen, Xiaoyu, and Jin, Ran
- Subjects
DATA modeling ,DATA reduction ,DATA transmission systems ,MANUFACTURING processes ,MACHINE learning ,MULTICASTING (Computer networks) - Abstract
Fog Manufacturing applies Fog and Cloud Computing collaboratively in Smart Manufacturing to create an interconnected network of sensing, actuation, and computation nodes. It has become a promising research component to be integrated into the existing Smart Manufacturing paradigm, providing reliable and responsive computation services. However, the relatively limited communication bandwidth and computation capabilities of Fog nodes call for a reduced data communication load and lower computation latency for modeling. There has long been a lack of an integrated framework to automatically reduce manufacturing data and perform computationally efficient modeling and machine learning, a research direction that is increasingly important as both computational demands and Fog/networked Manufacturing become prevalent. This paper proposes an integrated and distributed framework for data reduction and modeling of multiple systems in a Smart Manufacturing network that takes system similarities into account. A simulation study and a Fog Manufacturing testbed for ingot growth manufacturing validated that the proposed framework significantly reduces the sample size, improving computational runtime metrics while outperforming various other data reduction methods in modeling performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
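A rough sketch of the reduce-then-model idea in a networked setting: each node transmits only a small subsample of its process data, and similar nodes pool their reduced samples into one model. This is entirely illustrative; the paper's similarity-aware framework is more sophisticated, and the data, reduction ratio, and model class here are assumptions.

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(9)
    true_w = rng.normal(size=5)                       # shared process behaviour

    def node_data(n):                                 # one machine's sensor log
        X = rng.normal(size=(n, 5))
        return X, X @ true_w + 0.1 * rng.normal(size=n)

    nodes = [node_data(5000) for _ in range(4)]       # 4 similar machines

    # Data reduction: each node transmits only 2% of its rows.
    reduced = []
    for X, y in nodes:
        idx = rng.choice(len(X), size=len(X) // 50, replace=False)
        reduced.append((X[idx], y[idx]))

    # Pooled model over the reduced samples from all similar nodes.
    Xp = np.vstack([X for X, _ in reduced])
    yp = np.hstack([y for _, y in reduced])
    model = Ridge().fit(Xp, yp)
    print("samples used:", len(Xp), "of", 4 * 5000)
    print("coefficient error:", np.linalg.norm(model.coef_ - true_w).round(4))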