176 results
Search Results
2. Factors Forming Community Resilience Affected By Floods.
- Author
-
Suhaeb, Firdaus W., Rasyid, Sri Jayanti, Wahda, Muhammad Aksha, Ramli, Mauliadi, and Kaseng, Ernawati S.
- Subjects
HAZARD mitigation ,FLOODS ,RIVER conservation ,DATA reduction ,CONFERENCE papers ,JOB stress - Abstract
This conference paper uses descriptive-qualitative research to describe and analyze the factors that form community resilience to floods. Informants were selected purposively from flood-affected communities, using criteria aligned with the purpose of the study. Primary data were obtained from in-depth observations and interviews, while secondary data were obtained from library sources and other relevant material. Data were collected through observation, interviews, and documentation, and analyzed descriptively and qualitatively in several stages, namely data reduction, data presentation, and conclusion drawing. The results show that the factors forming community resilience to flood disasters are: (1) value factors long established in the flood-prone community, namely mutual assistance; (2) economic factors, namely alternative jobs or coping strategies found by flood-affected communities; (3) social factors, namely the knowledge and skills to adapt to flood disasters, gained through non-formal training and counselling on disasters and disaster mitigation, from experience, or through social media and mass media such as television and radio; (4) institutional factors, namely socialization of early flood warnings and of flood disaster mitigation, appeals against dumping garbage in rivers, and essential food assistance from the relevant government before and during floods; and (5) infrastructure factors, including the construction of facilities and infrastructure such as river dredging, drainage, and river-bank protection walls. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. A low‐cost method for testing and analyzing the cervical range of motion.
- Author
-
Zhang, Xun, Xu, Guanghua, Li, Zejin, Teng, Zhicheng, Zhang, Xin, and Zhang, Sicong
- Subjects
RANGE of motion of joints ,ROTATIONAL motion ,TEST methods ,SPONDYLOSIS ,DECOMPOSITION method ,DATA reduction - Abstract
Measurement of the cervical range of motion (CROM) is significant for the early diagnosis of cervical spondylosis and for determining the severity of the disease. To address the problem of convenient, continuous measurement of CROM while reducing the impact of data fluctuations, this paper proposes a low-cost method for testing and analyzing CROM. The paper analyzes the correspondence between the smartphone orientation-sensor angle and CROM, acquires smartphone orientation-sensing data, applies the extreme-point symmetric mode decomposition method, using the energy of the difference value, to extract an adaptive global mean curve of the CROM data, and then calculates the cervical ROM. A statistical analysis of the CROM test results is carried out. Results show that the proposed method can obtain the CROM, including flexion and extension, lateral flexion, and rotation, in a single measurement, with the advantages of continuous measurement and low cost. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. Energy function-based and norm-free event-triggering for scheduling control data transmissions.
- Author
-
Kurtoglu, Deniz, Yucelen, Tansel, and Muse, Jonathan A.
- Subjects
DATA transmission systems ,STATE feedback (Feedback control systems) ,ENERGY function ,SCHEDULING ,DATA reduction - Abstract
This paper studies the problem of scheduling control data transmissions from embedded processors to physical systems. For this problem, we propose novel state feedback and output feedback control architectures that are predicated on energy function-based and norm-free event-triggering conditions, where the embedded processor broadcasts a sample of its control signal through a zero-order-hold operator to the physical system when the left side of the event-triggering condition equals its right side. In this context, the energy function-based feature means that the right sides of the proposed event-triggering conditions involve an energy function as well as its time-derivative, making the selection of these right sides user-adjustable. Furthermore, the norm-free feature means that the left sides of the proposed event-triggering conditions do not depend on signal norms, allowing for greater reduction of control data transmissions. System-theoretical analyses of our event-triggered state feedback and output feedback control architectures are carried out using the same energy functions and time-derivatives that appear in the proposed event-triggering conditions, and illustrative numerical examples are presented to demonstrate the efficacy of our contributions. To the best of our knowledge, the results given in this paper are the first to link event-triggering conditions not only to energy functions but also to their time-derivatives. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
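The core idea this abstract describes, transmitting a new control sample only when a triggering condition fires, can be illustrated with a deliberately simplified sketch. The scalar plant, gain, static threshold, and function name below are my own choices for illustration; the paper's energy-function-based, norm-free conditions are not reproduced here.

```python
def simulate(threshold, dt=0.01, steps=1000, k=2.0):
    """Scalar plant x' = -x + u with zero-order-hold control u = -k * x_held.
    A new sample is transmitted only when |x - x_held| >= threshold
    (a simple static rule, standing in for the paper's conditions)."""
    x, x_held, sent = 1.0, 1.0, 1          # sent counts transmissions
    for _ in range(steps):
        if abs(x - x_held) >= threshold:   # event-triggering condition
            x_held = x
            sent += 1
        u = -k * x_held
        x += dt * (-x + u)                 # forward-Euler integration
    return x, sent

x_event, sent_event = simulate(threshold=0.05)
x_always, sent_always = simulate(threshold=0.0)  # transmit every step
assert sent_event < sent_always                  # far fewer transmissions
assert abs(x_event) < 0.2                        # the state still converges
```

Even this toy version shows the trade-off the abstract is about: the event-triggered loop sends a small fraction of the samples the periodic loop sends, while keeping the state near the origin.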
5. Feature reduction of unbalanced data classification based on density clustering.
- Author
-
Wang, Zhen-Fei, Yuan, Pei-Yao, Cao, Zhong-Ya, and Zhang, Li-Ying
- Subjects
FEATURE selection ,DATA reduction ,CLASSIFICATION algorithms ,BIG data - Abstract
With the development of big data, the problem of imbalanced data sets is becoming more and more serious. When dealing with high-dimensional imbalanced datasets, traditional classification algorithms usually tend to favor the majority class and ignore the minority class, which results in poor classification performance. In this paper, we study the classification of high-dimensional imbalanced datasets and propose a feature selection algorithm based on density clustering and importance measure (DBIM). DBIM first constructs multiple balanced subsets by randomly under-sampling the majority class down to the size of the minority class, and uses DBSCAN as the base classifier. This process quickly discovers density-based feature distributions and generates the initial feature subspace. To select features with strong discriminative power for the class labels, we rank and filter the generated initial feature subspace according to feature importance. To avoid redundancy between features and generate high-quality feature subsets, we further design a new class distribution-based weight index, combined with a redundancy evaluation index in the DBIM algorithm, to measure the redundancy between features. Experimental results on eight publicly available datasets show that the proposed DBIM algorithm can generate feature subsets with high relevance and low redundancy, effectively reducing the dimensionality of high-dimensional imbalanced datasets and improving classification performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
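The first step the abstract describes, building balanced subsets by randomly under-sampling the majority class to the minority-class size, can be sketched in plain Python. The function name and toy data are mine; the DBSCAN-based feature scoring and redundancy indices of DBIM are not reproduced here.

```python
import random

def balanced_subsets(X, y, n_subsets=3, minority_label=1, seed=0):
    """Build n_subsets balanced subsets by randomly under-sampling the
    majority class to the same size as the minority class."""
    rng = random.Random(seed)
    minority = [i for i, lbl in enumerate(y) if lbl == minority_label]
    majority = [i for i, lbl in enumerate(y) if lbl != minority_label]
    subsets = []
    for _ in range(n_subsets):
        sampled = rng.sample(majority, len(minority))  # match minority size
        idx = minority + sampled
        subsets.append(([X[i] for i in idx], [y[i] for i in idx]))
    return subsets

# Toy imbalanced data: 8 majority instances (label 0), 2 minority (label 1).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
for Xs, ys in balanced_subsets(X, y):
    assert ys.count(0) == ys.count(1) == 2  # every subset is balanced
```

Each balanced subset would then be handed to the density-based clustering stage in the full algorithm.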
6. Online evolution of a phased array for ultrasonic imaging by a novel adaptive data acquisition method.
- Author
-
Lukacs, Peter, Stratoudaki, Theodosia, Davis, Geo, and Gachagan, Anthony
- Subjects
PHASED array antennas ,ULTRASONIC arrays ,ACQUISITION of data ,ULTRASONIC imaging ,DATA reduction ,TIME measurements - Abstract
Ultrasonic imaging, using ultrasonic phased arrays, has an enormous impact in science, medicine and society and is a widely used modality in many application fields. The maximum amount of information which can be captured by an array is provided by the data acquisition method capturing the complete data set of signals from all possible combinations of ultrasonic generation and detection elements of a dense array. However, capturing this complete data set requires a long data acquisition time and a large number of array elements and transmit channels, and produces a large volume of data. For these reasons, such data acquisition is unfeasible with existing phased-array technology, or inapplicable to cases requiring fast measurement times. This paper introduces the concept of an adaptive data acquisition process, the Selective Matrix Capture (SMC), which can adapt, dynamically, to specific imaging requirements for efficient ultrasonic imaging. SMC is realised experimentally using Laser Induced Phased Arrays (LIPAs), which use lasers to generate and detect ultrasound. The flexibility and reconfigurability of LIPAs enable the array configuration to evolve on the fly. The SMC methodology consists of two stages: a stage for detecting and localising regions of interest, by iteratively synthesising a sparse array, and a second stage for optimising the array for the region of interest. Delay-and-sum is used as the imaging algorithm, and the experimental results are compared to images produced using the complete generation-detection data set. It is shown that SMC, without a priori knowledge of the test sample, is able to achieve comparable results, while performing ∼10 times faster data acquisition and achieving a ∼10 times reduction in data size. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. A bidirectional reversible and multilevel location privacy protection method based on attribute encryption.
- Author
-
Hu, Zhaowei, Hu, Kaiyi, and Hasan, Milu Md Khaled
- Subjects
LOCATION data ,QUALITY of service ,DATA reduction ,PRIVATE security services ,TRUST ,ACCESS control - Abstract
Various methods such as k-anonymity and differential privacy have been proposed to safeguard users' private information in the publication of location service data. However, these typically employ a rigid "all-or-nothing" privacy standard that fails to accommodate users' more nuanced, multi-level privacy needs. Data is irrecoverable once anonymized, leading to a permanent reduction in location data quality, which in turn significantly diminishes data utility. In this paper, a novel bidirectional, multi-layered location privacy protection method based on attribute encryption is proposed. This method offers layered, reversible, and fine-grained privacy safeguards. A hierarchical privacy protection scheme incorporates various layers of dummy information, using an access structure tree to encrypt identifiers for these dummies. Multi-level location privacy protection is achieved by adding varying amounts of dummy information at different hierarchical levels N. This allows for precise control over the de-anonymization process, where users may adjust the granularity of anonymized data based on their own trust levels for multi-level location privacy protection. The method includes an access policy which functions via an attribute encryption-based access control system, generating decryption keys for data identifiers according to user attributes and facilitating a reversible transformation between data anonymity and de-anonymity. The complexities associated with key generation, distribution, and management are thus markedly reduced. Experimental comparisons with existing methods demonstrate that the proposed method effectively balances service quality and location privacy, providing users with multi-level and reversible privacy protection services. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Low-Power Preprocessing System at MCU-Based Application Nodes for Reducing Data Transmission.
- Author
-
Kim, Donguk, Roh, Chanhwi, Baek, Donkyu, and Choi, Seong-gon
- Subjects
COMPUTER hardware description languages ,DATA transmission systems ,LOGIC design ,EDGE computing ,DATA reduction - Abstract
Edge computing enables prompt responses in IoT environments, such as the operation of autonomous vehicles and unmanned aerial vehicles. However, with the increase in sensor nodes, the computational burden on the computing node also increases. Specifically, data filtering and reduction at application nodes add to the energy burden for battery-operated devices. In this paper, we propose a preprocessing system at the application node that requires low power consumption for data transmission reduction. Based on our simulations, we identify the minimum data size needed to preserve the signal. We first design the preprocessing system using a hardware description language to evaluate its performance. Then, we implement the open-library-based MCU system, including the proposed preprocessing IP, to assess its operation and overhead. Our implementation of the preprocessing system reduces data transmission by 50% with acceptable information loss. Additionally, the area and power consumption after the logic synthesis of the preprocessing IP within the entire MCU system are evaluated at only 3.6% and 13.1%, respectively. By performing preprocessing using the MCU and proposed IP, nearly 74.4% power reduction is achieved compared to using the existing MCU core. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
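The paper's preprocessing IP is implemented in hardware, but the general principle of reducing transmissions at an application node can be illustrated with a generic software analogy: a send-on-delta filter that transmits a sample only when it moves meaningfully from the last transmitted value. The function name, threshold, and data below are mine, not the authors' design.

```python
def send_on_delta(samples, threshold):
    """Transmit a sample only when it differs from the last transmitted
    value by more than `threshold`; the receiver holds the last value
    in between (zero-order hold)."""
    sent = []
    last = None
    for t, v in enumerate(samples):
        if last is None or abs(v - last) > threshold:
            sent.append((t, v))
            last = v
    return sent

samples = [0.0, 0.1, 0.05, 1.0, 1.02, 1.9, 2.0, 2.01]
sent = send_on_delta(samples, threshold=0.5)
# Only 3 of 8 samples cross the 0.5 dead-band and are transmitted.
assert sent == [(0, 0.0), (3, 1.0), (5, 1.9)]
```

The trade-off is the same one the abstract quantifies: the transmitted volume shrinks in exchange for a bounded, application-acceptable information loss.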
9. Tensor eigenvectors for projection pursuit
- Author
-
Loperfido, Nicola
- Published
- 2024
- Full Text
- View/download PDF
10. A novel ensemble deep reinforcement learning model for short‐term load forecasting based on Q‐learning dynamic model selection.
- Author
-
He, Xin, Zhao, Wenlu, Zhang, Licheng, Zhang, Qiushi, and Li, Xinyu
- Subjects
REINFORCEMENT learning ,LOAD forecasting (Electric power systems) ,RECURRENT neural networks ,ARTIFICIAL intelligence ,DATA reduction - Abstract
Short‐term load forecasting is critical for power system planning and operations, and ensemble forecasting methods for electricity loads have been shown to be effective in obtaining accurate forecasts. However, the weights in ensemble prediction models are usually preset based on overall performance after training, which prevents the model from adapting to different scenarios and limits the improvement of prediction performance. To further improve the accuracy and validity of the ensemble prediction method, this paper proposes an ensemble deep reinforcement learning approach that uses Q‐learning dynamic weight assignment to account for local behaviours caused by changes in the external environment. First, variational mode decomposition is used to reduce the non‐stationarity of the original data by decomposing the load sequence. Then, the recurrent neural network (RNN), long short‐term memory (LSTM), and gated recurrent unit (GRU) are selected as the basic power load predictors. Finally, the predictions of the three sub‐predictors are ensembled using optimal weights generated by the Q‐learning algorithm, and the final results are obtained by combining their respective predictions. The results show that the forecasting capability of the proposed method outperforms all sub‐models and several baseline ensemble forecasting methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
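The dynamic-selection idea in this abstract, letting a Q-learning agent learn which sub-predictor to trust from observed forecast errors, can be sketched in a single-state (bandit-style) form. The toy predictors, reward design, and hyperparameters below are my own simplifications; the paper's VMD decomposition and RNN/LSTM/GRU sub-models are not reproduced.

```python
import random

def q_learning_selector(predictors, series, episodes=200, alpha=0.2, eps=0.1, seed=0):
    """Single-state Q-learning over forecasters: action = pick a predictor,
    reward = negative absolute one-step forecast error."""
    rng = random.Random(seed)
    q = [0.0] * len(predictors)
    for _ in range(episodes):
        t = rng.randrange(1, len(series))          # random forecast instant
        if rng.random() < eps:                     # epsilon-greedy exploration
            a = rng.randrange(len(q))
        else:
            a = max(range(len(q)), key=q.__getitem__)
        reward = -abs(predictors[a](series[:t]) - series[t])
        q[a] += alpha * (reward - q[a])            # incremental Q-update
    return q

# Three toy "sub-predictors" for a periodic load series.
naive = lambda h: h[-1]                            # persistence
mean3 = lambda h: sum(h[-3:]) / len(h[-3:])        # short moving average
zero  = lambda h: 0.0                              # deliberately poor
series = [10 + (i % 4) for i in range(50)]
q = q_learning_selector([naive, mean3, zero], series)
assert q.index(max(q)) != 2    # the poor predictor earns the lowest value
```

In the paper's full setting the learned values become dynamic combination weights rather than a hard selection, but the feedback loop, reward from realized error driving the weighting, is the same.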
11. ROME/REA: Three-year, Tri-color Timeseries Photometry of the Galactic Bulge.
- Author
-
Street, R. A., Bachelet, E., Tsapras, Y., Hundertmark, M. P. G., Bozza, V., Bramich, D. M., Cassan, A., Dominik, M., Figuera Jaimes, R., Horne, K., Mao, S., Saha, A., Wambsganss, J., and Zang, Weicheng
- Subjects
PHOTOMETRY ,GALACTIC bulges ,VERY large array telescopes ,DATA release ,IMAGE analysis ,DATA reduction - Abstract
The Robotic Observations of Microlensing Events/Reactive Event Assessment Survey was a Key Project at Las Cumbres Observatory (hereafter LCO) which continuously monitored 20 selected fields (3.76 sq. deg) in the Galactic Bulge throughout their seasonal visibility window over a three-year period, between 2017 March and 2020 March. Observations were made in three optical passbands (SDSS g′, r′, i′), and LCO's multi-site telescope network enabled the survey to achieve a typical cadence of ∼10 hr in i′ and ∼15 hr in g′ and r′. In addition, intervals of higher cadence (<1 hr) data were obtained during monitoring of key microlensing events within the fields. This paper describes the Difference Image Analysis data reduction pipeline developed to process these data, and the process for combining the photometry from LCO's three observing sites in the Southern Hemisphere. The full timeseries photometry for all ∼8 million stars, down to a limiting magnitude of i ∼ 18 mag, is provided in the data release accompanying this paper, and samples of the data are presented for exemplar microlensing events, illustrating how the tri-band data are used to derive constraints on the microlensing source star parameters, a necessary step in determining the physical properties of the lensing object. The timeseries data also enable a wealth of additional science, for example in characterizing long-timescale stellar variability, and a few examples of the data for known variables are presented. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Development of Basic Knowledge Construction Technique to Reduce The Volume of High-Dimensional Big Data.
- Author
-
Karya, Gede, Sitohang, Benhard, Akbar, Saiful, and Moertini, Veronica S.
- Subjects
INFORMATION & communication technologies ,EXTRACTION techniques ,DATA warehousing ,DATA reduction ,HIGH technology ,BIG data - Abstract
Big data has the characteristics of high volume, velocity, and variety (3V) and continues to grow exponentially with the world's use of information and communication technology. The main problem with big data is the data deluge: the technology and methods needed to store and process data keeping pace with an exponential growth rate are potentially limitless, giving rise to exponentially increasing technology requirements as well. The weakness of previous big-data analysis approaches (batch and online real-time processing) is that they require substantial technology (large storage, memory, and processing capacity). This paper proposes a new approach to big-data analysis that separates the construction of basic knowledge (BK) from the original data at a much smaller volume (volume reduction) and then analyzes the BK into final knowledge, thus requiring smaller and simpler analysis technology. The proposals include formulating the definition and representation of BK, developing methods for constructing BK from source data, and analyzing BK into final knowledge. We propose a BK construction method based on a knowledge-extraction technique that uses the BIRCH clustering algorithm for instance reduction and handles high-dimensional problems by parallelizing the dimension calculations used to compute distances between instances. We use the Adjusted Rand Index (ARI) to measure the similarity between the final knowledge of the baseline and that of the proposed methods. First, modifying the BIRCH baseline by parallelizing the calculations increased speed by 17% to 25%. Next, breaking the parallel BIRCH (PBIRCH) baseline into BK construction and BK analysis reduced volume by 96% or more and increased speed by 43.50%, with similar final-knowledge results (ARI = 1).
Based on these results, we conclude that the BK construction method and the analysis from BK into final knowledge for high-dimensional big data significantly reduce volume and speed up the analytical process without reducing the quality of the final knowledge. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
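The instance-reduction idea behind BIRCH, which the abstract builds on, is to replace raw instances with tiny cluster-feature triples (count, linear sum, squared sum) from which centroids and radii can be recovered. Below is a minimal single-pass sketch of that summarisation step; the function name, radius rule, and toy data are mine, and the real BIRCH CF-tree and the paper's parallelization are not reproduced.

```python
def cluster_features(points, radius):
    """Absorb each point into the nearest cluster feature (CF) whose
    centroid lies within `radius`, else start a new CF. Each CF is
    [n, linear_sums, squared_sums]; only these triples are kept downstream."""
    cfs = []
    for p in points:
        best = None
        for cf in cfs:
            centroid = [s / cf[0] for s in cf[1]]
            d = sum((a - b) ** 2 for a, b in zip(p, centroid)) ** 0.5
            if d <= radius and (best is None or d < best[0]):
                best = (d, cf)
        if best is None:
            cfs.append([1, list(p), [v * v for v in p]])   # new CF
        else:
            cf = best[1]                                   # absorb point
            cf[0] += 1
            cf[1] = [s + v for s, v in zip(cf[1], p)]
            cf[2] = [s + v * v for s, v in zip(cf[2], p)]
    return cfs

pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.05, 0.1)]
cfs = cluster_features(pts, radius=1.0)
assert len(cfs) == 2                          # two well-separated groups
assert sum(cf[0] for cf in cfs) == len(pts)   # every instance is summarised
```

Downstream analysis then operates on the handful of CF triples instead of the raw instances, which is the volume reduction the abstract reports.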
13. Acoustic logging array signal denoising using U-net and a case study in a TangGu oil field.
- Author
-
Fu, Xin, Gou, Yang, and Wei, Fuqiang
- Subjects
SIGNAL denoising ,CONVOLUTIONAL neural networks ,ARTIFICIAL neural networks ,NOISE control ,OIL fields ,ACOUSTIC emission testing ,DATA reduction - Abstract
This study developed a noise-reduction method for acoustic logging array signals using a deep neural network algorithm in the time-frequency domain. Initially, we derived analytical solutions for the received waveforms when the acoustic logging tool was positioned either at the centre or eccentrically within the borehole. To simulate the received waveforms across various formations, we developed a real-axis integration algorithm. Subsequently, we devised a noise-reduction algorithm workflow based on a convolutional neural network and configured the structure and parameters of the U-net using TensorFlow. To address the scarcity of open datasets, we established both signal and noise datasets. The signal dataset was generated using theoretical simulation encompassing various model parameters, while the noise dataset was collected during tool testing and downhole operations. The trained model demonstrated substantial noise-reduction capabilities during validation. To validate the effectiveness of the algorithm, we applied noise reduction to actual data collected during downhole operations in a TangGu oil field, yielding impressive results across different types of noisy data. Therefore, the U-net-based time-domain noise-reduction algorithm proposed in this paper holds the potential to significantly improve the quality of acoustic logging array signals. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Efficiently approaching vertical federated learning by combining data reduction and conditional computation techniques.
- Author
-
Folino, Francesco, Folino, Gianluigi, Pisani, Francesco Sergio, Pontieri, Luigi, and Sabatino, Pietro
- Subjects
FEDERATED learning ,DATA reduction ,DEEP learning ,INTERNET security - Abstract
In this paper, a framework based on a sparse Mixture of Experts (MoE) architecture is proposed for the federated learning and application of a distributed classification model in domains (like cybersecurity and healthcare) where different parties of the federation store different subsets of features for a number of data instances. The framework is designed to limit the risk of information leakage and computation/communication costs in both model training (through data sampling) and application (leveraging the conditional-computation abilities of sparse MoEs). Experiments on real data have shown the proposed approach to ensure a better balance between efficiency and model accuracy, compared to other VFL-based solutions. Notably, in a real-life cybersecurity case study focused on malware classification (the KronoDroid dataset), the proposed method surpasses competitors even though it utilizes only 50% and 75% of the training set, which is fully utilized by the other approaches in the competition. This method achieves reductions in the rate of false positives by 16.9% and 18.2%, respectively, and also delivers satisfactory results on the other evaluation metrics. These results showcase our framework's potential to significantly enhance cybersecurity threat detection and prevention in a collaborative yet secure manner. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. Multistage Compressor Design Based on Dimensional Zooming.
- Author
-
Li, Hefei, Zheng, Qun, and Jiang, Bin
- Subjects
COMPRESSOR performance ,COMPRESSORS ,DATA reduction - Abstract
This paper proposes an axial multistage-compressor aerodynamic design methodology based on dimensional zooming. Two-dimensional design parameters are acquired through the dimensionality reduction of three-dimensional data, avoiding dependence on empirical loss models. The zooming design method is studied on a self-designed five-stage compressor. The method accounts for three-dimensional end-wall viscosity losses and modifies the blade profiles and blade stacking, providing a path toward improving blade design based on computational fluid dynamics analysis. The aerodynamic performance and compressor characteristics of the prototype and of the zooming design are investigated and compared. The results show that the performance of the zooming-design compressor is improved over that of the prototype. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. The Research on Deep Learning-Driven Dimensionality Reduction and Strain Prediction Techniques Based on Flight Parameter Data.
- Author
-
Huang, Wenbo, Wang, Rui, Zhang, Mengchuang, and Yin, Zhiping
- Subjects
DEEP learning ,STRUCTURAL health monitoring ,DATA reduction ,FORECASTING - Abstract
Loads and strains in critical areas play a crucial role in aircraft structural health monitoring, the tracking of individual aircraft lifespans, and the compilation of load spectra. Direct measurement of actual flight loads presents challenges. This process typically involves using load-strain stiffness matrices, derived from ground calibration tests, to map measured flight parameters to loads at critical locations. Deep learning neural network methods are now developing rapidly, offering new perspectives for this task. This paper explores the potential of deep learning models in predicting loads and strains from flight parameters, integrating flight-parameter preprocessing techniques, flight maneuver recognition (FMR), virtual ground calibration tests for wings, dimensionality reduction of flight data through Autoencoder (AE) network models, and Long Short-Term Memory (LSTM) network models for strain prediction. These efforts contribute to the prediction of strains in critical areas based on flight parameters, thereby enabling real-time assessment of aircraft damage. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. Prospective Elementary School Teachers Environmental Literacy: What, Why, and How?
- Author
-
Suratmi, Suratmi, Supriatna, Nana, Sopandi, Wahyu, and Wulan, Ana Ratna
- Subjects
ELEMENTARY school teachers ,ENVIRONMENTAL literacy ,DATA reduction ,HEALTH literacy ,REFERENCE sources - Abstract
Environmental literacy is one of the main subjects and themes in developing 21st-century skills, and it can serve as a means of overcoming environmental problems. This paper provides an overview of the role of universities in producing prospective elementary school teachers who have good environmental literacy and are expected to implement it in their teaching. To this end, the paper addresses three questions: 1) What are the components for measuring the environmental literacy of prospective elementary school teachers? 2) Why is it necessary to develop environmental literacy in prospective elementary school teachers? 3) How can environmental literacy be trained and improved? The study uses a descriptive method with a qualitative approach. Data were collected from literature sources such as international articles, national articles, and relevant books. The data analysis comprised four stages, namely (1) data collection, (2) data presentation, (3) data reduction and data inventory, and (4) drawing conclusions. The results take the form of information on the components for measuring the environmental literacy of prospective elementary school teachers, the reasons why such literacy needs to be developed, and alternative learning models that can be used to instill it. These results can serve as reference material for training and developing the environmental literacy of students, especially in universities. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Analysis of the adhesively bonded composite double cantilever beam specimen with emphasis on bondline constraint, adherend through-thickness flexibility and fracture process zone relative size.
- Author
-
de Morais, A. B.
- Subjects
ADHESIVE joints ,COMPOSITE construction ,DOUBLE bonds ,CANTILEVERS ,LAMINATED composite beams ,FINITE element method ,MATERIAL plasticity ,DATA reduction - Abstract
The double cantilever beam (DCB) specimen is widely used to characterise the mode I fracture of adhesive joints. This paper analyses some particular characteristics of adhesively bonded composite DCB specimens which could affect test results. Three-dimensional (3D) and two-dimensional (2D) finite element analyses (FEA) were conducted in order to evaluate the effects of bondline constraint and adherend through-thickness flexibility on the specimen response. Since beam theory-based data reduction schemes are widespread, beam models were also employed to analyse the effects of adherend through-thickness flexibility and fracture process zone relative size. It is shown that, although composite adherends are usually thinner and have much lower transverse moduli than metal adherends, the level of bondline constraint is similarly high. This may limit the extent of adhesive plastic deformation in the fracture process zone and generate high bondline tractions that increase the likelihood of interface failure and of interlaminar damage in the composite adherends. The present analyses also show relevant effects of adherend through-thickness flexibility in the adhesive elastic loading stage. Finally, smaller fracture process zones relative to metal-adherend DCB specimens were predicted by a beam cohesive zone model. This may explain the lower fracture energy values reported with composite adherends in some studies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
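For context, the "beam theory-based data reduction schemes" the abstract refers to rest, in their simplest textbook form, on the compliance of the two cantilever arms; this is the standard simple-beam-theory result, not a formula taken from this particular paper:

```latex
% Compliance of a DCB specimen of crack length a, arm bending stiffness EI:
C = \frac{\delta}{P} = \frac{2a^{3}}{3EI}
% Irwin-Kies relation gives the mode I energy release rate
% (B = specimen width):
G_{I} = \frac{P^{2}}{2B}\frac{\mathrm{d}C}{\mathrm{d}a}
      = \frac{P^{2}a^{2}}{B\,EI}
      = \frac{3P\delta}{2Ba}
```

Corrected beam theory variants replace $a$ with an effective length $a + |\Delta|$ to account for root rotation; the effects of through-thickness flexibility and process-zone size that the paper analyses are corrections on top of this baseline.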
19. Analyzing Data Reduction Techniques: An Experimental Perspective.
- Author
-
Fernandes, Vítor, Carvalho, Gonçalo, Pereira, Vasco, and Bernardino, Jorge
- Subjects
DATA reduction ,DIGITAL technology ,ENERGY consumption ,DATA compression ,BIG data - Abstract
The exponential growth in data generation has become a ubiquitous phenomenon in today's rapidly evolving digital landscape. Technological advances and the number of connected devices are the main drivers of this expansion. However, the exponential growth of data presents challenges across different architectures, particularly in terms of inefficient energy consumption, suboptimal bandwidth utilization, and the rapid increase in data stored in cloud environments. Data reduction techniques are therefore crucial to reduce the amount of data transferred and stored. This paper provides a comprehensive review of various data reduction techniques and introduces a taxonomy that classifies these methods by the type of data loss. The experiments conducted in this study cover distinct data types, assessing the performance and applicability of these techniques across different datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
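The taxonomy axis the abstract mentions, classifying reduction methods by type of data loss, can be made concrete with a tiny stdlib comparison: lossless compression round-trips exactly, while lossy reduction (here, plain decimation) discards samples for good. The signal and the every-4th-sample rule are my own illustrative choices, not from the paper.

```python
import zlib

# A smooth, repetitive sensor-like signal serialized as text.
signal = [round(100 + 10 * ((i % 50) / 50.0), 2) for i in range(5000)]
raw = ",".join(str(v) for v in signal).encode()

# Lossless: the original byte stream is exactly recoverable.
lossless = zlib.compress(raw, level=9)
assert zlib.decompress(lossless) == raw

# Lossy: keep every 4th sample; the dropped samples cannot be recovered.
lossy = ",".join(str(v) for v in signal[::4]).encode()

assert len(lossless) < len(raw) and len(lossy) < len(raw)
```

How much each approach saves depends heavily on the data: the repetitive signal above compresses losslessly very well, whereas noisy data would favour the lossy path, which is exactly the kind of dataset-dependent behaviour the paper's experiments measure.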
20. Data reduction for SVM training using density-based border identification.
- Author
-
Shalaby, Mohammed, Farouk, Mohamed, and Khater, Hatem A.
- Subjects
DATA reduction ,SUPPORT vector machines ,QUADRATIC programming ,DATA extraction ,STATISTICAL decision making - Abstract
Numerous classification and regression problems have extensively used Support Vector Machines (SVMs). However, the SVM approach is less practical for large datasets because of its processing cost, primarily the need to optimize a quadratic programming problem to determine the decision boundary during training. As a result, methods have been developed to shrink the training data by selecting the instances most likely to be chosen as support vectors by the SVM algorithm. This paper presents a density-based method, called Density-based Border Identification (DBI), along with four variations of the method, for reducing the SVM training data by extracting a layer of border instances. For higher-dimensional datasets, the extraction is performed on lower-dimensional embeddings obtained by Uniform Manifold Approximation and Projection (UMAP), and the resulting subset can be used repeatedly for SVM training in higher dimensions. Experimental findings on different datasets, such as Banana, USPS, and Adult9a, have shown that the best-performing variations of the proposed method effectively reduced the size of the training data and achieved acceptable training and prediction speedups while maintaining adequate classification accuracy compared to training on the original dataset. These results, as well as comparisons to a selection of related state-of-the-art methods from the literature, such as Border Point extraction based on Locality-Sensitive Hashing (BPLSH), Clustering-Based Convex Hull (CBCH), and Shell Extraction (SE), suggest that the proposed methods are effective and potentially useful. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
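The core idea of border-instance selection can be sketched in a few lines. This is not the paper's DBI algorithm (its density estimate, border layering, and UMAP step are specific to the paper); it is a generic density-based sketch in which, within each class, the points with the lowest local k-NN density are kept as approximate border instances:

```python
import numpy as np

def knn_density_scores(X, k=5):
    # Pairwise Euclidean distances (suitable for small datasets only).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    # Mean distance to the k nearest neighbours: large => low local density.
    return np.sort(d, axis=1)[:, :k].mean(axis=1)

def select_border(X, y, keep_frac=0.4, k=5):
    keep = []
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        scores = knn_density_scores(X[idx], k=min(k, len(idx) - 1))
        n_keep = max(1, int(round(keep_frac * len(idx))))
        # Lowest-density points within each class approximate its border.
        keep.extend(idx[np.argsort(scores)[::-1][:n_keep]])
    return np.sort(np.array(keep))
```

An SVM trained on `X[select_border(X, y)]` then sees only the retained fraction of instances, which is where the training speedup in methods of this family comes from.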
21. SLM based Circular (6, 2) mapping scheme with improved SER performance for PAPR reduction in OCDM without side information.
- Author
-
Singh, Mohit Kumar and Goel, Ashish
- Subjects
RAYLEIGH fading channels ,ADDITIVE white Gaussian noise channels ,RADIO transmitter fading ,DATA reduction ,WIRELESS channels - Abstract
Like OFDM, the OCDM signal also suffers from a high peak-to-average power ratio (PAPR). Selected mapping (SLM) is an attractive PAPR reduction method, but it requires transmitting the information regarding the phase sequence, i.e., side information (SI), to the receiver, which reduces spectral efficiency (SE) and data rate. Various SLM-based SI-free and SI-embedding schemes are available in the literature. In this paper, different possible quaternary to 8-QAM mapping schemes for SI-free SLM-based PAPR reduction in OCDM systems are discussed. Moreover, a new mapping scheme that eliminates the SI requirement is presented for SLM-based PAPR reduction in OCDM. The proposed Circular (6, 2) mapping scheme does not require SI at the receiving end, which increases SE and data rate compared to the standard SLM technique. A random sequence with phase factors {1, −1} is used to generate the phase sequences for the proposed mapping scheme. The analytical expression for the SER of the Circular (6, 2) mapping scheme over an AWGN channel is derived, and mathematical expressions for the SER of the other possible quaternary to 8-QAM mapping schemes are also evaluated. Computer simulations using MATLAB investigate the performance of all these mapping schemes in terms of PAPR reduction as well as SER over AWGN and multipath Rayleigh fading channels. The proposed scheme achieves the same PAPR reduction and improved SER performance with respect to the six other schemes considered. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Measurement of Diffusion Coefficients in Binary Mixtures and Solutions by the Taylor Dispersion Method.
- Author
-
Martin Trusler, J. P.
- Subjects
DIFFUSION measurements ,DIFFUSION coefficients ,DISPERSION (Chemistry) ,BINARY mixtures ,DATA reduction ,ACQUISITION of data - Abstract
The theory and application of the Taylor Dispersion technique for measuring diffusion coefficients in binary systems is reviewed. The theory discussed in this paper includes both the ideal Taylor–Aris model and the estimation of corrections required to account for small deviations from this ideal associated with a practical apparatus. Based on the theoretical treatment, recommendations are given for the design of practical instruments together with suggestions for calibration, data acquisition and reduction, and the rigorous estimation of uncertainties. The analysis indicates that relative uncertainties on the order of 1% are achievable in practice. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
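For orientation, the working equations usually quoted for the ideal Taylor–Aris model reviewed above can be summarized as follows, where a is the tube radius, ū the mean flow velocity, t_r the retention time, and σ_t² the temporal variance of the eluted peak (the small corrections for a practical apparatus discussed in the paper are omitted):

```latex
% Effective axial dispersion coefficient for laminar tube flow:
K = D_{12} + \frac{a^{2}\bar{u}^{2}}{48\,D_{12}}
% In the dispersion-dominated limit, the binary diffusion coefficient
% recovered from the recorded Gaussian peak:
D_{12} = \frac{a^{2}\,t_{r}}{24\,\sigma_{t}^{2}}
```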
23. A Filter-Based Improved Multi-Objective Equilibrium Optimizer for Single-Label and Multi-Label Feature Selection Problem.
- Author
-
Wang, Wendong, Li, Yu, Liu, Jingsen, and Zhou, Huan
- Subjects
FEATURE selection ,DATA reduction ,EQUILIBRIUM ,BASE pairs ,BIG data - Abstract
Effectively reducing the dimensionality of big data while retaining its key information has been a research challenge. As an important step in data pre-processing, feature selection plays a critical role in reducing data size and increasing the overall value of the data. Many previous studies have focused on single-label feature selection; however, with the increasing variety of data types, the need for feature selection on multi-label data has also arisen. Unlike single-label data, multi-label data, with its many more classification combinations, places higher demands on the capabilities of feature selection algorithms. In this paper, we propose a filter-based Multi-Objective Equilibrium Optimizer algorithm (MOEO-Smp) to solve the feature selection problem for both single-label and multi-label data. MOEO-Smp rates the optimization results of solutions and features based on four pairs of optimization principles, and builds three equilibrium pools to guide exploration and exploitation based on the total scores of solutions and features and on the ranking of objective fitness values, respectively. Seven UCI single-label datasets, two Mulan multi-label datasets, and one COVID-19 multi-label dataset are used to test the feature selection capability of MOEO-Smp, and the feature selection results are compared with 10 other state-of-the-art algorithms and evaluated using three and seven different metrics, respectively. Feature selection experiments and comparisons with results in the literature show that MOEO-Smp not only achieves the highest classification accuracy and excellent dimensionality reduction on single-label data, but also performs better on multi-label data in terms of Hamming loss, accuracy, dimensionality reduction, and so on. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. Compact Data Learning for Machine Learning Classifications.
- Author
-
Kim, Song-Kyoo
- Subjects
MACHINE learning ,ARTIFICIAL intelligence ,STATISTICAL accuracy ,BIG data ,CLASSIFICATION ,ARRHYTHMIA - Abstract
This paper targets the optimization of machine learning (ML) training data by constructing compact data. Methods for optimizing ML training have improved and become part of artificial intelligence (AI) system development. Compact data learning (CDL) is an alternative practical framework for optimizing a classification system by reducing the size of the training dataset. CDL originated from compact data design, which provides the best assets without handling complex big data. CDL is a dedicated framework for improving the speed of the machine learning training phase without affecting the accuracy of the system. An ML-based arrhythmia detection system and its variants maintained the same statistical accuracy when trained with CDL. The benefit of CDL was maximized with an 85% reduction of the input dataset, indicating that a trained ML system can achieve the same statistical accuracy using only 15% of the original training data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
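The headline claim, that a classifier can retain its accuracy when trained on a small fraction of the data, is easy to illustrate. The sketch below uses plain stratified random subsampling and a nearest-centroid classifier as stand-ins; the paper's CDL framework selects its compact subset in a more principled way:

```python
import numpy as np

def stratified_subsample(X, y, frac=0.15, seed=0):
    # Keep the same fraction from each class -- a generic stand-in for
    # the paper's compact-data selection, which is more principled.
    rng = np.random.default_rng(seed)
    keep = []
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        keep.extend(rng.choice(idx, size=max(1, int(frac * len(idx))),
                               replace=False))
    return np.array(sorted(keep))

def nearest_centroid_fit_predict(Xtr, ytr, Xte):
    # Train a minimal classifier on the reduced set and label the test set.
    classes = np.unique(ytr)
    cents = np.array([Xtr[ytr == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(Xte[:, None, :] - cents[None, :, :], axis=-1)
    return classes[d.argmin(axis=1)]
```

On well-separated data, training on the 15% subset gives essentially the same test accuracy as training on the full set, which is the effect the abstract reports.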
25. Single-Frame-Based Data Compression for CAN Security.
- Author
-
Jin, Shi-Yi, Seo, Dong-Hyun, Kim, Yeon-Jin, Kim, Yong-Eun, Woo, Samuel, and Chung, Jin-Gyun
- Subjects
DATA compression ,MESSAGE authentication codes ,KIA automobiles ,LOSSLESS data compression - Abstract
To authenticate a controller area network (CAN) data frame, a message authentication code (MAC) must be sent along with the CAN frame, but there is no space reserved for the MAC in the CAN frame. Recently, difference-based compression (DBC) algorithms have been used to create a space inside the frame. DBC has the advantage of being very efficient, but its drawback is that, if an error occurs in one frame, the effects of that error propagate to subsequent frames. In this paper, a CAN data compression algorithm is proposed that compresses the current frame without relying on previous frames. Therefore, an error generated in one frame cannot be propagated to subsequent frames. In addition, a CAN signal grouping technique is proposed based on entropy analysis. To efficiently authenticate CAN frames, the length of the compressed data must be 4 bytes or less (4BL). Simulation shows that the 4BL-compression ratio of a Kia Sorento vehicle is 99.36% in the DBC method, but 100% in the proposed method. In an LS Mtron tractor, the 4BL-compression ratio is 98.58% in the DBC method, but 100% in the proposed method. In addition, the execution time of the proposed compression algorithm is only 27.39% of that of the DBC algorithm. The results show that the proposed algorithm has better compression characteristics for CAN security than the DBC algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. DATA-DRIVEN CONSTRUCTION OF HIERARCHICAL MATRICES WITH NESTED BASES.
- Author
-
Cai, Difeng, Huang, Hua, Chow, Edmond, and Xi, Yuanzhe
- Subjects
KERNEL functions ,FAST multipole method ,MATRICES (Mathematics) ,DATA reduction ,COMPUTATIONAL complexity - Abstract
Hierarchical matrices provide a powerful representation for significantly reducing the computational complexity associated with dense kernel matrices. For example, the fast multipole method (FMM) and its variants are highly efficient when the kernel function is related to fundamental solutions of classical elliptic PDEs. For general kernel functions, interpolation-based methods are widely used for the efficient construction of hierarchical matrices. In this paper, we present a fast hierarchical data reduction (HiDR) procedure with O(n) complexity for the memory-efficient construction of hierarchical matrices with nested bases, where n is the number of data points. HiDR aims to reduce the given data in a hierarchical way so as to obtain O(1) representations for all nearfield and farfield interactions. Based on HiDR, a linear complexity H² matrix construction algorithm is proposed. The use of data-driven methods enables better efficiency than other general-purpose methods and flexible computation without accessing the kernel function. Experiments demonstrate significantly improved memory efficiency of the proposed data-driven method compared to interpolation-based methods over a wide range of kernels. For the Coulomb kernel, the proposed general-purpose algorithm offers competitive performance compared to FMM and its variants, such as PVFMM. The data-driven approach not only works for general kernels but also leads to much smaller precomputation costs compared to PVFMM. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. A Hierarchical Security Event Correlation Model for Real-Time Threat Detection and Response.
- Author
-
Maosa, Herbert, Ouazzane, Karim, and Ghanem, Mohamed Chahine
- Subjects
INTRUSION detection systems (Computer security) ,FIREWALLS (Computer security) ,DATA security failures ,CLUSTER analysis (Statistics) ,DATA reduction - Abstract
An intrusion detection system (IDS) performs post-compromise detection of security breaches whenever preventive measures such as firewalls fail to avert an attack. However, these systems raise a vast number of alerts that must be analyzed and triaged by security analysts, a process that is largely manual, tedious, and time-consuming. Alert correlation is a technique that reduces the number of intrusion alerts by aggregating alerts that are similar in some way. However, correlation is performed outside the IDS through third-party systems and tools, after the IDS has already generated a high volume of alerts, and these third-party systems add to the complexity of security operations. In this paper, we build on the highly researched area of alert and event correlation by developing a novel hierarchical event correlation model that promises to reduce the number of alerts issued by an intrusion detection system. This is achieved by correlating the events before the IDS classifies them. The proposed model takes the best features from similarity- and graph-based correlation techniques to deliver an ensemble capability not possible with either approach separately. Further, we propose a correlation process for events rather than alerts, as is the case in the current art. We also develop our own correlation and clustering algorithm, tailor-made for the correlation and clustering of network event data. The model is implemented as a proof of concept, with experiments run on standard intrusion detection datasets. The correlation achieves an 87% data reduction through aggregation, producing nearly 21,000 clusters in about 30 s. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
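The aggregation step behind the reported data reduction can be sketched simply. The paper's model uses a bespoke similarity/graph-based correlation algorithm; the sketch below uses the crudest possible rule instead, exact agreement on a few event attributes (the attribute names are illustrative, not from the paper), just to show how clustering drives the reduction ratio:

```python
from collections import defaultdict

def aggregate_events(events, keys=("src", "dst", "etype")):
    # Group events that agree on the chosen attributes into one cluster --
    # a simplified stand-in for similarity/graph-based correlation.
    clusters = defaultdict(list)
    for ev in events:
        clusters[tuple(ev[k] for k in keys)].append(ev)
    return clusters

def reduction_ratio(events, clusters):
    # Fraction of items an analyst no longer has to triage individually.
    return 1 - len(clusters) / len(events)
```

An analyst then reviews one representative per cluster instead of every raw event, which is where figures like "87% reduction" come from.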
28. Low-Cost Data-Driven Robot Collision Localization Using a Sparse Modular Point Matrix.
- Author
-
Lin, Haoyu, Quan, Pengkun, Liang, Zhuo, Wei, Dongbo, and Di, Shichun
- Subjects
CONVOLUTIONAL neural networks ,ELECTRIC charge ,SUPPORT vector machines ,MOBILE robots ,FEATURE extraction ,ROBOTS ,DATA reduction - Abstract
In the context of automatic charging for electric vehicles, collision localization for the end-effector of robots not only serves as a crucial visual complement but also provides essential foundations for subsequent response design. In this scenario, data-driven collision localization methods are considered an ideal choice. However, due to the typically high demands on the data scale associated with such methods, they may significantly increase the construction cost of models. To mitigate this issue to some extent, in this paper, we propose a novel approach for robot collision localization based on a sparse modular point matrix (SMPM) in the context of automatic charging for electric vehicles. This method, building upon the use of collision point matrix templates, strategically introduces sparsity to the sub-regions of the templates, aiming to reduce the scale of data collection. Additionally, we delve into the exploration of data-driven models adapted to SMPMs. We design a feature extractor that combines a convolutional neural network (CNN) with an echo state network (ESN) to perform adaptive feature extraction on collision vibration signals. Simultaneously, by incorporating a support vector machine (SVM) as a classifier, the model is capable of accurately estimating the specific region in which the collision occurs. The experimental results demonstrate that the proposed collision localization method maintains a collision localization accuracy of 91.27% and a collision localization RMSE of 1.46 mm, despite a 48.15% reduction in data scale. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Filling the Gaps: Using Synthetic Low-Altitude Aerial Images to Increase Operational Design Domain Coverage.
- Author
-
Rüter, Joachim, Maienschein, Theresa, Schirmer, Sebastian, Schopferer, Simon, and Torens, Christoph
- Subjects
OBJECT recognition (Computer vision) ,DATA reduction ,MACHINE learning ,DRONE aircraft ,DATA management ,VIDEO surveillance - Abstract
A key necessity for the safe and autonomous flight of Unmanned Aircraft Systems (UAS) is their reliable perception of the environment, for example, to assess the safety of a landing site. For visual perception, Machine Learning (ML) provides state-of-the-art results in terms of performance, but the path to aviation certification has yet to be determined as current regulation and standard documents are not applicable to ML-based components due to their data-defined properties. However, the European Union Aviation Safety Agency (EASA) published the first usable guidance documents that take ML-specific challenges, such as data management and learning assurance, into account. In this paper, an important concept in this context is addressed, namely the Operational Design Domain (ODD) that defines the limitations under which a given ML-based system is designed to operate and function correctly. We investigated whether synthetic data can be used to complement a real-world training dataset which does not cover the whole ODD of an ML-based system component for visual object detection. The use-case in focus is the detection of humans on the ground to assess the safety of landing sites. Synthetic data are generated using the methods proposed in the EASA documents, namely augmentations, stitching and simulation environments. These data are used to augment a real-world dataset to increase ODD coverage during the training of Faster R-CNN object detection models. Our results give insights into the generation techniques and usefulness of synthetic data in the context of increasing ODD coverage. They indicate that the different types of synthetic images vary in their suitability but that augmentations seem to be particularly promising when there is not enough real-world data to cover the whole ODD. By doing so, our results contribute towards the adoption of ML technology in aviation and the reduction of data requirements for ML perception systems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. Meeting Diverse Learning Needs: Exploring Effective Sociology Teacher Strategies in Differentiated Learning.
- Author
-
Tenri Awaru, A. Octamaya, Said Ahmad, M. Ridwan, Sadriani, Andi, and Fajri Maulana, Muh.
- Subjects
PROBLEM-based learning ,LEARNING strategies ,TEACHERS ,SOCIOLOGY ,GROUP formation ,LEARNING ,DATA reduction - Abstract
A major challenge for sociology teachers is to teach students complex sociology subject matter based on their learning needs. The purpose of this study is to explore the strategies used by sociology teachers in applying differentiated learning at the high school level in Makassar City. This is a qualitative study that uses teachers and students as research informants. Data were collected through interviews, observations, and documentation, and analyzed through the stages of data collection, data reduction, and data presentation. Results show that sociology teachers in Makassar City have implemented effective strategies for differentiated learning. These include the use of diverse learning resources other than textbooks, namely audiovisual materials in the form of videos and presentation slides; visual materials in the form of infographics, images, and maps; relevant articles from blogs and conference papers; and initial assessments to identify students' individual needs in the form of questionnaires, diagnostic tests, and pretests. Group formation is flexible and based on students' abilities, with group members rotated at each meeting, and various learning methods and models are applied, such as problem-based learning combined with cooperative learning models. The teachers also provide additional support by holding discussions with students, answering their questions, assigning differentiated tasks, and providing additional material. The findings of this research are expected to contribute to developing differentiated learning approaches, especially in sociology subjects. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. Dual-Graph-Regularization Constrained Nonnegative Matrix Factorization with Label Discrimination for Data Clustering.
- Author
-
Li, Jie, Li, Yaotang, and Li, Chaoqian
- Subjects
MATRIX decomposition ,NONNEGATIVE matrices ,DATA reduction ,MACHINE learning ,DATA visualization - Abstract
Nonnegative matrix factorization (NMF) is an effective technique for dimensionality reduction of high-dimensional data for tasks such as machine learning and data visualization. However, for practical clustering tasks, traditional NMF ignores the manifold information of both the data space and the feature space, as well as the discriminative information of the data. In this paper, we propose a semisupervised NMF called dual-graph-regularization-constrained nonnegative matrix factorization with label discrimination (DCNMFLD). DCNMFLD combines dual graph regularization and prior label information as additional constraints, making full use of the intrinsic geometric and discriminative structures of the data, and can efficiently enhance the discriminative and exclusionary nature of clustering and improve clustering performance. The evaluation of the clustering experimental results on four benchmark datasets demonstrates the effectiveness of our new algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
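As a baseline for what DCNMFLD extends, plain NMF with Lee–Seung multiplicative updates factors a nonnegative matrix V ≈ WH; the paper adds dual graph regularization and label constraints on top of updates of this kind. A minimal sketch of the baseline only, not the paper's algorithm:

```python
import numpy as np

def nmf(V, r, iters=500, seed=0, eps=1e-9):
    # Lee-Seung multiplicative updates for min ||V - WH||_F^2
    # with W, H >= 0; eps guards against division by zero.
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The updates preserve nonnegativity by construction, which is what makes the factors interpretable as additive parts in clustering applications.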
32. The problem of dimensionality reduction: Principal Component Analysis (PCA).
- Author
-
Pernice, Sergio A.
- Subjects
PRINCIPAL components analysis ,DATA reduction ,MACHINE learning - Abstract
Copyright of Revista Mutis is the property of Revista Mutis and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
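PCA itself reduces dimensionality by projecting centered data onto the leading right singular vectors of the data matrix, which maximize the retained variance. A minimal sketch (generic PCA, not tied to this entry's text):

```python
import numpy as np

def pca(X, k):
    # Principal components via SVD of the centered data matrix.
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T                 # projected data, shape (n, k)
    var_ratio = S[:k] ** 2 / (S ** 2).sum()  # variance explained per component
    return scores, Vt[:k], var_ratio
```

The rows of `Vt[:k]` are the orthonormal principal directions, and `var_ratio` quantifies how much structure survives the reduction.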
33. Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline Pre-Training with Model-Based Augmentation †.
- Author
-
Macaluso, Girolamo, Sestini, Alessandro, and Bagdanov, Andrew D.
- Subjects
REINFORCEMENT learning ,DATA augmentation ,STOCHASTIC convergence ,DATA reduction ,AUTOMATIC data collection systems - Abstract
Offline reinforcement learning leverages pre-collected datasets of transitions to train policies. It can serve as an effective initialization for online algorithms, enhancing sample efficiency and speeding up convergence. However, when such datasets are limited in size and quality, offline pre-training can produce sub-optimal policies and lead to a degraded online reinforcement learning performance. In this paper, we propose a model-based data augmentation strategy to maximize the benefits of offline reinforcement learning pre-training and reduce the scale of data needed to be effective. Our approach leverages a world model of the environment trained on the offline dataset to augment states during offline pre-training. We evaluate our approach on a variety of MuJoCo robotic tasks, and our results show that it can jumpstart online fine-tuning and substantially reduce—in some cases by an order of magnitude—the required number of environment interactions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. Geometric multidimensional scaling: efficient approach for data dimensionality reduction.
- Author
-
Dzemyda, Gintautas and Sabaliauskas, Martynas
- Subjects
MULTIDIMENSIONAL scaling ,DATA reduction ,GLOBAL optimization ,PARALLEL algorithms - Abstract
Multidimensional scaling (MDS) is an often-used method for reducing the dimensionality of multidimensional data nonlinearly and presenting the data visually. MDS minimizes a stress function whose variables are the coordinates of points in the projected lower-dimensional space. Recently, the so-called Geometric MDS has been developed, in which the stress function and multidimensional scaling in general are considered from a geometric point of view. Using the ideas of Geometric MDS, one can construct an iterative stress-minimization procedure in which the coordinates of an individual point of the projected space are moved to a new position defined analytically. In this paper, we discover and prove the main advantage of Geometric MDS theoretically: changing the positions of all points of the projected space simultaneously (independently of each other), in the directions and with the steps defined analytically by the Geometric MDS strategy for a separate point, decreases the MDS stress. Moreover, the analytical updating of the coordinates of projected points in each iteration has a simple geometric interpretation. New properties of Geometric MDS have been discovered. The obtained results allow for the future development of a class of new sequential and parallel algorithms. Ideas for global optimization of the stress are highlighted. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
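The per-point analytic update can be sketched as follows: each point moves to the average of the positions that would exactly satisfy the target distance to every other point, taken along the current directions. This is one reading of the Geometric MDS construction, not the authors' exact formulation, so treat it as illustrative:

```python
import numpy as np

def stress(X, D):
    # Raw MDS stress: sum of squared distance errors over all pairs.
    E = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    iu = np.triu_indices(len(X), 1)
    return ((D[iu] - E[iu]) ** 2).sum()

def gmds_sweep(X, D):
    # Sequentially move each point to the mean of the positions lying at
    # the target distance D[i, j] from every other point j, along the
    # current direction from x_j to x_i (illustrative update).
    X = X.copy()
    n = len(X)
    for i in range(n):
        mask = np.arange(n) != i
        vecs = X[i] - X[mask]
        unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
        X[i] = (X[mask] + D[i, mask, None] * unit).mean(axis=0)
    return X
```

Each target position has a direct geometric meaning (the point on the current ray at the desired distance), which is the "simple geometric interpretation" the abstract refers to.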
35. Boosting interclass boundary preservation (BIBP): a KD-tree enhanced data reduction algorithm
- Author
-
Fuangkhon, Piyabute
- Published
- 2024
- Full Text
- View/download PDF
36. Diagnosis of high‐speed railway ballastless track arching based on unsupervised learning framework.
- Author
-
Tang, Xueyang, Wang, Yi, Cai, Xiaopei, Yang, Fei, and Hou, Yue
- Subjects
- *
INFRASTRUCTURE (Economics) , *FEATURE extraction , *CONCEPT learning , *METHODS engineering , *DATA reduction - Abstract
Vehicle-mounted detection methods have been widely applied in the maintenance of high-speed railways (HSRs), providing feasibility for diagnosing ballastless track arching. However, applying detection data faces several key limitations: (1) the threshold mostly requires manual setting, making recognition accuracy highly subjective; (2) the extensive workload of manual inspections makes it challenging to label detection data, hindering the application of supervised learning approaches. To address these problems, this paper utilizes longitudinal level irregularity data obtained from vehicle-mounted detection, employing the concept of unsupervised learning for dimensionality reduction, combined with clustering algorithms and minimal label fine-tuning, to design two frameworks: the fully unsupervised framework (FUF) and the few-shot fine-tuned framework (FFF). Experiments on dynamic detection data from a Chinese HSR line were conducted, comparing the performance of data dimensionality reduction, clustering, and classification under different strategy combinations. The results show that the improved variational autoencoder significantly enhances the performance of the encoder in dimensionality reduction, facilitating better feature extraction; the FUF achieves effective clustering outcomes without any labeled samples, and its adjusted Rand index score exceeded 0.8, showcasing its robustness and applicability in scenarios with no prior annotations; the FFF requires only a small number of labeled samples (a labeling ratio of 5%) and achieves excellent performance, with metrics such as accuracy exceeding 0.85, thus greatly reducing the reliance on labeled data. This study offers a novel method for solving engineering issues with limited labeled data, providing an efficient solution for identifying track arching defects and advancing railway infrastructure monitoring. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. Hybrid optimization‐based topology construction and DRNN‐based prediction method for data reduction in IoT.
- Author
-
Pawar, Bhakti B. and Jadhav, Devyani S.
- Subjects
- *
RECURRENT neural networks , *WIRELESS sensor networks , *DATA reduction , *INTERNET of things , *PREDICTION models - Abstract
The Internet of Things (IoT) is a prevalent networking setup that plays a vital role in everyday activities due to the increased services provided through uniform data collection. In this paper, a hybrid optimization approach is used to construct the topology of a heterogeneous multi-hop IoT wireless sensor network (WSN), and data aggregation and reduction are performed using a deep learning model. Initially, the IoT network is simulated and the network topology is constructed using Namib Beetle Spotted Hyena Optimization (NBSHO) by considering different network parameters and encoding solutions. Moreover, data aggregation and reduction in the IoT network are performed using a Deep Recurrent Neural Network (DRNN)-based prediction model. In addition, the performance improvement of the designed NBSHO + DRNN approach is validated: it achieved a packet delivery ratio (PDR) of 0.469, energy of 0.367 J, a prediction error of 0.237, and a delay of 0.595 s. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. Supervised maximum variance unfolding.
- Author
-
Yang, Deliang and Qi, Hou-Duo
- Subjects
MULTIDIMENSIONAL scaling ,DATA structures ,EUCLIDEAN distance ,DATA reduction ,DATA visualization - Abstract
Maximum Variance Unfolding (MVU) is among the first methods in nonlinear dimensionality reduction for data visualization and classification. It aims to preserve local data structure while making the variance among the data as large as possible. However, MVU in general remains a computationally challenging problem, which may explain why it is less popular than other leading methods such as Isomap and t-SNE. In this paper, based on the key observation that the structure-preserving term in MVU is actually the squared stress in Multi-Dimensional Scaling (MDS), we replace this term with the stress function from MDS, resulting in a usable model. This usability guarantees that the "crowding phenomenon" will not happen in the dimension-reduced results. The new model also allows us to combine label information, and hence we call it the supervised MVU (SMVU). We then develop a fast algorithm based on Euclidean distance matrix optimization. By making use of the majorization-minimization technique, the algorithm at each iteration solves a number of one-dimensional optimization problems, each having a closed-form solution. This strategy significantly speeds up the computation. We demonstrate the advantage of SMVU on some standard datasets against a few leading algorithms, including Isomap and t-SNE. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. Demonstration of neutron time‐of‐flight diffraction with an event‐mode imaging detector.
- Author
-
Jäger, Tim T., Losko, Adrian S., Wolfertz, Alexander, Schmidt, Søren, Bertelsen, Mads, Khaplanov, Anton, Agnew, Sean R., Funama, Fumiaki, Morgano, Manuel, Roth, Markus, Gochanour, Jason R., Long, Alexander M., Lutterotti, Luca, and Vogel, Sven C.
- Subjects
- *
NEUTRON counters , *DATA acquisition systems , *IMAGE converters , *RIETVELD refinement , *DATA reduction , *SCINTILLATORS , *NEUTRON diffraction - Abstract
Neutron diffraction beamlines have traditionally relied on deploying large detector arrays of 3He tubes or neutron‐sensitive scintillators coupled with photomultipliers to efficiently probe crystallographic and microstructure information of a given material. Given the large upfront cost of custom‐made data acquisition systems and the recent scarcity of 3He, new diffraction beamlines or upgrades to existing ones demand innovative approaches. This paper introduces a novel Timepix3‐based event‐mode imaging neutron diffraction detector system as well as first results of a silicon powder diffraction measurement made at the HIPPO neutron powder diffractometer at the Los Alamos Neutron Science Center. Notably, these initial measurements were conducted simultaneously with the 3He array on HIPPO, enabling direct comparison. Data reduction for this type of data was implemented in the MAUD code, enabling Rietveld analysis. Results from the Timepix3‐based setup and HIPPO were benchmarked against McStas simulations, showing good agreement for peak resolution. With further development, systems such as the one presented here may substantially reduce the cost of detector systems for new neutron instrumentation as well as for upgrades of existing beamlines. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
40. A Globally Convergent Inertial First-Order Optimization Method for Multidimensional Scaling.
- Author
-
Ram, Noga and Sabach, Shoham
- Subjects
- *
MULTIDIMENSIONAL scaling , *NONSMOOTH optimization , *DATA reduction , *DATA visualization , *ALGORITHMS - Abstract
Multidimensional scaling (MDS) is a popular tool for dimensionality reduction and data visualization. Given distances between data points and a target low dimension, the MDS problem seeks to find a configuration of these points in the low-dimensional space, such that the inter-point distances are preserved as well as possible. We focus on the most common approach to formulating the MDS problem, known as stress minimization, which results in a challenging non-smooth and non-convex optimization problem. In this paper, we propose an inertial version of the well-known SMACOF algorithm, which we call AI-SMACOF. This algorithm is proven to be globally convergent, and to the best of our knowledge this is the first result of this kind for algorithms aiming at solving the stress MDS minimization. In addition to the theoretical findings, numerical experiments provide further evidence of the superiority of the proposed algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
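The SMACOF algorithm that AI-SMACOF extends is a stress-majorization loop built on the Guttman transform. The sketch below shows only this classical, non-inertial baseline; the paper's inertial step and global-convergence guarantees are not reproduced here:

```python
import numpy as np

def smacof(delta, dim=2, n_iter=300, seed=0):
    """Plain SMACOF stress majorization for MDS. delta: symmetric
    dissimilarity matrix (n x n). Returns the embedding and final stress."""
    rng = np.random.default_rng(seed)
    n = delta.shape[0]
    X = rng.standard_normal((n, dim))
    for _ in range(n_iter):
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(D > 0, delta / D, 0.0)
        B = -ratio
        B[np.arange(n), np.arange(n)] = ratio.sum(axis=1)
        X = B @ X / n                      # Guttman transform (unweighted case)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    stress = np.sum((delta - D) ** 2) / 2.0
    return X, stress
```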
41. A novel semi data dimension reduction type weighting scheme of the multi-model ensemble for accurate assessment of twenty-first century drought.
- Author
-
Mukhtar, Alina, Ali, Zulfiqar, Nazeer, Amna, Dhahbi, Sami, Kartal, Veysi, and Deebani, Wejdan
- Subjects
- *
CLIMATE change models , *DROUGHT management , *MARKOV processes , *TWENTY-first century , *DATA reduction - Abstract
Accurately and reliably predicting droughts across multiple Global Climate Models (GCMs) is a challenging task. To address this challenge, the Multimodel Ensemble (MME) method has become a valuable tool for merging multiple models and producing more accurate forecasts. This paper aims to enhance drought monitoring modules for the twenty-first century using multiple GCMs. To achieve this goal, the research first introduces a new weighting paradigm called the Multimodel Homo-min Pertinence-max Hybrid Weighted Average (MHmPmHWAR) for the accurate aggregation of multiple GCMs. Second, it proposes a new drought index called the Condensed Multimodal Multi-Scalar Standardized Drought Index (CMMSDI). To assess the effectiveness of MHmPmHWAR, the research compared its findings with the Simple Model Average (SMA). In the application, eighteen GCMs from the Coupled Model Intercomparison Project Phase 6 (CMIP6) were considered at thirty-two grid points of the Tibet Plateau region. Mann–Kendall (MK) test statistics and steady-state probabilities of the Markov chain were used to assess long-term trends in drought and its classes. The trend analysis indicated that, in terms of spatial coverage, the number of grid points showing an upward trend was significantly greater than the number showing a downward trend, at a significance level of 0.05. Under scenario SSP1-2.6, the probability of the moderate wet and normal drought classes was greater than that of other categories at nearly all temporal scales. The outcomes for SSP2-4.5 showed that the likelihoods of moderate drought and normal drought were higher than those of other classifications. The results for SSP5-8.5 were comparable to those for SSP2-4.5, underscoring the importance of taking effective action to alleviate future drought impacts. The results demonstrate the effectiveness of the MHmPmHWAR and CMMSDI approaches in predicting droughts under multiple GCMs, which can contribute to effective drought monitoring and management. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
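The general idea of a weighted multi-model ensemble can be illustrated with a simple skill-based scheme: weight each GCM by the inverse of its error against a reference record, then average. This is a generic stand-in for illustration, not the paper's MHmPmHWAR weighting:

```python
import numpy as np

def skill_weighted_ensemble(models, reference):
    """Weight each model series by inverse RMSE against a reference,
    normalize the weights, and return (weights, weighted-average series).
    models: array-like of shape (n_models, n_times)."""
    models = np.asarray(models, dtype=float)
    reference = np.asarray(reference, dtype=float)
    rmse = np.sqrt(np.mean((models - reference) ** 2, axis=1))
    w = 1.0 / (rmse + 1e-12)   # small epsilon guards a perfect-skill model
    w /= w.sum()
    return w, w @ models
```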
42. A method for predicting photovoltaic output power based on PCC-GRA-PCA meteorological elements dimensionality reduction method.
- Author
-
Yang, Lingsheng, Cui, Xiangyu, and Li, Wei
- Subjects
DIMENSION reduction (Statistics) ,PEARSON correlation (Statistics) ,PRINCIPAL components analysis ,PHOTOVOLTAIC power generation ,DATA reduction ,PHOTOVOLTAIC power systems - Abstract
Photovoltaic (PV) power generation forecasting models require a large amount of meteorological data, and as the volume of data grows, the dataset is likely to contain a significant amount of irrelevant and redundant information. This paper proposes a dimensionality-reduction method based on PCC-GRA-PCA, which aims to simplify the model and reduce computational complexity. First, the method analyzes the feature importance of the various meteorological elements using the Pearson Correlation Coefficient (PCC) and Grey Relation Analysis (GRA), achieving a preliminary dimension reduction by selecting the most relevant features. Next, the data are processed with Principal Component Analysis (PCA), achieving a secondary dimension reduction of the meteorological data through feature transformation. Finally, a photovoltaic power prediction model is established using the OVMD-tSSA-LSSVM algorithm. The analysis shows that the prediction model improves in R², MAE, RMSE, and MAPE after PCC-GRA-PCA dimensionality reduction, compared both to the model without dimensionality reduction and to models using LDA or PCA dimensionality reduction alone. This demonstrates the effectiveness of reducing data dimensionality. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
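The two-stage reduction the abstract describes, correlation-based feature selection followed by PCA, can be sketched as below. The GRA step is omitted for brevity, and the correlation threshold is illustrative:

```python
import numpy as np

def pcc_then_pca(X, y, r_min=0.3, n_components=2):
    """Stage 1: keep features whose absolute Pearson correlation with the
    target exceeds r_min. Stage 2: compress the survivors with PCA (SVD).
    Assumes no feature is constant (non-zero standard deviation)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))
    keep = np.abs(r) >= r_min                 # preliminary reduction
    Z = Xc[:, keep]
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    k = min(n_components, Vt.shape[0])
    return Z @ Vt[:k].T, keep                 # secondary reduction via PCA
```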
43. Revisiting data reduction for boolean matrix factorization algorithms based on formal concept analysis
- Author
-
Yang, Lanzhen, Tsang, Eric C. C., Mao, Hua, Zhang, Chengling, and Wu, Jiaming
- Published
- 2024
- Full Text
- View/download PDF
44. Reduction Through Homogeneous Clustering: Variations for Categorical Data and Fast Data Reduction
- Author
-
Ougiaroglou, Stefanos, Papadimitriou, Nikolaos, and Evangelidis, Georgios
- Published
- 2024
- Full Text
- View/download PDF
45. Sampling approaches to reduce very frequent seasonal time series.
- Author
-
Baldo, Afonso, Ferreira, Paulo J. S., and Mendes‐Moreira, João
- Subjects
- *
TIME series analysis , *SERVER farms (Computer network management) , *DATA mining , *PROCESS capability , *TECHNOLOGICAL innovations , *MACHINE learning - Abstract
With technological advancements, much data is being captured by sensors, smartphones, wearable devices, and so forth. These vast datasets are stored in data centres and utilized to forge data‐driven models for the condition monitoring of infrastructures and systems through future data mining tasks. However, these datasets often surpass the processing capabilities of traditional information systems and methodologies due to their significant size. Additionally, not all samples within these datasets contribute valuable information during the model training phase, leading to inefficiencies. The processing and training of Machine Learning algorithms become time‐consuming, and storing all the data demands excessive space, contributing to the Big Data challenge. In this paper, we propose two novel techniques to reduce large time‐series datasets into more compact versions without undermining the predictive performance of the resulting models. These methods also aim to decrease the time required for training the models and the storage space needed for the condensed datasets. We evaluated our techniques on five public datasets, employing three Machine Learning algorithms: Holt‐Winters, SARIMA, and LSTM. The outcomes indicate that for most of the datasets examined, our techniques maintain, and in several instances enhance, the forecasting accuracy of the models. Moreover, we significantly reduced the time required to train the Machine Learning algorithms employed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
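One simple way to reduce a very frequent seasonal series without destroying its seasonal structure is systematic sampling aligned to the period, so every cycle contributes samples at the same positions. This is a generic illustration of the problem setting, not either of the paper's two techniques:

```python
def seasonal_systematic_sample(series, season_len, step):
    """Reduce a seasonal series by taking every `step`-th observation,
    restarting the phase at each season boundary so the sampled positions
    line up across cycles."""
    out = []
    for start in range(0, len(series), season_len):
        cycle = series[start:start + season_len]
        out.extend(cycle[::step])
    return out
```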
46. Asteroid spin and shape properties from Gaia DR3 photometry.
- Author
-
Cellino, A., Tanga, P., Muinonen, K., and Mignard, F.
- Subjects
- *
SMALL solar system bodies , *SOLAR system , *DATA release , *ELECTRONIC data processing , *DATA reduction - Abstract
Context. The third data release of Gaia, in June 2022, included the first large sample of sparse photometric data for more than 150 000 Solar System objects (SSOs), mainly asteroids. Aims. The SSO photometric data can be processed to derive information on the physical properties for a large number of objects, including spin properties, surface photometric behaviour in a variety of illumination conditions, and overall shape. Methods. After selecting a set of 22 815 objects for which an adequate number of accurate photometric measurements had been obtained by Gaia, we applied the 'genetic' algorithm of photometric inversion developed by the Gaia Data Processing and Analysis Consortium to process SSO photometric data. Given the need to minimise the required data processing time, the algorithm was set to adopt a simple triaxial ellipsoid shape model. Results. Our results show that in spite of the limited variety of observing circumstances and the limited numbers of measurements per object at present (in the majority of cases no greater than 40 and still far from the number expected at the end of the mission of about 60–70), the proportion of correct determinations for the spin period among the observed targets is about 85%. This percentage is based on a comparison with reliable literature data following a moderate filtering procedure developed to remove dubious solutions. Conclusions. The analysis performed in this paper is important in the context of developing further improvements to the adopted data reduction procedure. This includes the possible development of better solution filtering procedures that take into account, for each object, the possible presence of multiple, equivalent spin period solutions that have not been systematically investigated in this preliminary application. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
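The triaxial-ellipsoid shape model adopted in the inversion implies a simple geometric brightness proxy: the projected cross-section of the ellipsoid along the line of sight. A toy lightcurve under this assumption (the DPAC 'genetic' photometric inversion itself is far more involved, handling scattering and arbitrary geometry):

```python
import math

def ellipsoid_cross_section(a, b, c, nx, ny, nz):
    """Projected area of a triaxial ellipsoid (semi-axes a, b, c) viewed
    along the unit vector (nx, ny, nz) in the body frame."""
    return math.pi * math.sqrt((b * c * nx) ** 2 +
                               (a * c * ny) ** 2 +
                               (a * b * nz) ** 2)

def lightcurve(a, b, c, period_h, times_h):
    """Relative brightness for equator-on viewing as the body spins about z."""
    return [ellipsoid_cross_section(a, b, c,
                                    math.cos(2 * math.pi * t / period_h),
                                    math.sin(2 * math.pi * t / period_h),
                                    0.0)
            for t in times_h]
```

For an equator-on view, the peak-to-trough brightness ratio equals the axial ratio a/b, which is why sparse photometry can constrain the overall shape.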
47. Virtual Integration of Satellite and In-situ Observation Networks (VISION) v1.0: In-Situ Observations Simulator.
- Author
-
Russo, Maria Rosa, Bartholomew, Sadie L., Hassell, David, Mason, Alex M., Neininger, Erica, Perman, A. James, Sproson, David A. J., Watson-Parris, Duncan, and Abraham, Nathan Luke
- Subjects
- *
BUOYS , *RESEARCH aircraft , *DATA reduction , *DATA modeling , *ATMOSPHERIC models , *MODELS & modelmaking , *FLIGHT simulators - Abstract
This work presents the first step in the development of the VISION toolkit, a set of Python tools that allows easy, efficient, and more meaningful comparison between global atmospheric models and observational data. Whilst observational data and modelling capabilities are expanding in parallel, there are still barriers preventing these two data sources from being used in synergy. This arises from differences in spatial and temporal sampling between models and observational platforms: observational data from a research aircraft, for example, are sampled on specified flight trajectories at very high temporal resolution. Proper comparison with model data requires generating, storing, and handling a large amount of highly temporally resolved model files, resulting in a process that is data, labour, and time intensive. In this paper we focus on comparison between model data and in-situ observations (from aircraft, ships, buoys, sondes, etc.). A stand-alone code, the In-Situ Observation simulator (ISO_simulator for short), is described here: this software reads modelled variables and observational data files and outputs model data interpolated in space and time to match the observations. The interpolated model data are then written to NetCDF files that can be efficiently archived, due to their small size, and directly compared to observations. This method achieves a large reduction in the size of model data produced for comparison with flight and other in-situ data. By interpolating global, gridded, hourly files onto observation locations, we reduce the data output for a typical climate-resolution run from ~3 GB per model variable per month to ~15 MB per model variable per month (a 200-fold reduction in data volume). The VISION toolkit is fast and easy to use, enabling large observational datasets spanning decades to be exploited for large-scale model evaluation. Although this code was initially tested within the Unified Model (UM) framework, which is shared by the UK Earth System Model (UKESM), it was written as a flexible tool and can be extended to work with other models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
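The collocation idea, sampling the gridded model field at each observation's location and time instead of archiving full model output, can be sketched as follows. The function name, nearest-neighbour spatial matching, and linear time interpolation here are assumptions for illustration, not the ISO_simulator API:

```python
import numpy as np

def sample_model_at_obs(model, times, lats, lons, obs_t, obs_lat, obs_lon):
    """Collocate a gridded model field of shape (time, lat, lon) with point
    observations: nearest grid cell in space, linear interpolation in time.
    Returns one model value per observation."""
    out = []
    for t, la, lo in zip(obs_t, obs_lat, obs_lon):
        i = int(np.argmin(np.abs(lats - la)))   # nearest latitude index
        j = int(np.argmin(np.abs(lons - lo)))   # nearest longitude index
        out.append(np.interp(t, times, model[:, i, j]))
    return np.array(out)
```

Only these collocated values need to be written out, which is the source of the large data-volume reduction the abstract reports.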
48. Wavelet-based 3D Data Cube Denoising Using Three Scales of Dependency.
- Author
-
Chen, Guang Yi and Krzyzak, Adam
- Subjects
- *
NOISE control , *CUBES , *WAVELET transforms , *DATA reduction , *THRESHOLDING algorithms , *NOISE - Abstract
In this paper, we propose a novel method for denoising 3D data cubes corrupted by noise with spatially varying noise levels. We apply the 3D dual-tree complex wavelet transform (DTCWT) to the data cube and then conduct wavelet-based thresholding that exploits three scales of dependency among the wavelet coefficients. Instead of using a global noise level, we estimate the noise levels locally, which improves the denoising results substantially. We then apply the inverse DTCWT to obtain the noise-reduced data cubes. Experiments demonstrate that, for noise reduction of 3D data cubes, our proposed method significantly outperforms block matching and 3D filtering, video block matching and 3D filtering, 2D bivariate shrinkage, and 3D bivariate shrinkage. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
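The paper's key point, estimating the noise level locally rather than globally before thresholding, can be illustrated in one dimension with a single-level Haar transform and a per-block MAD noise estimate. This is a minimal 1D sketch of the idea, not the 3D DTCWT pipeline with three scales of dependency:

```python
import numpy as np

def haar_denoise_local(signal, block=32):
    """One-level Haar wavelet denoising with a locally estimated noise level:
    soft-threshold detail coefficients block by block, using a per-block
    median-absolute-deviation (MAD) sigma estimate."""
    x = np.asarray(signal, dtype=float)
    n = len(x) // 2 * 2
    approx = (x[0:n:2] + x[1:n:2]) / np.sqrt(2)   # Haar lowpass
    detail = (x[0:n:2] - x[1:n:2]) / np.sqrt(2)   # Haar highpass
    den = np.empty_like(detail)
    for s in range(0, len(detail), block):
        d = detail[s:s + block]
        sigma = np.median(np.abs(d)) / 0.6745      # local noise estimate
        thr = sigma * np.sqrt(2 * np.log(max(len(detail), 2)))
        den[s:s + block] = np.sign(d) * np.maximum(np.abs(d) - thr, 0.0)
    out = np.empty(n)
    out[0:n:2] = (approx + den) / np.sqrt(2)       # inverse Haar
    out[1:n:2] = (approx - den) / np.sqrt(2)
    return out
```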
49. Enhancing Rheumatic Fever Analysis via Tritopological Approximation Spaces for Data Reduction.
- Author
-
Nawar, A. S., Abu-Gdairi, R., El-Bably, M. K., and Atallah, H. M.
- Subjects
- *
RHEUMATIC fever , *DATA reduction , *MEMBERSHIP functions (Fuzzy logic) , *TOPOLOGICAL spaces , *SYMPTOMS , *ROUGH sets - Abstract
This paper introduces the concept of tritopological approximation space, extending the conventional approximation space by drawing upon topological spaces and precisely defined binary relations within a universe of discourse. Through meticulous construction of subbases, this shift facilitates a comprehensive analysis of rough sets within the domain of tritopological approximation spaces. Additionally, the study pioneers multiple membership functions and inclusion functions, enhancing the analytical framework and enabling a more effective redefinition of rough approximations. To illustrate the practical advantages, real-life application examples are presented, focusing on the implementation of data reduction methods in the context of rheumatic fever, a prevalent disease characterized by diverse symptoms among patients despite a consistent diagnosis. This research contributes to the advancement of rough set theory and its applications to complex, real-world problems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
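The classical Pawlak approximations that tritopological approximation spaces generalize can be computed directly from the equivalence classes of a single relation; a baseline sketch of that starting point:

```python
def rough_approximations(universe, key, target):
    """Classical (single-relation) rough-set approximations. `key` maps each
    object to its equivalence-class label; `target` is the set to approximate.
    Returns (lower, upper): objects certainly in the target vs. possibly in it."""
    classes = {}
    for x in universe:
        classes.setdefault(key(x), set()).add(x)
    target = set(target)
    lower, upper = set(), set()
    for cls in classes.values():
        if cls <= target:          # class lies entirely inside the target
            lower |= cls
        if cls & target:           # class overlaps the target
            upper |= cls
    return lower, upper
```

The boundary region `upper - lower` is where the data-reduction and diagnosis questions in the abstract live: objects (patients) whose class is indistinguishable from both members and non-members of the target set.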
50. Immediate traceability of ash based on laser-induced breakdown spectroscopy and machine learning.
- Author
-
Cai, Yuyao, Wan, Enlai, Cai, Jinzhu, Zhang, Xinyang, Ye, Yanpeng, Yin, Yijun, and Liu, Yuzhu
- Subjects
- *
LASER-induced breakdown spectroscopy , *MACHINE learning , *FISHER discriminant analysis , *RANDOM forest algorithms , *ENVIRONMENTAL monitoring , *DATA reduction - Abstract
This article reports on an advanced-level physics laboratory experiment designed for college-level undergraduate education and for scholars who need specialized training in using and interpreting Laser-Induced Breakdown Spectroscopy. The technical principle, experimental operation, and sample preparation of Laser-Induced Breakdown Spectroscopy are introduced in detail. A presentation and discussion of the use of Laser-Induced Breakdown Spectroscopy for the traceability of four common samples is emphasized. Combining Laser-Induced Breakdown Spectroscopy with machine learning, two distinct datasets are constructed through the extraction of spectral features. Dimensionality reduction of the spectral data is performed using Linear Discriminant Analysis, while a Random Forest model is employed for provenance classification. Finally, the interpretability of the Random Forest model is leveraged to explore the contributions of different spectral elements to provenance tracing. Results demonstrate the system's effectiveness not only in accurately identifying ash types but also in elucidating the influential chemical components, offering significant implications for material analysis and environmental monitoring. From an educational standpoint, this paper will allow any reader, in particular undergraduate and graduate students, to gain a better understanding of the theory and practice of Laser-Induced Breakdown Spectroscopy and machine learning. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
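The LDA dimensionality-reduction step named in the abstract can be sketched for two classes with the Fisher discriminant direction; the Random Forest classifier used in the paper is replaced here by a simple nearest-centroid rule on the projection, purely for brevity:

```python
import numpy as np

def fisher_lda_direction(X0, X1):
    """Two-class Fisher discriminant direction w ~ Sw^-1 (mu1 - mu0),
    where Sw is the within-class scatter matrix."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1) +
          np.cov(X1, rowvar=False) * (len(X1) - 1))
    w = np.linalg.solve(Sw + 1e-9 * np.eye(Sw.shape[0]), mu1 - mu0)
    return w / np.linalg.norm(w)

def classify(x, w, X0, X1):
    """Project onto w and assign the class with the nearer projected mean."""
    p = x @ w
    m0, m1 = X0.mean(axis=0) @ w, X1.mean(axis=0) @ w
    return 0 if abs(p - m0) < abs(p - m1) else 1
```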