26 results on '"Yu, Yi"'
Search Results
2. Identification of pathology-confirmed vulnerable atherosclerotic lesions by coronary computed tomography angiography using radiomics analysis
- Author
-
Li, Xiang-nan, Yin, Wei-hua, Sun, Yang, Kang, Han, Luo, Jie, Chen, Kuan, Hou, Zhi-hui, Gao, Yang, Ren, Xin-shuang, Yu, Yi-tong, An, Yun-qiang, Zhang, Yan, Wang, Hong-yue, and Lu, Bin
- Published
- 2022
- Full Text
- View/download PDF
3. Peri-lesion regions in differentiating suspicious breast calcification-only lesions specifically on contrast enhanced mammography.
- Author
-
Cao, Kun, Gao, Fei, Long, Rong, Zhang, Fan-Dong, Huang, Chen-Cui, Cao, Min, Yu, Yi-Zhou, and Sun, Ying-Shi
- Subjects
BREAST, MACHINE learning, CALCIFICATIONS of the breast, MAMMOGRAMS, CALCIFICATION
- Abstract
PURPOSE: To explore the added value of peri-calcification regions on contrast-enhanced mammography (CEM) in the differential diagnosis of breast lesions presenting as only calcification on routine mammogram. METHODS: Patients who underwent CEM because of suspicious calcification-only lesions were included. The test set included patients between March 2017 and March 2019, while the validation set was collected between April 2019 and October 2019. The calcifications were automatically detected and grouped by a machine learning-based computer-aided system. In addition to extracting radiomic features on both low-energy (LE) and recombined (RC) images from the calcification areas, the peri-calcification regions, which were generated by extending the annotation margin radially with gradients from 1 mm to 9 mm, were attempted. Machine learning (ML) models were built to classify calcifications into malignant and benign groups. The diagnostic matrices were also evaluated by combining ML models with subjective reading. RESULTS: Models for LE (significant features: wavelet-LLL_glcm_Imc2_MLO; wavelet-HLL_firstorder_Entropy_MLO; wavelet-LHH_glcm_DifferenceVariance_CC; wavelet-HLL_glcm_SumEntropy_MLO; wavelet-HLH_glrlm_ShortRunLowGrayLevelEmphasis_MLO; original_firstorder_Entropy_MLO; original_shape_Elongation_MLO) and RC (significant features: wavelet-HLH_glszm_GrayLevelNonUniformityNormalized_MLO; wavelet-LLH_firstorder_10Percentile_CC; original_firstorder_Maximum_MLO; wavelet-HHH_glcm_Autocorrelation_MLO; original_shape_Elongation_MLO; wavelet-LHL_glszm_GrayLevelNonUniformityNormalized_MLO; wavelet-LLH_firstorder_RootMeanSquared_MLO) images were set up with 7 features. Areas under the curve (AUCs) of RC models are significantly better than those of LE models with compact and expanded boundary (RC vs. LE, compact: 0.81 vs. 0.73, p < 0.05; expanded: 0.89 vs. 0.81, p < 0.05) and RC models with 3 mm boundary extension yielded the best performance compared to those with other sizes (AUC = 0.89). Combining with radiologists' reading, the 3mm-boundary RC model achieved a sensitivity of 0.871 and negative predictive value of 0.937 with similar accuracy of 0.843 in predicting malignancy. CONCLUSIONS: The machine learning model integrating intra- and peri-calcification regions on CEM has the potential to aid radiologists' performance in predicting malignancy of suspicious breast calcifications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. Evaluation on interactive waiting experience design of mobile internet products based on machine learning.
- Author
-
Yu, Yi
- Subjects
*MACHINE learning, *COST functions, *WIRELESS Internet, *USER experience, *SATISFACTION, *PRODUCT attributes
- Abstract
In today's rapidly changing economy, efficient lifestyle has become the current situation of most mobile product users. With the development of performance tools and technologies, a fast lifestyle has brought more wealth and opportunities to users. The slow pace and fluctuating time are indirect income losses, which cause user anxiety to some extent. When the waiting time exceeds the user's waiting threshold, users would experience negative emotions, such as boredom, anxiety and anger, and product satisfaction would drop significantly. Therefore, by analyzing the uniqueness of mobile Internet products and the characteristics of users, this paper studied the reasons and influencing factors of product interactive waiting, and then used machine learning algorithm to analyze the cost function of interactive waiting experience. Finally, the corresponding interactive waiting experience design strategy was proposed. By comparison, the user experience after product interaction optimization design was 8.4% higher than that before product interaction optimization design, and the user frequency was also 14.7% higher after optimization design. In short, user experience plays an important role in product interaction design. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
5. Identification of three potential novel biomarkers for early diagnosis of acute ischemic stroke via plasma lipidomics.
- Author
-
Yu, Yi, Wen, Xue, Lin, Jin-Guang, Liu, Jun, Liang, Hong-Feng, Lin, Shan-Wen, Xu, Qiu-Gui, and Li, Ji-Cheng
- Subjects
*ISCHEMIC stroke, *MACHINE learning, *LIQUID chromatography-mass spectrometry, *LIPIDOMICS, *RANDOM forest algorithms, *EARLY diagnosis, *BLOOD lipids
- Abstract
Introduction: Acute ischemic stroke (AIS) accounts for the majority of all stroke, globally the second leading cause of death. Due to its rapid development after onset, its early diagnosis is crucial. Objectives: We aim to identify potential highly reliable blood-based biomarkers for early diagnosis of AIS using quantitative plasma lipid profiling via a machine learning approach. Methods: Lipidomics was used for quantitative plasma lipid profiling, based on ultra-performance liquid chromatography tandem mass spectrometry. Our samples were divided into a discovery and a validation set, each containing 30 AIS patients and 30 healthy controls (HC). Differentially expressed lipid metabolites were screened based on the criteria VIP > 1, p < 0.05, and fold change > 1.5 or < 0.67. The least absolute shrinkage and selection operator (LASSO) and random forest algorithms in machine learning were used to select differential lipid metabolites as potential biomarkers. Results: Three key differential lipid metabolites, CarnitineC10:1, CarnitineC10:1-OH and Cer(d18:0/16:0), were identified as potential biomarkers for early diagnosis of AIS. The former two, associated with thermogenesis, were down-regulated, whereas the latter, associated with necroptosis and sphingolipid metabolism, was upregulated. Univariate and multivariate logistic regressions showed that these three lipid metabolites and the resulting diagnostic model exhibited a strong ability in discriminating between AIS patients and HCs in both the discovery and validation sets, with an area under the curve above 0.9. Conclusions: Our work provides valuable information on the pathophysiology of AIS and constitutes an important step toward clinical application of blood-based biomarkers for diagnosing AIS. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
6. Preoperative Diagnosis of Dual‐Phenotype Hepatocellular Carcinoma Using Enhanced MRI Radiomics Models.
- Author
-
Wu, Qian, Yu, Yi‐xing, Zhang, Tao, Zhu, Wen‐jing, Fan, Yan‐fen, Wang, Xi‐ming, and Hu, Chun‐hong
- Subjects
RADIOMICS, RECEIVER operating characteristic curves, MACHINE learning, CONTRAST-enhanced magnetic resonance imaging, FEATURE extraction
- Abstract
Background: Dual‐phenotype hepatocellular carcinoma (DPHCC) is highly aggressive and difficult to distinguish from hepatocellular carcinoma (HCC). Purpose: To develop and validate clinical and radiomics models based on contrast‐enhanced MRI for the preoperative diagnosis of DPHCC. Study type: Retrospective. Population: A total of 87 patients with DPHCC and 92 patients with non‐DPHCC randomly divided into a training cohort (n = 125: 64 non‐DPHCC; 61 DPHCC) and a validation cohort (n = 54: 28 non‐DPHCC; 26 DPHCC). Field Strength/Sequence: A 3.0 T; dynamic contrast‐enhanced MRI with time‐resolved T1‐weighted imaging sequence. Assessment: In the clinical model, the maximum tumor diameter and hepatitis B virus (HBV) were independent risk factors of DPHCC. In the radiomics model, a total of 1781 radiomics features were extracted from tumor volumes of interest (VOIs) in the arterial phase (AP) and portal venous phase (PP) images. For feature reduction and selection, Pearson correlation coefficient (PCC) and recursive feature elimination (RFE) were used. Clinical, AP, PP, and combined radiomics models were established using machine learning algorithms (support vector machine [SVM], logistic regression [LR], and logistic regression‐least absolute shrinkage and selection operator [LR‐LASSO]) and their discriminatory efficacy assessed and compared. Statistical Tests: The independent sample t test, Mann–Whitney U test, Chi‐square test, regression analysis, receiver operating characteristic curve (ROC) analysis, Pearson correlation analysis, the Delong test. A P value < 0.05 was considered statistically significant. Results: In the validation cohort, the combined radiomics model (area under the curve [AUC] = 0.908, 95% confidence interval [CI]: 0.831–0.985) showed the highest diagnostic performance. The AUCs of the PP (AUC = 0.879, 95% CI: 0.779–0.979) and combined radiomics models were significantly higher than that of clinical model (AUC = 0.685, 95% CI: 0.526–0.844). 
There were no significant differences in AUC between AP or PP radiomics model and combined radiomics model (P = 0.286, 0.180 and 0.543). Conclusion: MRI radiomics models may be useful for discriminating DPHCC from non‐DPHCC before surgery. Evidence Level: 4 Technical Efficacy: Stage 2 [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
7. Conditional hybrid GAN for melody generation from lyrics.
- Author
-
Yu, Yi, Zhang, Zhe, Duan, Wei, Srivastava, Abhishek, Shah, Rajiv, and Ren, Yi
- Subjects
*MELODY, *MACHINE learning, *MOVEMENT sequences
- Abstract
Conditional sequence generation aims to instruct the generation procedure by conditioning the model with additional context information, which is an interesting research issue in AI and machine learning. Unfortunately, current state-of-the-art generative models for music fail to generate good melodies due to the discrete-valued property of music attributes. In this paper, we propose a novel conditional hybrid GAN (C-Hybrid-GAN) for melody generation from lyrics. Three discrete sequences corresponding to music attributes, namely pitch, duration, and rest, are separately generated by melody generation model conditioned on the same lyrics. Gumbel-Softmax is used to approximate the distribution of discrete-valued samples so as to directly generate discrete melody attributes. Most importantly, a hybrid structure is proposed, which contains three independent branches (each for one melody attribute) in the generator and one branch for distinguishing concatenated attributes in the discriminator. Relational memory core is exploited to model not only the dependency inside each sequence of attribute during the training of the generator, but also the consistency among three sequences of attributes during the training of the discriminator. Through extensive experiments using evaluation metrics, e.g., maximum mean discrepancy, average rest value, and MIDI number transition, we demonstrate that the proposed C-Hybrid-GAN outperforms the existing methods in melody generation from lyrics. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
8. 4 mC site recognition algorithm based on pruned pre-trained DNABert-Pruning model and fused artificial feature encoding.
- Author
-
Xie, Guo-Bo, Yu, Yi, Lin, Zhi-Yi, Chen, Rui-Bin, Xie, Jian-Hui, and Liu, Zhen-Guo
- Subjects
*GENE expression, *MACHINE learning, *NUCLEOTIDE sequence, *ALGORITHMS, *SEQUENCE spaces, *DEEP learning
- Abstract
DNA 4 mC plays a crucial role in the genetic expression process of organisms. However, existing deep learning algorithms have shortcomings in the ability to represent DNA sequence features. In this paper, we propose a 4 mC site identification algorithm, DNABert-4mC, based on a fusion of the pruned pre-training DNABert-Pruning model and artificial feature encoding to identify 4 mC sites. The algorithm prunes and compresses the DNABert model, resulting in the pruned pre-training model DNABert-Pruning. This model reduces the number of parameters and removes redundancy from output features, yielding more precise feature representations while upholding accuracy. Simultaneously, the algorithm constructs an artificial feature encoding module to assist the DNABert-Pruning model in feature representation, effectively supplementing the information that is missing from the pre-trained features. The algorithm also introduces the AFF-4mC fusion strategy, which combines artificial feature encoding with the DNABert-Pruning model, to improve the feature representation capability of DNA sequences in multi-semantic spaces and better extract 4 mC sites and the distribution of nucleotide importance within the sequence. In experiments on six independent test sets, the DNABert-4mC algorithm achieved an average AUC value of 93.81%, outperforming seven other advanced algorithms with improvements of 2.05%, 5.02%, 11.32%, 5.90%, 12.02%, 2.42% and 2.34%, respectively. • Train a new pruning pretraining model on DNA sequences to extract feature information. • The introduction of manually assisted features has expanded the search space for sequence feature. • Attention fusion strategy better promotes the fusion output of features from different dimensions. • The complementary nature of machine features and manual features better fits the target features. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Measurement of SARS-CoV-2 Antibody Titers Improves the Prediction Accuracy of COVID-19 Maximum Severity by Machine Learning in Non-Vaccinated Patients.
- Author
-
Kurano, Makoto, Ohmiya, Hiroko, Kishi, Yoshiro, Okada, Jun, Nakano, Yuki, Yokoyama, Rin, Qian, Chungen, Xia, Fuzhen, He, Fan, Zheng, Liang, Yu, Yi, Jubishi, Daisuke, Okamoto, Koh, Moriya, Kyoji, Kodama, Tatsuhiko, and Yatomi, Yutaka
- Subjects
ANTIBODY titer, MACHINE learning, SARS-CoV-2, BREAKTHROUGH infections, COVID-19, FOOT & mouth disease, INFECTION
- Abstract
Numerous studies have suggested that the titers of antibodies against SARS-CoV-2 are associated with the COVID-19 severity, however, the types of antibodies associated with the disease maximum severity and the timing at which the associations are best observed, especially within one week after symptom onset, remain controversial. We attempted to elucidate the antibody responses against SARS-CoV-2 that are associated with the maximum severity of COVID-19 in the early phase of the disease, and to investigate whether antibody testing might contribute to prediction of the disease maximum severity in COVID-19 patients. We classified the patients into four groups according to the disease maximum severity (severity group 1 (did not require oxygen supplementation), severity group 2a (required oxygen supplementation at low flow rates), severity group 2b (required oxygen supplementation at relatively high flow rates), and severity group 3 (required mechanical ventilatory support)), and serially measured the titers of IgM, IgG, and IgA against the nucleocapsid protein, spike protein, and receptor-binding domain of SARS-CoV-2 until day 12 after symptom onset. The titers of all the measured antibody responses were higher in severity group 2b and 3, especially severity group 2b, as early as at one week after symptom onset. Addition of data obtained from antibody testing improved the ability of analysis models constructed using a machine learning technique to distinguish severity group 2b and 3 from severity group 1 and 2a. These models constructed with non-vaccinated COVID-19 patients could not be applied to the cases of breakthrough infections. These results suggest that antibody testing might help physicians identify non-vaccinated COVID-19 patients who are likely to require admission to an intensive care unit. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
10. Characteristics of Fecal Microbiota and Machine Learning Strategy for Fecal Invasive Biomarkers in Pediatric Inflammatory Bowel Disease.
- Author
-
Wang, Xinqiong, Xiao, Yuan, Xu, Xu, Guo, Li, Yu, Yi, Li, Na, and Xu, Chundi
- Abstract
Background: Early diagnosis and treatment of pediatric inflammatory bowel disease (PIBD) is challenging due to the complexity of the disease and lack of disease specific biomarkers. The novel machine learning (ML) technique may be a useful tool to provide a new route for the identification of early biomarkers for the diagnosis of PIBD. Methods: In total, 66 treatment naive PIBD patients and 27 healthy controls were enrolled as an exploration cohort. Fecal microbiome profiling using 16S rRNA gene sequencing was performed. The correlation between microbiota and inflammatory and nutritional markers was evaluated using Spearman's correlation. A random forest model was used to set up an ML approach for the diagnosis of PIBD using 1902 markers. A validation cohort including 14 PIBD and 48 irritable bowel syndrome (IBS) was enrolled to further evaluate the sensitivity and accuracy of the model. Result: Compared with healthy subjects, PIBD patients showed a significantly lower diversity of the gut microbiome. The increased Escherichia-Shigella and Enterococcus were positively correlated with inflammatory markers and negatively correlated with nutrition markers, which indicated a more severe disease. A diagnostic ML model was successfully set up for differential diagnosis of PIBD integrating the top 11 OTUs. This diagnostic model showed outstanding performance at differentiating IBD from IBS in an independent validation cohort. Conclusion: The diagnosis panel based on the ML of the gut microbiome may be a favorable tool for the precise diagnosis and treatment of PIBD. A study of the relationship between disease status and the microbiome was an effective way to clarify the pathogenesis of PIBD. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
11. Classification of Schizophrenia by Combination of Brain Effective and Functional Connectivity.
- Author
-
Zhao, Zongya, Li, Jun, Niu, Yanxiang, Wang, Chang, Zhao, Junqiang, Yuan, Qingli, Ren, Qiongqiong, Xu, Yongtao, and Yu, Yi
- Subjects
FUNCTIONAL connectivity, PARIETAL lobe, SUPPORT vector machines, EVOKED potentials (Electrophysiology), VISUAL cortex
- Abstract
At present, lots of studies have tried to apply machine learning to different electroencephalography (EEG) measures for diagnosing schizophrenia (SZ) patients. However, most EEG measures previously used are either a univariate measure or a single type of brain connectivity, which may not fully capture the abnormal brain changes of SZ patients. In this paper, event-related potentials were collected from 45 SZ patients and 30 healthy controls (HCs) during a learning task, and then a combination of partial directed coherence (PDC) effective and phase lag index (PLI) functional connectivity were used as features to train a support vector machine classifier with leave-one-out cross-validation for classification of SZ from HCs. Our results indicated that an excellent classification performance (accuracy = 95.16%, specificity = 94.44%, and sensitivity = 96.15%) was obtained when the combination of functional and effective connectivity features was used, and the corresponding optimal feature number was 15, which included 12 PDC and three PLI connectivity features. The selected effective connectivity features were mainly located between the frontal/temporal/central and visual/parietal lobes, and the selected functional connectivity features were mainly located between the frontal/temporal and visual cortexes of the right hemisphere. In addition, most of the selected effective connectivity abnormally enhanced in SZ patients compared with HCs, whereas all the selected functional connectivity features decreased in SZ patients. The above results showed that our proposed method has great potential to become a tool for the auxiliary diagnosis of SZ. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
12. Graph-Regularized Locality-Constrained Joint Dictionary and Residual Learning for Face Sketch Synthesis.
- Author
-
Jiang, Junjun, Yu, Yi, Wang, Zheng, Liu, Xianming, and Ma, Jiayi
- Subjects
*FACE painting, *FACE perception, *IMAGE processing, *DRAWING, *PHOTOGRAPHS
- Abstract
Face sketch synthesis is a crucial issue in digital entertainment and law enforcement. It can bridge the considerable texture discrepancy between face photos and sketches. Most of the current face sketch synthesis approaches directly learn the relationship between the photos and sketches, and it is very difficult for them to generate the individual specific features, which we call rare characteristics. In this paper, we propose a novel face sketch synthesis approach through residual learning. In contrast to traditional approaches, which aim to reconstruct a sketch image directly (i.e., learn the mapping relationship between the photo and sketch), we aim to predict the residual image by learning the mapping relationship between the photo and residual, i.e., the difference between the photo and sketch, given an observed photo. This technique will render optimizing the residual mapping easier than optimizing the original mapping and deriving rare characteristic information. We also introduce a joint dictionary learning algorithm by preserving the local geometry structure of a data space. Through the learned joint dictionary, we transform the face sketch synthesis from an image space to a new and compact space; the new and compact space is spanned by learned dictionary atoms, where the manifold assumption can be further guaranteed. Results show that the proposed method demonstrates an impressive performance in the face sketch synthesis task on three public face sketch datasets and various real-world photos. These results are derived by comparing the proposed method with several state-of-the-art techniques, including certain recently proposed deep learning-based approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
13. A new network intrusion detection algorithm: DA‐ROS‐ELM.
- Author
-
Yu, Yi, Kang, SongLin, and Qiu, He
- Subjects
*INTRUSION detection systems (Computer security), *SEQUENTIAL analysis, *MACHINE learning, *TIKHONOV regularization, *PROBLEM solving
- Abstract
In this paper, a novel dual adaptive regularized online sequential extreme learning machine (DA‐ROS‐ELM) is proposed to detect network intrusion. The ridge regression factor based on Tikhonov regularization is introduced to solve the over‐fitting and ill‐posed problems. According to the arrived data in each updating phase and all currently available data, dual adaptive mechanism is designed to respectively select the suitable updating mode of output weight β and regularized parameter C. The performance of our algorithm is assessed by NSL‐KDD dataset, and the results show that the DA‐ROS‐ELM can obtain faster training speed, higher accuracy, lower rate of false positive and false negative, and greater generalization performance than other network intrusion detection algorithms. Besides, the adaptive mechanism enables this algorithm to meet the real‐time requirement of the network intrusion system. © 2018 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
14. Deep learning techniques elucidate and modify the shape factor to extend the effective medium theory beyond its original formulation.
- Author
-
Lu, Haofan, Yu, Yi, Jain, Ankit, Ang, Yee Sin, and Ong, Wee-Liat
- Subjects
*DEEP learning, *MEDIA studies, *MATHEMATICAL forms
- Abstract
• Deep learning elucidates the shape factor in EMT for thermal conductivity estimates. • The ratio of an inclusion's projected areas is closely related to the shape factor. • Transfer learning extends the original EMT for new thermal transport problems. The effective medium theories (EMTs) can reliably approximate the property of a composite using properties of the inclusion and matrix phase. However, their inherent assumptions and the availability of mathematical forms for describing the inclusion structure limit their accuracy and applicability. In this work, we utilize the capabilities of a deep learning method to ameliorate the latter restriction for a particular EMT formulation. Our deep learning models elucidate the inclusion structure using several physics-based descriptors and can be easily adapted for other inclusion shapes through transfer learning. Using our models, we shed light on the interpretation of the shape factor in the chosen EMT. More importantly, we extend, not replace, the EMT for cases beyond its original formulation. Our proposed transfer learning approach requires relatively low computation cost and a small sample number, making it especially useful when new data is limited. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
15. On the Inference of Dirichlet Mixture Priors for Protein Sequence Comparison.
- Author
-
Ye, Xugang, Yu, Yi-Kuo, and Altschul, Stephen F.
- Abstract
Dirichlet mixtures provide an elegant formalism for constructing and evaluating protein multiple sequence alignments. Their use requires the inference of Dirichlet mixture priors from curated sets of accurately aligned sequences. This article addresses two questions relevant to such inference: of how many components should a Dirichlet mixture consist, and how may a maximum-likelihood mixture be derived from a given data set. To apply the Minimum Description Length principle to the first question, we extend an analytic formula for the complexity of a Dirichlet model to Dirichlet mixtures by informal argument. We apply a Gibbs-sampling based approach to the second question. Using artificial data generated by a Dirichlet mixture, we demonstrate that our methods are able to approximate well the true theory, when it exists. We apply our methods as well to real data, and infer Dirichlet mixtures that describe the data better than does a mixture derived using previous approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
16. Compositional Adjustment of Dirichlet Mixture Priors.
- Author
-
Ye, Xugang, Yu, Yi-Kuo, and Altschul, Stephen F.
- Subjects
*DIRICHLET problem, *BAYESIAN analysis, *LINEAR substitutions, *AMINO acid sequence, *PROBABILITY theory, *DIFFERENTIAL entropy
- Abstract
Dirichlet mixture priors provide a Bayesian formalism for scoring alignments of protein profiles to individual sequences, which can be generalized to constructing scores for multiple-alignment columns. A Dirichlet mixture is a probability distribution over multinomial space, each of whose components can be thought of as modeling a type of protein position. Applied to the simplest case of pairwise sequence alignment, a Dirichlet mixture is equivalent to an implied symmetric substitution matrix. For alphabets of even size L, Dirichlet mixtures with L/2 components and symmetric substitution matrices have an identical number of free parameters. Although this suggests the possibility of a one-to-one mapping between the two formalisms, we show that there are some symmetric matrices no Dirichlet mixture can imply, and others implied by many distinct Dirichlet mixtures. Dirichlet mixtures are derived empirically from curated sets of multiple alignments. They imply 'background' amino acid frequencies characteristic of these sets, and should thus be non-optimal for comparing proteins with non-standard composition. Given a mixture Θ, we seek an adjusted Θ′ that implies the desired composition, but that minimizes an appropriate relative-entropy-based distance function. To render the problem tractable, we fix the mixture parameter as well as the sum of the Dirichlet parameters for each component, allowing only its center of mass to vary. This linearizes the constraints on the remaining parameters. An approach to finding Θ′ may be based on small consecutive parameter adjustments. The relative entropy of two Dirichlet distributions separated by a small change in their parameter values implies a quadratic cost function for such changes. For a small change in implied background frequencies, this function can be minimized using the Lagrange-Newton method. 
We have implemented this method, and can compositionally adjust to good precision a 20-component Dirichlet mixture prior for proteins in under half a second on a standard workstation. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
17. ARFNNs under Different Types SVR for Identification of Nonlinear Magneto-Rheological Damper Systems with Outliers.
- Author
-
Pi-Yun Chen, Yu-Yi Fu, Kuo-Lan Su, and Jin-Tsong Jeng
- Subjects
FUZZY neural networks, MACHINE learning, SIMULATED annealing, SUPPORT vector machines, NONLINEAR systems, DAMPERS (Mechanical devices), OUTLIERS (Statistics), SIMULATION methods & models
- Abstract
This paper demonstrates different types of support vector regression (SVR) for annealing robust fuzzy neural networks (ARFNNs) to identify nonlinear magneto-rheological (MR) dampers with outliers. SVR performs well in determining the number of rules in the simplified fuzzy inference system and the initial weights for the fuzzy neural networks. In this paper, we independently propose two different types of SVR for the ARFNNs. Hence, a combination model that fuses the simplified fuzzy inference system, SVR and radial basis function networks is used. Based on these initial structures, the annealing robust learning algorithm (ARLA) can then be used effectively to adjust the parameters of the structures. Simulation results show the superiority of the proposed method with the different types of SVR for the nonlinear MR damper systems with outliers. [ABSTRACT FROM AUTHOR]
- Published
- 2010
18. CRNNTL: Convolutional Recurrent Neural Network and Transfer Learning for QSAR Modeling in Organic Drug and Material Discovery.
- Author
-
Li, Yaqin, Xu, Yongjin, and Yu, Yi
- Subjects
CONVOLUTIONAL neural networks, QSAR models, DATA augmentation, DEEP learning, MACHINE learning, RECURRENT neural networks, FEATURE extraction
- Abstract
Molecular latent representations, derived from autoencoders (AEs), have been widely used for drug or material discovery over the past couple of years. In particular, a variety of machine learning methods based on latent representations have shown excellent performance on quantitative structure–activity relationship (QSAR) modeling. However, the sequence feature of them has not been considered in most cases. In addition, data scarcity is still the main obstacle for deep learning strategies, especially for bioactivity datasets. In this study, we propose the convolutional recurrent neural network and transfer learning (CRNNTL) method inspired by the applications of polyphonic sound detection and electrocardiogram classification. Our model takes advantage of both convolutional and recurrent neural networks for feature extraction, as well as the data augmentation method. According to QSAR modeling on 27 datasets, CRNNTL can outperform or compete with state-of-art methods in both drug and material properties. In addition, the performances on one isomers-based dataset indicate that its excellent performance results from the improved ability in global feature extraction when the ability of the local one is maintained. Then, the transfer learning results show that CRNNTL can overcome data scarcity when choosing relative source datasets. Finally, the high versatility of our model is shown by using different latent representations as inputs from other types of AEs. [ABSTRACT FROM AUTHOR]
- Published
- 2021
19. A Strategy to Optimize the Implementation of a Machine-Learning Scheme for Extreme Meiyu Rainfall Prediction over Southern Taiwan.
- Author
Chu, Jung-Lien, Chiang, Chou-Chun, Hsu, Li-Huan, Hwang, Li-Rung, Yu, Yi-Chiang, Lin, Kuan-Ling, Wang, Chieh-Ju, Su, Shih-Hao, and Yo, Ting-Shuo
- Subjects
SUPPORT vector machines, FORECASTING
- Abstract
This study proposes a strategy to optimize the performance of the Support Vector Machine (SVM) scheme for extreme Meiyu rainfall prediction over southern Taiwan. Variables derived from the Climate Forecast System Reanalysis (CFSR) dataset are the candidates for predictor selection. A series of experiments with different combinations of predictors and domains is designed to obtain the optimal strategy for constructing the SVM scheme. The results reveal that the accuracy (ACC), positive predictive value (PPV), probability of detection (POD), and F1-score can exceed 0.6 on average. Choosing predictors associated with the Meiyu system and determining the domain from the correlations between the selected predictors and the predictand can improve the forecast performance. Our strategy shows the potential to predict extreme Meiyu rainfall in southern Taiwan with lead times from 16 h to 64 h. The F1-score analysis further demonstrates that the forecast performance of our scheme is stable, with slight inter-annual fluctuations from 1990 to 2019. Higher performance would be expected when the north of the South China Sea is characterized by stronger southwesterly flow and abundant low-level moisture for a given year. [ABSTRACT FROM AUTHOR]
- Published
- 2021
20. Visualization of statistically processed LC-MS-based metabolomics data for identifying significant features in a multiple-group comparison.
- Author
Pan, Yu-Yi, Chen, Yuan-Chih, Chang, William Chih-Wei, Ma, Mi-Chia, and Liao, Pao-Chi
- Subjects
*MACHINE learning, *METABOLOMICS, *VISUALIZATION, *FISH spoilage, *KRUSKAL-Wallis Test, *LATENT structure analysis
- Abstract
Analyzing and presenting data from multiple groups is much more informative than doing so for two groups. However, common tools such as the S plot and volcano plot can only identify the significant features between two groups and are of limited use in multiple-group comparisons. This study proposed novel visualization plots which not only overcame the restrictions of the above methods but also utilized the p values of multiple tests as the x-axis. The novel visualization plots included a parametric method and a nonparametric method. The parametric method was a combination of an analysis of variance and Welch's analysis of variance; the nonparametric method used the Kruskal-Wallis test. During the selection of significant features, machine learning algorithms were used to determine the cutting points of the x-axis. As a proof of concept, real data from experiments on 4-MeO-α-PVP metabolites and fish spoilage metabolomics were illustrated via our visualization method. The results showed that the novel visualization plots efficiently identified significant metabolites in multiple-group comparisons. In particular, the positive predictive values of the nonparametric method and the cutting points determined by logistic regression were higher than those of other machine learning algorithms in determining the cutting points for multiple groups.
• New visualization plots outperform the volcano plot and S plot for multiple-group studies.
• The parametric method requires normally distributed data; Bonferroni's adjustment is suggested for the cut point of the x-axis.
• The nonparametric method is flexible with respect to data type; a machine learning method is suggested for the cut point of the x-axis.
• As a proof of concept, both methods perform well for multiple-group comparisons. [ABSTRACT FROM AUTHOR]
- Published
- 2021
21. Cloud Detection from FY-4A's Geostationary Interferometric Infrared Sounder Using Machine Learning Approaches.
- Author
Zhang, Qi, Yu, Yi, Zhang, Weimin, Luo, Tengling, and Wang, Xiang
- Subjects
*MACHINE learning, *TYPHOONS, *SNOW cover, *LAND cover, *GEOSTATIONARY satellites, *FALSE alarms
- Abstract
FengYun-4A (FY-4A)'s Geostationary Interferometric Infrared Sounder (GIIRS) is the first hyperspectral infrared sounder on board a geostationary satellite, enabling the collection of infrared detection data with high temporal and spectral resolution. As clouds have complex spectral characteristics, and the retrieval of atmospheric profiles incorporating clouds is a significant problem, it is often necessary to undertake cloud detection before further processing procedures for cloud pixels when infrared hyperspectral data are entered into an assimilation system. In this study, we proposed machine-learning-based cloud detection models using two kinds of GIIRS channel observation sets (689 channels and 38 channels) as features. Due to differences in surface cover and meteorological elements between land and sea, we chose a logistic regression (lr) model for the land and an extremely randomized tree (et) model for the sea. The 689-channel models produced slightly higher performance (Heidke skill score (HSS) of 0.780 and false alarm rate (FAR) of 16.6% on land, HSS of 0.945 and FAR of 4.7% at sea) than the 38-channel models (HSS of 0.741 and FAR of 17.7% on land, HSS of 0.912 and FAR of 7.1% at sea). A comparison of visualized cloud detection results with the Himawari-8 Advanced Himawari Imager (AHI) cloud images shows that the proposed method has a good ability to identify clouds under circumstances such as typhoons, snow-covered land, and bright broken clouds. In addition, compared with the collocated Advanced Geosynchronous Radiation Imager (AGRI)-GIIRS cloud detection method, the machine learning cloud detection method has a significant advantage in time cost. This method is not effective for the detection of partially cloudy GIIRS fields of view, and there are limitations in the scope of spatial application. [ABSTRACT FROM AUTHOR]
- Published
- 2019
22. A graphical user interface for RAId, a knowledge integrated proteomics analysis suite with accurate statistics.
- Author
Joyce, Brendan, Lee, Danny, Rubio, Alex, Ogurtsov, Aleksey, Alves, Gelio, and Yu, Yi-Kuo
- Subjects
GRAPHICAL user interfaces, PROTEOMICS, ISOMERS, MOLECULAR biology, MACHINE learning
- Abstract
Objective: RAId is a software package that has been actively developed for the past 10 years for computationally and visually analyzing MS/MS data. Founded on rigorous statistical methods, RAId’s core program computes accurate E-values for peptides and proteins identified during database searches. Making this robust tool readily accessible for the proteomics community by developing a graphical user interface (GUI) is our main goal here. Results: We have constructed a graphical user interface to facilitate the use of RAId on users’ local machines. Written in Java, RAId_GUI not only makes easy executions of RAId but also provides tools for data/spectra visualization, MS-product analysis, molecular isotopic distribution analysis, and graphing the retrieval versus the proportion of false discoveries. The results viewer displays and allows the users to download the analyses results. Both the knowledge-integrated organismal databases and the code package (containing source code, the graphical user interface, and a user manual) are available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads/raid.html. [ABSTRACT FROM AUTHOR]
- Published
- 2018
23. Discovery of depression-associated factors among childhood trauma victims from a large sample size: Using machine learning and network analysis.
- Author
Jin, Yu, Xu, Shicun, Shao, Zhixian, Luo, Xianyu, Wang, Yinzhe, Yu, Yi, and Wang, Yuanyuan
- Subjects
*ANXIETY disorders, *ADVERSE childhood experiences, *SLEEP quality, *MACHINE learning, *POST-traumatic stress disorder, *SOCIAL anxiety
- Abstract
Experiences of childhood trauma (CT) can lead to serious mental problems, especially depression. Therefore, it becomes crucial to identify influential factors related to depression and explore their associations. The objectives were to 1) identify critical depression-related factors using the extreme gradient boosting (XGBoost) method from large-scale survey data; 2) explore associations between these factors for targeted interventions and treatments. A large-scale epidemiological study covering 63 universities was conducted in Jilin Province, China. The XGBoost model was trained and tested to classify young adults with CT experiences who had or did not have depression (N = 27,671). The essential factors were selected by SHapley Additive exPlanations (SHAP) values. Multiple logistic regression analyses were conducted for validation. The associations between these depression-related factors were further explored using network analysis. The XGBoost model selected the top 10 features associated with depression with satisfactory performance (AUC = 0.91; sensitivity = 0.88 and specificity = 0.76). These factors significantly differed between depression and non-depression groups (p < 0.001). There are strong positive associations between anxiety and obsessive-compulsive disorder (OCD), anxiety and post-traumatic stress disorder (PTSD), social anxiety disorder (SAD) and appearance anxiety, and negative associations between sleep quality and anxiety, sleep quality and PTSD among CT participants with depression. The cross-sectional design cannot establish causality, and biases in self-report measurements cannot be ignored. The XGBoost model and network analysis were useful methods for discovering and understanding depression-related factors in this epidemiological study. Moreover, these essential factors could offer insights into future interventions and treatments for depressed young adults with CT experiences.
• The XGBoost model identified the top 10 depression-related factors, including anxiety, sleep quality, PTSD, loneliness, and OCD, among others.
• These selected factors exhibited variations between the depression and non-depression groups.
• Specifically, anxiety emerged as the leading risk factor for depression.
• Furthermore, among CT participants with depression, strong positive associations were observed between anxiety, OCD, PTSD, and SAD. [ABSTRACT FROM AUTHOR]
- Published
- 2024
24. Hybrid particle swarm optimization and semi-supervised extreme learning machine for cellular network localization.
- Author
Liu, Fagui, Qin, Hengrui, Yang, Xin, and Yu, Yi
- Subjects
MACHINE learning, NETWORK analysis (Communication), CELL phone systems, CELL phone tracking, PARTICLE swarm optimization
- Abstract
Research on localization technology based on received signal strength and machine learning has recently attracted considerable attention, since with enough labeled training data this technology can achieve high positioning accuracy. However, collecting enough labeled training data in broad outdoor spaces is expensive. To reduce the cost of building and maintaining the training database, a semi-supervised extreme learning machine is applied to cellular network localization in this article. However, the performance of this algorithm is sensitive to the values of its hyperparameters. Without any systematic guidance, the optimal hyperparameters can only be selected by experienced workers through trial and error. To address this problem, we propose a novel algorithm that combines particle swarm optimization with the semi-supervised extreme learning machine to select its optimal hyperparameters automatically. The experiments demonstrate that applying particle swarm optimization in our optimization framework makes the hyperparameters of the semi-supervised extreme learning machine algorithm self-adaptive under different conditions. Moreover, the proposed method is more stable than the general semi-supervised extreme learning machine and outperforms the other compared methods. [ABSTRACT FROM AUTHOR]
- Published
- 2017
25. Experimental study and Random Forest prediction model of microbiome cell surface hydrophobicity.
- Author
Liu, Yong, Tang, Shaoxun, Fernandez-Lozano, Carlos, Munteanu, Cristian R., Pazos, Alejandro, Yu, Yi-zun, Tan, Zhiliang, and González-Díaz, Humberto
- Subjects
*RANDOM forest algorithms, *PREDICTION models, *CELL membranes, *MICROBIAL adhesion, *BIOFILMS, *BIOINFORMATICS
- Abstract
Cell surface hydrophobicity (CSH) is an assessable physicochemical property used to evaluate microbial adhesion to the surface of biomaterials, which is an essential step in microbial biofilm formation and pathogenesis. For the present in vitro fermentation experiment, the CSH of ruminal mixed microbes was considered, along with other data records of pH, ammonia-nitrogen concentration, and neutral detergent fibre digestibility, and conditions of surface tension and specific surface area in two different time scales. A dataset of 170,707 perturbations of input variables, grouped into two blocks of data, was constructed. Next, Expected Measurement Moving Average – Machine Learning (EMMA-ML) models were developed in order to predict CSH after perturbations of all input variables. EMMA-ML is a Perturbation Theory method that combines the ideas of Expected Measurement, Box-Jenkins Operators/Moving Average, and Time Series Analysis. Seven regression methods have been tested: Multiple Linear regression, Generalized Linear Model with Stepwise Feature Selection, Partial Least Squares regression, Lasso regression, Elastic Net regression, Neural Networks regression, and Random Forests (RF). The best regression performance has been obtained with RF (EMMA-RF model) with an R-squared of 0.992. The model analysis has shown that CSH values were highly dependent on the in vitro fermentation parameters of detergent fibre digestibility, ammonia-nitrogen concentration, and the expected values of cell surface hydrophobicity in the first time scale. [ABSTRACT FROM AUTHOR]
- Published
- 2017
26. Intelligent short-term forecasting for mud concentration in CSD dredging construction.
- Author
Han, Shuai, Li, Heng, Li, Mingchao, Tian, Huijing, Qin, Liang, Yu, Yi, and Ma, Jie
- Subjects
*MUD, *LOAD forecasting (Electric power systems), *DREDGING, *STATISTICAL smoothing, *FEATURE selection, *FORECASTING
- Abstract
It has long been a challenge in the design of cutter suction dredgers (CSD) that the measurement of mud concentration lags behind other indicators like vacuum and flow rate. This phenomenon is named the time-lag effect. It significantly hinders the optimization and automation of CSD operation, because the mud concentration is a primary reference during CSD operation. As such, this study presented a data mining-based solution for short-term forecasting of mud concentrations. To accomplish this goal, first, an integration method based on hydraulics principles was proposed to align mud concentrations with other indicators over time. Second, normal construction identification, data filtering, and data smoothing were used to preprocess the data. Third, a feature selection process, named "indicator class", was applied with regard to the construction technology and the characteristics of the pump-pipeline system of CSDs. Last, a short-term forecasting algorithm was presented based on a hybrid ensemble strategy; this established the relationship between the actual mud concentration and the selected indicators. The validity of this algorithm was tested by a case study that collected a group of raw data logged by a CSD monitoring system. The result showed that the real-time prediction could reach a high accuracy, with a coefficient of determination (R²) of 0.886 and a mean square error (MSE) of 4.708, and that the short-term forecasting result within 12 s is consistent with the actual trend of the mud concentration. The proposed approach has significant potential to enhance the quality and efficiency of operations by providing operators with essential references to real-time mud concentrations.
• The "time-lag effect" of the measured mud concentration of CSD construction is addressed.
• A framework using real-time monitored indicators to forecast mud concentration is proposed.
• The key techniques include indicator selection, time alignment, and algorithm forecasting.
• A hybrid algorithm is proposed to fit the relationship between mud concentration and key indicators.
• The effectiveness of the real-time mode and short-term mode of the proposed method is verified. [ABSTRACT FROM AUTHOR]
- Published
- 2022