521 results for "pre-trained model"
Search Results
2. PLPMpro: Enhancing promoter sequence prediction with prompt-learning based pre-trained language model
- Author
-
Li, Zhongshen, Jin, Junru, Long, Wentao, and Wei, Leyi
- Published
- 2023
- Full Text
- View/download PDF
3. Rm-LR: A long-range-based deep learning model for predicting multiple types of RNA modifications
- Author
-
Liang, Sirui, Zhao, Yanxi, Jin, Junru, Qiao, Jianbo, Wang, Ding, Wang, Yu, and Wei, Leyi
- Published
- 2023
- Full Text
- View/download PDF
4. HyPRETo: Hybrid Pre-trained Ontology Approach for Contextual Relation Classification on Mosquito Vector Biocontrol Agents
- Author
-
Jeyakodi, G., Bala, P. Shanthi, Chandrabose, Aravindan, editor, and Fernando, Xavier, editor
- Published
- 2025
- Full Text
- View/download PDF
5. Acoustic Classification of Bird Species Using Improved Pre-trained Models
- Author
-
Xie, Jie, Zhu, Mingying, Colonna, Juan Gabriel, Hadfi, Rafik, editor, Anthony, Patricia, editor, Sharma, Alok, editor, Ito, Takayuki, editor, and Bai, Quan, editor
- Published
- 2025
- Full Text
- View/download PDF
6. Bridging Modality Gap for Visual Grounding with Effective Cross-Modal Distillation
- Author
-
Wang, Jiaxi, Hu, Wenhui, Liu, Xueyang, Wu, Beihu, Qiu, Yuting, Cai, YingYing, Lin, Zhouchen, editor, Cheng, Ming-Ming, editor, He, Ran, editor, Ubul, Kurban, editor, Silamu, Wushouer, editor, Zha, Hongbin, editor, Zhou, Jie, editor, and Liu, Cheng-Lin, editor
- Published
- 2025
- Full Text
- View/download PDF
7. MRI Image-Based Brain Tumor Classification Using Transfer Learning and XAI
- Author
-
Rayhan, Masum, Mondal, Saykat, Pinki, Farhana Tazmim, Mahmud, Mufti, editor, Kaiser, M. Shamim, editor, Bandyopadhyay, Anirban, editor, Ray, Kanad, editor, and Al Mamun, Shamim, editor
- Published
- 2025
- Full Text
- View/download PDF
8. Ensemble Learning Approaches for Alzheimer’s Disease Classification in Brain Imaging Data
- Author
-
Mahmud, Tanjim, Aziz, Mohammad Tarek, Uddin, Mohammad Kamal, Barua, Koushick, Rahman, Taohidur, Sharmen, Nahed, Shamim Kaiser, M., Sazzad Hossain, Md., Hossain, Mohammad Shahadat, Andersson, Karl, Mahmud, Mufti, editor, Kaiser, M. Shamim, editor, Bandyopadhyay, Anirban, editor, Ray, Kanad, editor, and Al Mamun, Shamim, editor
- Published
- 2025
- Full Text
- View/download PDF
9. Anatomical Embedding-Based Training Method for Medical Image Segmentation Foundation Models
- Author
-
Zhuang, Mingrui, Xu, Rui, Zhang, Qinhe, Liu, Ailian, Fan, Xin, Wang, Hongkai, Deng, Zhongying, editor, Shen, Yiqing, editor, Kim, Hyunwoo J., editor, Jeong, Won-Ki, editor, Aviles-Rivero, Angelica I., editor, He, Junjun, editor, and Zhang, Shaoting, editor
- Published
- 2025
- Full Text
- View/download PDF
10. FPLGen: A Personalized Dialogue System Based on Feature Prompt Learning
- Author
-
Chu, Yuxing, Huang, Ke, Li, Yichen, Zhu, Hao, Li, Peiran, Zhang, Menghua, Zhang, Haijun, editor, Li, Xianxian, editor, Hao, Tianyong, editor, Meng, Weizhi, editor, Wu, Zhou, editor, and He, Qian, editor
- Published
- 2025
- Full Text
- View/download PDF
11. Multi-modal Mood Reader: Pre-trained Model Empowers Cross-Subject Emotion Recognition
- Author
-
Dong, Yihang, Chen, Xuhang, Shen, Yanyan, Ng, Michael Kwok-Po, Qian, Tao, Wang, Shuqiang, Zhang, Haijun, editor, Li, Xianxian, editor, Hao, Tianyong, editor, Meng, Weizhi, editor, Wu, Zhou, editor, and He, Qian, editor
- Published
- 2025
- Full Text
- View/download PDF
12. Clothes Image Retrieval via Learnable FashionCLIP
- Author
-
Sun, Yuan, Zhao, Mingbo, Zhang, Haijun, editor, Li, Xianxian, editor, Hao, Tianyong, editor, Meng, Weizhi, editor, Wu, Zhou, editor, and He, Qian, editor
- Published
- 2025
- Full Text
- View/download PDF
13. Surface defect prediction on printed circuit boards using a novel deep learning model with spatial and channel attention-based DenseNet.
- Author
-
Samuthiram, Muppudathi Sutha and Vanamamalai, Rama Subra Mani
- Abstract
Electronic components are connected using printed circuit boards (PCBs), whose fabrication is the most essential stage in electronic product manufacturing. Even a minor defect in a PCB can render the final product inoperable. Hence, careful and meticulous defect detection stages are essential and indispensable in the PCB manufacturing process. This paper proposes an optimal deep learning (DL) system with an effective pre-trained feature learning mechanism to detect surface defects on PCBs. The system first performs preprocessing, applying contrast-limited adaptive histogram equalization (CLAHE) to enhance image contrast and an adaptive median filter (AMF) to suppress noise. Then, the class imbalance problem is addressed using the k-means synthetic minority over-sampling technique (KM-SMOTE). After that, the important discriminative features are extracted using the spatial and channel attention-based DenseNet-121 (SCDSNT121). Finally, the defect classes are classified using the reptile-optimized gated recurrent unit (ROGRU). The system is trained on six classes of PCB images from the publicly available DeepPCB dataset and achieves an accuracy of 99.12%. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
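The class-rebalancing step described in the abstract above can be illustrated with a minimal SMOTE-style oversampler in NumPy. This is a simplified sketch: the paper's KM-SMOTE additionally clusters the minority class with k-means before interpolating, and `smote_oversample` is a hypothetical helper name, not the authors' code.

```python
import numpy as np

def smote_oversample(minority, n_new, k=3, rng=None):
    """SMOTE-style oversampling: synthesize new minority samples by
    interpolating between a sample and one of its k nearest neighbours."""
    rng = np.random.default_rng(rng)
    minority = np.asarray(minority, dtype=float)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        # distances from sample i to all other minority samples
        d = np.linalg.norm(minority - minority[i], axis=1)
        d[i] = np.inf                      # exclude the sample itself
        neighbours = np.argsort(d)[:k]     # indices of k nearest neighbours
        j = rng.choice(neighbours)
        gap = rng.random()                 # interpolation factor in [0, 1)
        synthetic.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(synthetic)
```

Because each synthetic point is a convex combination of two real minority samples, it always lies on the segment between them.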
14. Information extraction from green channel textual records on expressways using hybrid deep learning.
- Author
-
Chen, Jiaona, Zhang, Jing, Tao, Weijun, Jin, Yinli, and Fan, Heng
- Subjects
MACHINE learning, NATURAL language processing, DEEP learning, TRANSPORTATION policy, DATA mining, FARM produce
- Abstract
The expressway green channel is an essential transportation policy for moving fresh agricultural products in China. To extract knowledge from the various records, this study presents a cutting-edge approach for extracting information from textual records of failure cases in the vertical field of the expressway green channel. We propose a hybrid approach based on BIO labeling, pre-trained models, deep learning, and CRF to build a named entity recognition (NER) model with optimal prediction performance. Eight entities are designed for the expressway green channel NER task. Three typical pre-trained natural language processing models are used and compared to recognize entities and obtain feature vectors: bidirectional encoder representations from transformers (BERT), ALBERT, and RoBERTa. An ablation experiment is performed to analyze the influence of each factor on the proposed models. Using survey data from the expressway green channel management system in Shaanxi Province, China, the experimental results show that the precision, recall, and F1-score of the RoBERTa-BiGRU-CRF model are 93.04%, 92.99%, and 92.99%, respectively. As a result, it is found that the text features extracted through pre-training substantially enhance the prediction accuracy of deep learning algorithms. Surprisingly, the RoBERTa model is highly effective for the expressway green channel NER task. This study provides a timely and necessary knowledge extraction on the expressway green channel from textual data, offering a systematic explanation of failure cases and valuable insights for future research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
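The BIO labeling scheme mentioned in the abstract above can be decoded into entity spans with a short helper. A minimal sketch, using made-up entity types for illustration (the paper's eight entity types are not listed in the abstract):

```python
def bio_to_entities(tokens, tags):
    """Decode BIO tags (e.g. 'B-PROD', 'I-PROD', 'O') into (type, text) spans."""
    entities, current, etype = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((etype, " ".join(current)))
            current, etype = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == etype:
            current.append(token)
        else:  # 'O' or an ill-formed continuation ends the current entity
            if current:
                entities.append((etype, " ".join(current)))
            current, etype = [], None
    if current:
        entities.append((etype, " ".join(current)))
    return entities
```

For example, tags `["B-PROD", "I-PROD", "O", "B-LOC"]` over four tokens yield one product entity and one location entity.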
15. Patch-Wise-Based Self-Supervised Learning for Anomaly Detection on Multivariate Time Series Data.
- Author
-
Oh, Seungmin, Anh, Le Hoang, Vu, Dang Thanh, Yu, Gwang Hyun, Hahn, Minsoo, and Kim, Jinsul
- Subjects
TIME series analysis
- Abstract
Multivariate time series anomaly detection is a crucial technology to prevent unexpected errors from causing critical impacts. Effective anomaly detection in such data requires accurately capturing temporal patterns and ensuring the availability of adequate data. This study proposes a patch-wise framework for anomaly detection. The proposed approach comprises four key components: (i) maintaining continuous features through patching, (ii) incorporating various temporal information by learning channel dependencies and adding relative positional bias, (iii) achieving feature representation learning through self-supervised learning, and (iv) supervised learning based on anomaly augmentation for downstream tasks. The proposed method demonstrates strong anomaly detection performance by leveraging patching to maintain temporal continuity while effectively learning data representations and handling downstream tasks. Additionally, it mitigates the issue of insufficient anomaly data by supporting the learning of diverse types of anomalies. The experimental results show that our model achieved a 23% to 205% improvement in the F1 score compared to existing methods on datasets such as MSL, which has a relatively small amount of training data. Furthermore, the model also delivered a competitive performance on the SMAP dataset. By systematically learning both local and global dependencies, the proposed method strikes an effective balance between feature representation and anomaly detection accuracy, making it a valuable tool for real-world multivariate time series applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
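The patching step described in the abstract above, splitting a multivariate series into fixed-length windows while preserving temporal continuity, can be sketched in NumPy. The patch length and stride here are illustrative, not the paper's settings:

```python
import numpy as np

def patch_series(x, patch_len, stride):
    """Split a multivariate series x of shape (time, channels) into
    windows of shape (n_patches, patch_len, channels)."""
    x = np.asarray(x)
    t = x.shape[0]
    starts = range(0, t - patch_len + 1, stride)
    return np.stack([x[s:s + patch_len] for s in starts])
```

With a stride smaller than the patch length, consecutive patches overlap, which keeps features continuous across patch boundaries.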
16. Ulcerative Colitis, LAIR1 and TOX2 Expression, and Colorectal Cancer Deep Learning Image Classification Using Convolutional Neural Networks.
- Author
-
Carreras, Joaquim, Roncador, Giovanna, and Hamoudi, Rifat
- Abstract
Simple Summary: Inflammatory bowel disease includes ulcerative colitis and Crohn's disease. Ulcerative colitis affects the colon; its pathogenesis involves genetic susceptibility, microbes, and immune dysregulation, and a higher risk of colorectal cancer. This study classified images of ulcerative colitis using deep learning. A dataset was created to process images of the large intestine capturing the three diagnoses of ulcerative colitis, colorectal cancer (adenocarcinoma), and normal colon. The convolutional neural network (CNN) was trained to classify the images into three diagnostic classes, and the performance was tested on an independent dataset. The gradient-weighted class activation mapping (Grad-CAM) heatmap technique was used to understand the classification decisions. Finally, LAIR1 and TOX2 expressions were analyzed in the ulcerative colitis cases. In conclusion, the network classified the three diagnoses with high performance, and LAIR1 and TOX2 were found to correlate with the severity of ulcerative colitis. Background: Ulcerative colitis is a chronic inflammatory bowel disease of the colon mucosa associated with a higher risk of colorectal cancer. Objective: This study classified hematoxylin and eosin (H&E) histological images of ulcerative colitis, normal colon, and colorectal cancer using artificial intelligence (deep learning). Methods: A convolutional neural network (CNN) was designed and trained to classify the three types of diagnosis, including 35 cases of ulcerative colitis (n = 9281 patches), 21 colon control (n = 12,246), and 18 colorectal cancer (n = 63,725). The data were partitioned into training (70%) and validation sets (10%) for training the network, and a test set (20%) to test the performance on the new data. The CNNs included transfer learning from ResNet-18, and a comparison with other CNN models was performed. 
Explainable artificial intelligence for computer vision was used with the Grad-CAM technique, and additional LAIR1 and TOX2 immunohistochemistry was performed in ulcerative colitis to analyze the immune microenvironment. Results: Conventional clinicopathological analysis showed that steroid-requiring ulcerative colitis was characterized by higher endoscopic Baron and histologic Geboes scores and LAIR1 expression in the lamina propria, but lower TOX2 expression in isolated lymphoid follicles (all p values < 0.05) compared to mesalazine-responsive ulcerative colitis. The CNN classification accuracy was 99.1% for ulcerative colitis, 99.8% for colorectal cancer, and 99.1% for colon control. The Grad-CAM heatmap confirmed which regions of the images were the most important. The CNNs also differentiated between steroid-requiring and mesalazine-responsive ulcerative colitis based on H&E, LAIR1, and TOX2 staining. An additional 10 new cases of colorectal cancer (adenocarcinoma) were correctly classified. Conclusions: CNNs are especially suited for image classification in conditions such as ulcerative colitis and colorectal cancer; LAIR1 and TOX2 are relevant immuno-oncology markers in ulcerative colitis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
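The 70/10/20 train/validation/test partition described in the abstract above can be sketched as a seeded shuffle-and-slice. A minimal sketch; the paper partitions image patches, and `split_patches` is an illustrative helper, not the authors' code:

```python
import random

def split_patches(items, train=0.7, val=0.1, seed=42):
    """Shuffle and partition items into train/validation/test splits;
    the remaining 1 - train - val fraction becomes the test set."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(n * train)
    n_val = int(n * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

Fixing the seed makes the split reproducible across runs, which matters when comparing CNN architectures on the same data.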
17. Substation Abnormal Scene Recognition Based on Two-Stage Contrastive Learning.
- Author
-
Liu, Shanfeng, Su, Haitao, Mao, Wandeng, Li, Miaomiao, Zhang, Jun, and Bao, Hua
- Subjects
MACHINE learning, IMAGE recognition (Computer vision), DATA augmentation, CLASSIFICATION algorithms, WORKPIECES
- Abstract
Substations are an important part of the power system, and the classification of abnormal substation scenes needs to be comprehensive and reliable. The abnormal scenes involve multiple workpieces, such as the main transformer body, insulators, dials, and box doors. In this research field, the scarcity of abnormal scene data in substations poses a significant challenge. To address this, we propose a few-shot learning algorithm based on two-stage contrastive learning. In the first stage of model training, global and local contrastive learning losses are introduced, and images are transformed through extensive data augmentation to build a pre-trained model. The pre-trained model is then fine-tuned based on the contrast and classification losses of image pairs to identify abnormal substation scenes. By collecting abnormal substation images in real scenes, we create a few-shot learning dataset for abnormal substation scenes. Experimental results on the dataset demonstrate that our proposed method outperforms state-of-the-art few-shot learning algorithms in classification accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
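The contrastive pre-training stage described in the abstract above relies on losses that pull two augmented views of the same image together and push other images apart. A minimal NumPy sketch of a standard InfoNCE-style loss of this kind, not the paper's exact global/local formulation:

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.5):
    """InfoNCE-style contrastive loss for two views z1, z2 of the same
    batch: row i of z1 should match row i of z2 and repel other rows."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature          # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positive pairs sit on the diagonal of the similarity matrix
    return -np.mean(np.diag(log_prob))
```

The loss is low when matching views are the most similar rows, and grows when a view is closer to some other image's embedding.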
18. TExCNN: Leveraging Pre-Trained Models to Predict Gene Expression from Genomic Sequences.
- Author
-
Dong, Guohao, Wu, Yuqian, Huang, Lan, Li, Fei, and Zhou, Fengfeng
- Subjects
CONVOLUTIONAL neural networks, GENE expression, NUCLEOTIDE sequence, DNA sequencing, PREDICTION models, DEEP learning
- Abstract
Background/Objectives: Understanding the relationship between DNA sequences and gene expression levels is of significant biological importance. Recent advancements have demonstrated the ability of deep learning to predict gene expression levels directly from genomic data. However, traditional methods are limited by basic word encoding techniques, which fail to capture the inherent features and patterns of DNA sequences. Methods: We introduce TExCNN, a novel framework that integrates the pre-trained models DNABERT and DNABERT-2 to generate word embeddings for DNA sequences. We partitioned the DNA sequences into manageable segments and computed their respective embeddings using the pre-trained models. These embeddings were then used as inputs to our deep learning framework, which was based on a convolutional neural network. Results: TExCNN outperformed current state-of-the-art models, achieving an average R2 score of 0.622, compared to the 0.596 achieved by the DeepLncLoc model, which is based on the Word2Vec model and a text convolutional neural network. Furthermore, when the sequence length was extended from 10,500 bp to 50,000 bp, TExCNN achieved an even higher average R2 score of 0.639. The prediction accuracy improved further when additional biological features were incorporated. Conclusions: Our experimental results demonstrate that the use of pre-trained models for word embedding generation significantly improves the accuracy of gene expression prediction. The proposed TExCNN pipeline performs optimally with longer DNA sequences and is adaptable for both cell-type-independent and cell-type-dependent predictions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
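The segmentation step in the abstract above, partitioning long DNA sequences into manageable segments before embedding, can be sketched as a simple chunker. The 512-base default is an assumption (a typical encoder input window), not necessarily the paper's setting:

```python
def segment_sequence(seq, segment_len=512, overlap=0):
    """Partition a DNA sequence into fixed-length segments so each fits a
    pre-trained encoder's input window; the last short segment is kept."""
    step = segment_len - overlap
    return [seq[i:i + segment_len] for i in range(0, len(seq), step)]
```

Each segment would then be embedded separately (by DNABERT or DNABERT-2 in the paper) and the embeddings stacked as input to the downstream CNN.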
19. A Survey of Stance Detection for Social Media.
- Author
-
Zhao, Xiaobing, Yin, Zhaoning, Wang, Zihao, Zhang, Yuanshuo, and Chen, Bo
- Subjects
MACHINE learning, DECISION making, INTERNET
- Abstract
With the continuous development of the Internet, people widely use social media platforms such as Weibo and Twitter, generating a huge amount of user-generated content every day. Analyzing users' attitudes toward hot or focused topics based on this content is important, as it can help relevant stakeholders make informed decisions. The goal of the stance detection task is therefore to determine the user's stance (favor/against/neutral) toward a specified target from the given content. This paper describes stance detection tasks, applications, related data resources, and related works. For stance detection tasks, in addition to the earlier single-target, multi-target, and cross-target stance detection tasks, this paper also organizes work on zero-shot and few-shot stance detection. For data resources, it gives a detailed introduction to the data resources published in recent years. For stance detection methods, in addition to traditional machine learning methods and neural networks, it also reviews methods based on pre-trained models. Finally, it summarizes the development status of stance detection and looks ahead to possible research hotspots in the future. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Transgaze: exploring plain vision transformers for gaze estimation.
- Author
-
Ye, Lang, Wang, Xinggang, Yao, Jingfeng, and Liu, Wenyu
- Abstract
Recently, plain vision transformers (ViTs) have shown impressive performance in various computer vision tasks due to their powerful modeling capabilities and large-scale pre-training. However, they have yet to show excellent results in gaze estimation tasks. In this paper, we take advanced vision transformers further into the task of gaze estimation (TransGaze). Our framework adeptly integrates the distinctive local features of the eyes while maintaining a simple and flexible structure, and it can seamlessly adapt to various large-scale pre-trained models, enhancing its versatility and applicability in different contexts. It first demonstrates that pre-trained ViTs can also show strong capabilities on gaze estimation tasks. Our approach employs the following strategies: (i) enhancing the self-attention module among facial feature maps through straightforward token manipulation, effectively achieving complex feature fusion, a feat previously requiring more intricate methods; (ii) leveraging the simplicity of TransGaze and the inherent adaptability of plain ViTs, we introduce a pre-trained model for gaze estimation that reduces training time by over 50% and exhibits strong generalization performance. We evaluate TransGaze on the GazeCapture and MPIIFaceGaze datasets and achieve state-of-the-art performance with lower training costs. Our models and code will be available. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. Classification of Monkeypox Using Greylag Goose Optimization (GGO) Algorithm.
- Author
-
Eslam, Ahmed, Abdelfattah, Mohamed G., El-Kenawy, El-Sayed M., and El-Din Moustafa, Hossam
- Subjects
METAHEURISTIC algorithms, OPTIMIZATION algorithms, GREY Wolf Optimizer algorithm, CONVOLUTIONAL neural networks, MONKEYPOX
- Abstract
After the COVID-19 epidemic, public health awareness increased. A viral skin disease known as monkeypox sparked an emergency alert, leading to numerous reports of infections across many European countries. Common symptoms of this disease are fever, high temperatures, and water-filled blisters. This paper presents a recent algorithm based on a metaheuristic framework. To improve the performance of monkeypox classification, we introduce the GGO algorithm. First, we employ four pre-trained models (AlexNet, GoogleNet, ResNet-50, and VGG-19) to extract the most common features of monkeypox skin image disease (MSID). Then, we reduce the number of extracted features to select the most distinguishing features for the disease. We do this by using GGO in binary form, which achieves an average fitness of 0.60068 and a best fitness of 0.50248. Lastly, we apply various optimization algorithms, including the waterwheel plant algorithm (WWPA), boosted dipper throated optimization (DTO), the particle swarm optimizer (PSO), the whale optimization algorithm (WAO), the grey wolf optimizer (GWO), the firefly algorithm (FA), and the GGO algorithm, all based on a convolutional neural network (CNN), to achieve the best performance. The best performance, measured by accuracy and sensitivity, reached 0.9919 and 0.9895 with GGO. A rigorous statistical analysis was applied to confirm the validity of our findings. We applied analysis of variance (ANOVA) and Wilcoxon signed-rank tests, and the results indicated that the p value was less than 0.005, which strongly supports our hypothesis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
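Wrapper-style binary feature selection of the kind GGO performs above is usually scored by a fitness that trades classification error against the fraction of features kept. A hedged sketch, assuming the common alpha-weighted form; the exact fitness the paper optimizes is not given in the abstract:

```python
import numpy as np

def selection_fitness(mask, error_rate, alpha=0.99):
    """Fitness for binary feature selection (lower is better): weigh the
    classifier's error rate against the fraction of features kept."""
    mask = np.asarray(mask)
    if mask.sum() == 0:          # selecting no features is worst-case
        return 1.0
    ratio = mask.sum() / mask.size
    return alpha * error_rate + (1 - alpha) * ratio
```

At equal error rates, a mask keeping fewer features scores better, steering the optimizer toward compact, distinguishing feature subsets.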
22. Information extraction from green channel textual records on expressways using hybrid deep learning
- Author
-
Jiaona Chen, Jing Zhang, Weijun Tao, Yinli Jin, and Heng Fan
- Subjects
Expressway green channel, Named entity recognition, BIO labeling, Pre-trained model, Deep learning, Medicine, Science
- Abstract
The expressway green channel is an essential transportation policy for moving fresh agricultural products in China. To extract knowledge from the various records, this study presents a cutting-edge approach for extracting information from textual records of failure cases in the vertical field of the expressway green channel. We propose a hybrid approach based on BIO labeling, pre-trained models, deep learning, and CRF to build a named entity recognition (NER) model with optimal prediction performance. Eight entities are designed for the expressway green channel NER task. Three typical pre-trained natural language processing models are used and compared to recognize entities and obtain feature vectors: bidirectional encoder representations from transformers (BERT), ALBERT, and RoBERTa. An ablation experiment is performed to analyze the influence of each factor on the proposed models. Using survey data from the expressway green channel management system in Shaanxi Province, China, the experimental results show that the precision, recall, and F1-score of the RoBERTa-BiGRU-CRF model are 93.04%, 92.99%, and 92.99%, respectively. As a result, it is found that the text features extracted through pre-training substantially enhance the prediction accuracy of deep learning algorithms. Surprisingly, the RoBERTa model is highly effective for the expressway green channel NER task. This study provides a timely and necessary knowledge extraction on the expressway green channel from textual data, offering a systematic explanation of failure cases and valuable insights for future research.
- Published
- 2024
- Full Text
- View/download PDF
23. Comparative Analysis of Pre-trained CNN Models on Retinal Diseases Classification
- Author
-
Theodore Alvin Hartanto and Seng Hansun
- Subjects
classification, comparative analysis, deep learning, optical coherence tomography, pre-trained model, retinal, Technology
- Abstract
One method to diagnose retinal diseases is using optical coherence tomography (OCT) scans. Annually, an estimated 30 million OCT scans are performed worldwide. However, analyzing and diagnosing OCT scan results takes an ophthalmologist a long time, so machine learning, especially deep learning, can be utilized to shorten the diagnosis process and speed up treatment. In this study, several pre-trained deep learning models are compared: EfficientNet-B0, ResNet-50V2, Inception-V3, and DenseNet-169. These models are fine-tuned and trained on a dataset of OCT scan images to classify four retinal conditions: choroidal neovascularization (CNV), diabetic macular edema (DME), drusen, and normal. The trained models are then tested on the test set, and the results are evaluated using a confusion matrix in terms of accuracy, recall, precision, and F1-score. The results show that the best model in the batch size of 32 scenario is ResNet-50V2, with an accuracy of 98.24%, precision of 98.25%, recall of 98.24%, and F1-score of 98.24%. For the batch size of 64, EfficientNet-B0 gives the best classification results, with an accuracy of 96.59%, precision of 96.84%, recall of 96.59%, and F1-score of 96.59%.
- Published
- 2024
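The confusion-matrix evaluation described in the abstract above (accuracy plus macro-averaged precision, recall, and F1) can be computed as follows. A minimal sketch; it assumes every class appears at least once among both the true rows and the predicted columns so no division by zero occurs:

```python
import numpy as np

def macro_metrics(cm):
    """Accuracy and macro-averaged precision/recall/F1 from a confusion
    matrix cm where cm[i, j] = count of true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    accuracy = np.trace(cm) / cm.sum()
    precision = np.diag(cm) / cm.sum(axis=0)   # per predicted class
    recall = np.diag(cm) / cm.sum(axis=1)      # per true class
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision.mean(), recall.mean(), f1.mean()
```

Macro averaging weights the four retinal classes equally, so a rare class counts as much as a common one in the reported score.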
24. PVII: A pedestrian-vehicle interactive and iterative prediction framework for pedestrian's trajectory.
- Author
-
Shen, Qianwen, Huang, Shien, Sun, Baixi, Chen, Xinyu, Tao, Dingwen, Wan, Huaiyu, and Bao, Ergude
- Subjects
TRAFFIC safety, TRAFFIC accidents, CONTINUOUS processing, PREDICTION models, ACQUISITION of data, PEDESTRIANS, PEDESTRIAN accidents
- Abstract
Advanced driving assistance systems (ADAS) can predict a pedestrian's trajectory in order to avoid traffic accidents and guarantee driving safety. A few current pedestrian trajectory prediction methods use a pedestrian's historical motion to predict the future trajectory, but the pedestrian's trajectory is also affected by the vehicle using the ADAS for prediction (the target vehicle). Other studies predict the pedestrian's and vehicle's trajectories separately and use the latter to adjust the former, but their interaction is a continuous process and should be considered during prediction rather than after it. Therefore, we propose PVII, a pedestrian-vehicle interactive and iterative prediction framework for pedestrian trajectories. It makes predictions for one iteration based on the results from the previous iteration, which essentially models the vehicle-pedestrian interaction. In this iterative framework, to avoid accumulation of prediction errors over the iterations, we design a bi-layer Bayesian en/decoder. For each iteration, it uses not only the inaccurate results from the previous iteration but also accurate historical data for prediction, and it calculates Bayesian uncertainty values to evaluate the results. In addition, the pedestrian's trajectory is affected by both the target vehicle and other vehicles around it (surrounding vehicles), so we include in the framework a pre-trained speed estimation module for surrounding vehicles (SE module). It estimates speed based on the pedestrian's motion, and we collect data from the pedestrian's view for training. In experiments, PVII achieves higher prediction accuracy than current methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. A Survey of Bias Research on Large Language Models.
- Author
-
Xu, Lei, Hu, Yahao, and Pan, Zhisong
- Subjects
LANGUAGE models, NATURAL language processing, PROCESS capability, ORIGIN of languages, NATURAL languages
- Abstract
The phenomenon of bias exists widely in human society and is typically manifested through natural language. Traditional bias studies have mainly focused on static word embedding models, but with the continuous evolution of natural language processing technology, research has gradually shifted toward pre-trained models with stronger contextual processing capabilities. As a further development of pre-trained models, large language models have been widely deployed in many applications due to their remarkable performance and broad prospects, yet they may still capture social biases from unprocessed training data and propagate these biases to downstream tasks. Biased large language model systems can cause adverse social impacts and other potential harms. Therefore, there is an urgent need for further exploration of bias in large language models. This paper discusses the origins of bias in natural language processing and analyzes and summarizes the development of bias evaluation and mitigation methods from word embedding models to current large language models, aiming to provide valuable references for future related research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. BERT-like Pre-training for Symbolic Piano Music Classification Tasks.
- Author
-
Yi-Hui Chou, I-Chun Chen, Chin-Jui Chang, Joann Ching, and Yi-Hsuan Yang
- Subjects
LANGUAGE models, TRANSFORMER models, RECURRENT neural networks, PIANO music, MUSIC scores
- Abstract
This article presents a benchmark study of symbolic piano music classification using the masked language modelling approach of Bidirectional Encoder Representations from Transformers (BERT). Specifically, we consider two types of MIDI data: MIDI scores, which are musical scores rendered directly into MIDI with no dynamics and precisely aligned with the metrical grids notated by their composers, and MIDI performances, which are MIDI encodings of human performances of musical scoresheets. With five public-domain datasets of single-track piano MIDI files, we pre-train two 12-layer Transformer models using the BERT approach, one for MIDI scores and the other for MIDI performances, and fine-tune them for four downstream classification tasks. These include two note-level classification tasks (melody extraction and velocity prediction) and two sequence-level classification tasks (style classification and emotion classification). Our evaluation shows that the BERT approach leads to higher classification accuracy than recurrent neural network (RNN)-based baselines. [ABSTRACT FROM AUTHOR]
- Published
- 2024
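The BERT-style masked language modelling objective described in the abstract above can be sketched for generic token sequences. This is an illustrative sketch, not the authors' code; the mask-token id, the 15% masking rate, and the `-100` ignore-index convention are assumptions borrowed from common BERT implementations:

```python
import random

MASK_ID = 0  # hypothetical mask-token id for this sketch

def mask_tokens(tokens, mask_prob=0.15, seed=42):
    """BERT-style masking: return (masked_tokens, labels), where labels
    keep the original id at masked positions and -100 elsewhere
    (positions with -100 are ignored by the prediction loss)."""
    rng = random.Random(seed)
    masked, labels = [], []
    for t in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_ID)   # hide the token from the model
            labels.append(t)         # the model must predict it back
        else:
            masked.append(t)
            labels.append(-100)      # not scored
    return masked, labels

tokens = list(range(1, 21))          # stand-in for MIDI event ids
masked, labels = mask_tokens(tokens)
```

Pre-training then trains the Transformer to recover the hidden ids; fine-tuning replaces that head with a task-specific classifier.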
27. Multi-Label Classification of Pure Code.
- Author
-
Gao, Bin, Qin, Hongwu, and Ma, Xiuqin
- Subjects
PROGRAMMING languages ,SOURCE code ,CLASSIFICATION ,INSTITUTIONAL repositories ,ENCODING - Abstract
Currently, there is a significant amount of public code in IT communities, programming forums and code repositories. Much of this code lacks classification labels, or has imprecise labels, which hinders code management and retrieval. Some classification methods have been proposed to automatically assign labels to code. However, these methods mainly rely on code comments or surrounding text, and their classification performance is limited by the quality of that text. So far, few methods rely solely on the code itself to assign labels. In this paper, an encoder-only method is proposed to assign multiple labels to the code of an algorithmic problem, in which UniXcoder is employed to encode the input code and the encoding results are mapped to the output labels through classification heads. The proposed method relies only on the code itself. We construct a dataset to evaluate the proposed method, consisting of source code in three programming languages (C++, Java, Python) with a total size of approximately 120 K. The results of the comparative experiment show that the proposed method outperforms encoder–decoder methods in the multi-label classification of pure code. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
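The encoder-only pipeline above (encode the code, map the pooled vector through per-label classification heads) can be sketched with a random stand-in for the encoder output. The hidden size, label count, and 0.5 threshold are illustrative assumptions; the key point is that sigmoid heads make each label an independent binary decision, which is what allows multiple labels per sample:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: UniXcoder-style encoders emit 768-d vectors;
# the label set size here is illustrative.
HIDDEN, NUM_LABELS = 768, 10
pooled = rng.standard_normal(HIDDEN)               # stand-in pooled code embedding
W = rng.standard_normal((NUM_LABELS, HIDDEN)) * 0.01
b = np.zeros(NUM_LABELS)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

probs = sigmoid(W @ pooled + b)        # one independent probability per label
predicted = np.where(probs >= 0.5)[0]  # multi-label decision: all labels above threshold
```

Training such heads uses a per-label binary cross-entropy loss rather than a single softmax over all labels.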
28. Improving ROUGE‐1 by 6%: A novel multilingual transformer for abstractive news summarization.
- Author
-
Kumar, Sandeep and Solanki, Arun
- Subjects
NATURAL language processing ,TEXT summarization ,HIDDEN Markov models ,SUPERVISED learning ,DEEP learning - Abstract
Summary: Natural language processing (NLP) has undergone a significant transformation, evolving from manually crafted rules to powerful deep learning techniques such as transformers. These advancements have revolutionized various domains including summarization, question answering, and more. Statistical models like hidden Markov models (HMMs) and supervised learning have played crucial roles in laying the foundation for this progress. Recent breakthroughs in transfer learning and the emergence of large-scale models like BERT and GPT have further pushed the boundaries of NLP research. However, news summarization remains a challenging task in NLP, often resulting in factual inaccuracies or the loss of the article's essence. In this study, we propose a novel approach to news summarization utilizing a fine-tuned Transformer architecture pre-trained with Google's mT5-small tokenizer. Our model demonstrates significant performance improvements over previous methods on the Inshorts English News dataset, achieving a 6% enhancement in the ROUGE-1 score and reducing training loss by 50%. This breakthrough facilitates the generation of reliable and concise news summaries, thereby enhancing information accessibility and user experience. Additionally, we conduct a comprehensive evaluation of our model's performance using popular metrics such as ROUGE scores, with our proposed model achieving ROUGE-1: 54.6130, ROUGE-2: 31.1543, ROUGE-L: 50.7709, and ROUGE-LSum: 50.7907. Furthermore, we observe a substantial reduction in training and validation losses, underscoring the effectiveness of our proposed approach. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
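ROUGE-1, the metric the abstract reports, measures unigram overlap between a candidate summary and a reference. A minimal self-contained version (whitespace tokenization is a simplification; published ROUGE implementations add stemming and other preprocessing):

```python
from collections import Counter

def rouge1(reference, candidate):
    """ROUGE-1 precision/recall/F1 from clipped unigram overlap."""
    ref, cand = Counter(reference.split()), Counter(candidate.split())
    # Each candidate unigram counts at most as often as it occurs in the reference.
    overlap = sum(min(c, ref[w]) for w, c in cand.items())
    p = overlap / max(sum(cand.values()), 1)
    r = overlap / max(sum(ref.values()), 1)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

p, r, f1 = rouge1("the cat sat on the mat", "the cat on the mat")
```

Here every candidate word appears in the reference (precision 1.0) while the reference word "sat" is missed (recall 5/6), so a "6% ROUGE-1 improvement" means roughly six more overlapping points on this scale.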
29. Research on Social Media Information Classification for Earthquake Disasters Based on a BERT Transfer Learning Model.
- Author
-
林 森, 刘蓓蓓, 李建文, 刘 旭, 秦 昆, and 郭桂祯
- Subjects
- *
EMERGENCY management , *PUBLIC opinion , *SOCIAL media , *EARTHQUAKES , *DISASTER relief , *EMERGENCY medical services , *DEEP learning - Abstract
Objectives: With the rapid development of the Internet, social media has become an important information source for emergency events. However, social media contains a lot of duplicated, erroneous and even malicious content, which needs to be effectively classified to provide more accurate information for disaster emergency response. Methods: Deep learning has greatly improved the accuracy and efficiency of text classification. This paper takes earthquake disasters as an example and builds a multi-label classification model based on bidirectional encoder representations from transformers (BERT) transfer learning. Over 50 000 posts about 5 earthquakes are collected as training samples from Sina Weibo, a very popular social media platform in China. Each sample is manually marked with one or more labels, such as hazard information, loss information, rescue information, public opinion information and useless information. Results: Through fine-tuning, the classification accuracies of the proposed model on the training dataset and test dataset reach 97% and 92%, respectively. The area under the curve (AUC) score of each label ranges from 0.952 to 0.998. Conclusions: The results prove that multi-label classification using BERT transfer learning is highly reliable. The proposed model can be applied to emergency management services for earthquake events, benefiting rapid disaster rescue and relief. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. Mechanical strength recognition and classification of thermal protective fabric images after thermal aging based on deep learning.
- Author
-
Liu, Xiaohan, Tian, Miao, and Wang, Yunyi
- Subjects
BURNS & scalds prevention ,WORK-related injuries risk factors ,MATERIALS testing ,RESEARCH funding ,HEAT ,EXPERIMENTAL design ,PROTECTIVE clothing ,TENSILE strength ,DEEP learning ,TEXTILES ,DIGITAL image processing ,INDUSTRIAL safety - Abstract
Objectives. Numerous studies have focused on testing or modeling to evaluate the safe service life of thermal protective clothing after thermal aging, reducing the risk to occupational personnel. However, testing renders the garment unsuitable for subsequent use, and the input parameters required for modeling are not readily available. In this study, a novel image recognition strategy was proposed to discriminate the mechanical strength of thermal protective fabric after thermal aging based on transfer learning. Methods. Data augmentation was used to overcome the shortcoming of insufficient training samples. Four pre-trained models were used to explore their performance in three sample classification modes. Results. The experimental results show that the VGG-19 model achieves the best performance in the three-classification mode (accuracy = 91%). The model was more accurate in identifying fabric samples in the early and late stages of strength decline. For fabric samples in the middle stage of strength decline, the three-classification mode was better than the four-classification and six-classification modes. Conclusions. The findings provide novel insights into the image-based mechanical strength evaluation of thermal protective fabrics after aging. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
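The data augmentation step mentioned in the Methods above can be sketched with array-level transforms of the kind commonly used to enlarge small image datasets. The image is a random stand-in and the specific transforms are assumptions; the paper does not list which augmentations were applied:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))  # stand-in for a fabric image, values in [0, 1]

# Each transform yields a new training sample from the same labeled image.
augmented = [
    np.flip(img, axis=1),             # horizontal flip
    np.flip(img, axis=0),             # vertical flip
    np.rot90(img, k=1, axes=(0, 1)),  # 90-degree rotation
    np.clip(img * 1.2, 0.0, 1.0),     # brightness scaling, kept in range
]
```

Since fabric texture has no canonical orientation, geometric flips and rotations preserve the mechanical-strength label while multiplying the effective dataset size.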
31. Predicting social media users' indirect aggression through pre-trained models.
- Author
-
Zhou, Zhenkun, Yu, Mengli, Peng, Xingyu, and He, Yuxin
- Subjects
SOCIAL media ,SOCIAL marginality ,AGGRESSION (Psychology) ,PREDICTION models ,SOCIAL support - Abstract
Indirect aggression has become a prevalent phenomenon that erodes the social media environment. Because of the expense involved and the difficulty of determining objectively what constitutes indirect aggression, traditional self-report questionnaires are difficult to employ in the online setting. In this study, we present a model for predicting indirect aggression online based on pre-trained models. Building on Weibo users' social media activities, we constructed basic, dynamic, and content features and classified indirect aggression into three subtypes: social exclusion, malicious humour, and guilt induction. We then built the prediction model by combining these features with large-scale pre-trained models. The empirical evidence shows that this prediction model (ERNIE) outperforms the other pre-trained models and predicts indirect aggression online much better than models without extra pre-trained information. This study offers a practical model to predict users' indirect aggression. Furthermore, this work contributes to a better understanding of indirect aggression behaviors and can support social media platforms' organization and management. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. A Domain-Specific Lexicon for Improving Emergency Management in Gas Pipeline Networks through Knowledge Fusing.
- Author
-
Zhao, Xinghao, Hu, Yanzhu, Qin, Tingxin, Wan, Wang, and Wang, Yudi
- Subjects
LANGUAGE models ,EMERGENCY management ,CHINESE language ,INVESTIGATION reports ,LEXICON - Abstract
Emergencies in gas pipeline networks can lead to significant loss of life and property, necessitating extensive professional knowledge for effective response and management. Effective emergency response depends on specialized knowledge, which can be captured efficiently through domain-specific lexicons. The goal of this research is to develop a specialized lexicon that integrates domain-specific knowledge to improve emergency management in gas pipeline networks. The process starts with an enhanced version of Term Frequency–Inverse Document Frequency (TF-IDF), a statistical method used in information retrieval, combined with filtering logic to extract candidate words from investigation reports. Simultaneously, we fine-tune the Chinese Bidirectional Encoder Representations from Transformers (BERT) model, a state-of-the-art language model, with domain-specific data to enhance semantic capture and integrate domain knowledge. Next, words with similar meanings are identified through word similarity analysis based on standard terminology and risk inventories, facilitating lexicon expansion. Finally, the domain-specific lexicon is formed by amalgamating these words. Validation shows that this method, which integrates domain knowledge, outperforms models that lack such integration. The resulting lexicon not only assigns domain-specific weights to terms but also deeply embeds domain knowledge, offering robust support for cause analysis and emergency management in gas pipeline networks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
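The TF-IDF candidate-extraction step described above can be sketched on a toy corpus. The documents, whitespace tokenizer, and smoothed-IDF form are assumptions for illustration; the paper's enhanced TF-IDF adds filtering logic on top of scores like these:

```python
import math
from collections import Counter

# Toy stand-in corpus; real inputs would be incident investigation reports.
docs = [
    "pipeline leak valve failure emergency response",
    "gas leak detected near valve station",
    "routine inspection of pipeline completed",
]

def tfidf(docs):
    """Per-document TF-IDF scores with a smoothed IDF term."""
    n = len(docs)
    tokenized = [d.split() for d in docs]
    # Document frequency: in how many documents each word appears.
    df = Counter(w for toks in tokenized for w in set(toks))
    scores = []
    for toks in tokenized:
        tf, total = Counter(toks), len(toks)
        scores.append({w: (c / total) * (math.log((1 + n) / (1 + df[w])) + 1)
                       for w, c in tf.items()})
    return scores

scores = tfidf(docs)
# Candidate lexicon terms: highest-scoring words in a document.
candidates = sorted(scores[0], key=scores[0].get, reverse=True)
```

Words like "failure" that occur in only one report outscore words like "pipeline" shared across reports, which is exactly what makes them good domain-lexicon candidates.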
33. CLIP-SP: Vision-language model with adaptive prompting for scene parsing.
- Author
-
Li, Jiaao, Huang, Yixiang, Wu, Ming, Zhang, Bin, Ji, Xu, and Zhang, Chuang
- Subjects
IMAGE segmentation ,SPINE ,CLASSIFICATION ,NOISE ,PROBABILITY theory - Abstract
We present a novel framework, CLIP-SP, and a novel adaptive prompt method to leverage pre-trained knowledge from CLIP for scene parsing. Our approach addresses the limitations of DenseCLIP, which demonstrates the superior image segmentation provided by CLIP pre-trained models over ImageNet pre-trained models, but struggles with rough pixel-text score maps for complex scene parsing. We argue that, as they contain all textual information in a dataset, the pixel-text score maps, i.e., dense prompts, are inevitably mixed with noise. To overcome this challenge, we propose a two-step method. Firstly, we extract visual and language features and perform multi-label classification to identify the most likely categories in the input images. Secondly, based on the top-k categories and confidence scores, our method generates scene tokens which can be treated as adaptive prompts for implicit modeling of scenes, and incorporates them into the visual features fed into the decoder for segmentation. Our method imposes a constraint on prompts and suppresses the probability of irrelevant categories appearing in the scene parsing results. Our method achieves competitive performance, limited by the available visual-language pre-trained models. Our CLIP-SP performs 1.14% better (in terms of mIoU) than DenseCLIP on ADE20K, using a ResNet-50 backbone. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
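The first step of the two-step method above (multi-label classification to pick the top-k most likely categories, whose identities and confidence scores then become scene tokens) can be sketched with stand-in similarity scores. The similarity values, the 0.07 temperature, and k are assumptions; only the top-k selection logic is the point:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)

# Hypothetical cosine similarities between one image embedding and the
# text embeddings of NUM_CLASSES candidate categories (CLIP-style scoring).
NUM_CLASSES, K = 150, 5
sims = rng.standard_normal(NUM_CLASSES)

conf = softmax(sims / 0.07)          # temperature-scaled confidence per category
topk = np.argsort(conf)[::-1][:K]    # most likely categories in the input image
scene_tokens = [(int(i), float(conf[i])) for i in topk]
```

Feeding only these k category tokens to the decoder, instead of dense pixel-text score maps over all categories, is what suppresses irrelevant categories in the parsing result.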
34. Tibetan Speech Synthesis Based on Pre-Trained Mixture Alignment FastSpeech2.
- Author
-
Zhou, Qing, Xu, Xiaona, and Zhao, Yue
- Subjects
INTELLIGIBILITY of speech ,AUTOREGRESSIVE models ,ACOUSTIC models ,DEEP learning ,SPEECH - Abstract
Most current research in Tibetan speech synthesis relies primarily on autoregressive models in deep learning. However, these models face challenges such as slow inference, skipped readings, and repetitions. To overcome these issues, we propose an enhanced non-autoregressive acoustic model combined with a vocoder for Tibetan speech synthesis. Specifically, we introduce the mixture alignment FastSpeech2 method to correct errors caused by hard alignment in the original FastSpeech2 method. This new method employs soft alignment at the level of Latin letters and hard alignment at the level of Tibetan characters, thereby improving alignment accuracy between text and speech and enhancing the naturalness and intelligibility of the synthesized speech. Additionally, we integrate pitch and energy information into the model, further enhancing overall synthesis quality. Furthermore, Tibetan has relatively smaller text-to-audio datasets compared to widely studied languages. To address these limited resources, we employ a transfer learning approach to pre-train the model with data from resource-rich languages. Subsequently, this pre-trained mixture alignment FastSpeech2 model is fine-tuned for Tibetan speech synthesis. Experimental results demonstrate that the mixture alignment FastSpeech2 model produces higher-quality speech compared to the original FastSpeech2 model, particularly when pre-trained on an English dataset, resulting in further improvements in clarity and naturalness. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. Y-Tuning: an efficient tuning paradigm for large-scale pre-trained models via label representation learning.
- Author
-
Liu, Yitao, An, Chenxin, and Qiu, Xipeng
- Abstract
With the current success of large-scale pre-trained models (PTMs), how to efficiently adapt PTMs to downstream tasks has attracted tremendous attention, especially for PTMs with billions of parameters. Previous work focuses on designing parameter-efficient tuning paradigms but needs to save and compute the gradient of the whole computational graph. In this paper, we propose Y-Tuning, an efficient yet effective paradigm to adapt frozen large-scale PTMs to specific downstream tasks. Y-Tuning learns dense representations for the labels Y defined in a given task and aligns them to fixed feature representations. Without computing the gradients of the text encoder at the training phase, Y-Tuning is not only parameter-efficient but also training-efficient. Experimental results show that for DeBERTa-XXL with 1.6 billion parameters, Y-Tuning achieves more than 96% of the performance of full fine-tuning on the GLUE benchmark with only 2% tunable parameters and much lower training costs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
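The core idea above (learn dense label representations and align them with the frozen encoder's fixed features, so no gradients flow through the encoder) can be sketched as a forward pass. The dimensions and the dot-product scoring are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen encoder output for one input: computed once, never differentiated.
FEAT_DIM, NUM_LABELS = 64, 3
phi_x = rng.standard_normal(FEAT_DIM)                          # fixed features
label_emb = rng.standard_normal((NUM_LABELS, FEAT_DIM)) * 0.1  # the only trainable part

# Scoring: align each dense label representation with the frozen features.
logits = label_emb @ phi_x
pred = int(np.argmax(logits))
```

Because only `label_emb` (plus any small alignment layers) receives gradients, training avoids backpropagating through the billion-parameter encoder, which is where the training-efficiency claim comes from.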
36. Qadg: Generating question–answer-distractors pairs for real examination
- Author
-
Zhou, Hao and Li, Li
- Published
- 2024
- Full Text
- View/download PDF
37. msBERT-Promoter: a multi-scale ensemble predictor based on BERT pre-trained model for the two-stage prediction of DNA promoters and their strengths
- Author
-
Yazi Li, Xiaoman Wei, Qinglin Yang, An Xiong, Xingfeng Li, Quan Zou, Feifei Cui, and Zilong Zhang
- Subjects
DNA promoters ,Pre-trained model ,BERT ,Soft voting ,Two-stage predictor ,Biology (General) ,QH301-705.5 - Abstract
Abstract Background A promoter is a specific sequence in DNA that has transcriptional regulatory functions, playing a role in initiating gene expression. Identifying promoters and their strengths can provide valuable information related to human diseases. In recent years, computational methods have gained prominence as an effective means of identifying promoters, offering a more efficient alternative to labor-intensive biological approaches. Results In this study, a two-stage integrated predictor called "msBERT-Promoter" is proposed for identifying promoters and predicting their strengths. The model incorporates multi-scale sequence information through a tokenization strategy and fine-tunes the DNABERT model. Soft voting is then used to fuse the multi-scale information, effectively addressing the issue of insufficient DNA sequence information extraction in traditional models. To the best of our knowledge, this is the first time an integrated approach has been used with the DNABERT model for promoter identification and strength prediction. Our model achieves accuracy rates of 96.2% for promoter identification and 79.8% for promoter strength prediction, significantly outperforming existing methods. Furthermore, through attention mechanism analysis, we demonstrate that our model can effectively combine local and global sequence information, enhancing its interpretability. Conclusions msBERT-Promoter provides an effective tool that successfully captures sequence-related attributes of DNA promoters and can accurately identify promoters and predict their strengths. This work paves a new path for the application of artificial intelligence in traditional biology.
- Published
- 2024
- Full Text
- View/download PDF
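The soft-voting fusion step described above can be sketched directly: average the class-probability vectors produced at each tokenization scale, then take the argmax. The probability values and the number of scales here are illustrative stand-ins, not the paper's outputs:

```python
import numpy as np

# Hypothetical per-scale probabilities for 4 sequences (non-promoter vs. promoter),
# e.g. from DNABERT fine-tuned at three tokenization scales.
p_scale1 = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.7, 0.3]])
p_scale2 = np.array([[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.6, 0.4]])
p_scale3 = np.array([[0.7, 0.3], [0.5, 0.5], [0.1, 0.9], [0.8, 0.2]])

# Soft voting: average the probability vectors, then pick the most probable class.
avg = (p_scale1 + p_scale2 + p_scale3) / 3
pred = avg.argmax(axis=1)
```

Unlike hard (majority) voting over predicted labels, soft voting keeps each scale's confidence, so a scale that is very sure can outvote two mildly unsure ones.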
38. A Joint Fine-Tuning Model Based on Self-Supervised Contrastive Learning and Aspect-Level Sentiment Analysis.
- Author
-
狄广义, 陈见飞, 杨世军, 高军, 王耀坤, and 余本功
- Abstract
Fine-tuning pre-trained models to complete aspect-based sentiment analysis tasks is widely used and has achieved significant improvements. However, most existing studies use complex downstream structures, some of which even duplicate hidden-layer structures of the pre-trained models, which limits overall model performance. Since contrastive learning helps improve the word-level and sentence-level representations of pre-trained models, a joint fine-tuning framework combining self-supervised contrastive learning with aspect-based sentiment analysis (SSCL-ABSA) was designed. The framework combines the two learning tasks with a concise downstream structure to fine-tune the pre-trained bidirectional encoder representations from Transformers (BERT) model from different angles, which effectively improves aspect-level sentiment analysis. Specifically, two segments of text and aspect words are spliced and fed into the BERT encoder as samples. After encoding, pooling operations are applied to the different word representations according to the downstream structure requirements. On the one hand, pooling all word representations is used for aspect-level sentiment analysis; on the other hand, pooling the aspect-word representations of the two segments is used for self-supervised contrastive learning. Finally, the two tasks are combined to fine-tune the BERT encoder in a joint learning manner. Experimental evaluation on three publicly available datasets shows that the SSCL-ABSA method is superior to other comparable methods. Visualization with the t-distributed stochastic neighbor embedding (t-SNE) method shows that SSCL-ABSA effectively improves the entity representations of the BERT model. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. Integrating Knowledge Graph and Machine Learning Methods for Landslide Susceptibility Assessment.
- Author
-
Wu, Qirui, Xie, Zhong, Tian, Miao, Qiu, Qinjun, Chen, Jianguo, Tao, Liufeng, and Zhao, Yifan
- Subjects
- *
LANDSLIDES , *LANDSLIDE hazard analysis , *KNOWLEDGE graphs , *MACHINE learning , *DATA mining - Abstract
The suddenness of landslide disasters often causes significant loss of life and property. Accurate assessment of landslide disaster susceptibility is of great significance in enhancing the ability of accurate disaster prevention. To address the problems of strong subjectivity in the selection of assessment indicators and low efficiency of the assessment process caused by the insufficient application of prior knowledge in landslide susceptibility assessment, in this paper we propose a novel landslide susceptibility assessment framework combining a domain knowledge graph and machine learning algorithms. Firstly, we combine unstructured data and extract prior knowledge based on the Unified Structure Generation for Universal Information Extraction pre-trained model (UIE), fine-tuned with a small amount of labeled data, to construct a landslide susceptibility knowledge graph. We use Paired Relation Vectors (PairRE) to characterize the knowledge graph, then construct a target-area characterization factor recommendation model by calculating spatial correlation, attribute similarity, and Term Frequency–Inverse Document Frequency (TF-IDF) metrics. We select the optimal model and optimal feature combination among six typical machine learning (ML) models to construct interpretable landslide disaster susceptibility assessment mapping. Experimental validation and analysis are carried out on the Three Gorges area (TGA), and the results show the effectiveness of the feature factors recommended by the knowledge graph representation learning, with the overall accuracy of the model after adding associated disaster factors reaching 87.2%. The methodology proposed in this research makes a valuable contribution to knowledge- and data-driven assessment of landslide disaster susceptibility. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
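The PairRE embedding mentioned above scores a triple (h, r, t) as the negative distance between the head and tail entities after each is modulated by its own relation vector. A minimal sketch with random stand-in vectors (the dimension and vectors are illustrative; real training would optimize these embeddings against the knowledge graph):

```python
import numpy as np

def pairre_score(h, t, r_h, r_t):
    """PairRE: score(h, r, t) = -|| h ∘ r^H - t ∘ r^T ||,
    with entity embeddings L2-normalized and ∘ the elementwise product."""
    h = h / np.linalg.norm(h)
    t = t / np.linalg.norm(t)
    return -float(np.linalg.norm(h * r_h - t * r_t))

rng = np.random.default_rng(0)
dim = 8
h, t = rng.standard_normal(dim), rng.standard_normal(dim)      # entity embeddings
r_h, r_t = rng.standard_normal(dim), rng.standard_normal(dim)  # paired relation vectors

score = pairre_score(h, t, r_h, r_t)  # higher (closer to 0) means more plausible
```

Using a pair of relation vectors instead of one lets the model encode complex relation patterns (e.g. one-to-many links between terrain factors and landslide sites) that single-vector translational models struggle with.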
40. A Few-Shot Automatic Summarization Method Based on Reinforced Regularization.
- Author
-
李清 and 万卫兵
- Abstract
Automatic text summarization aims to extract the main statements from text in order to compress information. Existing generative automatic summarization methods do not take full advantage of pre-trained models to learn the semantics of the original text, resulting in the loss of important information in the generated content, and on datasets with few samples they are prone to overfitting. To solve such problems and obtain better fine-tuning performance, the pre-trained model mT5 (multilingual T5) is used as a baseline, the model's learning ability is improved by combining R-Drop (Regularized Dropout) as a reinforced regularizer for fine-tuning, and Sparse Softmax is used to reduce the ambiguity of prediction generation and ensure the accuracy of the output. The model computes BLEU (Bilingual Evaluation Understudy) for hyperparameter testing on the Chinese datasets LCSTS and CSL, and uses ROUGE as the evaluation index on datasets of different orders of magnitude. The experimental results show that the optimized pre-trained model can better learn the semantic representation of the original text, maintains a good fit on small samples, and generates more practical results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
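The R-Drop regularizer used above penalizes disagreement between two forward passes of the same input under different dropout masks. Its consistency term can be sketched with stand-in logits; the logit values and the coefficient `alpha` are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical logits from two forward passes of the same input through
# the mT5 model with different dropout masks.
logits_a = np.array([2.0, 0.5, -1.0])
logits_b = np.array([1.8, 0.7, -0.9])
p, q = softmax(logits_a), softmax(logits_b)

# R-Drop consistency term: symmetric KL between the two predictions,
# added (scaled by alpha) to the usual cross-entropy loss.
alpha = 4.0  # assumed coefficient
r_drop_term = alpha * 0.5 * (kl(p, q) + kl(q, p))
```

Minimizing this term forces the model's predictions to be stable under dropout noise, which is what curbs overfitting on small fine-tuning sets.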
41. BERTIVITS: The Posterior Encoder Fusion of Pre-Trained Models and Residual Skip Connections for End-to-End Speech Synthesis.
- Author
-
Wang, Zirui, Song, Minqi, and Zhou, Dongbo
- Subjects
SPEECH synthesis ,INTELLIGENT tutoring systems ,FEATURE extraction ,TELEPHONES ,CUSTOMER services - Abstract
Enhancing the naturalness and rhythmicity of generated audio in end-to-end speech synthesis is crucial. The current state-of-the-art (SOTA) model, VITS, utilizes a conditional variational autoencoder architecture. However, it faces challenges, such as limited robustness, due to training solely on text and spectrum data from the training set. Particularly, the posterior encoder struggles with mid- and high-frequency feature extraction, impacting waveform reconstruction. Existing efforts mainly focus on prior encoder enhancements or alignment algorithms, neglecting improvements to spectrum feature extraction. In response, we propose BERTIVITS, a novel model integrating BERT into VITS. Our model features a redesigned posterior encoder with residual connections and utilizes pre-trained models to enhance spectrum feature extraction. Compared to VITS, BERTIVITS shows significant subjective MOS score improvements (0.16 in English, 0.36 in Chinese) and objective Mel-Cepstral coefficient reductions (0.52 in English, 0.49 in Chinese). BERTIVITS is tailored for single-speaker scenarios, improving speech synthesis technology for applications like post-class tutoring or telephone customer service. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
42. A psychological evaluation method incorporating noisy label correction mechanism.
- Author
-
Jin, Zhigang, Su, Renjun, Liu, Yuhong, and Duan, Chenxu
- Subjects
- *
GRAPH neural networks , *PSYCHOLOGICAL techniques , *EVALUATION methodology , *DEEP learning , *GAUSSIAN mixture models , *NAIVE Bayes classification , *MACHINE learning - Abstract
Using machine learning and deep learning methods to analyze text data from social media can effectively explore hidden emotional tendencies and evaluate the psychological state of social media account owners. However, the label noise caused by mislabeling may significantly influence the training and prediction results of traditional supervised models. To resolve this problem, this paper proposes a psychological evaluation method that incorporates a noisy label correction mechanism and designs an evaluation framework that consists of a primary classification model and a noisy label correction mechanism. Firstly, the social media text data are transformed into heterogeneous text graphs, and a classification model combining a pre-trained model with a graph neural network is constructed to extract semantic features and structural features, respectively. After that, the Gaussian mixture model is used to select the samples that are likely to be mislabeled. Then, soft labels are generated for them to enable noisy label correction without prior knowledge of the noise distribution information. Finally, the corrected and clean samples are composed into a new data set and re-input into the primary model for mental state classification. Results of experiments on three real data sets indicate that the proposed method outperforms current advanced models in classification accuracy and noise robustness under different noise ratio settings, and can efficiently explore the potential sentiment tendencies and users' psychological states in social media text data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
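The paper's noisy-label correction starts by fitting a Gaussian mixture model to per-sample statistics and flagging the high-loss component as likely mislabeled. As a dependency-free stand-in for that step, a 1-D two-means split over hypothetical per-sample losses illustrates the same selection logic (the loss values and the clustering choice are assumptions, not the paper's GMM):

```python
# Hypothetical per-sample training losses: cleanly labeled samples tend to
# have low loss, mislabeled ones high loss.
losses = [0.1, 0.15, 0.2, 0.12, 1.4, 1.6, 0.18, 1.3]

def two_means(xs, iters=20):
    """Split 1-D values into a low cluster (0) and a high cluster (1)."""
    lo, hi = min(xs), max(xs)
    for _ in range(iters):
        groups = [[], []]
        for x in xs:
            groups[abs(x - lo) > abs(x - hi)].append(x)  # nearer centre wins
        lo = sum(groups[0]) / len(groups[0])
        hi = sum(groups[1]) / len(groups[1])
    return [int(abs(x - lo) > abs(x - hi)) for x in xs]

noisy_flags = two_means(losses)  # 1 = candidate for soft-label correction
```

Samples flagged here would receive soft labels and be re-added to the training set, exactly as the evaluation framework above describes.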
43. Factitious or fact? Learning textual representations for fake online review detection.
- Author
-
Mohawesh, Rami, Al-Hawawreh, Muna, Maqsood, Sumbal, and Alqudah, Omar
- Subjects
- *
DEEP learning , *MACHINE learning , *SUPERVISED learning , *CONSUMERS' reviews , *LANGUAGE models , *BUSINESS enterprises - Abstract
User reviews can play a large part in determining a company's income in the e-commerce industry. Before making decisions about any product or service, online users rely on reviews. As a result, the trustworthiness of online reviews is vital for organisations and can directly impact their reputation and revenue. Because of this, some firms pay spammers to publish false reviews. Most recent studies on detecting fake reviews utilise supervised learning, and neural network techniques in particular have been used extensively and have demonstrated their ability to detect fake reviews. Thus, this paper first provides a benchmark study analysing the performance of various machine learning algorithms with different feature extraction methods on five fake review datasets. Second, we propose three advanced language models for embedding reviews into the classifiers. Third, we conduct an exhaustive feature set evaluation study to find the best features for detecting fake reviews. Fourth, we analyse the performance of traditional machine learning, deep learning, and advanced deep learning models using different feature extraction methods on five fake review datasets. Finally, we integrate the ELECTRA model with a CNN to identify real or fake reviews. Our proposed technique uses accuracy, precision, recall, and F1 score as assessment criteria. For deep contextualised representation and neural classification, we integrate a Single-Layer Perceptron (SLP), Multi-Layer Perceptron (MLP), and Convolutional Neural Networks (CNN) following the embedding layer of pre-trained models such as ELMo, ELECTRA, and GPT-2. The experimental results indicate that our proposed model outperforms state-of-the-art methods, with improvements ranging from 1 to 7% in accuracy and F1 score. To the best of our knowledge, no prior work has evaluated the efficiency of such advanced pre-trained models in detecting fake reviews. Further, this research comprehensively evaluates several machine-learning approaches and feature extraction strategies for fake online review detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. Enhancing coffee bean classification: a comparative analysis of pre-trained deep learning models.
- Author
-
Hassan, Esraa
- Subjects
- *
DEEP learning , *COFFEE beans , *ARCHITECTURAL models , *COFFEE growing , *COFFEE manufacturing , *CLASSIFICATION , *COMPARATIVE studies , *ECONOMIC equilibrium - Abstract
Coffee bean production can encounter challenges due to fluctuations in global coffee prices, impacting the economic stability of some countries that heavily depend on coffee production. The primary objective is to evaluate how effectively various pre-trained models can predict coffee types using advanced deep learning techniques. The selection of an optimal pre-trained model is crucial, given the growing popularity of specialty coffee and the necessity for precise classification. We conducted a comprehensive comparison of several pre-trained models, including AlexNet, LeNet, HRNet, Google Net, Mobile V2 Net, ResNet (50), VGG, Efficient, Darknet, and DenseNet, utilizing a coffee-type dataset. By leveraging transfer learning and fine-tuning, we assess the generalization capabilities of the models for the coffee classification task. Our findings emphasize the substantial impact of the pre-trained model choice on the model's performance, with certain models demonstrating higher accuracy and faster convergence than conventional alternatives. This study offers a thorough evaluation of pre-trained architectural models regarding their effectiveness in coffee classification. Through the evaluation of result metrics, including sensitivity (1.0000), specificity (0.9917), precision (0.9924), negative predictive value (1.0000), accuracy (1.0000), and F1 score (0.9962), our analysis provides nuanced insights into the intricate landscape of pre-trained models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. Hybrid Feature Extraction Technique-based Alzheimer’s Disease Detection Model Using MRI Images.
- Author
-
Al-Rawashdeh, Hazim Saleh, Usman, Aminu, Dutta, Ashit Kumar, and Wahab Sait, Abdul Rahaman
- Subjects
- *
ALZHEIMER'S disease , *FEATURE extraction , *MAGNETIC resonance imaging , *OPTIMAL stopping (Mathematical statistics) , *MATHEMATICAL optimization - Abstract
Detecting Alzheimer’s disease (AD) using magnetic resonance imaging (MRI) is essential for early diagnosis and management. This study introduces a new method for detecting AD by combining three robust models: DenseNet201, EfficientNet B7, and extremely randomized trees (ERT). We improve the ability to extract features in DenseNet201 by including a self-attention mechanism. Additionally, we use early stopping techniques on EfficientNet B7 to address the issue of overfitting. In addition, Bayesian Optimization and Hyperband optimization techniques are used to adjust the hyperparameters of extra-trees to differentiate normal and abnormal MRI images. In addition, the authors used SHapley Additive exPlanations to understand the model’s decision. With minimal computer resources, the proposed model achieved a remarkable accuracy of 98.9% in detecting AD. The findings highlight the effectiveness of recommended feature extraction and ERT models and optimization methods to accurately identify AD using MRI images. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
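The early stopping applied to EfficientNet B7 above is framework-agnostic: training halts once the validation loss stops improving for a fixed number of epochs. A minimal sketch of the common patience-based variant, with the patience value and loss sequence chosen arbitrarily for illustration:

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta     # minimum drop that counts as improvement
        self.best = float("inf")       # best validation loss seen so far
        self.stale = 0                 # epochs since last improvement

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience

stopper = EarlyStopping(patience=2)
losses = [0.9, 0.7, 0.6, 0.65, 0.66, 0.64]   # illustrative validation losses
stopped_at = next(i for i, l in enumerate(losses) if stopper.step(l))
```

Here training would stop after epoch index 4: the loss last improved at epoch 2 (0.6), and two stale epochs exhaust the patience budget before the late dip at 0.64 is ever seen.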
46. Multimodal Sentiment Analysis Based on Composite Hierarchical Fusion.
- Author
-
Lei, Yu, Qu, Keshuai, Zhao, Yifan, Han, Qing, and Wang, Xuguang
- Subjects
- *
SENTIMENT analysis , *DATA extraction , *INFORMATION superhighway , *DATA integrity , *CONVOLUTIONAL neural networks - Abstract
In multimodal sentiment analysis, fully extracting modality features and fusing them efficiently is an important research task. To address the insufficient semantic information and poor cross-modal fusion of traditional sentiment classification models, this paper proposes a composite hierarchical feature fusion method combined with prior knowledge. First, the ALBERT (A Lite BERT) model and an improved ResNet model are constructed for text and image feature extraction, respectively, yielding high-dimensional feature vectors. Second, to address insufficient semantic expression across scenes, a prior knowledge enhancement model is proposed to enrich the data characteristics of each modality. Finally, to address poor cross-modal fusion, a composite hierarchical fusion model is proposed that combines a temporal convolutional network with an attention mechanism to fuse the sequence features of each modality and realize information interaction between modalities. Experiments on the MVSA-Single and MVSA-Multi datasets show that the proposed model outperforms a series of comparison models and adapts well to new scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
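Attention-based cross-modal fusion of the kind the abstract describes can be sketched as a softmax-weighted sum of same-dimensional modality vectors. Everything below (dimensions, scores, feature values) is illustrative, not the paper's actual design:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_fuse(features, scores):
    """Fuse same-dimensional modality feature vectors with softmax attention weights."""
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]

text_feat  = [1.0, 0.0, 2.0]   # hypothetical text features
image_feat = [0.0, 4.0, 2.0]   # hypothetical image features
fused = attention_fuse([text_feat, image_feat], scores=[0.0, 0.0])
```

With equal attention scores the weights are 0.5 each, so the fused vector is the element-wise mean; learned scores would shift weight toward the more informative modality per example.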
47. Research on cross-lingual multi-label patent classification based on pre-trained model.
- Author
-
Lu, Yonghe, Chen, Lehua, Tong, Xinyu, Peng, Yongxin, and Zhu, Hou
- Abstract
Patent classification is an important part of the patent examination and management process, and efficient, accurate automatic patent classification can significantly improve patent retrieval performance. However, current monolingual patent classification models are insufficient for cross-lingual patent tasks, making research into cross-lingual patent categorization crucial. In this paper, we propose a cross-lingual patent classification model based on a pre-trained model, named XLM-R–CNN. We also construct a large patent dataset called XLPatent covering Chinese, English, and German, and conduct experiments evaluating model performance with several metrics. The experimental results show that XLM-R–CNN achieved a classification accuracy of 73% and an average precision of 94%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
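In a pipeline like XLM-R–CNN, the CNN stage typically slides filters over the sequence of token embeddings produced by the encoder and max-pools each filter's activations over time. A dependency-free sketch of that convolve-and-pool step; the embedding values, filter weights, and window size are made up:

```python
def conv1d_maxpool(embeddings, fltr):
    """Slide one filter over a sequence of token-embedding vectors, then max-pool.

    embeddings: list of d-dimensional vectors (one per token)
    fltr: list of k d-dimensional weight vectors (window size k)
    Returns the max-pooled activation for this filter.
    """
    k = len(fltr)
    activations = []
    for start in range(len(embeddings) - k + 1):
        window = embeddings[start:start + k]
        # Dot product of the filter with the k stacked embedding vectors.
        score = sum(w * x for wv, xv in zip(fltr, window) for w, x in zip(wv, xv))
        activations.append(score)
    return max(activations)

# Hypothetical 2-dim embeddings for a 4-token sequence and one window-2 filter:
emb = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
filt = [[1.0, 0.0], [0.0, 1.0]]
feature = conv1d_maxpool(emb, filt)
```

A real model runs many such filters of several window sizes in parallel and concatenates the pooled features before the classifier.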
48. msBERT-Promoter: a multi-scale ensemble predictor based on BERT pre-trained model for the two-stage prediction of DNA promoters and their strengths.
- Author
-
Li, Yazi, Wei, Xiaoman, Yang, Qinglin, Xiong, An, Li, Xingfeng, Zou, Quan, Cui, Feifei, and Zhang, Zilong
- Abstract
Background: A promoter is a specific DNA sequence with transcriptional regulatory functions, playing a role in initiating gene expression. Identifying promoters and their strengths can provide valuable information related to human diseases. In recent years, computational methods have gained prominence as an effective means of identifying promoters, offering a more efficient alternative to labor-intensive biological approaches. Results: In this study, a two-stage integrated predictor called “msBERT-Promoter” is proposed for identifying promoters and predicting their strengths. The model incorporates multi-scale sequence information through a tokenization strategy and fine-tunes the DNABERT model. Soft voting is then used to fuse the multi-scale information, effectively addressing the insufficient extraction of DNA sequence information in traditional models. To the best of our knowledge, this is the first time an integrated approach has been used with the DNABERT model for promoter identification and strength prediction. Our model achieves accuracy rates of 96.2% for promoter identification and 79.8% for promoter strength prediction, significantly outperforming existing methods. Furthermore, through attention mechanism analysis, we demonstrate that our model can effectively combine local and global sequence information, enhancing its interpretability. Conclusions: msBERT-Promoter provides an effective tool that successfully captures sequence-related attributes of DNA promoters and can accurately identify promoters and predict their strengths. This work paves a new path for the application of artificial intelligence in traditional biology. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
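Soft voting, which the abstract uses to fuse the multi-scale DNABERT predictors, averages the class-probability vectors of the ensemble members and takes the argmax of the average. A minimal sketch with made-up probabilities:

```python
def soft_vote(prob_lists):
    """Average class-probability vectors from several predictors; return (probs, argmax)."""
    n = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n for c in range(n_classes)]
    return avg, max(range(n_classes), key=lambda c: avg[c])

# Three hypothetical multi-scale predictors scoring (non-promoter, promoter):
preds = [[0.30, 0.70], [0.60, 0.40], [0.20, 0.80]]
avg_probs, label = soft_vote(preds)
```

Unlike hard (majority) voting, soft voting lets a confident predictor outweigh two lukewarm ones, which is why it suits fusing scales that differ in reliability per input.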
49. RQ-OSPTrans: A Semantic Classification Method Based on Transformer That Combines Overall Semantic Perception and "Repeated Questioning" Learning Mechanism.
- Author
-
Tan, Yuanjun, Liu, Quanling, Liu, Tingting, Liu, Hai, Wang, Shengming, and Chen, Zengzhao
- Subjects
TRANSFORMER models ,LANGUAGE models ,SPEECH perception ,CLASSIFICATION ,LEARNING modules ,LANGUAGE ability - Abstract
Transformer-based pre-trained language models possess exceptional general text-understanding capabilities, enabling them to handle a wide variety of tasks. However, their topic classification ability degrades seriously when faced with long colloquial texts, expressions with similar semantics but entirely different wording, and text errors introduced by partial speech recognition. We propose a long-text topic classification method called RQ-OSPTrans to effectively address these challenges. To this end, two parallel modules learn from long texts: a repeated questioning module and an overall semantic perception module. The overall semantic perception module applies average pooling to the semantic embeddings produced by BERT, followed by multi-layer perceptron learning. The repeated questioning module learns the text-embedding matrix, extracting detailed word-level clues for classification. Comprehensive experiments demonstrate that RQ-OSPTrans achieves a generalization performance of 98.5% on the Chinese dataset THUCNews. Moreover, RQ-OSPTrans achieves state-of-the-art performance on the arXiv-10 dataset (84.4%) and performs comparably with other state-of-the-art pre-trained models on the AG's News dataset. Finally, validation on a specific task scenario using our custom-built dataset CCIPC indicates that our method outperforms the baseline methods on small-scale domain-specific datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
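The overall semantic perception module described above average-pools BERT's per-token embeddings into a single sentence vector before the MLP. The pooling step alone can be sketched as follows; the dimensions and values are illustrative:

```python
def average_pool(token_embeddings):
    """Mean-pool a [seq_len x dim] token-embedding matrix into one sentence vector."""
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[i] for tok in token_embeddings) / n for i in range(dim)]

# Three tokens with hypothetical 2-dim embeddings:
emb = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sentence_vec = average_pool(emb)
```

Averaging over all tokens (rather than using only the `[CLS]` embedding) spreads the sentence representation across every position, which is one common way to stabilize classification over long inputs.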
50. CPT: a pre-trained unbalanced transformer for both Chinese language understanding and generation.
- Author
-
Shao, Yunfan, Geng, Zhichao, Liu, Yitao, Dai, Junqi, Yan, Hang, Yang, Fei, Li, Zhe, Bao, Hujun, and Qiu, Xipeng
- Abstract
In this paper, we take advantage of previous pre-trained models (PTMs) and propose a novel Chinese pre-trained unbalanced transformer (CPT). Different from previous Chinese PTMs, CPT is designed to exploit the shared knowledge between natural language understanding (NLU) and natural language generation (NLG) to boost performance. CPT consists of three parts: a shared encoder, an understanding decoder, and a generation decoder. The two task-specific decoders, attached to the shared encoder, are pre-trained with masked language modeling (MLM) and denoising auto-encoding (DAE) tasks, respectively. With the partially shared architecture and multi-task pre-training, CPT can (1) learn task-specific knowledge for both NLU and NLG with the two decoders and (2) be flexibly fine-tuned to fully exploit the model's potential. Moreover, the unbalanced transformer saves computational and storage cost, which makes CPT competitive and greatly accelerates text-generation inference. Experimental results on a wide range of Chinese NLU and NLG tasks show the effectiveness of CPT. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
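CPT's partially shared layout (one encoder feeding an understanding decoder and a generation decoder) can be sketched structurally. The "layers" below are trivial arithmetic stand-ins for transformer blocks, purely to show how both task heads reuse one shared representation:

```python
class ToyCPT:
    """Structural sketch: a shared encoder feeding two task-specific decoders."""

    def encode(self, x):
        # Shared representation used by both tasks (stand-in for the encoder stack).
        return [v * 2 for v in x]

    def understand(self, x):
        # Understanding decoder: collapses the shared representation to a score.
        return sum(self.encode(x))

    def generate(self, x):
        # Generation decoder: expands the shared representation element-wise.
        return [v + 1 for v in self.encode(x)]

model = ToyCPT()
shared_in = [1.0, 2.0]
score = model.understand(shared_in)
tokens = model.generate(shared_in)
```

Because `encode` is computed once per input and shared, the per-task decoders can stay shallow, which is the source of the compute and storage savings the abstract attributes to the unbalanced design.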