50 results
Search Results
2. Recent Advances in Computer Vision: Technologies and Applications.
- Author
-
Gao, Mingliang, Zou, Guofeng, Li, Yun, and Guo, Xiangyu
- Subjects
IMAGE recognition (Computer vision), COMPUTER vision, NATURAL language processing, ARTIFICIAL intelligence, MACHINE learning, DEEP learning, IMAGE segmentation
- Abstract
This document is a special issue of the journal "Electronics" that focuses on recent advances in computer vision. The introduction explains how computer vision has transformed various industries and daily life by enabling machines to interpret and understand visual information. It also highlights the challenges that still exist in the field, such as model robustness and interpretability. The future of computer vision is discussed, including the development of multimodal models and advancements in areas like self-supervised learning. The special issue includes 10 papers that cover a range of topics, including stereo matching, low-light image enhancement, automated test grading, image segmentation, virtual clothing design, large-scale learning, camera pose estimation, few-shot segmentation, and image-to-audio conversion. The papers present novel studies, approaches, and reviews that contribute to the advancement of computer vision. The document concludes by emphasizing the importance of computer vision in addressing various challenges and the need for ongoing research and interdisciplinary collaboration to tackle complex real-world problems. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
3. Time Series Dataset Survey for Forecasting with Deep Learning.
- Author
-
Hahn, Yannik, Langer, Tristan, Meyes, Richard, and Meisen, Tobias
- Subjects
FORECASTING, TIME series analysis, DEEP learning, COMPUTER vision, NATURAL language processing, ACQUISITION of data
- Abstract
Deep learning models have revolutionized research fields like computer vision and natural language processing by outperforming traditional models in multiple tasks. However, the field of time series analysis, especially time series forecasting, has not seen a similar revolution, despite forecasting being one of the most prominent tasks of predictive data analytics. One crucial problem for time series forecasting is the lack of large, domain-independent benchmark datasets and a competitive research environment, e.g., annual large-scale challenges, that would spur the development of new models, as was the case for CV and NLP. Furthermore, the focus of time series forecasting research is primarily domain-driven, resulting in many highly individual and domain-specific datasets. Consequently, the progress in the entire field is slowed down due to a lack of comparability across models trained on a single benchmark dataset and on a variety of different forecasting challenges. In this paper, we first explore this problem in more detail and derive the need for a comprehensive, domain-unspecific overview of the state-of-the-art of commonly used datasets for prediction tasks. In doing so, we provide an overview of these datasets and improve comparability in time series forecasting by introducing a method to find similar datasets which can be utilized to test a newly developed model. Ultimately, our survey paves the way towards developing a single widely used and accepted benchmark dataset for time series data, built on the various frequently used datasets surveyed in this paper. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
4. Supervised Deep Learning Techniques for Image Description: A Systematic Review.
- Author
-
López-Sánchez, Marco, Hernández-Ocaña, Betania, Chávez-Bosquez, Oscar, and Hernández-Torruco, José
- Subjects
SUPERVISED learning, NATURAL language processing, DEEP learning, COMPUTER vision, RECURRENT neural networks, CONVOLUTIONAL neural networks
- Abstract
Automatic image description, also known as image captioning, aims to describe the elements included in an image and their relationships. This task involves two research fields: computer vision and natural language processing; thus, it has received much attention in computer science. In this review paper, we follow the Kitchenham review methodology to present the most relevant approaches to image description methodologies based on deep learning. We focused on works using convolutional neural networks (CNN) to extract the characteristics of images and recurrent neural networks (RNN) for automatic sentence generation. As a result, 53 research articles using the encoder-decoder approach were selected, focusing only on supervised learning. The main contributions of this systematic review are: (i) to describe the most relevant image description papers implementing an encoder-decoder approach from 2014 to 2022 and (ii) to determine the main architectures, datasets, and metrics that have been applied to image description. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
5. Training Acceleration Method Based on Parameter Freezing.
- Author
-
Tang, Hongwei, Chen, Jialiang, Zhang, Wenkai, and Guo, Zhi
- Subjects
ARTIFICIAL neural networks, DEEP learning, NATURAL language processing, COMPUTER vision, FREEZING, OBJECT recognition (Computer vision)
- Abstract
As deep learning has evolved, larger and deeper neural networks are currently a popular trend in both natural language processing tasks and computer vision tasks. With the increasing parameter size and model complexity in deep neural networks, it is also necessary to have more data available for training to avoid overfitting and to achieve better results. These facts demonstrate that training deep neural networks takes more and more time. In this paper, we propose a training acceleration method based on gradually freezing the parameters during the training process. Specifically, by observing the convergence trend during the training of deep neural networks, we freeze part of the parameters so that they are no longer involved in subsequent training and reduce the time cost of training. Furthermore, an adaptive freezing algorithm for the control of freezing speed is proposed in accordance with the information reflected by the gradient of the parameters. Concretely, a larger gradient indicates that the loss function changes more drastically at that position, implying that there is more room for improvement with the parameter involved; a smaller gradient indicates that the loss function changes less and the learning of that part is close to saturation, with less benefit from further training. We use ViTDet as our baseline and conduct experiments on three remote sensing target detection datasets to verify the effectiveness of the method. Our method provides a minimum speedup ratio of 1.38×, while maintaining a maximum accuracy loss of only 2.5%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
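The adaptive freezing rule summarized in the abstract above — stop updating a parameter once its gradient magnitude suggests learning has saturated — can be illustrated with a toy gradient-descent sketch. This is a hypothetical minimal example, not the paper's ViTDet setup; the quadratic loss, learning rate, and threshold are all assumptions for illustration.

```python
def train_with_freezing(w, lr=0.1, threshold=0.05, steps=100):
    """Minimize f(w) = sum(w_i^2) by gradient descent, permanently
    freezing any coordinate whose gradient magnitude drops below
    `threshold` so that it skips all further updates."""
    frozen = [False] * len(w)
    for _ in range(steps):
        for i in range(len(w)):
            if frozen[i]:
                continue  # frozen parameters no longer take part in training
            g = 2.0 * w[i]  # gradient of w_i^2
            if abs(g) < threshold:
                frozen[i] = True  # small gradient: learning is near saturation
            else:
                w[i] -= lr * g
    return w, frozen
```

A coordinate that starts near its optimum (tiny gradient) is frozen almost immediately and pays no further training cost, while a coordinate with a large gradient keeps training until its gradient shrinks below the threshold.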
6. Deep Learning Performance Characterization on GPUs for Various Quantization Frameworks.
- Author
-
Shafique, Muhammad Ali, Munir, Arslan, and Kong, Joonho
- Subjects
DEEP learning, NATURAL language processing, COMPUTER vision, GRAPHICS processing units, MNEMONICS, RECOMMENDER systems
- Abstract
Deep learning is employed in many applications, such as computer vision, natural language processing, robotics, and recommender systems. Large and complex neural networks lead to high accuracy; however, they adversely affect many aspects of deep learning performance, such as training time, latency, throughput, energy consumption, and memory usage in the training and inference stages. To solve these challenges, various optimization techniques and frameworks have been developed for the efficient performance of deep learning models in the training and inference stages. Although optimization techniques such as quantization have been studied thoroughly in the past, less work has been done to study the performance of frameworks that provide quantization techniques. In this paper, we have used different performance metrics to study the performance of various quantization frameworks, including TensorFlow automatic mixed precision and TensorRT. These performance metrics include training time and memory utilization in the training stage along with latency and throughput for graphics processing units (GPUs) in the inference stage. We have applied the automatic mixed precision (AMP) technique during the training stage using the TensorFlow framework, while for inference we have utilized the TensorRT framework for the post-training quantization technique using the TensorFlow TensorRT (TF-TRT) application programming interface (API). We performed model profiling for different deep learning models, datasets, image sizes, and batch sizes for both the training and inference stages, the results of which can help developers and researchers to devise and deploy efficient deep learning models for GPUs. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
7. Deep Learning in Diverse Intelligent Sensor Based Systems.
- Author
-
Zhu, Yanming, Wang, Min, Yin, Xuefei, Zhang, Jue, Meijering, Erik, and Hu, Jiankun
- Subjects
DEEP learning, INTELLIGENT sensors, NATURAL language processing, COMPUTER vision
- Abstract
Deep learning has become a predominant method for solving data analysis problems in virtually all fields of science and engineering. The increasing complexity and the large volume of data collected by diverse sensor systems have spurred the development of deep learning methods and have fundamentally transformed the way the data are acquired, processed, analyzed, and interpreted. With the rapid development of deep learning technology and its ever-increasing range of successful applications across diverse sensor systems, there is an urgent need to provide a comprehensive investigation of deep learning in this domain from a holistic view. This survey paper aims to contribute to this by systematically investigating deep learning models/methods and their applications across diverse sensor systems. It also provides a comprehensive summary of deep learning implementation tips and links to tutorials, open-source codes, and pretrained models, which can serve as an excellent self-contained reference for deep learning practitioners and those seeking to innovate deep learning in this space. In addition, this paper provides insights into research topics in diverse sensor systems where deep learning has not yet been well-developed, and highlights challenges and future opportunities. This survey serves as a catalyst to accelerate the application and transformation of deep learning in diverse sensor systems. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
8. ReliaMatch: Semi-Supervised Classification with Reliable Match.
- Author
-
Jiang, Tao, Chen, Luyao, Chen, Wanqing, Meng, Wenjuan, and Qi, Peihan
- Subjects
DEEP learning, SUPERVISED learning, NATURAL language processing, COMPUTER vision, RECOMMENDER systems
- Abstract
Deep learning has been widely used in various tasks such as computer vision, natural language processing, predictive analysis, and recommendation systems in the past decade. However, practical scenarios often lack labeled data, posing challenges for traditional supervised methods. Semi-supervised classification methods address this by leveraging both labeled and unlabeled data to enhance model performance, but they face challenges in effectively utilizing unlabeled data and distinguishing reliable information from unreliable sources. This paper introduces ReliaMatch, a semi-supervised classification method that addresses these challenges by using a confidence threshold. It incorporates a curriculum learning stage, feature filtering, and pseudo-label filtering to improve classification accuracy and reliability. The feature filtering module eliminates ambiguous semantic features by comparing labeled and unlabeled data in the feature space. The pseudo-label filtering module removes unreliable pseudo-labels with low confidence, enhancing algorithm reliability. ReliaMatch employs a curriculum learning training mode, gradually increasing training dataset difficulty by combining selected samples and pseudo-labels with labeled data. This supervised approach enhances classification performance. Experimental results show that ReliaMatch effectively overcomes challenges associated with the underutilization of unlabeled data and the introduction of error information, outperforming the pseudo-label strategy in semi-supervised classification. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
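The pseudo-label filtering step described in the abstract above — discard unlabeled samples whose top predicted probability falls below a confidence threshold — can be sketched in a few lines. This is a generic illustration of confidence-threshold filtering, not ReliaMatch's actual implementation; the threshold value and data layout are assumptions.

```python
def filter_pseudo_labels(probs, threshold=0.9):
    """probs: per-sample lists of class probabilities from the current model.

    Returns (sample_index, pseudo_label) pairs for samples whose top
    class probability clears the confidence threshold; the rest are
    treated as unreliable and left out of the pseudo-labeled set."""
    kept = []
    for idx, p in enumerate(probs):
        conf = max(p)
        if conf >= threshold:
            kept.append((idx, p.index(conf)))  # keep confident prediction
    return kept
```

With `threshold=0.9`, a sample predicted as `[0.95, 0.05]` is kept with pseudo-label 0, while an ambiguous `[0.6, 0.4]` is discarded rather than risk introducing a wrong label into training.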
9. The Potential of Visual ChatGPT for Remote Sensing.
- Author
-
Osco, Lucas Prado, Lemos, Eduardo Lopes de, Gonçalves, Wesley Nunes, Ramos, Ana Paula Marques, and Marcato Junior, José
- Subjects
REMOTE sensing, DEEP learning, CHATGPT, LANGUAGE models, NATURAL language processing, COMPUTER vision, IMAGE processing, EDGE detection (Image processing)
- Abstract
Recent advancements in Natural Language Processing (NLP), particularly in Large Language Models (LLMs), associated with deep learning-based computer vision techniques, have shown substantial potential for automating a variety of tasks. These are known as Visual LLMs and one notable model is Visual ChatGPT, which combines ChatGPT's LLM capabilities with visual computation to enable effective image analysis. These models' abilities to process images based on textual inputs can revolutionize diverse fields, and while their application in the remote sensing domain remains unexplored, it is important to acknowledge that novel implementations are to be expected. Thus, this is the first paper to examine the potential of Visual ChatGPT, a cutting-edge LLM founded on the GPT architecture, to tackle the aspects of image processing related to the remote sensing domain. Among its current capabilities, Visual ChatGPT can generate textual descriptions of images, perform canny edge and straight line detection, and conduct image segmentation. These offer valuable insights into image content and facilitate the interpretation and extraction of information. By exploring the applicability of these techniques within publicly available datasets of satellite images, we demonstrate the current model's limitations in dealing with remote sensing images, highlighting its challenges and future prospects. Although still in early development, we believe that the combination of LLMs and visual models holds a significant potential to transform remote sensing image processing, creating accessible and practical application opportunities in the field. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
10. A Survey on Sparsity Exploration in Transformer-Based Accelerators.
- Author
-
Fuad, Kazi Ahmed Asif and Chen, Lizhong
- Subjects
NATURAL language processing, TRANSFORMER models, COMPUTER vision, DEEP learning, PARALLEL processing, APPLICATION software
- Abstract
Transformer models have emerged as the state-of-the-art in many natural language processing and computer vision applications due to their capability of attending to longer sequences of tokens and supporting parallel processing more efficiently. Nevertheless, the training and inference of transformer models are computationally expensive and memory intensive. Meanwhile, utilizing the sparsity in deep learning models has proven to be an effective approach to alleviate the computation challenge as well as help to fit large models in edge devices. As high-performance CPUs and GPUs are generally not flexible enough to explore low-level sparsity, a number of specialized hardware accelerators have been proposed for transformer models. This paper provides a comprehensive review of hardware transformer accelerators that have been proposed to explore sparsity for computation and memory optimizations. We classify existing works based on the strategies of utilizing sparsity and identify their pros and cons in those strategies. Based on our analysis, we point out promising directions and recommendations for future works on improving the effective sparse execution of transformer hardware accelerators. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
11. Borno-Net: A Real-Time Bengali Sign-Character Detection and Sentence Generation System Using Quantized Yolov4-Tiny and LSTMs.
- Author
-
Begum, Nasima, Rahman, Rashik, Jahan, Nusrat, Khan, Saqib Sizan, Helaly, Tanjina, Haque, Ashraful, and Khatun, Nipa
- Subjects
LANGUAGE models, NATURAL language processing, SIGN language, DEAF children
- Abstract
Sign language is the most commonly used form of communication for persons with disabilities who have hearing or speech difficulties. However, persons without hearing impairment cannot understand these signs in many cases. As a consequence, persons with disabilities experience difficulties while expressing their emotions or needs. Thus, a sign character detection and text generation system is necessary to mitigate this issue. In this paper, we propose an end-to-end system that can detect Bengali sign characters from input images or video frames and generate meaningful sentences. The proposed system consists of two phases. In the first phase, a quantization technique for the YoloV4-Tiny detection model is proposed for detecting 49 different sign characters, including 36 Bengali alphabet characters, 10 numeric characters, and 3 special characters. Here, the detection model localizes hand signs and predicts the corresponding character. The second phase generates text from the characters predicted by the detection model. The Long Short-Term Memory (LSTM) model is utilized to generate meaningful text from the character signs detected in the previous phase. To train the proposed system, the BdSL 49 dataset is used, which has approximately 14,745 images of 49 different classes. The proposed quantized YoloV4-Tiny model achieves a mAP of 99.7%, and the proposed language model achieves an overall accuracy of 99.12%. In addition, performance analysis among YoloV4, YoloV4 Tiny, and YoloV7 models is provided in this research. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
12. Arabic Captioning for Images of Clothing Using Deep Learning.
- Author
-
Al-Malki, Rasha Saleh and Al-Aama, Arwa Yousuf
- Subjects
NATURAL language processing, DEEP learning, MACHINE learning, OPTICAL character recognition, LANGUAGE models, COMPUTER vision, ARABIC language
- Abstract
Fashion is one of the many fields in which image captioning is being applied. For e-commerce websites holding tens of thousands of images of clothing, automated item descriptions are quite desirable. This paper addresses captioning images of clothing in the Arabic language using deep learning. Image captioning systems are based on Computer Vision and Natural Language Processing techniques because visual and textual understanding is needed for these systems. Many approaches have been proposed to build such systems. The most widely used methods are deep learning methods, which use an image model to analyze the visual content of the image and a language model to generate the caption. Generating captions in the English language using deep learning algorithms has received great attention from many researchers, but there is still a gap in generating captions in the Arabic language because public datasets are often not available in Arabic. In this work, we created an Arabic dataset for captioning images of clothing, which we named "ArabicFashionData", as ours is the first model for captioning images of clothing in the Arabic language. Moreover, we classified the attributes of the images of clothing and used them as inputs to the decoder of our image captioning model to enhance Arabic caption quality. In addition, we used the attention mechanism. Our approach achieved a BLEU-1 score of 88.52. The experiment findings are encouraging and suggest that, with a bigger dataset, the attributes-based image captioning model can achieve excellent results for Arabic image captioning. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
13. A Review of the Evaluation System for Curriculum Learning.
- Author
-
Liu, Fengchun, Zhang, Tong, Zhang, Chunying, Liu, Lu, Wang, Liya, and Liu, Bin
- Subjects
DEEP learning, CURRICULUM evaluation, NATURAL language processing, ARTIFICIAL intelligence, COMPUTER vision, MACHINE learning
- Abstract
In recent years, deep learning models have been used more and more widely across various fields and have become a research hotspot for various tasks in artificial intelligence, but they face significant limitations in non-convex optimization problems. As a model training strategy for non-convex optimization, curriculum learning advocates that models learn from easier to more difficult data, mimicking the gradual way humans learn as they progress through a curriculum. This strategy has been widely used in the fields of computer vision, natural language processing, and reinforcement learning; it can effectively address the non-convex optimization problem and improve the generalization ability and convergence speed of models. This paper first introduces the application of curriculum learning at three major levels: data, task, and model, and summarizes the evaluators designed using curriculum learning methods in various domains, including difficulty evaluators, training schedulers, and loss evaluators, which correspond to the three stages of difficulty evaluation, training scheduling, and loss evaluation in applying curriculum learning to model training. We also discuss how to choose an appropriate evaluation system and the differences between terms used in different types of research. Finally, we summarize five methods similar to curriculum learning in the field of machine learning and provide a summary and outlook of the curriculum learning evaluation system. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
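The easy-to-hard training schedule at the heart of curriculum learning, as surveyed in the abstract above, can be sketched in a few lines. This is a generic illustration with an assumed difficulty function and a simple linear pacing rule, not any specific evaluator or scheduler from the survey.

```python
def curriculum_schedule(samples, difficulty, epochs=3):
    """Yield, for each epoch, a growing easy-to-hard prefix of the data.

    `difficulty` plays the role of a difficulty evaluator; the linear
    pacing (reveal len*epoch/epochs samples per epoch) is a minimal
    stand-in for a training scheduler."""
    ordered = sorted(samples, key=difficulty)  # easiest samples first
    for epoch in range(1, epochs + 1):
        cutoff = max(1, len(ordered) * epoch // epochs)
        yield ordered[:cutoff]  # model sees harder data as training proceeds
```

A real system would plug in a learned or heuristic difficulty evaluator and a tuned pacing function, but the sort-then-grow structure is the common core.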
14. A Survey on RISC-V-Based Machine Learning Ecosystem.
- Author
-
Kalapothas, Stavros, Galetakis, Manolis, Flamis, Georgios, Plessas, Fotis, and Kitsos, Paris
- Subjects
MACHINE learning, SYSTEMS on a chip, NATURAL language processing, COMPUTER vision, SOFTWARE frameworks, ARTIFICIAL intelligence
- Abstract
In recent years, advances in specialized hardware architectures have helped industry and the research community meet the computational demands of increasingly compute-intensive artificial intelligence (AI) algorithms and applications, which have already seen substantial growth in areas such as natural language processing (NLP) and computer vision (CV). The development of open-source hardware (OSH), and its contribution to hardware-based accelerators aimed mainly at machine learning (ML), has also been significant. In particular, the reduced instruction-set computer-five (RISC-V) open standard architecture has been widely adopted worldwide by researchers and commercial users in numerous openly available implementations. Selecting from the plethora of RISC-V processor cores, the mix of architectures and configurations, and the proliferation of ML software frameworks for ML workloads is not trivial. To facilitate this process, this paper presents a survey assessing the ecosystem of RISC-V-based hardware, creating a classification of system-on-chip (SoC) and CPU cores along with an inclusive arrangement of the latest released frameworks that support open hardware integration for ML applications. Moreover, part of this work is devoted to the challenges involved, such as power efficiency and reliability, when designing and building applications with OSH in the AI/ML domain. This study presents a quantitative taxonomy of RISC-V SoCs and reveals opportunities for future research in machine learning with RISC-V open-source hardware architectures. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
15. Convolutional Neural Networks or Vision Transformers: Who Will Win the Race for Action Recognitions in Visual Data?
- Author
-
Moutik, Oumaima, Sekkat, Hiba, Tigani, Smail, Chehri, Abdellah, Saadane, Rachid, Tchakoucht, Taha Ait, and Paul, Anand
- Subjects
CONVOLUTIONAL neural networks, NATURAL language processing, COMPUTER vision, VISION, DEEP learning
- Abstract
Understanding actions in videos remains a significant challenge in computer vision and has been the subject of extensive research over the last decades. Convolutional neural networks (CNNs) are a significant component of this topic and played a crucial role in the rise of Deep Learning. Inspired by the human visual system, CNNs have been applied to visual data exploitation and have solved various challenges across computer vision tasks and video/image analysis, including action recognition (AR). More recently, following the achievements of the transformer in natural language processing (NLP), transformers began to set new trends in vision tasks, creating a discussion around whether Vision Transformer models (ViT) will replace CNNs for action recognition in video clips. This paper examines this trending topic in detail, studying CNNs and Transformers for action recognition separately and comparing their accuracy-complexity trade-off. Finally, based on the outcome of this performance analysis, the question of whether CNNs or Vision Transformers will win the race is discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
16. Leverage Boosting and Transformer on Text-Image Matching for Cheap Fakes Detection †.
- Author
-
La, Tuan-Vinh, Dao, Minh-Son, Le, Duy-Dong, Thai, Kim-Phung, Nguyen, Quoc-Hung, and Phan-Thi, Thuy-Kieu
- Subjects
DEEP learning, FAKE news, LOCAL mass media, NATURAL language processing, IMAGE registration, COMPUTER science
- Abstract
The explosive growth of the social media community has increased many kinds of misinformation and is attracting tremendous attention from the research community. One of the most prevalent forms of misleading news is cheapfakes. Cheapfakes use non-AI techniques, such as pairing unaltered images with false contextual news, to create false news, which makes them easy and "cheap" to produce and therefore abundant in the social media community. Moreover, the development of deep learning has also opened up many news-related domains such as fake news detection, rumour detection, fact-checking, and verification of claimed images. Nevertheless, despite the impact and harmfulness of cheapfakes for the social community and the real world, there is little research on detecting cheapfakes in the computer science domain. Detecting misused/false/out-of-context pairs of images and captions is challenging, even with human effort, because of the complex correlation between the attached image and the veracity of the caption content. Existing research focuses mostly on training and evaluating on a given dataset, which limits proposals in terms of categories, semantics, and situations tied to the characteristics of that dataset. In this paper, to address these issues, we aimed to leverage textual semantic understanding from a large corpus, integrated with different combinations of text-image matching and image captioning methods via an ANN/Transformer boosting schema, to classify a triple of (image, caption1, caption2) into OOC (out-of-context) and NOOC (not out-of-context) labels. We customized these combinations according to various exceptional cases observed during data analysis. We evaluate our approach using the dataset and evaluation metrics provided by the COSMOS baseline. Compared to other methods, including the baseline, our method achieves the highest Accuracy, Recall, and F1 scores. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
17. Special Issue on Intelligent Computing for Big Data.
- Author
-
Wang, Wei and Man, Ka Lok
- Subjects
BIG data, DEEP learning, INFORMATION storage & retrieval systems, NATURAL language processing, COMPUTER vision
- Abstract
The paper by Jinah Kim and Nammee Moon [6] proposes a deep neural network for fusing multimodal data, e.g., video and sensor data, for dog behaviour recognition. Recent advances in AI research have the potential to move current big data research one step further. Passion for a classic research area of computer science, artificial intelligence (AI), has experienced new momentum in recent years. [Extracted from the article]
- Published
- 2022
- Full Text
- View/download PDF
18. Deep Learning for Chondrogenic Tumor Classification through Wavelet Transform of Raman Spectra.
- Author
-
Manganelli Conforti, Pietro, D'Acunto, Mario, and Russo, Paolo
- Subjects
TUMOR classification, DEEP learning, WAVELET transforms, RAMAN spectroscopy, NATURAL language processing, COMPUTER vision, TUMOR grading
- Abstract
The grading of cancer tissues is still one of the main challenges for pathologists. The development of enhanced analysis strategies hence becomes crucial to accurately identify and further deal with each individual case. Raman spectroscopy (RS) is a promising tool for the classification of tumor tissues as it allows us to obtain the biochemical maps of the tissues under analysis and to observe their evolution in terms of biomolecules, proteins, lipid structures, DNA, vitamins, and so on. However, its potential could be further improved by providing a classification system able to recognize the sample tumor category from the raw Raman spectroscopy signal; this could provide more reliable responses in shorter time scales and could reduce or eliminate false-positive or -negative diagnoses. Deep Learning techniques have become ubiquitous in recent years, with models able to perform classification with high accuracy in most diverse fields of research, e.g., natural language processing, computer vision, medical imaging. However, deep models often rely on huge labeled datasets to produce reasonable accuracy, otherwise running into overfitting issues when the training data is insufficient. In this paper, we propose a chondrogenic tumor CLAssification through wavelet transform of RAman spectra (CLARA), which is able to classify Raman spectra obtained from bone tissues with high accuracy. CLARA recognizes and grades the tumors in the evaluated dataset with 97% accuracy by exploiting a classification pipeline consisting of the division of the original task in two binary classification steps, where the first is performed on the original RS signals while the latter is accomplished through the use of a hybrid temporal-frequency 2D transform. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
19. Double-Talk Detection-Aided Residual Echo Suppression via Spectrogram Masking and Refinement.
- Author
-
Shachar, Eran, Cohen, Israel, and Berdugo, Baruch
- Subjects
SPECTROGRAMS, DEEP learning, COMPUTER vision, NATURAL language processing, SHORT-term memory
- Abstract
Acoustic echo in full-duplex telecommunication systems is a common problem that may cause desired-speech quality degradation during double-talk periods. This problem is especially challenging in low signal-to-echo ratio (SER) scenarios, such as hands-free conversations over mobile phones when the loudspeaker volume is high. This paper proposes a two-stage deep-learning approach to residual echo suppression focused on the low SER scenario. The first stage consists of a speech spectrogram masking model integrated with a double-talk detector (DTD). The second stage consists of a spectrogram refinement model optimized for speech quality by minimizing a perceptual evaluation of speech quality (PESQ) related loss function. The proposed integration of DTD with the masking model outperforms several other configurations based on previous studies. We conduct an ablation study that shows the contribution of each part of the proposed system. We evaluate the proposed system in several SERs and demonstrate its efficiency in the challenging setting of a very low SER. Finally, the proposed approach outperforms competing methods in several residual echo suppression metrics. We conclude that the proposed system is well-suited for the task of low SER residual echo suppression. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
20. The Advances in Computer Vision That Are Enabling More Autonomous Actions in Surgery: A Systematic Review of the Literature.
- Author
-
Gumbs, Andrew A., Grasso, Vincent, Bourdel, Nicolas, Croner, Roland, Spolverato, Gaya, Frigerio, Isabella, Illanes, Alfredo, Abu Hilal, Mohammad, Park, Adrian, and Elyan, Eyad
- Subjects
COMPUTER vision ,DEEP learning ,NATURAL language processing ,ARTIFICIAL intelligence ,MACHINE learning - Abstract
This is a review focused on advances and current limitations of computer vision (CV) and how CV can help us move toward more autonomous actions in surgery. It is a follow-up to an article we previously published in Sensors entitled "Artificial Intelligence Surgery: How Do We Get to Autonomous Actions in Surgery?" Whereas that article also discussed issues of machine learning, deep learning, and natural language processing, this review delves deeper into the field of CV. Additionally, non-visual forms of data that can aid computerized robots in performing more autonomous actions, such as instrument priors and audio haptics, are also highlighted. Furthermore, the current existential crisis for surgeons, endoscopists, and interventional radiologists regarding more autonomy during procedures is discussed. In summary, this paper discusses how to harness the power of CV to keep doctors who perform interventions in the loop. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
21. Hybrid Features by Combining Visual and Text Information to Improve Spam Filtering Performance.
- Author
-
Nam, Seong-Guk, Jang, Yonghun, Lee, Dong-Gun, and Seo, Yeong-Seok
- Subjects
SPAM email ,INFORMATION & communication technologies for development ,OPTICAL character recognition ,FALSE advertising ,QUALITY of service - Abstract
The development of information and communication technology has created many positive outcomes, including convenience for people; however, cases of unsolicited communication, such as spam, also occur frequently. Spam is the indiscriminate transmission of unwanted information by anonymous users, called spammers. Spam content is transmitted to users in various forms, such as SMS, e-mail, and social network service posts, causing negative experiences for users of the service while also creating costs, such as unnecessarily large amounts of network traffic. In addition, spam content includes phishing, hype or false advertising, and illegal content. Recently, spammers have also used images that contain stimulating content to effectively attract users' curiosity and attention. Image spam contains more complex information than text, making it more difficult to analyze and to generalize its properties compared to text. Therefore, existing text-based spam detectors are vulnerable to spam image attacks, resulting in a decline in service quality. In this paper, a hybrid-features method that combines visual and text information is proposed to improve spam filtering performance and reduce misclassification. The proposed method employs three sub-models to extract features from spam images and a classifier model to output the results using those features. The sub-models extract topic-, word-, and image-embedding-based features from spam images using optical character recognition, latent Dirichlet allocation, and word2vec techniques. To evaluate spam image classification performance, the spam classifiers were trained using the extracted features and the results were measured using a confusion matrix. Our model achieved an accuracy of 0.9814 and a macro-F1 score of 0.9813. Although the application of OCR evasion techniques decreased recognition performance, the proposed model still obtained a mean macro-F1 score of 0.9607. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
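The fusion step described in the abstract above (topic-, word-, and image-embedding-based features combined into one hybrid vector) can be sketched as a simple concatenation. The dimensions below (20 LDA topics, a 300-d averaged word2vec vector, a 128-d image embedding) are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

def hybrid_features(topic_vec, word_vec, image_vec):
    """Concatenate the three sub-model outputs (topic-, word-, and
    image-embedding-based features) into one hybrid feature vector."""
    return np.concatenate([topic_vec, word_vec, image_vec])

rng = np.random.default_rng(0)
topic = rng.dirichlet(np.ones(20))   # hypothetical LDA topic distribution
word = rng.standard_normal(300)      # hypothetical mean word2vec of OCR'd text
image = rng.standard_normal(128)     # hypothetical CNN image embedding

x = hybrid_features(topic, word, image)
print(x.shape)  # (448,)
```

The concatenated vector would then feed the downstream classifier model, which is what makes a text-only evasion attack (e.g., OCR-resistant rendering) less damaging: the image-embedding portion of the vector still carries signal.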
22. Deep Learning-Based Diagnosis of Alzheimer's Disease.
- Author
-
Saleem, Tausifa Jan, Zahra, Syed Rameem, Wu, Fan, Alwakeel, Ahmed, Alwakeel, Mohammed, Jeribi, Fathe, and Hijji, Mohammad
- Subjects
DEEP learning ,ALZHEIMER'S disease ,BRAIN degeneration ,NATURAL language processing ,COMPUTER vision ,DIAGNOSIS - Abstract
Alzheimer's disease (AD), the most common form of dementia, is a severe concern in modern healthcare. Around 5.5 million people aged 65 and above have AD, and it is the sixth leading cause of mortality in the US. AD is an irreversible, degenerative brain disorder characterized by a loss of cognitive function and has no proven cure. Deep learning techniques have gained popularity in recent years, particularly in the domains of natural language processing and computer vision. Since 2014, these techniques have begun to receive substantial consideration in AD diagnosis research, and the number of papers published in this arena is rising drastically. Deep learning techniques have been reported to be more accurate for AD diagnosis in comparison to conventional machine learning models. Motivated to explore the potential of deep learning in AD diagnosis, this study reviews the current state-of-the-art in AD diagnosis using deep learning. We summarize the most recent trends and findings using a thorough literature review. The study also explores the different biomarkers and datasets for AD diagnosis. Even though deep learning has shown promise in AD diagnosis, there are still several challenges that need to be addressed. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
23. Attention Map-Guided Visual Explanations for Deep Neural Networks.
- Author
-
An, Junkang and Joe, Inwhee
- Subjects
ARTIFICIAL neural networks ,COMPUTER vision ,RECOMMENDER systems ,ARTIFICIAL intelligence ,NATURAL language processing ,FEATURE extraction ,DEEP learning - Abstract
Deep neural network models perform well in a variety of domains, such as computer vision, recommender systems, natural language processing, and defect detection. In contrast, in areas such as healthcare, finance, and defense, deep neural network models are not trusted by users due to their lack of explainability. In this paper, we focus on attention-map-guided visual explanations for deep neural networks. We employ an attention mechanism to find the most important region of an input image. The Grad-CAM method is used to extract the feature map for deep neural networks, and then the attention mechanism is used to extract the high-level attention maps. The attention map, which highlights the important region in the image for the target class, can be seen as a visual explanation of a deep neural network. We evaluate our method using two common metrics: average drop and percentage increase. For a more effective evaluation, we also propose a new metric. Experiments show that the proposed method works better than state-of-the-art explainable artificial intelligence methods: our approach yields a lower average drop and a higher percentage increase than other methods and finds more explanatory regions, especially within the first twenty percent of the input image. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
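The Grad-CAM step the abstract above builds on has a compact form: the channel weights are the global-average-pooled gradients of the target-class score with respect to a convolutional layer's activations, and the heatmap is the ReLU of the weighted sum of those activation maps. A minimal NumPy sketch with toy shapes (not the authors' implementation, which additionally layers an attention mechanism on top):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from a conv layer's activations and the gradients
    of the target-class score w.r.t. those activations.

    feature_maps: (K, H, W) activation maps A_k
    gradients:    (K, H, W) gradients dY_c/dA_k
    """
    # Channel weights: global-average-pool the gradients over space.
    weights = gradients.mean(axis=(1, 2))                            # (K,)
    # ReLU of the weighted sum of the feature maps.
    cam = np.maximum((weights[:, None, None] * feature_maps).sum(axis=0), 0)
    # Normalize to [0, 1] for visualization.
    if cam.max() > 0:
        cam /= cam.max()
    return cam

rng = np.random.default_rng(0)
A = rng.random((8, 7, 7))            # toy activations
G = rng.standard_normal((8, 7, 7))   # toy gradients
heatmap = grad_cam(A, G)
print(heatmap.shape)                 # (7, 7)
```

In practice the heatmap is upsampled to the input resolution and overlaid on the image; the "average drop" and "percentage increase" metrics mentioned in the abstract are then computed by masking the input with this map and measuring the change in the class score.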
24. Trinity: Neural Network Adaptive Distributed Parallel Training Method Based on Reinforcement Learning.
- Author
-
Zeng, Yan, Wu, Jiyang, Zhang, Jilin, Ren, Yongjian, and Zhang, Yunquan
- Subjects
REINFORCEMENT learning ,DEEP learning ,ARTIFICIAL neural networks ,NATURAL language processing ,COMPUTER vision - Abstract
Deep learning, with increasingly large datasets and complex neural networks, is widely used in computer vision and natural language processing. A resulting trend is to split and train large-scale neural network models across multiple devices in parallel, known as parallel model training. Existing parallel methods are mainly based on expert design, which is inefficient and requires specialized knowledge. Although automatically implemented parallel methods have been proposed to solve these problems, these methods only consider a single optimization aspect of run time. In this paper, we present Trinity, an adaptive distributed parallel training method based on reinforcement learning, to automate the search and tuning of parallel strategies. We build a multidimensional performance evaluation model and use proximal policy optimization to co-optimize multiple optimization aspects. Our experiment used the CIFAR10 and PTB datasets based on InceptionV3, NMT, NASNet and PNASNet models. Compared with Google's Hierarchical method, Trinity achieves up to 5% reductions in runtime, communication, and memory overhead, and up to a 40% increase in parallel strategy search speeds. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
25. Comparing Human Activity Recognition Models Based on Complexity and Resource Usage.
- Author
-
Angerbauer, Simon, Palmanshofer, Alexander, Selinger, Stephan, and Kurz, Marc
- Subjects
HUMAN activity recognition ,DEEP learning ,NATURAL language processing ,COMPUTER vision ,MACHINE learning ,CONGREGATE housing - Abstract
Human Activity Recognition (HAR) is a field with many contrasting application domains, from medical applications to ambient assisted living and sports applications. With ever-changing use cases and devices also comes a need for newer and better HAR approaches. Machine learning has long been one of the predominant techniques to recognize activities from extracted features. With the advent of deep learning techniques that push state-of-the-art results in many different domains like natural language processing or computer vision, researchers have also started to build deep neural nets for HAR. With this increase in complexity, there also comes a necessity to compare the newer approaches to the previous state-of-the-art algorithms. Not everything that is new is also better. Therefore, this paper compares typical machine learning models like a Random Forest (RF) or a Support Vector Machine (SVM) to two commonly used deep neural net architectures, Convolutional Neural Nets (CNNs) and Recurrent Neural Nets (RNNs), not only with regard to performance but also with regard to the complexity of the models. We measure complexity as the memory consumption, the mean prediction time and the number of trainable parameters of the models. To achieve comparable results, the models are all tested on the same publicly available dataset, the UCI HAR Smartphone dataset. With this combination of prediction performance and model complexity, we look for the models achieving the best possible performance/complexity tradeoff and therefore being the most favourable to be used in an application. According to our findings, the best model for a strictly memory-limited use case is the Random Forest with an F1-Score of 88.34%, memory consumption of only 0.1 MB and mean prediction time of 0.22 ms. The overall best model in terms of complexity and performance is the SVM with a linear kernel with an F1-Score of 95.62%, memory consumption of 2 MB and a mean prediction time of 0.47 ms. The two deep neural nets are on par in terms of performance, but their increased complexity makes them less favourable to use. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
26. Translating Videos into Synthetic Training Data for Wearable Sensor-Based Activity Recognition Systems Using Residual Deep Convolutional Networks.
- Author
-
Fortes Rey, Vitor, Garewal, Kamalveer Kaur, Lukowicz, Paul, and Lee, Hyo Jong
- Subjects
ACCELEROMETERS ,HUMAN activity recognition ,COMPUTER vision ,MOTION detectors ,STREAMING video & television ,SIGNAL convolution ,DEEP learning ,NATURAL language processing - Abstract
Human activity recognition (HAR) using wearable sensors has benefited much less from recent advances in Deep Learning than fields such as computer vision and natural language processing. This is, to a large extent, due to the lack of large-scale (as compared to computer vision) repositories of labeled training data for sensor-based HAR tasks. Thus, for example, ImageNet has images for around 100,000 categories (based on WordNet) with on average 1000 images per category (therefore up to 100,000,000 samples). The Kinetics-700 video activity dataset has 650,000 video clips covering 700 different human activities (in total over 1800 h). By contrast, the total length of all sensor-based HAR datasets in the popular UCI machine learning repository is less than 63 h, with around 38 of those consisting of simple modes of locomotion such as walking, standing or cycling. In our research we aim to facilitate the use of online videos, which exist in ample quantities for most activities and are much easier to label than sensor data, to simulate labeled wearable motion sensor data. In previous work we already demonstrated some preliminary results in this direction, focusing on very simple, activity-specific simulation models and a single sensor modality (acceleration norm). In this paper, we show how we can train a regression model on generic motions for both accelerometer and gyro signals and then apply it to videos of the target activities to generate synthetic Inertial Measurement Unit (IMU) data (acceleration and gyro norms) that can be used to train and/or improve HAR models. We demonstrate that systems trained on simulated data generated by our regression model can come within around 10% of the mean F1 score of a system trained on real sensor data. Furthermore, we show that the remaining advantage of real sensor data can eventually be equalized, either by including a small amount of real sensor data for model calibration or simply by leveraging the fact that, in general, we can generate much more simulated data from video than we could collect as real recordings. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
27. Digital Image Processing: Advanced Technologies and Applications.
- Author
-
Mahmood, Zahid
- Subjects
ARTIFICIAL intelligence ,NATURAL language processing ,MACHINE learning ,CONVOLUTIONAL neural networks ,COMPUTER vision ,DEEP learning ,HANDWRITING recognition (Computer science) - Abstract
This document is an editorial from the journal "Applied Sciences" that discusses the advancements and applications of digital image processing. It highlights the use of deep learning-based algorithms in various fields such as animal behavior analysis, license plate recognition, and Urdu numeral classification. The editorial emphasizes the need for further research to develop more robust and explainable AI models and explores the potential of quantum computing in image processing. The document provides a summary of contributions in the field, focusing on topics like object detection, recognition, and image manipulations, and suggests future directions for research and integration of deep learning algorithms with conventional methods. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
28. An Analysis of Radio Frequency Transfer Learning Behavior.
- Author
-
Wong, Lauren J., Muller, Braeden, McPherson, Sean, and Michaels, Alan J.
- Subjects
NATURAL language processing ,RADIO frequency ,COMPUTER vision ,SIGNAL-to-noise ratio ,SEQUENTIAL learning ,MACHINE learning ,CHANNEL estimation - Abstract
Transfer learning (TL) techniques, which leverage prior knowledge gained from data with different distributions to achieve higher performance and reduced training time, are often used in computer vision (CV) and natural language processing (NLP), but have yet to be fully utilized in the field of radio frequency machine learning (RFML). This work systematically evaluates how the training domain and task, characterized by the transmitter (Tx)/receiver (Rx) hardware and channel environment, impact radio frequency (RF) TL performance for example use-cases of automatic modulation classification (AMC) and specific emitter identification (SEI). Through exhaustive experimentation using carefully curated synthetic and captured datasets with varying signal types, channel types, signal-to-noise ratios (SNRs), carrier/center frequencies (CFs), frequency offsets (FOs), and Tx and Rx devices, actionable and generalized conclusions are drawn regarding how best to use RF TL techniques for domain adaptation and sequential learning. Consistent with trends identified in other modalities, our results show that RF TL performance is highly dependent on the similarity between the source and target domains/tasks, but also on the relative difficulty of the source and target domains/tasks. Results also discuss the impacts of channel environment and hardware variations on RF TL performance and compare RF TL performance using head re-training and model fine-tuning methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
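The two transfer strategies compared in the abstract above, head re-training and model fine-tuning, differ only in which parameters are updated on the target domain: the former freezes the pretrained feature extractor and trains only a new classification head, while the latter also updates the extractor. A minimal NumPy sketch of head re-training, where a fixed random projection stands in for a pretrained extractor and the data are hypothetical stand-ins for RF samples (not the authors' models or datasets):

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pretrained" feature extractor from the source domain; frozen, so its
# weights W_feat receive no gradient updates (fine-tuning would update them).
W_feat = rng.standard_normal((16, 8))
features = lambda x: np.tanh(x @ W_feat)

# Hypothetical target-domain data: 16-d samples, binary labels.
X = rng.standard_normal((200, 16))
y = (X[:, 0] > 0).astype(float)

# Head re-training: only the new linear head w is updated,
# here by plain gradient descent on the logistic loss.
w = np.zeros(8)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(features(X) @ w)))      # sigmoid outputs
    w -= 0.1 * features(X).T @ (p - y) / len(y)       # logistic-loss gradient

acc = (((features(X) @ w) > 0) == (y == 1)).mean()
print(f"head-only training accuracy: {acc:.2f}")
```

The design trade-off the paper studies follows directly from this split: head re-training is cheaper and safer when source and target domains are similar, while fine-tuning can recover more accuracy when they differ, at the cost of more target data and training time.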
29. Source File Tracking Localization: A Fault Localization Method for Deep Learning Frameworks.
- Author
-
Ma, Zhenshu, Yang, Bo, and Zhang, Yuhang
- Subjects
DEEP learning ,NATURAL language processing ,FAULT location (Engineering) ,COMPUTER vision - Abstract
Deep learning has been widely used in computer vision, natural language processing, speech recognition, and other fields. If there are errors in deep learning frameworks, such as missing module errors and GPU/CPU result discrepancy errors, it will cause many application problems. We propose a source-based fault location method, SFTL (Source File Tracking Localization), to improve the fault location efficiency of these two types of errors in deep learning frameworks. We screened 3410 crash reports on GitHub and conducted fault location experiments based on those reports. The experimental results show that the SFTL method has a high accuracy, which can help deep learning framework developers quickly locate faults and improve the stability and reliability of models. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
30. Transfer Learning for Radio Frequency Machine Learning: A Taxonomy and Survey.
- Author
-
Wong, Lauren J. and Michaels, Alan J.
- Subjects
DEEP learning ,RADIO frequency ,MACHINE learning ,NATURAL language processing ,COMPUTER vision ,COMPUTER engineering - Abstract
Transfer learning is a pervasive technology in computer vision and natural language processing fields, yielding exponential performance improvements by leveraging prior knowledge gained from data with different distributions. However, while recent works seek to mature machine learning and deep learning techniques in applications related to wireless communications, a field loosely termed radio frequency machine learning, few have demonstrated the use of transfer learning techniques for yielding performance gains, improved generalization, or to address concerns of training data costs. With modifications to existing transfer learning taxonomies constructed to support transfer learning in other modalities, this paper presents a tailored taxonomy for radio frequency applications, yielding a consistent framework that can be used to compare and contrast existing and future works. This work offers such a taxonomy, discusses the small body of existing works in transfer learning for radio frequency machine learning, and outlines directions where future research is needed to mature the field. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
31. Modeling of Hyperparameter Tuned Deep Learning Model for Automated Image Captioning.
- Author
-
Omri, Mohamed, Abdel-Khalek, Sayed, Khalil, Eied M., Bouslimi, Jamel, and Joshi, Gyanendra Prasad
- Subjects
DEEP learning ,COMPUTER vision ,NATURAL language processing ,IMAGE processing ,VISUAL fields ,NATURAL languages - Abstract
Image processing remains a hot research topic among research communities due to its applicability in several areas. An important application of image processing is the automatic image captioning technique, which aims to automatically generate a proper description of an image in natural language. Image captioning is a recently developed hot research topic, and it started to receive significant attention in the fields of computer vision and natural language processing (NLP). Since image captioning is considered a challenging task, the recently developed deep learning (DL) models have attained significant performance with increased complexity and computational cost. Keeping these issues in mind, in this paper, a novel hyperparameter-tuned DL for automated image captioning (HPTDL-AIC) technique is proposed. The HPTDL-AIC technique encompasses two major parts, namely encoder and decoder. The encoder part utilizes Faster SqueezeNet with the RMSProp model to generate an effective depiction of the input image via insertion into a predefined-length vector. At the same time, the decoder unit employs a bird swarm algorithm (BSA) with a long short-term memory (LSTM) model to concentrate on the generation of description sentences. The design of RMSProp and BSA for the hyperparameter tuning process of the Faster SqueezeNet and LSTM models for image captioning shows the novelty of the work, which helps to accomplish enhanced image captioning performance. The experimental validation of the HPTDL-AIC technique is carried out against two benchmark datasets, and the extensive comparative study pointed out the improved performance of the HPTDL-AIC technique over recent approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
32. A State-of-the-Art Survey on Deep Learning Theory and Architectures.
- Author
-
Alom, Md Zahangir, Taha, Tarek M., Yakopcic, Chris, Westberg, Stefan, Sidike, Paheding, Nasrin, Mst Shamima, Hasan, Mahmudul, Van Essen, Brian C., Awwal, Abdul A. S., and Asari, Vijayan K.
- Subjects
DEEP learning ,NATURAL language processing ,RECURRENT neural networks ,COMPUTER vision ,REINFORCEMENT learning ,ARCHITECTURE - Abstract
In recent years, deep learning has garnered tremendous success in a variety of application domains. This new field of machine learning has been growing rapidly and has been applied to most traditional application domains, as well as some new areas that present more opportunities. Different methods have been proposed based on different categories of learning, including supervised, semi-supervised, and un-supervised learning. Experimental results show state-of-the-art performance using deep learning when compared to traditional machine learning approaches in the fields of image processing, computer vision, speech recognition, machine translation, art, medical imaging, medical information processing, robotics and control, bioinformatics, natural language processing, cybersecurity, and many others. This paper presents a brief survey of the advances that have occurred in the area of Deep Learning (DL), starting with the Deep Neural Network (DNN). The survey goes on to cover the Convolutional Neural Network (CNN), the Recurrent Neural Network (RNN), including Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), the Auto-Encoder (AE), the Deep Belief Network (DBN), the Generative Adversarial Network (GAN), and Deep Reinforcement Learning (DRL). Additionally, we have discussed recent developments, such as advanced variant DL techniques based on these DL approaches. This work considers most of the papers published after 2012, when the history of deep learning began. Furthermore, DL approaches that have been explored and evaluated in different application domains are also included in this survey. We also included recently developed frameworks, SDKs, and benchmark datasets that are used for implementing and evaluating deep learning approaches. There are some surveys that have been published on DL using neural networks and a survey on Reinforcement Learning (RL). However, those papers have not discussed individual advanced techniques for training large-scale deep learning models or the recently developed methods for generative models. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
33. Neural Collaborative Filtering with Ontologies for Integrated Recommendation Systems.
- Author
-
Alaa El-deen Ahmed, Rana, Fernández-Veiga, Manuel, and Gawich, Mariam
- Subjects
ONTOLOGIES (Information retrieval) ,RECOMMENDER systems ,PROBABILISTIC generative models ,NATURAL language processing ,COMPUTER vision ,IMAGE recognition (Computer vision) - Abstract
Machine learning (ML), and especially deep learning (DL) with neural networks, has demonstrated remarkable success in all sorts of AI problems, from computer vision to game playing, from natural language processing to speech and image recognition. In many ways, the approach of ML toward solving a class of problems is fundamentally different from the one followed in classical engineering, or with ontologies. While the latter rely on detailed domain knowledge and almost exhaustive search by means of static inference rules, ML adopts the view of collecting large datasets and processing this massive information through a generic learning algorithm that builds up tentative solutions. Combining the capabilities of ontology-based recommendation and ML-based techniques in a hybrid system is thus a natural and promising method to enhance semantic knowledge with statistical models. This merge could alleviate the burden of creating large, narrowly focused ontologies for complicated domains, by using probabilistic or generative models to enhance the predictions without attempting to provide a semantic support for them. In this paper, we present a novel hybrid recommendation system that blends, in a single architecture, classical knowledge-driven recommendations arising from a tailored ontology with recommendations generated by a data-driven approach, specifically with classifiers and neural collaborative filtering. We show that bringing together these knowledge-driven and data-driven worlds provides some measurable improvement, enabling the transfer of semantic information to ML and, in the opposite direction, statistical knowledge to the ontology. Moreover, the proposed system enables the extraction of the reasoning recommendation results after updating the standard ontology with the new products and user behaviors, thus capturing the dynamic behavior of the environment of our interest. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
34. A Novel Bi-Dual Inference Approach for Detecting Six-Element Emotions.
- Author
-
Huang, Xiaoping, Zhou, Yujian, and Du, Yajun
- Subjects
NATURAL language processing ,DUAL-task paradigm ,ARTIFICIAL intelligence ,COMPUTER vision ,INFERENCE (Logic) - Abstract
In recent years, there has been rapid development in machine learning for solving artificial intelligence tasks in various fields, including translation, speech, and image processing. These AI tasks are often interconnected rather than independent. One specific type of relationship is known as structural duality, which exists between multiple pairs of artificial intelligence tasks. The concept of dual learning has gained significant attention in the fields of machine learning, computer vision, and natural language processing. Dual learning involves using primitive tasks (mapping from domains X to Y) and dual tasks (mapping from domains Y to X) to enhance the performance of both tasks. In this study, we propose a general framework called Bi-Dual Inference by combining the principles of dual inference and dual learning. Our framework generates multiple dual models and a primal model by utilizing two dual tasks: sentiment analysis of input text and sentence generation from sentiment labels. We create these model pairs (primal model f, dual model g) by employing different initialization seeds and data access sequences. Each primal and dual model is a distinct LSTM model. By reasoning about a single task with multiple similar models in the same direction, our framework achieves improved classification results. To validate the effectiveness of our proposed model, we conduct experiments on two datasets, namely NLPCC2013 and NLPCC2014. The results demonstrate that our model outperforms the optimal baseline model in terms of the F1 score, achieving an improvement of approximately 5%. Additionally, we provide parameter analyses for our proposed model, including model iteration analysis, α parameter analysis, λ parameter analysis, batch size analysis, training sentence length analysis, and hidden layer size settings. These experimental results further confirm the effectiveness of our proposed model. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
35. Prototype-Based Support Example Miner and Triplet Loss for Deep Metric Learning.
- Author
-
Yang, Shan, Zhang, Yongfei, Zhao, Qinghua, Pu, Yanglin, and Yang, Hangyuan
- Subjects
DEEP learning ,NATURAL language processing ,COMPUTER vision ,MINERS - Abstract
Deep metric learning aims to learn a mapping function that projects input data into a high-dimensional embedding space, facilitating the clustering of similar data points while ensuring dissimilar ones are far apart. The most recent studies focus on designing a batch sampler and mining online triplets to achieve this purpose. Conventionally, hard negative mining schemes serve as the preferred batch sampler. However, most hard negative mining schemes search for hard examples in randomly selected mini-batches at each epoch, which often results in less-optimal hard examples and thus sub-optimal performances. Furthermore, Triplet Loss is commonly adopted to perform online triplet mining by pulling the hard positives close to and pushing the negatives away from the anchor. However, when the anchor in a triplet is an outlier, the positive example will be pulled away from the centroid of the cluster, thus resulting in a loose cluster and inferior performance. To address the above challenges, we propose the Prototype-based Support Example Miner (pSEM) and Triplet Loss (pTriplet Loss). First, we present a support example miner designed to mine the support classes on the prototype-based nearest neighbor graph of classes. Following this, we locate the support examples by searching for instances at the intersection between clusters of these support classes. Second, we develop a variant of Triplet Loss, referred to as a Prototype-based Triplet Loss. In our approach, a dynamically updated prototype is used to rectify outlier anchors, thus reducing their detrimental effects and facilitating a more robust formulation for Triplet Loss. Extensive experiments on typical Computer Vision (CV) and Natural Language Processing (NLP) tasks, namely person re-identification and few-shot relation extraction, demonstrated the effectiveness and generalizability of the proposed scheme, which consistently outperforms the state-of-the-art models. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
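The Triplet Loss central to the abstract above has a compact form: L = max(d(a, p) − d(a, n) + margin, 0), pulling the positive toward the anchor and pushing the negative at least a margin farther away. The sketch below illustrates it in NumPy; the prototype-rectified variant is a hypothetical illustration of the idea of blending an outlier-prone anchor with its class prototype, since the abstract does not give the paper's exact rectification rule:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard Triplet Loss on Euclidean distances."""
    d_ap = np.linalg.norm(anchor - positive)   # anchor-positive distance
    d_an = np.linalg.norm(anchor - negative)   # anchor-negative distance
    return max(d_ap - d_an + margin, 0.0)

def prototype_triplet_loss(anchor, positive, negative, prototype,
                           alpha=0.5, margin=0.2):
    """Hypothetical prototype-rectified variant: blend an outlier-prone
    anchor with its (dynamically updated) class prototype before
    computing the loss, damping the outlier's pull on the positive."""
    rectified = alpha * anchor + (1.0 - alpha) * prototype
    return triplet_loss(rectified, positive, negative, margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])
n = np.array([1.0, 0.0])
print(triplet_loss(a, p, n))  # margin satisfied: 0.1 - 1.0 + 0.2 < 0 → 0.0
```

A triplet only contributes gradient when the margin is violated, which is why batch samplers that surface hard examples (such as the proposed support example miner) matter: randomly sampled mini-batches tend to yield mostly zero-loss triplets.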
36. A Survey on Adversarial Deep Learning Robustness in Medical Image Analysis.
- Author
-
Apostolidis, Kyriakos D. and Papakostas, George A.
- Subjects
COMPUTER-assisted image analysis (Medicine) ,MAGNETIC resonance imaging ,DIAGNOSTIC imaging ,DEEP learning ,CONVOLUTIONAL neural networks ,NATURAL language processing ,DIAGNOSIS - Abstract
In the past years, deep neural networks (DNN) have become popular in many disciplines such as computer vision (CV), natural language processing (NLP), etc. The evolution of hardware has helped researchers to develop many powerful Deep Learning (DL) models to face numerous challenging problems. One of the most important challenges in the CV area is Medical Image Analysis in which DL models process medical images—such as magnetic resonance imaging (MRI), X-ray, computed tomography (CT), etc.—using convolutional neural networks (CNN) for diagnosis or detection of several diseases. The proper function of these models can significantly upgrade the health systems. However, recent studies have shown that CNN models are vulnerable under adversarial attacks with imperceptible perturbations. In this paper, we summarize existing methods for adversarial attacks, detections and defenses on medical imaging. Finally, we show that many attacks, which are undetectable by the human eye, can degrade the performance of the models, significantly. Nevertheless, some effective defense and attack detection methods keep the models safe to an extent. We end with a discussion on the current state-of-the-art and future challenges. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
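A representative example of the "imperceptible perturbations" this survey covers is the Fast Gradient Sign Method (FGSM), the canonical one-step attack. The sketch below is illustrative only; the function name `fgsm_perturb` is an assumption, and in practice `grad` would come from backpropagating a model's loss.

```python
import numpy as np

def fgsm_perturb(image, grad, epsilon=0.1):
    """One-step FGSM-style perturbation (illustrative sketch).

    Moves each pixel by +/- epsilon in the direction that increases
    the loss, then clips back to the valid [0, 1] intensity range.
    `grad` is the gradient of the loss with respect to the input.
    """
    adv = image + epsilon * np.sign(grad)
    return np.clip(adv, 0.0, 1.0)
```

Because every pixel moves by at most `epsilon`, the perturbation can stay below the threshold of human perception while still flipping a CNN's prediction.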
37. Predicting Choices Driven by Emotional Stimuli Using EEG-Based Analysis and Deep Learning.
- Author
-
Aldayel, Mashael, Kharrat, Amira, and Al-Nafjan, Abeer
- Subjects
DEEP learning ,EMOTIONAL conditioning ,CONVOLUTIONAL neural networks ,NATURAL language processing ,ARTIFICIAL intelligence ,COMPUTER vision - Abstract
Individual choices and preferences are important factors that impact decision making. Artificial intelligence can predict decisions by objectively detecting individual choices and preferences using natural language processing, computer vision, and machine learning. Brain–computer interfaces can measure emotional reactions and identify brain activity changes linked to positive or negative emotions, enabling more accurate prediction models. This research aims to build an individual choice prediction system using electroencephalography (EEG) signals from the Shanghai Jiao Tong University emotion and EEG dataset (SEED). Using EEG, we built different deep learning models, such as a convolutional neural network, long short-term memory (LSTM), and a hybrid model to predict choices driven by emotional stimuli. We also compared their performance with different classical classifiers, such as k-nearest neighbors, support vector machines, and logistic regression. We also utilized ensemble classifiers such as random forest, adaptive boosting, and extreme gradient boosting. We evaluated our proposed models and compared them with previous studies on SEED. Our proposed LSTM model achieved good results, with an accuracy of 96%. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
38. A Lightweight 1-D Convolution Augmented Transformer with Metric Learning for Hyperspectral Image Classification.
- Author
-
Hu, Xiang, Yang, Wenjing, Wen, Hao, Liu, Yu, Peng, Yuanxi, and Ben-Dor, Eyal
- Subjects
DEEP learning ,COMPUTER vision ,NATURAL language processing ,CONVOLUTIONAL neural networks ,SIGNAL convolution ,IMAGE recognition (Computer vision) ,REMOTE sensing - Abstract
Hyperspectral image (HSI) classification is the subject of intense research in remote sensing. The tremendous success of deep learning in computer vision has recently sparked interest in applying deep learning to hyperspectral image classification. However, most deep learning methods for hyperspectral image classification are based on convolutional neural networks (CNNs), which demand substantial GPU memory and run time. Recently, another deep learning model, the transformer, has been applied to image recognition, and study results demonstrate the great potential of the transformer network for computer vision tasks. In this paper, we propose a model for hyperspectral image classification based on the transformer, which is widely used in natural language processing. Moreover, to our knowledge, we are the first to combine metric learning and the transformer model in hyperspectral image classification. To improve classification performance when the available training samples are limited, we use 1-D convolution and the Mish activation function. Experimental results on three widely used hyperspectral image data sets demonstrate the proposed model's advantages in accuracy, GPU memory cost, and running time. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
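The Mish activation mentioned above has a simple closed form, x * tanh(softplus(x)). A minimal sketch (the stable `logaddexp` form of softplus is a standard implementation choice, not something specified by this paper):

```python
import numpy as np

def mish(x):
    """Mish activation: x * tanh(softplus(x)).

    A smooth, non-monotonic alternative to ReLU. softplus is computed
    as logaddexp(0, x) = log(1 + exp(x)) to avoid overflow for large x.
    """
    return x * np.tanh(np.logaddexp(0.0, x))
```

Unlike ReLU, Mish lets a small amount of negative signal through and is differentiable everywhere, properties often cited as helpful when training data are limited.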
39. Transformers in Remote Sensing: A Survey.
- Author
-
Aleissaee, Abdulaziz Amer, Kumar, Amandeep, Anwer, Rao Muhammad, Khan, Salman, Cholakkal, Hisham, Xia, Gui-Song, and Khan, Fahad Shahbaz
- Subjects
DEEP learning ,TRANSFORMER models ,NATURAL language processing ,SYNTHETIC aperture radar ,REMOTE sensing ,COMPUTER vision ,IMAGE analysis - Abstract
Deep learning-based algorithms have seen a massive popularity in different areas of remote sensing image analysis over the past decade. Recently, transformer-based architectures, originally introduced in natural language processing, have pervaded computer vision field where the self-attention mechanism has been utilized as a replacement to the popular convolution operator for capturing long-range dependencies. Inspired by recent advances in computer vision, the remote sensing community has also witnessed an increased exploration of vision transformers for a diverse set of tasks. Although a number of surveys have focused on transformers in computer vision in general, to the best of our knowledge we are the first to present a systematic review of recent advances based on transformers in remote sensing. Our survey covers more than 60 recent transformer-based methods for different remote sensing problems in sub-areas of remote sensing: very high-resolution (VHR), hyperspectral (HSI) and synthetic aperture radar (SAR) imagery. We conclude the survey by discussing different challenges and open issues of transformers in remote sensing. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
40. A Framework for Understanding Unstructured Financial Documents Using RPA and Multimodal Approach.
- Author
-
Cho, Seongkuk, Moon, Jihoon, Bae, Junhyeok, Kang, Jiwon, and Lee, Sangwook
- Subjects
ROBOTIC process automation ,NATURAL language processing ,DEEP learning ,MULTIMODAL user interfaces ,COMPUTER vision ,DATA mining ,IMAGE analysis - Abstract
The financial business process worldwide suffers from heavy dependence on labor and written documents, making it tedious and time-consuming. To solve this problem, traditional robotic process automation (RPA) has recently been developed into a hyper-automation solution by combining computer vision (CV) and natural language processing (NLP) methods. These solutions are capable of image analysis, such as key information extraction and document classification. However, they leave room for improvement on text-rich document images and require large amounts of training data for processing multilingual documents. This study proposes a multimodal intelligent document processing framework that combines a pre-trained deep learning model with the traditional RPA used in banks to automate business processes from real-world financial document images. The proposed framework can perform classification and key information extraction with a small amount of training data and can analyze multilingual documents. To evaluate the effectiveness of the proposed framework, extensive experiments were conducted using Korean financial document images. The experimental results show the superiority of the multimodal approach for understanding financial documents and demonstrate that adequate labeling can improve performance by up to about 15%. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
41. An Effective Dense Co-Attention Networks for Visual Question Answering.
- Author
-
He, Shirong and Han, Dezhi
- Subjects
QUESTION answering systems ,NATURAL language processing - Abstract
At present, state-of-the-art approaches to Visual Question Answering (VQA) mainly use the co-attention model to relate each visual object with text objects, which achieves only coarse interactions between modalities. However, they ignore the dense self-attention within the question modality. To solve this problem and improve the accuracy of VQA tasks, this paper proposes an effective Dense Co-Attention Network (DCAN). First, to better capture the relationship between words that are relatively far apart and make the extracted semantics more robust, a Bidirectional Long Short-Term Memory (Bi-LSTM) network is introduced to encode questions and answers; second, to realize fine-grained interactions between question words and image regions, a dense multimodal co-attention model is proposed. The model's basic components include the self-attention unit and the guided-attention unit, which are cascaded in depth to form a hierarchical structure. Experimental results on the VQA-v2 dataset show that DCAN has clear performance advantages, making VQA applicable to a wider range of AI scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
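The self-attention unit at the core of co-attention models like DCAN is scaled dot-product attention. The sketch below is a bare-bones illustration, not DCAN's actual unit: real implementations add learned query/key/value projections and multiple heads, which are omitted here.

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over a sequence (sketch).

    Each position attends to every other position, so word pairs that
    are far apart in the question can still interact directly. Queries,
    keys, and values are the inputs themselves (no learned projections).
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                   # pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ x                              # weighted mix of values
```

The guided-attention unit follows the same pattern but draws its keys and values from the other modality (image regions), which is what produces the fine-grained word-region interactions.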
42. Topological Data Analysis Helps to Improve Accuracy of Deep Learning Models for Fake News Detection Trained on Very Small Training Sets.
- Author
-
Deng, Ran and Duzhin, Fedor
- Subjects
DEEP learning ,DATA analysis ,FAKE news ,NATURAL language processing ,COMPUTER vision ,PROTEIN folding - Abstract
Topological data analysis has recently found applications in various areas of science, such as computer vision and understanding of protein folding. However, applications of topological data analysis to natural language processing remain under-researched. This study applies topological data analysis to a particular natural language processing task: fake news detection. We have found that deep learning models are more accurate in this task than topological data analysis. However, assembling a deep learning model with topological data analysis significantly improves the model's accuracy if the available training set is very small. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
43. Nemo: An Open-Source Transformer-Supercharged Benchmark for Fine-Grained Wildfire Smoke Detection.
- Author
-
Yazdi, Amirhessam, Qin, Heyang, Jordan, Connor B., Yang, Lei, and Yan, Feng
- Subjects
FIRE detectors ,SWARM intelligence ,OBJECT recognition (Computer vision) ,DEEP learning ,SMOKE ,WILDFIRES ,NATURAL language processing ,FIREFIGHTING - Abstract
Deep-learning (DL)-based object detection algorithms can greatly benefit the community at large in fighting fires, advancing climate intelligence, and reducing health complications caused by hazardous smoke particles. Existing DL-based techniques, which are mostly based on convolutional networks, have proven to be effective in wildfire detection. However, there is still room for improvement. First, existing methods tend to have some commercial aspects, with limited publicly available data and models. In addition, studies aiming at the detection of wildfires at the incipient stage are rare. Smoke columns at this stage tend to be small, shallow, and often far from view, with low visibility. This makes finding and labeling enough data to train an efficient deep learning model very challenging. Finally, the inherent locality of convolution operators limits their ability to model long-range correlations between objects in an image. Recently, encoder–decoder transformers have emerged as interesting solutions beyond natural language processing to help capture global dependencies via self- and inter-attention mechanisms. We propose Nemo: a set of evolving, free, and open-source datasets, processed in standard COCO format, and wildfire smoke and fine-grained smoke density detectors, for use by the research community. We adapt Facebook's DEtection TRansformer (DETR) to wildfire detection, which results in a much simpler technique, where the detection does not rely on convolution filters and anchors. Nemo is the first open-source benchmark for wildfire smoke density detection and Transformer-based wildfire smoke detection tailored to the early incipient stage. Two popular object detection algorithms (Faster R-CNN and RetinaNet) are used as alternatives and baselines for extensive evaluation. Our results confirm the superior performance of the transformer-based method in wildfire smoke detection across different object sizes. 
Moreover, we tested our model with 95 video sequences of wildfire starts from the public HPWREN database. Our model detected 97.9% of the fires in the incipient stage and 80% within 5 min from the start. On average, our model detected wildfire smoke within 3.6 min from the start, outperforming the baselines. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
44. Attention-Based RU-BiLSTM Sentiment Analysis Model for Roman Urdu.
- Author
-
Chandio, Bilal Ahmed, Imran, Ali Shariq, Bakhtyar, Maheen, Daudpota, Sher Muhammad, and Baber, Junaid
- Subjects
SENTIMENT analysis ,NATURAL language processing ,COMPUTER vision ,IMAGE processing - Abstract
Deep neural networks have emerged as a leading approach to many natural language processing (NLP) tasks. Deep networks first conquered problems in computer vision; dealing with sequential data such as text and sound, however, has been difficult for them, as traditional deep networks are not reliable at preserving contextual information. This may not harm results in image processing, where order does not matter, but for data collected from text, such networks may produce poor results. Moreover, establishing sentence semantics in a colloquial text such as Roman Urdu is a challenge. Additionally, the sparsity and high dimensionality of such informal text pose a significant challenge for building sentence semantics. To overcome this problem, we propose a deep recurrent architecture, RU-BiLSTM, based on bidirectional LSTM (BiLSTM) coupled with word embeddings and an attention mechanism for sentiment analysis of Roman Urdu. Our proposed model uses the bidirectional LSTM to preserve context in both directions and the attention mechanism to concentrate on the more important features. Finally, a dense softmax output layer is used to obtain the binary and ternary classification results. We empirically evaluated our model on two available Roman Urdu datasets, RUECD and RUSA-19. Our proposed model outperformed the baseline models on many grounds, achieving a significant improvement of 6% to 8% over them. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
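The "attention over BiLSTM states" step can be illustrated as attention pooling: score each time step, softmax the scores, and take the weighted sum. This is a generic sketch, not the RU-BiLSTM implementation; the name `attention_pool` and the use of a single query vector standing in for the learned attention parameters are assumptions.

```python
import numpy as np

def attention_pool(states, query):
    """Attention pooling over per-timestep hidden states (sketch).

    `states` has shape (timesteps, hidden_dim); `query` stands in for
    the learned attention parameters. Informative time steps receive
    higher weight in the pooled sentence representation.
    """
    scores = states @ query           # one relevance score per time step
    scores -= scores.max()            # numerical stability
    w = np.exp(scores)
    w /= w.sum()                      # softmax over time steps
    return w @ states                 # weighted sum of hidden states
```

The pooled vector then feeds the dense softmax layer that produces the binary or ternary sentiment label.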
45. Towards Generating and Evaluating Iconographic Image Captions of Artworks.
- Author
-
Cetinic, Eva
- Subjects
DEEP learning ,PHOTOGRAPH captions ,CONTENT analysis ,NATURAL language processing ,COMPUTER vision - Abstract
To automatically generate accurate and meaningful textual descriptions of images is an ongoing research challenge. Recently, a lot of progress has been made by adopting multimodal deep learning approaches for integrating vision and language. However, the task of developing image captioning models is most commonly addressed using datasets of natural images, while not many contributions have been made in the domain of artwork images. One of the main reasons for that is the lack of large-scale art datasets of adequate image-text pairs. Another reason is the fact that generating accurate descriptions of artwork images is particularly challenging because descriptions of artworks are more complex and can include multiple levels of interpretation. It is therefore also especially difficult to effectively evaluate generated captions of artwork images. The aim of this work is to address some of those challenges by utilizing a large-scale dataset of artwork images annotated with concepts from the Iconclass classification system. Using this dataset, a captioning model is developed by fine-tuning a transformer-based vision-language pretrained model. Due to the complex relations between image and text pairs in the domain of artwork images, the generated captions are evaluated using several quantitative and qualitative approaches. The performance is assessed using standard image captioning metrics and a recently introduced reference-free metric. The quality of the generated captions and the model's capacity to generalize to new data is explored by employing the model to another art dataset to compare the relation between commonly generated captions and the genre of artworks. The overall results suggest that the model can generate meaningful captions that indicate a stronger relevance to the art historical context, particularly in comparison to captions obtained from models trained only on natural image datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
46. Special Issue on Advances in Deep Learning.
- Author
-
Gragnaniello, Diego, Bottino, Andrea, Cumani, Sandro, and Kim, Wonjoon
- Subjects
DEEP learning ,REINFORCEMENT learning ,NATURAL language processing ,APPLIED sciences ,CONVOLUTIONAL neural networks ,COMPUTER vision ,ARTIFICIAL neural networks - Abstract
Nowadays, deep learning is the fastest growing research field in machine learning and has a tremendous impact on a plethora of daily life applications, ranging from security and surveillance to autonomous driving, automatic indexing and retrieval of media content, text analysis, speech recognition, automatic translation, and many others. Inspired by Faster R-CNN [5], object detection in infrared streetscape images is addressed in [6] by exploiting both fine- and coarse-grained image features. In particular, the authors of [10] propose a CNN compression method that exploits kernel density estimation to perform a fast 4-bit quantization of the network weights without impairing classification performance. As an example, in [22], CNN and particle swarm optimization are jointly employed to predict large-scale wind power, while in [23], a self-attention CNN is used to classify the heartbeat signal of hatching eggs in order to tell apart dead from alive ones in commercial poultry breeding. [Extracted from the article]
- Published
- 2020
- Full Text
- View/download PDF
47. NICE: Noise Injection and Clamping Estimation for Neural Network Quantization.
- Author
-
Baskin, Chaim, Zheltonozhkii, Evgenii, Rozen, Tal, Liss, Natan, Chai, Yoav, Schwartz, Eli, Giryes, Raja, Bronstein, Alexander M., and Mendelson, Avi
- Subjects
NATURAL language processing ,DEEP learning ,CONVOLUTIONAL neural networks ,COMPUTER vision ,NOISE - Abstract
Convolutional Neural Networks (CNNs) are very popular in many fields including computer vision, speech recognition, natural language processing, etc. Though deep learning leads to groundbreaking performance in those domains, the networks used are very computationally demanding and are far from being able to perform in real-time applications even on a GPU, which is not power efficient and therefore does not suit low power systems such as mobile devices. To overcome this challenge, some solutions have been proposed for quantizing the weights and activations of these networks, which accelerate the runtime significantly. Yet, this acceleration comes at the cost of a larger error unless spatial adjustments are carried out. The method proposed in this work trains quantized neural networks by noise injection and a learned clamping, which improve accuracy. This leads to state-of-the-art results on various regression and classification tasks, e.g., ImageNet classification with architectures such as ResNet-18/34/50 with as low as 3 bit weights and activations. We implement the proposed solution on an FPGA to demonstrate its applicability for low-power real-time applications. The quantization code will become publicly available upon acceptance. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
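The two ingredients named in the NICE abstract, clamping and noise injection, can be sketched with a symmetric uniform quantizer. This is an illustrative simplification, not the paper's method: in NICE the clamp is learned and the noise schedule is more involved, and the function name `quantize_clamped` is an assumption.

```python
import numpy as np

def quantize_clamped(w, num_bits=3, clamp=1.0, rng=None):
    """Uniform quantization with clamping; optional noise injection.

    Values are clamped to [-clamp, clamp] and mapped onto symmetric
    levels spaced by `step`. During training, NICE-style methods replace
    the non-differentiable rounding with additive uniform noise of the
    same magnitude; passing `rng` switches this sketch into that mode.
    """
    step = clamp / (2 ** (num_bits - 1) - 1)
    clipped = np.clip(w, -clamp, clamp)
    if rng is not None:
        # Training mode: simulate rounding error with uniform noise.
        return clipped + rng.uniform(-step / 2, step / 2, size=w.shape)
    # Inference mode: hard round to the nearest quantization level.
    return np.round(clipped / step) * step
```

Because the injected noise matches the magnitude of the eventual rounding error, gradients flow through training while the network learns to tolerate quantization.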
48. A Review of Deep Learning Methods for Antibodies.
- Author
-
Graves, Jordan, Byerly, Jacob, Priego, Eduardo, Makkapati, Naren, Parish, S. Vince, Medellin, Brenda, and Berrondo, Monica
- Subjects
DEEP learning ,NATURAL language processing ,COMPUTER vision ,IMMUNOGLOBULINS ,SMALL molecules ,SIGNAL convolution - Abstract
Driven by its successes across domains such as computer vision and natural language processing, deep learning has recently entered the field of biology by aiding in cellular image classification, finding genomic connections, and advancing drug discovery. In drug discovery and protein engineering, a major goal is to design a molecule that will perform a useful function as a therapeutic drug. Typically, the focus has been on small molecules, but new approaches have been developed to apply these same principles of deep learning to biologics, such as antibodies. Here we give a brief background of deep learning as it applies to antibody drug development, and an in-depth explanation of several deep learning algorithms that have been proposed to solve aspects of both protein design in general, and antibody design in particular. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
49. Applications of Deep Learning for Dense Scenes Analysis in Agriculture: A Review.
- Author
-
Zhang, Qian, Liu, Yeqi, Gong, Chuanyang, Chen, Yingyi, and Yu, Huihui
- Subjects
NATURAL language processing ,IMAGE analysis ,COMPUTER performance ,IMAGE processing ,AGRICULTURE ,DEEP learning ,COMPUTER vision - Abstract
Deep Learning (DL) is the state-of-the-art machine learning technology, which shows superior performance in computer vision, bioinformatics, natural language processing, and other areas. Especially as a modern image processing technology, DL has been successfully applied in various tasks, such as object detection, semantic segmentation, and scene analysis. However, with the increase of dense scenes in reality, due to severe occlusions, and small size of objects, the analysis of dense scenes becomes particularly challenging. To overcome these problems, DL recently has been increasingly applied to dense scenes and has begun to be used in dense agricultural scenes. The purpose of this review is to explore the applications of DL for dense scenes analysis in agriculture. In order to better elaborate the topic, we first describe the types of dense scenes in agriculture, as well as the challenges. Next, we introduce various popular deep neural networks used in these dense scenes. Then, the applications of these structures in various agricultural tasks are comprehensively introduced in this review, including recognition and classification, detection, counting and yield estimation. Finally, the surveyed DL applications, limitations and the future work for analysis of dense images in agriculture are summarized. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
50. Dense Model for Automatic Image Description Generation with Game Theoretic Optimization.
- Author
-
S R, Sreela and Idicula, Sumam Mary
- Subjects
ARTIFICIAL neural networks ,NATURAL language processing ,LONG-term memory ,SHORT-term memory ,DEEP learning ,COMPUTER vision ,TELEVISION game programs - Abstract
Due to the rapid growth of deep learning technologies, automatic image description generation is an interesting problem in computer vision and natural language generation. It helps to improve access to photo collections on social media and gives guidance to visually impaired people. Currently, deep neural networks play a vital role in computer vision and natural language processing tasks. The main objective of the work is to generate a grammatically correct description of an image using the semantics of the trained captions. An encoder-decoder framework using a deep neural system is used to implement the image description generation task. The encoder is an image parsing module, and the decoder is a surface realization module. The framework uses a densely connected convolutional neural network (DenseNet) for image encoding and Bidirectional Long Short-Term Memory (BLSTM) for language modeling, and the outputs are given to a bidirectional LSTM in the caption generator, which is trained to optimize the log-likelihood of the target description of the image. Most existing image captioning works use RNNs and LSTMs for language modeling; RNNs are computationally expensive with limited memory, and an LSTM processes its inputs in only one direction, problems that BLSTM avoids in practice. In this work, the selection of the best combination of words in caption generation is made using beam search and a game-theoretic search, and the results show the game-theoretic search outperforms beam search. The model was evaluated on the standard benchmark dataset Flickr8k, with the Bilingual Evaluation Understudy (BLEU) score as the evaluation measure. A new evaluation measure called GCorrect was used to check the grammatical correctness of the descriptions. The proposed model achieves clear improvements over previous methods on the Flickr8k dataset, producing grammatically correct sentences with a GCorrect of 0.040625 and a BLEU score of 69.96%. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
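The beam search used as the baseline decoder above can be sketched generically. This is a minimal illustration, not the paper's decoder; `step_fn` stands in for whatever model scores the next token, and the names are assumptions.

```python
def beam_search(start, step_fn, beam_width=2, max_len=4, eos="<eos>"):
    """Beam search over a token-by-token language model (sketch).

    `step_fn(sequence)` returns (token, log_prob) candidates for the
    next position. At every step only the `beam_width` highest-scoring
    partial sequences are kept, trading the exhaustive search for a
    tractable approximation.
    """
    beams = [([start], 0.0)]  # (token sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:          # finished sequences carry over
                candidates.append((seq, score))
                continue
            for tok, logp in step_fn(seq):
                candidates.append((seq + [tok], score + logp))
        # Keep only the top-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]
```

The game-theoretic search the paper favors replaces this greedy-by-beam pruning with an equilibrium-seeking selection among candidate words, which is how it can escape choices that beam search locks in early.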