Author: "Song, Shezheng" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Song, Shezheng"' showing total 13 results

1. Identifying Knowledge Editing Types in Large Language Models

Author: Li, Xiaopeng, Wang, Shangwen, Song, Shezheng, Ji, Bin, Liu, Huijun, Li, Shasha, Ma, Jun, and Yu, Jie
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Knowledge editing has emerged as an efficient technology for updating the knowledge of large language models (LLMs), attracting increasing attention in recent years. However, there is a lack of effective measures to prevent the malicious misuse of this technology, which could lead to harmful edits in LLMs. These malicious modifications could cause LLMs to generate toxic content, misleading users into inappropriate actions. In response to this risk, we introduce a new task, Knowledge Editing Type Identification (KETI), aimed at identifying different types of edits in LLMs, thereby providing timely alerts to users when encountering illicit edits. As part of this task, we propose KETIBench, which includes five types of harmful edits covering most popular toxic types, as well as one benign factual edit. We develop four classical classification models and three BERT-based models as baseline identifiers for both open-source and closed-source LLMs. Our experimental results, across 42 trials involving two models and three knowledge editing methods, demonstrate that all seven baseline identifiers achieve decent identification performance, highlighting the feasibility of identifying malicious edits in LLMs. Additional analyses reveal that the performance of the identifiers is independent of the reliability of the knowledge editing methods and exhibits cross-domain generalization, enabling the identification of edits from unknown sources. All data and code are available at https://github.com/xpq-tech/KETI. Warning: This paper contains examples of toxic text.
Comment: Under review
Published: 2024

2. DIM: Dynamic Integration of Multimodal Entity Linking with Large Language Model

Author: Song, Shezheng, Li, Shasha, Yu, Jie, Zhao, Shan, Li, Xiaopeng, Ma, Jun, Liu, Xiaodong, Li, Zhuo, and Mao, Xiaoguang
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Our study delves into Multimodal Entity Linking, which aligns mentions in multimodal information with entities in a knowledge base. Existing methods still face challenges such as ambiguous entity representations and limited image information utilization. Thus, we propose dynamic entity extraction using ChatGPT, which dynamically extracts entities and enhances datasets. We also propose a method: Dynamically Integrate Multimodal information with knowledge base (DIM), employing the capability of the Large Language Model (LLM) for visual understanding. The LLM, such as BLIP-2, extracts information relevant to entities in the image, which can facilitate improved extraction of entity features and linking them with the dynamic entity representations provided by ChatGPT. The experiments demonstrate that our proposed DIM method outperforms the majority of existing methods on the three original datasets, and achieves state-of-the-art (SOTA) on the dynamically enhanced datasets (Wiki+, Rich+, Diverse+). For reproducibility, our code and collected datasets are released at https://github.com/season1blue/DIM.
Comment: Published at PRCV24
Published: 2024

3. PTA: Enhancing Multimodal Sentiment Analysis through Pipelined Prediction and Translation-based Alignment

Author: Song, Shezheng, Li, Shasha, Zhao, Shan, Wang, Chengyu, Li, Xiaopeng, Yu, Jie, Wan, Qian, Ma, Jun, Yan, Tianwei, Ma, Wentao, and Mao, Xiaoguang
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Multimedia
Abstract: Multimodal aspect-based sentiment analysis (MABSA) aims to understand opinions in a granular manner, advancing human-computer interaction and other fields. Traditionally, MABSA methods use a joint prediction approach to identify aspects and sentiments simultaneously. However, we argue that joint models are not always superior. Our analysis shows that joint models struggle to align relevant text tokens with image patches, leading to misalignment and ineffective image utilization. In contrast, a pipeline framework first identifies aspects through MATE (Multimodal Aspect Term Extraction) and then aligns these aspects with image patches for sentiment classification (MASC: Multimodal Aspect-Oriented Sentiment Classification). This method is better suited for multimodal scenarios where effective image use is crucial. We present three key observations: (a) MATE and MASC have different feature requirements, with MATE focusing on token-level features and MASC on sequence-level features; (b) the aspect identified by MATE is crucial for effective image utilization; and (c) images play a trivial role in previous MABSA methods due to high noise. Based on these observations, we propose a pipeline framework that first predicts the aspect and then uses translation-based alignment (TBA) to enhance multimodal semantic consistency for better image utilization. Our method achieves state-of-the-art (SOTA) performance on widely used MABSA datasets Twitter-15 and Twitter-17. This demonstrates the effectiveness of the pipeline approach and its potential to provide valuable insights for future MABSA research. For reproducibility, the code and checkpoint will be released.
Comment: Code will be released upon publication
Published: 2024

4. DWE+: Dual-Way Matching Enhanced Framework for Multimodal Entity Linking

Author: Song, Shezheng, Li, Shasha, Zhao, Shan, Li, Xiaopeng, Wang, Chengyu, Yu, Jie, Ma, Jun, Yan, Tianwei, Ji, Bin, and Mao, Xiaoguang
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
Abstract: Multimodal entity linking (MEL) aims to utilize multimodal information (usually textual and visual information) to link ambiguous mentions to unambiguous entities in a knowledge base. Current methods face three main issues: (1) treating the entire image as input may introduce redundant information; (2) entity-related information, such as attributes in images, is insufficiently utilized; (3) there is semantic inconsistency between the entity in the knowledge base and its representation. To this end, we propose DWE+ for multimodal entity linking. DWE+ can capture finer semantics and dynamically maintain semantic consistency with entities. This is achieved in three ways: (a) we introduce a method for extracting fine-grained image features by partitioning the image into multiple local objects. Then, hierarchical contrastive learning is used to further align semantics between coarse-grained information (text and image) and fine-grained information (mention and visual objects). (b) we explore ways to extract visual attributes from images, such as facial features and identity, to enhance the fusion feature. (c) we leverage Wikipedia and ChatGPT to capture the entity representation, achieving semantic enrichment from both static and dynamic perspectives, which better reflects the real-world entity semantics. Experiments on Wikimel, Richpedia, and Wikidiverse datasets demonstrate the effectiveness of DWE+ in improving MEL performance. Specifically, we optimize these datasets and achieve state-of-the-art performance on the enhanced datasets. The code and enhanced datasets are released at https://github.com/season1blue/DWET.
Comment: Under review at TOIS. arXiv admin note: substantial text overlap with arXiv:2312.11816
Published: 2024

5. SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering

Author: Li, Xiaopeng, Li, Shasha, Song, Shezheng, Liu, Huijun, Ji, Bin, Wang, Xi, Ma, Jun, Yu, Jie, Liu, Xiaodong, Wang, Jing, and Zhang, Weimin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: The general capabilities of large language models (LLMs) make them the infrastructure for various AI applications, but updating their inner knowledge requires significant resources. Recent model editing is a promising technique for efficiently updating a small amount of knowledge of LLMs and has attracted much attention. In particular, local editing methods, which directly update model parameters, are more suitable for updating a small amount of knowledge. Local editing methods update weights by computing least squares closed-form solutions and identify edited knowledge by vector-level matching in inference, which achieve promising results. However, these methods still require a lot of time and resources to complete the computation. Moreover, vector-level matching lacks reliability, and such updates disrupt the original organization of the model's parameters. To address these issues, we propose a detachable and expandable Subject Word Embedding Altering (SWEA) framework, which finds the editing embeddings through token-level matching and adds them to the subject word embeddings in Transformer input. To get these editing embeddings, we propose an optimizing-then-suppressing fusion method, which first optimizes learnable embedding vectors for the editing target and then suppresses the Knowledge Embedding Dimensions (KEDs) to obtain final editing embeddings. We thus propose the SWEA⊕OS method for editing factual knowledge in LLMs. We demonstrate the overall state-of-the-art (SOTA) performance of SWEA⊕OS on the CounterFact and zsRE datasets. To further validate the reasoning ability of SWEA⊕OS in editing knowledge, we evaluate it on the more complex RippleEdits benchmark. The results demonstrate that SWEA⊕OS possesses SOTA reasoning ability.
Comment: Under review; our code is available at https://github.com/xpq-tech/SWEA
Published: 2024

6. A Dual-way Enhanced Framework from Text Matching Point of View for Multimodal Entity Linking

Author: Song, Shezheng, Zhao, Shan, Wang, Chengyu, Yan, Tianwei, Li, Shasha, Mao, Xiaoguang, and Wang, Meng
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Multimodal Entity Linking (MEL) aims at linking ambiguous mentions with multimodal information to entities in a Knowledge Graph (KG) such as Wikipedia, which plays a key role in many applications. However, existing methods suffer from shortcomings, including modality impurity such as noise in raw images and ambiguous textual entity representations, which pose obstacles to MEL. We formulate multimodal entity linking as a neural text matching problem where each piece of multimodal information (text and image) is treated as a query, and the model learns the mapping from each query to the relevant entity from candidate entities. This paper introduces a dual-way enhanced (DWE) framework for MEL: (1) our model refines queries with multimodal data and addresses semantic gaps using cross-modal enhancers between text and image information. In addition, DWE innovatively leverages fine-grained image attributes, including facial characteristics and scene features, to enhance and refine visual features. (2) By using Wikipedia descriptions, DWE enriches entity semantics and obtains a more comprehensive textual representation, which reduces the gap between the textual representation and the entities in the KG. Extensive experiments on three public benchmarks demonstrate that our method achieves state-of-the-art (SOTA) performance, indicating the superiority of our model. The code is released at https://github.com/season1blue/DWE.
Comment: AAAI23 Accept
Published: 2023

7. How to Bridge the Gap between Modalities: A Comprehensive Survey on Multimodal Large Language Model

Author: Song, Shezheng, Li, Xiaopeng, Li, Shasha, Zhao, Shan, Yu, Jie, Ma, Jun, Mao, Xiaoguang, and Zhang, Weimin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
Abstract: This review paper explores Multimodal Large Language Models (MLLMs), which integrate Large Language Models (LLMs) like GPT-4 to handle multimodal data such as text and vision. MLLMs demonstrate capabilities like generating image narratives and answering image-based questions, bridging the gap towards real-world human-computer interactions and hinting at a potential pathway to artificial general intelligence. However, MLLMs still face challenges in processing the semantic gap in multimodality, which may lead to erroneous generation, posing potential risks to society. Choosing the appropriate modality alignment method is crucial, as improper methods might require more parameters with limited performance improvement. This paper aims to explore modality alignment methods for LLMs and their existing capabilities. Implementing modality alignment allows LLMs to address environmental issues and enhance accessibility. The study groups existing modality alignment methods in MLLMs into four categories: (1) Multimodal Converters that change data into something LLMs can understand; (2) Multimodal Perceivers to improve how LLMs perceive different types of data; (3) Tools Assistance for changing data into one common format, usually text; and (4) Data-Driven methods that teach LLMs to understand specific types of data in a dataset. This field is still in a phase of exploration and experimentation, and we will organize and update various existing research methods for multimodal information alignment.
Published: 2023

8. DIM: Dynamic Integration of Multimodal Entity Linking with Large Language Model

Author: Song, Shezheng, Li, Shasha, Yu, Jie, Zhao, Shan, Li, Xiaopeng, Ma, Jun, Liu, Xiaodong, Li, Zhuo, Mao, Xiaoguang, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lin, Zhouchen, editor, Cheng, Ming-Ming, editor, He, Ran, editor, Ubul, Kurban, editor, Silamu, Wushouer, editor, Zha, Hongbin, editor, Zhou, Jie, editor, and Liu, Cheng-Lin, editor
Published: 2025
Full Text: View/download PDF

9. PMET: Precise Model Editing in a Transformer

Author: Li, Xiaopeng, Li, Shasha, Song, Shezheng, Yang, Jing, Ma, Jun, and Yu, Jie
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Model editing techniques modify a minor proportion of knowledge in Large Language Models (LLMs) at a relatively low cost and have demonstrated notable success. Existing methods assume Transformer Layer (TL) hidden states are values of key-value memories of the Feed-Forward Network (FFN). They usually optimize the TL hidden states to memorize target knowledge and use it to update the weights of the FFN in LLMs. However, the information flow of TL hidden states comes from three parts: Multi-Head Self-Attention (MHSA), FFN, and residual connections. Existing methods neglect the fact that the TL hidden states contain information not specifically required for FFN. Consequently, the performance of model editing decreases. To achieve more precise model editing, we analyze hidden states of MHSA and FFN, finding that MHSA encodes certain general knowledge extraction patterns. This implies that MHSA weights do not require updating when new knowledge is introduced. Based on these findings, we introduce PMET, which simultaneously optimizes Transformer Component (TC, namely MHSA and FFN) hidden states, while only using the optimized TC hidden states of FFN to precisely update FFN weights. Our experiments demonstrate that PMET exhibits state-of-the-art performance on both the COUNTERFACT and zsRE datasets. Our ablation experiments substantiate the effectiveness of our enhancements, further reinforcing the finding that the MHSA encodes certain general knowledge extraction patterns and indicating its storage of a small amount of factual knowledge. Our code is available at https://github.com/xpq-tech/PMET.
Comment: AAAI24
Published: 2023

10. A Modular Hierarchical Model for Paper Quality Evaluation

Author: Deng, Xi, primary, Li, Shasha, additional, Yu, Jie, additional, Ma, Jun, additional, Ji, Bin, additional, Lin, Wuhang, additional, Song, Shezheng, additional, and Yi, Zibo, additional
Published: 2023
Full Text: View/download PDF

11. SEED: A Cross-Layer Semantic Enhanced SLU Model With Role Context Differentiated Fusion

Author: Wang, Changjian, primary, Zhang, Dongsong, additional, Song, Shezheng, additional, Huang, Zhen, additional, and Peng, Yuxing, additional
Published: 2021
Full Text: View/download PDF

12. T-Mask: An Active and Accurate Dialogue State Tracking with Token Mask Prediction

Author: Song, Shezheng, primary, Wang, Changjian, additional, Zhang, Dongsong, additional, Huang, Zhen, additional, and Peng, Yuxing, additional
Published: 2021
Full Text: View/download PDF

13. USET : A network based on Utterance hidden State transfEr for Task-oriented dialogue

Author: Song, Shezheng, primary, Wang, Changjian, additional, Zhang, Dongsong, additional, Peng, Yuxing, additional, Yuan, Yuan, additional, and Luo, Li, additional
Published: 2021
Full Text: View/download PDF
