Author: "Lv, Tangjie" - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword '"Lv, Tangjie"' returned 92 results.

1. CharacterBench: Benchmarking Character Customization of Large Language Models

Author: Zhou, Jinfeng, Huang, Yongkang, Wen, Bosi, Bi, Guanqun, Chen, Yuxuan, Ke, Pei, Chen, Zhuang, Xiao, Xiyao, Peng, Libiao, Tang, Kuntian, Zhang, Rongsheng, Zhang, Le, Lv, Tangjie, Hu, Zhipeng, Wang, Hongning, and Huang, Minlie
Subjects: Computer Science - Computation and Language
Abstract: Character-based dialogue (aka role-playing) enables users to freely customize characters for interaction, which often relies on LLMs, raising the need to evaluate LLMs' character customization capability. However, existing benchmarks fail to ensure a robust evaluation as they often only involve a single character category or evaluate limited dimensions. Moreover, the sparsity of character features in responses makes feature-focused generative evaluation both ineffective and inefficient. To address these issues, we propose CharacterBench, the largest bilingual generative benchmark, with 22,859 human-annotated samples covering 3,956 characters from 25 detailed character categories. We define 11 dimensions of 6 aspects, classified as sparse and dense dimensions based on whether character features evaluated by specific dimensions manifest in each response. We enable effective and efficient evaluation by crafting tailored queries for each dimension to induce characters' responses related to specific dimensions. Further, we develop CharacterJudge model for cost-effective and stable evaluations. Experiments show its superiority over SOTA automatic judges (e.g., GPT-4) and our benchmark's potential to optimize LLMs' character customization. Our repository is at https://github.com/thu-coai/CharacterBench.
Comment: AAAI 2025
Published: 2024

2. StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization

Author: Zhang, Jinlu, Tang, Jiji, Zhang, Rongsheng, Lv, Tangjie, and Sun, Xiaoshuai
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Story visualization has gained increasing attention in artificial intelligence. However, existing methods still struggle to balance character identity preservation and text-semantics alignment, largely due to a lack of detailed semantic modeling of the story scene. To tackle this challenge, we propose a novel knowledge graph, namely Character Graph (CG), which comprehensively represents various story-related knowledge, including the characters, the attributes related to characters, and the relationships between characters. We then introduce StoryWeaver, an image generator that achieves Customization via Character Graph (C-CG), capable of consistent story visualization with rich text semantics. To further improve multi-character generation performance, we incorporate knowledge-enhanced spatial guidance (KE-SG) into StoryWeaver to precisely inject character semantics into generation. To validate the effectiveness of our proposed method, extensive experiments are conducted using a new benchmark called TBC-Bench. The experiments confirm that StoryWeaver excels not only in creating vivid visual story plots but also in accurately conveying character identities across various scenarios with considerable storage efficiency, e.g., achieving an average increase of +9.03% DINO-I and +13.44% CLIP-T. Furthermore, ablation experiments are conducted to verify the superiority of the proposed module. Code and datasets are released at https://github.com/Aria-Zhangjl/StoryWeaver.
Published: 2024

3. Reinforcement Learning From Imperfect Corrective Actions And Proxy Rewards

Author: Jiang, Zhaohui, Feng, Xuening, Weng, Paul, Zhu, Yifei, Song, Yan, Zhou, Tianze, Hu, Yujing, Lv, Tangjie, and Fan, Changjie
Subjects: Computer Science - Machine Learning
Abstract: In practice, reinforcement learning (RL) agents are often trained with a possibly imperfect proxy reward function, which may lead to a human-agent alignment issue (i.e., the learned policy either converges to non-optimal performance with low cumulative rewards, or achieves high cumulative rewards but in an undesired manner). To tackle this issue, we consider a framework where a human labeler can provide additional feedback in the form of corrective actions, which express the labeler's action preferences, although this feedback may also be imperfect. In this setting, to obtain a better-aligned policy guided by both learning signals, we propose a novel value-based deep RL algorithm called Iterative learning from Corrective actions and Proxy rewards (ICoPro), which cycles through three phases: (1) solicit sparse corrective actions from a human labeler on the agent's demonstrated trajectories; (2) incorporate these corrective actions into the Q-function using a margin loss to enforce adherence to the labeler's preferences; (3) train the agent with standard RL losses regularized with a margin loss to learn from proxy rewards and propagate the Q-values learned from human feedback. Another novel design in our approach is to integrate pseudo-labels from the target Q-network to reduce human labor and further stabilize training. We experimentally validate our approach on a variety of tasks (Atari games and autonomous driving on a highway). On the one hand, with proxy rewards at different levels of imperfection, our method aligns better with human preferences and is more sample-efficient than baseline methods. On the other hand, when facing corrective actions with different types of imperfection, our method can overcome the non-optimality of this feedback thanks to the guidance from the proxy reward.
Published: 2024

4. StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads

Author: Wang, Suzhen, Ma, Yifeng, Ding, Yu, Hu, Zhipeng, Fan, Changjie, Lv, Tangjie, Deng, Zhidong, and Yu, Xin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Individuals have unique facial expression and head pose styles that reflect their personalized speaking styles. Existing one-shot talking head methods cannot capture such personalized characteristics and therefore fail to produce diverse speaking styles in the final videos. To address this challenge, we propose a one-shot style-controllable talking face generation method that can obtain speaking styles from reference speaking videos and drive the one-shot portrait to speak with the reference speaking styles and another piece of audio. Our method aims to synthesize the style-controllable coefficients of a 3D Morphable Model (3DMM), including facial expressions and head movements, in a unified framework. Specifically, the proposed framework first leverages a style encoder to extract the desired speaking styles from the reference videos and transform them into style codes. Then, the framework uses a style-aware decoder to synthesize the coefficients of 3DMM from the audio input and style codes. During decoding, our framework adopts a two-branch architecture, which generates the stylized facial expression coefficients and stylized head movement coefficients, respectively. After obtaining the coefficients of 3DMM, an image renderer renders the expression coefficients into a specific person's talking-head video. Extensive experiments demonstrate that our method generates visually authentic talking head videos with diverse speaking styles from only one portrait image and an audio clip.
Comment: TPAMI 2024. arXiv admin note: text overlap with arXiv:2301.01081
Published: 2024

5. Bayesian Design Principles for Offline-to-Online Reinforcement Learning

Author: Hu, Hao, Yang, Yiqin, Ye, Jianing, Wu, Chengjie, Mai, Ziqing, Hu, Yujing, Lv, Tangjie, Fan, Changjie, Zhao, Qianchuan, and Zhang, Chongjie
Subjects: Computer Science - Machine Learning
Abstract: Offline reinforcement learning (RL) is crucial for real-world applications where exploration can be costly or unsafe. However, offline learned policies are often suboptimal, and further online fine-tuning is required. In this paper, we tackle the fundamental dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimistic directly, performance may suffer from a sudden drop. We show that Bayesian design principles are crucial in solving such a dilemma. Instead of adopting optimistic or pessimistic policies, the agent should act in a way that matches its belief in optimal policies. Such a probability-matching agent can avoid a sudden performance drop while still being guaranteed to find the optimal policy. Based on our theoretical findings, we introduce a novel algorithm that outperforms existing methods on various benchmarks, demonstrating the efficacy of our approach. Overall, the proposed approach provides a new perspective on offline-to-online RL that has the potential to enable more effective learning from offline data.
Comment: Forty-first International Conference on Machine Learning (ICML), 2024
Published: 2024

6. vMFER: Von Mises-Fisher Experience Resampling Based on Uncertainty of Gradient Directions for Policy Improvement

Author: Zhu, Yiwen, Liu, Jinyi, Wei, Wenya, Fu, Qianyi, Hu, Yujing, Fang, Zhou, An, Bo, Hao, Jianye, Lv, Tangjie, and Fan, Changjie
Subjects: Computer Science - Machine Learning
Abstract: Reinforcement Learning (RL) is a widely employed technique in decision-making problems, encompassing two fundamental operations -- policy evaluation and policy improvement. Enhancing learning efficiency remains a key challenge in RL, with many efforts focused on using ensemble critics to boost policy evaluation efficiency. However, when using multiple critics, the actor in the policy improvement process can obtain different gradients. Previous studies have combined these gradients without considering their disagreements. Therefore, optimizing the policy improvement process is crucial to enhance learning efficiency. This study focuses on investigating the impact of gradient disagreements caused by ensemble critics on policy improvement. We introduce the concept of uncertainty of gradient directions as a means to measure the disagreement among gradients utilized in the policy improvement process. Through measuring the disagreement among gradients, we find that transitions with lower uncertainty of gradient directions are more reliable in the policy improvement process. Building on this analysis, we propose a method called von Mises-Fisher Experience Resampling (vMFER), which optimizes the policy improvement process by resampling transitions and assigning higher confidence to transitions with lower uncertainty of gradient directions. Our experiments demonstrate that vMFER significantly outperforms the benchmark and is particularly well-suited for ensemble structures in RL.
Comment: Accepted by IJCAI 2024, with appendix
Published: 2024

7. Preconditioned Nonlinear Conjugate Gradient Method for Real-time Interior-point Hyperelasticity

Author: Shen, Xing, Cai, Runyuan, Bi, Mengxiao, and Lv, Tangjie
Subjects: Mathematics - Optimization and Control, Computer Science - Graphics
Abstract: The linear conjugate gradient method is widely used in physical simulation, particularly for solving large-scale linear systems derived from Newton's method. The nonlinear conjugate gradient method generalizes the conjugate gradient method to nonlinear optimization, which is extensively utilized in solving practical large-scale unconstrained optimization problems. However, it is rarely discussed in physical simulation due to the requirement of multiple vector-vector dot products. Fortunately, with the advancement of GPU-parallel acceleration techniques, it is no longer a bottleneck. In this paper, we propose a Jacobi preconditioned nonlinear conjugate gradient method for elastic deformation using interior-point methods. Our method is straightforward, GPU-parallelizable, and exhibits fast convergence and robustness against large time steps. The employment of the barrier function in interior-point methods necessitates continuous collision detection per iteration to obtain a penetration-free step size, which is computationally expensive and challenging to parallelize on GPUs. To address this issue, we introduce a line search strategy that deduces an appropriate step size in a single pass, eliminating the need for additional collision detection. Furthermore, we simplify and accelerate the computations of Jacobi preconditioning and Hessian-vector product for hyperelasticity and barrier function. Our method can accurately simulate objects comprising over 100,000 tetrahedra in complex self-collision scenarios at real-time speeds.
Published: 2024

8. Learning a compact embedding for fine-grained few-shot static gesture recognition

Author: Hu, Zhipeng, Qiu, Feng, Sun, Haodong, Zhang, Wei, Ding, Yu, Lv, Tangjie, and Fan, Changjie
Published: 2024

9. Let Storytelling Tell Vivid Stories: An Expressive and Fluent Multimodal Storyteller

Author: Zang, Chuanqi, Tang, Jiji, Zhang, Rongsheng, Zhao, Zeng, Lv, Tangjie, Pei, Mingtao, and Liang, Wei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Storytelling aims to generate reasonable and vivid narratives based on an ordered image stream. The fidelity to the image story theme and the divergence of story plots attract readers to keep reading. Previous works iteratively improved the alignment of multiple modalities but ultimately resulted in the generation of simplistic storylines for image streams. In this work, we propose a new pipeline, termed LLaMS, to generate multimodal human-level stories that are embodied in expressiveness and consistency. Specifically, by fully exploiting the commonsense knowledge within the LLM, we first employ a sequence data auto-enhancement strategy to enhance factual content expression and leverage a textual reasoning architecture for expressive story generation and prediction. Secondly, we propose the SQ-Adapter module for story illustration generation, which can maintain sequence consistency. Human evaluations are conducted to verify the superiority of the proposed LLaMS. Evaluations show that LLaMS achieves state-of-the-art storytelling performance and 86% correlation and 100% consistency win rate as compared with previous SOTA methods. Furthermore, ablation experiments are conducted to verify the effectiveness of the proposed sequence data enhancement and SQ-Adapter.
Published: 2024

10. A Dataset for the Validation of Truth Inference Algorithms Suitable for Online Deployment

Author: Wang, Fei, Liu, Haoyu, Bi, Haoyang, Shen, Xiangzhuang, Zhu, Renyu, Wu, Runze, Lin, Minmin, Lv, Tangjie, Fan, Changjie, Liu, Qi, Huang, Zhenya, and Chen, Enhong
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Machine Learning
Abstract: For the purpose of efficient and cost-effective large-scale data labeling, crowdsourcing is increasingly being utilized. To guarantee the quality of data labeling, multiple annotations need to be collected for each data sample, and truth inference algorithms have been developed to accurately infer the true labels. Despite previous studies having released public datasets to evaluate the efficacy of truth inference algorithms, these have typically focused on a single type of crowdsourcing task and neglected the temporal information associated with workers' annotation activities. These limitations significantly restrict the practical applicability of these algorithms, particularly in the context of long-term and online truth inference. In this paper, we introduce a substantial crowdsourcing annotation dataset collected from a real-world crowdsourcing platform. This dataset comprises approximately two thousand workers, one million tasks, and six million annotations. The data was gathered over a period of approximately six months from various types of tasks, and the timestamps of each annotation were preserved. We analyze the characteristics of the dataset from multiple perspectives and evaluate the effectiveness of several representative truth inference algorithms on this dataset. We anticipate that this dataset will stimulate future research on tracking workers' abilities over time in relation to different types of tasks, as well as enhancing online truth inference.
Published: 2024

11. Crafting a Good Prompt or Providing Exemplary Dialogues? A Study of In-Context Learning for Persona-based Dialogue Generation

Author: Pu, Jiashu, Wan, Yajing, Zhang, Yuru, Chen, Jing, Cheng, Ling, Shao, Qian, Chang, Yongzhu, Lv, Tangjie, and Zhang, Rongsheng
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Previous in-context learning (ICL) research has focused on tasks such as classification, machine translation, text2table, etc., while studies on whether ICL can improve human-like dialogue generation are scarce. Our work fills this gap by systematically investigating the ICL capabilities of large language models (LLMs) in persona-based dialogue generation, conducting extensive experiments on high-quality real human Chinese dialogue datasets. From experimental results, we draw three conclusions: 1) adjusting prompt instructions is the most direct, effective, and economical way to improve generation quality; 2) randomly retrieving demonstrations (demos) achieves the best results, possibly due to the greater diversity and the amount of effective information; counter-intuitively, retrieving demos with a context identical to the query performs the worst; 3) even when we destroy the multi-turn associations and single-turn semantics in the demos, increasing the number of demos still improves dialogue performance, proving that LLMs can learn from corrupted dialogue demos. Previous explanations of the ICL mechanism, such as n-gram induction head, cannot fully account for this phenomenon.
Published: 2024

12. Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation

Author: Liu, Renshuai, Ma, Bowen, Zhang, Wei, Hu, Zhipeng, Fan, Changjie, Lv, Tangjie, Ding, Yu, and Cheng, Xuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In human-centric content generation, the pre-trained text-to-image models struggle to produce user-wanted portrait images, which retain the identity of individuals while exhibiting diverse expressions. This paper introduces our efforts towards personalized face generation. To this end, we propose a novel multi-modal face generation framework, capable of simultaneous identity-expression control and more fine-grained expression synthesis. Our expression control is so sophisticated that it can be specified by fine-grained emotional vocabulary. We devise a novel diffusion model that can undertake the task of simultaneous face swapping and reenactment. Due to the entanglement of identity and expression, it is nontrivial to control them separately and precisely in one framework, and this has not yet been explored. To overcome this, we propose several innovative designs in the conditional diffusion model, including a balanced identity and expression encoder, improved midpoint sampling, and explicit background conditioning. Extensive experiments have demonstrated the controllability and scalability of the proposed framework, in comparison with state-of-the-art text-to-image, face swapping, and face reenactment methods.
Published: 2024

13. AlignDiff: Aligning Diverse Human Preferences via Behavior-Customisable Diffusion Model

Author: Dong, Zibin, Yuan, Yifu, Hao, Jianye, Ni, Fei, Mu, Yao, Zheng, Yan, Hu, Yujing, Lv, Tangjie, Fan, Changjie, and Hu, Zhipeng
Subjects: Computer Science - Artificial Intelligence
Abstract: Aligning agent behaviors with diverse human preferences remains a challenging problem in reinforcement learning (RL), owing to the inherent abstractness and mutability of human preferences. To address these issues, we propose AlignDiff, a novel framework that leverages RL from Human Feedback (RLHF) to quantify human preferences, covering abstractness, and utilizes them to guide diffusion planning for zero-shot behavior customizing, covering mutability. AlignDiff can accurately match user-customized behaviors and efficiently switch from one to another. To build the framework, we first establish the multi-perspective human feedback datasets, which contain comparisons for the attributes of diverse behaviors, and then train an attribute strength model to predict quantified relative strengths. After relabeling behavioral datasets with relative strengths, we proceed to train an attribute-conditioned diffusion model, which serves as a planner with the attribute strength model as a director for preference aligning at the inference phase. We evaluate AlignDiff on various locomotion tasks and demonstrate its superior performance on preference matching, switching, and covering compared to other baselines. Its capability of completing unseen downstream tasks under human instructions also showcases the promising potential for human-AI collaboration. More visualization videos are released on https://aligndiff.github.io/.
Published: 2023

14. Examining the Effect of Pre-training on Time Series Classification

Author: Pu, Jiashu, Zhao, Shiwei, Cheng, Ling, Chang, Yongzhu, Wu, Runze, Lv, Tangjie, and Zhang, Rongsheng
Subjects: Computer Science - Machine Learning
Abstract: Although the pre-training followed by fine-tuning paradigm is used extensively in many fields, there is still some controversy surrounding the impact of pre-training on the fine-tuning process. Currently, experimental findings based on text and image data lack consensus. To delve deeper into the unsupervised pre-training followed by fine-tuning paradigm, we have extended previous research to a new modality: time series. In this study, we conducted a thorough examination of 150 classification datasets derived from the Univariate Time Series (UTS) and Multivariate Time Series (MTS) benchmarks. Our analysis reveals several key conclusions. (i) Pre-training can only help improve the optimization process for models that fit the data poorly, rather than those that fit the data well. (ii) Pre-training does not exhibit the effect of regularization when given sufficient training time. (iii) Pre-training can only speed up convergence if the model has sufficient ability to fit the data. (iv) Adding more pre-training data does not improve generalization, but it can strengthen the advantage of pre-training on the original data volume, such as faster convergence. (v) While both the pre-training task and the model structure determine the effectiveness of the paradigm on a given dataset, the model structure plays a more significant role.
Published: 2023

15. Beyond First Impressions: Integrating Joint Multi-modal Cues for Comprehensive 3D Representation

Author: Wang, Haowei, Tang, Jiji, Ji, Jiayi, Sun, Xiaoshuai, Zhang, Rongsheng, Ma, Yiwei, Zhao, Minda, Li, Lincheng, Zhao, Zeng, Lv, Tangjie, and Ji, Rongrong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In recent years, 3D understanding has turned to 2D vision-language pre-trained models to overcome data scarcity challenges. However, existing methods simply transfer 2D alignment strategies, aligning 3D representations with single-view 2D images and coarse-grained parent category text. These approaches introduce information degradation and insufficient synergy issues, leading to performance loss. Information degradation arises from overlooking the fact that a 3D representation should be equivalent to a series of multi-view images and more fine-grained subcategory text. Insufficient synergy neglects the idea that a robust 3D representation should align with the joint vision-language space, rather than independently aligning with each modality. In this paper, we propose a multi-view joint modality modeling approach, termed JM3D, to obtain a unified representation for point cloud, text, and image. Specifically, a novel Structured Multimodal Organizer (SMO) is proposed to address the information degradation issue, which introduces contiguous multi-view images and hierarchical text to enrich the representation of vision and language modalities. A Joint Multi-modal Alignment (JMA) is designed to tackle the insufficient synergy problem, which models the joint modality by incorporating language knowledge into the visual modality. Extensive experiments on ModelNet40 and ScanObjectNN demonstrate the effectiveness of our proposed method, JM3D, which achieves state-of-the-art performance in zero-shot 3D classification. JM3D outperforms ULIP by approximately 4.3% on PointMLP and achieves an improvement of up to 6.5% in top-1 accuracy on PointNet++ for zero-shot 3D classification on ModelNet40. The source code and trained models for all our experiments are publicly available at https://github.com/Mr-Neko/JM3D.
Comment: ACM MM 2023, 3D Understanding, JM3D
Published: 2023

16. Rethinking Noisy Label Learning in Real-world Annotation Scenarios from the Noise-type Perspective

Author: Zhu, Renyu, Liu, Haoyu, Wu, Runze, Lin, Minmin, Lv, Tangjie, Fan, Changjie, and Wang, Haobo
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction
Abstract: In this paper, we investigate the problem of learning with noisy labels in real-world annotation scenarios, where noise can be categorized into two types: factual noise and ambiguity noise. To better distinguish these noise types and utilize their semantics, we propose a novel sample selection-based approach for noisy label learning, called Proto-semi. Proto-semi initially divides all samples into the confident and unconfident datasets via warm-up. By leveraging the confident dataset, prototype vectors are constructed to capture class characteristics. Subsequently, the distances between the unconfident samples and the prototype vectors are calculated to facilitate noise classification. Based on these distances, the labels are either corrected or retained, resulting in the refinement of the confident and unconfident datasets. Finally, we introduce a semi-supervised learning method to enhance training. Empirical evaluations on a real-world annotated dataset substantiate the robustness of Proto-semi in handling the problem of learning from noisy labels. Meanwhile, the prototype-based repartitioning strategy is shown to be effective in mitigating the adverse impact of label noise. Our code and data are available at https://github.com/fuxiAIlab/ProtoSemi.
Comment: Submitted to AAAI 2024
Published: 2023

17. Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning

Author: Liu, Jinyi, Ma, Yi, Hao, Jianye, Hu, Yujing, Zheng, Yan, Lv, Tangjie, and Fan, Changjie
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: In recent years, data-driven reinforcement learning (RL), also known as offline RL, has gained significant attention. However, the role of data sampling techniques in offline RL has been overlooked despite their potential to enhance online RL performance. Recent research suggests that applying sampling techniques directly to state transitions does not consistently improve performance in offline RL. Therefore, in this study, we propose a memory technique, (Prioritized) Trajectory Replay (TR/PTR), which extends the sampling perspective to trajectories for more comprehensive information extraction from limited data. TR enhances learning efficiency by backward sampling of trajectories, which optimizes the use of subsequent state information. Building on TR, we construct a weighted critic target to avoid sampling unseen actions in offline training, and Prioritized Trajectory Replay (PTR), which enables more efficient trajectory sampling prioritized by various trajectory priority metrics. We demonstrate the benefits of integrating TR and PTR with existing offline RL algorithms on D4RL. In summary, our research emphasizes the significance of trajectory-based data sampling techniques in enhancing the efficiency and performance of offline RL algorithms.
Published: 2023

18. FlowFace++: Explicit Semantic Flow-supervised End-to-End Face Swapping

Author: Zhang, Yu, Zeng, Hao, Ma, Bowen, Zhang, Wei, Zhang, Zhimeng, Ding, Yu, Lv, Tangjie, and Fan, Changjie
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This work proposes a novel face-swapping framework, FlowFace++, utilizing explicit semantic flow supervision and an end-to-end architecture to facilitate shape-aware face swapping. Specifically, our work pretrains a facial shape discriminator to supervise the face swapping network. The discriminator is shape-aware and relies on a semantic flow-guided operation to explicitly calculate the shape discrepancies between the target and source faces, thus optimizing the face swapping network to generate highly realistic results. The face swapping network is a stack of a pre-trained face-masked autoencoder (MAE), a cross-attention fusion module, and a convolutional decoder. The MAE provides a fine-grained facial image representation space, which is unified for the target and source faces and thus facilitates final realistic results. The cross-attention fusion module carries out the source-to-target face swapping in a fine-grained latent space while preserving other attributes of the target image (e.g., expression, head pose, hair, background, illumination). Lastly, the convolutional decoder further synthesizes the swapping results according to the face-swapping latent embedding from the cross-attention fusion module. Extensive quantitative and qualitative experiments on in-the-wild faces demonstrate that FlowFace++ significantly outperforms the state of the art, particularly when the source face is obstructed by uneven lighting or angle offset.
Comment: arXiv admin note: text overlap with arXiv:2212.02797
Published: 2023

19. Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations

Author: Huang, Yufeng, Tang, Jiji, Chen, Zhuo, Zhang, Rongsheng, Zhang, Xinfeng, Chen, Weijie, Zhao, Zeng, Zhao, Zhou, Lv, Tangjie, Hu, Zhipeng, and Zhang, Wen
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Multimedia
Abstract: Large-scale vision-language pre-training has achieved significant performance in multi-modal understanding and generation tasks. However, existing methods often perform poorly on image-text matching tasks that require structured representations, i.e., representations of objects, attributes, and relations. As illustrated in the paper's case-study figure (panel a), the models cannot distinguish between "An astronaut rides a horse" and "A horse rides an astronaut". This is because they fail to fully leverage structured knowledge when learning representations in multi-modal scenarios. In this paper, we present an end-to-end framework, Structure-CLIP, which integrates Scene Graph Knowledge (SGK) to enhance multi-modal structured representations. Firstly, we use scene graphs to guide the construction of semantic negative examples, which results in an increased emphasis on learning structured representations. Moreover, a Knowledge-Enhance Encoder (KEE) is proposed to leverage SGK as input to further enhance structured representations. To verify the effectiveness of the proposed framework, we pre-train our model with the aforementioned approaches and conduct experiments on downstream tasks. Experimental results demonstrate that Structure-CLIP achieves state-of-the-art (SOTA) performance on the VG-Attribution and VG-Relation datasets, 12.5% and 4.1% ahead of the multi-modal SOTA model, respectively. Meanwhile, the results on MSCOCO indicate that Structure-CLIP significantly enhances the structured representations while maintaining the ability of general representations. Our code is available at https://github.com/zjukg/Structure-CLIP.
Comment: AAAI 2024, https://github.com/zjukg/Structure-CLIP
Published: 2023

20. TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles

Author: Ma, Yifeng, Wang, Suzhen, Ding, Yu, Ma, Bowen, Lv, Tangjie, Fan, Changjie, Hu, Zhipeng, Deng, Zhidong, and Yu, Xin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Audio-driven talking head generation has drawn growing attention. To produce talking head videos with desired facial expressions, previous methods rely on extra reference videos to provide expression information, which may be difficult to find and hence limit their usability. In this work, we propose TalkCLIP, a framework that can generate talking heads whose expressions are specified by natural language, making it more convenient to specify expressions. To model the mapping from text to expressions, we first construct a text-video paired talking head dataset where each video has diverse text descriptions that depict both coarse-grained emotions and fine-grained facial movements. Leveraging the proposed dataset, we introduce a CLIP-based style encoder that projects natural language descriptions to representations of expressions. TalkCLIP can even infer expressions for descriptions unseen during training. TalkCLIP can also use text to modulate expression intensity and edit expressions. Extensive experiments demonstrate that TalkCLIP can generate photo-realistic talking heads with vivid facial expressions guided by text descriptions.
Published: 2023

21. Towards Solving Fuzzy Tasks with Human Feedback: A Retrospective of the MineRL BASALT 2022 Competition

Author: Milani, Stephanie, Kanervisto, Anssi, Ramanauskas, Karolis, Schulhoff, Sander, Houghton, Brandon, Mohanty, Sharada, Galbraith, Byron, Chen, Ke, Song, Yan, Zhou, Tianze, Yu, Bingquan, Liu, He, Guan, Kai, Hu, Yujing, Lv, Tangjie, Malato, Federico, Leopold, Florian, Raut, Amogh, Hautamäki, Ville, Melnik, Andrew, Ishida, Shu, Henriques, João F., Klassert, Robert, Laurito, Walter, Novoseller, Ellen, Goecks, Vinicius G., Waytowich, Nicholas, Watkins, David, Miller, Josh, and Shah, Rohin
Subjects: Computer Science - Artificial Intelligence
Abstract: To facilitate research in the direction of fine-tuning foundation models from human feedback, we held the MineRL BASALT Competition on Fine-Tuning from Human Feedback at NeurIPS 2022. The BASALT challenge asks teams to compete to develop algorithms to solve tasks with hard-to-specify reward functions in Minecraft. Through this competition, we aimed to promote the development of algorithms that use human feedback as channels to learn the desired behavior. We describe the competition and provide an overview of the top solutions. We conclude by discussing the impact of the competition and future directions for improvement.
Published: 2023

22. DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video

Author: Zhang, Zhimeng, Hu, Zhipeng, Deng, Wenjin, Fan, Changjie, Lv, Tangjie, and Ding, Yu
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: For few-shot learning, it is still a critical challenge to realize photo-realistic face visually dubbing on high-resolution videos. Previous works fail to generate high-fidelity dubbing results. To address this problem, this paper proposes a Deformation Inpainting Network (DINet) for high-resolution face visually dubbing. Unlike previous works that rely on multiple up-sampling layers to directly generate pixels from latent embeddings, DINet performs spatial deformation on feature maps of reference images to better preserve high-frequency textural details. Specifically, DINet consists of one deformation part and one inpainting part. In the first part, five reference facial images adaptively perform spatial deformation to create deformed feature maps encoding mouth shapes at each frame, in order to align with the input driving audio and also the head poses of the input source images. In the second part, to produce face visually dubbing, a feature decoder adaptively combines mouth movements from the deformed feature maps with other attributes (i.e., head pose and upper facial expression) from the source feature maps. As a result, DINet achieves face visually dubbing with rich textural details. We conduct qualitative and quantitative comparisons to validate DINet on high-resolution videos. The experimental results show that our method outperforms state-of-the-art works., Comment: AAAI-23, 9 pages
Published: 2023

23. Towards Skilled Population Curriculum for Multi-Agent Reinforcement Learning

Author: Wang, Rundong, Zheng, Longtao, Qiu, Wei, He, Bowei, An, Bo, Rabinovich, Zinovi, Hu, Yujing, Chen, Yingfeng, Lv, Tangjie, and Fan, Changjie
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Multiagent Systems
Abstract: Recent advances in multi-agent reinforcement learning (MARL) allow agents to coordinate their behaviors in complex environments. However, common MARL algorithms still suffer from scalability and sparse reward issues. One promising approach to resolving them is automatic curriculum learning (ACL). ACL involves a student (curriculum learner) training on tasks of increasing difficulty controlled by a teacher (curriculum generator). Despite its success, ACL's applicability is limited by (1) the lack of a general student framework for dealing with the varying number of agents across tasks and the sparse reward problem, and (2) the non-stationarity of the teacher's task due to ever-changing student strategies. As a remedy for ACL, we introduce a novel automatic curriculum learning framework, Skilled Population Curriculum (SPC), which adapts curriculum learning to multi-agent coordination. Specifically, we endow the student with population-invariant communication and a hierarchical skill set, allowing it to learn cooperation and behavior skills from distinct tasks with varying numbers of agents. In addition, we model the teacher as a contextual bandit conditioned by student policies, enabling a team of agents to change its size while still retaining previously acquired skills. We also analyze the inherent non-stationarity of this multi-agent automatic curriculum teaching problem and provide a corresponding regret bound. Empirical results show that our method improves the performance, scalability and sample efficiency in several MARL environments.
Published: 2023

24. StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles

Author: Ma, Yifeng, Wang, Suzhen, Hu, Zhipeng, Fan, Changjie, Lv, Tangjie, Ding, Yu, Deng, Zhidong, and Yu, Xin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to obtain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk., Comment: Accepted at AAAI 2023 as an oral presentation. Demo: https://youtu.be/mO2Tjcwr4u8
Published: 2023

25. TCFimt: Temporal Counterfactual Forecasting from Individual Multiple Treatment Perspective

Author: Xi, Pengfei, Wang, Guifeng, Hu, Zhipeng, Xiong, Yu, Gong, Mingming, Huang, Wei, Wu, Runze, Ding, Yu, Lv, Tangjie, Fan, Changjie, and Feng, Xiangnan
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Determining the causal effects of temporal multi-interventions assists decision-making. Because of time-varying bias, selection bias, and interactions among multiple interventions, work on disentangling and estimating multiple treatment effects from individual temporal data remains rare. To tackle these challenges, we propose a comprehensive framework of temporal counterfactual forecasting from an individual multiple treatment perspective (TCFimt). TCFimt constructs adversarial tasks in a seq2seq framework to alleviate selection and time-varying bias, and designs a contrastive learning-based block to decouple a mixed treatment effect into separate main treatment effects and causal interactions, which further improves estimation accuracy. In experiments on two real-world datasets from distinct fields, the proposed method outperforms state-of-the-art methods both in predicting future outcomes under specific treatments and in choosing the optimal treatment type and timing.
Published: 2022

26. FlowFace: Semantic Flow-guided Shape-aware Face Swapping

Author: Zeng, Hao, Zhang, Wei, Fan, Changjie, Lv, Tangjie, Wang, Suzhen, Zhang, Zhimeng, Ma, Bowen, Li, Lincheng, Ding, Yu, and Yu, Xin
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: In this work, we propose a semantic flow-guided two-stage framework for shape-aware face swapping, namely FlowFace. Unlike most previous methods, which focus on transferring the source inner facial features but neglect facial contours, our FlowFace can transfer both of them to a target face, thus leading to more realistic face swapping. Concretely, our FlowFace consists of a face reshaping network and a face swapping network. The face reshaping network addresses the shape outline differences between the source and target faces. It first estimates a semantic flow (i.e., face shape differences) between the source and the target face, and then explicitly warps the target face shape with the estimated semantic flow. After reshaping, the face swapping network generates inner facial features that exhibit the identity of the source face. We employ a pre-trained face masked autoencoder (MAE) to extract facial features from both the source face and the target face. In contrast to previous methods that use identity embedding to preserve identity information, the features extracted by our encoder can better capture facial appearances and identity information. Then, we develop a cross-attention fusion module to adaptively fuse inner facial features from the source face with the target facial attributes, thus leading to better identity preservation. Extensive quantitative and qualitative experiments on in-the-wild faces demonstrate that our FlowFace significantly outperforms state-of-the-art methods.
Published: 2022

27. Facial Action Unit Detection and Intensity Estimation from Self-supervised Representation

Author: Ma, Bowen, An, Rudong, Zhang, Wei, Ding, Yu, Zhao, Zeng, Zhang, Rongsheng, Lv, Tangjie, Fan, Changjie, and Hu, Zhipeng
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: As a fine-grained and local measurement of expression behavior, facial action unit (FAU) analysis (e.g., detection and intensity estimation) is known for its time-consuming, labor-intensive, and error-prone annotation. Thus, a long-standing challenge of FAU analysis is the scarcity of manual annotations, which largely limits the generalization ability of trained models. Many previous works have tried to alleviate this issue via semi-/weakly supervised methods and extra auxiliary information. However, these methods still require domain knowledge and remain highly dependent on data annotation. This paper introduces MAE-Face, a robust facial representation model for AU analysis. Using masked autoencoding as the self-supervised pre-training approach, MAE-Face first learns a high-capacity model from a readily available collection of face images without additional data annotations. After fine-tuning on AU datasets, MAE-Face exhibits convincing performance for both AU detection and AU intensity estimation, achieving a new state-of-the-art on nearly all evaluation results. Further investigation shows that MAE-Face achieves decent performance even when fine-tuned on only 1% of the AU training set, demonstrating its robustness and generalization performance.
Published: 2022

28. Investigating Accuracy-Novelty Performance for Graph-based Collaborative Filtering

Author: Zhao, Minghao, Wu, Le, Liang, Yile, Chen, Lei, Zhang, Jian, Deng, Qilin, Wang, Kai, Shen, Xudong, Lv, Tangjie, and Wu, Runze
Subjects: Computer Science - Information Retrieval
Abstract: Recent years have witnessed the high accuracy of graph-based Collaborative Filtering (CF) models for recommender systems. By treating user-item interaction behavior as a graph, these graph-based CF models borrow the success of Graph Neural Networks (GNN) and iteratively perform neighborhood aggregation to propagate the collaborative signals. While conventional CF models are known to face the challenge of popularity bias, which favors popular items, one may wonder: "Do existing graph-based CF models alleviate or exacerbate the popularity bias of recommender systems?" To answer this question, we first investigate the performance of existing graph-based CF methods in terms of both accuracy and novelty. The empirical results show that the symmetric neighborhood aggregation adopted by most existing graph-based CF models exacerbates popularity bias, and this phenomenon becomes more serious as the depth of graph propagation increases. Further, we theoretically analyze the cause of popularity bias for graph-based CF. Then, we propose a simple yet effective plugin, namely r-AdjNorm, to achieve an accuracy-novelty trade-off by controlling the normalization strength in the neighborhood aggregation process. Meanwhile, r-AdjNorm can be smoothly applied to existing graph-based CF backbones without additional computation. Finally, experimental results on three benchmark datasets show that our proposed method can improve novelty without sacrificing accuracy under various graph-based CF backbones., Comment: To appear in SIGIR 2022
Published: 2022

29. Deep learning applications in games: a survey from a data perspective

Author: Hu, Zhipeng, Ding, Yu, Wu, Runze, Li, Lincheng, Zhang, Rongsheng, Hu, Yujing, Qiu, Feng, Zhang, Zhimeng, Wang, Kai, Zhao, Shiwei, Zhang, Yongqiang, Jiang, Ji, Xi, Yadong, Pu, Jiashu, Zhang, Wei, Wang, Suzhen, Chen, Ke, Zhou, Tianze, Chen, Jiarui, Song, Yan, Lv, Tangjie, and Fan, Changjie
Published: 2023

30. Fever Basketball: A Complex, Flexible, and Asynchronized Sports Game Environment for Multi-agent Reinforcement Learning

Author: Jia, Hangtian, Hu, Yujing, Chen, Yingfeng, Ren, Chunxu, Lv, Tangjie, Fan, Changjie, and Zhang, Chongjie
Subjects: Computer Science - Artificial Intelligence
Abstract: The development of deep reinforcement learning (DRL) has benefited from the emergence of a variety of game environments, such as board games, RTS, FPS, and MOBA games, in which new challenging problems are proposed and new algorithms can be tested safely and quickly. However, many existing environments lack complexity and flexibility and assume that actions are executed synchronously in multi-agent settings, which makes them less valuable. We introduce the Fever Basketball game, a novel reinforcement learning environment where agents are trained to play basketball. It is a complex and challenging environment that supports multiple characters, multiple positions, and both single-agent and multi-agent player control modes. In addition, to better simulate real-world basketball games, the execution time of actions differs among players, which makes Fever Basketball a novel asynchronized environment. We evaluate commonly used multi-agent algorithms of both independent learners and joint-action learners in three game scenarios with varying difficulties, and heuristically propose two baseline methods to diminish the extra non-stationarity brought by asynchronism in Fever Basketball Benchmarks. Besides, we propose an integrated curricula training (ICT) framework to better handle Fever Basketball problems, which includes several game-rule-based cascading curricula learners and a coordination curricula switcher focusing on enhancing coordination within the team. The results show that the game remains challenging and can be used as a benchmark environment for studies of long time horizons, sparse rewards, credit assignment, and non-stationarity in multi-agent settings., Comment: 7 pages, 12 figures
Published: 2020

31. Reinforcement Learning Experience Reuse with Policy Residual Representation

Author: Zhou, Wen-Ji, Yu, Yang, Chen, Yingfeng, Guan, Kai, Lv, Tangjie, Fan, Changjie, and Zhou, Zhi-Hua
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Experience reuse is key to sample-efficient reinforcement learning. One of the critical issues is how experience is represented and stored. Previously, experience has been stored in the form of features, individual models, or an average model, each at a different granularity. However, new tasks may require experience across multiple granularities. In this paper, we propose the policy residual representation (PRR) network, which can extract and store multiple levels of experience. The PRR network is trained on a set of tasks with a multi-level architecture, where a module in each level corresponds to a subset of the tasks. Therefore, the PRR network represents experience in a spectrum-like way. When training on a new task, PRR can provide different levels of experience to accelerate learning. We experiment with the PRR network on a set of grid world navigation tasks, locomotion tasks, and fighting tasks in a video game. The results show that the PRR network leads to better reuse of experience and thus outperforms some state-of-the-art approaches., Comment: Conference version appears in IJCAI 2019
Published: 2019

32. Hierarchical Deep Multiagent Reinforcement Learning with Temporal Abstraction

Author: Tang, Hongyao, Hao, Jianye, Lv, Tangjie, Chen, Yingfeng, Zhang, Zongzhang, Jia, Hangtian, Ren, Chunxu, Zheng, Yan, Meng, Zhaopeng, Fan, Changjie, and Wang, Li
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Multiagent Systems
Abstract: Multiagent reinforcement learning (MARL) is commonly considered to suffer from non-stationary environments and exponentially increasing policy space. It would be even more challenging when rewards are sparse and delayed over long trajectories. In this paper, we study hierarchical deep MARL in cooperative multiagent problems with sparse and delayed reward. With temporal abstraction, we decompose the problem into a hierarchy of different time scales and investigate how agents can learn high-level coordination based on the independent skills learned at the low level. Three hierarchical deep MARL architectures are proposed to learn hierarchical policies under different MARL paradigms. Besides, we propose a new experience replay mechanism to alleviate the issue of the sparse transitions at the high level of abstraction and the non-stationarity of multiagent learning. We empirically demonstrate the effectiveness of our approaches in two domains with extremely sparse feedback: (1) a variety of Multiagent Trash Collection tasks, and (2) a challenging online mobile game, i.e., Fever Basketball Defense.
Published: 2018

33. The MMO Economist: AI Empowers Robust, Healthy, and Sustainable P2W MMO Economies

Author: Zhao, Shiwei, Yuan, Xi, Wu, Runze, Hu, Zhipeng, Liu, Haoyu, Wang, Kai, Hu, Yujing, Lv, Tangjie, Fan, Changjie, Tong, Xin, Han, Jiangze, Zheng, Yan, and Hao, Jianye
Published: 2024

34. StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads

Author: Wang, Suzhen, Ma, Yifeng, Ding, Yu, Hu, Zhipeng, Fan, Changjie, Lv, Tangjie, Deng, Zhidong, and Yu, Xin
Published: 2024

35. Facial Action Unit Detection and Intensity Estimation From Self-Supervised Representation

Author: Ma, Bowen, An, Rudong, Zhang, Wei, Ding, Yu, Zhao, Zeng, Zhang, Rongsheng, Lv, Tangjie, Fan, Changjie, and Hu, Zhipeng
Abstract: As a fine-grained and local measurement of expression behavior, facial action unit (FAU) analysis (e.g., detection and intensity estimation) is known for its time-consuming, labor-intensive, and error-prone annotation. Thus, a long-standing challenge of FAU analysis is the scarcity of manual annotations, which largely limits the generalization ability of trained models. Many previous works have tried to alleviate this issue via semi-/weakly supervised methods and extra auxiliary information. However, these methods still require domain knowledge and remain highly dependent on data annotation. This article introduces MAE-Face, a robust facial representation model for AU analysis. Using masked autoencoding as the self-supervised pre-training approach, MAE-Face first learns a high-capacity model from a readily available collection of face images without additional data annotations. After fine-tuning on AU datasets, MAE-Face exhibits convincing performance for both AU detection and AU intensity estimation, achieving a new state-of-the-art on nearly all evaluation results. Further investigation shows that MAE-Face achieves decent performance even when fine-tuned on only 1% of the AU training set, demonstrating its robustness and generalization performance. The pre-trained model is available at our GitHub repository.
Published: 2024

36. Beyond First Impressions: Integrating Joint Multi-modal Cues for Comprehensive 3D Representation

Author: Wang, Haowei, Tang, Jiji, Ji, Jiayi, Sun, Xiaoshuai, Zhang, Rongsheng, Ma, Yiwei, Zhao, Minda, Li, Lincheng, Zhao, Zeng, Lv, Tangjie, and Ji, Rongrong
Published: 2023

37. VESPA: A General System for Vision-Based Extrasensory Perception Anticheating in Online FPS Games

Author: Zhao, Shiwei, Qi, Jiaheng, Hu, Zhipeng, Yan, Han, Wu, Runze, Shen, Xudong, Lv, Tangjie, and Fan, Changjie
Abstract: Cheating is widespread in online games, particularly in competitive games, such as first-person shooter (FPS) games. One of the most common types of cheating is extrasensory perception (ESP), which involves illicitly obtaining visual information to gain an unfair advantage over normal players. To protect the gaming experience of legitimate players and the interests of game companies, there is an urgent need for anticheating applications. In this article, we propose a general system for ESP anticheating in online FPS games, considering the business characteristics and industrial applications. We present a vision-based anticheating framework that incorporates both supervised and unsupervised solutions for comprehensive cheating detection. Based on this framework, we design and deploy a dual-audit human-in-the-loop system for industrial gaming anticheating applications. We evaluate our proposed framework from multiple online and offline perspectives and demonstrate its practical significance with superior performance.
Published: 2024

38. PU-Detector: A PU Learning-based Framework for Real Money Trading Detection in MMORPG

Author: Wang, Yilin, Zhao, Sha, Zhao, Shiwei, Wu, Runze, Xu, Yuhong, Tao, Jianrong, Lv, Tangjie, Li, Shijian, Hu, Zhipeng, and Pan, Gang
Subjects: Massively multiplayer online role-playing games, Environmental sampling
Abstract: Massively multiplayer online role-playing games (MMORPGs) have become some of the most popular and exciting online games. In recent years, a cheating phenomenon called real money trading (RMT) has arisen and damaged the fantasy world in many ways. RMT is the sale of in-game items, currency, or even characters to earn real money, which breaks the balance of the game economy ecosystem and damages the game experience. Therefore, some studies have emerged to address the problem of RMT detection. However, they cannot handle the label uncertainty problem that arises in practice, where there are only labeled RMT samples (positive samples) and unlabeled samples, which could be either RMT samples or normal transactions (negative samples). Meanwhile, the trading relationship between RMTers is modeled in a simple way, leading to some normal transactions being falsely classified as RMT. In this article, we propose PU-Detector, a novel framework based on PU learning (learning from positive and unlabeled data) for RMT detection, considering the fact that there are only labeled RMT samples and other unlabeled transactions. We first automatically estimate the likelihood of a transaction being RMT by developing an improved PU learning method and proposing an assessment rule. Next, we use the estimated likelihood as the edge weight to construct a trading graph and learn trader representations. Then, with the trader representations and basic trading features, we detect RMT samples with the improved PU learning method. PU-Detector is evaluated on a large-scale real-world dataset consisting of 33,809,956 transaction logs generated by 43,217 unique players. Compared with other approaches, it achieves state-of-the-art performance and demonstrates its advantages in detecting underlying RMT samples.
Published: 2024

39. Towards Long-term Annotators: A Supervised Label Aggregation Baseline

Author: Liu, Haoyu, Wang, Fei, Lin, Minmin, Wu, Runze, Zhu, Renyu, Zhao, Shiwei, Wang, Kai, Lv, Tangjie, and Fan, Changjie
Abstract: Relying on crowdsourced workers, data crowdsourcing platforms can efficiently provide vast amounts of labeled data. Because annotation quality varies across crowd workers, modern techniques resort to redundant annotations and subsequent label aggregation to infer true labels. However, these methods require model updating during inference, which poses challenges for real-world implementation. Meanwhile, in recent years, many data labeling tasks have begun to require skilled and experienced annotators, leading to an increasing demand for long-term annotators. These annotators can leave substantial historical annotation records on crowdsourcing platforms, which can benefit label aggregation but have been ignored by previous works. In this paper, we propose a novel label aggregation technique that needs no model updating during inference and extensively exploits historical annotation records. We call it SuperLA, a Supervised Label Aggregation method. Inside this model, we design three types of input features and a straightforward neural network structure to merge all the information and then produce aggregated labels. In comparison experiments on 22 public datasets against 11 baseline methods, we find that SuperLA not only outperforms all of the baselines in inference performance but also offers significant advantages in efficiency.
Published: 2023

40. FlowFace: Semantic Flow-Guided Shape-Aware Face Swapping

Author: Zeng, Hao, Zhang, Wei, Fan, Changjie, Lv, Tangjie, Wang, Suzhen, Zhang, Zhimeng, Ma, Bowen, Li, Lincheng, Ding, Yu, and Yu, Xin
Published: 2023

41. DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video

Author: Zhang, Zhimeng, Hu, Zhipeng, Deng, Wenjin, Fan, Changjie, Lv, Tangjie, and Ding, Yu
Published: 2023

42. StyleTalk: One-Shot Talking Head Generation with Controllable Speaking Styles

Author: Ma, Yifeng, Wang, Suzhen, Hu, Zhipeng, Fan, Changjie, Lv, Tangjie, Ding, Yu, Deng, Zhidong, and Yu, Xin
Published: 2023

43. Structure-CLIP: Enhance Multi-modal Language Representations with Structure Knowledge

Author: Huang, Yufeng, Tang, Jiji, Chen, Zhuo, Zhang, Rongsheng, Zhang, Xinfeng, Chen, Weijie, Zhao, Zeng, Lv, Tangjie, Hu, Zhipeng, and Zhang, Wen
Subjects: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computation and Language (cs.CL), Computer Science - Multimedia, Multimedia (cs.MM)
Abstract: Large-scale vision-language pre-training has shown promising advances on various downstream tasks and achieved significant performance in multi-modal understanding and generation tasks. However, existing methods often perform poorly on image-text matching tasks that require a detailed semantic understanding of the text. Although some works have addressed this problem, they do not sufficiently exploit the structural knowledge present in sentences to enhance multi-modal language representations, which leads to poor performance. In this paper, we present an end-to-end framework, Structure-CLIP, which integrates latent detailed semantics from the text to enhance fine-grained semantic representations. Specifically, (1) we use scene graphs to pay more attention to detailed semantic learning in the text and to fully explore structured knowledge between fine-grained semantics, and (2) we utilize a knowledge-enhanced framework with the help of the scene graph to make full use of representations of structured knowledge. To verify the effectiveness of our proposed method, we pre-train our models with the aforementioned approach and conduct experiments on different downstream tasks. Numerical results show that Structure-CLIP can often achieve state-of-the-art performance on both the VG-Attribution and VG-Relation datasets. Extensive experiments show that its components are effective and its predictions are interpretable, indicating that the proposed method can enhance detailed semantic representations., Comment: Work in progress
Published: 2023

44. Just Adjust One Prompt: Enhancing In-Context Dialogue Scoring via Constructing the Optimal Subgraph of Demonstrations and Prompts

Author: Pu, Jiashu, Cheng, Ling, Fan, Lu, Lv, Tangjie, and Zhang, Rongsheng
Published: 2023

45. VESPA: A General System for Vision-Based Extrasensory Perception Anti-Cheating in Online FPS Games

Author: Zhao, Shiwei, Qi, Jiaheng, Hu, Zhipeng, Yan, Han, Wu, Runze, Shen, Xudong, Lv, Tangjie, and Fan, Changjie
Published: 2023

46. Variational Iterative Algorithms in Photoacoustic Tomography with Variable Sound Speed

Author: Lv, Tangjie and Zhou, Tie
Published: 2014

47. Investigating Accuracy-Novelty Performance for Graph-based Collaborative Filtering

Author: Zhao, Minghao, Wu, Le, Liang, Yile, Chen, Lei, Zhang, Jian, Deng, Qilin, Wang, Kai, Shen, Xudong, Lv, Tangjie, and Wu, Runze
Published: 2022

48. EasySM: A Data-Driven Intelligent Decision Support System for Server Merge

Author: Qu, Manhu, Huang, Jie, Deng, Hao, Wu, Runze, Shen, Xudong, Tao, Jianrong, and Lv, Tangjie
Published: 2022

49. Prior Aided Streaming Network for Multi-task Affective Analysis

Author: Zhang, Wei, Guo, Zunhu, Chen, Keyu, Li, Lincheng, Zhang, Zhimeng, Ding, Yu, Wu, Runze, Lv, Tangjie, and Fan, Changjie
Published: 2021

50. Reinforcement Learning with Action-Specific Focuses in Video Games

Author: Yu, Yang, Chen, Yingfeng, Lv, Tangjie, Guan, Kai, Wang, Meng, Song, Yan, and Fan, Changjie
Subjects: Action (philosophy), Human–computer interaction, Process (engineering), Order (exchange), Computer science, Benchmark (computing), Information processing, Reinforcement learning, State (computer science), Set (psychology)
Abstract: It is intuitive that different actions prefer different information in human decisions. However, classical reinforcement learning models use the same information-processing procedure for all actions. To imitate the human decision-making process more closely, in this paper we investigate a new policy model, i.e., the Action-Specific Focuses (ASF) framework, which enables different focuses when learning different actions. In the ASF framework, the whole action set is taken as part of the queries for the attention module, in which state-dependent action-specific features can be generated. Through extracting different action-specific features, our approach enables the agent to learn the action-focus map for each action separately. The ASF framework also differs from previous uses of attention mechanisms in reinforcement learning, which are mostly based on the state. Experiments on the Atari benchmark show that ASF is able to improve the performance in various types of games. Moreover, the visualizations of the attention weights suggest that ASF can learn meaningful focuses when taking different actions.
Published: 2020