Author: "Zhou, Pengyuan" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Zhou, Pengyuan"' showing total 208 results

1. A + B: A General Generator-Reader Framework for Optimizing LLMs to Unleash Synergy Potential

Author: Tang, Wei, Cao, Yixin, Ying, Jiahao, Wang, Bo, Zhao, Yuyue, Liao, Yong, and Zhou, Pengyuan
Subjects: Computer Science - Computation and Language
Abstract: Retrieval-Augmented Generation (RAG) is an effective solution to supplement necessary knowledge to large language models (LLMs). Targeting its bottleneck of retriever performance, a "generate-then-read" pipeline has been proposed to replace the retrieval stage with generation from the LLM itself. Although promising, this research direction is underexplored and still cannot work in scenarios where source knowledge is given. In this paper, we formalize a general "A + B" framework with varying combinations of foundation models and types for systematic investigation. We explore the efficacy of the base and chat versions of LLMs and find that their different functionalities are suitable for generator A and reader B, respectively. Their combinations consistently outperform single models, especially in complex scenarios. Furthermore, we extend the application of the "A + B" framework to scenarios involving source documents through continuous learning, enabling the direct integration of external knowledge into LLMs. This approach not only facilitates effective acquisition of new knowledge but also addresses the challenges of safety and helpfulness post-adaptation. The paper underscores the versatility of the "A + B" framework, demonstrating its potential to enhance the practical application of LLMs across various domains. Comment: Accepted to ACL'24 (Findings)
Published: 2024

2. 360SFUDA++: Towards Source-free UDA for Panoramic Segmentation by Learning Reliable Category Prototypes

Author: Zheng, Xu, Zhou, Pengyuan, Vasilakos, Athanasios V., and Wang, Lin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we address the challenging source-free unsupervised domain adaptation (SFUDA) for pinhole-to-panoramic semantic segmentation, given only a pinhole image pre-trained model (i.e., source) and unlabeled panoramic images (i.e., target). Tackling this problem is non-trivial due to three critical challenges: 1) semantic mismatches from the distinct Field-of-View (FoV) between domains, 2) style discrepancies inherent in the UDA problem, and 3) inevitable distortion of the panoramic images. To tackle these problems, we propose 360SFUDA++ that effectively extracts knowledge from the source pinhole model with only unlabeled panoramic images and transfers the reliable knowledge to the target panoramic domain. Specifically, we first utilize Tangent Projection (TP) as it has less distortion and meanwhile slits the equirectangular projection (ERP) to patches with fixed FoV projection (FFP) to mimic the pinhole images. Both projections are shown effective in extracting knowledge from the source model. However, as the distinct projections make it less possible to directly transfer knowledge between domains, we then propose Reliable Panoramic Prototype Adaptation Module (RP$^2$AM) to transfer knowledge at both prediction and prototype levels. RP$^2$AM selects the confident knowledge and integrates panoramic prototypes for reliable knowledge adaptation. Moreover, we introduce Cross-projection Dual Attention Module (CDAM), which better aligns the spatial and channel characteristics across projections at the feature level between domains. Both knowledge extraction and transfer processes are synchronously updated to reach the best performance. Extensive experiments on the synthetic and real-world benchmarks, including outdoor and indoor scenarios, demonstrate that our 360SFUDA++ achieves significantly better performance than prior SFUDA methods. Comment: arXiv admin note: substantial text overlap with arXiv:2403.12505
Published: 2024

3. BotDGT: Dynamicity-aware Social Bot Detection with Dynamic Graph Transformers

Author: He, Buyun, Yang, Yingguang, Wu, Qi, Liu, Hao, Yang, Renyu, Peng, Hao, Wang, Xiang, Liao, Yong, and Zhou, Pengyuan
Subjects: Computer Science - Social and Information Networks, Computer Science - Artificial Intelligence
Abstract: Detecting social bots has evolved into a pivotal yet intricate task, aimed at combating the dissemination of misinformation and preserving the authenticity of online interactions. While earlier graph-based approaches, which leverage the topological structure of social networks, yielded notable outcomes, they overlooked the inherent dynamicity of social networks; in reality, they largely depicted the social network as a static graph and solely relied on its most recent state. Due to the absence of dynamicity modeling, such approaches are vulnerable to evasion, particularly when advanced social bots interact with other users to camouflage identities and escape detection. To tackle these challenges, we propose BotDGT, a novel framework that not only considers the topological structure, but also effectively incorporates the dynamic nature of social networks. Specifically, we characterize a social network as a dynamic graph. A structural module is employed to acquire topological information from each historical snapshot. Additionally, a temporal module is proposed to integrate historical context and model the evolving behavior patterns exhibited by social bots and legitimate users. Experimental results demonstrate the superiority of BotDGT against the leading methods that neglected the dynamic nature of social networks in terms of accuracy, recall, and F1-score. Comment: IJCAI 2024
Published: 2024

4. DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling

Author: Li, Haoran, Shi, Haolin, Zhang, Wenli, Wu, Wenjun, Liao, Yong, Wang, Lin, Lee, Lik-hang, and Zhou, Pengyuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Text-to-3D scene generation holds immense potential for the gaming, film, and architecture sectors. Despite significant progress, existing methods struggle with maintaining high quality, consistency, and editing flexibility. In this paper, we propose DreamScene, a 3D Gaussian-based novel text-to-3D scene generation framework, to tackle the aforementioned three challenges mainly via two strategies. First, DreamScene employs Formation Pattern Sampling (FPS), a multi-timestep sampling strategy guided by the formation patterns of 3D objects, to form fast, semantically rich, and high-quality representations. FPS uses 3D Gaussian filtering for optimization stability, and leverages reconstruction techniques to generate plausible textures. Second, DreamScene employs a progressive three-stage camera sampling strategy, specifically designed for both indoor and outdoor settings, to effectively ensure object-environment integration and scene-wide 3D consistency. Last, DreamScene enhances scene editing flexibility by integrating objects and environments, enabling targeted adjustments. Extensive experiments validate DreamScene's superiority over current state-of-the-art techniques, heralding its wide-ranging potential for diverse applications. Code and demos will be released at https://dreamscene-project.github.io.
Published: 2024

5. Semantics, Distortion, and Style Matter: Towards Source-free UDA for Panoramic Segmentation

Author: Zheng, Xu, Zhou, Pengyuan, Vasilakos, Athanasios V., and Wang, Lin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper addresses an interesting yet challenging problem -- source-free unsupervised domain adaptation (SFUDA) for pinhole-to-panoramic semantic segmentation -- given only a pinhole image-trained model (i.e., source) and unlabeled panoramic images (i.e., target). Tackling this problem is nontrivial due to the semantic mismatches, style discrepancies, and inevitable distortion of panoramic images. To this end, we propose a novel method that utilizes Tangent Projection (TP) as it has less distortion and meanwhile slits the equirectangular projection (ERP) with a fixed FoV to mimic the pinhole images. Both projections are shown effective in extracting knowledge from the source model. However, the distinct projection discrepancies between source and target domains impede the direct knowledge transfer; thus, we propose a panoramic prototype adaptation module (PPAM) to integrate panoramic prototypes from the extracted knowledge for adaptation. We then impose the loss constraints on both predictions and prototypes and propose a cross-dual attention module (CDAM) at the feature level to better align the spatial and channel characteristics across the domains and projections. Both knowledge extraction and transfer processes are synchronously updated to reach the best performance. Extensive experiments on the synthetic and real-world benchmarks, including outdoor and indoor scenarios, demonstrate that our method achieves significantly better performance than prior SFUDA methods for pinhole-to-panoramic adaptation. Comment: Accepted to CVPR 2024
Published: 2024

6. DeCoF: Generated Video Detection via Frame Consistency: The First Benchmark Dataset

Author: Ma, Long, Zhang, Jiajia, Deng, Hongping, Zhang, Ningyu, Guo, Qinglang, Yu, Haiyang, Liao, Yong, and Zhou, Pengyuan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: The escalating quality of video generated by advanced video generation methods results in new security challenges, while there have been few relevant research efforts: 1) There is no open-source dataset for generated video detection, 2) No generated video detection method has been proposed so far. To this end, we propose an open-source dataset and a detection method for generated video for the first time. First, we propose a scalable dataset consisting of 964 prompts, covering various forgery targets, scenes, behaviors, and actions, as well as various generation models with different architectures and generation methods, including the most popular commercial models like OpenAI's Sora and Google's Veo. Second, we found via probing experiments that spatial artifact-based detectors lack generalizability. Hence, we propose a simple yet effective detection model based on frame consistency (DeCoF), which focuses on temporal artifacts by eliminating the impact of spatial artifacts during feature learning. Extensive experiments demonstrate the efficacy of DeCoF in detecting videos generated by unseen video generation models and confirm its powerful generalizability across several commercially proprietary models. Our code and dataset will be released at https://github.com/wuwuwuyue/DeCoF.
Published: 2024

7. A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming

Author: Zhou, Pengyuan, Wang, Lin, Liu, Zhi, Hao, Yanbin, Hui, Pan, Tarkoma, Sasu, and Kangasharju, Jussi
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Multimedia
Abstract: This paper offers an insightful examination of how currently top-trending AI technologies, i.e., generative artificial intelligence (Generative AI) and large language models (LLMs), are reshaping the field of video technology, including video generation, understanding, and streaming. It highlights the innovative use of these technologies in producing highly realistic videos, a significant leap in bridging the gap between real-world dynamics and digital creation. The study also delves into the advanced capabilities of LLMs in video understanding, demonstrating their effectiveness in extracting meaningful information from visual content, thereby enhancing our interaction with videos. In the realm of video streaming, the paper discusses how LLMs contribute to more efficient and user-centric streaming experiences, adapting content delivery to individual viewer preferences. This comprehensive review navigates through the current achievements, ongoing challenges, and future possibilities of applying Generative AI and LLMs to video-related tasks, underscoring the immense potential these technologies hold for advancing the field of video technology related to multimedia, networking, and AI communities. Comment: 16 pages, 10 figures, 4 tables
Published: 2024

8. Dream360: Diverse and Immersive Outdoor Virtual Scene Creation via Transformer-Based 360 Image Outpainting

Author: Ai, Hao, Cao, Zidong, Lu, Haonan, Chen, Chen, Ma, Jian, Zhou, Pengyuan, Kim, Tae-Kyun, Hui, Pan, and Wang, Lin
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Human-Computer Interaction
Abstract: 360 images, with a field-of-view (FoV) of 180°x360°, provide immersive and realistic environments for emerging virtual reality (VR) applications, such as virtual tourism, where users desire to create diverse panoramic scenes from a narrow FoV photo they take from a viewpoint via portable devices. It thus brings us to a technical challenge: "How to allow the users to freely create diverse and immersive virtual scenes from a narrow FoV image with a specified viewport?" To this end, we propose a transformer-based 360 image outpainting framework called Dream360, which can generate diverse, high-fidelity, and high-resolution panoramas from user-selected viewports, considering the spherical properties of 360 images. Compared with existing methods, e.g., [3], which primarily focus on inputs with rectangular masks and central locations while overlooking the spherical property of 360 images, our Dream360 offers higher outpainting flexibility and fidelity based on the spherical representation. Dream360 comprises two key learning stages: (I) codebook-based panorama outpainting via Spherical-VQGAN (S-VQGAN), and (II) frequency-aware refinement with a novel frequency-aware consistency loss. Specifically, S-VQGAN learns a sphere-specific codebook from spherical harmonic (SH) values, providing a better representation of spherical data distribution for scene modeling. The frequency-aware refinement matches the resolution and further improves the semantic consistency and visual fidelity of the generated results. Our Dream360 achieves significantly lower Frechet Inception Distance (FID) scores and better visual fidelity than existing methods. We also conducted a user study involving 15 participants to interactively evaluate the quality of the generated results in VR, demonstrating the flexibility and superiority of our Dream360 framework. Comment: 11 pages, accepted to IEEE VR 2024
Published: 2024

9. Noise-NeRF: Hide Information in Neural Radiance Fields using Trainable Noise

Author: Huang, Qinglong, Li, Haoran, Liao, Yong, Hao, Yanbin, and Zhou, Pengyuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Neural Radiance Field (NeRF) has been proposed as an innovative advancement in 3D reconstruction techniques. However, little research has been conducted on the information confidentiality and security of NeRF, such as steganography. Existing NeRF steganography solutions suffer from low steganography quality, model weight damage, and a limited amount of steganographic information. This paper proposes Noise-NeRF, a novel NeRF steganography method employing an Adaptive Pixel Selection strategy and a Pixel Perturbation strategy to improve the quality and efficiency of steganography via trainable noise. Extensive experiments validate the state-of-the-art performance of Noise-NeRF on both steganography quality and rendering quality, as well as its effectiveness in super-resolution image steganography.
Published: 2024

10. 2D-Guided 3D Gaussian Segmentation

Author: Lan, Kun, Li, Haoran, Shi, Haolin, Wu, Wenjun, Liao, Yong, Wang, Lin, and Zhou, Pengyuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recently, 3D Gaussian, as an explicit 3D representation method, has demonstrated strong competitiveness over NeRF (Neural Radiance Fields) in terms of expressing complex scenes and training duration. These advantages signal a wide range of applications for 3D Gaussians in 3D understanding and editing. Meanwhile, the segmentation of 3D Gaussians is still in its infancy. The existing segmentation methods are not only cumbersome but also incapable of segmenting multiple objects simultaneously in a short amount of time. In response, this paper introduces a 3D Gaussian segmentation method implemented with 2D segmentation as supervision. This approach uses input 2D segmentation maps to guide the learning of the added 3D Gaussian semantic information, while nearest neighbor clustering and statistical filtering refine the segmentation results. Experiments show that our concise method achieves mIoU and mAcc on multi-object segmentation comparable to those of previous single-object segmentation methods.
Published: 2023

11. FedMKGC: Privacy-Preserving Federated Multilingual Knowledge Graph Completion

Author: Tang, Wei, Wu, Zhiqian, Cao, Yixin, Liao, Yong, and Zhou, Pengyuan
Subjects: Computer Science - Computation and Language
Abstract: Knowledge graph completion (KGC) aims to predict missing facts in knowledge graphs (KGs), which is crucial as modern KGs remain largely incomplete. While training KGC models on multiple aligned KGs can improve performance, previous methods that rely on transferring raw data among KGs raise privacy concerns. To address this challenge, we propose a new federated learning framework that implicitly aggregates knowledge from multiple KGs without demanding raw data exchange and entity alignment. We treat each KG as a client that trains a local language model through text-based knowledge representation learning. A central server then aggregates the model weights from clients. As natural language provides a universal representation, the same knowledge thus has similar semantic representations across KGs. As such, the aggregated language model can leverage complementary knowledge from multilingual KGs without demanding raw user data sharing. Extensive experiments on a benchmark dataset demonstrate that our method substantially improves KGC on multilingual KGs, achieving comparable performance to state-of-the-art alignment-based models without requiring any labeled alignments or raw user data sharing. Our codes will be publicly available.
Published: 2023
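
Illustrative note on record 11: the abstract says a central server aggregates model weights from clients but does not state the aggregation rule. The sketch below assumes a FedAvg-style average weighted by local dataset size; that rule, the `aggregate` function, and the toy parameter name `layer` are assumptions made for illustration, not the authors' method.

```python
from typing import Dict, List

# A model is represented here as a mapping from parameter name to a flat
# list of floats. This representation is a simplification for illustration.
Weights = Dict[str, List[float]]


def aggregate(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its dataset size.

    Assumed FedAvg-style rule; the abstract does not specify the rule used.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one dataset size per client and at least one client")
    total = sum(client_sizes)
    merged: Weights = {}
    for name in client_weights[0]:
        dim = len(client_weights[0][name])
        merged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)
        ]
    return merged


# Two hypothetical clients (one per knowledge graph) with 1 and 3 samples.
clients = [{"layer": [1.0, 2.0]}, {"layer": [3.0, 4.0]}]
global_weights = aggregate(clients, client_sizes=[1, 3])
print(global_weights)
```

Only parameter values cross the client boundary in this sketch; the raw triples of each knowledge graph stay local, which is the property the abstract emphasizes.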

12. Take History as a Mirror in Heterogeneous Federated Learning

Author: Jiang, Xiaorui, Xu, Hengwei, Gao, Yu, Liao, Yong, and Zhou, Pengyuan
Subjects: Computer Science - Machine Learning
Abstract: Federated Learning (FL) allows several clients to cooperatively train machine learning models without disclosing the raw data. In practice, due to the system and statistical heterogeneity among devices, synchronous FL often encounters the straggler effect. In contrast, asynchronous FL can mitigate this problem, making it suitable for scenarios involving numerous participants. However, Non-IID data and stale models present significant challenges to asynchronous FL, as they would diminish the practicality of the global model and even lead to training failures. In this work, we propose a novel asynchronous FL framework called Federated Historical Learning (FedHist), which effectively addresses the challenges posed by both Non-IID data and gradient staleness. FedHist enhances the stability of local gradients by performing weighted fusion with historical global gradients cached on the server. Relying on hindsight, it assigns aggregation weights to each participant in a multi-dimensional manner during each communication round. To further enhance the efficiency and stability of the training process, we introduce an intelligent $\ell_2$-norm amplification scheme, which dynamically regulates the learning progress based on the $\ell_2$-norms of the submitted gradients. Extensive experiments demonstrate that FedHist outperforms state-of-the-art methods in terms of convergence performance and test accuracy.
Published: 2023

13. User Authentication and Identity Inconsistency Detection via Mouse-trajectory Similarity Measurement

Author: Jin, Rui, Liao, Yong, and Zhou, Pengyuan
Subjects: Computer Science - Cryptography and Security
Abstract: Completely Automated Public Turing Test To Tell Computers and Humans Apart (CAPTCHA) is a type of challenge-response test widely used in authentication systems. A well-known challenge it faces is the CAPTCHA farm, where workers are hired to solve CAPTCHAs manually. In this work, we propose to tackle this challenge from a novel perspective, converting CAPTCHA farm detection to identity inconsistency detection, which essentially becomes an authentication process. Specifically, we develop a novel embedding model, which measures the similarity between mouse trajectories collected during the session and when registering/solving CAPTCHA, to authenticate and detect identity inconsistency. Moreover, unlike most existing works that employ a separate mouse movement classifier for each individual user, which brings in considerable costs when serving a large number of users, our model performs detection tasks using only one classifier for all users, significantly reducing the cost. Experiment results validate the superiority of our method over the state-of-the-art time series classification methods, achieving 94.3% and 97.7% of AUC in identity and authentication inconsistency detection, respectively.
Published: 2023

14. AHSecAgg and TSKG: Lightweight Secure Aggregation for Federated Learning Without Compromise

Author: Zhang, Siqing, Liao, Yong, and Zhou, Pengyuan
Subjects: Computer Science - Cryptography and Security
Abstract: Leveraging federated learning (FL) to enable cross-domain privacy-sensitive data mining represents a vital breakthrough to accomplish privacy-preserving learning. However, attackers can infer the original user data by analyzing the uploaded intermediate parameters during the aggregation process. Therefore, secure aggregation has become a critical issue in the field of FL. Many secure aggregation protocols face the problem of high computation costs, which severely limits their applicability. To this end, we propose AHSecAgg, a lightweight secure aggregation protocol using additive homomorphic masks. AHSecAgg significantly reduces computation overhead without compromising the dropout handling capability or model accuracy. We prove the security of AHSecAgg in semi-honest and active adversary settings. In addition, in cross-silo scenarios where the group of participants is relatively fixed during each round, we propose TSKG, a lightweight Threshold Signature based masking key generation method. TSKG can generate different temporary secrets and shares for different aggregation rounds using the initial key and thus effectively eliminates the cost of secret sharing and key agreement. We prove TSKG does not sacrifice security. Extensive experiments show that AHSecAgg significantly outperforms state-of-the-art mask-based secure aggregation protocols in terms of computational efficiency, and TSKG effectively reduces the computation and communication costs for existing secure aggregation protocols.
Published: 2023
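
Illustrative note on record 14: the abstract refers to mask-based secure aggregation without giving protocol details. The sketch below shows only the generic idea behind that family of protocols: pairwise additive masks that cancel when the server sums the masked vectors. It is not AHSecAgg or TSKG. The function names, the modulus, and the use of a single seeded generator in place of per-pair key agreement are assumptions for illustration, and Python's `random` module is not cryptographically secure.

```python
import random
from typing import List

MODULUS = 2**32  # assumed working modulus for masked integer vectors


def pairwise_masks(num_clients: int, dim: int, seed: int = 0) -> List[List[int]]:
    """Build one mask per client such that all masks sum to zero mod MODULUS.

    For every pair (i, j) with i < j, client i adds a shared random vector
    and client j subtracts it. A real protocol would derive each shared
    vector from a key agreed between the two clients; one seeded RNG
    simulates that here.
    """
    rng = random.Random(seed)
    masks = [[0] * dim for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            shared = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS
    return masks


def mask_vector(vector: List[int], mask: List[int]) -> List[int]:
    """Client side: hide the private vector behind the client's mask."""
    return [(v + m) % MODULUS for v, m in zip(vector, mask)]


def aggregate(masked_vectors: List[List[int]]) -> List[int]:
    """Server side: sum masked vectors; the pairwise masks cancel out."""
    dim = len(masked_vectors[0])
    return [sum(vec[k] for vec in masked_vectors) % MODULUS for k in range(dim)]


private = [[1, 2], [3, 4], [5, 6]]  # toy client updates
masks = pairwise_masks(num_clients=3, dim=2)
uploads = [mask_vector(v, m) for v, m in zip(private, masks)]
print(aggregate(uploads))  # the server learns only the sum of the updates
```

This sketch omits dropout handling, which the abstract names as a capability the protocol preserves; if a client dropped out here, its masks would not cancel.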

15. Towards Efficient Secure Aggregation in FL: Partial Vector Freezing for Cost Compression

Author: Zhang, Siqing, Liao, Yong, and Zhou, Pengyuan
Subjects: Computer Science - Cryptography and Security
Abstract: Secure aggregation of user vectors has become a critical issue in the field of federated learning. Many Secure Aggregation Protocols (SAP) face exorbitant computation costs, which severely limit their applicability. We uncover that current endeavors to reduce computation costs tend to overlook a crucial fact: a considerable portion of SAP's computation burden stems from processing each entry in the private vectors. Given this observation, we propose PVF, a portable module for compressing computation costs. PVF is able to "freeze" a substantial portion of the private vector through specific linear transformations, only requiring $\frac{1}{\lambda}$ of the original vector to participate in SAP. Eventually, users can "thaw" the public sum of the "frozen entries" by the result of SAP. To enhance functionality, we introduce extensions that can enforce consistency constraints on users' original vectors, verify aggregated results, and enhance security when a portion of the private vector is known to the server. We demonstrate that PVF can seamlessly integrate with various SAP and prove that it poses no threat to user privacy in the semi-honest and active adversary settings. We select $8$ baselines, encompassing $6$ distinct types of SAP, and explore the acceleration effects of PVF on these SAP. Empirical investigations indicate that when $\lambda=100$, PVF yields up to $99.5\times$ speedup and up to $32.3\times$ communication reduction, with the potential to approach nearly $1000\times$ acceleration as $\lambda$ increases.
Published: 2023

16. Viewport Prediction for Volumetric Video Streaming by Exploring Video Saliency and Trajectory Information

Author: Li, Jie, Li, Zhixin, Liu, Zhi, Zhou, Pengyuan, Hong, Richang, Li, Qiyue, and Hu, Han
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
Abstract: Volumetric video, also known as hologram video, is a novel medium that portrays natural content in Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR). It is expected to be the next-gen video technology and a prevalent use case for 5G and beyond wireless communication. Considering that each user typically only watches a section of the volumetric video, known as the viewport, it is essential to have precise viewport prediction for optimal performance. However, research on this topic is still in its infancy. To this end, this paper proposes a novel approach, named Saliency and Trajectory Viewport Prediction (STVP), which aims to improve the precision of viewport prediction in volumetric video streaming. The STVP extensively utilizes video saliency information and viewport trajectory. To our knowledge, this is the first comprehensive study of viewport prediction in volumetric video streaming. In particular, we introduce a novel sampling method, Uniform Random Sampling (URS), to reduce computational complexity while still preserving video features in an efficient manner. Then we present a saliency detection technique that incorporates both spatial and temporal information for detecting static, dynamic geometric, and color salient regions. Finally, we intelligently fuse saliency and trajectory information to achieve more accurate viewport prediction. We conduct extensive simulations to evaluate the effectiveness of our proposed viewport prediction methods using state-of-the-art volumetric video sequences. The experimental results show the superiority of the proposed method over existing schemes. The dataset and source code will be publicly accessible after acceptance.
Published: 2023

17. 3D-GOI: 3D GAN Omni-Inversion for Multifaceted and Multi-object Editing

Author: Li, Haoran, Ma, Long, Shi, Haolin, Hao, Yanbin, Liao, Yong, Cheng, Lechao, and Zhou, Pengyuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The current GAN inversion methods typically can only edit the appearance and shape of a single object and background while overlooking spatial information. In this work, we propose a 3D editing framework, 3D-GOI, to enable multifaceted editing of affine information (scale, translation, and rotation) on multiple objects. 3D-GOI realizes the complex editing function by inverting the abundance of attribute codes (object shape/appearance/scale/rotation/translation, background shape/appearance, and camera pose) controlled by GIRAFFE, a renowned 3D GAN. Accurately inverting all the codes is challenging; 3D-GOI solves this challenge in three main steps. First, we segment the objects and the background in a multi-object image. Second, we use a custom Neural Inversion Encoder to obtain coarse codes of each object. Finally, we use a round-robin optimization algorithm to get precise codes to reconstruct the image. To the best of our knowledge, 3D-GOI is the first framework to enable multifaceted editing on multiple objects. Both qualitative and quantitative experiments demonstrate that 3D-GOI holds immense potential for flexible, multifaceted editing in complex multi-object scenes. Our project and code are released at https://3d-goi.github.io.
Published: 2023

18. Distilling Efficient Vision Transformers from CNNs for Semantic Segmentation

Author: Zheng, Xu, Luo, Yunhao, Zhou, Pengyuan, and Wang, Lin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we tackle a new problem: how to transfer knowledge from the pre-trained cumbersome yet well-performing CNN-based model to learn a compact Vision Transformer (ViT)-based model while maintaining its learning capacity? Due to the completely different characteristics of ViT and CNN and the long-existing capacity gap between teacher and student models in Knowledge Distillation (KD), directly transferring the cross-model knowledge is non-trivial. To this end, we subtly leverage the visual and linguistic-compatible feature character of ViT (i.e., student), and its capacity gap with the CNN (i.e., teacher) and propose a novel CNN-to-ViT KD framework, dubbed C2VKD. Importantly, as the teacher's features are heterogeneous to those of the student, we first propose a novel visual-linguistic feature distillation (VLFD) module that explores efficient KD among the aligned visual and linguistic-compatible representations. Moreover, due to the large capacity gap between the teacher and student and the inevitable prediction errors of the teacher, we then propose a pixel-wise decoupled distillation (PDD) module to supervise the student under the combination of labels and teacher's predictions from the decoupled target and non-target classes. Experiments on three semantic segmentation benchmark datasets consistently show that the increment of mIoU of our method is over 200% of the SoTA KD methods.
Published: 2023

19. NLPBench: Evaluating Large Language Models on Solving NLP Problems

Author: Song, Linxin, Zhang, Jieyu, Cheng, Lechao, Zhou, Pengyuan, Zhou, Tianyi, and Li, Irene
Subjects: Computer Science - Computation and Language
Abstract: Recent developments in large language models (LLMs) have shown promise in enhancing the capabilities of natural language processing (NLP). Despite these successes, there remains a dearth of research dedicated to the NLP problem-solving abilities of LLMs. To fill the gap in this area, we present a unique benchmarking dataset, NLPBench, comprising 378 college-level NLP questions spanning various NLP topics sourced from Yale University's prior final exams. NLPBench includes questions with context, in which multiple sub-questions share the same public information, and diverse question types, including multiple choice, short answer, and math. Our evaluation, centered on LLMs such as GPT-3.5/4, PaLM-2, and LLAMA-2, incorporates advanced prompting strategies like the chain-of-thought (CoT) and tree-of-thought (ToT). Our study reveals that the effectiveness of the advanced prompting strategies can be inconsistent, occasionally damaging LLM performance, especially in smaller models like the LLAMA-2 (13b). Furthermore, our manual assessment illuminated specific shortcomings in LLMs' scientific problem-solving skills, with weaknesses in logical decomposition and reasoning notably affecting results.
Published: 2023

20. Detect Depression from Social Networks with Sentiment Knowledge Sharing

Author: Shi, Yan, Tian, Yao, Tong, Chengwei, Zhu, Chunyan, Li, Qianqian, Zhang, Mengzhu, Zhao, Wei, Liao, Yong, and Zhou, Pengyuan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Social and Information Networks
Abstract: Social networks play an important role in propagating people's viewpoints, emotions, thoughts, and fears. Notably, following lockdown periods during the COVID-19 pandemic, the issue of depression has garnered increasing attention, with a significant portion of individuals resorting to social networks as an outlet for expressing emotions. Using deep learning techniques to discern potential signs of depression from social network messages facilitates the early identification of mental health conditions. Current efforts in detecting depression through social networks typically rely solely on analyzing the textual content, overlooking other potential information. In this work, we conduct a thorough investigation that unveils a strong correlation between depression and negative emotional states. The integration of such associations as external knowledge can provide valuable insights for detecting depression. Accordingly, we propose a multi-task training framework, DeSK, which utilizes shared sentiment knowledge to enhance the efficacy of depression detection. Experiments conducted on both Chinese and English datasets demonstrate the cross-lingual effectiveness of DeSK.
Published: 2023

21. How Secure is Your Website? A Comprehensive Investigation on CAPTCHA Providers and Solving Services

Author: Jin, Rui, Huang, Lin, Duan, Jikang, Zhao, Wei, Liao, Yong, and Zhou, Pengyuan
Subjects: Computer Science - Cryptography and Security, Computer Science - Computers and Society
Abstract: Completely Automated Public Turing Test To Tell Computers and Humans Apart (CAPTCHA) has been implemented on many websites to distinguish harmful automated bots from legitimate users. However, the revenue generated by the bots has turned circumventing CAPTCHAs into a lucrative business. Although earlier studies provided information about text-based CAPTCHAs and the associated CAPTCHA-solving services, a lot has changed in the past decade regarding content, suppliers, and solvers of CAPTCHA. We have conducted a comprehensive investigation of the latest third-party CAPTCHA providers and CAPTCHA-solving services' attacks. We dug into the details of CAPTCHA-As-a-Service and the latest CAPTCHA-solving services and carried out adversarial experiments on CAPTCHAs and CAPTCHA solvers. The experiment results show a worrying fact: most of the latest CAPTCHAs are vulnerable to both human solvers and automated solvers. New CAPTCHAs based on hard AI problems and behavior analysis are needed to stop CAPTCHA solvers.
Published: 2023

22. Deepfake in the Metaverse: An Outlook Survey

Author: Wu, Haojie, Hui, Pan, and Zhou, Pengyuan
Subjects: Computer Science - Human-Computer Interaction
Abstract: We envision deepfake technologies, which synthesize realistic fake images and videos, will play an important role in the future metaverse. While enhancing users' immersion and experience with synthesized virtual characters and scenes, deepfake can cause serious consequences if used for fraud, impersonation, and dissemination of fake information. In this paper, we introduce the principles, applications, and risks of deepfake technology, and propose some countermeasures to help users and developers in the metaverse deal with the challenges brought by deepfake technologies. Further, we provide an outlook on the future development of deepfake in the metaverse. more...
Published: 2023

23. Last Week with ChatGPT: A Weibo Study on Social Perspective Regarding ChatGPT for Education and Beyond

Author: Tian, Yao, Tong, Chengwei, Lee, Lik-Hang, Mogavi, Reza Hadi, Liao, Yong, and Zhou, Pengyuan
Subjects: Computer Science - Computers and Society, Computer Science - Human-Computer Interaction
Abstract: The application of AI-powered tools has piqued the interest of many fields, particularly in the academic community. This study uses ChatGPT, currently the most powerful and popular AI tool, as a representative example to analyze how the Chinese public perceives the potential of large language models (LLMs) for educational and general purposes. Although facing accessibility challenges, we found that the number of discussions on ChatGPT per month is 16 times that of Ernie Bot developed by Baidu, the most popular alternative product to ChatGPT in the mainland, making ChatGPT a more suitable subject for our analysis. The study also serves as the first effort to investigate the changes in public opinion as AI technologies become more advanced and intelligent. The analysis reveals that, upon first encounters with advanced AI that was not yet highly capable, some social media users believed that AI advancements would benefit education and society, while others feared that advanced AI, like ChatGPT, would make humans feel inferior and lead to problems such as cheating and a decline in moral principles. The majority of users remained neutral. Interestingly, with the rapid development and improvement of AI capabilities, public attitudes have tended to shift in a positive direction. We present a thorough analysis of the trending shift and a roadmap to ensure the ethical application of ChatGPT-like models in education and beyond. more...
Published: 2023

24. Exploring User Perspectives on ChatGPT: Applications, Perceptions, and Implications for AI-Integrated Education

Author: Mogavi, Reza Hadi, Deng, Chao, Kim, Justin Juho, Zhou, Pengyuan, Kwon, Young D., Metwally, Ahmed Hosny Saleh, Tlili, Ahmed, Bassanelli, Simone, Bucchiarone, Antonio, Gujar, Sujit, Nacke, Lennart E., and Hui, Pan
Subjects: Computer Science - Computers and Society, Computer Science - Human-Computer Interaction
Abstract: Understanding user perspectives on Artificial Intelligence (AI) in education is essential for creating pedagogically effective and ethically responsible AI-integrated learning environments. In this paper, we conduct an extensive qualitative content analysis of four major social media platforms (Twitter, Reddit, YouTube, and LinkedIn) to explore the user experience (UX) and perspectives of early adopters toward ChatGPT, an AI chatbot technology, in various education sectors. We investigate the primary applications of ChatGPT in education (RQ1) and the various perceptions of the technology (RQ2). Our findings indicate that ChatGPT is most popularly used in the contexts of higher education (24.18%), K-12 education (22.09%), and practical-skills learning (15.28%). On social media platforms, the most frequently discussed topics about ChatGPT are productivity, efficiency, and ethics. While some early adopters lean toward seeing ChatGPT as a revolutionary technology with the potential to boost students' self-efficacy and motivation to learn, others express concern that overreliance on the AI system may promote superficial learning habits and erode students' social and critical thinking skills. Our study contributes to the broader discourse on Human-AI Interaction and offers recommendations based on crowd-sourced knowledge for educators and learners interested in incorporating ChatGPT into their educational settings. Furthermore, we propose a research agenda for future studies that sets the foundation for continued investigation into the application of ChatGPT in education. Comment: Preprint version
Published: 2023

25. What if we have Meta GPT? From Content Singularity to Human-Metaverse Interaction in AIGC Era

Author: Lee, Lik-Hang, Zhou, Pengyuan, Zhang, Chaoning, and Hosio, Simo
Subjects: Computer Science - Human-Computer Interaction
Abstract: The global metaverse development is facing a "cooldown moment", while the academia and industry attention moves drastically from the Metaverse to AI Generated Content (AIGC) in 2023. Nonetheless, the current discussion rarely considers the connection between AIGCs and the Metaverse. We can imagine the Metaverse, i.e., immersive cyberspace, is the black void of space, and AIGCs can simultaneously offer content and facilitate diverse user needs. As such, this article argues that AIGCs can be a vital technological enabler for the Metaverse. The article first provides a retrospect of the major pitfall of the metaverse applications in 2022. Second, we discuss from a user-centric perspective how the metaverse development will accelerate with AIGCs. Next, the article conjectures future scenarios concatenating the Metaverse and AIGCs. Accordingly, we advocate for an AI-Generated Metaverse (AIGM) framework for energizing the creation of metaverse content in the AIGC era., Comment: 11 pages, 4 figures more...
Published: 2023

26. Spatiotemporal and Semantic Zero-inflated Urban Anomaly Prediction

Author: Lu, Yao, Zhou, Pengyuan, Liao, Yong, and Xie, Haiyong
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Urban anomaly predictions, such as traffic accident prediction and crime prediction, are of vital importance to smart city security and maintenance. Existing methods typically use deep learning to capture the intra-dependencies in spatial and temporal dimensions. However, numerous key challenges remain unsolved, for instance, sparse zero-inflated data due to urban anomalies occurring with low frequency (which can lead to poor performance on real-world datasets), and both intra- and inter-dependencies of abnormal patterns across spatial, temporal, and semantic dimensions. Moreover, a unified approach to predict multiple kinds of anomaly is left to explore. In this paper, we propose STS to jointly capture the intra- and inter-dependencies between the patterns and the influential factors in three dimensions. Further, we use a multi-task prediction module with a customized loss function to solve the zero-inflated issue. To verify the effectiveness of the model, we apply it to two urban anomaly prediction tasks, crime prediction and traffic accident risk prediction, respectively. Experiments on two application scenarios with four real-world datasets demonstrate the superiority of STS, which outperforms state-of-the-art methods in the mean absolute error and the root mean square error by 37.88% and 18.10% on zero-inflated datasets, and by 60.32% and 37.28% on non-zero datasets, respectively.
Published: 2023

27. Unleashing GPT on the Metaverse: Savior or Destroyer?

Author: Zhou, Pengyuan
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Artificial Intelligence
Abstract: Incorporating artificial intelligence (AI) technology, particularly large language models (LLMs), is becoming increasingly vital for developing immersive and interactive metaverse experiences. GPT, a representative LLM developed by OpenAI, is leading LLM development and gaining attention for its potential in building the metaverse. The article delves into the pros and cons of utilizing GPT for metaverse-based education, entertainment, personalization, and support. Dynamic and personalized experiences are possible with this technology, but there are also legitimate privacy, bias, and ethical issues to consider. This article aims to help readers understand the possible influence of GPT, according to its unique technological advantages, on the metaverse and how it may be used to effectively create a more immersive and engaging virtual environment by evaluating these opportunities and obstacles., Comment: 15 pages, 18 figures more...
Published: 2023

28. TKN: Transformer-based Keypoint Prediction Network For Real-time Video Prediction

Author: Li, Haoran, Zhou, Pengyuan, Lin, Yihang, Hao, Yanbin, Xie, Haiyong, and Liao, Yong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Video prediction is a complex time-series forecasting task with great potential in many use cases. However, conventional methods overemphasize accuracy while ignoring the slow prediction speed caused by complicated model structures that learn too much redundant information with excessive GPU memory consumption. Furthermore, conventional methods mostly predict frames sequentially (frame-by-frame) and thus are hard to accelerate. Consequently, valuable use cases such as real-time danger prediction and warning cannot achieve fast enough inference speed to be applicable in reality. Therefore, we propose a transformer-based keypoint prediction neural network (TKN), an unsupervised learning method that boosts the prediction process via constrained information extraction and a parallel prediction scheme. TKN is the first real-time video prediction solution to the best of our knowledge, while significantly reducing computation costs and maintaining other performance. Extensive experiments on KTH and Human3.6 datasets demonstrate that TKN predicts 11 times faster than existing methods while reducing memory consumption by 17.4% and achieving state-of-the-art prediction performance on average.
Published: 2023

29. FedACK: Federated Adversarial Contrastive Knowledge Distillation for Cross-Lingual and Cross-Model Social Bot Detection

Author: Yang, Yingguang, Yang, Renyu, Peng, Hao, Li, Yangyang, Li, Tong, Liao, Yong, and Zhou, Pengyuan
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Social bot detection is of paramount importance to the resilience and security of online social platforms. The state-of-the-art detection models are siloed and have largely overlooked a variety of data characteristics from multiple cross-lingual platforms. Meanwhile, the heterogeneity of data distribution and model architecture makes it intricate to devise an efficient cross-platform and cross-model detection framework. In this paper, we propose FedACK, a new federated adversarial contrastive knowledge distillation framework for social bot detection. We devise a GAN-based federated knowledge distillation mechanism for efficiently transferring knowledge of data distribution among clients. In particular, a global generator is used to extract the knowledge of global data distribution and distill it into each client's local model. We leverage local discriminator to enable customized model design and use local generator for data enhancement with hard-to-decide samples. Local training is conducted as multi-stage adversarial and contrastive learning to enable consistent feature spaces among clients and to constrain the optimization direction of local models, reducing the divergences between local and global models. Experiments demonstrate that FedACK outperforms the state-of-the-art approaches in terms of accuracy, communication efficiency, and feature space consistency., Comment: Accepted by the ACM Web Conference 2023 (WWW'23) more...
Published: 2023

30. Mitigating Backdoors in Federated Learning with FLD

Author: Lin, Yihang, Zhou, Pengyuan, Wu, Zhiqian, and Liao, Yong
Subjects: Computer Science - Machine Learning, Computer Science - Cryptography and Security
Abstract: Federated learning allows clients to collaboratively train a global model without uploading raw data for privacy preservation. This feature, i.e., the inability to review participants' datasets, has recently been found responsible for federated learning's vulnerability in the face of backdoor attacks. Existing defense methods fall short from two perspectives: 1) they consider only very specific and limited attacker models and are unable to cope with advanced backdoor attacks, such as distributed backdoor attacks, which break down the global trigger into multiple distributed triggers; 2) they conduct detection at model granularity, so the performance is affected by the model dimension. To address these challenges, we propose Federated Layer Detection (FLD), a novel model filtering approach for effectively defending against backdoor attacks. FLD examines the models based on layer granularity to capture the complete model details and effectively detect potential backdoor models regardless of model dimension. We provide theoretical analysis and proof for the convergence of FLD. Extensive experiments demonstrate that FLD effectively mitigates state-of-the-art backdoor attacks with negligible impact on the accuracy of the primary task.
Published: 2023

31. A Roadmap Toward Metaversity: Recent Developments and Perspectives in Education

Author: Lee, Lik-Hang, Hosio, Simo, Braud, Tristan, Zhou, Pengyuan, Kinshuk, Series Editor, Huang, Ronghuai, Series Editor, Sampson, Demetrios, Series Editor, Liu, Dejian, editor, Metwally, Ahmed Hosny Saleh, editor, Tlili, Ahmed, editor, and Fan Lin, Emma, editor
Published: 2024

32. Detect Depression from Social Networks with Sentiment Knowledge Sharing

Author: Shi, Yan, Tian, Yao, Tong, Chengwei, Zhu, Chunyan, Li, Qianqian, Zhang, Mengzhu, Zhao, Wei, Liao, Yong, Zhou, Pengyuan, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Wu, Feng, editor, Huang, Xuanjing, editor, He, Xiangnan, editor, Tang, Jiliang, editor, Zhao, Shu, editor, Li, Daifeng, editor, and Zhang, Jing, editor
Published: 2024

33. Vetaverse: A Survey on the Intersection of Metaverse, Vehicles, and Transportation Systems

Author: Zhou, Pengyuan, Zhu, Jinjing, Wang, Yiting, Lu, Yunfan, Wei, Zixiang, Shi, Haolin, Ding, Yuchen, Gao, Yu, Huang, Qinglong, Shi, Yan, Alhilal, Ahmad, Lee, Lik-Hang, Braud, Tristan, Hui, Pan, and Wang, Lin
Subjects: Computer Science - Human-Computer Interaction
Abstract: Since 2021, the term "Metaverse" has been the most popular one, garnering a lot of interest. Because of its contained environment and built-in computing and networking capabilities, a modern car makes an intriguing location to host its own little metaverse. Additionally, the travellers don't have much to do to pass the time while traveling, making them ideal customers for immersive services. Vetaverse (Vehicular-Metaverse), which we define as the future continuum between vehicular industries and Metaverse, is envisioned as a blended immersive realm that scales up to cities and countries, as digital twins of the intelligent Transportation Systems, referred to as "TS-Metaverse", as well as customized XR services inside each Individual Vehicle, referred to as "IV-Metaverse". The two subcategories serve fundamentally different purposes, namely long-term interconnection, maintenance, monitoring, and management on scale for large transportation systems (TS), and personalized, private, and immersive infotainment services (IV). By outlining the framework of Vetaverse and examining important enabler technologies, we reveal this impending trend. Additionally, we examine unresolved issues and potential routes for future study while highlighting some intriguing Vetaverse services., Comment: 27 pages, 19 figures more...
Published: 2022

34. Celeritas: Fast Optimizer for Large Dataflow Graphs

Author: Xu, Hengwei, Liao, Yong, Xie, Haiyong, and Zhou, Pengyuan
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Artificial Intelligence
Abstract: The rapidly enlarging neural network models are becoming increasingly challenging to run on a single device. Hence model parallelism over multiple devices is critical to guarantee the efficiency of training large models. Recent proposals fall short either in long processing time or poor performance. Therefore, we propose Celeritas, a fast framework for optimizing device placement for large models. Celeritas employs a simple but efficient model parallelization strategy in the Standard Evaluation, and generates placement policies through a series of scheduling algorithms. We conduct experiments to deploy and evaluate Celeritas on numerous large models. The results show that Celeritas not only reduces the placement policy generation time by 26.4% but also improves the model running time by 34.2% compared to most advanced methods.
Published: 2022

35. FRAS: Federated Reinforcement Learning empowered Adaptive Point Cloud Video Streaming

Author: Gao, Yu, Zhou, Pengyuan, Liu, Zhi, Han, Bo, and Hui, Pan
Subjects: Computer Science - Multimedia
Abstract: Point cloud video transmission is challenging due to high encoding/decoding complexity, high video bitrate, and low latency requirement. Consequently, conventional adaptive streaming methodologies often fail to meet the requirements in three respects: 1) current algorithms reuse existing quality of experience (QoE) definitions while overlooking the unique features of point cloud video, thus failing to provide optimal user experience; 2) most deep learning approaches require long-span data collections to learn sufficiently varied network conditions and result in long training periods and capacity occupation; 3) cloud training approaches pose privacy risks caused by leakage of user-reported service usage and networking conditions. To overcome the limitations, we present FRAS, the first federated reinforcement learning framework, to the best of our knowledge, for adaptive point cloud video streaming. We define a new QoE model which takes the unique features of point cloud video into account. Each client uses reinforcement learning (RL) to train video quality selection with the objective of optimizing the user's QoE under multiple constraints. Then, a federated learning framework is integrated with the RL algorithm to enhance training performance with privacy preservation. Extensive simulations using real point cloud videos and network traces reveal the superiority of the proposed scheme over baseline schemes. We also implement a prototype that demonstrates the performance of FRAS via real-world tests.
Published: 2022

36. Federated Split GANs

Author: Kortoçi, Pranvera, Liang, Yilei, Zhou, Pengyuan, Lee, Lik-Hang, Mehrabi, Abbas, Hui, Pan, Tarkoma, Sasu, and Crowcroft, Jon
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security
Abstract: Mobile devices and the immense amount and variety of data they generate are key enablers of machine learning (ML)-based applications. Traditional ML techniques have shifted toward new paradigms such as federated (FL) and split learning (SL) to improve the protection of users' data privacy. However, these paradigms often rely on server(s) located in the edge or cloud to train computationally-heavy parts of an ML model to avoid draining the limited resources on client devices, resulting in exposing device data to such third parties. This work proposes an alternative approach to train computationally-heavy ML models on users' devices themselves, where the corresponding device data resides. Specifically, we focus on GANs (generative adversarial networks) and leverage their inherent privacy-preserving attribute. We train the discriminative part of a GAN with raw data on users' devices, whereas the generative model is trained remotely (e.g., server) for which there is no need to access sensor true data. Moreover, our approach ensures that the computational load of training the discriminative model is shared among users' devices, in proportion to their computation capabilities, by means of SL. We implement our proposed collaborative training scheme of a computationally-heavy GAN model in real resource-constrained devices. The results show that our system preserves data privacy, keeps a short training time, and yields the same accuracy of model training as in unconstrained devices (e.g., cloud). Our code can be found on https://github.com/YukariSonz/FSL-GAN
Published: 2022

37. HideNseek: Federated Lottery Ticket via Server-side Pruning and Sign Supermask

Author: Vallapuram, Anish K., Zhou, Pengyuan, Kwon, Young D., Lee, Lik Hang, Xu, Hengwei, and Hui, Pan
Subjects: Computer Science - Machine Learning
Abstract: Federated learning alleviates the privacy risk in distributed learning by transmitting only the local model updates to the central server. However, it faces challenges including statistical heterogeneity of clients' datasets and resource constraints of client devices, which severely impact the training performance and user experience. Prior works have tackled these challenges by combining personalization with model compression schemes including quantization and pruning. However, the pruning is data-dependent and thus must be done on the client side which requires considerable computation cost. Moreover, the pruning normally trains a binary supermask ∈ {0, 1} which significantly limits the model capacity yet with no computation benefit. Consequently, the training requires high computation cost and a long time to converge while the model performance does not pay off. In this work, we propose HideNseek which employs one-shot data-agnostic pruning at initialization to get a subnetwork based on weights' synaptic saliency. Each client then optimizes a sign supermask ∈ {-1, +1} multiplied by the unpruned weights to allow faster convergence with the same compression rates as state-of-the-art. Empirical results from three datasets demonstrate that compared to state-of-the-art, HideNseek improves inference accuracy by up to 40.6% while reducing the communication cost and training time by up to 39.7% and 46.8%, respectively.
Published: 2022

38. What is the Metaverse? An Immersive Cyberspace and Open Challenges

Author: Lee, Lik-Hang, Zhou, Pengyuan, Braud, Tristan, and Hui, Pan
Subjects: Computer Science - Multimedia, Computer Science - Computers and Society, A.1, K.0
Abstract: The Metaverse refers to a virtual-physical blended space in which multiple users can concurrently interact with a unified computer-generated environment and other users, which can be regarded as the next significant milestone of the current cyberspace. This article primarily discusses the development and challenges of the Metaverse. We first briefly describe the development of cyberspace and the necessity of technology enablers. Accordingly, our bottom-up approach highlights three critical technology enablers for the Metaverse: networks, systems, and users. Also, we highlight a number of indispensable issues, under technological and ecosystem perspectives, that build and sustain the Metaverse., Comment: 7 pages, 2 figures more...
Published: 2022

39. Distilling efficient Vision Transformers from CNNs for semantic segmentation

Author: Zheng, Xu, Luo, Yunhao, Zhou, Pengyuan, and Wang, Lin
Published: 2025

40. Towards User-Centered Metrics for Trustworthy AI in Immersive Cyberspace

Author: Zhou, Pengyuan, Finley, Benjamin, Lee, Lik-Hang, Liao, Yong, Xie, Haiyong, and Hui, Pan
Subjects: Computer Science - Computers and Society, Computer Science - Artificial Intelligence
Abstract: AI plays a key role in current cyberspace and future immersive ecosystems that pinpoint user experiences. Thus, the trustworthiness of such AI systems is vital as failures in these systems can cause serious user harm. Although there are related works on exploring trustworthy AI (TAI) metrics in the current cyberspace, ecosystems towards user-centered services, such as the metaverse, are much more complicated in terms of system performance and user experience assessment, thus posing challenges for the applicability of existing approaches. Thus, we give an overview of fairness, privacy and robustness, across the historical path from existing approaches. Eventually, we propose a research agenda towards systematic yet user-centered TAI in immersive ecosystems.
Published: 2022

41. EdgeXAR: A 6-DoF Camera Multi-target Interaction Framework for MAR with User-friendly Latency Compensation

Author: Zhang, Wenxiao, Lin, Sikun, Bijarbooneh, Farshid Hassani, Cheng, Haofei, Braud, Tristan, Zhou, Pengyuan, Lee, Lik-hang, and Hui, Pan
Subjects: Computer Science - Human-Computer Interaction
Abstract: The computational capabilities of recent mobile devices enable the processing of natural features for Augmented Reality (AR), but the scalability is still limited by the devices' computation power and available resources. In this paper, we propose EdgeXAR, a mobile AR framework that utilizes the advantages of edge computing through task offloading to support flexible camera-based AR interaction. We propose a hybrid tracking system for mobile devices that provides lightweight tracking with 6 Degrees of Freedom and hides the offloading latency from users' perception. A practical, reliable and unreliable communication mechanism is used to achieve fast response and consistency of crucial information. We also propose a multi-object image retrieval pipeline that executes fast and accurate image recognition tasks on the cloud and edge servers. Extensive experiments are carried out to evaluate the performance of EdgeXAR by building mobile AR Apps upon it. Regarding the Quality of Experience (QoE), the mobile AR Apps powered by EdgeXAR framework run on average at the speed of 30 frames per second with precise tracking of only 1~2 pixel errors and accurate image recognition of at least 97% accuracy. As compared to Vuforia, one of the leading commercial AR frameworks, EdgeXAR transmits 87% less data while providing a stable 30 FPS performance and reducing the offloading latency by 50 to 70% depending on the transmission medium. Our work facilitates the large-scale deployment of AR as the next generation of ubiquitous interfaces., Comment: 24 pages, ACM EICS'22. arXiv admin note: substantial text overlap with arXiv:1805.03060 more...
Published: 2021

42. All One Needs to Know about Metaverse: A Complete Survey on Technological Singularity, Virtual Ecosystem, and Research Agenda

Author: Lee, Lik-Hang, Braud, Tristan, Zhou, Pengyuan, Wang, Lin, Xu, Dianlei, Lin, Zijun, Kumar, Abhishek, Bermejo, Carlos, and Hui, Pan
Subjects: Computer Science - Computers and Society, A.1, K.0
Abstract: Since the popularisation of the Internet in the 1990s, the cyberspace has kept evolving. We have created various computer-mediated virtual environments including social networks, video conferencing, virtual 3D worlds (e.g., VR Chat), augmented reality applications (e.g., Pokemon Go), and Non-Fungible Token Games (e.g., Upland). Such virtual environments, albeit non-perpetual and unconnected, have brought us various degrees of digital transformation. The term 'metaverse' has been coined to further facilitate the digital transformation in every aspect of our physical lives. At the core of the metaverse stands the vision of an immersive Internet as a gigantic, unified, persistent, and shared realm. While the metaverse may seem futuristic, catalysed by emerging technologies such as Extended Reality, 5G, and Artificial Intelligence, the digital 'big bang' of our cyberspace is not far away. This survey paper presents the first effort to offer a comprehensive framework that examines the latest metaverse development under the dimensions of state-of-the-art technologies and metaverse ecosystems, and illustrates the possibility of the digital 'big bang'. First, technologies are the enablers that drive the transition from the current Internet to the metaverse. We thus examine eight enabling technologies rigorously: Extended Reality, User Interactivity (Human-Computer Interaction), Artificial Intelligence, Blockchain, Computer Vision, IoT and Robotics, Edge and Cloud computing, and Future Mobile Networks. In terms of applications, the metaverse ecosystem allows human users to live and play within a self-sustaining, persistent, and shared realm. Therefore, we discuss six user-centric factors: Avatar, Content Creation, Virtual Economy, Social Acceptability, Security and Privacy, and Trust and Accountability. Finally, we propose a concrete research agenda for the development of the metaverse. Comment: 66 pages
Published: 2021

43. Detect Depression from Social Networks with Sentiment Knowledge Sharing

Author: Shi, Yan, primary, Tian, Yao, additional, Tong, Chengwei, additional, Zhu, Chunyan, additional, Li, Qianqian, additional, Zhang, Mengzhu, additional, Zhao, Wei, additional, Liao, Yong, additional, and Zhou, Pengyuan, additional
Published: 2023
Full Text: View/download PDF

44. Loss Tolerant Federated Learning

Author: Zhou, Pengyuan, Fang, Pei, and Hui, Pan
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Networking and Internet Architecture
Abstract: Federated learning has attracted attention in recent years for collaboratively training data on distributed devices with privacy-preservation. The limited network capacity of mobile and IoT devices has been seen as one of the major challenges for cross-device federated learning. Recent solutions have been focusing on threshold-based client selection schemes to guarantee the communication efficiency. However, we find this approach can cause biased client selection and result in deteriorated performance. Moreover, we find that the challenge of network limit may be overstated in some cases and that packet loss is not always harmful. In this paper, we explore loss tolerant federated learning (LT-FL) in terms of aggregation, fairness, and personalization. We use ThrowRightAway (TRA) to accelerate the data uploading for low-bandwidth devices by intentionally ignoring some packet losses. The results suggest that, with proper integration, TRA and other algorithms can together guarantee the personalization and fairness performance in the face of packet loss below a certain fraction (10%-30%).
Published: 2021

45. AICP: Augmented Informative Cooperative Perception

Author: Zhou, Pengyuan, Kortoci, Pranvera, Yau, Yui-Pan, Braud, Tristan, Wang, Xiujun, Finley, Benjamin, Lee, Lik-Hang, Tarkoma, Sasu, Kangasharju, Jussi, and Hui, Pan
Subjects: Computer Science - Multimedia, Computer Science - Human-Computer Interaction
Abstract: Connected vehicles, whether equipped with advanced driver-assistance systems or fully autonomous, require human driver supervision and are currently constrained to visual information in their line-of-sight. A cooperative perception system among vehicles increases their situational awareness by extending their perception range. Existing solutions focus on improving perspective transformation and fast information collection. However, such solutions fail to filter out large amounts of less relevant data and thus impose significant network and computation load. Moreover, presenting all this less relevant data can overwhelm the driver and thus actually hinder them. To address such issues, we present Augmented Informative Cooperative Perception (AICP), the first fast-filtering system which optimizes the informativeness of shared data at vehicles to improve the fused presentation. To this end, an informativeness maximization problem is presented for vehicles to select a subset of data to display to their drivers. Specifically, we propose (i) a dedicated system design with custom data structure and lightweight routing protocol for convenient data encapsulation, fast interpretation and transmission, and (ii) a comprehensive problem formulation and efficient fitness-based sorting algorithm to select the most valuable data to display at the application layer. We implement a proof-of-concept prototype of AICP with a bandwidth-hungry, latency-constrained real-life augmented reality application. The prototype adds only 12.6 milliseconds of latency to a current informativeness-unaware system. Next, we test the networking performance of AICP at scale and show that AICP effectively filters out less relevant packets and decreases the channel busy time.
Comment: Accepted in IEEE Transactions on Intelligent Transportation Systems
Published: 2021

46. 5G MEC Computation Handoff for Mobile Augmented Reality

Author: Zhou, Pengyuan, Fu, Shuhao, Finley, Benjamin, Li, Xuebing, Tarkoma, Sasu, Kangasharju, Jussi, Ammar, Mostafa, and Hui, Pan
Subjects: Computer Science - Networking and Internet Architecture
Abstract: The combination of 5G and Multi-access Edge Computing (MEC) can significantly reduce application delay by lowering transmission delay and bringing computational capabilities closer to the end user. Therefore, 5G MEC could enable excellent user experience in applications like Mobile Augmented Reality (MAR), which are computation-intensive and delay- and jitter-sensitive. However, existing 5G handoff algorithms often do not consider the computational load of MEC servers, are too complex for real-time execution, or do not integrate easily with the standard protocol stack. Thus they can impair the performance of 5G MEC. To address this gap, we propose Comp-HO, a handoff algorithm that finds a local solution to the joint problem of optimizing signal strength and computational load. Additionally, Comp-HO can easily be integrated into current LTE and 5G base stations thanks to its simplicity and standard-friendly deployability. Specifically, we evaluate Comp-HO through a custom NS-3 simulator which we calibrate via MAR prototype measurements from a real-world 5G testbed. We simulate both Comp-HO and several classic handoff algorithms. The results show that, even without a global optimum, the proposed algorithm still significantly reduces the number of large delays, caused by congestion at MECs, at the expense of a small increase in transmission delay.
Published: 2021

47. ChatGPT in education: A blessing or a curse? A qualitative study exploring early adopters’ utilization and perceptions

Author: Hadi Mogavi, Reza, Deng, Chao, Juho Kim, Justin, Zhou, Pengyuan, D. Kwon, Young, Hosny Saleh Metwally, Ahmed, Tlili, Ahmed, Bassanelli, Simone, Bucchiarone, Antonio, Gujar, Sujit, Nacke, Lennart E., and Hui, Pan
Published: 2024
Full Text: View/download PDF

48. DRLE: Decentralized Reinforcement Learning at the Edge for Traffic Light Control in the IoV

Author: Zhou, Pengyuan, Chen, Xianfu, Liu, Zhi, Braud, Tristan, Hui, Pan, and Kangasharju, Jussi
Subjects: Computer Science - Multiagent Systems, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Systems and Control
Abstract: The Internet of Vehicles (IoV) enables real-time data exchange among vehicles and roadside units and thus provides a promising solution to alleviate traffic jams in urban areas. Meanwhile, better traffic management via efficient traffic light control can benefit the IoV as well by enabling a better communication environment and decreasing the network load. As such, IoV and efficient traffic light control can form a virtuous cycle. Edge computing, an emerging technology to provide low-latency computation capabilities at the edge of the network, can further improve the performance of this cycle. However, while the collected information is valuable, an efficient solution for better utilization and faster feedback has yet to be developed for edge-empowered IoV. To this end, we propose Decentralized Reinforcement Learning at the Edge for traffic light control in the IoV (DRLE). DRLE exploits the ubiquity of the IoV to accelerate the collection of traffic data and its interpretation towards alleviating congestion and providing better traffic light control. DRLE operates within the coverage of the edge servers and uses aggregated data from neighboring edge servers to provide city-scale traffic light control. DRLE decomposes the highly complex problem of large area control into a decentralized multi-agent problem. We prove its global optima with concrete mathematical reasoning. The proposed decentralized reinforcement learning algorithm running at each edge node adapts the traffic lights in real time. We conduct extensive evaluations and demonstrate the superiority of this approach over several state-of-the-art algorithms.
Comment: Accepted by IEEE Transactions on Intelligent Transportation Systems
Published: 2020
Full Text: View/download PDF

49. Evaluating Transport Protocols on 5G for Mobile Augmented Reality

Author: Cao, Jacky, Su, Xiang, Finley, Benjamin, Zhou, Pengyuan, and Hui, Pan
Subjects: Computer Science - Networking and Internet Architecture
Abstract: Mobile Augmented Reality (MAR) mixes physical environments with user-interactive virtual annotations. Immersive MAR experiences are supported by computation-intensive tasks which rely on offloading mechanisms to ease device workloads. However, this introduces additional network traffic which in turn influences the motion-to-photon latency (a determinant of user-perceived quality of experience). Therefore, a proper transport protocol is crucial to minimise transmission latency and ensure sufficient throughput to support MAR performance. Relatedly, 5G, a potential MAR supporting technology, is widely believed to be smarter, faster, and more efficient than its predecessors. However, the suitability and performance of existing transport protocols in MAR in the 5G context have not been explored. Therefore, we present an evaluation of popular transport protocols, including UDP, TCP, MPEG-TS, RTP, and QUIC, with a MAR system on a real-world 5G testbed. We also compare their 5G performance with that on LTE and WiFi. Our evaluation results indicate that TCP has the lowest round-trip-time on 5G, with a median of 15.09 ± 0.26 ms, while QUIC appears to perform better on LTE. Through an additional test with varying signal quality (specifically, degrading secondary synchronisation signal reference signal received quality), we discover that protocol performance appears to be significantly impacted by signal quality.
Comment: 8 pages, 5 figures, 3 tables
Published: 2020

50. Functional polymorphisms in Benzo(a)Pyrene-induced toxicity pathways associated with the risk on laryngeal squamous cell carcinoma

Author: Xu, Lin, Sun, Xueying, Wang, Yiyi, Zhou, Tao, Jia, Jingjing, Zhang, Mai, Zhou, Pengyuan, Wang, Yixiao, Wang, Youshuo, Shou, Yingqing, Huo, Xiaoyu, Ji, Xiaoying, Chen, Jing, and Yu, Dianke
Published: 2023
Full Text: View/download PDF
