Author: "Zhang, Zhehao" / Publication Type: Reports - Searchworks@Jio Institute Digital Library Search Results

Your search for "Zhang, Zhehao" returned 9 results

1. VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use

Author: Zhang, Zhehao, Rossi, Ryan, Yu, Tong, Dernoncourt, Franck, Zhang, Ruiyi, Gu, Jiuxiang, Kim, Sungchul, Chen, Xiang, Wang, Zichao, and Lipka, Nedim
Subjects: Computer Science - Computation and Language
Abstract: While vision-language models (VLMs) have demonstrated remarkable performance across various tasks combining textual and visual information, they continue to struggle with fine-grained visual perception tasks that require detailed pixel-level analysis. Effectively eliciting comprehensive reasoning from VLMs on such intricate visual elements remains an open challenge. In this paper, we present VipAct, an agent framework that enhances VLMs by integrating multi-agent collaboration and vision expert models, enabling more precise visual understanding and comprehensive reasoning. VipAct consists of an orchestrator agent, which manages task requirement analysis, planning, and coordination, along with specialized agents that handle specific tasks such as image captioning and vision expert models that provide high-precision perceptual information. This multi-agent approach allows VLMs to better perform fine-grained visual perception tasks by synergizing planning, reasoning, and tool use. We evaluate VipAct on benchmarks featuring a diverse set of visual perception tasks, with experimental results demonstrating significant performance improvements over state-of-the-art baselines across all tasks. Furthermore, comprehensive ablation studies reveal the critical role of multi-agent collaboration in eliciting more detailed System-2 reasoning and highlight the importance of image input for task planning. Additionally, our error analysis identifies patterns of VLMs' inherent limitations in visual perception, providing insights into potential future improvements. VipAct offers a flexible and extensible framework, paving the way for more advanced visual perception systems across various real-world applications.
Published: 2024

2. Visual Prompting in Multimodal Large Language Models: A Survey

Author: Wu, Junda, Zhang, Zhehao, Xia, Yu, Li, Xintong, Xia, Zhaoyang, Chang, Aaron, Yu, Tong, Kim, Sungchul, Rossi, Ryan A., Zhang, Ruiyi, Mitra, Subrata, Metaxas, Dimitris N., Yao, Lina, Shang, Jingbo, and McAuley, Julian
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Multimodal large language models (MLLMs) equip pre-trained large-language models (LLMs) with visual capabilities. While textual prompting in LLMs has been widely studied, visual prompting has emerged for more fine-grained and free-form visual instructions. This paper presents the first comprehensive survey on visual prompting methods in MLLMs, focusing on visual prompting, prompt generation, compositional reasoning, and prompt learning. We categorize existing visual prompts and discuss generative methods for automatic prompt annotations on the images. We also examine visual prompting methods that enable better alignment between visual encoders and backbone LLMs, concerning MLLM's visual grounding, object referring, and compositional reasoning abilities. In addition, we provide a summary of model training and in-context learning methods to improve MLLM's perception and understanding of visual prompts. This paper examines visual prompting methods developed in MLLMs and provides a vision of the future of these methods.
Comment: 10 pages
Published: 2024

3. DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph

Author: Zhang, Zhehao, Chen, Jiaao, and Yang, Diyi
Subjects: Computer Science - Computation and Language
Abstract: The current paradigm of evaluating Large Language Models (LLMs) through static benchmarks comes with significant limitations, such as vulnerability to data contamination and a lack of adaptability to the evolving capabilities of LLMs. Therefore, evaluation methods that can adapt and generate evaluation data with controlled complexity are urgently needed. In this work, we introduce Dynamic Evaluation of LLMs via Adaptive Reasoning Graph Evolvement (DARG) to dynamically extend current benchmarks with controlled complexity and diversity. Specifically, we first extract the reasoning graphs of data points in current benchmarks and then perturb the reasoning graphs to generate novel testing data. Such newly generated test samples can have different levels of complexity while maintaining linguistic diversity similar to the original benchmarks. We further use a code-augmented LLM to ensure the label correctness of newly generated data. We apply our DARG framework to diverse reasoning tasks in four domains with 15 state-of-the-art LLMs. Experimental results show that almost all LLMs experience a performance decrease with increased complexity and certain LLMs exhibit significant drops. Additionally, we find that LLMs exhibit more biases when being evaluated via the data generated by DARG with higher complexity levels. These observations provide useful insights into how to dynamically and adaptively evaluate LLMs. The code is available at https://github.com/SALT-NLP/DARG.
Published: 2024

4. Bounds on the Distribution of a Sum of Two Random Variables: Revisiting a problem of Kolmogorov with application to Individual Treatment Effects

Author: Zhang, Zhehao and Richardson, Thomas S.
Subjects: Mathematics - Statistics Theory, Economics - Econometrics, Mathematics - Probability
Abstract: We revisit the following problem, proposed by Kolmogorov: given prescribed marginal distributions $F$ and $G$ for random variables $X,Y$ respectively, characterize the set of compatible distribution functions for the sum $Z=X+Y$. Bounds on the distribution function for $Z$ were given by Makarov (1982), and Frank et al. (1987), the latter using copula theory. However, though they obtain the same bounds, they make different assertions concerning their sharpness. In addition, their solutions leave some open problems in the case when the given marginal distribution functions are discontinuous. These issues have led to some confusion and erroneous statements in subsequent literature, which we correct. Kolmogorov's problem is closely related to inferring possible distributions for individual treatment effects $Y_1 - Y_0$ given the marginal distributions of $Y_1$ and $Y_0$; the latter being identified from a randomized experiment. We use our new insights to sharpen and correct results due to Fan and Park (2010) concerning individual treatment effects, and to fill some other logical gaps.
Published: 2024

5. Quantum Communication and Mixed-State Order in Decohered Symmetry-Protected Topological States

Author: Zhang, Zhehao, Agrawal, Utkarsh, and Vijay, Sagar
Subjects: Quantum Physics, Condensed Matter - Strongly Correlated Electrons
Abstract: Certain pure-state symmetry-protected topological orders (SPT) can be used as a resource for transmitting quantum information. Here, we investigate the ability to transmit quantum information using decohered SPT states, and relate this property to the "strange correlation functions" which diagnose quantum many-body orders in these mixed-states. This perspective leads to the identification of a class of quantum channels -- termed symmetry-decoupling channels -- which do not necessarily preserve any weak or strong symmetries of the SPT state, but nevertheless protect quantum many-body order in the decohered mixed-state. We quantify the ability to transmit quantum information in decohered SPT states through the coherent quantum information, whose behavior is generally related to a decoding problem, whereby local measurements in the system are used to attempt to "learn" the symmetry charge of the SPT state before decoherence.
Comment: 27 pages, 8 figures
Published: 2024

6. A quality assurance framework for real-time monitoring of deep learning segmentation models in radiotherapy

Author: Jin, Xiyao, Hao, Yao, Hilliard, Jessica, Zhang, Zhehao, Thomas, Maria A., Li, Hua, Jha, Abhinav K., and Hugo, Geoffrey D.
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Physics - Medical Physics
Abstract: To safely deploy deep learning models in the clinic, a quality assurance framework is needed for routine or continuous monitoring of input-domain shift and the models' performance without ground truth contours. In this work, cardiac substructure segmentation was used as an example task to establish a QA framework. A benchmark dataset consisting of Computed Tomography (CT) images along with manual cardiac delineations of 241 patients was collected, including one 'common' image domain and five 'uncommon' domains. Segmentation models were tested on the benchmark dataset for an initial evaluation of model capacity and limitations. An image domain shift detector was developed by utilizing a trained denoising autoencoder (DAE) and two hand-engineered features. Another Variational Autoencoder (VAE) was also trained to estimate the shape quality of the auto-segmentation results. Using the extracted features from the image/segmentation pair as inputs, a regression model was trained to predict the per-patient segmentation accuracy, measured by Dice similarity coefficient (DSC). The framework was tested across 19 segmentation models to evaluate the generalizability of the entire framework. In the results, the predicted DSC of regression models achieved a mean absolute error (MAE) ranging from 0.036 to 0.046 with an averaged MAE of 0.041. When tested on the benchmark dataset, the performances of all segmentation models were not significantly affected by scanning parameters: FOV, slice thickness and reconstruction kernels. For input images with Poisson noise, CNN-based segmentation models demonstrated a decreased DSC ranging from 0.07 to 0.41, while the transformer-based model was not significantly affected.
Published: 2023

7. Can Large Language Models Transform Computational Social Science?

Author: Ziems, Caleb, Held, William, Shaikh, Omar, Chen, Jiaao, Zhang, Zhehao, and Yang, Diyi
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Large Language Models (LLMs) are capable of successfully performing many language processing tasks zero-shot (without training data). If zero-shot LLMs can also reliably classify and explain social phenomena like persuasiveness and political ideology, then LLMs could augment the Computational Social Science (CSS) pipeline in important ways. This work provides a road map for using LLMs as CSS tools. Towards this end, we contribute a set of prompting best practices and an extensive evaluation pipeline to measure the zero-shot performance of 13 language models on 25 representative English CSS benchmarks. On taxonomic labeling tasks (classification), LLMs fail to outperform the best fine-tuned models but still achieve fair levels of agreement with humans. On free-form coding tasks (generation), LLMs produce explanations that often exceed the quality of crowdworkers' gold references. We conclude that the performance of today's LLMs can augment the CSS research pipeline in two ways: (1) serving as zero-shot data annotators on human annotation teams, and (2) bootstrapping challenging creative generation tasks (e.g., explaining the underlying attributes of a text). In summary, LLMs are poised to meaningfully participate in social science analysis in partnership with humans.
Comment: To appear in "Computational Linguistics" (CL)
Published: 2023

8. Mixed-state long-range order and criticality from measurement and feedback

Author: Lu, Tsung-Cheng, Zhang, Zhehao, Vijay, Sagar, and Hsieh, Timothy H.
Subjects: Condensed Matter - Strongly Correlated Electrons, Condensed Matter - Statistical Mechanics, Quantum Physics
Abstract: We propose a general framework for using local measurements, local unitaries, and non-local classical communication to construct quantum channels which can efficiently prepare mixed states with long-range quantum order or quantum criticality. As an illustration, symmetry-protected topological (SPT) phases can be universally converted into mixed-states with long-range entanglement, which can undergo phase transitions with quantum critical correlations of local operators and a logarithmic scaling of the entanglement negativity, despite coexisting with volume-law entropy. Within the same framework, we present two applications using fermion occupation number measurement to convert (i) spinful free fermions in one dimension into a quantum-critical mixed state with enhanced algebraic correlations between spins and (ii) Chern insulators into a mixed state with critical quantum correlations in the bulk. The latter is an example where mixed-state quantum criticality can emerge from a gapped state of matter in constant depth using local quantum operations and non-local classical communication.
Comment: 25 pages, 11 figures; updated to the published version
Published: 2023

9. The X-Cube Floquet Code

Author: Zhang, Zhehao, Aasen, David, and Vijay, Sagar
Subjects: Quantum Physics, Condensed Matter - Strongly Correlated Electrons
Abstract: Inspired by the coupled-layer construction of the X-Cube model, we introduce the X-Cube Floquet code, a dynamical quantum error-correcting code where the number of encoded logical qubits grows with system size. The X-Cube Floquet code is defined on a three-dimensional lattice, built from intersecting two-dimensional layers in the $xy$, $yz$, and $xz$ directions, and consists of a periodic sequence of two-qubit measurements which couple the layers together. Within a single Floquet cycle, the codespace switches between that of the X-Cube fracton order and layers of entangled, two-dimensional toric codes. The encoded logical qubits' dynamics are analyzed, and we argue that the new code has a non-zero error threshold. We provide a new Hamiltonian realization of the X-Cube model and, more generally, explore the phase diagram related to the sequence of measurements that define the X-Cube Floquet code.
Comment: Main Text (6 pages, 5 figures), Appendices (4 pages, 5 figures)
Published: 2022
