Hello! I am Baiqi Li, a research assistant at Carnegie Mellon University, advised by Professor Deva Ramanan. I graduated in June 2024 with a master's degree from the School of Computer Science at East China Normal University. My research focuses on computer vision and language, particularly on evaluation metrics and benchmarks for the cognitive abilities of VLMs and on improving multimodal generative models.
I am looking for a PhD/RA opportunity starting in 2025. If you are interested in my profile, please feel free to contact me.
News
- 2024.09: Our paper NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples was accepted by NeurIPS 2024.
- 2024.08: Our VQAScore was rated by Imagen as the strongest text-to-vision evaluation metric, and our benchmark, GenAI-Bench, was also used extensively by Imagen.
- 2024.06: Our workshop paper GenAI-Bench: A Holistic Benchmark for Compositional Text-to-Visual Generation was selected as the best paper at the SynData4CV workshop @ CVPR 2024.
- 2024.06: We introduced GenAI-Bench for evaluating leading image and video generation models, as well as evaluation metrics, on various aspects of compositional text-to-visual generation: GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation.
- 2024.06: We proposed a semi-automated approach to collect a vision-centric benchmark, NaturalBench, for reliably evaluating VLMs: NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples.
- 2024.04: We introduced VQAScore for evaluating the prompt alignment of text-to-image/video/3D models: Evaluating Text-to-Visual Generation with Image-to-Text Generation.
- 2024: Federated Learning Vulnerabilities: Privacy Attacks with Denoising Diffusion Probabilistic Models, Hongyan Gu, Xinyan Zhang, Jiang Li, Hui Wei, Baiqi Li, Xinli Huang, accepted by WWW'24.
Publications

NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
Baiqi Li*, Zhiqiu Lin*, Wenxuan Peng*, Jean de Dieu Nyandwi*, Daniel Jiang, Zixian Ma, Simran Khanuja, Ranjay Krishna†, Graham Neubig†, Deva Ramanan†
Website | Arxiv | HuggingFace
- In this work, we show that VLMs still struggle with natural images and questions that humans can easily answer, which we term natural adversarial samples.
- We propose a semi-automated approach to collect a new benchmark, NaturalBench, for reliably evaluating VLMs with over 10,000 human-verified VQA samples.
- We evaluate 53 vision-language models on NaturalBench, including both open-source and closed-source models such as GPT-4o, Qwen2-VL, Molmo, and InternVL.
- Mitigating biases, such as the tendency of GPT-4o to agree with most questions due to lack of calibration, can yield a 100% improvement in model performance.

Evaluating Text-to-Visual Generation with Image-to-Text Generation
Zhiqiu Lin, Deepak Pathak, Baiqi Li, Emily Li, Xide Xia, Graham Neubig, Pengchuan Zhang†, Deva Ramanan†
- We propose VQAScore, the state-of-the-art alignment metric for text-to-image/video/3D models.
- VQAScore based on our new CLIP-FlanT5 model outperforms previous metrics based on GPT-4Vision or costly human feedback.

GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation
Baiqi Li*, Zhiqiu Lin*, Deepak Pathak, Emily Li, Feiyi Xin, Kewen Wu, Tiffany Ling, Xide Xia†, Pengchuan Zhang†, Graham Neubig†, Deva Ramanan†
Website | Arxiv | HuggingFace
- We conduct an extensive human study on compositional text-to-visual generation using GenAI-Bench, revealing limitations of leading open-source and closed-source models.
- We present a simple black-box approach that improves generation by ranking images with VQAScore, significantly surpassing other scoring methods by 2x to 3x.
- We will release GenAI-Rank with over 40,000 human ratings to benchmark methods that rank images generated from the same prompt.
- 2024: Federated Learning Vulnerabilities: Privacy Attacks with Denoising Diffusion Probabilistic Models, Hongyan Gu, Xinyan Zhang, Jiang Li, Hui Wei, Baiqi Li, Xinli Huang, accepted by WWW'24.
- 2024: Federated Learning on Distributed Graphs Considering Multiple Heterogeneities, Baiqi Li, Yedi Ma, Yufei Liu, Hongyan Gu, Xinli Huang, accepted by ICASSP'24.
Honors and Services
- Reviewer: ICLR, ECAI, TIST …
Research Experience
- 2023.08 - present, Research Assistant, Carnegie Mellon University.
- 2021.09 - 2024.06, Master's student, East China Normal University.
Invited Talks
- 2024.05: I presented the GenAI-Bench and NaturalBench benchmarks at CMU.