VisFactor Leaderboard

Jen-Tse Huang1, Dasen Dai1, Jen-Yuan Huang2, Youliang Yuan3, Xiaoyuan Liu3, Wenxuan Wang4*, Wenxiang Jiao5, Pinjia He3, Zhaopeng Tu5, Haodong Duan6*
1 The Chinese University of Hong Kong
2 Peking University
3 The Chinese University of Hong Kong, Shenzhen
4 Renmin University of China
5 Tencent
6 Shanghai AI Laboratory
* Corresponding authors
[Figure: VisFactor Overview]

Results

VisFactor digitizes 20 vision-centric subtests from the Factor-Referenced Cognitive Test (FRCT) battery. Select a domain to filter the leaderboard, or view overall results across all subtests. Click any column header to sort.

Framework

Each FRCT subtest is digitized into a unified vision-to-text format while preserving its intended cognitive factor.

[Figure: VisFactor Framework]

Comparison with Existing Benchmarks

The following table compares VisFactor with prior vision-centric evaluation benchmarks, highlighting psychological grounding, automatic generation, difficulty control, rigorous measurement, image type, and task coverage. Click any column header to sort.

Table columns: Benchmarks, #T, #Q, P, G, D, M, I, PR, BM, MM, RT, MZ, PZ.

- #T: number of tasks
- #Q: number of queries
- P: psychological grounding
- G: generation of new tests
- D: different difficulties
- M: rigorous measurement
- I: natural (N) or synthetic (S) images
- PR: pattern recognition
- BM: Bongard/matrix reasoning
- MM: memory
- RT: rotation
- MZ: maze
- PZ: puzzle

BibTeX

If you find our paper and tool useful, please cite us using:

@article{huang2025visfactor,
  title={Human Cognitive Benchmarks Reveal Foundational Visual Gaps in MLLMs},
  author={Huang, Jen-Tse and Dai, Dasen and Huang, Jen-Yuan and Yuan, Youliang and Liu, Xiaoyuan and Wang, Wenxuan and Jiao, Wenxiang and He, Pinjia and Tu, Zhaopeng and Duan, Haodong},
  journal={arXiv preprint arXiv:2502.16435},
  year={2025}
}

More Leaderboards

Explore more excellent benchmarks and leaderboards from ARISE Lab: