ML + Vision Top-6 Agent Survey (2023-2026)

  • Generated: 2026-06-03T12:22:27Z
  • Venues: Neural Information Processing Systems, International Conference on Machine Learning, International Conference on Learning Representations, Computer Vision and Pattern Recognition, IEEE International Conference on Computer Vision, European Conference on Computer Vision
  • Years: 2023-2026
  • Topic keywords: autonomous research agents, LLM agents, multimodal agents, vision-language agents, vision-language models, embodied agents, computer-use agents, program synthesis, alpha factor search
  • Selection threshold: relevance_score >= 3
  • Scoring provider/model: heuristic/heuristic-v1
  • Cache: /Users/paulweng/Documents/Codex/alpha-search-frontier-notes/.journal-survey-cache

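Note: This is a Semantic Scholar snapshot generated on 2026-06-03. Citation counts are cumulative as of fetch time, not year-end counts. The 2026 counts are incomplete (zero for every venue in this snapshot) because several 2026 conferences were not fully published or indexed at snapshot time. The Year value may reflect the year recorded in the index rather than the conference edition; for example, the CogAgent row lists 2023 although its DOI points to the CVPR 2024 proceedings. Scoring uses heuristic keyword/alias matching for fast triage, so selected papers should be treated as candidates for deeper review.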
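To make the triage step concrete, here is a minimal sketch of keyword-based scoring with a selection threshold. It is not the actual heuristic-v1 scorer, whose rules are not described on this page. The function names `relevance_score` and `is_selected`, the "one point per matched keyword, capped at 5" rule, and the sample titles are all assumptions for illustration. Only the keyword list and the threshold of 3 come from the header above.

```python
# Topic keywords copied from the header of this page.
TOPIC_KEYWORDS = [
    "autonomous research agents", "LLM agents", "multimodal agents",
    "vision-language agents", "vision-language models", "embodied agents",
    "computer-use agents", "program synthesis", "alpha factor search",
]
THRESHOLD = 3  # "Selection threshold: relevance_score >= 3"


def relevance_score(text, keywords=TOPIC_KEYWORDS, max_score=5):
    """Hypothetical rule: one point per distinct keyword found, capped."""
    lowered = text.lower()
    hits = sum(1 for kw in keywords if kw.lower() in lowered)
    return min(hits, max_score)


def is_selected(text, threshold=THRESHOLD):
    """Apply the selection threshold to the hypothetical score."""
    return relevance_score(text) >= threshold


# Made-up example texts, not entries from the survey.
on_topic = "LLM agents and multimodal agents for program synthesis"
off_topic = "Image classification with convolutional networks"
print(relevance_score(on_topic), is_selected(on_topic))
print(relevance_score(off_topic), is_selected(off_topic))
```

Plain substring matching like this is fast but blunt, which is consistent with treating the selected papers as candidates and not as a vetted list.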

Summary

  • Fetched papers: 36384
  • Selected papers: 1277

Papers Per Venue Per Year

| Venue | 2023 | 2024 | 2025 | 2026 | Total |
| --- | --- | --- | --- | --- | --- |
| Neural Information Processing Systems | 40 | 167 | 0 | 0 | 207 |
| International Conference on Machine Learning | 35 | 75 | 98 | 0 | 208 |
| International Conference on Learning Representations | 45 | 124 | 74 | 0 | 243 |
| Computer Vision and Pattern Recognition | 53 | 143 | 124 | 0 | 320 |
| IEEE International Conference on Computer Vision | 19 | 50 | 153 | 0 | 222 |
| European Conference on Computer Vision | 16 | 61 | 0 | 0 | 77 |
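The per-venue totals in the table above should add up to the "Selected papers" figure in the Summary. The short check below recomputes the row totals, the per-year totals, and the grand total from the table values. The variable names are illustrative and not part of the survey tooling.

```python
# Selected-paper counts per venue for 2023, 2024, 2025, 2026 (from the table).
counts = {
    "Neural Information Processing Systems": [40, 167, 0, 0],
    "International Conference on Machine Learning": [35, 75, 98, 0],
    "International Conference on Learning Representations": [45, 124, 74, 0],
    "Computer Vision and Pattern Recognition": [53, 143, 124, 0],
    "IEEE International Conference on Computer Vision": [19, 50, 153, 0],
    "European Conference on Computer Vision": [16, 61, 0, 0],
}

# Sum across years for each venue (the "Total" column).
row_totals = {venue: sum(years) for venue, years in counts.items()}

# Sum across venues for each year (not shown in the table).
year_totals = [sum(column) for column in zip(*counts.values())]

# Grand total, which should match "Selected papers: 1277".
grand_total = sum(row_totals.values())

print(row_totals)
print(year_totals)
print(grand_total)
```

The per-year totals show that 2024 contributes the largest share of selected papers in this snapshot.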

Top 10 By Citation Count

| Rank | Title | Venue | Year | Citations | Score | DOI |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Reflexion: language agents with verbal reinforcement learning | Neural Information Processing Systems | 2023 | 3719 | 3 | 10.52202/075280-0377 |
| 2 | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | Neural Information Processing Systems | 2023 | 3492 | 3 | 10.48550/arXiv.2305.06500 |
| 3 | MMBench: Is Your Multi-modal Model an All-around Player? | European Conference on Computer Vision | 2023 | 2135 | 3 | 10.48550/arXiv.2307.06281 |
| 4 | Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation | Neural Information Processing Systems | 2023 | 1786 | 3 | 10.52202/075280-0943 |
| 5 | Kosmos-2: Grounding Multimodal Large Language Models to the World | International Conference on Learning Representations | 2023 | 1195 | 3 | 10.48550/arXiv.2306.14824 |
| 6 | SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering | Neural Information Processing Systems | 2024 | 1175 | 5 | 10.48550/arXiv.2405.15793 |
| 7 | Are We on the Right Way for Evaluating Large Vision-Language Models? | Neural Information Processing Systems | 2024 | 818 | 3 | 10.48550/arXiv.2403.20330 |
| 8 | SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities | Computer Vision and Pattern Recognition | 2024 | 815 | 3 | 10.1109/CVPR52733.2024.01370 |
| 9 | Language Is Not All You Need: Aligning Perception with Language Models | Neural Information Processing Systems | 2023 | 760 | 3 | 10.48550/arXiv.2302.14045 |
| 10 | OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments | Neural Information Processing Systems | 2024 | 757 | 5 | 10.48550/arXiv.2404.07972 |

Top 5 By Topic Match

| Rank | Title | Venue | Year | Citations | Score | DOI |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering | Neural Information Processing Systems | 2024 | 1175 | 5 | 10.48550/arXiv.2405.15793 |
| 2 | OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments | Neural Information Processing Systems | 2024 | 757 | 5 | 10.48550/arXiv.2404.07972 |
| 3 | CogAgent: A Visual Language Model for GUI Agents | Computer Vision and Pattern Recognition | 2023 | 749 | 5 | 10.1109/CVPR52733.2024.01354 |
| 4 | Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | International Conference on Machine Learning | 2023 | 493 | 5 | 10.48550/arXiv.2310.04406 |
| 5 | Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | International Conference on Learning Representations | 2024 | 344 | 5 | 10.48550/arXiv.2410.05243 |

Paginated Papers

The paper-level sections are split into static venue/year pages so the deployed MkDocs page stays readable and fast to load.

Total paper entries: 1277. Detail pages: 53. Target page size: 30 papers.

| Venue | Year | Papers | Detail pages | Start |
| --- | --- | --- | --- | --- |
| Neural Information Processing Systems | 2023 | 40 | 2 | NeurIPS 2023 |
| Neural Information Processing Systems | 2024 | 167 | 6 | NeurIPS 2024 |
| Neural Information Processing Systems | 2025 | 0 | 1 | NeurIPS 2025 |
| International Conference on Machine Learning | 2023 | 35 | 2 | ICML 2023 |
| International Conference on Machine Learning | 2024 | 75 | 3 | ICML 2024 |
| International Conference on Machine Learning | 2025 | 98 | 4 | ICML 2025 |
| International Conference on Learning Representations | 2023 | 45 | 2 | ICLR 2023 |
| International Conference on Learning Representations | 2024 | 124 | 5 | ICLR 2024 |
| International Conference on Learning Representations | 2025 | 74 | 3 | ICLR 2025 |
| Computer Vision and Pattern Recognition | 2023 | 53 | 2 | CVPR 2023 |
| Computer Vision and Pattern Recognition | 2024 | 143 | 5 | CVPR 2024 |
| Computer Vision and Pattern Recognition | 2025 | 124 | 5 | CVPR 2025 |
| IEEE International Conference on Computer Vision | 2023 | 19 | 1 | ICCV 2023 |
| IEEE International Conference on Computer Vision | 2024 | 50 | 2 | ICCV 2024 |
| IEEE International Conference on Computer Vision | 2025 | 153 | 6 | ICCV 2025 |
| European Conference on Computer Vision | 2023 | 16 | 1 | ECCV 2023 |
| European Conference on Computer Vision | 2024 | 61 | 3 | ECCV 2024 |
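The detail-page counts in the table above are consistent with ceiling division by the target page size of 30, with a minimum of one page so that an empty venue/year (NeurIPS 2025, with 0 papers) still gets a landing page. The generator's actual code is not shown on this page, so the following is a sketch inferred from the numbers. The function name `detail_pages` and the minimum-of-one rule are assumptions.

```python
import math

PAGE_SIZE = 30  # "Target page size: 30 papers"


def detail_pages(paper_count, page_size=PAGE_SIZE):
    """Pages needed for one venue/year.

    Assumption: an empty venue/year still gets one placeholder page.
    """
    return max(1, math.ceil(paper_count / page_size))


# Paper counts per venue/year, in the order the table lists them.
paper_counts = [40, 167, 0, 35, 75, 98, 45, 124, 74,
                53, 143, 124, 19, 50, 153, 16, 61]

pages = [detail_pages(n) for n in paper_counts]
print(pages)
print(sum(paper_counts), sum(pages))
```

Under these assumptions the sketch reproduces every per-row page count as well as the stated totals of 1277 paper entries and 53 detail pages.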