ML + Vision Top-6 Agent Survey - ICLR 2024 - Page 5 of 5

Venue: International Conference on Learning Representations
Year: 2024
Page: 5 / 5
Papers: 121-124 / 124

Papers

LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension

Authors: Amaia Cardiel, É. Zablocki, Elias Ramzi, Oriane Siméoni, Matthieu Cord
Year: 2024
Venue: International Conference on Learning Representations
DOI: Not stated.
Citations: 4
Relevance: 3 / 5
Why selected: Heuristic keyword/alias matches: vision-language models (matched: vlm, vision language models, vlms).
Code: https://github.com/valeoai/LLM_wrapper (stated in the abstract).
Extraction: method/data pending

Abstract

Vision Language Models (VLMs) have demonstrated remarkable capabilities in various open-vocabulary tasks, yet their zero-shot performance lags behind task-specific fine-tuned models, particularly in complex tasks like Referring Expression Comprehension (REC). Fine-tuning usually requires 'white-box' access to the model's architecture and weights, which is not always feasible due to proprietary or privacy concerns. In this work, we propose LLM-wrapper, a method for 'black-box' adaptation of VLMs for the REC task using Large Language Models (LLMs). LLM-wrapper capitalizes on the reasoning abilities of LLMs, improved with a light fine-tuning, to select the most relevant bounding box matching the referring expression, from candidates generated by a zero-shot black-box VLM. Our approach offers several advantages: it enables the adaptation of closed-source models without needing access to their internal workings, it is versatile as it works with any VLM, it transfers to new VLMs and datasets, and it allows for the adaptation of an ensemble of VLMs. We evaluate LLM-wrapper on multiple datasets using different VLMs and LLMs, demonstrating significant performance improvements and highlighting the versatility of our method. While LLM-wrapper is not meant to directly compete with standard white-box fine-tuning, it offers a practical and effective alternative for black-box VLM adaptation. Code and checkpoints are available at https://github.com/valeoai/LLM_wrapper .
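The core interface of LLM-wrapper, as the abstract describes it, is text: a black-box VLM proposes candidate boxes, and an LLM picks the one that matches the referring expression. The sketch below shows only that plumbing, under stated assumptions. `Candidate`, `build_prompt` and `parse_choice` are hypothetical names, the prompt wording is invented for illustration (the paper's actual prompt format and its light LLM fine-tuning are not reproduced here), and the fallback to the highest-scoring box when the reply cannot be parsed is our own design choice.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    """One bounding box proposed by a black-box VLM (hypothetical structure)."""
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    label: str    # phrase the VLM associated with the box
    score: float  # VLM confidence for this box


def build_prompt(expression: str, candidates: list[Candidate]) -> str:
    """Serialize the VLM's candidates as plain text so an LLM can reason over them."""
    lines = [f"Referring expression: {expression}", "Candidate boxes:"]
    for i, c in enumerate(candidates):
        x1, y1, x2, y2 = c.box
        lines.append(
            f"{i}: label={c.label!r} score={c.score:.2f} "
            f"box=({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})"
        )
    lines.append("Answer with the index of the box that best matches the expression.")
    return "\n".join(lines)


def parse_choice(llm_answer: str, candidates: list[Candidate]) -> int:
    """Take the first integer in the LLM reply as the chosen box index.

    Falls back to the VLM's highest-scoring box if the reply contains no
    integer or the integer is out of range (assumed policy, not from the paper).
    """
    fallback = max(range(len(candidates)), key=lambda i: candidates[i].score)
    match = re.search(r"\d+", llm_answer)
    if match is None:
        return fallback
    idx = int(match.group())
    return idx if 0 <= idx < len(candidates) else fallback


# Usage: the LLM call itself is left out; any chat/completion API could fill it in.
cands = [
    Candidate((10, 20, 110, 220), "person", 0.91),
    Candidate((300, 40, 380, 260), "person", 0.88),
]
prompt = build_prompt("the man on the right", cands)
chosen = parse_choice("Box 1", cands)  # pretend this string came back from the LLM
print(prompt)
print("chosen box:", cands[chosen].box)
```

Because the VLM is only ever touched through its textual output, the same wrapper works unchanged for any VLM, or for the pooled candidates of several VLMs, which is consistent with the versatility and ensembling claims in the abstract.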

Claim

Vision Language Models (VLMs) have demonstrated remarkable capabilities in various open-vocabulary tasks, yet their zero-shot performance lags behind task-specific fine-tuned models, particularly in complex tasks like Referring Expression Comprehension (REC).

OCCAM: Towards Cost-Efficient and Accuracy-Aware Image Classification Inference

Authors: Dujian Ding, Bicheng Xu, L. Lakshmanan
Year: 2024
Venue: International Conference on Learning Representations
DOI: 10.48550/arXiv.2406.04508
Citations: 3
Relevance: 3 / 5
Why selected: Heuristic keyword/alias matches: program synthesis, alpha factor search (matched: programming, portfolio).
Code: Not found.
Extraction: method/data pending

Abstract

Classification tasks play a fundamental role in various applications, spanning domains such as healthcare, natural language processing and computer vision. With the growing popularity and capacity of machine learning models, people can easily access trained classifiers as a service online or offline. However, model use comes with a cost and classifiers of higher capacity (such as large foundation models) usually incur higher inference costs. To harness the respective strengths of different classifiers, we propose a principled approach, OCCAM, to compute the best classifier assignment strategy over classification queries (termed as the optimal model portfolio) so that the aggregated accuracy is maximized, under user-specified cost budgets. Our approach uses an unbiased and low-variance accuracy estimator and effectively computes the optimal solution by solving an integer linear programming problem. On a variety of real-world datasets, OCCAM achieves 40% cost reduction with little to no accuracy drop.
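The optimization OCCAM solves can be read as a 0/1 assignment problem: each query goes to exactly one classifier, the summed per-query cost must stay within the budget, and the summed estimated accuracy is maximized. The sketch below assumes the accuracy estimates are already given (the paper's unbiased, low-variance estimator is not implemented) and that each classifier has a fixed per-query cost. It replaces the paper's integer linear programming solver with exhaustive enumeration, which returns the same optimum but scales as (number of classifiers)^(number of queries), so it is only usable on toy inputs. `optimal_portfolio` is a hypothetical name.

```python
from itertools import product


def optimal_portfolio(acc, costs, budget):
    """Exhaustively find the budget-feasible assignment with maximal total accuracy.

    acc[i][j]: estimated accuracy of classifier j on query i (assumed given)
    costs[j]:  per-query inference cost of classifier j (assumed fixed)
    budget:    maximum total cost over all queries
    Returns (total_accuracy, assignment) with assignment[i] = classifier for
    query i, or None if no assignment fits the budget.
    """
    n_models = len(costs)
    best = None
    # Enumerate every way to give each query exactly one classifier.
    for assignment in product(range(n_models), repeat=len(acc)):
        total_cost = sum(costs[j] for j in assignment)
        if total_cost > budget:
            continue  # violates the user-specified cost budget
        total_acc = sum(acc[i][j] for i, j in enumerate(assignment))
        if best is None or total_acc > best[0]:
            best = (total_acc, assignment)
    return best


# Toy example: classifier 0 is cheap (cost 1), classifier 1 is large (cost 5).
acc = [
    [0.90, 0.95],  # query 0: easy, the cheap model is nearly as good
    [0.50, 0.90],  # query 1: hard, the large model helps a lot
    [0.60, 0.92],  # query 2: hard, the large model helps somewhat less
]
costs = [1, 5]
total, assignment = optimal_portfolio(acc, costs, budget=9)
print(round(total, 2), assignment)
```

With a budget of 9, only one query can afford the large model, and the optimum spends it on query 1, where the accuracy gain (0.40) is largest. On realistic sizes the same objective and constraints would be handed to an ILP solver (for example `scipy.optimize.milp`), as the abstract indicates.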

Claim

Classification tasks play a fundamental role in various applications, spanning domains such as healthcare, natural language processing and computer vision.

L2MAC: Large Language Model Automatic Computer for Extensive Code Generation

Authors: Samuel Holt, Max Ruiz Luyten, M. Schaar
Year: 2024
Venue: International Conference on Learning Representations
DOI: Not stated.
Citations: 0
Relevance: 3 / 5
Why selected: Heuristic keyword/alias matches: program synthesis (matched: code generation).
Code: Not found.
Extraction: method/data pending

Abstract

Not stated in metadata.

Claim

Not stated in abstract.

Large Language Models as Automated Aligners for benchmarking Vision-Language Models

Authors: Yuanfeng Ji, Chongjian Ge, Weikai Kong, Enze Xie, Zhengying Liu, Zhenguo Li, Ping Luo
Year: 2024
Venue: International Conference on Learning Representations
DOI: Not stated.
Citations: 0
Relevance: 3 / 5
Why selected: Heuristic keyword/alias matches: vision-language models (matched: vision language models).
Code: Not found.
Extraction: method/data pending

Abstract

Not stated in metadata.

Claim

Not stated in abstract.