ML + Vision Top-6 Agent Survey - ICML 2023 - Page 1 of 2

  • Venue: International Conference on Machine Learning
  • Year: 2023
  • Page: 1 / 2
  • Papers: 1-30 / 35
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
  • Authors: Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, Yu-Xiong Wang
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2310.04406
  • Citations: 493
  • Relevance: 5 / 5
  • Why selected: Heuristic keyword/alias matches: LLM agents, program synthesis (matched: language agent, programming).
  • Code: https://github.com/lapisrocks/LanguageAgentTreeSearch
  • Extraction: method/data pending

Abstract

While language models (LMs) have shown potential across a range of decision-making tasks, their reliance on simple acting processes limits their broad deployment as autonomous agents. In this paper, we introduce Language Agent Tree Search (LATS) -- the first general framework that synergizes the capabilities of LMs in reasoning, acting, and planning. By leveraging the in-context learning ability of LMs, we integrate Monte Carlo Tree Search into LATS to enable LMs as agents, along with LM-powered value functions and self-reflections for proficient exploration and enhanced decision-making. A key feature of our approach is the incorporation of an environment for external feedback, which offers a more deliberate and adaptive problem-solving mechanism that surpasses the constraints of existing techniques. Our experimental evaluation across diverse domains, including programming, interactive question-answering (QA), web navigation, and math, validates the effectiveness and generality of LATS in decision-making while maintaining competitive or improved reasoning performance. Notably, LATS achieves state-of-the-art pass@1 accuracy (92.7%) for programming on HumanEval with GPT-4 and demonstrates gradient-free performance (average score of 75.9) comparable to gradient-based fine-tuning for web navigation on WebShop with GPT-3.5. Code can be found at https://github.com/lapisrocks/LanguageAgentTreeSearch
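
A minimal sketch of the Monte Carlo Tree Search loop the abstract describes (selection, expansion, evaluation, backpropagation), assuming the standard UCT selection rule. `propose` and `evaluate` are hypothetical stand-ins for LM-generated actions and the LM-powered value function. Self-reflection and environment feedback are omitted, and LATS's exact scoring may differ.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    state: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def uct(child: Node, c: float = 1.0) -> float:
    """Upper confidence bound for trees: exploit mean value, explore rarely visited children."""
    if child.visits == 0:
        return math.inf  # always try unvisited children first
    return child.mean_value() + c * math.sqrt(math.log(child.parent.visits) / child.visits)


def search(root: Node, propose: Callable[[str], List[str]],
           evaluate: Callable[[str], float], iterations: int = 50) -> Node:
    for _ in range(iterations):
        # 1. Selection: descend by UCT until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=uct)
        # 2. Expansion: grow a leaf that was already evaluated (or the root).
        if node.visits > 0 or node.parent is None:
            node.children = [Node(node.state + a, parent=node) for a in propose(node.state)]
            if node.children:
                node = node.children[0]
        # 3. Evaluation: score the state (an LM value function in LATS).
        value = evaluate(node.state)
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    return root


# Hypothetical toy task: build a 3-character string; "a" is good, "b" is bad.
def toy_propose(state: str) -> List[str]:
    return ["a", "b"] if len(state) < 3 else []


def toy_evaluate(state: str) -> float:
    return state.count("a") / max(1, len(state))


root = search(Node(""), toy_propose, toy_evaluate, iterations=50)
best = max(root.children, key=lambda n: n.visits)
print(best.state, best.visits)
```

In LATS the expensive steps are the LM calls inside `propose` and `evaluate`, so the visit counts double as a record of where the search spent its budget.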

Claim

While language models (LMs) have shown potential across a range of decision-making tasks, their reliance on simple acting processes limits their broad deployment as autonomous agents.

Distilling Internet-Scale Vision-Language Models into Embodied Agents
  • Authors: T. Sumers, Kenneth Marino, Arun Ahuja, R. Fergus, Ishita Dasgupta
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2301.12507
  • Citations: 36
  • Relevance: 5 / 5
  • Why selected: Heuristic keyword/alias matches: vision-language models, embodied agents (matched: vlm, vision language models, vlms, embodied agents).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

Instruction-following agents must ground language into their observation and action spaces. Learning to ground language is challenging, typically requiring domain-specific engineering or large quantities of human interaction data. To address this challenge, we propose using pretrained vision-language models (VLMs) to supervise embodied agents. We combine ideas from model distillation and hindsight experience replay (HER), using a VLM to retroactively generate language describing the agent's behavior. Simple prompting allows us to control the supervision signal, teaching an agent to interact with novel objects based on their names (e.g., planes) or their features (e.g., colors) in a 3D rendered environment. Few-shot prompting lets us teach abstract category membership, including pre-existing categories (food vs toys) and ad-hoc ones (arbitrary preferences over objects). Our work outlines a new and effective way to use internet-scale VLMs, repurposing the generic language grounding acquired by such models to teach task-relevant groundings to embodied agents.
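
The hindsight relabeling step can be sketched as follows, assuming each episode stores its observations and actions and that a captioning function stands in for the VLM. `describe_object` and `describe_color` are hypothetical stand-ins for two different prompts, showing how the prompt controls which grounding (object names vs. features) the agent is taught.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass
class Episode:
    instruction: str
    observations: List[str]
    actions: List[str]


def hindsight_relabel(episodes: List[Episode], describe: Callable[[str], str]) -> List[Episode]:
    """Replace each episode's instruction with a description of what the agent actually did."""
    return [replace(ep, instruction=describe(ep.observations[-1])) for ep in episodes]


# Hypothetical "VLM" outputs for a final frame encoded as "holding: <color> <object>".
def describe_object(final_obs: str) -> str:
    _color, obj = final_obs.split(":")[1].split()
    return f"lift the {obj}"


def describe_color(final_obs: str) -> str:
    color, _obj = final_obs.split(":")[1].split()
    return f"lift the {color} object"


episodes = [Episode("lift the plane", ["start", "holding: red ball"], ["move", "grab"])]
print(hindsight_relabel(episodes, describe_object)[0].instruction)  # lift the ball
print(hindsight_relabel(episodes, describe_color)[0].instruction)   # lift the red object
```

The failed attempt at "lift the plane" still becomes a correct training example for a different instruction, which is what makes the relabeled data useful for learning groundings.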

Claim

Instruction-following agents must ground language into their observation and action spaces.

LEVER: Learning to Verify Language-to-Code Generation with Execution
  • Authors: Ansong Ni, Srini Iyer, Dragomir R. Radev, Ves Stoyanov, Wen-tau Yih, Sida I. Wang, Xi Victoria Lin
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2302.08468
  • Citations: 332
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: programming, code generation).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

The advent of large language models trained on code (code LLMs) has led to significant progress in language-to-code generation. State-of-the-art approaches in this area combine LLM decoding with sample pruning and reranking using test cases or heuristics based on the execution results. However, it is challenging to obtain test cases for many real-world language-to-code applications, and heuristics cannot well capture the semantic features of the execution results, such as data type and value range, which often indicate the correctness of the program. In this work, we propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results. Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results. The sampled programs are reranked by combining the verification score with the LLM generation probability, and marginalizing over programs with the same execution results. On four datasets across the domains of table QA, math QA and basic Python programming, LEVER consistently improves over the base code LLMs (4.6% to 10.9% with code-davinci-002) and achieves new state-of-the-art results on all of them.
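
The reranking step can be sketched as below. The abstract says the verification score is combined with the LLM generation probability and that programs with the same execution result are marginalized; this sketch assumes the combination is a plain product and the marginalization a sum, which may differ from the paper's exact formulation. The programs, probabilities, and execution results are made-up illustrations.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (program, P_LM(program | x), P_verifier(correct | x, program, result), execution result)
Sample = Tuple[str, float, float, str]


def lever_rerank(samples: List[Sample]) -> Tuple[str, str, float]:
    totals: Dict[str, float] = defaultdict(float)
    best_in_group: Dict[str, Tuple[str, float]] = {}
    for program, p_lm, p_ver, result in samples:
        score = p_lm * p_ver     # combine generation and verification scores
        totals[result] += score  # marginalize over programs with identical results
        if result not in best_in_group or score > best_in_group[result][1]:
            best_in_group[result] = (program, score)
    winner = max(totals, key=totals.get)
    return best_in_group[winner][0], winner, totals[winner]


samples = [
    ("SELECT MAX(age) FROM t", 0.30, 0.2, "7"),
    ("SELECT age FROM t ORDER BY age LIMIT 1", 0.25, 0.9, "5"),
    ("SELECT MIN(age) FROM t", 0.20, 0.8, "5"),
    ("SELECT COUNT(*) FROM t", 0.25, 0.1, "3"),
]
program, result, score = lever_rerank(samples)
print(program, result, round(score, 3))
```

The single most likely program under the LM (`SELECT MAX(age) ...`) loses here because the verifier scores it low and no other sample agrees with its execution result.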

Claim

The advent of large language models trained on code (code LLMs) has led to significant progress in language-to-code generation.

Magicoder: Empowering Code Generation with OSS-Instruct
  • Authors: Yuxiang Wei, Zhe Wang, Jiawei Liu, Yifeng Ding, Lingming Zhang
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: Not stated.
  • Citations: 262
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: code generation).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

We introduce Magicoder, a series of fully open-source (code, weights, and data) Large Language Models (LLMs) for code that significantly closes the gap with top code models while having no more than 7B parameters. Magicoder models are trained on 75K synthetic instruction data using OSS-Instruct, a novel approach to enlightening LLMs with open-source code snippets to generate diverse instruction data for code. Our main motivation is to mitigate the inherent bias of the synthetic data generated by LLMs through the wealth of open-source references for the production of more realistic and controllable data. The orthogonality of OSS-Instruct and other data generation methods like Evol-Instruct further enables us to build an enhanced MagicoderS. Both Magicoder and MagicoderS substantially outperform state-of-the-art code models with similar or even larger sizes on a wide range of coding benchmarks. Notably, MagicoderS-CL-7B based on CodeLlama even surpasses the prominent ChatGPT on HumanEval+ (66.5 vs. 65.9 in pass@1). Overall, OSS-Instruct opens a new direction for crafting diverse synthetic instruction data for code using abundant open-source references.
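
The seed-sampling idea behind OSS-Instruct can be sketched like this: pick random open-source documents, cut a short random run of consecutive lines from each, and wrap it in a prompt asking an LLM to invent a problem and a solution. The prompt wording, the 15-line window, and the function names are assumptions for illustration, not the paper's exact pipeline; the LLM call itself is left out.

```python
import random
from typing import List

# Hypothetical prompt template; the real OSS-Instruct prompt is not given in the abstract.
TEMPLATE = (
    "Gain inspiration from the following code snippet to create a self-contained "
    "programming problem and a correct solution.\n\nCode snippet:\n{snippet}\n"
)


def random_snippet(source: str, rng: random.Random, max_lines: int = 15) -> str:
    """Return a random run of 1..max_lines consecutive lines from a source file."""
    lines = source.splitlines()
    length = rng.randint(1, min(max_lines, len(lines)))
    start = rng.randint(0, len(lines) - length)
    return "\n".join(lines[start:start + length])


def build_prompts(corpus: List[str], n: int, seed: int = 0) -> List[str]:
    """Sample n distinct documents and turn a snippet of each into a generation prompt."""
    rng = random.Random(seed)
    documents = rng.sample(corpus, k=min(n, len(corpus)))
    return [TEMPLATE.format(snippet=random_snippet(doc, rng)) for doc in documents]


corpus = [
    "def add(a, b):\n    return a + b\n",
    "import os\nprint(os.getcwd())\n",
    "class Stack:\n    def __init__(self):\n        self.items = []\n",
]
for prompt in build_prompts(corpus, n=2):
    print(prompt)
```

Grounding each prompt in a different real snippet is what pushes the generated problems toward diversity, rather than relying on the LLM's own (biased) sense of what a coding task looks like.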

Claim

We introduce Magicoder, a series of fully open-source (code, weights, and data) Large Language Models (LLMs) for code that significantly closes the gap with top code models while having no more than 7B parameters.

MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation
  • Authors: Qian Huang, Jian Vora, Percy Liang, J. Leskovec
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: Not stated.
  • Citations: 229
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: LLM agents (matched: language agents).
  • Code: https://github.com/snap-stanford/MLAgentBench
  • Extraction: method/data pending

Abstract

A central aspect of machine learning research is experimentation, the process of designing and running experiments, analyzing the results, and iterating towards some positive outcome (e.g., improving accuracy). Could agents driven by powerful language models perform machine learning experimentation effectively? To answer this question, we introduce MLAgentBench, a suite of 13 tasks ranging from improving model performance on CIFAR-10 to recent research problems like BabyLM. For each task, an agent can perform actions like reading/writing files, executing code, and inspecting outputs. We then construct an agent that can perform ML experimentation based on ReAct framework. We benchmark agents based on Claude v1.0, Claude v2.1, Claude v3 Opus, GPT-4, GPT-4-turbo, Gemini-Pro, and Mixtral and find that a Claude v3 Opus agent is the best in terms of success rate. It can build compelling ML models over many tasks in MLAgentBench with 37.5% average success rate. Our agents also display highly interpretable plans and actions. However, the success rates vary considerably; they span from 100% on well-established older datasets to as low as 0% on recent Kaggle challenges created potentially after the underlying LM was trained. Finally, we identify several key challenges for LM-based agents such as long-term planning and reducing hallucination. Our code is released at https://github.com/snap-stanford/MLAgentBench.
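
The interaction pattern described above (an agent alternating between reasoning and actions such as writing files, executing code, and inspecting output) can be sketched as a small ReAct-style loop. The action names, the scripted policy standing in for the LM, and the toy `train.py` are hypothetical; MLAgentBench's real action set and prompts are defined in its repository.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_action(workdir: Path, action: str, args: dict) -> str:
    """Execute one agent action inside the task directory and return the observation."""
    if action == "read_file":
        return (workdir / args["path"]).read_text()
    if action == "write_file":
        (workdir / args["path"]).write_text(args["content"])
        return f"wrote {args['path']}"
    if action == "execute":
        proc = subprocess.run([sys.executable, args["path"]], cwd=workdir,
                              capture_output=True, text=True, timeout=60)
        return proc.stdout + proc.stderr
    return f"unknown action: {action}"


def scripted_policy(history):
    """Hypothetical stand-in for the LM: returns (thought, action, args) or None to stop."""
    plan = [
        ("Write a baseline script", "write_file",
         {"path": "train.py", "content": "print('accuracy', 0.5)\n"}),
        ("Run it to measure the baseline", "execute", {"path": "train.py"}),
    ]
    return plan[len(history)] if len(history) < len(plan) else None


def react_loop(policy, workdir: Path, max_steps: int = 10):
    history = []
    for _ in range(max_steps):
        decision = policy(history)
        if decision is None:
            break
        thought, action, args = decision
        observation = run_action(workdir, action, args)
        history.append((thought, action, args, observation))
    return history


with tempfile.TemporaryDirectory() as d:
    hist = react_loop(scripted_policy, Path(d))
print(hist[-1][3].strip())  # accuracy 0.5
```

Feeding the full `history` back to the policy is the key design choice: it is how an LM-driven agent would condition its next step on earlier thoughts and observations.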

Claim

A central aspect of machine learning research is experimentation, the process of designing and running experiments, analyzing the results, and iterating towards some positive outcome (e.g., improving accuracy).

Image Hijacks: Adversarial Images can Control Generative Models at Runtime
  • Authors: Luke Bailey, Euan Ong, Stuart Russell, Scott Emmons
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2309.00236
  • Citations: 176
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: vision-language models (matched: vlm, vision language model, vlms).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

Are foundation models secure against malicious actors? In this work, we focus on the image input to a vision-language model (VLM). We discover image hijacks, adversarial images that control the behaviour of VLMs at inference time, and introduce the general Behaviour Matching algorithm for training image hijacks. From this, we derive the Prompt Matching method, allowing us to train hijacks matching the behaviour of an arbitrary user-defined text prompt (e.g. 'the Eiffel Tower is now located in Rome') using a generic, off-the-shelf dataset unrelated to our choice of prompt. We use Behaviour Matching to craft hijacks for four types of attack, forcing VLMs to generate outputs of the adversary's choice, leak information from their context window, override their safety training, and believe false statements. We study these attacks against LLaVA, a state-of-the-art VLM based on CLIP and LLaMA-2, and find that all attack types achieve a success rate of over 80%. Moreover, our attacks are automated and require only small image perturbations.
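
In its simplest form, the Behaviour Matching idea (optimize an image so the model produces an adversary-chosen output) is projected gradient descent on an image perturbation. The sketch below swaps the VLM for a toy linear softmax model over five "outputs" and assumes an L-infinity budget `eps`. The paper trains against LLaVA's actual output distribution over text, so this only illustrates the optimization pattern.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def pgd_hijack(W, x, target, eps=0.05, step=0.01, iters=100):
    """Projected gradient descent on -log p(target | x + delta) with ||delta||_inf <= eps."""
    delta = np.zeros_like(x)
    onehot = np.eye(W.shape[0])[target]
    for _ in range(iters):
        p = softmax(W @ (x + delta))
        grad = W.T @ (p - onehot)                 # d(-log p_target)/d(input) for a linear softmax model
        delta = delta - step * np.sign(grad)      # signed step towards the adversary's target
        delta = np.clip(delta, -eps, eps)         # project onto the L-infinity ball
        delta = np.clip(x + delta, 0.0, 1.0) - x  # keep pixel values in [0, 1]
    return delta


rng = np.random.default_rng(0)
W = rng.normal(size=(5, 64))    # toy "model": 5 possible outputs, 64-pixel image
x = rng.uniform(size=64)        # clean image
target = int(np.argmin(W @ x))  # the output the clean image is least likely to produce
delta = pgd_hijack(W, x, target)
print(softmax(W @ x)[target], softmax(W @ (x + delta))[target], np.abs(delta).max())
```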

Claim

Are foundation models secure against malicious actors? In this work, we focus on the image input to a vision-language model (VLM).

Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game
  • Authors: Zelai Xu, Chao Yu, Fei Fang, Yu Wang, Yi Wu
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2310.18940
  • Citations: 153
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: LLM agents (matched: language agents).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

Agents built with large language models (LLMs) have shown great potential across a wide range of domains. However, in complex decision-making tasks, pure LLM-based agents tend to exhibit intrinsic bias in their choice of actions, which is inherited from the model's training data and results in suboptimal performance. To develop strategic language agents, i.e., agents that generate flexible language actions and possess strong decision-making abilities, we propose a novel framework that powers LLM-based agents with reinforcement learning (RL). We consider Werewolf, a popular social deduction game, as a challenging testbed that emphasizes versatile communication and strategic gameplay. To mitigate the intrinsic bias in language actions, our agents use an LLM to perform deductive reasoning and generate a diverse set of action candidates. Then an RL policy trained to optimize the decision-making ability chooses an action from the candidates to play in the game. Extensive experiments show that our agents overcome the intrinsic bias and outperform existing LLM-based agents in the Werewolf game. We also conduct human-agent experiments and find that our agents achieve human-level performance and demonstrate strong strategic play.
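
The two-stage decision process (an LLM proposes candidate actions, a learned policy picks one) can be sketched with a linear softmax policy over candidate features trained by REINFORCE. The candidate texts, the one-hot features standing in for LLM-derived embeddings, the reward rule, and the choice of REINFORCE are all assumptions; the paper's RL algorithm and feature design may differ.

```python
import numpy as np


class CandidateSelector:
    """Linear softmax policy that chooses among LLM-proposed action candidates."""

    def __init__(self, dim: int, lr: float = 0.1, seed: int = 0):
        self.theta = np.zeros(dim)
        self.lr = lr
        self.rng = np.random.default_rng(seed)

    def probs(self, feats: np.ndarray) -> np.ndarray:
        logits = feats @ self.theta
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def act(self, feats: np.ndarray):
        p = self.probs(feats)
        return int(self.rng.choice(len(p), p=p)), p

    def update(self, feats: np.ndarray, idx: int, p: np.ndarray, reward: float) -> None:
        # REINFORCE: grad log pi(idx) = phi(idx) - E_pi[phi]
        self.theta += self.lr * reward * (feats[idx] - p @ feats)


# Hypothetical candidates an LLM might propose in one game state.
candidates = ["claim to be the seer", "stay silent", "vote for the inconsistent player"]
feats = np.eye(len(candidates))  # stand-in for candidate embeddings

policy = CandidateSelector(dim=3)
for _ in range(500):
    idx, p = policy.act(feats)
    reward = 1.0 if idx == 2 else 0.0  # toy reward: only the third action wins
    policy.update(feats, idx, p, reward)
print(dict(zip(candidates, policy.probs(feats).round(3))))
```

Because the LLM only generates the options, the learned policy can move probability away from whatever action the LLM would favor by default, which is the bias-correction role the abstract assigns to RL.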

Claim

Agents built with large language models (LLMs) have shown great potential across a wide range of domains.

Do Embodied Agents Dream of Pixelated Sheep?: Embodied Decision Making using Language Guided World Modelling
  • Authors: Kolby Nottingham, Prithviraj Ammanabrolu, Alane Suhr, Yejin Choi, Hannaneh Hajishirzi, Sameer Singh, Roy Fox
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2301.12050
  • Citations: 112
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: embodied agents (matched: embodied agents).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of the world. However, if initialized with knowledge of high-level subgoals and transitions between subgoals, RL agents could utilize this Abstract World Model (AWM) for planning and exploration. We propose using few-shot large language models (LLMs) to hypothesize an AWM, that will be verified through world experience, to improve sample efficiency of RL agents. Our DECKARD agent applies LLM-guided exploration to item crafting in Minecraft in two phases: (1) the Dream phase where the agent uses an LLM to decompose a task into a sequence of subgoals, the hypothesized AWM; and (2) the Wake phase where the agent learns a modular policy for each subgoal and verifies or corrects the hypothesized AWM. Our method of hypothesizing an AWM with LLMs and then verifying the AWM based on agent experience not only increases sample efficiency over contemporary methods by an order of magnitude but is also robust to and corrects errors in the LLM, successfully blending noisy internet-scale information from LLMs with knowledge grounded in environment dynamics.
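
The Dream/Wake structure can be sketched with a dependency graph: the hypothesized AWM maps each item to the items it requires, planning is a topological ordering of the subgoals needed for a goal, and the Wake phase overwrites an edge when experience contradicts it. The crafting graph (including the deliberately wrong `stick` entry) is a made-up example, per-subgoal policy learning is omitted, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

AWM = Dict[str, Set[str]]

# Dream phase (hypothetical LLM output): item -> prerequisite items.
awm: AWM = {
    "log": set(),
    "planks": {"log"},
    "stick": {"log"},  # wrong guess: sticks are actually crafted from planks
    "crafting_table": {"planks"},
    "wooden_pickaxe": {"planks", "stick", "crafting_table"},
}


def plan(awm: AWM, goal: str) -> List[str]:
    """Subgoals required for `goal`, ordered so prerequisites come first."""
    needed, stack = set(), [goal]
    while stack:
        item = stack.pop()
        if item not in needed:
            needed.add(item)
            stack.extend(awm.get(item, set()))
    graph = {item: awm.get(item, set()) & needed for item in needed}
    return list(TopologicalSorter(graph).static_order())


def verify(awm: AWM, item: str, observed: Set[str]) -> bool:
    """Wake phase: replace a hypothesized edge that experience contradicts."""
    if awm.get(item) != observed:
        awm[item] = set(observed)
        return True
    return False


print(plan(awm, "wooden_pickaxe"))
corrected = verify(awm, "stick", {"planks"})  # crafting showed sticks need planks
print(corrected, plan(awm, "wooden_pickaxe"))
```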

Claim

Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of the world.

Measuring The Impact Of Programming Language Distribution
  • Authors: Gabriel Orlanski, Kefan Xiao, Xavier García, Jeffrey Hui, Joshua Howland, J. Malmaud, Jacob Austin, Risha Singh, Michele Catasta
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2302.01973
  • Citations: 50
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: programming).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

Current benchmarks for evaluating neural code models focus on only a small subset of programming languages, excluding many popular languages such as Go or Rust. To ameliorate this issue, we present the BabelCode framework for execution-based evaluation of any benchmark in any language. BabelCode enables new investigations into the qualitative performance of models' memory, runtime, and individual test case results. Additionally, we present a new code translation dataset called Translating Python Programming Puzzles (TP3) from the Python Programming Puzzles (Schuster et al. 2021) benchmark that involves translating expert-level python functions to any language. With both BabelCode and the TP3 benchmark, we investigate if balancing the distributions of 14 languages in a training dataset improves a large language model's performance on low-resource languages. Training a model on a balanced corpus results in, on average, 12.34% higher pass@k across all tasks and languages compared to the baseline. We find that this strategy achieves 66.48% better pass@k on low-resource languages at the cost of only a 12.94% decrease to high-resource languages. In our three translation tasks, this strategy yields, on average, 30.77% better low-resource pass@k while having 19.58% worse high-resource pass@k.
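
The results above are reported in pass@k. Assuming BabelCode uses the widely used unbiased estimator introduced alongside HumanEval (draw n samples per problem, count the c that pass all tests), per-problem pass@k can be computed as below and then averaged over problems. The function names are ours, not BabelCode's.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


print(pass_at_k(10, 3, 1))
print(pass_at_k(10, 3, 5))
print(mean_pass_at_k([(10, 3), (10, 0)], k=1))
```

Working with exact binomial coefficients avoids the bias of simply checking whether any of the first k samples passes, and gives the same answer regardless of sample order.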

Claim

Current benchmarks for evaluating neural code models focus on only a small subset of programming languages, excluding many popular languages such as Go or Rust.

Revisiting the Role of Language Priors in Vision-Language Models
  • Authors: Zhiqiu Lin, Xinyue Chen, Deepak Pathak, Pengchuan Zhang, Deva Ramanan
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: Not stated.
  • Citations: 45
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: vision-language models (matched: vision language models, vlms).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

Vision-language models (VLMs) are impactful in part because they can be applied to a variety of visual understanding tasks in a zero-shot fashion, without any fine-tuning. We study generative VLMs that are trained for next-word generation given an image. We explore their zero-shot performance on the illustrative task of image-text retrieval across 8 popular vision-language benchmarks. Our first observation is that they can be repurposed for discriminative tasks (such as image-text retrieval) by simply computing the match score of generating a particular text string given an image. We call this probabilistic score the Visual Generative Pre-Training Score (VisualGPTScore). While the VisualGPTScore produces near-perfect accuracy on some retrieval benchmarks, it yields poor accuracy on others. We analyze this behavior through a probabilistic lens, pointing out that some benchmarks inadvertently capture unnatural language distributions by creating adversarial but unlikely text captions. In fact, we demonstrate that even a "blind" language model that ignores any image evidence can sometimes outperform all prior art, reminiscent of similar challenges faced by the visual-question answering (VQA) community many years ago. We derive a probabilistic post-processing scheme that controls for the amount of linguistic bias in generative VLMs at test time without having to retrain or fine-tune the model. We show that the VisualGPTScore, when appropriately debiased, is a strong zero-shot baseline for vision-language understanding, oftentimes producing state-of-the-art accuracy.
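
To our understanding, the debiasing scheme divides the generative score P(t|i) by the text prior raised to a tunable power, P(t)^alpha with alpha between 0 and 1, so in log space it becomes a subtraction. The sketch below shows only that arithmetic with made-up log-likelihoods; how P(t) is estimated and how alpha is chosen are outside the sketch and should be checked against the paper.

```python
import numpy as np


def debiased_score(log_p_text_given_image, log_p_text, alpha: float) -> np.ndarray:
    """log[P(t|i) / P(t)^alpha]: alpha=0 is the raw generative score, alpha=1 removes the full text prior."""
    return np.asarray(log_p_text_given_image) - alpha * np.asarray(log_p_text)


captions = ["a photo of a dog", "a corgi catching a red frisbee"]
log_p_t_given_i = [-8.0, -9.0]  # hypothetical generative-VLM log-likelihoods
log_p_t = [-5.0, -15.0]         # hypothetical "blind" language-prior log-likelihoods

for alpha in (0.0, 1.0):
    scores = debiased_score(log_p_t_given_i, log_p_t, alpha)
    print(alpha, captions[int(np.argmax(scores))])
```

With alpha = 0 the generic, fluent caption wins on likelihood alone; with alpha = 1 the specific caption wins because the image raises its probability far above its prior.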

Claim

Vision-language models (VLMs) are impactful in part because they can be applied to a variety of visual understanding tasks in a zero-shot fashion, without any fine-tuning.

Emergent Agentic Transformer from Chain of Hindsight Experience
  • Authors: Hao Liu, P. Abbeel
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2305.16554
  • Citations: 38
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: LLM agents (matched: agentic).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

Large transformer models powered by diverse data and model scale have dominated natural language modeling and computer vision and pushed the frontier of multiple AI areas. In reinforcement learning (RL), despite many efforts into transformer-based policies, a key limitation, however, is that current transformer-based policies cannot learn by directly combining information from multiple sub-optimal trials. In this work, we address this issue using recently proposed chain of hindsight to relabel experience, where we train a transformer on a sequence of trajectory experience ascending sorted according to their total rewards. Our method consists of relabelling target return of each trajectory to the maximum total reward among the sequence of trajectories and training an autoregressive model to predict actions conditioning on past states, actions, rewards, target returns, and task completion tokens, the resulting model, Agentic Transformer (AT), can learn to improve upon itself both at training and test time. As we show on D4RL and ExoRL benchmarks, to the best of our knowledge, this is the first time that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches, even from sub-optimal data. Our Agentic Transformer also shows a promising scaling trend that bigger models consistently improve results.
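
The relabeling step can be sketched as below: sort trajectories by total reward in ascending order, relabel every step's target return with the maximum total reward in the sequence, and attach a completion flag. Treating "task completion" as "this trajectory reached the maximum return" is an assumption, as are the field names; the transformer that consumes these tokens is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trajectory:
    states: List[int]
    actions: List[int]
    rewards: List[float]

    @property
    def total(self) -> float:
        return sum(self.rewards)


def chain_of_hindsight(trajs: List[Trajectory]) -> List[Dict]:
    ordered = sorted(trajs, key=lambda t: t.total)  # ascending total reward
    target = ordered[-1].total                      # best return in the chain
    tokens = []
    for t in ordered:
        done = t.total == target                    # hypothetical completion flag
        for s, a, r in zip(t.states, t.actions, t.rewards):
            tokens.append({"state": s, "action": a, "reward": r,
                           "target_return": target, "done": done})
    return tokens


trajs = [
    Trajectory([0, 1], [1, 0], [1.0, 2.0]),  # total 3
    Trajectory([0], [0], [1.0]),             # total 1
    Trajectory([0, 2], [1, 1], [0.5, 1.5]),  # total 2
]
tokens = chain_of_hindsight(trajs)
print([(tok["reward"], tok["target_return"], tok["done"]) for tok in tokens])
```

Because every step carries the same high target return, the worse trials early in the sequence teach the model what falls short of that target, and the final trial shows what reaching it looks like.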

Claim

Large transformer models powered by diverse data and model scale have dominated natural language modeling and computer vision and pushed the frontier of multiple AI areas.

Brainformers: Trading Simplicity for Efficiency
  • Authors: Yanqi Zhou, Nan Du, Yanping Huang, Daiyi Peng, Chang Lan, Da Huang, Siamak Shakeri, David R. So, Andrew M. Dai, Yifeng Lu, Zhifeng Chen, Quoc V. Le, et al.
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2306.00008
  • Citations: 38
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: alpha factor search (matched: trading).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

Transformers are central to recent successes in natural language processing and computer vision. Transformers have a mostly uniform backbone where layers alternate between feed-forward and self-attention in order to build a deep network. Here we investigate this design choice and find that more complex blocks that have different permutations of layer primitives can be more efficient. Using this insight, we develop a complex block, named Brainformer, that consists of a diverse set of layers such as sparsely gated feed-forward layers, dense feed-forward layers, attention layers, and various forms of layer normalization and activation functions. Brainformer consistently outperforms the state-of-the-art dense and sparse Transformers, in terms of both quality and efficiency. A Brainformer model with 8 billion activated parameters per token demonstrates 2x faster training convergence and 5x faster step time compared to its GLaM counterpart. In downstream task evaluation, Brainformer also demonstrates a 3% higher SuperGLUE score with fine-tuning compared to GLaM with a similar number of activated parameters. Finally, Brainformer largely outperforms a Primer dense model derived with NAS with similar computation per token on few-shot evaluations.
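
Brainformer blocks mix dense and sparsely gated feed-forward layers. As a generic illustration of "sparsely gated", the sketch below routes each token to its top-k experts and mixes their outputs with softmax weights renormalized over the selected experts. This is plain token-choice top-k routing chosen for simplicity; Brainformer's actual routing scheme, expert sizes, and layer ordering are specified in the paper, not here.

```python
import numpy as np


def top_k_gating(x: np.ndarray, w_gate: np.ndarray, k: int = 2) -> np.ndarray:
    """Dense (tokens, experts) gate matrix with exactly k non-zero weights per token."""
    logits = x @ w_gate                          # (tokens, experts)
    top = np.argsort(-logits, axis=1)[:, :k]     # indices of the k largest logits
    rows = np.arange(logits.shape[0])[:, None]
    selected = logits[rows, top]
    weights = np.exp(selected - selected.max(axis=1, keepdims=True))
    gates = np.zeros_like(logits)
    gates[rows, top] = weights / weights.sum(axis=1, keepdims=True)  # softmax over chosen experts
    return gates


def sparse_ffn(x, w_gate, experts, k=2):
    """Each expert is a ReLU feed-forward layer that only processes the tokens routed to it."""
    gates = top_k_gating(x, w_gate, k)
    out = np.zeros_like(x)
    for e, (w1, w2) in enumerate(experts):
        routed = gates[:, e] > 0
        if routed.any():
            hidden = np.maximum(x[routed] @ w1, 0.0)
            out[routed] += gates[routed, e:e + 1] * (hidden @ w2)
    return out


rng = np.random.default_rng(0)
tokens, d_model, d_ff, n_experts = 4, 8, 16, 4
x = rng.normal(size=(tokens, d_model))
w_gate = rng.normal(size=(d_model, n_experts))
experts = [(rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model)))
           for _ in range(n_experts)]
y = sparse_ffn(x, w_gate, experts, k=2)
print(y.shape)
```

Only k of the experts run per token, which is why sparse layers can grow total parameters while keeping the "activated parameters per token" that the abstract compares against GLaM roughly fixed.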

Claim

Transformers are central to recent successes in natural language processing and computer vision.

Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation
  • Authors: Wenqing Zheng, S. Sharan, Ajay Jaiswal, Kevin Wang, Yihan Xi, Dejia Xu, Zhangyang Wang
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2305.00909
  • Citations: 35
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: program synthesis, code generation).
  • Code: https://github.com/VITA-Group/ChainCoder
  • Extraction: method/data pending

Abstract

For a complicated algorithm, its implementation by a human programmer usually starts with outlining a rough control flow followed by iterative enrichments, eventually yielding carefully generated syntactic structures and variables in a hierarchy. However, state-of-the-art large language models generate codes in a single pass, without intermediate warm-ups to reflect the structured thought process of "outline-then-detail". Inspired by the recent success of chain-of-thought prompting, we propose ChainCoder, a program synthesis language model that generates Python code progressively, i.e. from coarse to fine in multiple passes. We first decompose source code into layout frame components and accessory components via abstract syntax tree parsing to construct a hierarchical representation. We then reform our prediction target into a multi-pass objective, each pass generates a subsequence, which is concatenated in the hierarchy. Finally, a tailored transformer architecture is leveraged to jointly encode the natural language descriptions and syntactically aligned I/O data samples. Extensive evaluations show that ChainCoder outperforms state-of-the-arts, demonstrating that our progressive generation eases the reasoning procedure and guides the language model to generate higher-quality solutions. Our codes are available at: https://github.com/VITA-Group/ChainCoder.
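
The decomposition step can be approximated with Python's standard `ast` module: the statement skeleton (function definitions, loops, branches, returns) plays the role of the coarse outline, and identifiers and literals are the details filled in later. This split is a simplified stand-in chosen for illustration; ChainCoder's actual layout-frame and accessory components are defined in the paper and repository.

```python
import ast
from typing import List, Set, Tuple


def outline_and_details(source: str) -> Tuple[List[str], Set[str]]:
    tree = ast.parse(source)
    outline: List[str] = []

    def visit(node: ast.AST, depth: int) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.stmt):
                outline.append("  " * depth + type(child).__name__)  # coarse pass: statement kinds
                visit(child, depth + 1)
            else:
                visit(child, depth)  # non-statement nodes: keep walking at the same depth

    visit(tree, 0)
    # Fine pass: variable names and literal values.
    details = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    details |= {repr(n.value) for n in ast.walk(tree) if isinstance(n, ast.Constant)}
    return outline, details


SOURCE = """
def count_evens(xs):
    total = 0
    for x in xs:
        if x % 2 == 0:
            total += 1
    return total
"""
outline, details = outline_and_details(SOURCE)
print("\n".join(outline))
print(sorted(details))
```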

Claim

For a complicated algorithm, its implementation by a human programmer usually starts with outlining a rough control flow followed by iterative enrichments, eventually yielding carefully generated syntactic structures and variables in a hierarchy.

Controllable Neural Symbolic Regression
  • Authors: Tommaso Bendinelli, Luca Biggio, Pierre-Alexandre Kamienny
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2304.10336
  • Citations: 22
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: symbolic regression).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

In symbolic regression, the goal is to find an analytical expression that accurately fits experimental data with the minimal use of mathematical symbols such as operators, variables, and constants. However, the combinatorial space of possible expressions can make it challenging for traditional evolutionary algorithms to find the correct expression in a reasonable amount of time. To address this issue, Neural Symbolic Regression (NSR) algorithms have been developed that can quickly identify patterns in the data and generate analytical expressions. However, these methods, in their current form, lack the capability to incorporate user-defined prior knowledge, which is often required in natural sciences and engineering fields. To overcome this limitation, we propose a novel neural symbolic regression method, named Neural Symbolic Regression with Hypothesis (NSRwH) that enables the explicit incorporation of assumptions about the expected structure of the ground-truth expression into the prediction process. Our experiments demonstrate that the proposed conditioned deep learning model outperforms its unconditioned counterparts in terms of accuracy while also providing control over the predicted expression structure.
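
The trade-off in the first sentence (fit the data while using few symbols) and the idea of a user-supplied structural hypothesis can be made concrete with a brute-force scorer over a tiny hand-written candidate list. This is not NSRwH, which conditions a neural network on the hypothesis instead of filtering candidates; the candidates, the complexity weight `lam`, and the `required` constraint are hypothetical.

```python
import math
from typing import AbstractSet, Callable, List, Tuple

# Hypothetical candidate pool: (printable form, callable, symbols used).
Candidate = Tuple[str, Callable[[float], float], List[str]]
CANDIDATES: List[Candidate] = [
    ("x", lambda x: x, ["x"]),
    ("sin(x)", lambda x: math.sin(x), ["sin", "x"]),
    ("x + x*x", lambda x: x + x * x, ["+", "x", "*", "x", "x"]),
    ("2*sin(x)", lambda x: 2 * math.sin(x), ["*", "2", "sin", "x"]),
]


def score(f, symbols, xs, ys, lam=0.01) -> float:
    """Mean squared error plus a penalty per symbol used."""
    mse = sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return mse + lam * len(symbols)


def best_expression(xs, ys, required: AbstractSet[str] = frozenset(), lam=0.01) -> str:
    """Pick the best candidate among those containing every symbol in `required`."""
    allowed = [c for c in CANDIDATES if required <= set(c[2])]
    return min(allowed, key=lambda c: score(c[1], c[2], xs, ys, lam))[0]


xs = [i / 10 for i in range(-20, 21)]
ys = [2 * math.sin(x) for x in xs]
print(best_expression(xs, ys))                   # 2*sin(x)
print(best_expression(xs, ys, required={"+"}))  # x + x*x
```

The second call shows why user priors matter: a hypothesis that excludes the true structure forces a worse fit, whereas a correct one would shrink the search space.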

Claim

In symbolic regression, the goal is to find an analytical expression that accurately fits experimental data with the minimal use of mathematical symbols such as operators, variables, and constants.

GNN&GBDT-Guided Fast Optimizing Framework for Large-scale Integer Programming
  • Authors: Huigen Ye, Hua-Hui Xu, Hongyan Wang, Chengming Wang, Yu Jiang
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: Not stated.
  • Citations: 22
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: programming).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

Not stated in metadata.

Claim

Not stated in abstract.

Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models
  • Authors: Zhihe Lu, Jiawang Bai, Xin Li, Zeyu Xiao, Xinchao Wang
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2311.17091
  • Citations: 21
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: vision-language models (matched: vlm, vision language models, vlms).
  • Code: https://github.com/zhiheLu/Ensemble_VLM.git
  • Extraction: method/data pending

Abstract

Fine-tuning pre-trained vision-language models (VLMs), e.g., CLIP, for the open-world generalization has gained increasing popularity due to its practical value. However, performance advancements are limited when relying solely on intricate algorithmic designs for a single model, even one exhibiting strong performance, e.g., CLIP-ViT-B/16. This paper, for the first time, explores the collaborative potential of leveraging much weaker VLMs to enhance the generalization of a robust single model. The affirmative findings motivate us to address the generalization problem from a novel perspective, i.e., ensemble of pre-trained VLMs. We introduce three customized ensemble strategies, each tailored to one specific scenario. Firstly, we introduce the zero-shot ensemble, automatically adjusting the logits of different models based on their confidence when only pre-trained VLMs are available. Furthermore, for scenarios with extra few-shot samples, we propose the training-free and tuning ensemble, offering flexibility based on the availability of computing resources. The proposed ensemble strategies are evaluated on zero-shot, base-to-new, and cross-dataset generalization, achieving new state-of-the-art performance. Notably, this work represents an initial stride toward enhancing the generalization performance of VLMs via ensemble. The code is available at https://github.com/zhiheLu/Ensemble_VLM.git.
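
The zero-shot ensemble is described only as adjusting each model's logits by its confidence. One simple reading, assumed here, weights each model's logits per sample by its maximum softmax probability, normalized across models; the paper's actual confidence measure and weighting may differ, and the logits below are made up.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def confidence_weighted_ensemble(logits_per_model) -> np.ndarray:
    """logits_per_model: (models, samples, classes) -> combined (samples, classes) logits."""
    logits = np.asarray(logits_per_model, dtype=float)
    conf = softmax(logits).max(axis=-1)               # (models, samples) max class probability
    weights = conf / conf.sum(axis=0, keepdims=True)  # normalize across models for each sample
    return (weights[..., None] * logits).sum(axis=0)


strong = np.array([[5.0, 0.0, 0.0]])  # confident model (hypothetical logits)
weak = np.array([[0.0, 0.1, 0.0]])    # weaker, unsure model
combined = confidence_weighted_ensemble([strong, weak])
print(combined, combined.argmax(axis=-1))
```

Weighting per sample, rather than once per model, lets a weaker VLM contribute on the inputs where it happens to be confident while deferring to the strong model elsewhere.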

Claim

Fine-tuning pre-trained vision-language models (VLMs), e.g., CLIP, for the open-world generalization has gained increasing popularity due to its practical value.

DiscoBAX: Discovery of optimal intervention sets in genomic experiment design
  • Authors: Clare Lyle, Arash Mehrjou, Pascal Notin, A. Jesson, Stefan Bauer, Y. Gal, P. Schwab
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2312.04064
  • Citations: 17
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: autonomous research agents (matched: experiment design).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

The discovery of therapeutics to treat genetically-driven pathologies relies on identifying genes involved in the underlying disease mechanisms. Existing approaches search over the billions of potential interventions to maximize the expected influence on the target phenotype. However, to reduce the risk of failure in future stages of trials, practical experiment design aims to find a set of interventions that maximally change a target phenotype via diverse mechanisms. We propose DiscoBAX, a sample-efficient method for maximizing the rate of significant discoveries per experiment while simultaneously probing for a wide range of diverse mechanisms during a genomic experiment campaign. We provide theoretical guarantees of approximate optimality under standard assumptions, and conduct a comprehensive experimental evaluation covering both synthetic as well as real-world experimental design tasks. DiscoBAX outperforms existing state-of-the-art methods for experimental design, selecting effective and diverse perturbations in biological systems.
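
The goal of covering diverse mechanisms can be illustrated with a set objective: the expected maximum outcome of the chosen interventions, averaged over posterior samples of the unknown biology and maximized greedily. The objective, the greedy optimizer, the zero floor on outcomes, and the toy numbers are assumptions for illustration; DiscoBAX's exact objective, model, and acquisition procedure are given in the paper.

```python
from typing import List

import numpy as np


def expected_max(samples: np.ndarray, subset: List[int]) -> float:
    """Monte Carlo estimate of E[max over chosen interventions of max(outcome, 0)]."""
    return float(np.maximum(samples[:, subset], 0.0).max(axis=1).mean())


def greedy_select(samples: np.ndarray, budget: int) -> List[int]:
    """Add, one at a time, the intervention with the largest gain in the set objective."""
    chosen: List[int] = []
    for _ in range(budget):
        remaining = [j for j in range(samples.shape[1]) if j not in chosen]
        gains = [expected_max(samples, chosen + [j]) for j in remaining]
        chosen.append(remaining[int(np.argmax(gains))])
    return chosen


# Two equally likely "worlds": in the first, mechanism A drives the phenotype
# (interventions 0 and 1 both act through it); in the second, mechanism B does (intervention 2).
samples = np.array([
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 0.9],
])
print(greedy_select(samples, budget=2))                        # [0, 2]
print(np.argsort(-samples.mean(axis=0), kind="stable")[:2])  # naive top-2 by mean
```

Ranking by mean outcome alone picks interventions 0 and 1, which fail together; the set objective prefers covering mechanism B as a hedge, which is the diversity the abstract aims for.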

Claim

The discovery of therapeutics to treat genetically-driven pathologies relies on identifying genes involved in the underlying disease mechanisms.

Maximum Optimality Margin: A Unified Approach for Contextual Linear Programming and Inverse Linear Programming
  • Authors: Chunlin Sun, Shang Liu, Xiaocheng Li
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2301.11260
  • Citations: 14
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: programming).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

In this paper, we study the predict-then-optimize problem where the output of a machine learning prediction task is used as the input of some downstream optimization problem, say, the objective coefficient vector of a linear program. The problem is also known as predictive analytics or contextual linear programming. The existing approaches largely suffer from either (i) optimization intractability (a non-convex objective function) or statistical inefficiency (a suboptimal generalization bound), or (ii) the need for strong conditions such as no constraints or loss calibration. We develop a new approach to the problem called maximum optimality margin which designs the machine learning loss function by the optimality condition of the downstream optimization. The max-margin formulation enjoys both computational efficiency and good theoretical properties for the learning procedure. More importantly, our new approach only needs the observations of the optimal solution in the training data rather than the objective function, which makes it a new and natural approach to the inverse linear programming problem under both contextual and context-free settings; we also analyze the proposed method under both offline and online settings, and demonstrate its performance using numerical experiments.

Claim

In this paper, we study the predict-then-optimize problem where the output of a machine learning prediction task is used as the input of some downstream optimization problem, say, the objective coefficient vector of a linear program.
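A minimal predict-then-optimize sketch, assuming the simplest possible feasible region: the probability simplex (min c^T x subject to sum(x) = 1, x >= 0), where an optimal vertex is simply the coordinate with the smallest cost. The hinge penalty on reduced costs is written in the spirit of an optimality-margin loss and uses only the observed optimal solution as the label; it is not the paper's exact formulation, and every function name here is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def predict_costs(W: Matrix, z: List[float]) -> List[float]:
    """Linear prediction c_hat = W z, one row of W per LP variable."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in W]


def solve_simplex_lp(c: List[float]) -> int:
    """Solve min c^T x s.t. sum(x) = 1, x >= 0.

    Some optimal vertex is e_j with j = argmin_j c_j, so we return that index
    instead of calling a general LP solver.
    """
    return min(range(len(c)), key=lambda j: c[j])


def optimality_margin_loss(c_hat: List[float], j_star: int, margin: float = 1.0) -> float:
    """Hinge penalty on the reduced costs of the observed optimal basis {j_star}.

    For this LP the reduced cost of variable j is c_j - c_{j_star}. Optimality
    only needs them to be >= 0; the loss asks for >= margin instead.
    """
    return sum(
        max(0.0, margin - (c_hat[j] - c_hat[j_star]))
        for j in range(len(c_hat))
        if j != j_star
    )


def margin_step(W: Matrix, z: List[float], j_star: int,
                lr: float = 0.1, margin: float = 1.0) -> Matrix:
    """One subgradient step on optimality_margin_loss with respect to W."""
    c_hat = predict_costs(W, z)
    grad_c = [0.0] * len(c_hat)
    for j in range(len(c_hat)):
        if j != j_star and margin - (c_hat[j] - c_hat[j_star]) > 0:
            grad_c[j] -= 1.0       # push the competing cost c_j up
            grad_c[j_star] += 1.0  # push the observed optimum's cost down
    # Chain rule through c_hat = W z: dL/dW[j][i] = dL/dc_j * z_i.
    return [[w - lr * g * zi for w, zi in zip(row, z)] for row, g in zip(W, grad_c)]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    z = [3.0, 1.0]
    j_star = 2  # made-up observed optimal solution for context z
    for _ in range(5):
        W = margin_step(W, z, j_star)
    print(solve_simplex_lp(predict_costs(W, z)))  # 2
```

The loss never touches the true cost vector, only `j_star`, which is the property the abstract highlights for the inverse linear programming setting.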

Feature Programming for Multivariate Time Series Prediction Paper
  • Authors: Alex Reneau, Jerry Yao-Chieh Hu, Chenwei Xu, Weijian Li, A. Gilani, Han Liu
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2306.06252
  • Citations: 11
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: programming).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

We introduce the concept of programmable feature engineering for time series modeling and propose a feature programming framework. This framework generates large amounts of predictive features for noisy multivariate time series while allowing users to incorporate their inductive bias with minimal effort. The key motivation of our framework is to view any multivariate time series as a cumulative sum of fine-grained trajectory increments, with each increment governed by a novel spin-gas dynamical Ising model. This fine-grained perspective motivates the development of a parsimonious set of operators that summarize multivariate time series in an abstract fashion, serving as the foundation for large-scale automated feature engineering. Numerically, we validate the efficacy of our method on several synthetic and real-world noisy time series datasets.

Claim

We introduce the concept of programmable feature engineering for time series modeling and propose a feature programming framework.
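The cumulative-sum view in this abstract can be made concrete in a few lines: each channel equals its first value plus the running sum of its increments, and features are summaries of recent increments. The four operators below (net change, total up and down moves, sign flips) are hypothetical illustrations chosen for this sketch; they are not the paper's operator set, and the spin-gas Ising model is not modeled.

```python
from itertools import accumulate
from typing import Dict, List


def increments(series: List[float]) -> List[float]:
    """Fine-grained increments x_t - x_{t-1} for t >= 1."""
    return [b - a for a, b in zip(series, series[1:])]


def reconstruct(first: float, incs: List[float]) -> List[float]:
    """Invert increments(): x_0 followed by x_0 plus the running sum (Python 3.8+)."""
    return list(accumulate(incs, initial=first))


def window_features(series: List[float], window: int) -> Dict[str, float]:
    """Hypothetical summary operators over the last `window` increments."""
    inc = increments(series)[-window:]
    return {
        "net_change": sum(inc),
        "up_moves": sum(d for d in inc if d > 0),
        "down_moves": sum(d for d in inc if d < 0),
        "sign_flips": sum(1 for a, b in zip(inc, inc[1:]) if a * b < 0),
    }


def multivariate_features(channels: Dict[str, List[float]], window: int) -> Dict[str, float]:
    """Apply the same operators to every channel, prefixing keys with the channel name."""
    return {
        f"{name}.{key}": value
        for name, series in channels.items()
        for key, value in window_features(series, window).items()
    }
```

Because every operator is defined on increments, adding a new operator automatically yields one new feature per channel, which is the kind of cheap, large-scale feature generation the abstract describes.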

Leveraging VLM-Based Pipelines to Annotate 3D Objects Paper
  • Authors: Rishabh Kabra, L. Matthey, Alexander Lerchner, N. Mitra
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: Not stated.
  • Citations: 11
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: vision-language models (matched: vlm, vision language models, vlms).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

Pretrained vision language models (VLMs) present an opportunity to caption unlabeled 3D objects at scale. The leading approach to summarize VLM descriptions from different views of an object (Luo et al., 2023) relies on a language model (GPT-4) to produce the final output. This text-based aggregation is susceptible to hallucinations as it merges potentially contradictory descriptions. We propose an alternative algorithm to marginalize over factors such as the viewpoint that affect the VLM's response. Instead of merging text-only responses, we utilize the VLM's joint image-text likelihoods. We show our probabilistic aggregation is not only more reliable and efficient, but sets the SoTA on inferring object types with respect to human-verified labels. The aggregated annotations are also useful for conditional inference; they improve downstream predictions (e.g., of object material) when the object's type is specified as an auxiliary text-based input. Such auxiliary inputs allow ablating the contribution of visual reasoning over visionless reasoning in an unsupervised setting. With these supervised and unsupervised evaluations, we show how a VLM-based pipeline can be leveraged to produce reliable annotations for 764K objects from the Objaverse dataset.

Claim

Pretrained vision language models (VLMs) present an opportunity to caption unlabeled 3D objects at scale.
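Marginalizing over viewpoints with likelihoods, rather than merging text, can be sketched as follows. The sketch assumes a hypothetical table of per-view log-likelihoods log p(label text | view image) for a fixed set of candidate labels, a uniform prior over views, and a uniform prior over labels; it is an illustration of likelihood-based aggregation, not the paper's exact algorithm.

```python
import math
from typing import Dict, List


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def aggregate_over_views(log_lik: Dict[str, List[float]]) -> Dict[str, float]:
    """Average each label's likelihood over views, then normalize across labels.

    log_lik[label][v] is assumed to be log p(label text | image of view v).
    """
    marginal = {
        label: logsumexp(scores) - math.log(len(scores))  # log of the mean over views
        for label, scores in log_lik.items()
    }
    norm = logsumexp(list(marginal.values()))
    return {label: math.exp(lp - norm) for label, lp in marginal.items()}


if __name__ == "__main__":
    scores = {  # made-up numbers for three views
        "chair": [math.log(0.6), math.log(0.5), math.log(0.1)],
        "table": [math.log(0.2), math.log(0.3), math.log(0.8)],
    }
    post = aggregate_over_views(scores)
    print(max(post, key=post.get))  # table
```

In these made-up numbers two of the three views prefer "chair", yet the aggregate favors "table" because the third view is far more confident; a per-view vote would pick the other label.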

Revisiting the Linear-Programming Framework for Offline RL with General Function Approximation Paper
  • Authors: A. Ozdaglar, S. Pattathil, Jiawei Zhang, Kaiqing Zhang
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2212.13861
  • Citations: 11
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: programming).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

Not stated in metadata.

Claim

Not stated in abstract.

Optimistic Planning by Regularized Dynamic Programming Paper
  • Authors: Antoine Moulin, Gergely Neu
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2302.14004
  • Citations: 8
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: programming).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This technique allows us to avoid contraction and monotonicity arguments typically required by existing analyses of approximate dynamic programming methods, and in particular to use approximate transition functions estimated via least-squares procedures in MDPs with linear function approximation. We use our method to recover known guarantees in tabular MDPs and to provide a computationally efficient algorithm for learning near-optimal policies in discounted linear mixture MDPs from a single stream of experience, and show it achieves near-optimal statistical guarantees.

Claim

We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure.
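The core idea of adding regularization to otherwise standard value-iteration updates can be illustrated on a tiny tabular MDP. This sketch assumes entropy-style regularization, which turns the hard max over actions into a log-sum-exp "soft max" with inverse temperature eta; the paper's actual regularizer, its optimism mechanism, and its least-squares transition estimates are not reproduced, and the exact transition model is given as input.

```python
import math
from typing import List, Tuple

# P[s][a] is a list of (probability, next_state); R[s][a] is the reward.
Transitions = List[List[List[Tuple[float, int]]]]


def soft_max(q: List[float], eta: float) -> float:
    """(1/eta) * log(sum(exp(eta * q))), computed stably; tends to max(q) as eta grows."""
    m = max(q)
    return m + math.log(sum(math.exp(eta * (x - m)) for x in q)) / eta


def regularized_value_iteration(P: Transitions, R: List[List[float]],
                                gamma: float = 0.9, eta: float = 10.0,
                                iters: int = 500):
    """Value iteration with the hard max replaced by an entropy-regularized soft max."""
    V = [0.0] * len(R)
    Q: List[List[float]] = []
    for _ in range(iters):
        Q = [
            [R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a]) for a in range(len(R[s]))]
            for s in range(len(R))
        ]
        V = [soft_max(q, eta) for q in Q]
    # The matching regularized policy is a softmax over each state's Q-values.
    policy = []
    for q in Q:
        weights = [math.exp(eta * (x - max(q))) for x in q]
        total = sum(weights)
        policy.append([w / total for w in weights])
    return V, Q, policy


if __name__ == "__main__":
    P = [
        [[(1.0, 0)], [(1.0, 1)]],  # state 0: stay, or move to state 1
        [[(1.0, 1)], [(1.0, 0)]],  # state 1: stay, or move back to state 0
    ]
    R = [[0.0, 0.0], [1.0, 0.0]]   # only staying in state 1 pays
    V, Q, pi = regularized_value_iteration(P, R)
    print(V, pi)
```

In this toy MDP the unregularized optimal values are 9 and 10; with two actions the regularized values exceed them by at most log(2) / (eta * (1 - gamma)), about 0.69 for the defaults, and larger eta shrinks that gap.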

BNN-DP: Robustness Certification of Bayesian Neural Networks via Dynamic Programming Paper
  • Authors: Steven Adams, A. Patané, Morteza Lahijanian, L. Laurenti
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2306.10742
  • Citations: 8
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: programming).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

In this paper, we introduce BNN-DP, an efficient algorithmic framework for analysis of adversarial robustness of Bayesian Neural Networks (BNNs). Given a compact set of input points \(T\subset \mathbb{R}^n\), BNN-DP computes lower and upper bounds on the BNN's predictions for all the points in \(T\). The framework is based on an interpretation of BNNs as stochastic dynamical systems, which enables the use of Dynamic Programming (DP) algorithms to bound the prediction range along the layers of the network. Specifically, the method uses bound propagation techniques and convex relaxations to derive a backward recursion procedure to over-approximate the prediction range of the BNN with piecewise affine functions. The algorithm is general and can handle both regression and classification tasks. On a set of experiments on various regression and classification tasks and BNN architectures, we show that BNN-DP outperforms state-of-the-art methods by up to four orders of magnitude in both tightness of the bounds and computational efficiency.

Claim

In this paper, we introduce BNN-DP, an efficient algorithmic framework for analysis of adversarial robustness of Bayesian Neural Networks (BNNs).
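BNN-DP itself uses a backward recursion with piecewise affine relaxations; the sketch below swaps in the much simpler forward interval bound propagation to show what bounding the prediction range layer by layer means. It assumes, hypothetically, that every weight and bias is confined to a box (for instance a truncated credible interval of the posterior), so the bounds hold only for weights inside that box and carry none of BNN-DP's probabilistic guarantees.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# One layer: (W_lo, W_hi, b_lo, b_hi); every weight and bias lies in its own interval.
Layer = Tuple[Matrix, Matrix, Vector, Vector]


def interval_mul(a_lo: float, a_hi: float, b_lo: float, b_hi: float) -> Tuple[float, float]:
    """Exact range of a * b for a in [a_lo, a_hi] and b in [b_lo, b_hi]."""
    products = (a_lo * b_lo, a_lo * b_hi, a_hi * b_lo, a_hi * b_hi)
    return min(products), max(products)


def propagate_bounds(layers: List[Layer], x_lo: Vector, x_hi: Vector) -> Tuple[Vector, Vector]:
    """Sound (not tight) output bounds; ReLU on every layer except the last."""
    lo, hi = x_lo, x_hi
    for k, (W_lo, W_hi, b_lo, b_hi) in enumerate(layers):
        new_lo, new_hi = [], []
        for wl_row, wh_row, bl, bh in zip(W_lo, W_hi, b_lo, b_hi):
            lo_j, hi_j = bl, bh
            for wl, wh, xl, xh in zip(wl_row, wh_row, lo, hi):
                p_lo, p_hi = interval_mul(wl, wh, xl, xh)
                lo_j, hi_j = lo_j + p_lo, hi_j + p_hi
            if k < len(layers) - 1:
                lo_j, hi_j = max(0.0, lo_j), max(0.0, hi_j)  # ReLU is monotone
            new_lo.append(lo_j)
            new_hi.append(hi_j)
        lo, hi = new_lo, new_hi
    return lo, hi


def sample_forward(layers: List[Layer], x_lo: Vector, x_hi: Vector, rng: random.Random) -> Vector:
    """Draw one input and one weight setting from the boxes and run the network."""
    x = [rng.uniform(l, h) for l, h in zip(x_lo, x_hi)]
    for k, (W_lo, W_hi, b_lo, b_hi) in enumerate(layers):
        x = [
            rng.uniform(bl, bh) + sum(rng.uniform(wl, wh) * xi for wl, wh, xi in zip(wlr, whr, x))
            for wlr, whr, bl, bh in zip(W_lo, W_hi, b_lo, b_hi)
        ]
        if k < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x
```

Sampling many networks and inputs from the boxes, as `sample_forward` does, is a quick sanity check that every sampled output falls inside the propagated bounds.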

Difference of Submodular Minimization via DC Programming Paper
  • Authors: Marwa El Halabi, George Orfanides, Tim Hoheisel
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2305.11046
  • Citations: 7
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: programming).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

Minimizing the difference of two submodular (DS) functions is a problem that naturally occurs in various machine learning problems. Although it is well known that a DS problem can be equivalently formulated as the minimization of the difference of two convex (DC) functions, existing algorithms do not fully exploit this connection. A classical algorithm for DC problems is called the DC algorithm (DCA). We introduce variants of DCA and its complete form (CDCA) that we apply to the DC program corresponding to DS minimization. We extend existing convergence properties of DCA, and connect them to convergence properties on the DS problem. Our results on DCA match the theoretical guarantees satisfied by existing DS algorithms, while providing a more complete characterization of convergence properties. In the case of CDCA, we obtain a stronger local minimality guarantee. Our numerical results show that our proposed algorithms outperform existing baselines on two applications: speech corpus selection and feature selection.

Claim

Minimizing the difference of two submodular (DS) functions is a problem that naturally occurs in various machine learning problems.
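The paper applies DCA to the DC program equivalent to DS minimization; the sketch below instead applies the classical DCA update to a simple continuous DC function, g(x) = 0.5*||x - a||^2 and h(x) = mu*||x||_1, chosen as an assumption for illustration because its convex subproblem has a closed-form solution. Each iteration linearizes h at the current point and minimizes the resulting convex surrogate.

```python
from typing import List, Tuple


def sign(v: float) -> float:
    return float((v > 0) - (v < 0))


def dca(a: List[float], mu: float, x0: List[float], iters: int = 20) -> Tuple[List[float], List[float]]:
    """DCA for f(x) = g(x) - h(x), g(x) = 0.5*||x - a||^2, h(x) = mu*||x||_1.

    Step k picks a subgradient y_k of h at x_k and minimizes the convex
    surrogate g(x) - <y_k, x>, whose minimizer is x = a + y_k in closed form.
    """
    def f(x: List[float]) -> float:
        return 0.5 * sum((xi - ai) ** 2 for xi, ai in zip(x, a)) - mu * sum(abs(xi) for xi in x)

    x, history = list(x0), [f(x0)]
    for _ in range(iters):
        y = [mu * sign(xi) for xi in x]  # subgradient of h; picks 0 where x_i == 0
        x_next = [ai + yi for ai, yi in zip(a, y)]
        history.append(f(x_next))
        if x_next == x:  # fixed point: a DCA critical point
            break
        x = x_next
    return x, history


if __name__ == "__main__":
    x, history = dca(a=[0.5, -2.0, 0.0], mu=1.0, x0=[0.1, 0.1, 0.0])
    print(x)  # [1.5, -3.0, 0.0]
```

The objective values in `history` never increase, a standard DCA property. The third coordinate never moves: x_3 = 0 is a DCA critical point (0 lies in the subdifferential of h there), yet along that coordinate 0.5*x^2 - |x| has a local maximum at 0. Plain DCA fixed points need not be local minima, which is the kind of gap a stronger local-minimality guarantee addresses.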

Self-Infilling Code Generation Paper
  • Authors: Lin Zheng, Jianbo Yuan, Zhi Zhang, Hongxia Yang, Lingpeng Kong
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2311.17972
  • Citations: 6
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: code generation).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

This work introduces self-infilling code generation, a general framework that incorporates infilling operations into auto-regressive decoding. Our approach capitalizes on the observation that recent infilling-capable code language models can self-infill: whereas infilling operations aim to fill in the middle based on a predefined prefix and suffix, self-infilling sequentially generates both such surrounding context and the infilled content. We utilize this capability to introduce novel interruption and looping mechanisms in conventional decoding, evolving it into a non-monotonic process. Interruptions allow for postponing the generation of specific code until a definitive suffix is established, enhancing control over the output. Meanwhile, the looping mechanism, which leverages the complementary nature of self-infilling and left-to-right decoding, can iteratively update and synchronize each piece of generation cyclically. Extensive experiments are conducted to demonstrate that our proposed decoding process is effective in enhancing both regularity and quality across several code generation benchmarks.

Claim

This work introduces self-infilling code generation, a general framework that incorporates infilling operations into auto-regressive decoding.
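The interruption mechanism can be sketched as a change in generation order: after the prefix, the model first commits to a suffix and then infills the middle conditioned on both sides. The sentinel strings and the deterministic `toy_model` below are hypothetical stand-ins (real infilling-capable models define their own special tokens), and the looping mechanism described in the abstract is not shown.

```python
from typing import Callable

# Hypothetical sentinel strings; each infilling-capable model defines its own.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def fim_prompt(prefix: str, suffix: str) -> str:
    """Prefix-suffix-middle prompt: the model sees both sides, then writes the middle."""
    return f"{PRE}{prefix}{SUF}{suffix}{MID}"


def self_infill(generate: Callable[[str], str], prefix: str) -> str:
    """Interrupt left-to-right decoding after `prefix`.

    1. Let the model write the suffix first (self-generated surrounding context).
    2. Let it fill the middle, conditioned on both prefix and suffix.
    """
    suffix = generate(f"{PRE}{prefix}{SUF}")
    middle = generate(fim_prompt(prefix, suffix))
    return prefix + middle + suffix


def toy_model(prompt: str) -> str:
    """Deterministic stand-in for a code LM, only to exercise the control flow."""
    if prompt.endswith(MID):
        return "    total = a + b\n"
    return "    return total\n"


if __name__ == "__main__":
    print(self_infill(toy_model, "def add(a, b):\n"), end="")
```

The point of the ordering is that the `return total` line is fixed before `total` is defined, so the middle is generated under a definitive suffix, matching the abstract's description of postponing code until the suffix is established.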

Amortizing Pragmatic Program Synthesis with Rankings Paper
  • Authors: Yewen Pu, Saujas Vaduguru, Priyan Vaithilingam, Elena L. Glassman, Daniel Fried
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2309.03225
  • Citations: 5
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: program synthesis).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

The use of the Rational Speech Acts (RSA) framework has been successful in building pragmatic program synthesizers that return programs which, in addition to being logically consistent with user-generated examples, account for the fact that a user chooses their examples informatively. We present a general method of amortizing the slow, exact RSA synthesizer. Our method first queries the exact RSA synthesizer to compile a communication dataset. The dataset contains a number of example-dependent rankings of subsets of programs. It then distills a single global ranking of all programs as an approximation to every ranking in the dataset. This global ranking is then used at inference time to rank multiple logically consistent candidate programs generated from a fast, non-pragmatic synthesizer. Experiments on two program synthesis domains using our ranking method resulted in speed-ups of orders of magnitude compared to the exact RSA synthesizer, while being more accurate than a non-pragmatic synthesizer when communicating with humans. Finally, we prove that in the special case of synthesis from a single example, this approximation is exact.

Claim

The use of the Rational Speech Acts (RSA) framework has been successful in building pragmatic program synthesizers that return programs which, in addition to being logically consistent with user-generated examples, account for the fact that a user chooses their examples informatively.
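The amortization pipeline can be sketched as two small functions: one compresses many example-dependent rankings into a single global order, and one uses that order to choose among the logically consistent candidates produced by a fast synthesizer. The Borda-style mean-position score is a swapped-in aggregation chosen for simplicity, not the paper's distillation procedure, and the program names are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def global_ranking(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Collapse example-dependent rankings (best first) into one global order.

    A program at position i of a length-L ranking scores (L - 1 - i) / (L - 1),
    i.e. 1 for best and 0 for worst; programs are ordered by mean score.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        L = len(ranking)
        if L < 2:
            continue  # a single program carries no ordering information
        for i, prog in enumerate(ranking):
            totals[prog] += (L - 1 - i) / (L - 1)
            counts[prog] += 1
    # Ties are broken by name so the order is deterministic.
    return sorted(totals, key=lambda p: (-totals[p] / counts[p], p))


def pick(candidates: Sequence[str], order: List[str]) -> str:
    """Return the consistent candidate that appears earliest in the global order.

    Candidates never seen in the rankings are placed after all ranked programs.
    """
    position = {p: i for i, p in enumerate(order)}
    return min(candidates, key=lambda p: (position.get(p, len(order)), p))


if __name__ == "__main__":
    order = global_ranking([["p1", "p2", "p3"], ["p2", "p1"], ["p1", "p3"]])
    print(order)                      # ['p2', 'p1', 'p3']
    print(pick(["p3", "p1"], order))  # p1
```

At inference time only `pick` runs, a dictionary lookup per candidate, which is where the speed-up over querying an exact pragmatic synthesizer would come from in this simplified picture.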

Fast, Scalable, Warm-Start Semidefinite Programming with Spectral Bundling and Sketching Paper
  • Authors: Rico Angell, Andrew McCallum
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2312.11801
  • Citations: 5
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: programming).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

While semidefinite programming (SDP) has traditionally been limited to moderate-sized problems, recent algorithms augmented with matrix sketching techniques have enabled solving larger SDPs. However, these methods achieve scalability at the cost of an increase in the number of necessary iterations, resulting in slower convergence as the problem size grows. Furthermore, they require iteration-dependent parameter schedules that prohibit effective utilization of warm-start initializations important in practical applications with incrementally-arriving data or mixed-integer programming. We present Unified Spectral Bundling with Sketching (USBS), a provably correct, fast and scalable algorithm for solving massive SDPs that can leverage a warm-start initialization to further accelerate convergence. Our proposed algorithm is a spectral bundle method for solving general SDPs containing both equality and inequality constraints. Moreover, when augmented with an optional matrix sketching technique, our algorithm achieves the dramatically improved scalability of previous work while sustaining convergence speed. We empirically demonstrate the effectiveness of our method across multiple applications, with and without warm-starting. For example, USBS provides a 500x speed-up over the state-of-the-art scalable SDP solver on an instance with over 2 billion decision variables.

Claim

While semidefinite programming (SDP) has traditionally been limited to moderate-sized problems, recent algorithms augmented with matrix sketching techniques have enabled solving larger SDPs.
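In the classical formulation, spectral bundle methods work with a dual function built on the largest eigenvalue of a symmetric matrix, so each iteration needs approximate leading eigenvectors. The abstract does not describe USBS's eigenvector routine; the sketch below only illustrates that subproblem with plain shifted power iteration on a small dense matrix, an assumption for illustration that omits both the bundle step and the matrix sketching.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(A: Matrix, v: List[float]) -> List[float]:
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def leading_eigenpair(A: Matrix, iters: int = 2000, seed: int = 0) -> Tuple[float, List[float]]:
    """Approximate lambda_max(A) and its unit eigenvector for a symmetric matrix A.

    Power iteration runs on A + s*I with s = 1 + max absolute row sum, which
    exceeds every |eigenvalue|, so all shifted eigenvalues are positive and the
    iteration converges to the algebraically largest eigenvalue rather than
    the one largest in magnitude (assuming a gap at the top of the spectrum).
    """
    n = len(A)
    s = 1.0 + max(sum(abs(a) for a in row) for row in A)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [av + s * x for av, x in zip(matvec(A, v), v)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(x * y for x, y in zip(v, matvec(A, v)))  # Rayleigh quotient v^T A v
    return lam, v


if __name__ == "__main__":
    lam, v = leading_eigenpair([[2.0, 1.0], [1.0, 2.0]])
    print(round(lam, 6))
```

The shift matters: for a matrix such as diag(-5, 1), unshifted power iteration would lock onto the eigenvalue -5, while the shifted version correctly returns 1.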

Automatically Marginalized MCMC in Probabilistic Programming Paper
  • Authors: Jinlin Lai, Javier Burroni, Hui Guan, D. Sheldon
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2302.00564
  • Citations: 4
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: programming).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

Hamiltonian Monte Carlo (HMC) is a powerful algorithm to sample latent variables from Bayesian models. The advent of probabilistic programming languages (PPLs) frees users from writing inference algorithms and lets users focus on modeling. However, many models are difficult for HMC to solve directly, and often require tricks like model reparameterization. We are motivated by the fact that many of those models could be simplified by marginalization. We propose to use automatic marginalization as part of the sampling process using HMC in a graphical model extracted from a PPL, which substantially improves sampling from real-world hierarchical models.

Claim

Hamiltonian Monte Carlo (HMC) is a powerful algorithm to sample latent variables from Bayesian models.
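Marginalization replaces a latent variable by its integral. In the simplest conjugate case, x ~ N(mu, tau^2) and y | x ~ N(x, sigma^2), integrating x out gives y ~ N(mu, tau^2 + sigma^2), so a sampler no longer has to explore x. The sketch checks that closed form against brute-force numerical integration; it illustrates the kind of simplification being exploited, not the paper's automatic procedure on a graphical model extracted from a PPL, and the function names are hypothetical.

```python
import math


def normal_logpdf(x: float, mean: float, sd: float) -> float:
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def marginal_logpdf(y: float, mu: float, tau: float, sigma: float) -> float:
    """log p(y) after integrating out x ~ N(mu, tau^2) from y | x ~ N(x, sigma^2)."""
    return normal_logpdf(y, mu, math.sqrt(tau ** 2 + sigma ** 2))


def numeric_marginal_logpdf(y: float, mu: float, tau: float, sigma: float,
                            n: int = 20001) -> float:
    """Same quantity by trapezoidal integration of p(x) p(y | x) over a wide grid."""
    width = 10 * max(tau, sigma)
    lo, hi = min(mu, y) - width, max(mu, y) + width
    step = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += weight * math.exp(normal_logpdf(x, mu, tau) + normal_logpdf(y, x, sigma))
    return math.log(total * step)


if __name__ == "__main__":
    exact = marginal_logpdf(1.3, mu=0.0, tau=2.0, sigma=0.5)
    approx = numeric_marginal_logpdf(1.3, mu=0.0, tau=2.0, sigma=0.5)
    print(exact, approx)
```

In a hierarchical model with many such pairs, HMC on the marginalized density only has to move mu (and any variance parameters), which avoids the funnel-shaped geometry that often forces reparameterization.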

Theoretical bounds on the network community profile from low-rank semi-definite programming Paper
  • Authors: Yufan Huang, S. Seshadhri, D. Gleich
  • Year: 2023
  • Venue: International Conference on Machine Learning
  • DOI: 10.48550/arXiv.2303.14550
  • Citations: 4
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: programming).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

We study a new connection between a technical measure called \(\mu\)-conductance that arises in the study of Markov chains for sampling convex bodies and the network community profile that characterizes size-resolved properties of clusters and communities in social and information networks. The idea of \(\mu\)-conductance is similar to the traditional graph conductance, but disregards sets with small volume. We derive a sequence of optimization problems including a low-rank semi-definite program from which we can derive a lower bound on the optimal \(\mu\)-conductance value. These ideas give the first theoretically sound bound on the behavior of the network community profile for a wide range of cluster sizes. The algorithm scales up to graphs with hundreds of thousands of nodes and we demonstrate how our framework validates the predicted structures of real-world graphs.

Claim

We study a new connection between a technical measure called \(\mu\)-conductance that arises in the study of Markov chains for sampling convex bodies and the network community profile that characterizes size-resolved properties of clusters and communities in social and information networks.
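The network community profile records, for each cluster size k, the smallest conductance achieved by any set of k nodes, where conductance is cut(S) / min(vol(S), vol(complement of S)) and vol sums node degrees. On a toy graph it can be computed by brute force, as sketched below; the enumeration is exponential and only suitable for toy graphs, and neither \(\mu\)-conductance nor the paper's SDP-based lower bound is implemented here.

```python
from itertools import combinations
from typing import Dict, Iterable, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets


def conductance(adj: Graph, S: Iterable[int]) -> float:
    """cut(S) / min(vol(S), vol(complement of S)), with vol = sum of degrees."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_s = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj if u not in S)
    denom = min(vol_s, vol_rest)
    return cut / denom if denom > 0 else float("inf")


def network_community_profile(adj: Graph) -> Dict[int, float]:
    """Brute-force NCP: best conductance over all node sets of each size k <= n/2."""
    nodes = sorted(adj)
    return {
        k: min(conductance(adj, S) for S in combinations(nodes, k))
        for k in range(1, len(nodes) // 2 + 1)
    }


if __name__ == "__main__":
    # Two triangles joined by the single edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    adj: Graph = {u: set() for u in range(6)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    ncp = network_community_profile(adj)
    print(ncp[3] == 1 / 7)  # True
```

The dip at k = 3 is the signature of a good cluster: each triangle is cut by a single edge out of a volume of 7, which is exactly the size-resolved structure the profile is meant to expose.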