ML + Vision Top-6 Agent Survey - NeurIPS 2023 - Page 2 of 2

  • Venue: Neural Information Processing Systems
  • Year: 2023
  • Page: 2 / 2
  • Papers: 31-40 / 40
Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models Paper
  • Authors: Xu Yang, Yingzhe Peng, Haoxuan Ma, Shuo Xu, Chi Zhang, Yucheng Han, Hanwang Zhang
  • Year: 2023
  • Venue: Neural Information Processing Systems
  • DOI: 10.52202/079017-3185
  • Citations: 11
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: vision-language models (matched: vision language models, large vision language models, lvlms).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

As Archimedes famously said, “Give me a lever long enough and a fulcrum on which to place it, and I shall move the world”, in this study, we propose to use a tiny Language Model (LM), \eg, a Transformer with 67M parameters, to lever much larger Vision-Language Models (LVLMs) with 9B parameters. Specifically, we use this tiny Lever-LM to configure effective in-context demonstration (ICD) sequences to improve the In-Context Learinng (ICL) performance of LVLMs. Previous studies show that diverse ICD configurations like the selection and ordering of the demonstrations heavily affect the ICL performance, highlighting the significance of configuring effective ICD sequences. Motivated by this and by re-considering the the process of configuring ICD sequence, we find this is a mirror process of human sentence composition and further assume that effective ICD configurations may contain internal statistical patterns that can be captured by Lever-LM. Then a dataset with effective ICD sequences is constructed to train Lever-LM. After training, given novel queries, new ICD sequences are configured by the trained Lever-LM to solve vision-language tasks through ICL. Experiments show that these ICD sequences can improve the ICL performance of two LVLMs compared with some strong baselines in Visual Question Answering and Image Captioning, validating that Lever-LM can really capture the statistical patterns for levering LVLMs. The code is available at \url{https://github.com/ForJadeForest/Lever-LM}.

Claim

As Archimedes famously said, “Give me a lever long enough and a fulcrum on which to place it, and I shall move the world”, in this study, we propose to use a tiny Language Model (LM), \eg, a Transformer with 67M parameters, to lever much larger Vision-Language Models (LVLMs) with 9B parameters.

Maximum Independent Set: Self-Training through Dynamic Programming Paper
  • Authors: L. Brusca, Lars C.P.M. Quaedvlieg, Stratis Skoulakis, Grigorios G. Chrysos, V. Cevher
  • Year: 2023
  • Venue: Neural Information Processing Systems
  • DOI: 10.48550/arXiv.2310.18672
  • Citations: 10
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: programming).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

This work presents a graph neural network (GNN) framework for solving the maximum independent set (MIS) problem, inspired by dynamic programming (DP). Specifically, given a graph, we propose a DP-like recursive algorithm based on GNNs that firstly constructs two smaller sub-graphs, predicts the one with the larger MIS, and then uses it in the next recursive call. To train our algorithm, we require annotated comparisons of different graphs concerning their MIS size. Annotating the comparisons with the output of our algorithm leads to a self-training process that results in more accurate self-annotation of the comparisons and vice versa. We provide numerical evidence showing the superiority of our method vs prior methods in multiple synthetic and real-world datasets.

Claim

This work presents a graph neural network (GNN) framework for solving the maximum independent set (MIS) problem, inspired by dynamic programming (DP).

Generalization Bound and Learning Methods for Data-Driven Projections in Linear Programming Paper
  • Authors: Shinsaku Sakaue, Taihei Oki
  • Year: 2023
  • Venue: Neural Information Processing Systems
  • DOI: 10.52202/079017-0408
  • Citations: 9
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: programming).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

How to solve high-dimensional linear programs (LPs) efficiently is a fundamental question. Recently, there has been a surge of interest in reducing LP sizes using random projections, which can accelerate solving LPs independently of improving LP solvers. This paper explores a new direction of data-driven projections, which use projection matrices learned from data instead of random projection matrices. Given training data of \(n\)-dimensional LPs, we learn an \(n\times k\) projection matrix with \(n>k\). When addressing a future LP instance, we reduce its dimensionality from \(n\) to \(k\) via the learned projection matrix, solve the resulting LP to obtain a \(k\)-dimensional solution, and apply the learned matrix to it to recover an \(n\)-dimensional solution. On the theoretical side, a natural question is: how much data is sufficient to ensure the quality of recovered solutions? We address this question based on the framework of data-driven algorithm design, which connects the amount of data sufficient for establishing generalization bounds to the pseudo-dimension of performance metrics. We obtain an \(\tilde{\mathrm{O}}(nk^2)\) upper bound on the pseudo-dimension, where \(\tilde{\mathrm{O}}\) compresses logarithmic factors. We also provide an \(\Omega(nk)\) lower bound, implying our result is tight up to an \(\tilde{\mathrm{O}}(k)\) factor. On the practical side, we explore two simple methods for learning projection matrices: PCA- and gradient-based methods. While the former is relatively efficient, the latter can sometimes achieve better solution quality. Experiments demonstrate that learning projection matrices from data is indeed beneficial: it leads to significantly higher solution quality than the existing random projection while greatly reducing the time for solving LPs.

Claim

How to solve high-dimensional linear programs (LPs) efficiently is a fundamental question.

Structured Semidefinite Programming for Recovering Structured Preconditioners Paper
  • Authors: Arun Jambulapati, Jerry Li, Christopher Musco, Kirankumar Shiragur, Aaron Sidford, Kevin Tian
  • Year: 2023
  • Venue: Neural Information Processing Systems
  • DOI: 10.48550/arXiv.2310.18265
  • Citations: 8
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: programming).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

We develop a general framework for finding approximately-optimal preconditioners for solving linear systems. Leveraging this framework we obtain improved runtimes for fundamental preconditioning and linear system solving problems including the following. We give an algorithm which, given positive definite \(\mathbf{K} \in \mathbb{R}^{d \times d}\) with \(\mathrm{nnz}(\mathbf{K})\) nonzero entries, computes an \(\epsilon\)-optimal diagonal preconditioner in time \(\widetilde{O}(\mathrm{nnz}(\mathbf{K}) \cdot \mathrm{poly}(\kappa^\star,\epsilon^{-1}))\), where \(\kappa^\star\) is the optimal condition number of the rescaled matrix. We give an algorithm which, given \(\mathbf{M} \in \mathbb{R}^{d \times d}\) that is either the pseudoinverse of a graph Laplacian matrix or a constant spectral approximation of one, solves linear systems in \(\mathbf{M}\) in \(\widetilde{O}(d^2)\) time. Our diagonal preconditioning results improve state-of-the-art runtimes of \(\Omega(d^{3.5})\) attained by general-purpose semidefinite programming, and our solvers improve state-of-the-art runtimes of \(\Omega(d^{\omega})\) where \(\omega>2.3\) is the current matrix multiplication constant. We attain our results via new algorithms for a class of semidefinite programs (SDPs) we call matrix-dictionary approximation SDPs, which we leverage to solve an associated problem we call matrix-dictionary recovery.

Claim

We develop a general framework for finding approximately-optimal preconditioners for solving linear systems.

Trading-off price for data quality to achieve fair online allocation Paper
  • Authors: Mathieu Molina, Nicolas Gast, P. Loiseau, Vianney Perchet
  • Year: 2023
  • Venue: Neural Information Processing Systems
  • DOI: 10.48550/arXiv.2306.13440
  • Citations: 7
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: alpha factor search (matched: trading).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

We consider the problem of online allocation subject to a long-term fairness penalty. Contrary to existing works, however, we do not assume that the decision-maker observes the protected attributes -- which is often unrealistic in practice. Instead they can purchase data that help estimate them from sources of different quality; and hence reduce the fairness penalty at some cost. We model this problem as a multi-armed bandit problem where each arm corresponds to the choice of a data source, coupled with the online allocation problem. We propose an algorithm that jointly solves both problems and show that it has a regret bounded by \(\mathcal{O}(\sqrt{T})\). A key difficulty is that the rewards received by selecting a source are correlated by the fairness penalty, which leads to a need for randomization (despite a stochastic setting). Our algorithm takes into account contextual information available before the source selection, and can adapt to many different fairness notions. We also show that in some instances, the estimates used can be learned on the fly.

Claim

We consider the problem of online allocation subject to a long-term fairness penalty.

A Heavy-Tailed Algebra for Probabilistic Programming Paper
  • Authors: Feynman T. Liang, Liam Hodgkinson, Michael W. Mahoney
  • Year: 2023
  • Venue: Neural Information Processing Systems
  • DOI: 10.48550/arXiv.2306.09262
  • Citations: 4
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: programming).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

Despite the successes of probabilistic models based on passing noise through neural networks, recent work has identified that such methods often fail to capture tail behavior accurately, unless the tails of the base distribution are appropriately calibrated. To overcome this deficiency, we propose a systematic approach for analyzing the tails of random variables, and we illustrate how this approach can be used during the static analysis (before drawing samples) pass of a probabilistic programming language compiler. To characterize how the tails change under various operations, we develop an algebra which acts on a three-parameter family of tail asymptotics and which is based on the generalized Gamma distribution. Our algebraic operations are closed under addition and multiplication; they are capable of distinguishing sub-Gaussians with differing scales; and they handle ratios sufficiently well to reproduce the tails of most important statistical distributions directly from their definitions. Our empirical results confirm that inference algorithms that leverage our heavy-tailed algebra attain superior performance across a number of density modeling and variational inference tasks.

Claim

Despite the successes of probabilistic models based on passing noise through neural networks, recent work has identified that such methods often fail to capture tail behavior accurately, unless the tails of the base distribution are appropriately calibrated.

UP-DP: Unsupervised Prompt Learning for Data Pre-Selection with Vision-Language Models Paper
  • Authors: Xin Li, Sima Behpour, T. Doan, Wenbin He, Liangke Gou, Liu Ren
  • Year: 2023
  • Venue: Neural Information Processing Systems
  • DOI: 10.48550/arXiv.2307.11227
  • Citations: 4
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: vision-language models (matched: vision language models, vision language model).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

In this study, we investigate the task of data pre-selection, which aims to select instances for labeling from an unlabeled dataset through a single pass, thereby optimizing performance for undefined downstream tasks with a limited annotation budget. Previous approaches to data pre-selection relied solely on visual features extracted from foundation models, such as CLIP and BLIP-2, but largely ignored the powerfulness of text features. In this work, we argue that, with proper design, the joint feature space of both vision and text can yield a better representation for data pre-selection. To this end, we introduce UP-DP, a simple yet effective unsupervised prompt learning approach that adapts vision-language models, like BLIP-2, for data pre-selection. Specifically, with the BLIP-2 parameters frozen, we train text prompts to extract the joint features with improved representation, ensuring a diverse cluster structure that covers the entire dataset. We extensively compare our method with the state-of-the-art using seven benchmark datasets in different settings, achieving up to a performance gain of 20%. Interestingly, the prompts learned from one dataset demonstrate significant generalizability and can be applied directly to enhance the feature extraction of BLIP-2 from other datasets. To the best of our knowledge, UP-DP is the first work to incorporate unsupervised prompt learning in a vision-language model for data pre-selection.

Claim

In this study, we investigate the task of data pre-selection, which aims to select instances for labeling from an unlabeled dataset through a single pass, thereby optimizing performance for undefined downstream tasks with a limited annotation budget.

Enhancing Robot Program Synthesis Through Environmental Context Paper
  • Authors: Tianyi Chen, Qidi Wang, Zhen Dong, Liwei Shen, Xin Peng
  • Year: 2023
  • Venue: Neural Information Processing Systems
  • DOI: 10.48550/arXiv.2312.08250
  • Citations: 4
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: programming, program synthesis).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

Program synthesis aims to automatically generate an executable program that conforms to the given specification. Recent advancements have demonstrated that deep neural methodologies and large-scale pretrained language models are highly proficient in capturing program semantics. For robot programming, prior works have facilitated program synthesis by incorporating global environments. However, the assumption of acquiring a comprehensive understanding of the entire environment is often excessively challenging to achieve. In this work, we present a framework that learns to synthesize a program by rectifying potentially erroneous code segments, with the aid of partially observed environments. To tackle the issue of inadequate attention to partial observations, we propose to first learn an environment embedding space that can implicitly evaluate the impacts of each program token based on the precondition. Furthermore, by employing a graph structure, the model can aggregate both environmental and syntactic information flow and furnish smooth program rectification guidance. Extensive experimental evaluations and ablation studies on the partially observed VizDoom domain authenticate that our method offers superior generalization capability across various tasks and greater robustness when encountering noises.

Claim

Program synthesis aims to automatically generate an executable program that conforms to the given specification.

Human spatiotemporal pattern learning as probabilistic program synthesis Paper
  • Authors: Tracey Mills, Josh Tenenbaum, Samuel J. Cheyette
  • Year: 2023
  • Venue: Neural Information Processing Systems
  • DOI: 10.52202/075280-2367
  • Citations: 4
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: programming, program synthesis).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

People are adept at learning a wide variety of structured patterns from small amounts of data, presenting a conundrum from the standpoint of the bias-variance tradeoff: what kinds of representations and algorithms support the joint flexibility and data-paucity of human learning? One possibility is that people “learn by programming”: inducing probabilistic models to fit observed data. Here, we experimentally test human learning in the domain of structured 2-dimensional patterns, using a task in which participants repeatedly predicted where a dot would move based on its previous trajectory. We evaluate human performance against standard parametric and non-parametric time-series models, as well as two Bayesian program synthesis models whose hypotheses vary in their degree of structure: a compositional Gaussian Process model and a structured “Language of Thought” (LoT) model. We find that signatures of human pattern learning are best explained by the LoT model, supporting the idea that the flexibility and data-efficiency of human structure learning can be understood as probabilistic inference over an expressive space of programs.

Claim

People are adept at learning a wide variety of structured patterns from small amounts of data, presenting a conundrum from the standpoint of the bias-variance tradeoff: what kinds of representations and algorithms support the joint flexibility and data-paucity of human learning? One possibility is that people “learn by programming”: inducing probabilistic models to fit observed data.

QuadAttack: A Quadratic Programming Approach to Ordered Top-K Attacks Paper
  • Authors: Thomas Paniagua, Ryan Grainger, Tianfu Wu
  • Year: 2023
  • Venue: Neural Information Processing Systems
  • DOI: 10.48550/arXiv.2312.11510
  • Citations: 0
  • Relevance: 3 / 5
  • Why selected: Heuristic keyword/alias matches: program synthesis (matched: programming).
  • Code: Not found.
  • Extraction: method/data pending

Abstract

The adversarial vulnerability of Deep Neural Networks (DNNs) has been well-known and widely concerned, often under the context of learning top-\(1\) attacks (e.g., fooling a DNN to classify a cat image as dog). This paper shows that the concern is much more serious by learning significantly more aggressive ordered top-\(K\) clear-box \footnote{ This is often referred to as white/black-box attacks in the literature. We choose to adopt neutral terminology, clear/opaque-box attacks in this paper, and omit the prefix clear-box for simplicity.} targeted attacks proposed in Adversarial Distillation. We propose a novel and rigorous quadratic programming (QP) method of learning ordered top-\(K\) attacks with low computing cost, dubbed as QuadAttac\(K\). Our QuadAttac\(K\) directly solves the QP to satisfy the attack constraint in the feature embedding space (i.e., the input space to the final linear classifier), which thus exploits the semantics of the feature embedding space (i.e., the principle of class coherence). With the optimized feature embedding vector perturbation, it then computes the adversarial perturbation in the data space via the vanilla one-step back-propagation. In experiments, the proposed QuadAttac\(K\) is tested in the ImageNet-1k classification using ResNet-50, DenseNet-121, and Vision Transformers (ViT-B and DEiT-S). It successfully pushes the boundary of successful ordered top-\(K\) attacks from \(K=10\) up to \(K=20\) at a cheap budget (\(1\times 60\)) and further improves attack success rates for \(K=5\) for all tested models, while retaining the performance for \(K=1\).

Claim

The adversarial vulnerability of Deep Neural Networks (DNNs) has been well-known and widely concerned, often under the context of learning top-\(1\) attacks (e.g., fooling a DNN to classify a cat image as dog).