Publications

Realistically Distributing Object Placements in Synthetic Training Data Improves the Performance of Vision-Based Object Detection Models

Published in CVPR 2023 Workshop, 2023

When training object detection models on synthetic data, it is important to make the distribution of the synthetic data as close as possible to the distribution of real data. We specifically investigate the impact of the object placement distribution, keeping all other aspects of the synthetic data fixed. Our experiment, training a 3D vehicle detection model in CARLA and testing on KITTI, demonstrates a substantial improvement in detection performance when the object placement distribution is made more realistic.
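
To make the idea concrete, here is a minimal sketch of matching a synthetic placement distribution to real data, assuming real placements are available as (x, y) arrays and using a kernel density estimate as the fitted distribution; it illustrates the principle, not the paper's CARLA pipeline.

```python
# Sketch: fit a density to real-world object placements and sample from it
# when spawning synthetic objects. Illustration of the principle, not the
# paper's pipeline; the placement arrays are stand-ins.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Stand-in for vehicle (x, y) placements extracted from a real dataset.
real_xy = rng.normal(loc=[10.0, 0.0], scale=[15.0, 3.0], size=(500, 2))

kde = gaussian_kde(real_xy.T)                # fitted placement distribution
synthetic_xy = kde.resample(100, seed=1).T   # placements for 100 synthetic vehicles,
print(synthetic_xy.shape)                    # drawn to match the real distribution
```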

Recommended citation: Dabiri, Setareh, et al. "Realistically Distributing Object Placements in Synthetic Training Data Improves the Performance of Vision-Based Object Detection Models." CVPR 2023 Workshop. https://arxiv.org/abs/2305.14621

Video Killed the HD-Map: Predicting Driving Behavior Directly From Drone Images

Published in IEEE ITSC, 2023

The development of algorithms that learn behavioral driving models using human demonstrations has led to increasingly realistic simulations. In general, such models learn to jointly predict trajectories for all controlled agents by exploiting road context information such as drivable lanes obtained from manually annotated high-definition (HD) maps. Recent studies show that these models can greatly benefit from increasing the amount of human data available for training. However, the manual annotation of HD maps which is necessary for every new location puts a bottleneck on efficiently scaling up human traffic datasets. We propose a drone birdview image-based map (DBM) representation that requires minimal annotation and provides rich road context information. We evaluate multi-agent trajectory prediction using the DBM by incorporating it into a differentiable driving simulator as an image-texture-based differentiable rendering module. Our results demonstrate competitive multi-agent trajectory prediction performance when using our DBM representation as compared to models trained with rasterized HD maps.
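
As a rough illustration of what an image-texture-based differentiable rendering module enables, the sketch below reads road context from a birdview image at (predicted) agent positions with bilinear interpolation, so gradients flow back to the positions; the tensor shapes and the use of grid_sample are assumptions for this toy, not the paper's renderer.

```python
# Sketch: differentiably reading road context from a drone birdview image by
# bilinear interpolation at agent positions. Shapes and the use of
# grid_sample are assumptions for this toy, not the paper's implementation.
import torch
import torch.nn.functional as F

birdview = torch.rand(1, 3, 256, 256)        # drone image used as a map texture
agents_xy = torch.tensor([[0.1, -0.2], [0.5, 0.3]], requires_grad=True)  # coords in [-1, 1]

grid = agents_xy.view(1, 1, -1, 2)           # grid_sample expects (N, H_out, W_out, 2)
context = F.grid_sample(birdview, grid, align_corners=True)  # (1, 3, 1, num_agents)
context.sum().backward()                     # gradients reach the agent positions ...
print(agents_xy.grad.shape)                  # ... enabling end-to-end trajectory training
```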

Recommended citation: Liu, Yunpeng, et al. "Video Killed the HD-Map: Predicting Driving Behavior Directly From Drone Images." IEEE ITSC 2023. https://arxiv.org/abs/2305.11856

Analyzing and Improving Greedy 2-Coordinate Updates for Equality-Constrained Optimization via Steepest Descent in the 1-Norm

Published as an arXiv preprint, 2023

We consider minimizing a smooth function subject to a summation constraint over its variables. By exploiting a connection between the greedy 2-coordinate update for this problem and equality-constrained steepest descent in the 1-norm, we give a convergence rate for greedy selection under a proximal Polyak-Lojasiewicz assumption that is faster than random selection and independent of the problem dimension n. We then consider minimizing with both a summation constraint and bound constraints, as arises in the support vector machine dual problem. Existing greedy rules for this setting either guarantee only trivial progress or require O(n^2) time to compute. We show that bound- and summation-constrained steepest descent in the 1-norm guarantees more progress per iteration than previous rules and can be computed in only O(n log n) time.
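
A minimal sketch of the greedy 2-coordinate update on a toy quadratic, assuming a fixed step size (the paper analyzes more refined steps): moving the coordinates with the largest and smallest partial derivatives by equal and opposite amounts preserves the summation constraint.

```python
# Sketch: greedy 2-coordinate update for min f(x) subject to sum(x) = const,
# on a toy quadratic with a fixed step size (an assumption; the paper uses
# more refined steps). Equal-and-opposite moves preserve the constraint.
import numpy as np

rng = np.random.default_rng(0)
n = 20
M = rng.normal(size=(n, n))
A = M @ M.T + np.eye(n)                  # positive-definite Hessian
b = rng.normal(size=n)
x = np.full(n, 1.0 / n)                  # feasible start: sum(x) = 1

for _ in range(200):
    g = A @ x - b                        # gradient of f(x) = 0.5 x^T A x - b^T x
    i, j = np.argmax(g), np.argmin(g)    # greedy pair: largest and smallest partials
    delta = 0.01 * (g[i] - g[j]) / 2.0
    x[i] -= delta                        # equal and opposite updates ...
    x[j] += delta                        # ... keep sum(x) exactly fixed
print(round(float(x.sum()), 6))          # still 1.0
```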

Recommended citation: Ramesh, Amrutha, et al. "Analyzing and Improving Greedy 2-Coordinate Updates for Equality-Constrained Optimization via Steepest Descent in the 1-Norm." arXiv preprint, 2023. https://arxiv.org/abs/2307.01169

Conditional Permutation Invariant Flows

Published in TMLR, 2023

We present a novel, conditional generative probabilistic model of set-valued data with a tractable log density. This model is a continuous normalizing flow governed by permutation equivariant dynamics. These dynamics are driven by a learnable per-set-element term and pairwise interactions, both parametrized by deep neural networks. We illustrate the utility of this model via applications including (1) complex traffic scene generation conditioned on visually specified map information, and (2) object bounding box generation conditioned directly on images. We train our model by maximizing the expected likelihood of labeled conditional data under our flow, with the aid of a penalty that ensures the dynamics are smooth and hence efficiently solvable. Our method significantly outperforms non-permutation invariant baselines in terms of log likelihood and domain-specific metrics (offroad, collision, and combined infractions), yielding realistic samples that are difficult to distinguish from real data.
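
The permutation-equivariant structure is easy to state in code. Below is a hypothetical minimal module with a per-element term and averaged pairwise interactions, not the paper's architecture; the check at the end verifies the equivariance property that matters.

```python
# Hypothetical minimal permutation-equivariant dynamics: a per-element term
# plus averaged pairwise interactions, both small MLPs. Not the paper's
# architecture; the final check verifies equivariance.
import torch
import torch.nn as nn

class EquivariantDynamics(nn.Module):
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.per_element = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))
        self.pairwise = nn.Sequential(nn.Linear(2 * dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, x):                       # x: (set_size, dim)
        n = x.shape[0]
        xi = x.unsqueeze(1).expand(n, n, -1)    # xi[i, j] = x[i]
        xj = x.unsqueeze(0).expand(n, n, -1)    # xj[i, j] = x[j]
        pair = self.pairwise(torch.cat([xi, xj], dim=-1)).mean(dim=1)  # average over partners j
        return self.per_element(x) + pair       # permuting the set rows permutes the output rows

f = EquivariantDynamics(dim=2)
x = torch.randn(5, 2)
perm = torch.randperm(5)
print(torch.allclose(f(x)[perm], f(x[perm]), atol=1e-5))  # True
```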

Recommended citation: Zwartsenberg, Berend, et al. "Conditional Permutation Invariant Flows." TMLR 2023. https://openreview.net/forum?id=DUsgPi3oCC

A Diffusion-Model of Joint Interactive Navigation

Published in NeurIPS, 2023

Simulation of autonomous vehicle systems requires that simulated traffic participants exhibit diverse and realistic behaviors. The use of prerecorded real-world traffic scenarios in simulation ensures realism but the rarity of safety critical events makes large scale collection of driving scenarios expensive. In this paper, we present DJINN, a diffusion-based method for generating traffic scenarios. Our approach jointly diffuses the trajectories of all agents, conditioned on a flexible set of state observations from the past, present, or future. On popular trajectory forecasting datasets, we report state-of-the-art performance on joint trajectory metrics. In addition, we demonstrate how DJINN flexibly enables direct test-time sampling from a variety of valuable conditional distributions including goal-based sampling, behavior-class sampling, and scenario editing.
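
For intuition, here is a generic sketch of conditioning a trajectory diffusion sampler on known states by re-imposing them at every denoising step (inpainting-style). DJINN's conditioning is learned rather than hard-clamped, and the denoiser below is a stand-in.

```python
# Generic inpainting-style conditioning for a trajectory diffusion sampler:
# re-impose the observed states after every denoising step. DJINN's
# conditioning is learned, and the denoiser here is a stand-in.
import numpy as np

rng = np.random.default_rng(0)
agents, T, steps = 3, 20, 50             # agents, trajectory length, diffusion steps
obs_mask = np.zeros((agents, T), dtype=bool)
obs_mask[:, 0] = True                    # condition on each agent's current state
obs = np.zeros((agents, T))
obs[:, 0] = [1.0, -2.0, 0.5]             # the known values

def denoise(x, t):                       # stand-in for a learned joint denoiser
    return x * (t / (t + 1.0))

x = rng.normal(size=(agents, T))         # start from noise
for t in reversed(range(steps)):
    x = denoise(x, t)
    x[obs_mask] = obs[obs_mask]          # clamp observed past/present/future states
print(x[:, 0])                           # [ 1.  -2.   0.5]: conditioning is exact
```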

Recommended citation: Niedoba, Matthew, et al. "A Diffusion-Model of Joint Interactive Navigation." NeurIPS 2023. https://openreview.net/pdf?id=2yXExAl0FW

Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, But Sign Descent Might Be

Published in ICLR, 2023

The success of the Adam optimizer on a wide array of architectures has made it the default in settings where stochastic gradient descent (SGD) performs poorly. However, our theoretical understanding of this discrepancy is lagging, preventing the development of significant improvements on either algorithm. Recent work advances the hypothesis that Adam and other heuristics like gradient clipping outperform SGD on language tasks because the distribution of the error induced by sampling has heavy tails. This suggests that Adam outperforms SGD because it uses a more robust gradient estimate. We evaluate this hypothesis by varying the batch size, up to the entire dataset, to control for stochasticity. We present evidence that stochasticity and heavy-tailed noise are not major factors in the performance gap between SGD and Adam. Rather, Adam performs better as the batch size increases, while SGD is less effective at taking advantage of the reduction in noise. This raises the question as to why Adam outperforms SGD in the full-batch setting. Through further investigation of simpler variants of SGD, we find that the behavior of Adam with large batches is similar to sign descent with momentum.
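
For reference, a minimal sketch of sign descent with momentum, the simple method whose large-batch behavior the paper finds Adam resembles; the quadratic test problem and step-size schedule are illustrative assumptions.

```python
# Sketch: sign descent with momentum on a badly scaled quadratic.
# The test problem and step-size schedule are illustrative assumptions.
import numpy as np

def sign_descent_momentum(grad_fn, x, lr=0.5, beta=0.9, steps=200):
    m = np.zeros_like(x)
    for t in range(steps):
        m = beta * m + (1 - beta) * grad_fn(x)
        x = x - (lr / np.sqrt(t + 1)) * np.sign(m)  # only the sign of the momentum is used
    return x

A = np.diag([100.0, 1.0])  # 100:1 curvature gap: f(x) = 0.5 * x^T A x
x = sign_descent_momentum(lambda v: A @ v, np.array([1.0, 1.0]))
print(np.round(x, 2))      # both coordinates end near 0 despite the conditioning gap
```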

Recommended citation: Kunstner, Frederik, et al. "Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, But Sign Descent Might Be." ICLR 2023. https://arxiv.org/abs/2304.13960

Target-based Surrogates for Stochastic Optimization

Published in ICML, 2023

We consider minimizing functions for which it is expensive to compute the (possibly stochastic) gradient. Such functions are prevalent in reinforcement learning, imitation learning and adversarial training. Our target optimization framework uses the (expensive) gradient computation to construct surrogate functions in a target space (e.g. the logits output by a linear model for classification) that can be minimized efficiently. This allows for multiple parameter updates to the model, amortizing the cost of gradient computation. In the full-batch setting, we prove that our surrogate is a global upper-bound on the loss, and can be (locally) minimized using a black-box optimization algorithm. We prove that the resulting majorization-minimization algorithm ensures convergence to a stationary point of the loss. Next, we instantiate our framework in the stochastic setting and propose the SSO algorithm, which can be viewed as projected stochastic gradient descent in the target space. This connection enables us to prove theoretical guarantees for SSO when minimizing convex functions. Our framework allows the use of standard stochastic optimization algorithms to construct surrogates which can be minimized by any deterministic optimization method. To evaluate our framework, we consider a suite of supervised learning and imitation learning problems. Our experiments indicate the benefits of target optimization and the effectiveness of SSO.
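
A minimal sketch of the target-optimization idea for a linear model, under assumed step sizes: one (expensive) gradient at the current targets defines a quadratic surrogate in target space, which is then minimized with several cheap parameter updates. This illustrates the framework, not the paper's SSO implementation.

```python
# Sketch: target-space surrogate minimization for a linear model z = X @ W.
# Assumed step sizes; an illustration of the framework, not the paper's SSO code.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.normal(size=100)
W = np.zeros(5)
eta = 0.5  # surrogate curvature; a global upper bound needs eta <= 1/L of the loss

for outer in range(10):
    z0 = X @ W
    g = z0 - y                          # expensive gradient d(loss)/dz, computed once
    # Surrogate: s(W') = <g, X W' - z0> + ||X W' - z0||^2 / (2 * eta)
    for inner in range(5):              # several cheap updates against the fixed surrogate
        z = X @ W
        W -= 0.001 * X.T @ (g + (z - z0) / eta)

print(round(float(np.mean((X @ W - y) ** 2)), 3))  # training MSE after 10 gradient evals
```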

Recommended citation: Lavington, Jonathan, et al. "Target-based Surrogates for Stochastic Optimization." ICML 2023. https://proceedings.mlr.press/v202/lavington23a/lavington23a.pdf

Critic Sequential Monte Carlo

Published in ICLR, 2023

We introduce CriticSMC, a new algorithm for planning as inference built from a composition of sequential Monte Carlo (SMC) with learned soft Q-function heuristic factors. These heuristic factors, obtained from parametric approximations of the marginal likelihood ahead, more effectively guide SMC towards the desired target distribution, which is particularly helpful for planning in environments with hard constraints placed sparsely in time. Compared with previous work, we modify the placement of such heuristic factors, which allows us to cheaply propose and evaluate large numbers of putative action particles, greatly increasing inference and planning efficiency. CriticSMC is compatible with informative priors, whose density function need not be known, and can be used as a model-free control algorithm. Our experiments on collision avoidance in a high-dimensional simulated driving task show that CriticSMC significantly reduces collision rates at a low computational cost while maintaining realism and diversity of driving behaviors across vehicles and environment scenarios.
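
A toy sketch of the key move, with a hand-coded stand-in for the learned soft Q-function: each particle scores many cheap putative actions with the heuristic factor and samples among them, so no environment step is wasted on obviously bad actions.

```python
# Toy CriticSMC-style step: each particle proposes many cheap putative
# actions, scores them with a (here hand-coded) soft Q heuristic, and samples
# among them before stepping the environment. Illustrative 1-D example.
import numpy as np

rng = np.random.default_rng(0)
K, A, T = 64, 16, 10                     # particles, putative actions each, horizon

def soft_q(x, a):                        # stand-in for a learned soft Q-function:
    return np.where(np.abs(x + a) < 0.5, -10.0, 0.0)   # penalize landing on the obstacle at 0

x = np.full(K, -3.0)                     # all particles start left of the obstacle
for _ in range(T):
    a = rng.normal(0.4, 0.3, size=(K, A))        # putative actions from the prior
    logw = soft_q(x[:, None], a)                 # heuristic factors: no env step needed
    p = np.exp(logw - logw.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    picks = np.array([rng.choice(A, p=row) for row in p])
    x = x + a[np.arange(K), picks]               # advance each particle with its chosen action
print(float(np.mean(np.abs(x) < 0.5)))           # collision fraction stays near 0
```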

Recommended citation: Lioutas, Vasileios, et al. "Critic Sequential Monte Carlo." ICLR 2023. https://openreview.net/pdf?id=ObtGcyKmwna

Vehicle Type Specific Waypoint Generation

Published in IROS, 2022

We develop a generic mechanism for generating vehicle-type specific sequences of waypoints from a probabilistic foundation model of driving behavior. Many foundation behavior models are trained on data that does not include vehicle information, which limits their utility in downstream applications such as planning. Our novel methodology conditionally specializes such a behavior predictive model to a vehicle type by utilizing byproducts of the reinforcement learning algorithms used to produce vehicle-specific controllers. We show how to compose a vehicle-specific value function estimate with a generic probabilistic behavior model to generate vehicle-type specific waypoint sequences that are more likely to be physically plausible than their vehicle-agnostic counterparts.
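
In sketch form, the composition might look as follows, with hypothetical stand-ins for both the behavior prior and the vehicle-specific value function: sample candidate waypoints from the generic model, then reweight and resample by the value estimate.

```python
# Sketch: compose a generic behavior prior with a vehicle-specific value
# estimate by importance-reweighting sampled waypoints. Both functions are
# hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(0)

def behavior_prior(n):                    # stand-in for a learned waypoint model
    return rng.normal(0.0, 1.0, size=(n, 2))   # candidate (heading, speed) offsets

def bus_value(w):                         # stand-in for an RL-derived value function
    return -5.0 * np.abs(w[:, 0])         # sharp heading changes are low-value for a bus

candidates = behavior_prior(1000)
logw = bus_value(candidates)
p = np.exp(logw - logw.max())
p /= p.sum()
kept = candidates[rng.choice(len(candidates), size=100, p=p)]
# Resampled waypoints make smaller heading changes than the raw prior samples:
print(np.abs(kept[:, 0]).mean() < np.abs(candidates[:, 0]).mean())  # True
```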

Recommended citation: Liu, Yunpeng, et al. "Vehicle Type Specific Waypoint Generation." IROS 2022. https://ieeexplore.ieee.org/abstract/document/9981421

Improved Policy Optimization for Online Imitation Learning

Published in CoLLAs, 2022

We consider online imitation learning (OIL), where the task is to find a policy that imitates the behavior of an expert via active interaction with the environment. We aim to bridge the gap between the theory and practice of policy optimization algorithms for OIL by analyzing one of the most popular OIL algorithms, DAGGER. Specifically, if the class of policies is sufficiently expressive to contain the expert policy, we prove that DAGGER achieves constant regret. Unlike previous bounds that require the losses to be strongly convex, our result only requires the weaker assumption that the losses be strongly convex with respect to the policy's sufficient statistics (not its parameterization). In order to ensure convergence for a wider class of policies and losses, we augment DAGGER with an additional regularization term. In particular, we propose a variant of Follow-the-Regularized-Leader (FTRL) and its adaptive variant for OIL and develop a memory-efficient implementation, which matches the memory requirements of Follow-the-Leader (FTL). Assuming that the loss functions are smooth and convex with respect to the parameters of the policy, we also prove that FTRL achieves constant regret for any sufficiently expressive policy class, while retaining O(sqrt(T)) regret in the worst case. We demonstrate the effectiveness of these algorithms with experiments on synthetic and high-dimensional control tasks.
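
For context, a toy sketch of the DAGGER loop the analysis concerns (roll out the current policy, label visited states with the expert, refit on the aggregate dataset); the 1-D linear control problem is an assumption for illustration, and the paper's regularized FTRL variant is not shown.

```python
# Toy DAGGER loop on a 1-D linear control problem: roll out the current
# policy, label the visited states with the expert, refit on all data so far.
# Illustration only; the paper's regularized (FTRL) variant is not shown.
import numpy as np

rng = np.random.default_rng(0)
expert = lambda s: -0.5 * s                  # expert drives the state toward 0
theta = 0.0                                  # linear policy: a = theta * s
S, A = [], []

for _ in range(10):                          # DAGGER rounds
    s = rng.normal(0.0, 2.0)
    for _ in range(20):                      # roll out the *current* policy ...
        S.append(s)
        A.append(expert(s))                  # ... but label states with the expert
        s = np.clip(s + theta * s + rng.normal(0.0, 0.1), -10, 10)
    Sa, Aa = np.array(S), np.array(A)
    theta = float(Sa @ Aa / (Sa @ Sa))       # least-squares refit on the aggregate dataset
print(round(theta, 2))                       # recovers the expert's -0.5
```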

Recommended citation: Lavington, Jonathan, et al. "Improved Policy Optimization for Online Imitation Learning." CoLLAs 2022. https://proceedings.mlr.press/v199/lavington22a/lavington22a.pdf

A Closer Look at Gradient Estimators with Reinforcement Learning as Inference

Published in NeurIPS 2021 Deep RL Workshop, 2021

The concept of reinforcement learning as inference (RLAI) has led to the creation of a variety of popular algorithms in deep reinforcement learning. Unfortunately, most research in this area relies on wider algorithmic innovations not necessarily relevant to such frameworks. Additionally, many seemingly unimportant modifications made to these algorithms actually produce inconsistencies with the original inference problem posed by RLAI. Taking a divergence minimization perspective, this work considers some of the practical merits and theoretical issues created by the choice of loss function minimized in the policy update in off-policy reinforcement learning. Our results show that while the choice of divergence rarely has a major effect on the sample efficiency of the algorithm, it can have important practical repercussions on ease of implementation, computational efficiency, and restrictions on the distribution over actions.
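
A toy single-state example of the design choice under study, assuming a hand-coded Q and a Gaussian policy: the reverse-KL update below is the common off-policy choice; a forward-KL update would instead weight log-probabilities by importance weights on exp(Q).

```python
# Toy single-state example: fit a Gaussian policy to the target exp(Q)/Z by
# minimizing the reverse KL. Q is hand-coded; illustrative only.
import torch

Q = lambda a: -((a - 1.0) ** 2)                  # optimal action at a = 1
mu = torch.zeros(1, requires_grad=True)
log_std = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([mu, log_std], lr=0.05)

for _ in range(500):
    pi = torch.distributions.Normal(mu, log_std.exp())
    a = pi.rsample((256,))                       # reparameterized samples
    loss = (pi.log_prob(a) - Q(a)).mean()        # reverse KL(pi || exp(Q)/Z) up to a constant
    opt.zero_grad()
    loss.backward()
    opt.step()
print(round(mu.item(), 2))                       # close to 1.0
```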

Recommended citation: Lavington, Jonathan, et al. "A Closer Look at Gradient Estimators with Reinforcement Learning as Inference." NeurIPS 2021 Deep RL Workshop. https://openreview.net/pdf?id=bR0K-nz1-6p

Robust Asymmetric Learning in POMDPs

Published in ICML, 2021

Policies for partially observed Markov decision processes can be efficiently learned by imitating policies for the corresponding fully observed Markov decision processes. Unfortunately, existing approaches for this kind of imitation learning have a serious flaw: the expert does not know what the trainee cannot see, and so may encourage actions that are sub-optimal, even unsafe, under partial information. We derive an objective to instead train the expert to maximize the expected reward of the imitating agent policy, and use it to construct an efficient algorithm, adaptive asymmetric DAgger (A2D), that jointly trains the expert and the agent. We show that A2D produces an expert policy that the agent can safely imitate, in turn outperforming policies learned by imitating a fixed expert.
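
The flaw A2D addresses can be seen in a few lines, using hypothetical toy numbers: when the expert acts on a variable the agent cannot observe, naive imitation regresses to the conditional mean of the expert's actions, which can be exactly the wrong behavior.

```python
# Sketch of the pitfall A2D fixes, with toy numbers: the expert conditions on
# a hidden variable, so the best imitation under partial observation is the
# conditional mean of the expert's actions.
import numpy as np

rng = np.random.default_rng(0)
h = rng.choice([-1.0, 1.0], size=10_000)  # hidden state the agent never observes
expert_action = h                          # expert swerves left or right using h
agent_action = expert_action.mean()        # imitation target given an uninformative observation
print(round(float(agent_action), 2))       # ~0.0: "split the difference", which neither
                                           # swerve direction would ever recommend
```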

Recommended citation: Warrington, Andrew, et al. "Robust Asymmetric Learning in POMDPs." ICML 2021. https://proceedings.mlr.press/v139/warrington21a.html

An Empirical Study of Non-Uniform Sampling in Off-Policy Reinforcement Learning for Continuous Control

Published in NeurIPS Workshop, 2021

Off-policy reinforcement learning (RL) algorithms can take advantage of samples generated from all previous interactions with the environment through "experience replay". Such methods outperform almost all on-policy and model-based alternatives in complex tasks where a structured or well-parameterized model of the world does not exist. This makes them desirable for practitioners who lack domain-specific knowledge but still require high sample efficiency. However, this high performance can come at a cost. Because of the additional hyperparameters introduced to efficiently learn function approximators, off-policy RL can perform poorly on new problems. To address this parameter sensitivity, we show how the correct choice of non-uniform sampling for experience replay can stabilize model performance under varying environmental conditions and hyperparameters.
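
As one concrete instance of non-uniform sampling, here is a generic prioritized-replay sketch with assumed hyperparameter values (the paper compares several such schemes): sampling probabilities follow transition priorities, and importance weights correct the induced bias.

```python
# Sketch: prioritized sampling from a replay buffer with bias-correcting
# importance weights. Hyperparameters (alpha, beta, eps) are assumed values.
import numpy as np

rng = np.random.default_rng(0)
td_errors = np.abs(rng.normal(size=1000))   # stand-in per-transition TD errors
alpha, beta, eps = 0.6, 0.4, 1e-3

p = (td_errors + eps) ** alpha              # priority -> unnormalized probability
p /= p.sum()
idx = rng.choice(len(p), size=64, p=p)      # non-uniform minibatch of transitions
is_w = (len(p) * p[idx]) ** -beta           # importance weights correct the bias
is_w /= is_w.max()                          # common normalization for stability
print(idx[:5], np.round(is_w[:5], 3))
```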

Recommended citation: Ioannidis, Nicholas, et al. "An Empirical Study of Non-Uniform Sampling in Off-Policy Reinforcement Learning for Continuous Control." NeurIPS Workshop 2021. https://openreview.net/pdf?id=KvDedKtOX7B

A Probabilistic Modeling Approach to CRISPR-Cas9

Published as a Master's Thesis, University of Colorado at Boulder, Department of Applied Mathematics, 2018

CRISPR-Cas, a particular type of microbial immune response system, has in recent years been modified to make precise changes to an organism's DNA. In the early 2000s, scientists discovered through the study of Streptococcus pyogenes that a unique CRISPR locus (Cas9) exhibited specific RNA-guided cleavage near short trinucleotide motifs (PAMs). Further research on Cas9 eventually led researchers to create methods that actively edit genomes through Cas9-dependent cleavage and to manipulate transcription of genes through engineered nuclease-deficient Cas9 (dCas9). These techniques have enabled new avenues for analyzing existing gene functions or engineering new ones, manipulating gene expression, gene therapy, and much more. While great strides have been made over the last decade, CRISPR is still prone to inaccuracies, which often produce sub-optimal editing efficiency or off-target effects. The primary interest of this thesis is the investigation of targeting efficiency with respect to changes in guide RNA (gRNA) composition. While many different factors affect the ability with which a given gRNA can target a DNA sequence, we have focused our research primarily on the formation of the R-loop: the hybrid structure formed when the Cas9/dCas9:gRNA complex binds to a host DNA site. In our investigation, we have attempted to account for several experimental findings reported in the literature as influential for binding efficiency. These include position dependence, base-pair composition dependence, and the effects of runs of consecutive mismatches. Using a Gambler's Ruin Markov model to mimic the process of R-loop formation, we fit our model to experimental data and show that the match/mismatch configuration between the gRNA and the DNA target allows for accurate predictions of R-loop formation in bacteria.
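
A minimal sketch of the gambler's-ruin view, with hypothetical transition probabilities (the thesis fits these to experimental data): the R-loop grows or collapses one base at a time, and the forward probability at each position depends on the gRNA/DNA match there.

```python
# Sketch: gambler's-ruin model of R-loop formation with hypothetical
# transition probabilities; the thesis fits these to experimental data.
import numpy as np

rng = np.random.default_rng(0)
match = np.ones(20, dtype=bool)
match[[5, 6]] = False                   # a run of mismatches at positions 5-6

def p_forward(i):
    return 0.7 if match[i] else 0.3     # mismatches bias the walk toward collapse

def r_loop_completes():
    pos = 0                             # 0 = PAM-proximal end, 20 = full R-loop
    while 0 <= pos < 20:
        pos += 1 if rng.random() < p_forward(pos) else -1
    return pos >= 20

trials = [r_loop_completes() for _ in range(2000)]
print(np.mean(trials))                  # estimated probability of complete R-loop formation
```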

Recommended citation: Lavington, Jonathan. "A Probabilistic Modeling Approach to CRISPR-Cas9." Master's Thesis, University of Colorado at Boulder, 2018. https://www.proquest.com/openview/da3b1b61b3fd2f0ca38a3418172867d9/1?pq-origsite=gscholar&cbl=18750