Aritra Mitra, North Carolina State University
Towards robust collaborative reinforcement learning

April 8, 2024, 2pm; EEB 132

Host: Lars Lindemann

Abstract

Modern large-scale learning paradigms rely on leveraging data from multiple agents to improve performance. However, to reap the benefits of more data, one must contend with a communication bottleneck: the channels linking agents are noisy, lossy, and rate-limited, and can introduce delays. While communication efficiency has been studied extensively in the context of supervised learning, far less is known for sequential decision-making under uncertainty. In this talk, I will share recent results that help bridge this gap.

We will start with a simple, new finite-time analysis of the celebrated Temporal Difference (TD) learning algorithm with linear function approximation for the policy evaluation problem. The core of our argument is an inductive proof that the iterates generated by TD learning remain uniformly bounded in expectation. Building on this inductive technique, we will then provide the first finite-time analysis of contractive stochastic approximation schemes with time-varying delays under Markovian sampling. Notably, our bound is tight in its dependence on both the maximum delay and the mixing time of the underlying Markov chain.
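As a rough illustration of the algorithm under study (the random-walk environment, feature map, and constant step size below are illustrative assumptions, not the setup analyzed in the talk), a minimal TD(0) update with linear function approximation can be sketched in Python as follows:

    import numpy as np

    # Sketch of TD(0) with linear function approximation for policy
    # evaluation: V(s) is approximated as phi(s)^T theta.
    # The random-walk chain, Gaussian features, reward, and step size
    # are illustrative assumptions for this sketch only.

    rng = np.random.default_rng(0)
    n_states, d = 10, 4
    gamma, alpha = 0.9, 0.05                  # discount factor, step size
    Phi = rng.standard_normal((n_states, d))  # feature vector phi(s) per state
    theta = np.zeros(d)                       # linear weights to learn

    s = rng.integers(n_states)
    for t in range(5000):
        # Markovian sampling: next state from a simple random walk.
        s_next = (s + rng.choice([-1, 1])) % n_states
        r = 1.0 if s_next == 0 else 0.0       # illustrative reward
        # TD error: r + gamma * V(s') - V(s), with V(s) = phi(s)^T theta.
        delta = r + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        theta += alpha * delta * Phi[s]       # stochastic approximation step
        s = s_next

The inductive argument mentioned above concerns exactly these theta iterates: one shows their expected norm stays uniformly bounded along the trajectory, which then feeds into the finite-time convergence bound.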

In the second half of the talk, we will turn to model-free control over rate-limited channels, studying a setting in which a worker agent transmits quantized policy gradients of the LQR cost to a server over a noiseless channel with a finite bit rate. For this setting, we will propose a new algorithm called Adaptively Quantized Gradient Descent (AQGD), and prove that, above a certain finite bit-rate threshold, AQGD guarantees exponentially fast convergence to the globally optimal policy, with no deterioration of the convergence exponent relative to the unquantized setting. More generally, our approach reveals the benefits of adaptive quantization in preserving fast linear convergence rates and, as such, may be of independent interest to the literature on compressed optimization. Overall, our work contributes toward enabling multi-agent reinforcement learning in harsh, communication-constrained environments.
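To convey the flavor of adaptive quantization (a minimal sketch on a generic strongly convex quadratic, with an illustrative bit budget and shrink schedule; this is not the AQGD algorithm itself), consider a worker that transmits b-bit quantized gradients through a quantizer whose range contracts geometrically in sync with the server's iterates:

    import numpy as np

    # Sketch of the adaptive-quantization idea: the worker sends each
    # gradient through a b-bit uniform quantizer whose range shrinks
    # geometrically, so the quantization error decays at the same linear
    # rate as the optimization error. The quadratic objective, bit budget,
    # and shrink factor below are illustrative assumptions.

    def quantize(v, radius, bits):
        """Map each entry of v to one of 2**bits levels on [-radius, radius]."""
        levels = 2 ** bits - 1
        clipped = np.clip(v, -radius, radius)
        codes = np.round((clipped + radius) / (2 * radius) * levels)
        return codes / levels * (2 * radius) - radius  # server-side decode

    rng = np.random.default_rng(0)
    d, bits = 5, 8
    B = rng.standard_normal((d, d))
    A = B @ B.T + np.eye(d)                  # strongly convex quadratic cost
    x_star = rng.standard_normal(d)
    grad = lambda x: A @ (x - x_star)

    x = np.zeros(d)
    eta = 1.0 / np.linalg.eigvalsh(A)[-1]    # step size 1/L
    radius = 2 * np.linalg.norm(grad(x), np.inf)
    for t in range(500):
        g_hat = quantize(grad(x), radius, bits)  # worker -> server: finite bits
        x = x - eta * g_hat                      # server's descent step
        radius *= 0.99                           # both sides shrink the range in sync
    print("distance to optimum:", np.linalg.norm(x - x_star))

Because both sides shrink the quantizer range at a geometric rate, the quantization error stays proportional to the shrinking optimization error; this is the mechanism by which a linear convergence rate can survive quantization.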

Biosketch

Aritra Mitra is currently an Assistant Professor in the Department of Electrical and Computer Engineering at North Carolina State University. His research interests include control theory, optimization, statistical signal processing, machine learning, and distributed algorithms. Previously, he was a Postdoctoral Researcher at the University of Pennsylvania from 2020 to 2022. Prior to that, he received his Ph.D. from Purdue University in 2020, his M.Tech. from the Indian Institute of Technology Kanpur in 2015, and his B.E. from Jadavpur University in 2013, all in Electrical Engineering. He received the University Gold Medal at Jadavpur University and the Academic Excellence Award at IIT Kanpur.

Acknowledgement: The seminar series is supported by the Ming Hsieh Institute and Quanser.