skip to main content

Linde Institute/Social and Information Sciences Laboratory (SISL) Seminar

Friday, May 9, 2014
12:00pm to 1:00pm
Add to Cal
Baxter 127
The Art of Optimizing via Simulations: Empirical Dynamic Programming
Rahul Jain, Associate Professor, Kenneth C. Dahlberg Early Career Chair, Viterbi School of Engineering, University of Southern California,

In this talk, I will talk about a natural framework for simulation-based optimization and control of Markov decision process (MDP) models. The idea is very simple: Replace the Bellman operator by its `empirical' variant wherein Expectation is replaced by a sample average approximation. This leads to a random Bellman operator in the dynamic programming equations. We introduce several notions of probabilistic fixed points of such random operators, and show their asymptotic equivalence.  We establish convergence of empirical Value and Policy Iteration algorithms by a stochastic dominance argument. The mathematical technique introduced is useful for analyzing other iterated random operators (than just the empirical Bellman operator), and may also be useful in random matrix theory. The idea can be generalized to asynchronous dynamic programming, and is also useful for computing equilibria of zero-sum stochastic games. Preliminary numerical results show better convergence rate than stochastic approximation/reinforcement learning schemes. If time permits, I will also briefly talk about Blackwell's approachability for MDPs and Stochastic games. In particular, a learning scheme for approachability in MDPs and games.

This is joint work with Dileep Kalathil, William Haskell and Vivek Borkar.

For more information, please contact Jenny Niese by phone at Ext. 6010 or by email at [email protected].