Analyzing the optimization landscape of recent policy optimization methods in deep RL
Abstract
In this work we will analyze control variates and baselines in policy optimization
methods in deep reinforcement learning (RL). There has recently been substantial
progress in policy gradient methods for deep RL, where baselines are typically used
for variance reduction. However, recent work on the mirage of state- and
state-action-dependent baselines has questioned how much of their benefit actually
comes from variance reduction. It therefore remains unclear what role control
variates play in the optimization landscape of policy gradients.
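As background, the standard score-function policy gradient with a state-dependent baseline $b(s_t)$ used as a control variate can be written as
\[
\nabla_\theta J(\theta) \;=\; \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\bigl(\hat{Q}(s_t, a_t) - b(s_t)\bigr)\right],
\]
which remains unbiased for any baseline that does not depend on $a_t$, since $\mathbb{E}_{a_t \sim \pi_\theta(\cdot \mid s_t)}\!\bigl[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\bigr] = 0$; the baseline changes only the statistical properties of the estimator, not its expectation.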
This work will examine the optimization landscape of policy optimization to determine
whether control variates serve only to reduce variance or whether they also play a
role in smoothing the optimization landscape. We will further investigate the behavior
of the different optimizers used in deep RL experiments, and conduct ablation studies
of the interplay between control variates and optimizers in policy gradients from an
optimization perspective.
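As a minimal illustration of the kind of comparison such an ablation involves (a hypothetical sketch with illustrative names and settings, not the experimental setup of this work), the following compares the empirical variance of score-function gradient estimates with and without a simple average-reward baseline on a toy two-armed bandit:

```python
import numpy as np

# Hypothetical sketch: compare the variance of score-function (REINFORCE)
# gradient estimates with and without a batch-mean reward baseline on a
# two-armed bandit with a Bernoulli-logit policy. All names and settings
# here are illustrative assumptions, not the paper's experimental setup.

rng = np.random.default_rng(0)
theta = 0.0                           # single logit parameter for P(action = 1)
true_rewards = np.array([1.0, 2.0])   # expected reward of each arm
n_samples = 10_000

def grad_estimates(theta, use_baseline):
    p1 = 1.0 / (1.0 + np.exp(-theta))              # probability of choosing arm 1
    actions = (rng.random(n_samples) < p1).astype(int)
    rewards = true_rewards[actions] + rng.normal(0.0, 0.5, n_samples)
    score = actions - p1                           # d/dtheta log pi(a) for Bernoulli-logit
    baseline = rewards.mean() if use_baseline else 0.0
    return score * (rewards - baseline)            # per-sample gradient estimates

for use_baseline in (False, True):
    g = grad_estimates(theta, use_baseline)
    print(f"baseline={use_baseline}:  mean grad = {g.mean():+.4f},  "
          f"variance = {g.var():.4f}")
```

On this toy problem both estimators have (essentially) the same mean, while the baselined one has markedly lower variance; the question this work targets is whether such control variates also change the shape of the optimization landscape that the optimizer actually traverses.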