dc.contributor.advisor | Rashid, Warida | |
dc.contributor.advisor | Islam, Riashat | |
dc.contributor.author | Khan, Mahir Asaf | |
dc.contributor.author | Ashraf, Adib | |
dc.contributor.author | Amin, Tahmid Adib | |
dc.date.accessioned | 2023-05-23T04:43:23Z | |
dc.date.available | 2023-05-23T04:43:23Z | |
dc.date.copyright | 2022 | |
dc.date.issued | 2022-05 | |
dc.identifier.other | ID 22141075 | |
dc.identifier.other | ID 20241063 | |
dc.identifier.other | ID 22141076 | |
dc.identifier.uri | http://hdl.handle.net/10361/18306 | |
dc.description | This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2022. | en_US |
dc.description | Cataloged from PDF version of thesis. | |
dc.description | Includes bibliographical references (pages 42-43). | |
dc.description.abstract | In this work we analyze control variates and baselines in policy optimization
methods in deep reinforcement learning (RL). There has recently been substantial
progress in policy gradient methods for deep RL, where baselines are typically used
for variance reduction. However, recent work on the "mirage" of state- and
state-action-dependent baselines in policy gradients calls this picture into question,
and it remains unclear what role control variates play in the optimization landscape
of policy gradients.
This work examines the landscape issues of policy optimization to determine whether
control variates serve only as a variance-reduction tool or whether they also play
a role in smoothing the optimization landscape. We further investigate the behavior
of the different optimizers used in deep RL experiments, and present ablation
studies of the interplay between control variates and optimizers in policy gradients
from an optimization perspective. | en_US |
dc.description.statementofresponsibility | Mahir Asaf Khan | |
dc.description.statementofresponsibility | Adib Ashraf | |
dc.description.statementofresponsibility | Tahmid Adib Amin | |
dc.format.extent | 43 pages | |
dc.language.iso | en | en_US |
dc.publisher | Brac University | en_US |
dc.rights | Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. | |
dc.subject | Optimization landscape | en_US |
dc.subject | Policy optimization | en_US |
dc.subject | Deep reinforcement learning | en_US |
dc.subject | Variance reduction | en_US |
dc.subject | Control variates | en_US |
dc.subject.lcsh | Cognitive learning theory | |
dc.subject.lcsh | Machine learning | |
dc.title | Analyzing optimization landscape of recent policy optimization methods in deep RL | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Department of Computer Science and Engineering, Brac University | |
dc.description.degree | B.Sc. in Computer Science | |