dc.contributor.advisor | Rashid, Warida | |
dc.contributor.advisor | Islam, Riashat | |
dc.contributor.author | Khan, Mahir Asaf | |
dc.contributor.author | Ashraf, Adib | |
dc.contributor.author | Amin, Tahmid Adib | |
dc.date.accessioned | 2023-05-23T04:43:23Z | |
dc.date.available | 2023-05-23T04:43:23Z | |
dc.date.copyright | 2022 | |
dc.date.issued | 2022-05 | |
dc.identifier.other | ID 22141075 | |
dc.identifier.other | ID 20241063 | |
dc.identifier.other | ID 22141076 | |
dc.identifier.uri | http://hdl.handle.net/10361/18306 | |
dc.description | This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2022. | en_US |
dc.description | Cataloged from PDF version of thesis. | |
dc.description | Includes bibliographical references (pages 42-43). | |
dc.description.abstract | In this work we analyze control variates and baselines in policy optimization
methods in deep reinforcement learning (RL). There has recently been substantial
progress in policy gradient methods for deep RL, where baselines are typically used
for variance reduction. However, recent work on the "mirage" of state- and
state-action-dependent baselines in policy gradients calls this picture into question,
and it remains unclear what role control variates play in the optimization landscape
of policy gradients.
This work examines the landscape issues of policy optimization to determine whether
control variates serve only as a variance-reduction tool or whether they also play
a role in smoothing the optimization landscape. We further investigate the behavior
of the different optimizers used in deep RL experiments, and present ablation
studies of the interplay between control variates and optimizers in policy gradients
from an optimization perspective. | en_US |
dc.description.statementofresponsibility | Mahir Asaf Khan | |
dc.description.statementofresponsibility | Adib Ashraf | |
dc.description.statementofresponsibility | Tahmid Adib Amin | |
dc.format.extent | 43 pages | |
dc.language.iso | en | en_US |
dc.publisher | Brac University | en_US |
dc.rights | Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. | |
dc.subject | Optimization landscape | en_US |
dc.subject | Policy optimization | en_US |
dc.subject | Deep reinforcement learning | en_US |
dc.subject | Variance reduction | en_US |
dc.subject | Control variates | en_US |
dc.subject.lcsh | Cognitive learning theory | |
dc.subject.lcsh | Machine learning | |
dc.title | Analyzing optimization landscape of recent policy optimization methods in deep RL | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Department of Computer Science and Engineering, Brac University | |
dc.description.degree | B.Sc. in Computer Science | |