dc.contributor.advisor | Rhaman, Md. Khalilur | |
dc.contributor.advisor | Mukta, Jannatun Noor | |
dc.contributor.author | Bhuiyan, Abir Ahammed | |
dc.contributor.author | Neha, Samiha Afaf | |
dc.contributor.author | Khan, Md. Ishrak | |
dc.date.accessioned | 2024-05-15T04:34:03Z | |
dc.date.available | 2024-05-15T04:34:03Z | |
dc.date.copyright | ©2024 | |
dc.date.issued | 2024-01 | |
dc.identifier.other | ID: 20101197 | |
dc.identifier.other | ID: 20101266 | |
dc.identifier.other | ID: 20101051 | |
dc.identifier.uri | http://hdl.handle.net/10361/22830 | |
dc.description | This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024. | en_US |
dc.description | Cataloged from PDF version of thesis. | |
dc.description | Includes bibliographical references (pages 51-54). | |
dc.description.abstract | microbial ecosystems. This has led to their increased utilization in several research
areas, such as bacterial genome engineering, phage therapy, disease diagnostics, and
viral host identification. The structure of phages is made up of proteins called phage
virion proteins (PVP). Classifying these proteins is important for genomic research,
which in turn helps us understand the complex interactions between phages and their
hosts in the context of making antibacterial drugs. Replacing the tedious traditional
procedures, a growing number of computational strategies are being employed to annotate
phage protein sequences acquired using high-throughput sequencing. Among
these techniques, deep learning approaches demonstrate improved performance in
classification outcomes. Such procedures require special sequence encodings for the
model to perceive the protein sequences with their distinctive features. Numerous
encoding schemes have been examined and assessed, while novel methods continue to emerge in
order to optimize the task in terms of resource utilization and prediction accuracy.
The objective of our work, ProteoKnight, is to explore and develop a unique encoding
technique for phage proteins and demonstrate its effectiveness via classification. In
our work, we make use of the time-separated PVP dataset introduced in [47]. Furthermore,
this study aims to address the lack of research conducted on uncertainty
analysis by exploring the domain of uncertainty in binary PVP classification using
the Monte Carlo Dropout (MCD) method. The experimental findings demonstrate the
effectiveness of our strategy for binary classification, achieving a prediction accuracy
of 90.2%. However, the accuracy for multi-class classification remains suboptimal.
Furthermore, our uncertainty analysis reveals that prediction confidence varies with
class and sequence length for our suggested classification approach. | en_US |
dc.description.statementofresponsibility | Abir Ahammed Bhuiyan | |
dc.description.statementofresponsibility | Samiha Afaf Neha | |
dc.description.statementofresponsibility | Md. Ishrak Khan | |
dc.format.extent | 68 pages | |
dc.language.iso | en | en_US |
dc.publisher | Brac University | en_US |
dc.rights | Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. | |
dc.subject | Phage virion | en_US |
dc.subject | Deep learning | en_US |
dc.subject | DNA-walk | en_US |
dc.subject | Monte Carlo dropout | en_US |
dc.subject | Convolutional neural network (CNN) | en_US |
dc.subject.lcsh | Neural networks (Computer science) | |
dc.subject.lcsh | Deep learning (Machine learning) | |
dc.title | ProteoKnight: phage virion protein classification with CNN and uncertainty quantification | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Department of Computer Science and Engineering, Brac University | |
dc.description.degree | B.Sc. in Computer Science | |