
dc.contributor.advisor: Sadeque, Farig Yousuf
dc.contributor.author: Adib, Quazi Adibur Rahman
dc.contributor.author: Alam, Sanjana Binte
dc.date.accessioned: 2024-05-26T02:57:33Z
dc.date.available: 2024-05-26T02:57:33Z
dc.date.copyright: ©2024
dc.date.issued: 2024-01
dc.identifier.other: ID: 21241056
dc.identifier.other: ID: 20301455
dc.identifier.uri: http://hdl.handle.net/10361/22910
dc.description: This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024. [en_US]
dc.description: Cataloged from PDF version of thesis.
dc.description: Includes bibliographical references (pages 44-47).
dc.description.abstract: Despite significant improvements in general-purpose text summarization over the past decade, clinical conversation summarization has lagged because few open-source datasets have been released to the NLP community. In this work, we present the first long and short Bangla clinical dialogue-to-note summarization datasets: BnClinical-Sum. Long conversations are detailed conversations that include additional medical history. For the long dialogue dataset, we have collected around 207 pairs of full conversations and notes. Each note consists of in-depth discussions of previous medical histories, family medical records, and a wide variety of other topics. The short dialogue dataset consists of 1701 real-life, manually translated short clinical conversations and their corresponding notes. It is made up of subsets of the long dialogues, where each dialogue snippet addresses one sub-topic, such as previous medical history or family medical records. These conversations span 20 different categories, such as labs, assessments, and plans. To demonstrate the usefulness of both datasets, we have trained current state-of-the-art text summarization and text-to-text generative models on them to provide a solid benchmark for clinical conversation summarization tasks. [en_US]
dc.description.statementofresponsibility: Quazi Adibur Rahman Adib
dc.description.statementofresponsibility: Sanjana Binte Alam
dc.format.extent: 60 pages
dc.language.iso: en [en_US]
dc.publisher: Brac University [en_US]
dc.rights: Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject: ClinicalNLP [en_US]
dc.subject: mBART [en_US]
dc.subject: Dialouge2Note [en_US]
dc.subject: Bangla language [en_US]
dc.subject: mLongT5 [en_US]
dc.subject.lcsh: Natural language processing (Computer science)
dc.subject.lcsh: Data mining
dc.title: BnClinical-Sum: benchmarking datasets for Bangla long & short clinical dialogue summarization [en_US]
dc.type: Thesis [en_US]
dc.contributor.department: Department of Computer Science and Engineering, Brac University
dc.description.degree: B.Sc in Computer Science

