dc.contributor.advisor | Sadeque, Farig Yousuf | |
dc.contributor.author | Adib, Quazi Adibur Rahman | |
dc.contributor.author | Alam, Sanjana Binte | |
dc.date.accessioned | 2024-05-26T02:57:33Z | |
dc.date.available | 2024-05-26T02:57:33Z | |
dc.date.copyright | ©2024 | |
dc.date.issued | 2024-01 | |
dc.identifier.other | ID: 21241056 | |
dc.identifier.other | ID: 20301455 | |
dc.identifier.uri | http://hdl.handle.net/10361/22910 | |
dc.description | This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024. | en_US |
dc.description | Cataloged from PDF version of thesis. | |
dc.description | Includes bibliographical references (pages 44-47). | |
dc.description.abstract | Despite significant improvements in the general-purpose text summarization task in
the past decade, clinical conversation summarization has struggled
due to a lack of initiative to provide open-source datasets to the NLP community.
In this work, we present the first long and short Bangla Clinical Dialogue to
Note Summarization datasets: BnClinical-Sum. Long conversations are detailed
conversations with additional medical history. For the long dialogue dataset, we
have accumulated around 207 pairs of full conversations and notes. Each note consists
of in-depth discussions on previous medical histories, family medical records,
and a wide variety of other topics. For the short dialogue version, our dataset
consists of 1701 real-life, manually translated short clinical conversations and their
corresponding notes. The short dialogue dataset consists of subsets of the long dialogues,
where each dialogue snippet addresses one sub-topic like previous medical histories,
family medical records, etc. Those conversations come from 20 different categories like
labs, assessments, plans, etc. To demonstrate the efficacy of both datasets,
we have trained current state-of-the-art text summarization and
text-to-text generative models on them to provide a solid benchmark for clinical conversation
summarization tasks. | en_US |
dc.description.statementofresponsibility | Quazi Adibur Rahman Adib | |
dc.description.statementofresponsibility | Sanjana Binte Alam | |
dc.format.extent | 60 pages | |
dc.language.iso | en | en_US |
dc.publisher | Brac University | en_US |
dc.rights | Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. | |
dc.subject | ClinicalNLP | en_US |
dc.subject | mBART | en_US |
dc.subject | Dialogue2Note | en_US |
dc.subject | Bangla language | en_US |
dc.subject | mLongT5 | en_US |
dc.subject.lcsh | Natural language processing (Computer science) | |
dc.subject.lcsh | Data mining | |
dc.title | BnClinical-Sum: benchmarking datasets for Bangla long & short clinical dialogue summarization | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Department of Computer Science and Engineering, Brac University | |
dc.description.degree | B.Sc in Computer Science | |