BnText2Table – dataset and Text-to-Table generation in Bangla

Zariyat, Tahreema Rahman; Ahmed, Fahim Irfan; Oishi, Tahsina Tajrim; Morshed, Maruf

Date

2024-01

Publisher

Brac University

Abstract

In this fast-paced world, everyone relies on technology to get their work done quickly and efficiently, since technology greatly simplifies every task. Most publications are lengthy and packed with crucial data. In many instances, however, extra words are added to boost the word count, which makes the desired information harder to find. For the English language, numerous tools are available to summarize text and present it in tabular form. The same is not true for our mother tongue, Bangla. Although Bangla is the 5th most-spoken native language in the world, no tool is available to ease this workload in the Bangla language. Our research will assist in such circumstances by summarizing the given information in tabular form within the shortest possible time. Since no available dataset is suitable for our research, we prepared the dataset ourselves. We then used the mBART-50-large, mT5-base, mT5-m2m-CrossSum and BanglaT5 models for the implementation. Finding the appropriate table headers in light of the context and order of the data is the most important task in this study. To sum up, our main goal is to develop a benchmark dataset for text-to-table models for the betterment of the NLP research community.

Keywords

Bangla NLP; Text2Table; Summarizer; mBART; Transformer; Information extraction; T5; mT5

LC Subject Headings

Computation and Language

Description

This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024.

Cataloged from PDF version of thesis.

Includes bibliographical references (pages 32-34).

Department

Department of Computer Science and Engineering, Brac University

Type

Thesis

Collections

Thesis & Report, BSc (Computer Science and Engineering)