BnText2Table – dataset and Text-to-Table generation in Bangla
Abstract
"In this fast-paced world, everyone relies on technology to get their work done quickly
and efficiently. Most publications are lengthy and packed with important data.
However, they are often padded with extra words to boost the word count, which
makes it difficult to find the desired information.
For English, numerous tools are available to summarize text and present it in
tabular form. The same is not true for our mother tongue, Bangla. Although Bangla
is the fifth most-spoken native language in the world, no such tool is available
to ease this workload for Bangla text. Our research will assist
in such circumstances by summarizing the given information in tabular form within
the shortest possible time. Since no dataset suitable for our research was available,
we prepared one ourselves. We then used the mBART-50-large, mT5-base,
mT5-m2m-CrossSum, and BanglaT5 models for the implementation. Finding the
appropriate table headers in light of the context and
order of the data is the most important task in this study. In summary, our main
goal is to develop a benchmark dataset for text-to-table generation that benefits
the NLP research community."