Bangla text classification using machine learning and deep learning techniques
Abstract
Nearly everything is now being digitized, and technology shapes almost every
part of our lives. As a result, a massive number of textual documents are
generated on online platforms, and news articles are no exception. People prefer
online news portals because they are updated every single hour.
Newspaper articles fall into many categories, such as politics, sports, business,
and entertainment. Recently, the number of Bangla online news portals has grown
rapidly. Recommending preferred news categories to online readers would help
them locate the articles they want. Manually categorizing news articles takes
a great deal of time and effort. So, text
categorization is necessary today, because the enormous volume of uncategorized
data is a problem. Research on categorizing news articles has advanced greatly
for languages such as English, Arabic, Chinese, Urdu, and Hindi, but the Bangla
language has seen little development by comparison. Some approaches have been
applied to categorize Bangla news articles using machine learning algorithms,
but in settings where resources were minimal. We have applied five machine learning
classifiers and two neural networks to categorize Bangla news articles. To compare
the applied algorithms and show which one performs better, we have used four
performance metrics.