Automatic text summarization using fuzzy c-means clustering
Abstract
Automatic text summarization has been studied extensively over the years as a way to cope with the staggering growth of digital data. Summarization methods are commonly divided into two categories: extractive and abstractive. Abstractive methods generate new sentences that differ from those in the original document while preserving its theme, whereas extractive methods rely largely on sentence extraction techniques built on graph models or sentence-based models. This paper proposes a sentence-based model in which the sentence ranking procedure adopts fuzzy C-Means (FCM) clustering, an unsupervised classification method, for sentence extraction. Sentence scoring relies on five key features, including Topic Sentence, which is the first novelty of the proposed model. Furthermore, C-Means clustering is a soft-computing technique commonly used for pattern recognition tasks, but it can be improved significantly by hard clustering the memberships of the elements. This step has not been considered in any previous work on similar processes, which adds to the novelty of the presented model. Standard summary evaluation techniques are used to measure the precision, recall, and F-measure of the proposed FCM model, and the model is compared with different summarizers from different perspectives. Summarizers that use different datasets and approaches, such as bushy path, GSM, baseline, and TextRank, are compared with the proposed model using the ROUGE method. The results show that the FCM model surpasses the previous approaches by a significant margin.
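As an illustration of the clustering idea described above, the following is a minimal sketch of generic fuzzy C-Means followed by a hard assignment of each element to its highest-membership cluster. It is not the authors' implementation: the function names `fuzzy_c_means` and `harden`, the fuzzifier `m = 2`, the random initialization, and the idea of passing per-sentence feature-score vectors as `points` are all assumptions made for this example.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Generic fuzzy C-Means (illustrative sketch, not the paper's code).

    points: list of equal-length numeric vectors, e.g. hypothetical
            per-sentence feature scores.
    Returns (centers, u), where u[i][k] is the membership of point i
    in cluster k and each row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(n)]
            wsum = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / wsum
                for d in range(dim)
            ])

        # Membership update: proportional to distance ** (-2 / (m - 1)).
        shift = 0.0
        for i, p in enumerate(points):
            dists = [
                max(sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5, 1e-12)
                for c in centers
            ]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break

    return centers, u


def harden(u):
    """Hard-cluster the memberships: index of the largest value per row."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]
```

For example, calling `fuzzy_c_means(scores, 2)` on a list of hypothetical sentence score vectors and then `harden(u)` yields one cluster label per sentence. How the clusters are subsequently ranked to select summary sentences is not specified in the abstract and is left out of this sketch.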