Detecting propagandistic poster titles: a machine learning approach
Abstract
Detecting propagandistic content is crucial in today’s digital age, where misinformation
spreads rapidly. In this study, we propose a machine learning approach
aimed at identifying propaganda in poster titles. Our methodology encompasses
various text classification techniques, including Random Forest, Logistic Regression,
K-Nearest Neighbor (KNN), Naive Bayes classifier, Support Vector Machine (SVM),
RoBERTa, Stacking Classifier, Stacking Classifier With Feature Engineering, and
RoBERTa XGBoost Hybrid Model. We extract features with TF-IDF and Word2Vec
and apply ensemble learning strategies to improve classification accuracy. Specifically,
we introduce two hybrid models: the Stacking Classifier With Feature Engineering,
which combines Word2Vec and TF-IDF features to improve accuracy, and the RoBERTa
XGBoost Hybrid Model, which combines TF-IDF vectors with RoBERTa embeddings
and classifies them with XGBoost. Through extensive experimentation
and evaluation, we analyze the performance of each model in terms
of accuracy, precision, recall, and F1-score. The results are promising,
with certain models improving significantly over baseline approaches.
Moreover, we conduct a thorough analysis of the models’ strengths and
weaknesses, providing insights into their efficacy in detecting propagandistic content.
Overall, our research contributes to the development of effective tools for combating
propagandistic titles and promoting media literacy in the digital landscape.
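
The four evaluation metrics named above can be illustrated with a minimal sketch. It assumes a binary labeling in which 1 marks a propagandistic title and 0 a non-propagandistic one; the abstract does not state the label scheme, so this is an assumption. The function name binary_metrics and the example labels are hypothetical and are not taken from the study's code or data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumption: label 1 means "propagandistic", label 0 means "not propagandistic".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one label is required")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Fall back to 0.0 when a denominator is zero (no positive predictions or labels).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for eight poster titles (illustrative only, not the paper's data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

Reporting precision and recall alongside accuracy matters when the two classes are imbalanced, since a classifier that rarely predicts the propagandistic class can still reach high accuracy while missing most propagandistic titles.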