Interpretable MOOC Dropout Prediction Using Different Ensemble Methods and XAI
Abstract
Massive open online courses (MOOCs) have been around for a while but started to
gain traction in 2012, when Coursera was established. MOOCs use pre-recorded
lectures and scheduled weekly tests to deliver content to students over the
internet. Although there were high expectations that MOOCs would revolutionize
the education system, their one-way mode of content delivery made that goal
far-fetched. Flexible deadlines and the absence of classroom-style exams meant
that students were not bound to finish on time. As a result, most students did
not complete their courses and dropped out. The dataset used in our research is
the KDDCUP 2015 dataset, made publicly available by the organizers of the
XuetangX platform.
We used about 12 features, namely browser access, navigate, average chapter
delays, server sequential, etc., to estimate the likelihood of dropout. In this
paper, we aim to predict whether a learner will drop out so that dropout can be
prevented through manual intervention. Additionally, we apply XAI to interpret
our models and show MOOC platforms which features impact dropout the most. We
used different ensemble learning techniques, namely voting classifiers and
stacking. Our voting classifier combines five of our best-performing machine
learning models. We then evaluate the models using multiple metrics: precision,
recall, F1-score, the ROC curve and the AUC score. Finally, we obtained a recall
of 97.636% with stacking and an F1-score of 91.603% with the hard voting
classifier.
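To illustrate the hard voting scheme described above, the following minimal sketch combines binary predictions from several base models by majority vote. It assumes the label encoding 1 = dropout, 0 = no dropout. The function name `hard_vote` and the toy predictions are hypothetical and are not taken from our experiments. In practice, scikit-learn provides this behaviour through `VotingClassifier(voting="hard")`, and stacking through `StackingClassifier`, which trains a meta-learner on the base models' outputs.

```python
from collections import Counter
from typing import List, Sequence


def hard_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model class labels by majority (hard) voting.

    predictions[m][i] is the label that base model m assigns to learner i
    (assumed encoding: 1 = dropout, 0 = no dropout). With an odd number of
    models and two classes, a tie cannot occur.
    """
    if not predictions:
        raise ValueError("need at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every model must score the same learners")

    combined = []
    for i in range(n_samples):
        votes = Counter(p[i] for p in predictions)
        # most_common(1) -> [(label, count)] for the label with most votes
        combined.append(votes.most_common(1)[0][0])
    return combined


# Hypothetical labels from five base models for four learners.
toy_predictions = [
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
]
print(hard_vote(toy_predictions))  # [1, 0, 1, 1]
```

Using five voters, as in our voting classifier, has a practical advantage for a binary dropout label: an odd number of models rules out ties.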
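The evaluation metrics listed above can be made concrete with a small sketch. It treats dropout (label 1) as the positive class, which is our assumption about the encoding. Precision and recall are computed from true positives, false positives and false negatives. F1 is their harmonic mean. AUC summarizes the ROC curve, which plots the true-positive rate against the false-positive rate across decision thresholds. The helper names and toy data are hypothetical. Library equivalents such as scikit-learn's `precision_score`, `recall_score`, `f1_score` and `roc_auc_score` would normally be used instead.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive (dropout) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via its pairwise definition.

    AUC equals the probability that a randomly chosen positive (label 1) is
    scored higher than a randomly chosen negative (label 0); tied scores
    count as half. This O(P*N) loop is for illustration only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (1 = dropout), hard predictions and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7]
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
print(roc_auc(y_true, scores))  # 0.875
```

Recall is computed over actual dropouts only, so a high recall means few dropouts are missed. That property matters when the aim is to reach at-risk learners before they leave.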