Classification of Arsenic Contamination in Water Using Machine Learning
Abstract
Arsenic is an odorless, tasteless metalloid. It enters drinking water supplies from natural deposits in the earth or from agricultural and industrial practices. In South Asian countries, especially Bangladesh, arsenic contamination is a major concern for a large population because the main sources of drinking water are shallow and deep tube wells. Exposure can be deadly: it causes a range of diseases and can lead to cancer.
An NGO, the Asia Arsenic Network, performed laboratory tests on samples of arsenic-contaminated water from some areas of Bangladesh and provided the resulting data to us. The data contain 11 input features and one output feature, arsenic level, which has 5 classes. Applying Machine Learning, a branch of Artificial Intelligence, to the arsenic contamination data will help produce a better diagnosis of this threat. Algorithms such as Neural Networks and Support Vector Machines were applied to this dataset, and the performance of each algorithm was analyzed to determine which performs best at classifying arsenic contamination in the provided dataset. Error analysis was done using precision, recall, and F1 score.
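As a minimal sketch of the error analysis named above, per-class precision, recall, and F1 score for a 5-class problem can be computed from true and predicted labels as shown below. The function name `per_class_scores`, the integer class labels 0 to 4, and the sample labels are illustrative assumptions; they are not taken from the dataset or from the authors' implementation.

```python
from collections import Counter

def per_class_scores(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} for a multi-class problem."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted, but the sample is not p
            fn[t] += 1  # a sample of true class t was missed
    scores = {}
    for c in labels:
        # Guard against division by zero when a class is never predicted or never occurs.
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = (precision, recall, f1)
    return scores

# Made-up labels for illustration only (assumed classes 0..4).
y_true = [0, 0, 1, 1, 2, 2, 3, 4]
y_pred = [0, 1, 1, 1, 2, 0, 3, 4]

scores = per_class_scores(y_true, y_pred, labels=range(5))
# Macro-averaged F1: the unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(scores)

for label, (p, r, f1) in scores.items():
    print(f"class {label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"macro F1 = {macro_f1:.2f}")
```

Reporting the scores per class, rather than a single accuracy figure, shows which arsenic levels a classifier confuses, which matters when the classes are unevenly represented.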