Gene expression analysis using machine learning
Abstract
Cancer is a multifactorial disorder that arises from complex interactions between genes and the environment. A person's susceptibility to cancer depends on their genetic makeup. Recently, the study of genomes to discover how diseases and genes interact, and how those interactions lead to specific phenotypes, has grown rapidly. High-throughput microarray technology, which measures the expression of thousands of genes at once, is one of the most important techniques in genomics and systems biology. Machine learning can be a suitable choice for producing an accurate prognosis from such high-dimensional gene expression data. In this paper, we apply principal component analysis (PCA) and an autoencoder to a brain cancer gene expression dataset retrieved from the CuMiDa database, analyze which technique produces better and more accurate reduced-dimensional vectors, and evaluate how different classical machine learning algorithms perform on the newly generated datasets. Finally, we discuss how these techniques can be improved and how such improvements can lead to better and more sophisticated outcomes.
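As a rough illustration of the workflow described above, the following minimal sketch reduces a high-dimensional expression matrix with PCA and then classifies the reduced vectors. It rests on several assumptions that are not from the paper: the data are synthetic stand-ins rather than the CuMiDa brain cancer dataset, the function names (`fit_pca`, `transform`, `nearest_centroid_predict`) are hypothetical, and a nearest-centroid rule is used only as a placeholder for the classical classifiers. The autoencoder branch is omitted.

```python
import numpy as np

def fit_pca(X, n_components):
    """Learn the mean and top principal axes of X (samples x genes) via SVD."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal axes, ordered by decreasing explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def transform(X, mean, components):
    """Project samples onto the learned principal axes."""
    return (X - mean) @ components.T

def nearest_centroid_predict(Z_train, y_train, Z_test):
    """Placeholder classifier: assign each test point to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(Z_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

# Synthetic stand-in for microarray data (assumption): 60 samples, 500 "genes".
rng = np.random.default_rng(0)
n_samples, n_genes = 60, 500
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_genes))
X[y == 1, :20] += 3.0  # class 1 over-expresses the first 20 genes

# Stratified split: 20 training and 10 test samples per class.
train_idx = np.r_[0:20, 30:50]
test_idx = np.r_[20:30, 50:60]

# Fit PCA on the training samples only, so the test set does not leak in.
mean, components = fit_pca(X[train_idx], n_components=5)
Z_train = transform(X[train_idx], mean, components)
Z_test = transform(X[test_idx], mean, components)

y_pred = nearest_centroid_predict(Z_train, y[train_idx], Z_test)
accuracy = float((y_pred == y[test_idx]).mean())
print(f"reduced shape: {Z_train.shape}, test accuracy: {accuracy:.2f}")
```

Fitting the projection on the training samples alone keeps the evaluation honest, which matters when there are far more genes than samples. An autoencoder could be substituted for `fit_pca` and `transform` to compare the two sets of reduced vectors under the same classifier.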