Inference of gene regulatory network (GRN) from gene expression data using k-means clustering and entropy-based selection of interactions

Galib, Asadullah Al; Rahman, Mohammad Mohaimanur

Date

2017-04

Abstract

Inferring a regulatory network from gene expression data alone is considered a challenging task in systems biology, and the introduction of various high-throughput DNA microarray technologies for collecting expression data has significantly increased the amount of data that existing algorithms must analyze. Each of these algorithms focuses on different aspects of gene regulatory network (GRN) inference, and their methodologies work well only for certain types of datasets and/or regulatory networks. As a result, they have inherent limitations in dealing with different types of datasets. In this paper, we propose a novel method to infer a gene regulatory network from expression data that uses K-means clustering together with properties of entropy from information theory. The proposed method has two main components: first, grouping the genes of a dataset into a given number of clusters, and then finding statistically significant interactions among the genes of each individual cluster and of selected nearby clusters. To achieve this, an information-theoretic approach based on entropy reduction is used to generate a regulatory interaction matrix covering all genes. Grouping genes into clusters by the similarity of their expression levels reduces the search space of regulatory interactions, so the Entropy Reduction Technique (ERT) only has to search for regulators among a reduced number of genes. To assess the performance of our algorithm, we used datasets from the DREAM5 Network Inference challenge [6], the DREAM4 In Silico Network challenge [7], and one in silico dataset generated by GeneNetWeaver [8]. The performance of our algorithm was compared with that of ARACNE, a popular information-theoretic approach to reverse-engineering gene regulatory networks from expression datasets. We used precision and recall as performance measures.
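Precision and recall follow their usual definitions for network inference: precision is the fraction of predicted interactions that appear in the gold-standard network, and recall is the fraction of gold-standard interactions that were predicted. The minimal Python sketch below assumes that both networks are represented as sets of directed (regulator, target) pairs and that the inferred network is a 0/1 interaction matrix; the helper names `precision_recall` and `edges_from_matrix` are hypothetical and not taken from the thesis.

```python
def edges_from_matrix(adj, names):
    """Turn a 0/1 interaction matrix (adj[i][j] == 1 means i regulates j)
    into a set of directed (regulator, target) pairs. Hypothetical helper."""
    return {(names[i], names[j])
            for i, row in enumerate(adj)
            for j, v in enumerate(row) if v}


def precision_recall(predicted, gold):
    """Precision = TP / |predicted|, recall = TP / |gold|, edges treated as directed."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # true positives: predicted edges present in the gold standard
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    predicted = {("G1", "G2"), ("G1", "G3"), ("G4", "G2")}
    gold = {("G1", "G2"), ("G3", "G4")}
    print(precision_recall(predicted, gold))
```

With these toy edge sets, one of the three predicted edges is in the gold standard, so precision is 1/3 and recall is 1/2.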
Our algorithm showed a significant improvement in precision and recall over the network generated by ARACNE. We also compared results across different threshold values and numbers of clusters for three versions of our algorithm: No Clustering, Unmerged Clustering, and Selected Merged Clustering.
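The two-stage pipeline summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: genes are clustered with Lloyd's k-means on their raw expression profiles, each profile is discretized into equal-width bins, and ERT is assumed to greedily add the candidate regulator that most lowers the conditional entropy H(target | regulators), stopping when the reduction falls below a threshold (`min_gain`). Candidates are restricted to the target's own cluster, which we take to resemble the Unmerged Clustering variant; merging selected nearby clusters is not shown. All function names and parameters (`kmeans`, `discretize`, `select_regulators`, `infer_grn`, `bins`, `min_gain`, `max_parents`) are hypothetical.

```python
import math
import random
from collections import Counter


def kmeans(profiles, k, iters=50, seed=0):
    """Lloyd's k-means over gene expression profiles (one list per gene)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = [0] * len(profiles)
    for _ in range(iters):
        # Assign each gene to its nearest centroid (squared Euclidean distance).
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
                  for p in profiles]
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def discretize(profile, bins=2):
    """Equal-width binning of one gene's expression values (an assumed choice)."""
    lo, hi = min(profile), max(profile)
    width = (hi - lo) / bins or 1.0  # constant profile: everything goes to bin 0
    return [min(int((v - lo) / width), bins - 1) for v in profile]


def entropy(xs):
    """Shannon entropy in bits of a sequence of hashable states."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())


def cond_entropy(target, parents):
    """H(T | P) = H(T, P) - H(P), with P the joint state of the parent genes."""
    if not parents:
        return entropy(target)
    return entropy(list(zip(target, *parents))) - entropy(list(zip(*parents)))


def select_regulators(t, candidates, data, min_gain=0.1, max_parents=3):
    """Assumed ERT-style greedy search: add the regulator that most reduces H(target | chosen)."""
    target, chosen = data[t], []
    h = entropy(target)
    while len(chosen) < max_parents:
        best, best_h = None, h
        for g in candidates:
            if g == t or g in chosen:
                continue
            hg = cond_entropy(target, [data[p] for p in chosen + [g]])
            if hg < best_h:
                best, best_h = g, hg
        if best is None or h - best_h < min_gain:
            break  # no candidate reduces the entropy enough
        chosen.append(best)
        h = best_h
    return chosen


def infer_grn(expr, k=2, bins=2, min_gain=0.1):
    """Return (adjacency matrix, cluster labels); adj[r][t] == 1 means r regulates t."""
    n = len(expr)
    labels = kmeans(expr, k)
    data = [discretize(p, bins) for p in expr]
    adj = [[0] * n for _ in range(n)]
    for t in range(n):
        # Only genes in the target's own cluster are considered as regulators.
        same = [g for g in range(n) if labels[g] == labels[t]]
        for r in select_regulators(t, same, data, min_gain):
            adj[r][t] = 1
    return adj, labels
```

Restricting candidates to one cluster is what shrinks the search: for a target in a cluster of m genes, each greedy step evaluates at most m - 1 candidates instead of n - 1.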

Keywords

Gene regulatory network; Entropy reduction

Description

This thesis report is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2017.

Cataloged from PDF version of thesis report.

Includes bibliographical references (pages 56-57).

Department

Department of Computer Science and Engineering, BRAC University

Type

Thesis

Collections

Thesis & Report, BSc (Computer Science and Engineering)