Prediction of UTR5’, CDS and UTR3’ Splice Sites in an Unknown DNA Sequence
Abstract
The recent flood of data from genome sequencing and functional genomics has given rise to a new field, bioinformatics, which combines elements of biology and computer science. In experimental molecular biology, bioinformatics techniques such as image and signal processing allow useful results to be extracted from large amounts of raw data. In genetics and genomics, bioinformatics aids in sequencing and annotating genomes. Given a biological sequence, such as a deoxyribonucleic acid (DNA) sequence, biologists would like to analyze what that sequence represents. A challenging and interesting problem in computational biology at present is finding genes in DNA sequences. With so many genomes being sequenced so rapidly, identifying genes computationally remains an important first step. A DNA sequence is composed of four nucleotide bases. There are two untranslated regions, UTR5’ and UTR3’, which are not translated during the process of translation. The nucleotide sequence between UTR5’ and UTR3’ is known as the coding sequence (CDS). Our goal is to develop a method that computes a likelihood value, using a hidden Markov model, from which the junctions between these three regions can be identified in any given sequence.
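As a rough illustration of how a hidden Markov model could label the three regions and expose their junctions, the following minimal Python sketch applies Viterbi decoding to a left-to-right model with one state per region. The state names, the transition and emission probabilities, and the helper names `viterbi` and `boundaries` are assumptions chosen purely for illustration; they are not trained values and are not necessarily the method developed in this work. The score returned is the log-likelihood of the single most probable state path, which is one possible choice of likelihood value.

```python
import math

STATES = ["UTR5", "CDS", "UTR3"]

# Illustrative, untrained parameters (assumptions, not values from this work).
# The model is left-to-right: UTR5 -> CDS -> UTR3, with no way back.
START = {"UTR5": 1.0, "CDS": 0.0, "UTR3": 0.0}
TRANS = {
    "UTR5": {"UTR5": 0.9, "CDS": 0.1, "UTR3": 0.0},
    "CDS": {"UTR5": 0.0, "CDS": 0.9, "UTR3": 0.1},
    "UTR3": {"UTR5": 0.0, "CDS": 0.0, "UTR3": 1.0},
}
# Toy emissions: AT-rich untranslated regions, GC-rich coding region.
EMIT = {
    "UTR5": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "CDS": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "UTR3": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def _log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq):
    """Return (log-likelihood of the best path, best state path) for seq."""
    score = {s: _log(START[s]) + _log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + _log(TRANS[p][s]))
            new_score[s] = score[prev] + _log(TRANS[prev][s]) + _log(EMIT[s][base])
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    last = max(STATES, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


def boundaries(path):
    """List (index, left_region, right_region) where the decoded label changes."""
    return [
        (i, path[i - 1], path[i])
        for i in range(1, len(path))
        if path[i] != path[i - 1]
    ]


if __name__ == "__main__":
    toy = "ATATATAT" + "GCGCGCGC" + "ATATATAT"
    loglik, states = viterbi(toy)
    print(loglik, boundaries(states))
```

In practice the transition and emission probabilities would be estimated from annotated sequences rather than set by hand, and the decoded junction positions would be compared against known annotations.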