COMPUTATIONAL BIOLOGY
THEORETICAL MOLECULAR EVOLUTION AND COMPARATIVE GENOMICS
All forms of life contain genetic material in the form of DNA. Sequencing
technologies have provided draft DNA sequences for many prokaryotic (e.g.,
bacterial) and eukaryotic (e.g., human) genomes. The genes that code for
proteins are arranged in the chromosomes in a complex but optimal manner.
Eukaryotic genes generally contain two types of sequence: (1) exons, the
regions that form the protein-coding sequence; and (2) introns, the
intervening regions that are removed by splicing and do not form part of
the protein. Efforts are still underway to determine the exact number of
genes in the human genome. Likewise, many other questions about genes in
eukaryotic genomes remain open. How are these genes arranged in the genome?
How many paralogs (homologs within the same genome) are present in each
genome? How many introns does each gene contain, and where are these
introns located within the gene? What are the characteristics of
intron-containing genes in eukaryotes? Are there intronless genes in
eukaryotic genomes? These questions remain largely unaddressed.
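For a single gene, the questions of how many introns it contains and where they lie reduce to simple interval arithmetic once its exon coordinates are known. The sketch below is illustrative only: it assumes exons are given as non-overlapping, 1-based, inclusive (start, end) pairs on one strand, and the function name `introns_from_exons` is hypothetical rather than part of any database described here.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive (assumed convention)


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Return the intron intervals lying between consecutive exons.

    Hypothetical helper: assumes exons are non-overlapping intervals on a
    single strand, separated by at least one base.
    """
    ordered = sorted(exons)  # accept exons in any input order
    introns: List[Interval] = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start <= prev_end + 1:
            raise ValueError("exons overlap or leave no room for an intron")
        # The intron spans every base strictly between the two exons.
        introns.append((prev_end + 1, next_start - 1))
    return introns


exons = [(1, 120), (451, 600), (901, 1010)]
introns = introns_from_exons(exons)
print(len(introns))                         # 2
print(introns)                              # [(121, 450), (601, 900)]
print([end - start + 1 for start, end in introns])  # [330, 300]
```

A gene with a single exon yields an empty list, which is exactly the intronless case discussed below.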
Databases
To address these questions, we mined GenBank and developed specialized datasets and databases. One such database is ExInt (Sakharkar et al., 2000; Sakharkar et al., 2002), a collection of intron-containing eukaryotic genes derived from GenBank. Each record provides information on the protein sequences encoded by intron-containing eukaryotic genes, the source of the sequence, the gene structure (exon-intron arrangement), and other related properties of the sequence under study. This dataset is particularly useful for studying features shared by the gene structures of eukaryotic genes. A better understanding of gene structure may help to unlock some of the key events in molecular pathogenesis.
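Because ExInt records are derived from GenBank, one plausible way to recover an exon-intron arrangement is from the location string of a CDS feature, where a spliced coding sequence is written as `join(start..end,...)`, possibly wrapped in `complement(...)` for the minus strand. The following is a minimal sketch under that assumption; it is not the ExInt pipeline, which is not described here, and `parse_cds_location` is a hypothetical helper. It handles only single ranges, a simple `join`, and an outer `complement`; a real workflow would more likely rely on an established GenBank parser such as Biopython's.

```python
import re
from typing import List, Tuple

Interval = Tuple[int, int]

# One range such as "451..600"; '<' and '>' mark partial ends and are ignored here.
_RANGE = re.compile(r"<?(\d+)\.\.>?(\d+)")


def parse_cds_location(location: str) -> Tuple[str, List[Interval]]:
    """Parse a simple GenBank CDS location into (strand, exon intervals).

    Hypothetical helper. Supported: 'a..b', 'join(a..b,c..d,...)' and an
    outer 'complement(...)'. Not supported (raises ValueError): remote
    references, 'order(...)', and complements nested inside 'join(...)'.
    """
    loc = "".join(location.split())  # flat-file locations may wrap across lines
    strand = "+"
    if loc.startswith("complement(") and loc.endswith(")"):
        strand = "-"
        loc = loc[len("complement("):-1]
    if loc.startswith("join(") and loc.endswith(")"):
        loc = loc[len("join("):-1]
    exons: List[Interval] = []
    for part in loc.split(","):
        match = _RANGE.fullmatch(part)
        if match is None:
            raise ValueError(f"unsupported location segment: {part!r}")
        exons.append((int(match.group(1)), int(match.group(2))))
    return strand, exons


strand, exons = parse_cds_location("complement(join(<5..100,200..>350))")
print(strand, exons)   # - [(5, 100), (200, 350)]
print(len(exons) - 1)  # number of introns in the coding region: 1
```

Keeping the parser deliberately narrow makes unsupported forms fail loudly instead of silently producing a wrong gene structure.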
Another database, SEGE, was developed to collect all intronless genes in
eukaryotes (Sakharkar et al., 2002). This database is also derived from
GenBank. Intronless genes are of particular interest in studies of genome
evolution, gene arrangement and gene discovery. Because they lack introns,
intronless genes largely escape alternative splicing. Consequently, human
proteins encoded by functional intronless genes (particularly those without
intron-containing paralogs) could be considered as drug targets with less
caution. Other databases developed by our group, such as IEKB (Intron-Exon
Knowledge Base) and MIDB (Mismatched Intron Database), are described
elsewhere (Sakharkar et al., 2000; Sakharkar et al., 2001). We are also
using computational procedures to study alternative splicing by exon
skipping, as well as protein fusion (Yiting et al., 2004).
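The split between intronless genes (as collected in SEGE) and intron-containing genes (as in ExInt), together with a first filter for exon-skipping candidates, can be sketched on top of coding-exon coordinates. This is a hypothetical illustration, not the procedure used for SEGE or by Yiting et al. (2004): `GeneRecord`, `partition_by_introns` and `frame_preserving_skips` are invented names; a gene is treated as intronless when its coding sequence has a single exon; and an internal exon is flagged when its length is a multiple of 3, so that skipping it would not shift the downstream reading frame.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive coding-exon coordinates


@dataclass
class GeneRecord:
    # Hypothetical minimal record; not the SEGE or ExInt schema.
    gene_id: str
    exons: List[Interval]  # coding exons in transcript order


def partition_by_introns(records: List[GeneRecord]) -> Dict[str, List[str]]:
    """Split gene IDs into intronless (one coding exon) and intron-containing."""
    groups: Dict[str, List[str]] = {"intronless": [], "intron_containing": []}
    for rec in records:
        key = "intronless" if len(rec.exons) == 1 else "intron_containing"
        groups[key].append(rec.gene_id)
    return groups


def frame_preserving_skips(rec: GeneRecord) -> List[int]:
    """Indices of internal exons whose skipping keeps the downstream frame.

    Only the frame is checked: the new junction between the flanking exons
    could still create a stop codon, which this sketch does not test.
    """
    skips: List[int] = []
    for i in range(1, len(rec.exons) - 1):  # first and last exons are not skipped
        start, end = rec.exons[i]
        if (end - start + 1) % 3 == 0:
            skips.append(i)
    return skips


records = [
    GeneRecord("geneA", [(1, 900)]),
    GeneRecord("geneB", [(1, 120), (451, 600), (901, 1010)]),
]
print(partition_by_introns(records))
# {'intronless': ['geneA'], 'intron_containing': ['geneB']}
print(frame_preserving_skips(records[1]))  # [1]
```

The frame check is only a coarse first filter; any candidate it returns would still need to be examined for stop codons at the new exon junction and compared against transcript evidence.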