Genomic sequence, splicing, and gene annotation

SM Mount - The American Journal of Human Genetics, 2000 - cell.com
The sequence of the human genome is at hand. Most scientists who use the sequence will rely on annotations that provide information about the number and location of genes and about their inferred protein products. Traditionally, genes have been annotated by scientists with a particular interest in them. However, annotation of the complete human genome sequence will have to be at least partially automated. Gene annotation incorporates cDNA data (including expressed sequence tags [ESTs]), sequence similarity, and computational predictions based on the recognition of probable splice sites and coding regions (Stormo 2000; also see David Haussler’s Web site, Computational Genefinding). The state of the art was recently surveyed by the Genome Annotation Assessment Project (GASP1) and must be regarded as imperfect (Bork 2000; Reese et al. 2000). This review enumerates aspects of pre-mRNA splicing that limit our ability to predict gene structure from genomic sequence, drawing on the recently annotated complete genome of Drosophila melanogaster (Adams et al. 2000) as an example. In particular, the following four facts will be discussed. First, splice sites do not always conform to consensus. Second, noncoding exons are common. Third, internal exons can be arbitrarily small, and small internal exons confound not only gene finding but also the alignment of cDNA and genomic sequences. Fourth, splice sites are not recognized in isolation, and nucleotides that are far from splice sites can affect splicing. This list and the accompanying analysis should make molecular geneticists aware of the ways in which gene annotations can be wrong and should encourage recourse to the primary data. In addition, the same considerations indicate that inherited disease can