Storage and retrieval of highly repetitive sequence collections

Authors

Veli Mäkinen, Gonzalo Navarro, Jouni Sirén, Niko Välimäki

Publication date

2010/3/1

Journal

Journal of Computational Biology

Volume

Issue

Pages

281-308

Publisher

Mary Ann Liebert, Inc.

Description

A repetitive sequence collection is a set of sequences which are small variations of each other. A prominent example are genome sequences of individuals of the same or close species, where the differences can be expressed by short lists of basic edit operations. Flexible and efficient data analysis on such a typically huge collection is plausible using suffix trees. However, the suffix tree occupies much space, which very soon inhibits in-memory analyses. Recent advances in full-text indexing reduce the space of the suffix tree to, essentially, that of the compressed sequences, while retaining its functionality with only a polylogarithmic slowdown. However, the underlying compression model considers only the predictability of the next sequence symbol given the k previous ones, where k is a small integer. This is unable to capture longer-term repetitiveness. For example, r identical copies of an incompressible …

Total citations

Cited by 242

Citations per year: 2008: 1, 2009: 4, 2010: 4, 2011: 15, 2012: 10, 2013: 17, 2014: 18, 2015: 16, 2016: 16, 2017: 22, 2018: 19, 2019: 17, 2020: 22, 2021: 21, 2022: 18, 2023: 12, 2024: 8

Scholar articles

Storage and retrieval of highly repetitive sequence collections

V Mäkinen, G Navarro, J Sirén, N Välimäki - Journal of Computational Biology, 2010

Cited by 240 (all 14 versions)

Run-length compressed indexes for repetitive sequence collections*

V Mäkinen, G Navarro, J Sirén, N Välimäki - 2008

Cited by 2 (all 6 versions)

Storage and Retrieval of Individual Genomes and other Repetitive Sequence Collections*

V Mäkinen, J Sirén, N Välimäki - 2008