Authors
Veli Mäkinen, Gonzalo Navarro, Jouni Sirén, Niko Välimäki
Publication date
2010/3/1
Journal
Journal of Computational Biology
Volume
17
Issue
3
Pages
281-308
Publisher
Mary Ann Liebert, Inc.
Description
A repetitive sequence collection is a set of sequences that are small variations of each other. A prominent example is the set of genome sequences of individuals of the same or closely related species, where the differences can be expressed by short lists of basic edit operations. Suffix trees make flexible and efficient data analysis feasible on such a collection, which is typically huge. However, the suffix tree occupies a lot of space, which quickly rules out in-memory analysis. Recent advances in full-text indexing reduce the space of the suffix tree to, essentially, that of the compressed sequences, while retaining its functionality with only a polylogarithmic slowdown. However, the underlying compression model considers only the predictability of the next sequence symbol given the k previous ones, where k is a small integer. Such a model cannot capture longer-range repetitiveness. For example, r identical copies of an incompressible …
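The limitation of k-th order modeling can be illustrated with a small sketch. This sketch is not taken from the article. It assumes the usual definition of k-th order empirical entropy, where |T|H_k(T) sums, over every length-k context w, the zeroth-order entropy of the symbols that follow w. A pseudo-random DNA string stands in for an incompressible sequence. Concatenating r copies leaves the per-symbol cost essentially unchanged, so |T|H_k grows about linearly with r. As a contrast, the sketch also counts equal-letter runs in the Burrows-Wheeler transform. That count is used here only as one example of a measure that does notice long repeats. The helper names (`bits_k`, `bwt`, `bwt_runs`) are hypothetical, and the sketch makes no claim about the exact structures the article builds.

```python
import math
import random
from collections import Counter, defaultdict


def bits_k(t: str, k: int) -> float:
    """Return |T| * H_k(T) in bits: for each length-k context w, the
    zeroth-order entropy of the symbols following w, weighted by their count.
    The first k symbols (which lack a full context) are skipped."""
    followers = defaultdict(list)
    for i in range(k, len(t)):
        followers[t[i - k:i]].append(t[i])
    total = 0.0
    for syms in followers.values():
        n = len(syms)
        # n * H_0 = sum over symbols c of n_c * log2(n / n_c)
        total += sum(c * math.log2(n / c) for c in Counter(syms).values())
    return total


def bwt(t: str) -> str:
    """Burrows-Wheeler transform of t + '$' via naive suffix sorting
    (illustration only; far too slow for real genome collections)."""
    t += "$"  # '$' sorts before 'A', 'C', 'G', 'T' in ASCII
    sa = sorted(range(len(t)), key=lambda i: t[i:])
    return "".join(t[i - 1] for i in sa)  # t[-1] is '$' when i == 0


def bwt_runs(b: str) -> int:
    """Number of maximal runs of equal letters in b."""
    return 1 + sum(1 for x, y in zip(b, b[1:]) if x != y) if b else 0


if __name__ == "__main__":
    rng = random.Random(42)
    # Assumption: a pseudo-random DNA string as a stand-in for an incompressible sequence.
    s = "".join(rng.choice("ACGT") for _ in range(500))
    for r in (1, 2, 4, 8):
        t = s * r  # r identical copies
        print(f"r={r}: |T|H_2 = {bits_k(t, 2):8.1f} bits, BWT runs = {bwt_runs(bwt(t))}")
```

With r copies, the first column should grow roughly r-fold, while the run count grows only slightly. The naive suffix sort keeps the sketch short. Real indexes use much faster suffix-array construction.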
Total citations
[Chart: citations per year, 2008 to 2024]