Secure approximation of edit distance on genomic data- 学术资源搜索

Secure approximation of edit distance on genomic data

MMA Aziz, D Alhadidi, N Mohammed - BMC medical genomics, 2017 - Springer

BMC medical genomics, 2017•Springer

Background Edit distance is a well established metric to quantify how dissimilar two strings
are by counting the minimum number of operations required to transform one string into the
other. It is utilized in the domain of human genomic sequence similarity as it captures the
requirements and leads to a better diagnosis of diseases. However, in addition to the
computational complexity due to the large genomic sequence length, the privacy of these
sequences are highly important. As these genomic sequences are unique and can identify …

Background

Edit distance is a well established metric to quantify how dissimilar two strings are by counting the minimum number of operations required to transform one string into the other. It is utilized in the domain of human genomic sequence similarity as it captures the requirements and leads to a better diagnosis of diseases. However, in addition to the computational complexity due to the large genomic sequence length, the privacy of these sequences are highly important. As these genomic sequences are unique and can identify an individual, these cannot be shared in a plaintext.

Methods

In this paper, we propose two different approximation methods to securely compute the edit distance among genomic sequences. We use shingling, private set intersection methods, the banded alignment algorithm, and garbled circuits to implement these methods. We experimentally evaluate these methods and discuss both advantages and limitations.

Results

Experimental results show that our first approximation method is fast and achieves similar accuracy compared to existing techniques. However, for longer genomic sequences, both the existing techniques and our proposed first method are unable to achieve a good accuracy. On the other hand, our second approximation method is able to achieve higher accuracy on such datasets. However, the second method is relatively slower than the first proposed method.

Conclusion

The proposed algorithms are generally accurate, time-efficient and can be applied individually and jointly as they have complimentary properties (runtime vs. accuracy) on different types of datasets.

Springer

展开收起

被引用次数：57 相关文章所有 13 个版本

以上显示的是最相近的搜索结果。查看全部搜索结果

高级搜索

QQ 群

Secure approximation of edit distance on genomic data

引用