Secure approximation of edit distance on genomic data

MMA Aziz, D Alhadidi, N Mohammed - BMC medical genomics, 2017 - Springer
BMC medical genomics, 2017Springer
Background Edit distance is a well established metric to quantify how dissimilar two strings
are by counting the minimum number of operations required to transform one string into the
other. It is utilized in the domain of human genomic sequence similarity as it captures the
requirements and leads to a better diagnosis of diseases. However, in addition to the
computational complexity due to the large genomic sequence length, the privacy of these
sequences are highly important. As these genomic sequences are unique and can identify …
Background
Edit distance is a well established metric to quantify how dissimilar two strings are by counting the minimum number of operations required to transform one string into the other. It is utilized in the domain of human genomic sequence similarity as it captures the requirements and leads to a better diagnosis of diseases. However, in addition to the computational complexity due to the large genomic sequence length, the privacy of these sequences are highly important. As these genomic sequences are unique and can identify an individual, these cannot be shared in a plaintext.
Methods
In this paper, we propose two different approximation methods to securely compute the edit distance among genomic sequences. We use shingling, private set intersection methods, the banded alignment algorithm, and garbled circuits to implement these methods. We experimentally evaluate these methods and discuss both advantages and limitations.
Results
Experimental results show that our first approximation method is fast and achieves similar accuracy compared to existing techniques. However, for longer genomic sequences, both the existing techniques and our proposed first method are unable to achieve a good accuracy. On the other hand, our second approximation method is able to achieve higher accuracy on such datasets. However, the second method is relatively slower than the first proposed method.
Conclusion
The proposed algorithms are generally accurate, time-efficient and can be applied individually and jointly as they have complimentary properties (runtime vs. accuracy) on different types of datasets.
Springer
以上显示的是最相近的搜索结果。 查看全部搜索结果