Missing value replacement in strings and applications

G Bernardini, C Liu, G Loukides… - Data Mining and …, 2025 - Springer
Missing values arise routinely in real-world sequential (string) datasets due to: (1) imprecise
data measurements; (2) flexible sequence modeling, such as binding profiles of molecular …

Comparing elastic-degenerate strings: Algorithms, lower bounds, and applications

E Gabory, MN Mwaniki, N Pisanti… - … Matching (CPM 2023 …, 2023 - inria.hal.science
An elastic-degenerate (ED) string T is a sequence of n sets T [1],..., T [n] containing m strings
in total whose cumulative length is N. We call n, m, and N the length, the cardinality and the …
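A small sketch can make the definition above concrete. It assumes an ED string is modeled as a Python list of sets of strings, a representation chosen here for illustration and not taken from the paper; the helper name ed_parameters is hypothetical. The snippet truncates the name of the third parameter, so the code refers to N only as the cumulative length. The example lets a set contain the empty string, which this sketch assumes is a permitted member.

```python
from typing import List, Set, Tuple

def ed_parameters(T: List[Set[str]]) -> Tuple[int, int, int]:
    """Return (n, m, N) for an ED string T given as a list of sets of strings."""
    n = len(T)                             # length: number of sets T[1..n]
    m = sum(len(S) for S in T)             # cardinality: total number of strings
    N = sum(len(s) for S in T for s in S)  # cumulative length of all strings
    return n, m, N

# Hypothetical example: the middle set contains the empty string.
T = [{"AC"}, {"G", "", "TT"}, {"A"}]
print(ed_parameters(T))  # (3, 5, 6)
```

Here n = 3 sets, m = 1 + 3 + 1 = 5 strings, and N = 2 + (1 + 0 + 2) + 1 = 6 letters.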

Hide and mine in strings: Hardness, algorithms, and experiments

G Bernardini, A Conte, G Gourdel… - … on Knowledge and …, 2022 - ieeexplore.ieee.org
Data sanitization and frequent pattern mining are two well-studied topics in data mining.
Data sanitization is the process of disguising (hiding) confidential information in a given …

Elastic-Degenerate String Comparison

E Gabory, MN Mwaniki, N Pisanti, SP Pissis… - arXiv preprint arXiv …, 2024 - arxiv.org
An elastic-degenerate (ED) string $T$ is a sequence of $n$ sets $T[1],\ldots,T[n]$
containing $m$ strings in total whose cumulative length is $N$. We call $n$, $m$, and …

Reverse-safe text indexing

G Bernardini, H Chen, G Fici, G Loukides… - Journal of Experimental …, 2021 - dl.acm.org
We introduce the notion of reverse-safe data structures. These are data structures that
prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A …

Differentially private string sanitization for frequency-based mining tasks

H Chen, C Dong, L Fan, G Loukides… - … Conference on Data …, 2021 - ieeexplore.ieee.org
Strings are used to model genomic, natural language, and web activity data, and are thus
often shared broadly. However, string data sharing has raised privacy concerns stemming …

Constructing strings avoiding forbidden substrings

G Bernardini, A Marchetti-Spaccamela… - CPM 2021-32nd …, 2021 - inria.hal.science
We consider the problem of constructing strings over an alphabet Σ that start with a given
prefix u, end with a given suffix v, and avoid occurrences of a given set of forbidden …
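The problem statement above fixes three constraints on the output string. The following hedged sketch checks a candidate against them and, for tiny inputs only, finds a shortest valid string by exhaustive enumeration. The brute force takes exponential time and is a naive baseline, not the algorithm of the paper. It assumes the forbidden set consists of strings, since the snippet is truncated. It also lets the prefix u and suffix v overlap, which the original problem may not allow. The names is_valid and shortest_valid are hypothetical.

```python
from itertools import product
from typing import Iterable, Optional

def is_valid(s: str, u: str, v: str, forbidden: Iterable[str]) -> bool:
    """Check that s starts with u, ends with v, and contains no forbidden string.

    Assumption: u and v may overlap inside s.
    """
    if not (s.startswith(u) and s.endswith(v)):
        return False
    return all(f not in s for f in forbidden)

def shortest_valid(alphabet: str, u: str, v: str,
                   forbidden: Iterable[str], max_len: int) -> Optional[str]:
    """Naive exhaustive search over all strings of length 0..max_len.

    Returns the first valid string in (length, alphabet order), or None.
    """
    forbidden = list(forbidden)  # allow repeated iteration
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            s = "".join(letters)
            if is_valid(s, u, v, forbidden):
                return s
    return None
```

For example, shortest_valid("ab", "a", "b", ["aa", "bb"], 5) returns "ab". With u = "aa" and forbidden string "aa" it returns None, since every candidate contains a forbidden occurrence.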

String sanitization under edit distance: improved and generalized

T Mieno, SP Pissis, L Stougie, M Sweering - arXiv preprint arXiv …, 2020 - arxiv.org
Let $W$ be a string of length $n$ over an alphabet $\Sigma$, $k$ be a positive integer,
and $\mathcal{S}$ be a set of length-$k$ substrings of $W$. The ETFS problem asks us …

Sketch-based approaches to process massive string data

G Gourdel - 2023 - theses.hal.science
The simplicity of strings and their impactful usage puts their processing at the heart of many
applications, including Bioinformatics, Information Retrieval, and Cybersecurity. Exact …

Algorithms for string and graph data protection

H Chen - ACM Journal of Experimental Algorithmics (JEA), 2021 - kclpure.kcl.ac.uk
The growth in data volume has led to an increasing interest in analyzing data about
individuals in domains ranging from marketing to biomedical informatics. Therefore, data …