where taxonomic annotation takes place prior to association with disease. Although effective in
some cases, this approach fails to detect novel pathogens and remote variants not present in
reference databases. We have developed a species-independent pipeline that utilises
sequence clustering to identify nucleotide sequences that co-occur across
multiple sequencing datasets. We applied the workflow to 686 sequencing libraries …