identifying a publicly accessible first source of the first speech data and its corresponding
first text transcription data; extracting second speech data in an accessible encoding format
from the first speech data; extracting second text transcription data in at least one
encoding format from the first text transcription data; matching and aligning the extracted second text transcription data
to the extracted second speech data at a sentence level, a word level, a phoneme level, or a combination …