a step of acquiring an alignment between the speech of each of a plurality of speakers and a
transcript of that speaker's speech; a step of joining the transcripts of the respective speakers
along a time axis, creating a transcript of mixed-speaker speech obtained from
synthesized speech of the speakers, and replacing predetermined transcribed portions of
the plurality of speakers that overlap on the time axis with a unit which represents a …
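
The joining and replacing steps recited above can be illustrated with a minimal sketch. It is not the claimed implementation, and it rests on several assumptions that do not come from the text: the alignment is taken to be word-level start and end times in seconds, a "predetermined transcribed portion" is taken to be a whole word, and the replacement unit is written as the hypothetical token "<OVERLAP>" because the excerpt does not name what the unit represents. The names Word, OVERLAP_UNIT, and merge_transcripts are likewise illustrative only.

```python
from dataclasses import dataclass

# Hypothetical placeholder: the source text does not say what the unit represents.
OVERLAP_UNIT = "<OVERLAP>"


@dataclass
class Word:
    speaker: str
    text: str
    start: float  # seconds, assumed to come from the speech/transcript alignment
    end: float


def merge_transcripts(words):
    """Join per-speaker aligned words along the time axis and replace each run
    of words in which different speakers overlap with one OVERLAP_UNIT."""
    ordered = sorted(words, key=lambda w: (w.start, w.end))

    # Group words into clusters of transitively overlapping time intervals.
    clusters = []
    for w in ordered:
        if clusters and w.start < clusters[-1]["end"]:
            clusters[-1]["words"].append(w)
            clusters[-1]["end"] = max(clusters[-1]["end"], w.end)
        else:
            clusters.append({"words": [w], "end": w.end})

    merged = []
    for cluster in clusters:
        speakers = {w.speaker for w in cluster["words"]}
        if len(speakers) > 1:
            # More than one speaker is active here: emit the single unit.
            merged.append(OVERLAP_UNIT)
        else:
            merged.extend(w.text for w in cluster["words"])
    return merged


# Illustrative alignment for two speakers, A and B (made-up values).
example = [
    Word("A", "hello", 0.0, 0.5),
    Word("A", "there", 0.5, 1.0),
    Word("A", "friend", 2.0, 2.5),
    Word("B", "hi", 0.8, 1.2),
    Word("B", "yes", 1.5, 1.8),
]
mixed_transcript = merge_transcripts(example)
```

In this illustration, "there" (speaker A) and "hi" (speaker B) overlap between 0.8 s and 1.0 s, so both are collapsed into the single placeholder unit, while the non-overlapping words are kept in time order.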