monolingual types and one intra-sententially code-switched type. In this work, we propose a
general framework to jointly model the likelihoods of the monolingual and code-switched
sub-tasks that comprise bilingual speech recognition. By defining the monolingual sub-tasks with
label-to-frame synchronization, our joint modeling framework can be conditionally factorized
such that the final bilingual output, which may or may not be code-switched, is obtained …