Challenge. In this challenge, our submitted systems are divided into streaming and
non-streaming systems according to their latency thresholds. Only audio information is used in the
submissions. For the streaming system, we adopt the same framework as the first baseline system,
covering the 150 ms, 350 ms, and 1000 ms latency thresholds. For the non-streaming
system (latency greater than 1000 ms), we submit three different systems. Experimental results …