A method for spoken language understanding (SLU) includes generating a first encoded representation of words from a user based on an output of a recurrent neural network (RNN) …
During text-to-speech processing, a speech model creates synthesized speech that corresponds to input data. The speech model may include an encoder for encoding the input …
V Klimkov, TR Drugman, A Galkin… - US Patent 11,410,684, 2022 - Google Patents
Audio data from a first, source speaker is received and processed to determine linguistic units and vocal characteristics corresponding to those linguistic units. The linguistic units …
WU Xixin, M Wang, S Kang, D Su, D Yu - US Patent 12,014,720, 2024 - Google Patents
This application relates to a speech synthesis method and apparatus, a model training method and apparatus, and a computer device. The method includes: obtaining to-be …
A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a …
TD Stratton, S Lile - US Patent 11,436,780, 2022 - Google Patents
A method for matching mouth shape and movement in digital video to alternative audio includes deriving a sequence of facial poses including mouth shapes for an actor from a …
C Jitong, P Kainan, P Wei, Z Yanqi - US Patent 11,238,843, 2022 - Google Patents
Voice cloning is a highly desired capability for personalized speech interfaces. Neural network-based speech synthesis has been shown to generate high quality speech for a …
H Lee - US Patent 11,138,963, 2021 - Google Patents
A processor-implemented text-to-speech method includes determining, using a sub-encoder, a first feature vector indicating an utterance characteristic of a speaker from feature …