US10354650B2 · Active · Utility

Recognizing speech with mixed speech recognition models to generate transcriptions

Assignee: GOOGLE LLC · Priority: Jun 26, 2012 · Filed: Mar 15, 2013 · Granted: Jul 16, 2019
Est. expiry: Jun 26, 2032 (~6 yrs left) · nominal 20-yr term from priority
Inventors: GRUENSTEIN ALEXANDER H, ALEKSIC PETAR
Classifications: G10L 15/197 · G10L 15/18 · G10L 15/32 · G10L 15/22 · G10L 15/30 · G10L 15/193 · G10L 15/26
PatentIndex Score: 99 · Cited by: 142 · References: 66 · Claims: 19

Abstract

In one aspect, a method comprises accessing audio data generated by a computing device based on audio input from a user, the audio data encoding one or more user utterances. The method further comprises generating a first transcription of the utterances by performing speech recognition on the audio data using a first speech recognizer that employs a language model based on user-specific data. The method further comprises generating a second transcription of the utterances by performing speech recognition on the audio data using a second speech recognizer that employs a language model independent of user-specific data. The method further comprises determining that the second transcription of the utterances includes a term from a predefined set of one or more terms. The method further comprises, based on determining that the second transcription of the utterance includes the term, providing an output of the first transcription of the utterance.
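Read as a procedure, the method in the abstract reduces to a small arbitration step between two recognizers. The Python sketch that follows is an illustration under stated assumptions, not Google's implementation: the two recognizers are modeled as plain callables, ACTION_TERMS is a hypothetical example of the "predefined set of one or more terms", and falling back to the second transcription when no term is found is an assumption, since the abstract does not say what is output in that case.

```python
from typing import Callable, FrozenSet

# A recognizer is modeled as a function from raw audio bytes to a transcription.
Recognizer = Callable[[bytes], str]

# Hypothetical "predefined set of one or more terms associated with actions
# that are performable by the computing device".
ACTION_TERMS: FrozenSet[str] = frozenset({"call", "text", "open", "play"})


def contains_action_term(transcription: str, terms: FrozenSet[str] = ACTION_TERMS) -> bool:
    """True if any whitespace-separated token of the transcription is in `terms`."""
    return any(token in terms for token in transcription.lower().split())


def select_transcription(audio: bytes,
                         first_recognizer: Recognizer,
                         second_recognizer: Recognizer,
                         terms: FrozenSet[str] = ACTION_TERMS) -> str:
    # First recognizer: language model based on user-specific data.
    first = first_recognizer(audio)
    # Second recognizer: language model independent of user-specific data.
    second = second_recognizer(audio)
    # Trigger: the *second* transcription contains a predefined action term,
    # in which case the *first* transcription is provided as output.
    if contains_action_term(second, terms):
        return first
    # Assumption: the abstract does not specify this branch; the sketch
    # falls back to the user-independent transcription.
    return second


# Stub recognizers standing in for real speech recognition (both ignore the audio).
audio = b"\x00\x01"
print(select_transcription(audio, lambda _a: "call Siobhan Byrne",
                           lambda _a: "call shiv on burn"))   # call Siobhan Byrne
print(select_transcription(audio, lambda _a: "whether in boston",
                           lambda _a: "weather in boston"))   # weather in boston
```

The first call shows the intended case: the user-independent recognizer mangles a contact name ("shiv on burn") but still hears the action word "call", so the user-specific transcription wins.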

Claims

exact text as granted — not AI-modified
What is claimed is:

1. A computer-implemented method comprising:
accessing audio data generated by a computing device based on audio input from a user, the audio data encoding one or more user utterances;
generating a first transcription of the utterances by performing speech recognition on the audio data using a first speech recognizer, wherein the first speech recognizer employs a language model that is based on user-specific data;
generating a second transcription of the utterances by performing speech recognition on the audio data using a second speech recognizer, wherein the second speech recognizer employs a language model independent of user-specific data;
determining that the second transcription of the utterances includes a term from a predefined set of one or more terms associated with actions that are performable by the computing device; and
based on determining that the second transcription of the utterance includes the term from the predefined set of one or more terms, providing an output of the first transcription of the utterance.

2. The method of claim 1 wherein the first speech recognizer employs a grammar-based language model.

3. The method of claim 2 wherein the grammar-based language model includes a context free grammar.

4. The method of claim 1 wherein the second speech recognizer employs a statistics-based language model.

5. The method of claim 1 wherein the user-specific data includes a contact list for the user, an applications list of applications installed on the computing device, or a media list of media stored on the computing device.

6. The method of claim 1 wherein the first speech recognizer is implemented on the computing device and the second speech recognizer is implemented on one or more server devices.

7. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
accessing audio data generated by a computing device based on audio input from a user, the audio data encoding one or more user utterances;
generating a first transcription of the utterances by performing speech recognition on the audio data using a first speech recognizer, wherein the first speech recognizer employs a language model that is developed based on user-specific data;
generating a second transcription of the utterances by performing speech recognition on the audio data using a second speech recognizer, wherein the second speech recognizer employs a language model developed independent of user-specific data;
determining that the second transcription of the utterances includes a term from a predefined set of one or more terms associated with actions that are performable by the computing device; and
based on determining that the second transcription of the utterance includes the term from the predefined set of one or more terms, providing an output of the first transcription of the utterance.

8. The system of claim 7 wherein the first speech recognizer employs a grammar-based language model.

9. The system of claim 8 wherein the grammar-based language model includes a context free grammar.

10. The system of claim 7 wherein the second speech recognizer employs a statistics-based language model.

11. The system of claim 7 wherein the user-specific data includes a contact list for the user, an applications list of applications installed on the computing device, or a media list of media stored on the computing device.

12. The system of claim 7 wherein the first speech recognizer is implemented on the computing device and the second speech recognizer is implemented on one or more server devices.

13. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
accessing audio data generated by a computing device based on audio input from a user, the audio data encoding one or more user utterances;
determining a first transcription of the utterances by performing speech recognition on the audio data using a first speech recognizer, wherein the first speech recognizer employs a language model that is developed based on user-specific data;
determining a second transcription of the utterances by performing speech recognition on the audio data using a second speech recognizer, wherein the second speech recognizer employs a language model developed independent of user-specific data;
determining that the second transcription of the utterances includes a term from a predefined set of one or more terms associated with actions that are performable by the computing device; and
based on determining that the second transcription of the utterance includes the term from the predefined set of one or more terms, providing an output of the first transcription of the utterance.

14. The medium of claim 13 wherein the first speech recognizer employs a grammar-based language model.

15. The medium of claim 13 wherein the second speech recognizer employs a statistics-based language model.

16. The medium of claim 13 wherein the user-specific data includes a contact list for the user, an applications list of applications installed on the computing device, or a media list of media stored on the computing device.

17. The medium of claim 13 wherein the first speech recognizer is implemented on the computing device and the second speech recognizer is implemented on one or more server devices.

18. The method of claim 1, further comprising determining that the second transcription represents a search query, and
wherein determining that the second transcription of the utterances includes a term from a predefined set of one or more terms is performed in response to determining that the second transcription represents the search query.

19. The method of claim 1, further comprising determining that the second transcription represents a search query and that the first transcription represents an action, and
wherein determining that the second transcription of the utterances includes a term from a predefined set of one or more terms is performed in response to determining that the second transcription represents the search query and that the first transcription represents the action.
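The dependent claims narrow claim 1 in ways that a toy model can illustrate: a context-free grammar whose terminals come from user-specific data (claims 2, 3 and 5), and a gate that runs the term check only when the user-independent transcription looks like a search query (claims 18 and 19). This Python sketch is hypothetical throughout. The grammar layout, the use of an "n-best list" of hypotheses as a stand-in for grammar-constrained acoustic decoding, and the rule that a transcription the action grammar rejects counts as a search query are all assumptions, not the implementation disclosed in the patent.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Sequence, Set, Tuple

# Hypothetical grammar representation: nonterminal -> list of alternatives,
# each alternative a tuple of symbols. Symbols not used as keys are terminals.
Grammar = Dict[str, List[Tuple[str, ...]]]


def build_user_grammar(contacts: Iterable[str], apps: Iterable[str],
                       media: Iterable[str]) -> Grammar:
    """Toy context-free grammar whose terminals come from user-specific data
    (contact list, installed apps, stored media), in the spirit of claims 3 and 5."""
    def alternatives(names: Iterable[str]) -> List[Tuple[str, ...]]:
        return [tuple(name.lower().split()) for name in names]

    return {
        "<command>": [("call", "<contact>"), ("open", "<app>"), ("play", "<media>")],
        "<contact>": alternatives(contacts),
        "<app>": alternatives(apps),
        "<media>": alternatives(media),
    }


def _end_positions(grammar: Grammar, symbol: str, tokens: Sequence[str], start: int) -> Set[int]:
    """All token positions at which a derivation of `symbol` beginning at `start` can end."""
    if symbol not in grammar:  # terminal: must match the next token exactly
        return {start + 1} if start < len(tokens) and tokens[start] == symbol else set()
    ends: Set[int] = set()
    for alternative in grammar[symbol]:
        positions = {start}
        for sym in alternative:
            positions = {end for pos in positions
                         for end in _end_positions(grammar, sym, tokens, pos)}
            if not positions:
                break
        ends |= positions
    return ends


def parses(grammar: Grammar, text: str, root: str = "<command>") -> bool:
    """True if the whole (lowercased, whitespace-tokenized) text derives from `root`."""
    tokens = text.lower().split()
    return len(tokens) in _end_positions(grammar, root, tokens, 0)


def grammar_recognizer(grammar: Grammar, n_best: Sequence[str]) -> Optional[str]:
    """Stand-in for a grammar-based recognizer: return the highest-ranked
    hypothesis from an assumed acoustic n-best list that the grammar accepts."""
    return next((hyp for hyp in n_best if parses(grammar, hyp)), None)


def select_with_query_gate(first: Optional[str], second: str, grammar: Grammar,
                           action_terms: FrozenSet[str],
                           require_first_action: bool = True) -> str:
    """Claim-18-style gating (require_first_action=False) or claim-19-style
    gating (True). Assumption: the second transcription 'represents a search
    query' when the action grammar rejects it."""
    if first is None:  # no in-grammar hypothesis: nothing to prefer
        return second
    second_is_query = not parses(grammar, second)
    first_is_action = parses(grammar, first)
    if second_is_query and (first_is_action or not require_first_action):
        # The term check runs only in response to the query determination.
        if any(token in action_terms for token in second.lower().split()):
            return first
    return second


grammar = build_user_grammar(["Siobhan Byrne"], ["Maps"], ["Abbey Road"])
first = grammar_recognizer(grammar, ["call shiv on burn", "call Siobhan Byrne"])
second = "call shiv on burn"  # hypothetical output of the statistics-based recognizer
print(select_with_query_gate(first, second, grammar, frozenset({"call", "open", "play"})))
# prints: call Siobhan Byrne
```

A grammar-based recognizer is cheap enough to run on the device (compare claim 6) because its vocabulary is limited to the user's own contacts, apps and media, while the open-vocabulary statistical recognizer is the natural candidate for a server.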
