US9620104B2 · Active · Utility · PatentIndex 92

System and method for user-specified pronunciation of words for speech synthesis and recognition

Assignee: APPLE INC · Priority: Jun 7, 2013 · Filed: Jun 6, 2014 · Granted: Apr 11, 2017
Est. expiry: Jun 7, 2033 (~6.9 yrs left) · nominal 20-yr term from priority
Inventors: NAIK DEVANG K; GRUBER THOMAS R; WEINER LIAM; BINDER JUSTIN G; SRISUWANANUKORN CHARLES; EVERMANN GUNNAR; WILLIAMS SHAUN ERIC; CHEN HONG; NAPOLITANO LIA T
Classifications: G10L 13/04 · G10L 2015/0631 · G10L 13/08 · G10L 13/027 · G10L 2015/0638 · G10L 15/26 · G10L 15/063 · G10L 15/265 · G10L 15/22
PatentIndex Score: 92 · Cited by: 34 · References: 5,384 · Claims: 23

Abstract

The method is performed at an electronic device with one or more processors and memory storing one or more programs for execution by the one or more processors. A first speech input including at least one word is received. A first phonetic representation of the at least one word is determined, the first phonetic representation comprising a first set of phonemes selected from a speech recognition phonetic alphabet. The first set of phonemes is mapped to a second set of phonemes to generate a second phonetic representation, where the second set of phonemes is selected from a speech synthesis phonetic alphabet. The second phonetic representation is stored in association with a text string corresponding to the at least one word.
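
The mapping and storing steps described above can be illustrated with a minimal sketch. Everything named here is hypothetical: the patent text does not specify the two alphabets or the form of the mapping, so the sketch assumes ARPAbet-like symbols on the recognition side, IPA-like symbols on the synthesis side, and a simple one-to-one lookup table. It also assumes the speech input has already been decoded into recognition phonemes; a real mapping may well be many-to-many or context-dependent.

```python
from typing import Dict, List

# Hypothetical table: recognition-alphabet phoneme -> synthesis-alphabet phoneme.
# The ARPAbet-like keys and IPA-like values are assumptions for illustration only.
RECOGNITION_TO_SYNTHESIS: Dict[str, str] = {
    "M": "m",
    "K": "k",
    "N": "n",
    "IY": "iː",
    "AY": "aɪ",
    "AH": "ə",
}


def map_phonemes(first_representation: List[str]) -> List[str]:
    """Map a recognition-alphabet phoneme sequence to the synthesis alphabet."""
    try:
        return [RECOGNITION_TO_SYNTHESIS[p] for p in first_representation]
    except KeyError as err:
        raise ValueError(f"no synthesis phoneme for {err.args[0]!r}") from err


def store_pronunciation(
    lexicon: Dict[str, List[str]],
    text_string: str,
    first_representation: List[str],
) -> List[str]:
    """Generate the second phonetic representation and store it under the text string."""
    second_representation = map_phonemes(first_representation)
    lexicon[text_string] = second_representation
    return second_representation


# Illustrative usage: the word and its phonemes are made up for this example.
lexicon: Dict[str, List[str]] = {}
stored = store_pronunciation(lexicon, "Mika", ["M", "IY", "K", "AH"])
```

Keeping the table as plain data separates the alphabet correspondence from the control flow, so a richer mapping could replace the dictionary lookup without changing how the result is stored.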

Claims

exact text as granted — not AI-modified
What is claimed is: 
     
       1. A method for learning word pronunciations, comprising:
 at an electronic device with one or more processors and memory storing one or more programs for execution by the one or more processors: 
 receiving a first speech input including at least one word; 
 determining a first phonetic representation of the at least one word, the first phonetic representation comprising a first set of phonemes selected from a speech recognition phonetic alphabet; 
 mapping the first set of phonemes to a second set of phonemes to generate a second phonetic representation, the second set of phonemes selected from a speech synthesis phonetic alphabet that is different from the speech recognition phonetic alphabet, wherein the speech recognition phonetic alphabet and the speech synthesis phonetic alphabet are phonetic alphabets of a same language; and 
 storing the second phonetic representation in association with a text string corresponding to the at least one word. 
 
     
     
       2. The method of claim 1, further comprising, prior to receiving the first speech input, providing the text string.
     
     
       3. The method of claim 2, wherein the text string is a name in a contact list associated with a user.
     
     
       4. The method of claim 2, wherein the text string is input by a user via a keyboard.
     
     
       5. The method of claim 2, wherein the text string is from a webpage displayed by the electronic device.
     
     
       6. The method of claim 1, further comprising determining the text string using the first phonetic representation.
     
     
       7. The method of claim 1, further comprising updating a speech recognizer to associate the first phonetic representation with the text string.
     
     
       8. The method of claim 7, further comprising:
 after updating the speech recognizer, receiving a second speech input including the at least one word; 
 determining a third phonetic representation of the at least one word; and 
 determining that the at least one word corresponds to the text string based on a determination that the third phonetic representation is substantially similar to the first phonetic representation. 
 
     
     
       9. The method of claim 1, further comprising, after storing the second phonetic representation in association with the text string, synthesizing a speech output corresponding to the text string using the second phonetic representation.
     
     
       10. A non-transitory computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by an electronic device with a display, cause the device to perform:
 receiving a first speech input including at least one word; 
 determining a first phonetic representation of the at least one word, the first phonetic representation comprising a first set of phonemes selected from a speech recognition phonetic alphabet; 
 mapping the first set of phonemes to a second set of phonemes to generate a second phonetic representation, the second set of phonemes selected from a speech synthesis phonetic alphabet that is different from the speech recognition phonetic alphabet, wherein the speech recognition phonetic alphabet and the speech synthesis phonetic alphabet are phonetic alphabets of a same language; and 
 storing the second phonetic representation in association with a text string corresponding to the at least one word. 
 
     
     
       11. The computer readable storage medium of claim 10, further comprising instructions for causing the device to perform, prior to receiving the first speech input, providing the text string.
     
     
       12. The computer readable storage medium of claim 11, wherein the text string is a name in a contact list associated with a user.
     
     
       13. The computer readable storage medium of claim 11, wherein the text string is input by a user via a keyboard.
     
     
       14. The computer readable storage medium of claim 11, wherein the text string is from a webpage displayed by the electronic device.
     
     
       15. The computer readable storage medium of claim 10, further comprising instructions for causing the processor to perform determining the text string using the first phonetic representation.
     
     
       16. The computer readable storage medium of claim 10, further comprising instructions for causing the processor to perform updating a speech recognizer to associate the first phonetic representation with the text string.
     
     
       17. An electronic device, comprising:
 one or more processors; 
 memory; and 
 one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing: 
 receiving a first speech input including at least one word; 
 determining a first phonetic representation of the at least one word, the first phonetic representation comprising a first set of phonemes selected from a speech recognition phonetic alphabet; 
 mapping the first set of phonemes to a second set of phonemes to generate a second phonetic representation, the second set of phonemes selected from a speech synthesis phonetic alphabet that is different from the speech recognition phonetic alphabet, wherein the speech recognition phonetic alphabet and the speech synthesis phonetic alphabet are phonetic alphabets of a same language; and 
 storing the second phonetic representation in association with a text string corresponding to the at least one word. 
 
     
     
       18. The device of claim 17, further comprising instructions for performing, prior to receiving the first speech input, providing the text string.
     
     
       19. The device of claim 18, wherein the text string is a name in a contact list associated with a user.
     
     
       20. The device of claim 18, wherein the text string is input by a user via a keyboard.
     
     
       21. The device of claim 18, wherein the text string is from a webpage displayed by the electronic device.
     
     
       22. The device of claim 17, further comprising instructions for performing determining the text string using the first phonetic representation.
     
     
       23. The device of claim 17, further comprising instructions for performing updating a speech recognizer to associate the first phonetic representation with the text string.
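
The recognizer update, later matching, and synthesis steps recited in claims 7 through 9 might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the claims do not define "substantially similar", so the sketch substitutes a normalized Levenshtein distance over phoneme symbols with an arbitrary threshold. The class `PronunciationStore`, its methods, and the 0.25 threshold are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def is_substantially_similar(
    a: Sequence[str], b: Sequence[str], max_ratio: float = 0.25
) -> bool:
    """Assumed similarity test: edit distance relative to the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return True
    return edit_distance(a, b) / longest <= max_ratio


class PronunciationStore:
    """Hypothetical store holding both representations per text string."""

    def __init__(self) -> None:
        self.recognizer_entries: Dict[str, List[str]] = {}  # first representation
        self.synthesis_entries: Dict[str, List[str]] = {}   # second representation

    def learn(self, text: str, first: Sequence[str], second: Sequence[str]) -> None:
        # Claim 7: associate the first representation with the text string.
        self.recognizer_entries[text] = list(first)
        self.synthesis_entries[text] = list(second)

    def recognize(self, third: Sequence[str]) -> Optional[str]:
        # Claim 8: match a later input against the learned first representation.
        for text, first in self.recognizer_entries.items():
            if is_substantially_similar(third, first):
                return text
        return None

    def phonemes_for_synthesis(self, text: str) -> Optional[List[str]]:
        # Claim 9: the synthesizer would consume the stored second representation.
        return self.synthesis_entries.get(text)


store = PronunciationStore()
store.learn("Mika", ["M", "IY", "K", "AH"], ["m", "iː", "k", "ə"])
matched = store.recognize(["M", "IY", "K", "AA"])  # one phoneme differs
```

Storing the two representations side by side reflects the claims' split between the recognition alphabet, used for matching later speech input, and the synthesis alphabet, used for speech output.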
