US7983919B2 · Active · Utility · PatentIndex 98

System and method for performing speech synthesis with a cache of phoneme sequences

Assignee: AT & T IP II LP · Priority: Aug 9, 2007 · Filed: Aug 9, 2007 · Granted: Jul 19, 2011

Est. expiry: Aug 9, 2027 (~1.1 yrs left) · nominal 20-yr term from priority

Inventors: CONKIE ALISTAIR

G10L 13/08 · G10L 13/04

PatentIndex Score: 317

Abstract

Disclosed are systems, methods, and computer readable media for performing speech synthesis. The method embodiment comprises: applying a first part of a speech synthesizer to a text corpus to obtain a plurality of phoneme sequences, the first part of the speech synthesizer only identifying possible phoneme sequences; for each of the obtained plurality of phoneme sequences, identifying joins that would be calculated to synthesize the respective phoneme sequence; and adding the identified joins to a cache for use in speech synthesis.

Claims

exact text as granted — not AI-modified

1. A method of performing speech synthesis, the method comprising:
 obtaining at a first time a plurality of phoneme sequences by applying a first part of a speech synthesizer to a text corpus to yield an obtained plurality of phoneme sequences, the first part of the speech synthesizer only identifying possible phoneme sequences to be used in synthesizing speech at a second time which is later than the first time; 
 for each respective phoneme sequence of the obtained plurality of phoneme sequences, identifying joins that would be calculated to synthesize the respective phoneme sequence; and 
 adding the identified joins to a cache for use in speech synthesis. 
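
Illustrative note (not part of the claim text): the three steps of claim 1 can be sketched as an offline pass over a corpus. This is a minimal sketch under stated assumptions, not the patented implementation. The names `LEXICON`, `UNIT_DB`, `front_end`, `identify_joins`, `join_cost`, and `build_cache` are hypothetical, the lexicon and unit database are toy data, and storing a cost value per join is an assumption suggested by the reference to a "join cost" in claim 5.

```python
from itertools import product

# Hypothetical toy lexicon standing in for the synthesizer's text analysis.
LEXICON = {
    "the": ["dh", "ax"],
    "cat": ["k", "ae", "t"],
    "sat": ["s", "ae", "t"],
}

# Hypothetical unit database: phoneme -> ids of candidate recorded units.
UNIT_DB = {
    "dh": ["dh_1"], "ax": ["ax_1", "ax_2"], "k": ["k_1"],
    "ae": ["ae_1", "ae_2"], "t": ["t_1"], "s": ["s_1"],
}


def front_end(sentence):
    """First part of the synthesizer: map text to phonemes, no audio work."""
    phonemes = []
    for word in sentence.lower().split():
        phonemes.extend(LEXICON[word])
    return phonemes


def identify_joins(phonemes):
    """Collect every candidate unit pair that could need a join calculation."""
    joins = set()
    for left, right in zip(phonemes, phonemes[1:]):
        joins.update(product(UNIT_DB[left], UNIT_DB[right]))
    return joins


def join_cost(left_unit, right_unit):
    """Placeholder cost (assumption): difference of the numeric id suffixes."""
    left_id = int(left_unit.rsplit("_", 1)[1])
    right_id = int(right_unit.rsplit("_", 1)[1])
    return float(abs(left_id - right_id))


def build_cache(corpus):
    """Run the front end over the corpus and cache a value for each join."""
    cache = {}
    for sentence in corpus:
        for pair in identify_joins(front_end(sentence)):
            if pair not in cache:
                cache[pair] = join_cost(*pair)
    return cache


cache = build_cache(["the cat sat"])
```

The point of the sketch is the split in time: `build_cache` runs at the first time, using only the front end, so that later synthesis can look joins up instead of calculating them.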
 
     
     
2. The method of claim 1, the method further comprising:
 recording a frequency of occurrence for each of the obtained plurality of phoneme sequences; and 
 pruning the cache. 
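
Illustrative note (not part of the claim text): claim 2 records how often each phoneme sequence occurs and prunes the cache, but does not state the pruning rule. The sketch below assumes the cache is pruned to the joins seen most often. For brevity it keys joins by phoneme pair rather than by unit pair, and the names `record_frequencies`, `join_weights`, and `prune_cache` are hypothetical.

```python
from collections import Counter


def record_frequencies(phoneme_sequences):
    """Count how often each phoneme sequence was produced from the corpus."""
    return Counter(tuple(seq) for seq in phoneme_sequences)


def join_weights(sequence_counts):
    """Credit each adjacent phoneme pair with the count of its sequence."""
    weights = Counter()
    for seq, count in sequence_counts.items():
        for pair in zip(seq, seq[1:]):
            weights[pair] += count
    return weights


def prune_cache(cache, weights, max_entries):
    """Keep only the max_entries joins with the highest weight (assumed rule)."""
    ranked = sorted(cache, key=lambda join: weights[join], reverse=True)
    return {join: cache[join] for join in ranked[:max_entries]}


counts = record_frequencies([["k", "ae", "t"], ["s", "ae", "t"], ["k", "ae", "t"]])
weights = join_weights(counts)
full_cache = {("k", "ae"): 0.4, ("ae", "t"): 0.1, ("s", "ae"): 0.7}
pruned = prune_cache(full_cache, weights, max_entries=2)
```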
 
     
     
3. The method of claim 1, the method further comprising:
 building a plurality of caches of different sizes based on values or parameters. 
 
     
     
4. The method of claim 3, wherein the values or parameters comprise computational costs or frequency of occurrence.
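
Illustrative note (not part of the claim text): claims 3 and 4 describe building several caches of different sizes, ranked by a parameter such as frequency of occurrence or computational cost. One way to picture this, assuming each candidate entry carries both figures, is sketched below. The function `build_caches`, the field names, and the sample numbers are hypothetical.

```python
def build_caches(entries, sizes, key="frequency"):
    """Return one cache per requested size, filled with the top-ranked joins.

    entries maps a join to a dict with "value", "frequency", and
    "compute_cost" fields (an assumed layout, for illustration only).
    """
    ranked = sorted(entries, key=lambda join: entries[join][key], reverse=True)
    return {
        size: {join: entries[join]["value"] for join in ranked[:size]}
        for size in sizes
    }


entries = {
    ("a", "b"): {"value": 0.1, "frequency": 50, "compute_cost": 1.0},
    ("b", "c"): {"value": 0.2, "frequency": 5, "compute_cost": 9.0},
    ("c", "d"): {"value": 0.3, "frequency": 20, "compute_cost": 4.0},
}

by_frequency = build_caches(entries, sizes=[1, 2], key="frequency")
by_cost = build_caches(entries, sizes=[1, 2], key="compute_cost")
```

Ranking by frequency favors joins that are needed often, while ranking by computational cost favors joins that are expensive to recalculate; a deployment could pick whichever cache size fits its memory budget.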
     
     
5. A method of synthesizing a speech signal, the method comprising:
 selecting one or more acoustic units from an acoustic unit database; 
 determining whether a join cost of an acoustic unit sequential pair resides in a cache created by steps comprising:
 obtaining at a first time a plurality of phoneme sequences by applying a first part of a speech synthesizer to a text corpus to yield an obtained plurality of phoneme sequences, the first part of the speech synthesizer only identifying possible phoneme sequences to be used in synthesizing speech at a second time which is later than the first time; 
 for each respective phoneme sequence of the obtained plurality of phoneme sequences, identifying joins that would be calculated to synthesize the respective-phoneme sequence; and 
 adding the identified joins to a cache for use in speech synthesis; 
 
 if the cache contains the join, extracting the join from the cache for use in speech synthesis; and 
 if the cache does not contain the join, calculating a value of the join for use in speech synthesis. 
 
     
     
6. The method of claim 5, wherein calculating the value of the join cost is performed to enhance accuracy over speed.
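
Illustrative note (not part of the claim text): the run-time branch in claim 5 amounts to a cache lookup with a calculated fallback. The sketch below is a minimal version under stated assumptions: the cache is a plain dictionary keyed by unit pair, and `get_join_cost`, `slow_cost`, and the unit ids are hypothetical. The claim does not say whether a calculated value is written back to the cache, so this sketch leaves the cache unchanged on a miss.

```python
def get_join_cost(cache, left_unit, right_unit, compute):
    """Return (cost, was_cached) for joining two sequential acoustic units."""
    key = (left_unit, right_unit)
    if key in cache:
        # Cache hit: extract the precomputed join.
        return cache[key], True
    # Cache miss: calculate the value now.
    return compute(left_unit, right_unit), False


def slow_cost(left_unit, right_unit):
    """Stand-in for the full join calculation (constant, for illustration)."""
    return 1.0


join_cache = {("a_1", "b_1"): 0.25}
hit = get_join_cost(join_cache, "a_1", "b_1", slow_cost)
miss = get_join_cost(join_cache, "a_1", "b_2", slow_cost)
```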
     
     
7. A system for performing speech synthesis, the system comprising:
 a first module configured to obtain at a first time a plurality of phoneme sequences by applying a first part of a speech synthesizer to a text corpus to yield an obtained plurality of phoneme sequences, the first part of the speech synthesizer only identifying possible phoneme sequences to be used in synthesizing speech at a second time which is later than the first time; 
 a second module configured, for each respective phoneme sequence of the obtained plurality of phoneme sequences, to identify joins that would be calculated to synthesize the respective phoneme sequence; and 
 a third module configured to add the identified joins to a cache for use in speech synthesis. 
 
     
     
8. The system of claim 7, the system further comprising:
 a fourth module configured to record a frequency of occurrence for each of the plurality of phoneme sequences; and 
 a fifth module configured to prune the cache. 
 
     
     
9. The system of claim 7, the system further comprising:
 a fourth module configured to build a plurality of caches of different sizes based on values or parameters. 
 
     
     
10. The system of claim 9, wherein the values or parameters comprise computational costs or frequency of occurrence.
     
     
11. A system for synthesizing a speech signal, the system comprising:
 a first module configured to select one or more acoustic units from an acoustic unit database; 
 a second module configured to determine whether a join cost of an acoustic unit sequential pair resides in a cache created by steps comprising:
 obtaining at a first time a plurality of phoneme sequences by applying a first part of a speech synthesizer to a text corpus to yield an obtained plurality of phoneme sequences, the first part of the speech synthesizer only identifying possible phoneme sequences to be used in synthesizing speech at a second time which is later than the first time; 
 for each respective phoneme sequence of the obtained plurality of phoneme sequences, identifying joins that would be calculated to synthesize the respective-phoneme sequence; and 
 adding the identified joins to a cache for use in speech synthesis 
 
 a third module configured, if the cache contains the join, to extract the join from the cache for use in speech synthesis; and 
 a fourth module configured, if the cache does not contain the join, to calculate a value of the join for use in speech synthesis. 
 
     
     
12. The system of claim 11, wherein calculating the value of the join cost is performed to enhance accuracy over speed.
     
     
13. A non-transitory computer readable medium storing a computer program having instructions for performing speech synthesis, the instructions comprising:
 obtaining at a first time a plurality of phoneme sequences by applying a first part of a speech synthesizer to a text corpus to yield an obtained plurality of phoneme sequences, the first part of the speech synthesizer only identifying possible phoneme sequences to be used in synthesizing speech at a second time which is later than the first time; 
 for each respective phoneme sequence of the obtained plurality of phoneme sequences, identifying joins that would be calculated to synthesize the respective phoneme sequence; and 
 adding the identified joins to a cache for use in speech synthesis. 
 
     
     
14. The non-transitory computer readable medium of claim 13, the instructions further comprising:
 recording a frequency of occurrence for each of the obtained plurality of phoneme sequences; and 
 pruning the cache. 
 
     
     
15. The non-transitory computer readable medium of claim 13, the instructions further comprising:
 building a plurality of caches of different sizes based on values or parameters. 
 
     
     
16. The non-transitory computer readable medium of claim 15, wherein the values or parameters comprise computational costs or frequency of occurrence.
     
     
17. A non-transitory computer readable medium storing a computer program having instructions for synthesizing a speech signal, the instructions comprising:
 selecting one or more acoustic units from an acoustic unit database; 
 determining whether a join cost of an acoustic unit sequential pair resides in a cache created by steps comprising:
 obtaining at a first time a plurality of phoneme sequences by applying a first part of a speech synthesizer to a text corpus to yield an obtained plurality of phoneme sequences, the first part of the speech synthesizer only identifying possible phoneme sequences to be used in synthesizing speech at a second time which is later than the first time; 
 for each respective phoneme sequence of the obtained plurality of phoneme sequences, identifying joins that would be calculated to synthesize the respective-phoneme sequence; and 
 adding the identified joins to a cache for use in speech synthesis 
 
 if the cache contains the join, extracting the join from the cache for use in speech synthesis; and 
 if the cache does not contain the join, calculating a value of the join for use in speech synthesis. 
 
     
     
18. The non-transitory computer readable medium of claim 17, wherein calculating the value of the join cost is performed to enhance accuracy over speed.
