US8868422B2 · Active · Utility

Storing a representative speech unit waveform for speech synthesis based on searching for similar speech units

Assignee: HIRABAYASHI GOU
Priority: Mar 26, 2010 · Filed: Sep 13, 2010 · Granted: Oct 21, 2014
Est. expiry: Mar 26, 2030 (~3.7 yrs left) · nominal 20-yr term from priority
Inventors: HIRABAYASHI GOU, KAGOSHIMA TAKEHIKO
Classifications: G10L 13/08, G10L 13/033, G10L 13/04, G10L 13/06
PatentIndex Score: 43 · Cited by: 1 · References: 11 · Claims: 7

Abstract

According to one embodiment, a method for editing speech is disclosed. The method can generate speech information from a text. The speech information includes phonologic information and prosody information. The method can divide the speech information into a plurality of speech units based on at least one of the phonologic information and the prosody information. The method can search for at least two speech units, among the plurality of speech units, in which at least one of the phonologic information and the prosody information is identical or similar. In addition, the method can store a speech unit waveform corresponding to one of the at least two speech units into a memory as a representative speech unit.
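The abstract's pipeline can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Phoneme` record, the `"pau"` pause symbol, the helper names, and the placeholder waveforms are all hypothetical, and only the "identical phonologic information" case is covered (units are grouped by their exact phoneme string, and the first waveform of each group of two or more is kept as the representative).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Phoneme:
    symbol: str         # phonologic information, e.g. "k", "a", or "pau" (hypothetical pause label)
    duration_ms: float  # prosody information
    f0_hz: float        # prosody information; 0.0 for unvoiced sounds and pauses

def divide_at_pauses(phonemes, pause_symbol="pau"):
    """Split a phoneme sequence into speech units wherever a pause occurs."""
    units, current = [], []
    for p in phonemes:
        if p.symbol == pause_symbol:
            if current:
                units.append(tuple(current))
            current = []
        else:
            current.append(p)
    if current:
        units.append(tuple(current))
    return units

def store_representatives(units, waveforms):
    """Group units by identical phoneme strings; keep one waveform per group of 2+ units."""
    groups = {}
    for unit, wave in zip(units, waveforms):
        key = tuple(p.symbol for p in unit)  # phonologic key (assumed matching rule)
        groups.setdefault(key, []).append(wave)
    # First occurrence in each repeated group serves as the representative (assumption).
    return {key: waves[0] for key, waves in groups.items() if len(waves) >= 2}

P = Phoneme
info = [P("k", 80, 0.0), P("a", 120, 140.0), P("pau", 200, 0.0),
        P("k", 85, 0.0), P("a", 110, 150.0), P("pau", 200, 0.0),
        P("s", 90, 0.0), P("a", 100, 130.0)]
units = divide_at_pauses(info)                 # (k a), (k a), (s a)
waves = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # placeholder unit waveforms
memory = store_representatives(units, waves)   # {("k", "a"): [0.1, 0.2]}
```

Handling "similar" rather than identical units needs a tolerance-based comparison instead of an exact dictionary key.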

Claims

(Exact text as granted.)
What is claimed is: 
     
       1. A method for editing speech, comprising:
 inputting a plurality of texts to generate representative speech unit waveforms to be used by a phrase concatenation based speech synthesis method; 
 generating speech information from the texts, the speech information comprising phonologic information and prosody information; 
 generating speech waveforms from the speech information by text-to-speech synthesis; 
 dividing the speech waveforms into a plurality of speech unit waveforms based on the phonologic information; 
 searching at least two speech unit waveforms from the plurality of speech unit waveforms, wherein the at least two speech unit waveforms are identical or similar; 
 selecting a representative speech unit waveform from the at least two speech unit waveforms; and 
 storing the representative speech unit waveform into a memory. 
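As an illustration only (not part of the granted claims), the searching and selecting steps of claim 1 could be sketched as follows. The RMS sample distance, the tolerance `tol`, the greedy grouping, and the medoid rule for picking the representative are all assumptions; the claim does not specify how waveform similarity is measured or how the representative is chosen.

```python
import math

def rms_distance(a, b):
    """RMS sample difference; waveforms of unequal length count as dissimilar (assumption)."""
    if len(a) != len(b):
        return math.inf
    if not a:
        return 0.0
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def group_similar(waves, tol):
    """Greedy grouping: each waveform joins the first group whose seed is within tol."""
    groups = []  # each group is a list of indices into waves
    for i, w in enumerate(waves):
        for g in groups:
            if rms_distance(waves[g[0]], w) <= tol:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

def medoid(waves, group):
    """Member with the smallest summed distance to the rest of its group."""
    return min(group, key=lambda i: sum(rms_distance(waves[i], waves[j]) for j in group))

def select_representatives(waves, tol=0.05):
    """Map group id -> representative waveform, for groups of two or more similar units."""
    memory = {}
    for gid, g in enumerate(group_similar(waves, tol)):
        if len(g) >= 2:
            memory[gid] = waves[medoid(waves, g)]
    return memory

waves = [[0.0, 0.5, 0.0], [0.0, 0.52, 0.01], [0.0, 0.49, 0.0], [0.3, -0.3, 0.3]]
memory = select_representatives(waves)  # {0: [0.0, 0.5, 0.0]}
```

The medoid is used instead of an averaged waveform so that the stored representative remains one of the actual speech unit waveforms, as the claim's "selecting ... from the at least two speech unit waveforms" suggests.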
 
     
     
       2. The method according to claim 1, wherein
 the dividing comprises dividing the speech waveforms into the plurality of speech unit waveforms based on amplitudes of the speech waveforms. 
 
     
     
       3. The method according to claim 2, further comprising:
 generating the phonologic information comprising a phoneme sequence that represents the text as phonemes, 
 wherein 
 the phoneme sequence comprises an unvoiced sound and a pause sound representing silence, 
 the dividing comprises dividing the speech waveforms at a time in a section corresponding to the unvoiced sound or the pause sound, and 
 the time corresponds to an absolute value of the amplitude being below a threshold. 
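The amplitude-based division of claims 2 and 3 might look like the following sketch, which is illustrative and not part of the claims. It assumes a phoneme-to-sample alignment given as hypothetical `(phoneme, start, end)` tuples and a caller-supplied set of unvoiced/pause labels (which sounds count as unvoiced is language-dependent). Within each such section it cuts at the quietest sample, provided its absolute amplitude is below the threshold.

```python
def find_cut_points(samples, segments, threshold, cut_symbols):
    """Return one cut index per unvoiced/pause segment whose quietest sample is below threshold.

    segments: (phoneme, start, end) tuples with end exclusive, aligned to samples.
    cut_symbols: phonemes treated as unvoiced or pause (language-dependent assumption).
    """
    cuts = []
    for symbol, start, end in segments:
        if symbol not in cut_symbols or start >= end:
            continue
        quietest = min(range(start, end), key=lambda k: abs(samples[k]))
        if abs(samples[quietest]) < threshold:
            cuts.append(quietest)
    return cuts

def split_at(samples, cuts):
    """Split the waveform at the given sample indices, dropping empty pieces."""
    pieces, prev = [], 0
    for c in sorted(cuts):
        if c > prev:
            pieces.append(samples[prev:c])
        prev = c
    if prev < len(samples):
        pieces.append(samples[prev:])
    return pieces

samples = [0.4, 0.5, 0.3, 0.02, -0.01, 0.05, 0.6, 0.7, 0.4, 0.003, 0.5, 0.6]
segments = [("a", 0, 3), ("pau", 3, 6), ("o", 6, 9), ("s", 9, 10), ("a", 10, 12)]
cuts = find_cut_points(samples, segments, threshold=0.03, cut_symbols={"pau", "s"})
pieces = split_at(samples, cuts)  # cuts == [4, 9] -> three unit waveforms
```

Cutting inside low-amplitude unvoiced or silent stretches, rather than inside vowels, tends to keep the cut points from introducing audible discontinuities when units are later concatenated.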
 
     
     
       4. The method according to claim 3, further comprising:
 generating the prosody information comprising a duration and a fundamental frequency of each of the phonemes, and 
 generating the representative speech unit waveform by averaging at least one of the duration and the fundamental frequency in the at least two speech unit waveforms. 
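For illustration only, the averaging in claim 4 can be sketched at the prosody level: given matched units that share a phoneme sequence, average duration and F0 phoneme by phoneme. This sketch averages both quantities (the claim requires at least one), and the `(phoneme, duration_ms, f0_hz)` tuple layout is an assumption. Resynthesizing the representative waveform from the averaged targets is outside the sketch.

```python
from statistics import mean

def average_prosody(units):
    """Average duration and F0 phoneme-by-phoneme over matched units.

    Each unit is a list of (phoneme, duration_ms, f0_hz); all units must share
    the same phoneme sequence, as the search step is assumed to guarantee.
    """
    symbols = [p[0] for p in units[0]]
    if any([p[0] for p in u] != symbols for u in units):
        raise ValueError("units must share the same phoneme sequence")
    return [
        (sym, mean(u[k][1] for u in units), mean(u[k][2] for u in units))
        for k, sym in enumerate(symbols)
    ]

u1 = [("k", 80, 0.0), ("a", 120, 140.0)]
u2 = [("k", 90, 0.0), ("a", 110, 150.0)]
target = average_prosody([u1, u2])  # [("k", 85, 0.0), ("a", 115, 145.0)]
```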
 
     
     
       5. An apparatus for editing speech, comprising:
 an input unit configured to input a plurality of texts to generate representative speech unit waveforms by a phrase concatenation based speech synthesis method; 
 a generation unit configured to generate speech information from the texts, the speech information comprising phonologic information and prosody information, and to generate speech waveforms from the speech information by text-to-speech synthesis; 
 a division unit configured to divide the speech waveforms into a plurality of speech unit waveforms based on the phonologic information; 
 a search unit configured to search at least two speech unit waveforms, from the plurality of speech unit waveforms, that are identical or similar, and to select a representative speech unit waveform from the at least two speech unit waveforms; and 
 a storing unit configured to store the representative speech unit waveform. 
 
     
     
       6. A method for editing speech, comprising:
 inputting a plurality of texts to generate representative speech unit waveforms to be used by a phrase concatenation based speech synthesis method; 
 generating speech information from the texts, the speech information comprising phonologic information and prosody information; 
 generating speech waveforms from the speech information by text-to-speech synthesis; 
 dividing the speech waveforms into a plurality of speech unit waveforms based on the phonologic information; 
 searching at least two speech unit waveforms, from the plurality of speech unit waveforms, wherein subsets of the phonologic information and the prosody information respectively corresponding to the at least two speech unit waveforms are identical or similar; 
 selecting a representative speech unit waveform from the at least two speech unit waveforms; and 
 storing the representative speech unit waveform into a memory. 
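Claim 6 differs from claim 1 in that the search compares the phonologic and prosody information attached to each waveform rather than the waveforms themselves. A hedged sketch of such a comparison follows; it is not from the patent. It requires identical phoneme sequences and then checks each phoneme's duration against a relative tolerance and its F0 against a tolerance in semitones. Both tolerance values (`dur_tol`, `f0_tol_st`) and the semitone measure are assumptions.

```python
import math

def semitone_gap(f1, f2):
    """Absolute pitch difference in semitones; unvoiced (0 Hz) only matches unvoiced."""
    if f1 <= 0 or f2 <= 0:
        return 0.0 if f1 <= 0 and f2 <= 0 else math.inf
    return abs(12 * math.log2(f1 / f2))

def info_similar(u, v, dur_tol=0.2, f0_tol_st=1.0):
    """u, v: lists of (phoneme, duration_ms, f0_hz) attached to two speech unit waveforms."""
    if [p[0] for p in u] != [p[0] for p in v]:
        return False  # phonologic information must be identical
    for (_, d1, f1), (_, d2, f2) in zip(u, v):
        if abs(d1 - d2) > dur_tol * max(d1, d2):
            return False  # durations differ by more than dur_tol (relative)
        if semitone_gap(f1, f2) > f0_tol_st:
            return False  # pitch differs by more than f0_tol_st semitones
    return True

def find_similar_pairs(infos, **tol):
    """All index pairs (i, j), i < j, whose information is identical or similar."""
    return [(i, j) for i in range(len(infos)) for j in range(i + 1, len(infos))
            if info_similar(infos[i], infos[j], **tol)]

infos = [
    [("k", 80, 0.0), ("a", 120, 140.0)],
    [("k", 85, 0.0), ("a", 115, 145.0)],   # close to unit 0
    [("k", 80, 0.0), ("a", 120, 200.0)],   # same phonemes, much higher pitch
    [("s", 80, 0.0), ("a", 120, 140.0)],   # different phonemes
]
pairs = find_similar_pairs(infos)  # [(0, 1)]
```

Measuring pitch in semitones rather than hertz makes the tolerance behave the same for low and high voices.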
 
     
     
       7. A method for editing speech, comprising:
 inputting a plurality of texts to generate representative speech unit waveforms to be used by a phrase concatenation based speech synthesis method; 
 generating speech information from the texts, the speech information comprising phonologic information and prosody information; 
 dividing the speech information into a plurality of speech information units based on the phonologic information; 
 searching at least two speech information units from the plurality of speech information units, wherein subsets of the phonologic information and the prosody information in the at least two speech information units are respectively identical or similar; 
 generating a representative speech information unit from the at least two speech information units; 
 generating a representative speech unit waveform corresponding to the representative speech information unit by text-to-speech synthesis; and 
 storing the representative speech unit waveform into a memory.
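Claim 7 moves the search ahead of waveform generation: a representative speech information unit is derived first, and only that unit is synthesized. The sketch below is illustrative and not part of the claims. It assumes a per-phoneme median as the rule for deriving the representative unit, and `toy_synthesize` is a hypothetical stand-in for a real text-to-speech back end (a sine at F0 for voiced phonemes, silence otherwise); the 16 kHz rate is likewise an assumption.

```python
import math
from statistics import median

SAMPLE_RATE = 16000  # hypothetical output sample rate

def representative_info(units):
    """Per-phoneme median duration and F0 across matched information units (assumed rule).

    All units are assumed to share the same phoneme sequence.
    """
    return [
        (units[0][k][0], median(u[k][1] for u in units), median(u[k][2] for u in units))
        for k in range(len(units[0]))
    ]

def toy_synthesize(info, sr=SAMPLE_RATE):
    """Stand-in for real TTS: a sine at F0 for voiced phonemes, silence otherwise."""
    out = []
    for _, dur_ms, f0 in info:
        n = round(sr * dur_ms / 1000)
        out.extend(0.3 * math.sin(2 * math.pi * f0 * t / sr) if f0 > 0 else 0.0
                   for t in range(n))
    return out

def store(memory, units):
    """Derive the representative information unit, synthesize it, and store the waveform."""
    info = representative_info(units)
    key = tuple(p[0] for p in info)
    memory[key] = toy_synthesize(info)
    return memory

units = [
    [("k", 80, 0.0), ("a", 120, 140.0)],
    [("k", 90, 0.0), ("a", 100, 150.0)],
    [("k", 85, 0.0), ("a", 300, 145.0)],  # outlier duration for "a"
]
memory = store({}, units)  # one waveform stored under ("k", "a")
```

A median is used here so that a single outlier (the 300 ms "a") does not stretch the representative unit; averaging, as in claim 4, would be an equally valid choice under the claim's wording.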
