US8346544B2 · Expired · Utility · PatentIndex 62

Selection of encoding modes and/or encoding rates for speech compression with closed loop re-decision

Assignee: QUALCOMM INC
Priority: Jan 20, 2006
Filed: Jan 22, 2007
Granted: Jan 1, 2013
Est. expiry: Jan 20, 2026 (expired) · nominal 20-yr term from priority
Inventors: MANJUNATH SHARATH; KANDHADAI ANANTHAPADMANABHAN AASANIPALAI; CHOY EDDIE L T
G10L 19/24
Cited by: 2 · References: 73 · Claims: 46

Abstract

In a device configurable to encode speech, performing a closed loop re-decision may comprise representing a speech signal by amplitude components and phase components for a current frame and a past frame. In a first closed loop stage, a first set of compressed components and a first set of uncompressed components for a current frame may be generated. A first set of features may be generated by comparing current and past frame amplitude and/or phase components. In a second closed loop stage, a second set of compressed components for the current frame may be generated by compressing the first set of compressed components and compressing the first set of uncompressed components. Generation of a second set of features may be based on the second set of compressed components from the current frame and a combination of amplitude and/or phase components from the past frame.
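The re-decision flow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the choice of three features (an energy ratio, a signal-to-noise ratio, and a normalized correlation, as named in the dependent claims), the rule ordering, and every threshold are assumptions chosen only for illustration. Components are treated as plain lists of numbers.

```python
import math


def energy(x):
    """Sum of squares of a component vector."""
    return sum(v * v for v in x)


def energy_ratio(current, past):
    """Energy of the current frame relative to the past frame."""
    e_past = energy(past)
    return energy(current) / e_past if e_past > 0 else math.inf


def snr_db(reference, approx):
    """Signal-to-noise ratio (dB) of `approx` measured against `reference`."""
    noise = energy([r - a for r, a in zip(reference, approx)])
    signal = energy(reference)
    if noise == 0:
        return math.inf
    if signal == 0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


def correlation(x, y):
    """Normalized correlation between two component vectors."""
    denom = math.sqrt(energy(x) * energy(y))
    return sum(a * b for a, b in zip(x, y)) / denom if denom > 0 else 0.0


def closed_loop_redecision(uncompressed, compressed, past,
                           mode="PPP", rate="quarter"):
    """Compute features and apply hypothetical decision rules.

    All thresholds below are illustrative assumptions, not values
    taken from the patent.
    """
    features = {
        "snr_db": snr_db(uncompressed, compressed),
        "energy_ratio": energy_ratio(compressed, past),
        "correlation": correlation(compressed, past),
    }
    # Rule 1 (assumed): poor reconstruction or weak similarity to the
    # past frame changes the mode from PPP to CELP at full rate.
    if features["snr_db"] < 5.0 or features["correlation"] < 0.5:
        return "CELP", "full", features
    # Rule 2 (assumed): a large energy change keeps the mode but raises
    # the rate to full.
    if features["energy_ratio"] > 4.0 or features["energy_ratio"] < 0.25:
        return mode, "full", features
    # Otherwise the initial decision stands.
    return mode, rate, features
```

For example, `closed_loop_redecision([1.0, 0.5, 0.25], [0.0, 0.0, 0.0], [1.0, 0.5, 0.25])` yields an SNR of 0 dB, so the sketch switches to CELP at full rate, while a frame that is compressed without error and matches the past frame keeps the initial PPP quarter-rate decision.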

Claims

exact text as granted — not AI-modified
1. A method comprising:
  representing, by using a processing device, a speech signal by amplitude components and phase components for a current frame and a past frame;
  in a first closed loop stage, generating a first set of compressed components and a first set of uncompressed components for the current frame;
  retrieving the amplitude components and the phase components from the past frame;
  generating a first set of features based on the first set of compressed components, the first set of uncompressed components, the amplitude components from the past frame, and the phase components from the past frame;
  checking the first set of features as part of a closed loop re-decision;
  determining a final encoding decision based on the checking; and
  encoding the speech signal based on the final encoding decision.

2. The method of claim 1, further comprising, in a second closed loop stage, generating a second set of compressed components for the current frame by compressing the first set of uncompressed components and generating a second set of features based on the first compressed set of compressed components, the second set of compressed components, the amplitude components from the past frame, and the phase components from the past frame.

3. The method of claim 2, wherein the checking further comprises checking the second set of features as part of the closed loop re-decision.

4. The method of claim 1, wherein the final encoding decision indicates an encoding mode.

5. The method of claim 4, wherein the encoding mode changes from PPP to CELP.

6. The method of claim 4, wherein the final encoding decision indicates an encoding rate.

7. The method of claim 6, wherein the encoding rate changes from quarter to full.

8. The method of claim 6, wherein the encoding rate changes from half to full.

9. The method of claim 1, wherein the generating the first set of features further comprises calculating at least one energy ratio, at least one signal to noise-ratio and calculating at least one correlation.

10. The method of claim 9, wherein the at least one energy ratio further comprises at least one energy ratio calculated in the time domain, frequency domain, or perceptually weighted domain.

11. The method of claim 10, wherein the at least one energy ratio is calculated from a derived signal from the speech signal.

12. The method of claim 9, wherein the derived signal is a residual signal.

13. The method of claim 1, wherein the amplitude components from the past frame are compressed and the phase components from the past frame are compressed.

14. The method of claim 1, wherein the amplitude components from the past frame are uncompressed and the phase components from the past frame are uncompressed.

15. The method of claim 1, wherein the amplitude components from the past frame are compressed and the phase components from the past frame are uncompressed.

16. The method of claim 1, wherein the amplitude components from the past frame are uncompressed and the phase components from the past frame are compressed.

17. The method of claim 1, wherein the representing a speech signal by amplitude and phase components comprises calculating a fourier series and extracting real and imaginary parts of the fourier series to calculate the amplitude components and the phase components.

18. The method of claim 1, wherein checking the first set features further comprises checking at least one feature with at least one or more rules in a set of decision rules.
     
19. A non-transitory computer-readable storage medium comprising instructions that, when executed by one or more processors, cause the one or more processors to:
  represent a speech signal by amplitude components and phase components for a current frame and a past frame;
  in a first closed loop stage generate a first set of compressed components and a first set of uncompressed components for a current frame;
  retrieve the amplitude components and the phase components from the past frame;
  generate a first set of features based on the first set of compressed components, the first set of uncompressed components, the amplitude components from the past frame, and the phase components from the past frame;
  check the first set of features as part of a closed loop re-decision;
  determine a final encoding decision based on the checking; and
  encode the speech signal based on the final encoding decision.

20. The non-transitory computer-readable storage medium of claim 19, further comprising instructions that, when executed by the one or more processors, cause the one or more processors to:
  in a second closed loop stage, generate a second set of compressed components for the current frame by compressing the first set of uncompressed components; and
  generate a second set of features based on the first compressed set of compressed components, the second set of compressed components, the amplitude components from the past frame, and the phase components from the past frame.

21. The non-transitory computer-readable storage medium of claim 20, wherein the final encoding decision is an encoding mode.

22. The non-transitory computer-readable storage medium of claim 21, wherein the encoding mode changes from PPP to CELP.

23. The non-transitory computer-readable storage medium of claim 20, wherein generating the second set of features further comprises calculating at least one energy ratio, calculating at least one signal to noise-ratio, and calculating at least one correlation.

24. The non-transitory computer-readable storage medium of claim 19, wherein the final encoding decision is an encoding rate.

25. The non-transitory computer-readable storage medium of claim 24, wherein the encoding rate changes from quarter to full.

26. The non-transitory computer-readable storage medium of claim 24, wherein the encoding rate changes from half to full.

27. The non-transitory computer-readable storage medium of claim 19, wherein generating the first set of features further comprises calculating at least one energy ratio, calculating at least one signal to noise-ratio, and calculating at least one correlation.

28. An apparatus comprising an array of logic elements configured to perform a method according to any of claims 1 to 18.
     
29. A mobile device comprising:
  circuitry configured to interact with a network for radio-frequency communications; and
  a non-transitory computer-readable storage medium comprising instructions that, when executed by one or more processors, cause the one or more processors to:
    represent a speech signal by amplitude components and phase components for a current frame and a past frame;
    in a first closed loop stage, generate a first set of compressed components and a first set of uncompressed components for a current frame;
    retrieve the amplitude components and the phase components from the past frame;
    generate a first set of features based on the first set of compressed components, the first set of uncompressed components, the amplitude components from the past frame, and the phase components from the past frame;
    check the first set of features as part of a closed loop re-decision;
    determine a final encoding decision based on the checking, wherein the final encoding decision identifies an encoding rate, wherein the encoding rate changes from half to full; and
    encode the speech signal based on the final encoding decision.
     
30. A device comprising:
  a processing device, and a memory
  means for representing a speech signal by amplitude components and phase components for a current frame and a past frame;
  in a first closed loop stage, means for generating a first set of compressed components and a first set of uncompressed components for a current frame;
  means for retrieving the amplitude components and the phase components from the past frame;
  means for generating a first set of features based on the first set of compressed components, the first set of uncompressed components, the amplitude components from the past frame, and the phase components from the past frame;
  means for checking the first set of features as part of a closed loop re-decision; and
  means for determining a final encoding decision based on the checking.

31. The device of claim 30, further comprising, in a second closed loop stage, means for generating a second set of compressed components for the current frame by compressing the first set of uncompressed components and generating a second set of features based on the first compressed set of compressed components, the second set of compressed components, the amplitude components from the past frame, and the phase components from the past frame.

32. The device of claim 31, wherein generating the second set of features further comprises calculating at least one energy ratio, calculating at least one signal to noise-ratio, and calculating at least one correlation.

33. The device of claim 31, wherein the means for checking the second set of features further comprises means for checking at least one feature with at least one or more rules in a set of decision rules.

34. The device of claim 30, wherein the final encoding decision indicates an encoding mode.

35. The device of claim 30, wherein the encoding mode changes from PPP to CELP.

36. The device claim 30, wherein the final encoding decision indicates an encoding rate.

37. The device claim 36, wherein the encoding rate changes from quarter to full.

38. The device of claim 36, wherein the encoding rate changes from half to full.

39. The device of claim 30, wherein the means for generating the first set of features further comprises calculating at least one energy ratio, calculating at least one signal to noise-ratio, and calculating at least one correlation.

40. The device of claim 30, wherein the device is a mobile device comprising circuitry configured to interact with a network for cellular radio-frequency communications.

41. The device of claim 30, wherein the amplitude components from the past frame are compressed and the phase components from the past frame are compressed.

42. The device of claim 30, wherein the amplitude components from the past frame are uncompressed and the phase components from the past frame are uncompressed.

43. The device of claim 30, wherein the amplitude components from the past frame are compressed and the phase components from the past frame are uncompressed.

44. The device of claim 30, wherein the amplitude components from the past frame are uncompressed and the phase components from the past frame are compressed.

45. The device of claim 30, wherein the means for representing a speech signal by amplitude and phase components comprises means for calculating a fourier series and means for extracting real and imaginary parts of the fourier series to calculate the amplitude components and the phase components.

46. The device of claim 30, wherein the means for checking the first set of features further comprises means for checking at least one feature with at least one or more rules in a set of decision rules.
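
Claims 17 and 45 recite calculating a Fourier series and extracting its real and imaginary parts to obtain amplitude and phase components. The Python sketch below shows one way that step could look. It is an assumption-laden illustration, not text from the patent: the function name `fourier_amp_phase`, the idea that the input is a single pitch-cycle waveform, and the 2/n scaling of the coefficients are all choices made here for clarity.

```python
import math


def fourier_amp_phase(cycle, num_harmonics):
    """Amplitude and phase of the first `num_harmonics` Fourier series terms.

    `cycle` is assumed to hold exactly one period of the waveform, and
    `num_harmonics` is assumed to be less than len(cycle) / 2.
    """
    n = len(cycle)
    amplitudes, phases = [], []
    for k in range(1, num_harmonics + 1):
        # Real and imaginary parts of the k-th coefficient, scaled so that
        # a component A*cos(2*pi*k*i/n + phi) yields amplitude A, phase phi.
        real = (2.0 / n) * sum(
            x * math.cos(2.0 * math.pi * k * i / n) for i, x in enumerate(cycle)
        )
        imag = -(2.0 / n) * sum(
            x * math.sin(2.0 * math.pi * k * i / n) for i, x in enumerate(cycle)
        )
        amplitudes.append(math.hypot(real, imag))  # sqrt(real^2 + imag^2)
        phases.append(math.atan2(imag, real))      # angle in radians
    return amplitudes, phases


# Usage: a synthetic cycle with a first harmonic (amplitude 2, phase 0.5)
# and a third harmonic (amplitude 0.5, phase -1.0).
n = 16
cycle = [
    2.0 * math.cos(2.0 * math.pi * i / n + 0.5)
    + 0.5 * math.cos(2.0 * math.pi * 3 * i / n - 1.0)
    for i in range(n)
]
amps, phs = fourier_amp_phase(cycle, 3)
```

The amplitude and phase lists produced this way are the kind of per-frame components that could then be compressed and compared against the past frame when computing re-decision features.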
