US11227622B2 · Active · Utility · PatentIndex 49

Speech communication system and method for improving speech intelligibility

Assignee: BEIJING DIDI INFINITY TECHNOLOGY & DEV CO LTD · Priority: Dec 6, 2018 · Filed: Dec 6, 2018 · Granted: Jan 18, 2022

Est. expiry: Dec 6, 2038 (~12.4 yrs left) · nominal 20-yr term from priority

Inventors: ZHANG YI, SONG HUI, SHA YONGTAO, QIN SI

G10L 21/0364 · G10L 2021/02082 · G10L 21/0232 · G10L 15/08

Abstract

A speech communication system for improving speech intelligibility may comprise one or more processors; and a memory storing instructions that, when executed by the one or more processors, cause the system to perform: determining a cutoff frequency based on an estimation of a spectrum of noise, wherein the cutoff frequency defines a noise dominant region of frequency; lifting a spectrum of a speech above the noise dominant region of frequency, wherein a frequency range of the spectrum of the speech increases by the cutoff frequency; and applying an adaptive filter to the speech to achieve echo cancelation, wherein the adaptive filter is controlled by a volume of the noise.
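
The lifting step in the abstract can be pictured as shifting speech content upward so that it clears the noise dominant region. The following is a minimal sketch under one literal reading of that sentence, not the patented implementation: it assumes a single frame of magnitude values indexed by frequency bin, and the function name `lift_spectrum` and the parameter `cutoff_bin` (the cutoff frequency expressed as a bin index) are hypothetical.

```python
def lift_spectrum(magnitudes, cutoff_bin):
    """Shift a magnitude spectrum upward by cutoff_bin bins.

    Assumption: bins below cutoff_bin form the noise dominant region,
    so they are left empty after the shift. Content that would move
    past the highest bin is dropped.
    """
    if cutoff_bin < 0:
        raise ValueError("cutoff_bin must be non-negative")
    n = len(magnitudes)
    lifted = [0.0] * n
    # Move each bin k to k + cutoff_bin while it still fits in the frame.
    for k in range(n - cutoff_bin):
        lifted[k + cutoff_bin] = magnitudes[k]
    return lifted


frame = [1.0, 2.0, 3.0, 4.0, 5.0]
print(lift_spectrum(frame, 2))
```

With a cutoff of two bins, the lowest speech bin lands at index 2, just above the assumed noise dominant region. A practical system would also have to handle phase and the content pushed past the top bin, which this sketch ignores.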

Claims

exact text as granted — not AI-modified

What is claimed is: 
     
       1. A speech communication system for improving speech intelligibility, comprising:
 one or more processors; and 
 a memory storing instructions that, when executed by the one or more processors, cause the system to perform: 
 determining a cutoff frequency based on a spectrum of noise in a sound signal and Signal-Noise-Ratios (SNRs) of a plurality of sub-bands of frequency of the sound signal, wherein the cutoff frequency defines a noise dominant region of frequency, and the determining comprises, from a lowest sub-band of frequency to an upper frequency limit of the spectrum of the noise, in each of the plurality of sub-bands of frequency:
 determining whether an SNR of the sub-band is higher than a threshold, and 
 in response to determining that the SNR is higher than the threshold, determining a previous sub-band of frequency as the cutoff frequency; 
 
 lifting a spectrum of a speech above the noise dominant region of frequency, wherein a frequency range of the spectrum of the speech is lifted by the cutoff frequency; and 
 applying an adaptive filter to the speech to achieve echo cancelation, wherein the adaptive filter is controlled by a volume of the noise. 
 
     
     
       2. The system according to claim 1, wherein the sound signal is received through a microphone of the system. 
 
     
     
       3. The system of claim 1, wherein the SNRs are instantaneous SNRs, and wherein the instantaneous SNRs are smoothed over frames of the sound signal and adjacent sub-bands of frequency. 
     
     
       4. The system of claim 3, wherein determining the cutoff frequency based on the spectrum of the noise and the SNRs further comprises:
 in response to determining that the SNR is not higher than the threshold, determining whether a power of the noise below the sub-band of frequency is greater than a threshold percentage of a total power of the noise; and 
 in response to determining that the power of the noise below the sub-band of frequency is greater than the threshold percentage of the total power of the noise, setting the current sub-band of frequency as the cutoff frequency. 
 
     
     
       5. The system of claim 1, wherein lifting a spectrum of a speech above the noise dominant region of frequency further comprises:
 classifying a frame of the speech into one of the categories of vowel and consonant; and 
 if the frame of the speech is classified as a vowel, lifting the spectrum of the frame of the speech to a sub-band of frequency based on the cutoff frequency. 
 
     
     
       6. The system of claim 1, wherein the instructions, when executed by the one or more processors, further cause the system to perform:
 applying equalization to the lifted spectrum of the speech. 
 
     
     
       7. The system of claim 6, wherein applying the equalization on the lifted speech further comprises:
 transforming the spectrum of the speech from a linear frequency domain to critical bands of frequency domain, wherein a critical band of frequency is a band of frequency within which a first tone interferes with perception of a second tone; and 
 performing equalization on the speech in the critical bands of frequency. 
 
     
     
       8. The system of claim 6, wherein applying the equalization on the lifted speech further comprises:
 adjusting the lifted spectrum of the speech based on loudness of the speech. 
 
     
     
       9. The system of claim 6, wherein the instructions, when executed by the one or more processors, further cause the system to perform:
 applying spectra smoothing to the speech. 
 
     
     
       10. The system of claim 1, wherein a sparsity of the adaptive filter increases if the volume of noise increases. 
     
     
       11. A computer-implemented method for speech communication, comprising:
 determining a cutoff frequency based on a spectrum of noise in a sound signal and Signal-Noise-Ratios (SNRs) of a plurality of sub-bands of frequency of the sound signal, wherein the cutoff frequency defines a noise dominant region of frequency, and the determining comprises, from a lowest sub-band of frequency to an upper frequency limit of the spectrum of the noise, in each of the plurality of sub-bands of frequency:
 determining whether an SNR of the sub-band is higher than a threshold, and 
 in response to determining that the SNR is higher than the threshold, determining a previous sub-band of frequency as the cutoff frequency; 
 
 lifting a spectrum of a speech above the noise dominant region of frequency, wherein a frequency range of the spectrum of the speech is lifted by the cutoff frequency; and 
 applying an adaptive filter to the speech to achieve echo cancelation, wherein the adaptive filter is controlled by a volume of the noise. 
 
     
     
       12. The method of claim 11, wherein the sound signal is received through a microphone of the system. 
 
     
     
       13. The method of claim 11, wherein the SNRs are instantaneous SNRs, and wherein the instantaneous SNRs are smoothed over frames of the sound signal and adjacent sub-bands of frequency. 
     
     
       14. The method of claim 13, wherein determining the cutoff frequency based on the spectrum of the noise and the SNRs further comprises:
 in response to determining that the SNR is not higher than the threshold, determining whether a power of the noise below the sub-band of frequency is greater than a threshold percentage of a total power of the noise; and 
 in response to determining that the power of the noise below the sub-band of frequency is greater than the threshold percentage of the total power of the noise, setting the current sub-band of frequency as the cutoff frequency. 
 
     
     
       15. The method of claim 11, wherein lifting the spectrum of the speech that is within the noise dominant region of frequency to a sub-band of frequency higher than the cutoff frequency further comprises:
 classifying a frame of the speech into one of the categories of vowel and consonant; and 
 if the frame of the speech is classified as a vowel, lifting the spectrum of the frame of the speech that is within the noise dominant region of frequency to the sub-band of frequency higher than the cutoff frequency. 
 
     
     
       16. The method of claim 11, further comprising:
 applying equalization on the lifted spectrum of the speech. 
 
     
     
       17. The method of claim 16, wherein applying the equalization on the lifted speech further comprises:
 transforming the spectrum of the speech from a linear frequency domain to critical bands of frequency domain, wherein a critical band of frequency is a band of frequency within which a first tone interferes with perception of a first tone; and 
 performing equalization on the speech in the critical bands of frequency. 
 
     
     
       18. The method of claim 16, wherein applying the equalization on the lifted speech further comprises:
 adjusting the lifted spectrum of the speech based on loudness of the speech. 
 
     
     
       19. The method of claim 16, further comprising: applying spectra smoothing to the speech. 
     
     
       20. A non-transitory computer-readable storage medium coupled to one or more processors and comprising instructions that, when executed by the one or more processors, cause the one or more processors to perform a method for speech communication, the method comprising:
 determining a cutoff frequency based on a spectrum of noise in a sound signal and Signal-Noise-Ratios (SNRs) of a plurality of sub-bands of frequency of the sound signal, wherein the cutoff frequency defines a noise dominant region of frequency, and the determining comprises, from a lowest sub-band of frequency to an upper frequency limit of the spectrum of the noise, in each of the plurality of sub-bands of frequency:
 determining whether an SNR of the sub-band is higher than a threshold, and 
 in response to determining that the SNR is higher than the threshold, determining a previous sub-band of frequency as the cutoff frequency; 
 
 lifting a spectrum of a speech above the noise dominant region of frequency, wherein a frequency range of the spectrum of the speech is lifted by the cutoff frequency; and 
 applying an adaptive filter to the speech to achieve echo cancelation, wherein the adaptive filter is controlled by a volume of the noise.
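
As an illustration only, the sub-band scan recited in claims 1 and 4 (and mirrored in claims 11, 14 and 20) might be sketched as follows. This is one possible reading, not the patentee's implementation. The function name `find_cutoff_band`, the default thresholds, the use of decibels, the return value of -1 when the lowest sub-band already exceeds the SNR threshold, and the fallback to the last sub-band when the scan finishes without a decision are all assumptions; the claims do not specify them.

```python
def find_cutoff_band(snr_db, noise_power,
                     snr_threshold_db=0.0, power_fraction=0.9):
    """Scan sub-bands from lowest to highest and return the cutoff index.

    snr_db[i] and noise_power[i] describe sub-band i. Returns -1 when
    the lowest sub-band already exceeds the SNR threshold, so there is
    no previous sub-band (an assumed convention).
    """
    if len(snr_db) != len(noise_power):
        raise ValueError("snr_db and noise_power must have equal length")
    total_noise = sum(noise_power)
    noise_below = 0.0  # noise power in sub-bands below the current one
    for band, (snr, power) in enumerate(zip(snr_db, noise_power)):
        # Claim 1: SNR above threshold -> the previous sub-band is the cutoff.
        if snr > snr_threshold_db:
            return band - 1
        # Claim 4: otherwise, if enough of the total noise power lies
        # below this sub-band, the current sub-band is the cutoff.
        if total_noise > 0 and noise_below > power_fraction * total_noise:
            return band
        noise_below += power
    # Assumed fallback: every sub-band was noise dominated.
    return len(snr_db) - 1


# SNR crosses the threshold at sub-band 2, so sub-band 1 is the cutoff.
print(find_cutoff_band([-5.0, -3.0, 2.0, 8.0], [4.0, 3.0, 2.0, 1.0]))
```

The scan stops at the first sub-band that satisfies either condition, which matches the "from a lowest sub-band of frequency to an upper frequency limit" ordering in the claims. The smoothing of instantaneous SNRs recited in claims 3 and 13 is assumed to have been applied to `snr_db` beforehand.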

Cited by (0)

No later patents cite this yet.

References (0)

No backward citations on record.