US11227622B2 · Active · Utility
Speech communication system and method for improving speech intelligibility
Assignee: BEIJING DIDI INFINITY TECHNOLOGY & DEV CO LTD · Priority: Dec 6, 2018 · Filed: Dec 6, 2018 · Granted: Jan 18, 2022
Est. expiry: Dec 6, 2038 (~12.4 yrs left) · nominal 20-yr term from priority
CPC: G10L 21/0364 · G10L 2021/02082 · G10L 21/0232 · G10L 15/08
PatentIndex Score: 49 · Cited by: 0 · References: 27 · Claims: 20
Abstract
A speech communication system for improving speech intelligibility may comprise one or more processors; and a memory storing instructions that, when executed by the one or more processors, cause the system to perform: determining a cutoff frequency based on an estimation of a spectrum of noise, wherein the cutoff frequency defines a noise dominant region of frequency; lifting a spectrum of a speech above the noise dominant region of frequency, wherein a frequency range of the spectrum of the speech increases by the cutoff frequency; and applying an adaptive filter to the speech to achieve echo cancelation, wherein the adaptive filter is controlled by a volume of the noise.
Claims
What is claimed is:
1. A speech communication system for improving speech intelligibility, comprising:
one or more processors; and
a memory storing instructions that, when executed by the one or more processors, cause the system to perform:
determining a cutoff frequency based on a spectrum of noise in a sound signal and Signal-Noise-Ratios (SNRs) of a plurality of sub-bands of frequency of the sound signal, wherein the cutoff frequency defines a noise dominant region of frequency, and the determining comprises, from a lowest sub-band of frequency to an upper frequency limit of the spectrum of the noise, in each of the plurality of sub-bands of frequency:
determining whether an SNR of the sub-band is higher than a threshold, and in response to determining that the SNR is higher than the threshold, determining a previous sub-band of frequency as the cutoff frequency;
lifting a spectrum of a speech above the noise dominant region of frequency, wherein a frequency range of the spectrum of the speech is lifted by the cutoff frequency; and
applying an adaptive filter to the speech to achieve echo cancelation, wherein the adaptive filter is controlled by a volume of the noise.
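The sub-band scan in claim 1 can be sketched in Python as below. This is a minimal illustration, not the patentee's implementation: the function name `cutoff_frequency`, the 3 dB default threshold, the band-edge representation, and the behaviour when no sub-band clears the threshold are all assumptions. The claim itself only requires that, scanning upward, the sub-band preceding the first one whose SNR exceeds the threshold be taken as the cutoff.

```python
import numpy as np

def cutoff_frequency(snr_db, band_edges_hz, noise_upper_hz, threshold_db=3.0):
    """Scan sub-bands upward and return (cutoff_band_index, cutoff_hz).

    Sub-band k spans [band_edges_hz[k], band_edges_hz[k + 1]) and has SNR snr_db[k].
    An index of -1 means the lowest sub-band is already speech-dominant.
    threshold_db and the fallback below are illustrative assumptions.
    """
    snr_db = np.asarray(snr_db, dtype=float)
    n_bands = len(band_edges_hz) - 1
    last_scanned = -1
    for k in range(n_bands):
        if band_edges_hz[k] >= noise_upper_hz:
            break  # reached the upper frequency limit of the noise spectrum
        if snr_db[k] > threshold_db:
            # Claim 1: the *previous* sub-band becomes the cutoff;
            # its upper edge is band_edges_hz[k].
            return k - 1, float(band_edges_hz[k])
        last_scanned = k
    # Assumed fallback: every scanned sub-band is noise-dominant.
    return last_scanned, float(band_edges_hz[last_scanned + 1])
```

For example, with band edges at 0, 250, 500, 1000, 2000 and 4000 Hz and per-band SNRs of -5, -2, 6, 10 and 12 dB, the third band is the first to exceed 3 dB, so the second band (250 to 500 Hz) is returned and the cutoff lands at 500 Hz.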
2. The system according to claim 1, wherein the sound signal is received through a microphone of the system.
3. The system of claim 1, wherein the SNRs are instantaneous SNRs, and wherein the instantaneous SNRs are smoothed over frames of the sound signal and adjacent sub-bands of frequency.
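Claim 3 does not say how the instantaneous SNRs are smoothed. One plausible reading, sketched here under that assumption, is first-order recursive averaging across frames followed by a three-tap average across neighbouring sub-bands. The name `smooth_snr`, the smoothing factor `alpha`, the kernel length, and the order of the two steps are all hypothetical choices.

```python
import numpy as np

def smooth_snr(inst_snr, alpha=0.7):
    """Smooth instantaneous SNRs over frames and adjacent sub-bands.

    inst_snr: array of shape (n_frames, n_bands), linear (not dB) SNR values.
    alpha: recursive smoothing factor across frames (assumed value).
    """
    inst_snr = np.asarray(inst_snr, dtype=float)
    out = np.empty_like(inst_snr)
    time_smoothed = inst_snr[0].copy()
    for t in range(inst_snr.shape[0]):
        if t > 0:
            # First-order recursive average over frames.
            time_smoothed = alpha * time_smoothed + (1.0 - alpha) * inst_snr[t]
        # Three-tap average over adjacent sub-bands; edge bands are replicated.
        padded = np.pad(time_smoothed, 1, mode="edge")
        out[t] = (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0
    return out
```

Smoothing in both directions suppresses isolated spikes that would otherwise make the cutoff search jump from frame to frame.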
4. The system of claim 3, wherein determining the cutoff frequency based on the spectrum of the noise and the SNRs further comprises:
in response to determining that the SNR is not higher than the threshold, determining whether a power of the noise below the sub-band of frequency is greater than a threshold percentage of a total power of the noise; and
in response to determining that the power of the noise below the sub-band of frequency is greater than the threshold percentage of the total power of the noise, setting the current sub-band of frequency as the cutoff frequency.
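Claim 4 adds a second exit from the upward scan of claim 1: when a sub-band's SNR does not clear the threshold, the scan also stops once the noise power lying below that sub-band exceeds a set share of the total noise power. A sketch under assumptions: the name `cutoff_band`, the 90% default share, and reading "below the sub-band" as "in strictly lower sub-bands" are illustrative, not taken from the patent.

```python
import numpy as np

def cutoff_band(snr_db, noise_power, noise_upper_band,
                snr_threshold_db=3.0, power_share=0.9):
    """Return the cutoff sub-band index (-1: no noise-dominant region).

    noise_power[k] is the estimated noise power in sub-band k, and
    noise_upper_band is the index of the highest sub-band containing noise.
    """
    snr_db = np.asarray(snr_db, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    total = noise_power.sum()
    for k in range(noise_upper_band + 1):
        if snr_db[k] > snr_threshold_db:
            return k - 1  # claim 1: previous sub-band
        # Claim 4: SNR not above threshold, so check the noise power below band k.
        if total > 0 and noise_power[:k].sum() > power_share * total:
            return k  # claim 4: current sub-band
    return noise_upper_band  # assumed fallback when neither test fires
```

With noise powers of 50, 30, 10, 5 and 5 and a 75% share, the power below the third sub-band (80) already exceeds 75 while its SNR is still low, so the scan stops there and returns index 2.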
5. The system of claim 1, wherein lifting a spectrum of a speech above the noise dominant region of frequency further comprises:
classifying a frame of the speech into one of the categories of vowel and consonant; and
if the frame of the speech is classified as a vowel, lifting the spectrum of the frame of the speech to a sub-band of frequency based on the cutoff frequency.
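Claim 5 gates the lifting on a vowel/consonant decision but does not say how frames are classified or how the spectrum is moved. The sketch below assumes a simple energy and zero-crossing-rate heuristic (voiced vowels tend to be loud and to have few zero crossings). It implements the lift as a plain upward shift of FFT bins by the cutoff frequency. The names `is_vowel`, `lift_frame` and `process_frame`, and all thresholds, are hypothetical.

```python
import numpy as np

def is_vowel(frame, zcr_max=0.1, energy_min=1e-3):
    """Hypothetical vowel detector: high energy and low zero-crossing rate."""
    frame = np.asarray(frame, dtype=float)
    sign_changes = np.diff(np.signbit(frame).astype(int))
    zcr = np.mean(np.abs(sign_changes))  # zero crossings per sample
    energy = np.mean(frame ** 2)
    return bool(zcr < zcr_max and energy > energy_min)

def lift_frame(frame, cutoff_hz, sample_rate):
    """Shift the frame's spectrum up by cutoff_hz; bins pushed past Nyquist are dropped."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    spec = np.fft.rfft(frame)
    shift = int(round(cutoff_hz * n / sample_rate))  # shift expressed in FFT bins
    lifted = np.zeros_like(spec)
    if shift < len(spec):
        lifted[shift:] = spec[:len(spec) - shift]
    return np.fft.irfft(lifted, n)

def process_frame(frame, cutoff_hz, sample_rate):
    """Lift vowel frames only; consonant frames pass through unchanged."""
    if is_vowel(frame):
        return lift_frame(frame, cutoff_hz, sample_rate)
    return np.asarray(frame, dtype=float)
```

A bin shift of this kind is a frequency shift rather than a pitch-preserving transposition, so harmonics no longer sit at integer multiples of the fundamental. A practical system would also need analysis windows and overlap-add, which are omitted here.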
6. The system of claim 1, wherein the instructions, when executed by the one or more processors, further cause the system to perform:
applying equalization to the lifted spectrum of the speech.
7. The system of claim 6, wherein applying the equalization on the lifted speech further comprises:
transforming the spectrum of the speech from a linear frequency domain to critical bands of frequency domain, wherein a critical band of frequency is a band of frequency within which a first tone interferes with perception of a second tone; and
performing equalization on the speech in the critical bands of frequency.
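For the critical-band equalization of claim 7, a common way to map linear frequency onto critical bands is the Zwicker and Terhardt approximation of the Bark scale. The sketch assumes that mapping, although the patent does not name one. It groups rfft bins by integer Bark band and pulls each band's energy toward the mean band energy, with gains clipped to ±12 dB. The name `critical_band_eq`, the flat target, and the clipping limit are hypothetical.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker and Terhardt (1980) approximation of the Bark scale."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def critical_band_eq(mag, sample_rate, max_gain_db=12.0):
    """Equalize a magnitude spectrum (rfft bins, DC to Nyquist) per critical band.

    Assumed target: each band's energy is pulled toward the mean band energy.
    """
    mag = np.asarray(mag, dtype=float)
    freqs = np.linspace(0.0, sample_rate / 2.0, len(mag))
    band = np.floor(hz_to_bark(freqs)).astype(int)  # critical-band index per bin
    bands = np.unique(band)
    energy = np.array([np.mean(mag[band == b] ** 2) for b in bands])
    target = np.mean(energy)
    # Power ratio in dB, clipped, then converted to an amplitude gain.
    gain_db = 10.0 * np.log10((target + 1e-12) / (energy + 1e-12))
    gain_db = np.clip(gain_db, -max_gain_db, max_gain_db)
    gains = 10.0 ** (gain_db / 20.0)
    out = mag.copy()
    for b, g in zip(bands, gains):
        out[band == b] *= g
    return out
```

Working per critical band rather than per FFT bin keeps the number of gains small and matches the perceptual definition in the claim: tones inside one critical band mask each other, so equalizing finer than a band buys little.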
8. The system of claim 6, wherein applying the equalization on the lifted speech further comprises:
adjusting the lifted spectrum of the speech based on loudness of the speech.
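Claim 8 adjusts the lifted spectrum "based on loudness of the speech" without fixing a loudness model. As an assumption, the sketch uses a crude proxy based on Stevens' power law, in which perceived loudness grows roughly as intensity to the 0.3 power. It rescales the lifted spectrum so that its proxy loudness matches that of the original frame. The names `loudness_proxy` and `match_loudness` are hypothetical.

```python
import numpy as np

def loudness_proxy(mag, exponent=0.3):
    """Crude loudness estimate: sum over bins of power ** exponent (assumed model)."""
    power = np.asarray(mag, dtype=float) ** 2
    return float(np.sum(power ** exponent))

def match_loudness(lifted_mag, original_mag, exponent=0.3):
    """Scale the lifted spectrum so its loudness proxy equals the original's."""
    lifted_mag = np.asarray(lifted_mag, dtype=float)
    l_orig = loudness_proxy(original_mag, exponent)
    l_lift = loudness_proxy(lifted_mag, exponent)
    if l_lift == 0.0:
        return lifted_mag  # nothing to scale
    # Scaling magnitudes by g scales the proxy by g ** (2 * exponent).
    g = (l_orig / l_lift) ** (1.0 / (2.0 * exponent))
    return g * lifted_mag
```

Because the proxy is compressive, halving every magnitude lowers it by a factor of only 0.5 ** 0.6, so the correcting gain is solved in closed form rather than by simple energy matching.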
9. The system of claim 6, wherein the instructions, when executed by the one or more processors, further cause the system to perform:
applying spectra smoothing to the speech.
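Claim 9's "spectra smoothing" is likewise left open. One hedged possibility is to smooth the magnitude spectrum across frequency with a short Hann-shaped kernel, which softens the sharp edge the lift leaves at the cutoff. The name `smooth_spectrum` and the default width are assumptions.

```python
import numpy as np

def smooth_spectrum(mag, width=5):
    """Smooth a magnitude spectrum across frequency with a normalized Hann kernel.

    width must be odd; edge bins are replicated so the output length matches the input.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    mag = np.asarray(mag, dtype=float)
    kernel = np.hanning(width + 2)[1:-1]  # drop the zero-valued end points
    kernel /= kernel.sum()  # unit gain for a flat spectrum
    half = width // 2
    padded = np.pad(mag, half, mode="edge")
    return np.convolve(padded, kernel, mode="valid")
```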
10. The system of claim 1, wherein a sparsity of the adaptive filter increases if the volume of noise increases.
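Claim 10 states only that the echo canceller's sparsity increases with the noise volume. One way to realize that is a zero-attracting NLMS filter, which adds an ℓ1 penalty term ρ·sign(w) to the NLMS update and raises the attraction strength ρ as the estimated noise level rises. This is a swapped-in technique, not the patent's stated method. The linear dB-to-ρ mapping, its breakpoints, and the function names are hypothetical.

```python
import numpy as np

def sparsity_weight(noise_db, low_db=40.0, high_db=80.0, rho_max=1e-4):
    """Assumed mapping: zero-attraction strength grows linearly with noise level."""
    t = np.clip((noise_db - low_db) / (high_db - low_db), 0.0, 1.0)
    return rho_max * t

def za_nlms_aec(far_end, mic, n_taps=64, mu=0.5, rho=0.0, eps=1e-8):
    """Zero-attracting NLMS echo canceller.

    far_end: loudspeaker reference; mic: microphone signal (echo plus near end).
    Returns (error_signal, final_weights); the error is the echo-cancelled output.
    """
    far_end = np.asarray(far_end, dtype=float)
    mic = np.asarray(mic, dtype=float)
    w = np.zeros(n_taps)
    x_buf = np.zeros(n_taps)  # x_buf[k] holds far_end[n - k]
    err = np.zeros(len(mic))
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        e = mic[n] - w @ x_buf  # residual after subtracting the echo estimate
        w += mu * e * x_buf / (eps + x_buf @ x_buf)  # normalized LMS step
        w -= rho * np.sign(w)  # l1 zero attractor pulls small taps toward 0
        err[n] = e
    return err, w

# Example: louder noise gives a larger rho, hence a sparser filter.
# err, w = za_nlms_aec(far, mic, rho=sparsity_weight(75.0))
```

A larger ρ drives small taps toward zero, trading some steady-state accuracy on the dominant echo taps for a sparser estimate that is less disturbed by noise. At ρ = 0 the update reduces to ordinary NLMS.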
11. A computer-implemented method for speech communication, comprising:
determining a cutoff frequency based on a spectrum of noise in a sound signal and Signal-Noise-Ratios (SNRs) of a plurality of sub-bands of frequency of the sound signal, wherein the cutoff frequency defines a noise dominant region of frequency, and the determining comprises, from a lowest sub-band of frequency to an upper frequency limit of the spectrum of the noise, in each of the plurality of sub-bands of frequency:
determining whether an SNR of the sub-band is higher than a threshold, and
in response to determining that the SNR is higher than the threshold, determining a previous sub-band of frequency as the cutoff frequency;
lifting a spectrum of a speech above the noise dominant region of frequency, wherein a frequency range of the spectrum of the speech is lifted by the cutoff frequency; and
applying an adaptive filter to the speech to achieve echo cancelation, wherein the adaptive filter is controlled by a volume of the noise.
12. The method of claim 11, wherein the sound signal is received through a microphone of the system.
13. The method of claim 11, wherein the SNRs are instantaneous SNRs, and wherein the instantaneous SNRs are smoothed over frames of the sound signal and adjacent sub-bands of frequency.
14. The method of claim 13, wherein determining the cutoff frequency based on the spectrum of the noise and the SNRs further comprises:
in response to determining that the SNR is not higher than the threshold, determining whether a power of the noise below the sub-band of frequency is greater than a threshold percentage of a total power of the noise; and
in response to determining that the power of the noise below the sub-band of frequency is greater than the threshold percentage of the total power of the noise, setting the current sub-band of frequency as the cutoff frequency.
15. The method of claim 11, wherein lifting the spectrum of the speech that is within the noise dominant region of frequency to a sub-band of frequency higher than the cutoff frequency further comprises:
classifying a frame of the speech into one of the categories of vowel and consonant; and
if the frame of the speech is classified as a vowel, lifting the spectrum of the frame of the speech that is within the noise dominant region of frequency to the sub-band of frequency higher than the cutoff frequency.
16. The method of claim 11, further comprising:
applying equalization on the lifted spectrum of the speech.
17. The method of claim 16, wherein applying the equalization on the lifted speech further comprises:
transforming the spectrum of the speech from a linear frequency domain to critical bands of frequency domain, wherein a critical band of frequency is a band of frequency within which a first tone interferes with perception of a second tone; and
performing equalization on the speech in the critical bands of frequency.
18. The method of claim 16, wherein applying the equalization on the lifted speech further comprises:
adjusting the lifted spectrum of the speech based on loudness of the speech.
19. The method of claim 16, further comprising: applying spectra smoothing to the speech.
20. A non-transitory computer-readable storage medium coupled to one or more processors and comprising instructions that, when executed by the one or more processors, cause the one or more processors to perform a method for speech communication, the method comprising:
determining a cutoff frequency based on a spectrum of noise in a sound signal and Signal-Noise-Ratios (SNRs) of a plurality of sub-bands of frequency of the sound signal, wherein the cutoff frequency defines a noise dominant region of frequency, and the determining comprises, from a lowest sub-band of frequency to an upper frequency limit of the spectrum of the noise, in each of the plurality of sub-bands of frequency:
determining whether an SNR of the sub-band is higher than a threshold, and
in response to determining that the SNR is higher than the threshold, determining a previous sub-band of frequency as the cutoff frequency;
lifting a spectrum of a speech above the noise dominant region of frequency, wherein a frequency range of the spectrum of the speech is lifted by the cutoff frequency; and
applying an adaptive filter to the speech to achieve echo cancelation, wherein the adaptive filter is controlled by a volume of the noise.