US11538451B2 · Active · Utility · PatentIndex 84
Multi-channel acoustic echo cancellation

Assignee: SONOS INC · Priority: Sep 28, 2017 · Filed: Jan 11, 2021 · Granted: Dec 27, 2022
Est. expiry: Sep 28, 2037 (~11.2 yrs left) · nominal 20-yr term from priority
Inventors: SERESHKI SAEED BAGHERI; KADRI ROMI
CPC: H04B 17/336, G10K 2210/3012, G10L 21/0208, H04M 9/082, H04R 27/00, G06F 3/165, H04R 2227/005, G10K 2210/505, G10L 2021/02082, H04L 65/75, G10K 11/178
Abstract

A playback device is configured to receive, via a network interface, a source stream of audio including first and second channel streams of audio, and to produce, via respective first and second speaker drivers, a first channel audio output and a second channel audio output. The playback device is also configured to receive, via one or more microphones, a captured stream of audio including first and second portions corresponding to the respective first and second channel audio outputs. The playback device is also configured to combine at least the first channel stream of audio and the second channel stream of audio into a compound audio signal and perform acoustic echo cancellation on the compound audio signal and thereby produce an acoustic echo cancellation output, then to apply the acoustic echo cancellation output to the captured stream of audio and thereby increase a signal-to-noise ratio of the captured stream of audio.
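The abstract's core signal flow (sum the channel streams into one compound reference, run acoustic echo cancellation against that reference, and subtract the resulting echo estimate from the microphone capture) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not Sonos's implementation: the patent text does not name an adaptive filter, so a standard normalized LMS (NLMS) FIR filter is swapped in, and the function names, filter length, step size, signals, and two-tap echo path are all hypothetical. A single compound reference can only cancel well when the channels reach the microphone through similar acoustic paths, so the demo assumes one shared path.

```python
import math
import random


def compound_reference(channels):
    """Sum equal-length channel streams sample by sample into one compound signal."""
    return [sum(samples) for samples in zip(*channels)]


def nlms_echo_cancel(reference, captured, taps=8, mu=0.5, eps=1e-6):
    """Adaptive FIR echo canceller using normalized LMS (a swapped-in technique).

    Returns (echo_estimate, residual), where residual = captured - echo_estimate.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference sample first
    echo_estimate, residual = [], []
    for x, d in zip(reference, captured):
        history = [x] + history[:-1]
        y = sum(w * h for w, h in zip(weights, history))  # predicted echo sample
        e = d - y  # echo-cancelled microphone sample
        norm = eps + sum(h * h for h in history)
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        echo_estimate.append(y)
        residual.append(e)
    return echo_estimate, residual


def power(xs):
    """Mean-square value of a sequence."""
    return sum(v * v for v in xs) / len(xs)


# Demo with synthetic signals (all values hypothetical).
rng = random.Random(0)
n = 4000
left = [rng.uniform(-1, 1) for _ in range(n)]
right = [rng.uniform(-1, 1) for _ in range(n)]
voice = [0.05 * math.sin(2 * math.pi * 0.01 * t) for t in range(n)]

ref = compound_reference([left, right])
# Assumed echo path shared by both drivers: 0.6 * ref[t-1] + 0.3 * ref[t-2].
echo = [0.6 * (ref[t - 1] if t >= 1 else 0.0) + 0.3 * (ref[t - 2] if t >= 2 else 0.0)
        for t in range(n)]
captured = [ec + v for ec, v in zip(echo, voice)]

_, residual = nlms_echo_cancel(ref, captured)

tail = slice(n // 2, n)  # measure only after the filter has converged
snr_before = power(voice[tail]) / power(echo[tail])
leftover_echo = [r - v for r, v in zip(residual[tail], voice[tail])]
snr_after = power(voice[tail]) / power(leftover_echo)
print(f"SNR before: {10 * math.log10(snr_before):.1f} dB, "
      f"after: {10 * math.log10(snr_after):.1f} dB")
```

In this sketch only one adaptive filter runs regardless of how many channels are summed, so the per-sample cost does not grow with channel count; the trade-off is that any channel-specific differences in the echo path are not modeled.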
Claims

exact text as granted — not AI-modified
The invention claimed is: 
     
       1. A playback device comprising:
 a first speaker driver; 
 a second speaker driver; 
 at least one processor; 
 a network interface; 
 a non-transitory computer-readable medium; and 
 program instructions stored on the non-transitory computer-readable medium that are executable by the at least one processor such that the playback device is configured to:
 receive, via the network interface, a source stream of audio comprising source audio content to be played back by the playback device, wherein the source audio content comprises a first channel stream of audio and a second channel stream of audio; 
 produce a first channel audio output by playing back, via the first speaker driver, the first channel stream of audio; 
 produce a second channel audio output by playing back, via the second speaker driver, the second channel stream of audio; 
 receive, via one or more microphones, a captured stream of audio comprising (i) a first portion corresponding to the first channel audio output and (ii) a second portion corresponding to the second channel audio output, wherein the captured stream of audio has a first signal-to-noise ratio; 
 determine a set of signal components from at least one of the first channel stream of audio or the second channel stream of audio; 
 select a subset of the set of signal components; 
 perform acoustic echo cancellation on the subset of the set of signal components and thereby produce an acoustic echo cancellation output; and 
 apply the acoustic echo cancellation output to the captured stream of audio and thereby increase the signal-to noise ratio of the captured stream of audio from the first signal-to-noise ratio to a second signal-to-noise ratio that is greater than the first signal-to-noise ratio. 
 
 
     
     
       2. The playback device of  claim 1 , wherein the captured stream of audio comprises a third portion corresponding to a vocal command issued by a user, and wherein application of the acoustic echo cancellation output to the captured stream of audio results in the first portion and second portion being eliminated or minimized in the captured stream of audio. 
     
     
       3. The playback device of  claim 1 , further comprising program instructions stored on the non-transitory computer-readable medium that are executable by the at least one processor such that the playback device is configured to:
 detect a trigger to perform the acoustic echo cancellation on the subset of the set of signal components. 
 
     
     
       4. The playback device of  claim 3 , wherein the program instructions that are executable by the at least one processor such that the playback device is configured to detect the trigger comprise program instructions that are executable by the at least one processor such that the playback device is configured to detect that (a) a playback function is initiated by the playback device or (b) an unmute command is received by the playback device after the playback function is initiated. 
     
     
       5. The playback device of  claim 1 , further comprising program instructions stored on the non-transitory computer-readable medium that are executable by the at least one processor such that the playback device is configured to:
 combine at least the first channel stream of audio and the second channel stream of audio into a compound audio signal, 
 wherein the program instructions that are executable by the at least one processor such that the playback device is configured to determine the set of signal components from at least one of the first channel stream of audio or the second channel stream of audio comprise program instructions that are executable by the at least one processor such that the playback device is configured to determine the set of signal components from the compound audio signal. 
 
     
     
       6. The playback device of  claim 5 ,
 wherein the source audio content further comprises a third channel stream of audio, 
 wherein the playback device further comprises 
 (i) a third speaker driver and (ii) 
 program instructions stored on the non-transitory computer-readable medium that are executable by the at least one processor such that the playback device is configured to: 
 produce a third channel audio output by playing back, via the third speaker driver, the third channel stream of audio, 
 wherein the captured stream of audio further comprises a third portion corresponding to the third channel audio output, and 
 wherein the program instructions that are executable by the at least one processor such that the playback device is configured to combine at least the first channel stream of audio and the second channel stream of audio into the compound audio signal comprise program instructions that are executable by the at least one processor such that the playback device is configured to combine the first channel stream of audio, the second channel stream of audio, and the third channel stream of audio into the compound audio signal. 
 
     
     
       7. The playback device of  claim 5 , wherein the program instructions that are executable by the at least one processor such that the playback device is configured to combine at least the first channel stream of audio and the second channel stream of audio into the compound audio signal comprise program instructions that are executable by the at least one processor such that the playback device is configured to sum at least the first channel stream of audio and the second channel stream of audio into the compound audio signal. 
     
     
       8. The playback device of  claim 1 , further comprising program instructions stored on the non-transitory computer-readable medium that are executable by the at least one processor such that the playback device is configured to:
 before performing the acoustic echo cancellation on the subset of the set of signal components, transform the captured stream of audio and at least one of the first channel stream of audio or the second channel stream of audio from a time domain into a frequency domain. 
 
     
     
       9. The playback device of  claim 1 , wherein the one or more microphones are located on the playback device. 
     
     
       10. A method of operating a playback device having a first speaker driver and at least a second speaker driver, the method comprising:
 receiving, via a network interface of the playback device, a source stream of audio comprising source audio content to be played back by the playback device, wherein the source audio content comprises a first channel stream of audio and a second channel stream of audio; 
 producing a first channel audio output by playing back, via the first speaker driver, the first channel stream of audio; 
 producing a second channel audio output by playing back, via the second speaker driver, the second channel stream of audio; 
 receiving, via one or more microphones, a captured stream of audio comprising (i) a first portion corresponding to the first channel audio output and (ii) a second portion corresponding to the second channel audio output, wherein the captured stream of audio has a first signal-to-noise ratio; 
 determining a set of signal components from at least one of the first channel stream of audio or the second channel stream of audio; 
 selecting a subset of the set of signal components; 
 performing acoustic echo cancellation on the subset of the set of signal components and thereby producing an acoustic echo cancellation output; and 
 applying the acoustic echo cancellation output to the captured stream of audio and thereby increasing the signal-to noise ratio of the captured stream of audio from the first signal-to-noise ratio to a second signal-to-noise ratio that is greater than the first signal-to-noise ratio. 
 
     
     
       11. The method of  claim 10 , wherein the captured stream of audio comprises a third portion corresponding to a vocal command issued by a user, and wherein applying the acoustic echo cancellation output to the captured stream of audio results in the first portion and second portion being eliminated or minimized in the captured stream of audio. 
     
     
       12. The method of  claim 10 , further comprising:
 combining at least the first channel stream of audio and the second channel stream of audio into a compound audio signal, 
 wherein determining the set of signal components from at least one of the first channel stream of audio or the second channel stream of audio comprises determining the set of signal components from the compound audio signal. 
 
     
     
       13. The method of  claim 12 ,
 wherein the source audio content comprises the first channel stream of audio, the second channel stream of audio, and a third channel stream of audio, 
 wherein the playback device comprises a third speaker driver, 
 wherein the method further comprises producing a third channel audio output by playing back, via the third speaker driver, the third channel stream of audio, 
 wherein the captured stream of audio further comprises a third portion corresponding to the third channel audio output, and 
 wherein combining at least the first channel stream of audio and the second channel stream of audio into the compound audio signal comprises combining the first channel stream of audio, the second channel stream of audio, and the third channel stream of audio into the compound audio signal. 
 
     
     
       14. The method of  claim 12 , wherein combining at least the first channel stream of audio and the second channel stream of audio into the compound audio signal comprises summing at least the first channel stream of audio and the second channel stream of audio into the compound audio signal. 
     
     
       15. The method of  claim 10 , further comprising:
 before performing the acoustic echo cancellation on the subset of the set of signal components, transforming the captured stream of audio and at least one of the first channel stream of audio or the second channel stream of audio from a time domain into a frequency domain. 
 
     
     
       16. A non-transitory computer-readable medium, wherein the non-transitory computer-readable medium is provisioned with program instructions that, when executed by at least one processor, cause a playback device to:
 receive, via a network interface of the playback device, a source stream of audio comprising source audio content to be played back by the playback device, wherein the source audio content comprises a first channel stream of audio and a second channel stream of audio; 
 produce a first channel audio output by playing back, via a first speaker driver of the playback device, the first channel stream of audio; 
 produce a second channel audio output by playing back, via a second speaker driver of the playback device, the second channel stream of audio; 
 receive, via one or more microphones, a captured stream of audio comprising (i) a first portion corresponding to the first channel audio output and (ii) a second portion corresponding to the second channel audio output, wherein the captured stream of audio has a first signal-to-noise ratio; 
 determine a set of signal components from at least one of the first channel stream of audio or the second channel stream of audio; 
 select a subset of the set of signal components; 
 perform acoustic echo cancellation on the subset of the set of signal components and thereby produce an acoustic echo cancellation output; and 
 apply the acoustic echo cancellation output to the captured stream of audio and thereby increase the signal-to noise ratio of the captured stream of audio from the first signal-to-noise ratio to a second signal-to-noise ratio that is greater than the first signal-to-noise ratio. 
 
     
     
       17. The non-transitory computer-readable medium of  claim 16 , wherein the non-transitory computer-readable medium is also provisioned with program instructions that, when executed by at least one processor, cause the playback device to:
 combine at least the first channel stream of audio and the second channel stream of audio into a compound audio signal, 
 wherein the program instructions that, when executed by at least one processor, cause the playback device to determine the set of signal components from at least one of the first channel stream of audio or the second channel stream of audio comprise program instructions that, when executed by at least one processor, cause the playback device to determine the set of signal components from the compound audio signal. 
 
     
     
       18. The non-transitory computer-readable medium of  claim 17 ,
 wherein the source audio content comprises the first channel stream of audio, the second channel stream of audio, and a third channel stream of audio, 
 wherein the non-transitory computer-readable medium is also provisioned with program instructions that, when executed by at least one processor, cause the playback device to 
 produce a third channel audio output by playing back, via a third speaker driver of the playback device, the third channel stream of audio, 
 wherein the captured stream of audio further comprises a third portion corresponding to the third channel audio output, and 
 wherein the program instructions that, when executed by at least one processor, cause the playback device to combine at least the first channel stream of audio and the second channel stream of audio into the compound audio signal comprise program instructions that, when executed by at least one processor, cause the playback device to combine the first channel stream of audio, the second channel stream of audio, and the third channel stream of audio into the compound audio signal. 
 
     
     
       19. The non-transitory computer-readable medium of  claim 17 , wherein the program instructions that, when executed by at least one processor, cause the playback device to combine at least the first channel stream of audio and the second channel stream of audio into the compound audio signal comprise program instructions that, when executed by at least one processor, cause the playback device to sum at least the first channel stream of audio and the second channel stream of audio into the compound audio signal. 
     
     
       20. The non-transitory computer-readable medium of  claim 16 , wherein the non-transitory computer-readable medium is also provisioned with program instructions that, when executed by at least one processor, cause the playback device to:
 before performing the acoustic echo cancellation on the subset of the set of signal components, transform the captured stream of audio and at least one of the first channel stream of audio or the second channel stream of audio from a time domain into a frequency domain.
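Independent claims 1, 10, and 16 add a selection step: determine a set of signal components, select a subset, and run echo cancellation only on that subset, optionally after a time-to-frequency transform (claims 8, 15, 20) of a summed compound signal (claims 5, 7, 12, 14, 17, 19). The Python sketch below reads "signal components" as DFT bins, which is one plausible interpretation rather than one the claims mandate. The energy-based selection rule, the per-bin one-tap NLMS update, the 32-sample non-overlapping frames, the helper names, and the synthetic tonal signals are all assumptions chosen to keep the example short and exact; a practical system would more likely use windowed, overlapping frames and longer per-bin filters.

```python
import cmath
import math

N = 32  # frame length in samples (hypothetical)


def dft(frame):
    """Naive O(N^2) DFT: time-domain frame -> list of complex frequency bins."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def split_frames(signal, n):
    """Cut a signal into consecutive, non-overlapping frames of n samples."""
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def select_bins(ref_spectrum, count):
    """Hypothetical rule: keep the `count` bins carrying the most reference energy."""
    ranked = sorted(range(len(ref_spectrum)), key=lambda b: abs(ref_spectrum[b]),
                    reverse=True)
    return ranked[:count]


def subset_aec(ref_frames, mic_frames, count=6, mu=0.5, eps=1e-9):
    """One-tap NLMS per selected bin; unselected bins pass through unchanged."""
    weights = {}  # bin index -> complex echo-path estimate
    outputs = []
    for ref, mic in zip(ref_frames, mic_frames):
        X, D = dft(ref), dft(mic)
        E = list(D)
        for b in select_bins(X, count):
            w = weights.get(b, 0j)
            E[b] = D[b] - w * X[b]  # subtract this bin's echo estimate
            weights[b] = w + mu * E[b] * X[b].conjugate() / (abs(X[b]) ** 2 + eps)
        outputs.append(E)
    return outputs


def tone(k, amp, length):
    """Sinusoid centred exactly on DFT bin k, so it repeats every N samples."""
    return [amp * math.sin(2 * math.pi * k * t / N) for t in range(length)]


# Demo: echo path assumed to be a 0.5 gain with a 2-sample delay.
length = N * 40
left = [a + b for a, b in zip(tone(3, 1.0, length), tone(7, 0.5, length))]
right = [a + b for a, b in zip(tone(5, 0.8, length), tone(7, 0.5, length))]
compound = [lf + rt for lf, rt in zip(left, right)]  # summed channels
voice = tone(11, 0.05, length)  # near-end "voice" in a bin the reference leaves empty
mic = [(0.5 * compound[t - 2] if t >= 2 else 0.0) + voice[t] for t in range(length)]

out = subset_aec(split_frames(compound, N), split_frames(mic, N))
```

Bins outside the subset (here the near-end "voice" tone in bin 11 and its mirror, bin 21) pass through untouched, so in this example only 6 of 32 bins carry an adaptive filter; that reduction in per-frame work is the kind of saving a subset-selection step could offer.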
Cited by (0)

No later patents cite this yet.
References (0)

No backward citations on record.