US10529359B2 · Active · Utility · PatentIndex 49
Conversation detection
Assignee: MICROSOFT TECHNOLOGY LICENSING LLC · Priority: Apr 17, 2014 · Filed: Apr 17, 2014 · Granted: Jan 7, 2020
Est. expiry: Apr 17, 2034 (~7.8 yrs left) · nominal 20-yr term from priority
Inventors: TOMLIN ARTHUR CHARLES; PAULOVICH JONATHAN; KEIBLER EVAN MICHAEL; SCOTT JASON; BROWN CAMERON; PLUMB JONATHAN WILLIAM
Classifications: G10L 25/48 · G10L 25/78 · G02B 27/017 · G10L 2021/02166
PatentIndex Score: 49 · Cited by: 0 · References: 82 · Claims: 20
Abstract
Various embodiments relating to detecting a conversation during presentation of content on a computing device, and taking one or more actions in response to detecting the conversation, are disclosed. In one example, an audio data stream is received from one or more sensors, a conversation between a first user and a second user is detected based on the audio data stream, and presentation of a digital content item is modified by the computing device in response to detecting the conversation.
Claims
(exact text as granted, not AI-modified)
The invention claimed is:
1. A method for detecting a conversation between at least first and second users where the first user is receiving presentation of a digital content item, comprising:
receiving an audio data stream from one or more sensors;
automatically detecting a conversation between the first user and the second user based on the audio data stream, the audio data stream on which the detected conversation is based being independent of the presentation of the digital content item, wherein automatically detecting the conversation includes determining whether alternating segments of speech between the first user and the second user alternate between different source locations and whether the alternating segments of speech are within a threshold period of time; and
automatically modifying the presentation of the digital content item to the first user in response to detecting the conversation.
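The detection test recited in claim 1 can be illustrated with a minimal Python sketch. It assumes speech has already been segmented and that each segment carries an estimated source location. The `SpeechSegment` type, the 3-second `max_gap_s` default, and the `min_turns` count are hypothetical choices for illustration; the claim specifies only "a threshold period of time".

```python
from dataclasses import dataclass


@dataclass
class SpeechSegment:
    start: float   # segment start time in seconds
    end: float     # segment end time in seconds
    source: str    # hypothetical label for the estimated source location


def detect_conversation(segments, max_gap_s=3.0, min_turns=3):
    """Return True when at least `min_turns` consecutive segments alternate
    between different source locations with gaps of at most `max_gap_s`."""
    ordered = sorted(segments, key=lambda s: s.start)
    run = 1 if ordered else 0
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur.start - prev.end
        if cur.source != prev.source and gap <= max_gap_s:
            run += 1                 # another valid turn in the exchange
            if run >= min_turns:
                return True
        else:
            run = 1                  # same source or gap too long: start over
    return run >= min_turns
```

A caller would then trigger the presentation change whenever `detect_conversation` returns `True` for the most recent segments.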
2. The method of claim 1, wherein the one or more sensors include a microphone array comprising a plurality of microphones, and the method further comprising determining a source location of a segment of human speech by applying a beamforming spatial filter to a plurality of audio samples of the microphone array to estimate the different source locations.
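Claim 2 does not name a particular beamformer. As one possible illustration, the sketch below uses a delay-and-sum beamformer, which is an assumption made here and not a technique the claim specifies. It assumes a linear array, a far-field source, whole-sample delays, and a speed of sound of 343 m/s; all function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; approximate speed of sound in air (assumed)


def steering_delays(mic_x, angle_deg, sample_rate):
    """Per-microphone delays (whole samples) that align a far-field source.

    mic_x holds microphone positions in metres along a line; 0 degrees is
    broadside and positive angles point toward +x.
    """
    s = math.sin(math.radians(angle_deg))
    return [round(x * s / SPEED_OF_SOUND * sample_rate) for x in mic_x]


def steered_power(channels, delays):
    """Mean power of the delay-and-sum output for one steering direction."""
    n = len(channels[0])
    lo = max(0, max(delays))        # first index valid for every channel
    hi = min(n, n + min(delays))    # one past the last valid index
    total = 0.0
    for i in range(lo, hi):
        summed = sum(ch[i - d] for ch, d in zip(channels, delays))
        total += summed * summed
    return total / max(1, hi - lo)


def estimate_direction(channels, mic_x, sample_rate, step_deg=5):
    """Scan candidate angles and return the one with the highest output power."""
    angles = range(-90, 91, step_deg)
    return max(
        angles,
        key=lambda a: steered_power(channels, steering_delays(mic_x, a, sample_rate)),
    )
```

Running `estimate_direction` on successive speech segments would give one direction per segment, which is the kind of per-segment source location the alternation test in claim 1 compares.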
3. The method of claim 1, wherein automatically detecting the conversation between the first user and the second user further includes determining that the alternating segments of speech of the first user and the second user occur within a designated cadence range.
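The claim does not define how cadence is measured. The sketch below assumes one plausible reading, the rate of speaker turns per minute, and checks it against a range. The function name and the 4 to 40 turns-per-minute bounds are hypothetical values chosen only for illustration.

```python
def within_cadence_range(turn_starts, min_per_min=4.0, max_per_min=40.0):
    """Return True when the turn-taking rate falls inside the given range.

    turn_starts is a time-ordered list of the start times (seconds) of the
    alternating speech segments.
    """
    if len(turn_starts) < 2:
        return False                      # no exchange to measure
    span = turn_starts[-1] - turn_starts[0]
    if span <= 0:
        return False                      # degenerate or unordered input
    turns_per_minute = (len(turn_starts) - 1) * 60.0 / span
    return min_per_min <= turns_per_minute <= max_per_min
```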
4. The method of claim 1, further comprising:
determining that one or more segments of human speech are provided by an electronic audio device, and
ignoring the one or more segments of human speech provided by the electronic audio device when determining that the alternating segments of speech alternate between the different source locations.
5. The method of claim 1, wherein the digital content item includes one or more of an audio content item or a video content item, and wherein automatically modifying the presentation of the digital content item includes pausing presentation of the audio content item or the video content item.
6. The method of claim 1, wherein the digital content item includes an audio content item, and wherein automatically modifying the presentation of the digital content item includes lowering a volume of the audio content item.
7. The method of claim 1, wherein the digital content item includes one or more visual content items, and wherein automatically modifying the presentation of the digital content item includes one or more of hiding the one or more visual content items from view on a display, moving the one or more visual content items to a different position on the display, changing a translucency of the one or more visual content items, or changing a size of the one or more visual content items on the display.
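The modifications listed in claims 5 through 7 (pausing, lowering volume, hiding, moving, changing translucency, changing size) can be modelled as transformations of a presentation state. The following is only a sketch: the `Presentation` fields, the `Action` names, and every target value (such as a volume of 0.2 or a scale of 0.5) are hypothetical.

```python
from dataclasses import dataclass, replace
from enum import Enum, auto


class Action(Enum):
    PAUSE = auto()
    LOWER_VOLUME = auto()
    HIDE = auto()
    MOVE = auto()
    CHANGE_TRANSLUCENCY = auto()
    RESIZE = auto()


@dataclass(frozen=True)
class Presentation:
    playing: bool = True
    volume: float = 1.0            # 0.0 (silent) to 1.0 (full)
    visible: bool = True
    position: tuple = (0.5, 0.5)   # normalised display coordinates
    opacity: float = 1.0
    scale: float = 1.0


def modify(p, action):
    """Return a new Presentation with one modification applied."""
    if action is Action.PAUSE:
        return replace(p, playing=False)
    if action is Action.LOWER_VOLUME:
        return replace(p, volume=min(p.volume, 0.2))
    if action is Action.HIDE:
        return replace(p, visible=False)
    if action is Action.MOVE:
        return replace(p, position=(0.9, 0.1))   # e.g. a corner of the display
    if action is Action.CHANGE_TRANSLUCENCY:
        return replace(p, opacity=0.3)
    if action is Action.RESIZE:
        return replace(p, scale=0.5)
    raise ValueError(f"unknown action: {action!r}")
```

Keeping the state immutable means the unmodified `Presentation` remains available to the caller after a modification is applied.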
8. The method of claim 1, wherein the first user and the second user are within physical proximity of one another.
9. The method of claim 1, wherein automatically detecting the conversation further includes estimating the source location of the first user and the source location of the second user based on a weighted function of a perceived loudness of the first user and the second user.
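Claim 9 does not state the form of the weighted function. One possible reading, assumed here purely for illustration, is a loudness-weighted circular mean of per-frame bearing estimates, so that louder frames contribute more to a speaker's estimated direction. The function name and its inputs are hypothetical.

```python
import math


def loudness_weighted_bearing(bearings_deg, loudness):
    """Combine per-frame bearings (degrees) into one estimate, weighting each
    frame by its loudness. Returns degrees in (-180, 180], or None when all
    weights cancel out."""
    if len(bearings_deg) != len(loudness):
        raise ValueError("bearings_deg and loudness must have equal length")
    # Sum weighted unit vectors so that angles near the wrap-around
    # (e.g. 350 and 10 degrees) average correctly.
    x = sum(w * math.cos(math.radians(b)) for b, w in zip(bearings_deg, loudness))
    y = sum(w * math.sin(math.radians(b)) for b, w in zip(bearings_deg, loudness))
    if x == 0 and y == 0:
        return None
    return math.degrees(math.atan2(y, x))
```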
10. The method of claim 1, further comprising:
detecting an end of the conversation between the first user and the second user; and
upon detecting the end of the conversation, returning the presentation of the digital content item to a state of the digital content item that existed before the conversation was detected.
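The save-and-restore behaviour of claim 10 can be sketched as a small controller that snapshots the presentation state when a conversation is detected and restores it when the conversation ends. The class, its method names, and the dictionary keys are hypothetical, and how the end of a conversation is detected is left to the caller.

```python
class PresentationController:
    """Saves presentation state when a conversation starts and restores it after."""

    def __init__(self, state):
        self.state = dict(state)
        self._saved = None  # snapshot taken when a conversation is detected

    def on_conversation_detected(self):
        if self._saved is None:            # ignore repeated detections
            self._saved = dict(self.state)
            self.state["playing"] = False  # example modification: pause
            self.state["volume"] = 0.0     # example modification: mute

    def on_conversation_ended(self):
        if self._saved is not None:
            self.state = self._saved       # return to the pre-conversation state
            self._saved = None
```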
11. A hardware storage machine holding instructions executable by a logic machine to:
receive an audio data stream from one or more sensors;
detect a conversation between a first user and a second user based on the audio data stream and as a function of the sequence of audio source locations and time of said sequence of audio source locations, the audio data stream on which the detected conversation is based being independent of a presentation of a digital content item, wherein detecting the conversation includes determining whether alternating segments of speech between the first user and the second user alternate between different source locations and whether the alternating segments of speech are within a threshold period of time; and
modify the presentation of the digital content item in response to detecting the conversation.
12. The hardware storage machine of claim 11, wherein detecting the conversation between the first user and the second user further includes determining whether the alternating segments of speech occur within a designated cadence range.
13. The hardware storage machine of claim 11, further holding instruction executable by the logic machine to
determine that one or more segments of human speech are provided by an electronic audio device, and
ignore the one or more segments of human speech provided by the electronic audio device when determining that the alternating segments of speech alternate between different source locations.
14. The hardware storage machine of claim 11, wherein the digital content item includes one or more of an audio content item or a video content item, and wherein the instructions are executable to modify the presentation of the digital content item by pausing presentation of the one or more of the audio content item or video content item.
15. The hardware storage machine of claim 11, wherein the digital content item includes an audio content item, and wherein the instructions are executable to modify the presentation of the digital content item by lowering a volume of the audio content item.
16. The hardware storage machine of claim 11, wherein the digital content item includes one or more visual content items, and wherein the instructions are executable to modify the presentation of the digital content item by one or more of hiding the one or more visual content items from view on a display, moving the one or more visual content items to a different position on the display, changing a translucency of the one or more visual content items, or changing a size of the one or more visual content items on the display.
17. A head-mounted display device comprising:
one or more audio sensors configured to capture an audio data stream;
an optical sensor configured to capture an image of a scene;
a see-through display configured to display a digital content item;
a logic machine; and
a storage machine holding instructions executable by the logic machine to
while the digital content item is being displayed via the see-through display, receive the stream of audio data from the one or more audio sensors,
detect human speech segments alternating between a wearer of the head-mounted display device and an other person based on the audio data stream,
receive the image of the scene including the other person from the optical sensor,
confirm that the other person is speaking to the wearer of the head-mounted display device based on the image,
in response to confirming that the other person is speaking to the wearer of the head-mounted display device, detect a conversation between the wearer of the head-mounted display device and the other person based on the audio data stream and the image, the audio data stream on which the detected conversation is based being independent of a presentation of the digital content item, wherein to detect the conversation the instructions are further executable to determine whether the human speech segments alternating between the wearer of the head-mounted display device and the other person alternate between different source locations and whether the human speech segments alternating between the wearer of the head-mounted display device and the other person are within a threshold period of time, and
modify the presentation of the digital content item via the see-through display in response to detecting the conversation.
18. The head-mounted display device of claim 17, wherein the digital content item includes one or more of an audio content item or a video content item, and wherein the instructions are executable to modify the presentation of the digital content item by pausing presentation of the audio content item or the video content item.
19. The head-mounted display device of claim 17, wherein to detect the conversation the instructions are further executable to determine that human speech segments are spoken by the wearer of the head-mounted display device before and after a human speech segment spoken by the other person, or that human speech segments are spoken by the another person before and after a human speech segment spoken by the wearer of the head-mounted display device.
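The before-and-after pattern in claim 19 amounts to finding three consecutive speech segments in which one party speaks, the other replies, and the first speaks again. A minimal sketch, assuming each segment has already been attributed to a speaker; the function name and the labels `"wearer"` and `"other"` are hypothetical.

```python
def has_before_and_after_pattern(speakers):
    """Return True if some segment is preceded and followed by a segment
    from the other party (wearer-other-wearer or other-wearer-other)."""
    return any(
        first == third and first != middle
        for first, middle, third in zip(speakers, speakers[1:], speakers[2:])
    )
```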
20. The head-mounted display device of claim 17, wherein the digital content item includes a plurality of visual content items presented at different positions on the see-through display, and wherein the instructions are executable to modify the presentation of the digital content item by moving a visual content item of the plurality of visual content items away from a position on the see-through display that corresponds with a direction of a source location of a segment of human speech of the other person.
Cited by (0)
No later patents cite this yet.
References (0)
No backward citations on record.