Voice interaction method, and device
Abstract
A voice dialogue method is performed by a voice dialogue system that includes: a voice signal generation unit; a voice dialogue agent unit; a voice output unit; and a voice input control unit. The method includes: a step of, by the voice signal generation unit, receiving a voice input and generating a voice signal based on the received voice input; a step of, by the voice dialogue agent unit, performing voice recognition processing on the voice signal and performing processing based on a result of the voice recognition processing to generate a response signal; a step of, by the voice output unit, outputting a voice based on the response signal; and a step of, when the voice output unit outputs the voice, by the voice input control unit, keeping the voice signal generation unit, for a predetermined period after output of the voice, in a receivable state in which a voice input is receivable.
Claims
exact text as granted — not AI-modified
The invention claimed is:
1. A voice dialogue method that is performed by a voice dialogue system, the voice dialogue system including: a first device; a second device; a first voice dialogue agent server; and a second voice dialogue agent server, wherein
the first device is a computer-embedded device that is capable of connecting to a network and performing input and output via a voice with a user,
the first voice dialogue agent server and the second voice dialogue agent server are each a voice dialogue agent server that is accessed by the first device via the network, and is capable of performing, as an agent for the first device, recognition of a voice inputted by the device and synthesizing of a voice to be outputted by the first device, and
the voice dialogue method comprises:
receiving a voice input at the first device;
generating a voice signal at the first device based on the received voice input, and transmitting the voice signal from the first device to the first voice dialogue agent server;
performing voice recognition processing on the generated voice signal at the first voice dialogue agent server to generate first text input;
determining, at the first voice dialogue agent server based on the generated first text input and agent information, which one of the first voice dialogue agent server and the second voice dialogue agent server is appropriate for performing voice-related processing that is processing based on the voice signal, the agent information being stored in a memory included in the first voice dialogue agent server and associating the second voice dialogue agent server with one or more keywords;
when the determining determines that the first voice dialogue agent server is appropriate for performing the voice-related processing, (i) generating, from the generated first text input, a first instruction set for the first device or another device associated with the first voice dialogue agent server, (ii) executing the generated first instruction set using the first device or the other device associated with the first voice dialogue agent server, (iii) generating a first response signal based on the execution of the generated first instruction set using the first device or the other device associated with the first voice dialogue agent server, (iv) transmitting the generated first response signal from the first voice dialogue agent server to the first device, and (v) outputting a voice at the first device based on the received first response signal generated at the first voice dialogue agent server;
when the determining determines that the second dialogue agent server is appropriate for performing the voice-related processing, (i) transferring the voice signal from the first voice dialogue agent server to the second voice dialogue agent server, (ii) performing new voice recognition processing on the transferred voice signal at the second voice dialogue agent server to generate second text input, (iii) generating, from the generated second text input, a second instruction set for the second device, (iv) executing the generated second instruction set using the second device, (v) generating a second response signal based on the execution of the generated second instruction set using the second device, (vi) transmitting the generated second response signal from the second voice dialogue agent server to the first device, and (vii) outputting a voice at the first device based on the received second response signal generated at the second voice dialogue agent server; and
displaying, on a screen of the first device or a screen of the second device, a text character string obtained by recognizing voice input from the user and a text character string indicating a response signal by the first device or the second device, while indicating a distinction between the user, the first voice dialogue agent server, and the second voice dialogue agent server.
2. The voice dialogue method of claim 6, wherein
the voice dialogue system further includes a display unit, and
the voice dialogue method further comprises
when the device is in the receivable state, displaying on the display that the first device is in the receivable state.
3. The voice dialogue method of claim 2, further comprising:
when the determining determines that the first voice dialogue agent server is appropriate for performing the voice-related processing, displaying on the display that the first voice dialogue agent server is appropriate for performing the voice-related processing; and
when the determining determines that the second dialogue agent server is appropriate for performing the voice-related processing, displaying on the display that the second voice dialogue agent server is appropriate for performing the voice-related processing.
4. The voice dialogue method of claim 2, further comprising
when the first device is in the receivable state and a response signal generated by the first voice dialogue agent server indicates that a new voice input does not need to be received, switching the first device to an unreceivable state even during the predetermined period, the unreceivable state being a state in which a voice input is unreceivable.
5. The voice dialogue method of claim 1, wherein
the second voice dialogue agent server is provided in plural,
the agent information associates each of a plurality of identifiers with one or more keywords, the identifiers each identifying one of the second voice dialogue agent servers, and
the voice dialogue method further comprises
when any of the keywords is included in the generated first text input, determining, at the first voice dialogue agent server, that one of the second voice dialogue agent servers that is identified by an identifier associated with the included keyword is appropriate for performing the voice-related processing.
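Illustrative note (not part of the claims): the keyword lookup recited in claim 5 could be sketched roughly as below. This is a minimal sketch under stated assumptions, not the patented implementation. The class `AgentInfo`, the method `route`, the agent identifiers, and the sample keywords are all hypothetical, and plain substring matching is a simplification of whatever matching an actual system would use.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AgentInfo:
    """Hypothetical stand-in for the agent information:
    each agent server identifier is associated with one or more keywords."""
    keywords_by_agent: Dict[str, List[str]] = field(default_factory=dict)

    def route(self, text_input: str, default_agent: str) -> str:
        """Return the identifier of the agent server judged appropriate.

        If any registered keyword occurs in the recognized text, the agent
        associated with that keyword is chosen; otherwise the default
        (first) agent server handles the request itself.
        """
        lowered = text_input.lower()
        for agent_id, keywords in self.keywords_by_agent.items():
            # Simplification: substring match, so "car" would also match "scary".
            if any(keyword.lower() in lowered for keyword in keywords):
                return agent_id
        return default_agent


# Usage with made-up identifiers and keywords.
info = AgentInfo({"car_agent": ["car", "navigation"], "retail_agent": ["order"]})
chosen = info.route("Turn on the car air conditioner", default_agent="home_agent")
```

In this sketch the first agent server is the fallback whenever no keyword matches, which mirrors the two branches of the determining step in claim 1.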
6. The voice dialogue method of claim 1, further comprising
when the first device outputs a voice, keeping the first device in a receivable state for a predetermined period after output of the voice, the receivable state being a state in which a voice input is receivable at the first device.
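Illustrative note (not part of the claims): the receivable state of claim 6, together with the early switch to the unreceivable state in claim 4, could be modeled as a simple deadline. This is a sketch under assumptions. The class name `ReceivableWindow`, its method names, and the 5-second period are hypothetical; the claims say only "a predetermined period".

```python
import time


class ReceivableWindow:
    """Tracks whether a device is currently accepting voice input."""

    def __init__(self, period_s: float, clock=time.monotonic):
        self._period_s = period_s  # assumed length of the predetermined period
        self._clock = clock        # injectable clock, which makes testing easy
        self._deadline = None      # None means the unreceivable state

    def on_voice_output(self) -> None:
        """Called when the device outputs a voice: open the window."""
        self._deadline = self._clock() + self._period_s

    def close_early(self) -> None:
        """Called when a response signal indicates no new input is needed."""
        self._deadline = None

    def is_receivable(self) -> bool:
        """True only while the window is open and the period has not elapsed."""
        return self._deadline is not None and self._clock() < self._deadline


# Usage with a fake clock so that the behavior is deterministic.
now = [0.0]
window = ReceivableWindow(period_s=5.0, clock=lambda: now[0])
window.on_voice_output()
now[0] = 3.0
still_open = window.is_receivable()
```

Injecting the clock is a design choice of the sketch, not something the claims require; a real device would more likely use a hardware timer or an event loop.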
7. The voice dialogue method of claim 1,
wherein the first device or the other device associated with the first voice dialogue agent server is disposed in a home, and the second device is disposed in a vehicle.
8. The voice dialogue method of claim 7,
wherein the first device is one of a television, an air conditioner, a recorder, a washing machine, and a portable smartphone, wherein the other device associated with the first voice dialogue agent server is different from the first device and is one of a television, an air conditioner, a recorder, a washing machine, and a portable smartphone, and wherein second device is one of a car air conditioner and a car navigation system.
9. The voice dialogue method of claim 1,
wherein the first device includes a display which displays (i) when the determining determines that the first voice dialogue agent server is appropriate for performing the voice-related processing, a first character string or a first icon identifying the first voice dialogue agent server and the received first response signal, and (ii) when the determining determines that the second dialogue agent server is appropriate for performing the voice-related processing, a second character string or a second icon identifying the second voice dialogue agent server and the received second response signal.