US11244671B2 · Active · Utility · PatentIndex 69

Model training method and apparatus

Assignee: SAMSUNG ELECTRONICS CO LTD · Priority: May 9, 2019 · Filed: Aug 23, 2019 · Granted: Feb 8, 2022

Est. expiry: May 9, 2039 (~12.8 yrs left) · nominal 20-yr term from priority

Inventors: KIM HOGYEONG, KANG HYOHYEONG, NA Hwidong, LEE HOSHIK

G06N 3/045, G06N 3/044, G06N 3/0495, G06N 3/0895, G06N 3/094, G06N 3/096, G06N 3/0442, G06N 3/0464, G06N 3/0475, G10L 15/183, G10L 15/16, G10L 15/063, G06N 3/088, G06N 3/084, G06N 3/08

Abstract

A model training method and apparatus is disclosed, where the model training method acquires first output data of a student model for first input data and second output data of a teacher model for second input data and trains the student model such that the first output data and the second output data are not distinguished from each other. The student model and the teacher model have different structures.
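The abstract's goal, that student and teacher outputs "are not distinguished from each other", reads like an adversarial objective. The sketch below is one plausible way to write such losses, not the patent's disclosed implementation. It assumes a discriminator that returns, for each output, a probability that the output came from the teacher; the function names and the sample probabilities are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0)

def discriminator_loss(p_teacher, p_student):
    """Binary cross-entropy for a discriminator that should output
    values near 1 on teacher outputs and near 0 on student outputs."""
    loss_t = -sum(math.log(p + EPS) for p in p_teacher) / len(p_teacher)
    loss_s = -sum(math.log(1.0 - p + EPS) for p in p_student) / len(p_student)
    return loss_t + loss_s

def student_adversarial_loss(p_student):
    """Adversarial loss for the student (non-saturating form, an assumption):
    small when the discriminator mistakes student outputs for teacher outputs."""
    return -sum(math.log(p + EPS) for p in p_student) / len(p_student)

# Hypothetical discriminator probabilities for two student outputs.
loss_when_confused = student_adversarial_loss([0.5, 0.5])  # discriminator unsure
loss_when_caught = student_adversarial_loss([0.1, 0.1])    # discriminator confident
```

The student's loss is lower when the discriminator is unsure than when it confidently flags the student outputs, which is the direction in which the student would be trained.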

Claims

exact text as granted — not AI-modified

What is claimed is:

1. A method of training a model, the method comprising:
acquiring first output data of a student model for first input data and second output data of a teacher model for second input data; and
training the student model such that the first output data and the second output data are not distinguished from each other,
wherein the student model and the teacher model have different structures, and
wherein the first input data is one of text data and speech data and the second input data is the other of the text data and speech data.

2. The method of claim 1, wherein the student model and the teacher model are configured to process different tasks.

3. The method of claim 1, wherein the first input data and the second input data are different types of data.

4. The method of claim 1, wherein the first input data and the second input data are unlabeled data.

5. The method of claim 1, wherein the training of the student model comprises:
training the student model such that the first output data and the second output data are not distinguished from each other by a discriminator model and
the discriminator model is configured to distinguish between the first output data and the second output data.

6. The method of claim 1, wherein the first output data and the second output data are a same type of data.

7. The method of claim 1, wherein the student model is a speech recognition model and
the teacher model is a language model that outputs text data based on an expression of a domain.

8. The method of claim 1, wherein the first input data, the second input data, the first output data, and the second output data are sequence data.

9. The method of claim 1, wherein the training of the student model comprises:
determining an adversarial loss based on a degree to which the first output data and the second output data are distinguished from each other and training the student model to reduce the adversarial loss.

10. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1.

11. An apparatus for training a model, the apparatus comprising;
a processor configured to acquire first output data of a student model for first input data and second output data of a teacher model for second input data and to train the student model such that the first output data and the second output data are not distinguished from each other; and
a memory configured to store a parameter of the student model,
wherein the student model and the teacher model have different structures, and
wherein the first input data is one of text data and speech data and the second input data is the other of the text data and speech data.

12. The apparatus of claim 11, wherein the student model and the teacher model are configured to process different tasks.

13. The apparatus of claim 11, wherein the first input data and the second input data are different types of data.

14. The apparatus of claim 11, wherein the first input data and the second input data are unlabeled data.

15. The apparatus of claim 11, wherein the processor is further configured to train the student model such that the first output data and the second output data are not distinguished from each other by a discriminator model, and
the discriminator model is configured to distinguish between the first output data and the second output data.

16. The apparatus of claim 11, wherein the first output data and the second output data are a same type of data.

17. The apparatus of claim 11, wherein the student model is a speech recognition model, and
the teacher model is a language model that outputs text data based on an expression of a domain.

18. An apparatus for training a model, the apparatus comprising;
a memory configured to store a student model, a teacher model, and a discriminator model; and
a processor configured to
acquire first output data of the student model for first input data and second output data of the teacher model for second input data,
train the discriminator model to distinguish between the first output data and the second output data, and
train the student model to minimize a distinction between the first output data and the second output data at the discriminator mode,

wherein the first input data is one of text data and speech data and the second input data is the other of the text data and speech data.

19. The apparatus of claim 18, wherein a number of hidden layers of the student model is lesser than a number of hidden layers of the teacher model.

20. The apparatus of claim 18, wherein the first input data and the second output data comprise unlabeled data.
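Illustrative note (not part of the granted claims): claim 18 recites two training steps, one for the discriminator and one for the student. A minimal toy sketch of alternating those steps is given below, under loud assumptions: outputs are plain scalars rather than sequences, the student is a single bias parameter `theta` with output `x + theta`, the discriminator is a logistic unit `sigmoid(a*y + b)`, and all names, data values, and learning rates are hypothetical. It is a sketch of the general adversarial pattern, not the patented models.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def discriminator_step(a, b, teacher_out, student_out, lr=0.1):
    """One gradient-descent step on D(y) = sigmoid(a*y + b), pushing D
    toward 1 on teacher outputs and toward 0 on student outputs."""
    grad_a = grad_b = 0.0
    for y in teacher_out:   # loss term: -log D(y)
        g = -(1.0 - sigmoid(a * y + b)) / len(teacher_out)
        grad_a += g * y
        grad_b += g
    for y in student_out:   # loss term: -log(1 - D(y))
        g = sigmoid(a * y + b) / len(student_out)
        grad_a += g * y
        grad_b += g
    return a - lr * grad_a, b - lr * grad_b

def student_step(theta, inputs, a, b, lr=0.1):
    """One gradient-descent step on the student parameter theta, reducing
    -log D(x + theta) with the discriminator held fixed."""
    grad = 0.0
    for x in inputs:
        grad += -(1.0 - sigmoid(a * (x + theta) + b)) * a / len(inputs)
    return theta - lr * grad

teacher_out = [2.0, 2.2, 1.8]   # made-up teacher outputs
student_in = [0.0, 0.1, -0.1]   # made-up student inputs
a, b, theta = 0.0, 0.0, 0.0

for _ in range(100):
    student_out = [x + theta for x in student_in]
    a, b = discriminator_step(a, b, teacher_out, student_out)  # train discriminator
    theta = student_step(theta, student_in, a, b)              # train student
```

Note that the teacher outputs and the student inputs are never paired in this loop; only the discriminator connects them, which matches the claims' use of separate first and second input data.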

Cited by (0)

No later patents cite this yet.

References (0)

No backward citations on record.