US11244671B2 · Active · Utility · PatentIndex 69

Model training method and apparatus

Assignee: SAMSUNG ELECTRONICS CO LTD · Priority: May 9, 2019 · Filed: Aug 23, 2019 · Granted: Feb 8, 2022
Est. expiry: May 9, 2039 (~12.8 yrs left) · nominal 20-yr term from priority
Inventors: KIM HOGYEONG, KANG HYOHYEONG, NA Hwidong, LEE HOSHIK
G06N 3/045, G06N 3/044, G06N 3/0495, G06N 3/0895, G06N 3/094, G06N 3/096, G06N 3/0442, G06N 3/0464, G06N 3/0475, G10L 15/183, G10L 15/16, G10L 15/063, G06N 3/088, G06N 3/084, G06N 3/08
PatentIndex Score: 69 · Cited by: 5 · References: 18 · Claims: 20

Abstract

A model training method and apparatus is disclosed, where the model training method acquires first output data of a student model for first input data and second output data of a teacher model for second input data and trains the student model such that the first output data and the second output data are not distinguished from each other. The student model and the teacher model have different structures.
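
One common way to formalize "not distinguished from each other" is a GAN-style objective, which is consistent with the discriminator model and adversarial loss named in claims 5 and 9. The form below is an illustrative assumption rather than a formula taken from the patent; the symbols S (student model), T (teacher model), D (discriminator model), x_1 (first input data), and x_2 (second input data) are introduced here only for this sketch.

```latex
% Assumed notation (not from the patent text):
%   S = student model, T = teacher model, D = discriminator model,
%   x_1 = first input data, x_2 = second input data,
%   D(y) = probability that output y came from the teacher.
\begin{align}
\mathcal{L}_D &= -\log D\bigl(T(x_2)\bigr) - \log\bigl(1 - D\bigl(S(x_1)\bigr)\bigr) \\
\mathcal{L}_S &= -\log D\bigl(S(x_1)\bigr)
\end{align}
```

Under this assumed form, reducing L_D trains the discriminator to tell the two outputs apart, while reducing L_S moves the student's outputs toward what the discriminator treats as teacher outputs.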

Claims

exact text as granted — not AI-modified
What is claimed is:

1. A method of training a model, the method comprising:
  acquiring first output data of a student model for first input data and second output data of a teacher model for second input data; and
  training the student model such that the first output data and the second output data are not distinguished from each other,
  wherein the student model and the teacher model have different structures, and
  wherein the first input data is one of text data and speech data and the second input data is the other of the text data and speech data.
 
     
     
2. The method of claim 1, wherein the student model and the teacher model are configured to process different tasks.

3. The method of claim 1, wherein the first input data and the second input data are different types of data.

4. The method of claim 1, wherein the first input data and the second input data are unlabeled data.
     
     
5. The method of claim 1, wherein the training of the student model comprises:
  training the student model such that the first output data and the second output data are not distinguished from each other by a discriminator model and
  the discriminator model is configured to distinguish between the first output data and the second output data.

6. The method of claim 1, wherein the first output data and the second output data are a same type of data.
     
     
7. The method of claim 1, wherein the student model is a speech recognition model and
  the teacher model is a language model that outputs text data based on an expression of a domain.

8. The method of claim 1, wherein the first input data, the second input data, the first output data, and the second output data are sequence data.
     
     
9. The method of claim 1, wherein the training of the student model comprises:
  determining an adversarial loss based on a degree to which the first output data and the second output data are distinguished from each other and training the student model to reduce the adversarial loss.

10. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1.
     
     
11. An apparatus for training a model, the apparatus comprising;
  a processor configured to acquire first output data of a student model for first input data and second output data of a teacher model for second input data and to train the student model such that the first output data and the second output data are not distinguished from each other; and
  a memory configured to store a parameter of the student model,
  wherein the student model and the teacher model have different structures, and
  wherein the first input data is one of text data and speech data and the second input data is the other of the text data and speech data.
 
     
     
12. The apparatus of claim 11, wherein the student model and the teacher model are configured to process different tasks.

13. The apparatus of claim 11, wherein the first input data and the second input data are different types of data.

14. The apparatus of claim 11, wherein the first input data and the second input data are unlabeled data.
     
     
15. The apparatus of claim 11, wherein the processor is further configured to train the student model such that the first output data and the second output data are not distinguished from each other by a discriminator model, and
  the discriminator model is configured to distinguish between the first output data and the second output data.

16. The apparatus of claim 11, wherein the first output data and the second output data are a same type of data.
     
     
17. The apparatus of claim 11, wherein the student model is a speech recognition model, and
  the teacher model is a language model that outputs text data based on an expression of a domain.
 
     
     
18. An apparatus for training a model, the apparatus comprising;
  a memory configured to store a student model, a teacher model, and a discriminator model; and
  a processor configured to
  acquire first output data of the student model for first input data and second output data of the teacher model for second input data,
  train the discriminator model to distinguish between the first output data and the second output data, and
  train the student model to minimize a distinction between the first output data and the second output data at the discriminator mode,
  wherein the first input data is one of text data and speech data and the second input data is the other of the text data and speech data.
 
     
     
19. The apparatus of claim 18, wherein a number of hidden layers of the student model is lesser than a number of hidden layers of the teacher model.

20. The apparatus of claim 18, wherein the first input data and the second output data comprise unlabeled data.
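
The three processor steps recited in claim 18 (acquire the outputs, train the discriminator, train the student) can be sketched as an alternating update loop. The code below is a toy illustration under stated assumptions, not the patented implementation: the scalar `student`, `teacher`, and `discriminator` functions, the parameters `theta`, `w`, `c`, the learning rate `lr`, and the cross-entropy losses are all hypothetical stand-ins chosen so the example runs with the standard library alone.

```python
import math


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def student(x, theta):
    # Hypothetical stand-in for the student model: shifts its input by theta.
    return x + theta


def teacher(z):
    # Hypothetical stand-in for the teacher model, assumed fixed in this sketch.
    return z + 3.0


def discriminator(y, w, c):
    # Probability assigned to "output y came from the teacher".
    return sigmoid(w * y + c)


def discriminator_step(s_out, t_out, w, c, lr):
    # One gradient-descent step on -log D(t_out) - log(1 - D(s_out)),
    # which trains the discriminator to distinguish the two outputs.
    d_s = discriminator(s_out, w, c)
    d_t = discriminator(t_out, w, c)
    grad_w = -(1.0 - d_t) * t_out + d_s * s_out
    grad_c = -(1.0 - d_t) + d_s
    return w - lr * grad_w, c - lr * grad_c


def student_step(x, theta, w, c, lr):
    # One gradient-descent step on the assumed adversarial loss
    # -log D(student(x)), which makes the student output harder to distinguish.
    d_s = discriminator(student(x, theta), w, c)
    grad_theta = -(1.0 - d_s) * w
    return theta - lr * grad_theta


def train(first_inputs, second_inputs, theta=0.0, w=0.0, c=0.0, lr=0.1):
    # Alternate the two updates; the inputs are unlabeled and not matched
    # to each other, zip() only walks the two streams in step.
    for x, z in zip(first_inputs, second_inputs):
        w, c = discriminator_step(student(x, theta), teacher(z), w, c, lr)
        theta = student_step(x, theta, w, c, lr)
    return theta, w, c
```

Calling `train([0.0], [0.0])` with the defaults performs one discriminator update followed by one student update. In a realistic setting the scalar stand-ins would be replaced by a speech recognition model, a language model, and a learned discriminator over sequence outputs, and plain alternating updates of this kind are not guaranteed to converge.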

Cited by (0)

No later patents cite this yet.

References (0)

No backward citations on record.