US11599774B2 · Active · Utility · PatentIndex 56

Training machine learning model

Assignee: IBM · Priority: Mar 29, 2019 · Filed: Mar 29, 2019 · Granted: Mar 7, 2023

Est. expiry: Mar 29, 2039 (~12.7 yrs left) · nominal 20-yr term from priority

Inventors: Zhao Shiwan, Wu Bingzhe, Su Zhong

G06N 3/09, G06N 3/0464, G06N 3/047, G06V 10/774, G06N 3/084, G06T 2207/20081, G06V 10/776, G06V 10/764, G06F 17/18, G06V 10/82, G06N 3/044, G06V 2201/03, G06N 20/00, G06T 2207/20084, G06T 7/0012, G06F 18/214, G06N 3/045, G06K 9/6256, G06N 3/0472

Abstract

Techniques are provided for training a machine learning model. According to one aspect, training data is received by one or more processing units. The machine learning model is trained based on the training data, wherein the training comprises optimizing the machine learning model based on stochastic gradient descent (SGD) by adding a dynamic noise to a gradient of a model parameter of the machine learning model calculated by the SGD.
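
As an illustration of the idea in the abstract, the following is a minimal sketch of an SGD update in which zero-mean Gaussian noise is added to each gradient entry before the parameter update. It is not the patented implementation. The function names, the toy linear model, and in particular the decaying noise schedule (`sigma0 / (1 + 0.01 * step)`) are assumptions made here only so that the noise scale changes during training; the record does not say how the noise varies.

```python
import random


def noisy_sgd_step(params, grads, lr, sigma, rng):
    """One SGD update with zero-mean Gaussian noise added to each gradient entry."""
    return [p - lr * (g + rng.gauss(0.0, sigma)) for p, g in zip(params, grads)]


def grad_squared_error(params, x, y):
    """Gradient of 0.5 * (w*x + b - y)**2 with respect to (w, b) for one sample."""
    w, b = params
    err = w * x + b - y
    return [err * x, err]


def train(data, epochs=200, lr=0.05, sigma0=0.5, seed=0):
    """Fit y = w*x + b with per-sample SGD and noise added to every gradient."""
    rng = random.Random(seed)
    samples = list(data)  # copy so the caller's list is not shuffled in place
    params = [0.0, 0.0]
    step = 0
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            step += 1
            # Hypothetical schedule (an assumption): the noise scale shrinks
            # as training progresses.
            sigma = sigma0 / (1.0 + 0.01 * step)
            grads = grad_squared_error(params, x, y)
            params = noisy_sgd_step(params, grads, lr, sigma, rng)
    return params
```

Calling `train` on samples drawn from a line such as y = 2x + 1 should return parameters close to that slope and intercept, since the injected noise shrinks over time under the assumed schedule.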

Claims

exact text as granted — not AI-modified

What is claimed is:

1. A method for training a machine learning model, comprising:
acquiring, by one or more processing units, a training data; and
training, by one or more processing units, the machine learning model based on the training data, the training comprising:
optimizing, by one or more processing units, the machine learning model based on stochastic gradient descent (SGD), and
minimizing privacy leakage by adding a dynamic noise to a gradient of a model parameter of the machine learning model calculated by the SGD.

2. The method of claim 1 , wherein the machine learning model is a convolutional neural networks (CNN) or a recurrent neural network (RNN).

3. The method of claim 1 , wherein the training data is selected from the group consisting of: pathological data; autopilot data; medical experimental data; biological data; internet of things (IoT) data; social network data; e-commerce data.

4. The method of claim 1 , wherein the optimizing further comprises minimizing a loss function of the machine learning model.

5. The method of claim 4 , wherein the added dynamic noise is selected from a predefined noise set.

6. The method of claim 5 , further comprising assigning a corresponding probability to each of the noises according to the loss function, wherein each of the noises is with a different scale from each other.

7. The method of claim 6 , wherein the added dynamic noise is selected based on the probability assigned.

8. The method of claim 5 , wherein the machine learning model is a CNN, and the predefined noise set comprises noises with three different scales and the training data are labeled pathological images.

9. The method of claim 1 , wherein the noise is a Gaussian noise.

10. A computer system, comprising: a processor;
a non-transitory computer-readable memory coupled to the processor, the memory comprising instructions that when executed by the processor perform actions of:
acquiring, by one or more processing units, a training data; and
training, by one or more processing units, the machine learning model based on the training data, the training comprising:
optimizing, by one or more processing units, the machine learning model based on stochastic gradient descent (SGD), and
minimizing privacy leakage by adding a dynamic noise to a gradient of a model parameter of the machine learning model calculated by the SGD.

11. The system of claim 10 , wherein the machine learning model is a convolutional neural networks (CNN) or a recurrent neural network (RNN).

12. The system of claim 10 , wherein the training data is selected from the group consisting of: pathological data; autopilot data; medical experimental data; biological data; internet of things (IoT) data; social network data; e-commerce data.

13. The system of claim 10 , wherein the optimizing further comprises minimizing a loss function of the machine learning model.

14. The system of claim 13 , wherein the added dynamic noise is selected from a predefined noise set.

15. The system of claim 14 , further comprising assigning a corresponding probability to each of the noises according to the loss function, wherein each of the noises is with a different scale from each other.

16. The system of claim 15 , wherein the added dynamic noise is selected based on the probability assigned.

17. The system of claim 14 , wherein the machine learning model is a CNN, and the predefined noise set comprises noises with three different scales and the training data are labeled pathological images.

18. The system of claim 10 , wherein the noise is a Gaussian noise.

19. A computer program product for training a machine learning model, comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to:
acquiring, by one or more processing units, a training data;
training, by one or more processing units, the machine learning model based on the training data, the training comprising:
optimizing, by one or more processing units, the machine learning model based on stochastic gradient descent (SGD) by adding a dynamic noise, selected from a predefined noise set, to a gradient of a model parameter of the machine learning model calculated by the SGD wherein the optimizing further comprises minimizing a loss function of the machine learning model, and
assigning a corresponding probability to each of the noises in the predefined noise set according to the loss function, wherein each of the noises is with a different scale from each other.

20. The computer program product of claim 19 , wherein the training data is selected from the group consisting of: pathological data; autopilot data; medical experimental data; biological data; internet of things (IoT) data; social network data; e-commerce data.
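
The dependent claims describe selecting the added noise from a predefined noise set, with a probability assigned to each noise according to the loss function. One way this could look in code is sketched below. The claims do not state how the probabilities are derived from the loss, so the rule used here is purely an assumption: each candidate noise is applied in a trial update, the loss of the trial parameters is evaluated, and a softmax over the negative trial losses gives lower-loss candidates a higher probability. The three scale values, the `temperature` parameter, and all function names are likewise hypothetical.

```python
import math
import random

# Hypothetical values: three Gaussian noise scales in the predefined set.
NOISE_SCALES = [0.01, 0.1, 1.0]


def scale_probabilities(trial_losses, temperature=1.0):
    """Softmax over negative losses: a lower trial loss gets a higher probability."""
    lowest = min(trial_losses)  # subtract the minimum for numerical stability
    weights = [math.exp(-(loss - lowest) / temperature) for loss in trial_losses]
    total = sum(weights)
    return [w / total for w in weights]


def select_noisy_gradient(params, grads, loss_fn, lr, rng, scales=NOISE_SCALES):
    """Sample one noisy gradient from the candidates built with each noise scale.

    Returns the chosen noisy gradient, its noise scale, and the probabilities
    assigned to all scales.
    """
    candidates = []
    trial_losses = []
    for sigma in scales:
        noisy = [g + rng.gauss(0.0, sigma) for g in grads]
        trial_params = [p - lr * g for p, g in zip(params, noisy)]
        candidates.append(noisy)
        trial_losses.append(loss_fn(trial_params))
    probs = scale_probabilities(trial_losses)
    idx = rng.choices(range(len(scales)), weights=probs, k=1)[0]
    return candidates[idx], scales[idx], probs
```

In a training loop, the returned noisy gradient would replace the raw gradient in the parameter update. A lower `temperature` concentrates the probability on the candidate with the smallest trial loss, while a higher one makes the choice closer to uniform.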
