P
US12401494B2ActiveUtilityPatentIndex 49

Machine learning using secure gradient descent computation

Assignee: NIPPON TELEGRAPH & TELEPHONEPriority: Aug 14, 2019Filed: Aug 14, 2019Granted: Aug 26, 2025
Est. expiryAug 14, 2039(~13.1 yrs left)· nominal 20-yr term from priority
Inventors:MISHINA IBUKIIKARASHI DAIHAMADA KOKI
H04L 2209/046G06N 3/084H04L 2209/46H04L 9/06H04L 9/085
49
PatentIndex Score
0
Cited by
10
References
3
Claims

Abstract

A calculation of a gradient descent method in secure computing is performed at high speed while maintaining accuracy. A secure gradient descent computation method calculates a gradient descent method while keeping a gradient and a parameter concealed. An initialization unit initializes concealed values [M], [V] of matrices M, V (S 11 ). A gradient calculation unit determines concealed value [G] of a matrix G of a gradient g (S 12 ). A parameter update unit calculates [M] β1 [M]+(1−β1) [G] (S 13 - 1 ), calculates [V]←β2 [V]+(1−β2) [G]◯[G] (S 13 - 2 ), calculates [M{circumflex over ( )}]←β{circumflex over ( )}1, t [M] (S 13 - 3 ), calculates [V{circumflex over ( )}]←β{circumflex over ( )}2, t [V] (S 13 - 4 ), calculates [G{circumflex over ( )}]←Adam ([V{circumflex over ( )}]) (S 13 - 5 ), calculates [G{circumflex over ( )}]←[G{circumflex over ( )}]◯[M{circumflex over ( )}] (S 13 - 6 ), and calculates [W]←[W]−[G{circumflex over ( )}] (S 13 - 7 ).

Claims

exact text as granted — not AI-modified
The invention claimed is: 
     
       1. A secure deep learning method performed by a secure deep learning system including a plurality of secure computation apparatuses connected via a network, each secure computation apparatus performing processing while coordinating with other secure computation apparatuses using a secret sharing, the secure deep learning method learning a deep neural network while at least a feature X of learning data, and true data T and a parameter W of the learning data are kept concealed, the secure deep learning method comprising:
 receiving, by the plurality of secure computation apparatuses, input of data; 
 calculating, by a forward propagation calculation circuitry of each of the secure computation apparatuses, [U 1 ]←[W 0 ]▪[X]; 
 calculating, by the forward propagation calculation circuitry, [Y 1 ]←Activation ([U 1 ]); 
 calculating, by the forward propagation calculation circuitry, [U i+1 ]←[W i ]▪[Y i ] for each i greater than or equal to 1 and less than or equal to n−1; 
 calculating, by the forward propagation calculation circuitry, [Y i+1 ]←Activation ([U i+1 ]) for each i greater than or equal to 1 and less than or equal to n−1; 
 calculating, by the forward propagation calculation circuitry, [U n+1 ]←[W n ]▪[Y n ]; 
 calculating, by the forward propagation calculation circuitry, [Y n+1 ]←Activation2 ([U n+1 ]); 
 calculating, by a back propagation calculation circuitry of each of the secure computation apparatuses, [Z n+1 ]←Activation2′ ([Y n+1 ], [T]); 
 calculating, by the back propagation calculation circuitry, [Z n ]←Activation′ ([U n ])◯([Z n+1 ]▪[W n ]); 
 calculating, by the back propagation calculation circuitry, [Z n−i ]←Activation′ ([U n−i ]) ◯([Z n−i+1 ]▪[W n−i ]) for each i greater than or equal to 1 and less than or equal to n−1; 
 calculating, by a gradient calculation circuitry of each of the secure computation apparatuses, [G 0 ]←[Z 1 ]▪[X]; 
 calculating, by the gradient calculation circuitry, [G i ]←[Z i+1 ]▪[Y i ] for each i greater than or equal to 1 and less than or equal to n−1; 
 calculating, by the gradient calculation circuitry, [G n ]←[Z n+1 ]▪[Y n ]; 
 calculating, by a parameter update circuitry of each of the secure computation apparatuses, [G 0 ]←rshift ([G 0 ], H′); 
 calculating, by the parameter update circuitry, [G i ]←rshift ([G i ], H′) for each i greater than or equal to 1 and less than or equal to n−1; 
 calculating, by the parameter update circuitry, [G n ]←rshift ([G n ], H′); 
 learning, by the parameter update circuitry, a parameter [W i ] between an i layer and an i+1 layer using a gradient [G i ] between the i layer and the i+1 layer, for each i greater than or equal to 0 and less than or equal to n; and 
 using the secure gradient descent computation method to predict an outcome based on the data, 
 where β 1 , β 2 , η, and ε are predetermined hyperparameters, ▪ is a product of matrices, ◯ is a product for each element, t is the number of learning times, [G] is a concealed value of a gradient G, [W] is a concealed value of the parameter W, [X] is a concealed value of the feature X of the learning data, [T] is a concealed value of a true label T of the learning data, [M], [M{circumflex over ( )}], [V], [V{circumflex over ( )}], [G{circumflex over ( )}], [U], [Y], and [Z] are concealed values of matrices M, M{circumflex over ( )}, V, V{circumflex over ( )}, G{circumflex over ( )}, U, Y, and Z having the same number of elements as the gradient G, and β{circumflex over ( )} 1, t , β{circumflex over ( )} 2, t , and g{circumflex over ( )} are given by following equations, 
 Adam is a function that calculates a secure batch mapping to output the concealed value [G{circumflex over ( )}] of the matrix G{circumflex over ( )} of the value g{circumflex over ( )} with the concealed value [V{circumflex over ( )}] of the matrix V{circumflex over ( )} of a value v{circumflex over ( )} as an input, rshift is an arithmetic right shift, m is the number of pieces of learning data used for one learning, and H′ is defined by a following equation, and 
 n is the number of hidden layers of the deep neural network, Activation is an activation function of the hidden layers, Activation2 is an activation function of an output layer of the deep neural network, Activation2′ is a loss function corresponding to the activation function Activation2, Activation′ is a derivative of the activation function Activation. 
 
     
     
       2. The secure deep learning method according to  claim 1 ,
 wherein b w  is an accuracy of w, by is an accuracy of an element of Y, b β  is an accuracy of β 1  and β 2 , b β{circumflex over ( )}_1  is an accuracy of β{circumflex over ( )} 1, t , and b g{circumflex over ( )}  is an accuracy of g{circumflex over ( )}, 
 the forward propagation calculation circuitry calculates [Y i+1 ]←Activation ([U i+1 ]) and then calculates [Y i+1 ]←rshift ([Y +1 ], b w ), 
 the back propagation calculation circuitry calculates [Z n ]←Activation′ ([U n ])◯([Z n+1 ]▪[W n ]), and then calculates [Z n ]←rshift ([Z n ], b y ), 
 each back propagation calculation circuitry calculates [Z n−i ]←Activation′ ([U n−i ])◯([Z n−i+1 ]▪[W n−i ]), and then calculates [Z n−i ]←rshift ([Z n−i ], b w ), and 
 the parameter update circuitry learns the parameter [W i ] between the i layer and the i+1 layer using the gradient [G i ] between the i layer and the i+1 layer, in accordance with the secure gradient descent computation method, for each i greater than or equal to 0 and less than or equal to n, 
 wherein the secure gradient descent computation method comprises: 
 calculating, by the parameter update circuitry, [M]←β1 [M]+(1−β1) [G] and then calculates [M]←rshift ([M], bβ), 
 calculating, by the parameter update circuitry, [V]←β2 [V]+(1−β2) [G]◯[G] and then calculates [V]←rshift ([V], bβ), and 
 calculating, by the parameter update circuitry, [G{circumflex over ( )}]←[G{circumflex over ( )}]◯[M{circumflex over ( )}] and then calculates [G{circumflex over ( )}]←rshift ([G{circumflex over ( )}], bg{circumflex over ( )}+bβ{circumflex over ( )}_1). 
 
     
     
       3. A secure deep learning system comprising a plurality of secure computation apparatuses connected via a network, each secure computation apparatus performing processing while coordinating with other secure computation apparatuses using a secret sharing, the secure deep learning system learning a deep neural network while at least a feature X of learning data, and true data T and a parameter W of the learning data are kept concealed,
 each of the secure computation apparatuses including:
 a forward propagation calculation circuitry configured to calculate [U 1 ]←[W 0 ]▪[X], [Y 1 ]←Activation ([U 1 ]), [U i+1 ]←[W i ]▪[Y i ][Y i+1 ]←Activation ([U i+1 ]) for each i greater than or equal to 1 and less than or equal to n−1, [U n+1 ]←[W n ]▪[Y n ], and [Y n+1 ]←Activation2 ([U n+1 ]); 
 a back propagation calculation circuitry configured to calculate [Z n+1 ]←Activation2′ ([Y n+1 ], [T]), [Z n ]←Activation′ ([U n ])◯([Z n+1 ]▪[W n ]), and [Z n−i ]←Activation′ ([U n−i ])◯([Z n−i+1 +]▪[W n−i ]) for each i greater than or equal to 1 and less than or equal to n−1; 
 a gradient calculation circuitry configured to calculate [G 0 ]←[Z 1 ]▪[X], [G i ]←[Z i+1 ]▪[Y i ] for each i greater than or equal to 1 and less than or equal to n−1, and [G n ]←[Z n+1 ][Y n ]; and 
 a parameter update circuitry configured to calculate [G 0 ]←rshift ([G 0 ], H′), [G i ]←rshift ([G i ], H′) for each i greater than or equal to 1 and less than or equal to n−1, and [G n ]←rshift ([G n ], H′), and learn a parameter [W i ] between an i layer and an i+1 layer using a gradient [G i ] between the i layer and the i+1 layer, equal to 0 and less than or equal to n, 
 
 where β 1 , β 2 , η, and ε are predetermined hyperparameters, ▪ is a product of matrices, ◯ is a product for each element, t is the number of learning times, [G] is a concealed value of a gradient G, [W] is a concealed value of the parameter W, [X] is a concealed value of the feature X of the learning data, [T] is a concealed value of a true label T of the learning data, [M], [M{circumflex over ( )}], [V], [V{circumflex over ( )}], [G{circumflex over ( )}], [U], [Y], and [Z] are concealed values of matrices M, M{circumflex over ( )}, V, V{circumflex over ( )}, G{circumflex over ( )}, U, Y, and Z having the same number of elements as the gradient G, and β{circumflex over ( )} 1, t , β{circumflex over ( )} 2, t , and g{circumflex over ( )} are given by following equations, 
 Adam is a function that calculates a secure batch mapping to output the concealed value [G{circumflex over ( )}] of the matrix G{circumflex over ( )} of the value g{circumflex over ( )} with the concealed value [V{circumflex over ( )}] of the matrix V{circumflex over ( )} of a value v{circumflex over ( )} as an input, rshift is an arithmetic right shift, m is the number of pieces of learning data used for one learning, and H′ is defined by a following equation, and 
 n is the number of hidden layers of the deep neural network, Activation is an activation function of the hidden layers, Activation2 is an activation function of an output layer of the deep neural network, Activation2′ is a loss function corresponding to the activation function Activation2, Activation′ is a derivative of the activation function Activation, 
 each of the secure computation apparatuses being configured to
 receive an input of data, and 
 predict an outcome using the calculated gradient descent method based on the data.

Cited by (0)

No later patents cite this yet.

References (0)

No backward citations on record.