US10936973B1 · Active · Utility · PatentIndex 73

Adversarial example detection method and apparatus, computing device, and non-volatile computer-readable storage medium

Assignee: UNIV DONGGUAN TECHNOLOGY · Priority: Aug 14, 2019 · Filed: Jul 27, 2020 · Granted: Mar 2, 2021

Est. expiry: Aug 14, 2039 (~13.1 yrs left) · nominal 20-yr term from priority

Inventors: WANG YI, HUANG BO

G06F 18/241, G06N 7/01, G06F 18/214, G06N 20/00, G06N 5/02

Abstract

An adversarial example detection method includes: acquiring training examples and corresponding training example labels, wherein the training example labels comprise normal examples and adversarial examples; inputting the training examples into a target model to obtain a first predicted score vector of the training examples; adding a random perturbation to the training examples N times to obtain N groups of comparative training examples; respectively inputting the N groups of comparative training examples into the target model to obtain a second predicted score vector of each group of comparative training examples; constructing feature data according to the first predicted score vector and the second predicted score vector of each group of comparative training examples; training a classification model according to the feature data and the corresponding training example labels to obtain a detector; and detecting input test data according to the detector.

Claims

exact text as granted — not AI-modified

What is claimed is: 
     
       1. An adversarial example detection method, comprising:
 acquiring training examples and training example labels corresponding thereto, wherein the training example labels comprises normal examples and adversarial examples; 
 inputting the training examples into a target model to obtain a first predicted score vector of the training examples; 
 adding a random perturbation at N times to the training examples to obtain N groups of comparative training examples, wherein N is a natural number greater than 0; 
 respectively inputting the N groups of comparative training examples into the target model to obtain a second predicted score vector of each group of comparative training examples; 
 constructing feature data according to the first predicted score vector and the second predicted score vector of each group of comparative training examples; 
 training a classification model according to the feature data and the training example labels corresponding to the feature data to obtain a detector; and 
 detecting input test data according to the detector; 
 wherein the constructing the feature data according to the first predicted score vector and the second predicted score vector of the each group of training examples comprises: 
 computing a difference vector between the first predicted score vector and the second predicted score vector of the each group of comparative training examples; and 
 constructing feature data according to the N difference vectors of the N groups of comparative training examples; 
 wherein the constructing the feature data according to the N difference vectors of the N groups of comparative training examples comprises: 
 performing denoising and dimension-reduction for the N difference vectors to obtain the feature data; 
 wherein the performing the denoising and dimension-reduction for the N difference vectors to obtain the feature data comprises: 
 constructing an N-column difference matrix by the N difference vectors; 
 ranking elements of each row in the difference matrix in an ascending order to obtain a ranked difference matrix; 
 extracting a predetermined quantile of each row in the ranked difference matrix; and 
 taking the predetermined quantiles of all the rows as the feature data. 
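
Illustrative note (not part of the granted claim text): the feature-construction steps of claim 1 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patentee's implementation. The function names, the plain subtraction used for the difference, the choice of quantiles (0.25, 0.5, 0.75), and the rounding rule for picking a quantile position are all hypothetical choices made for illustration.

```python
def difference_vectors(first_scores, second_scores_per_group):
    """Return one difference vector per perturbation group.

    Assumption: the difference is plain subtraction (second - first).
    """
    return [
        [s2 - s1 for s1, s2 in zip(first_scores, group)]
        for group in second_scores_per_group
    ]


def quantile_features(diff_vectors, quantiles=(0.25, 0.5, 0.75)):
    """Build per-example features from N difference vectors.

    diff_vectors: list of N vectors, each with one entry per example.
    Returns one row of quantile values per example.
    """
    n = len(diff_vectors)
    m = len(diff_vectors[0])
    # N-column difference matrix: row i holds the N differences of example i.
    matrix = [[diff_vectors[j][i] for j in range(n)] for i in range(m)]
    features = []
    for row in matrix:
        ranked = sorted(row)  # ascending order within the row
        # Assumed rule: round q * (n - 1) to the nearest index.
        features.append([ranked[int(q * (n - 1) + 0.5)] for q in quantiles])
    return features
```

With N = 5 perturbation rounds and two examples, each example ends up with a three-value feature row, so the feature size depends on the number of quantiles rather than on N.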
 
     
     
       2. The method according to claim 1, wherein the inputting the training example into the target model to obtain the first predicted score vector of the training examples comprises:
 inputting the training examples into the target model to obtain a confidence vector corresponding to each training example; 
 acquiring a maximum value of the confidence vectors to obtain a predicted score of the each training example; and 
 taking a vector constituted by the predicted scores of all the training examples as the first predicted score vector of the training examples. 
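
Illustrative note (not part of the granted claim text): a minimal sketch of the scoring step in claim 2, assuming the target model has already returned one confidence vector per training example. The function name is hypothetical.

```python
def first_predicted_score_vector(confidence_vectors):
    """Take the maximum confidence of each example as its predicted score."""
    return [max(conf) for conf in confidence_vectors]
```

For example, confidence vectors `[0.1, 0.7, 0.2]` and `[0.6, 0.3, 0.1]` give the score vector `[0.7, 0.6]`.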
 
     
     
       3. The method according to claim 1, wherein the adding the random perturbation at the N times to the training examples to obtain the N groups of comparative training examples comprises:
 generating the random perturbation according to a predetermined distribution function, the predetermined distribution function being a probability distribution function having predetermined parameters with an average value of 0 and being symmetrically distributed; and 
 adding the random perturbation at N times to the training examples to obtain the N groups of comparative training examples. 
 
     
     
       4. The method according to claim 3, wherein the predetermined distribution function is a Gaussian distribution function having an average value of 0.
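
Illustrative note (not part of the granted claim text): claims 3 and 4 describe zero-mean, symmetric perturbations, with a Gaussian as one option. The sketch below assumes each example is a flat list of numeric features and adds independent Gaussian noise to every feature. The function name, the standard deviation `sigma`, and the seeding are hypothetical parameters that the claims do not specify.

```python
import random


def perturb_n_times(examples, n, sigma=0.05, seed=0):
    """Return n groups of perturbed copies of the examples.

    Each group has the same shape as `examples`; noise is drawn from a
    Gaussian with mean 0 and standard deviation `sigma` (assumed value).
    """
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    groups = []
    for _ in range(n):
        group = [
            [x + rng.gauss(0.0, sigma) for x in example]
            for example in examples
        ]
        groups.append(group)
    return groups
```

Each of the n groups would then be passed through the target model to obtain one second predicted score vector per group.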
     
     
       5. The method according to claim 1, wherein the computing the difference vector between the first predicted score vector and the second predicted score vector of the each group of comparative training examples comprises:
 computing a variation rate vector of the second predicted score vector of the each group of comparative training examples relative to the first predicted score vector; and 
 taking the variation rate vector as the difference vector. 
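
Illustrative note (not part of the granted claim text): the claim does not give a formula for the variation rate. The sketch below assumes it is the relative change of the second score with respect to the first, (second - first) / first. The function name is hypothetical.

```python
def variation_rate_vector(first_scores, second_scores):
    """Relative change of each second score with respect to the first.

    Assumption: first scores are strictly positive, which holds when they
    are maxima of softmax confidence vectors.
    """
    return [(s2 - s1) / s1 for s1, s2 in zip(first_scores, second_scores)]
```

Under this assumed definition, a score that falls from 0.8 to 0.4 after perturbation yields a variation rate of -0.5, and an unchanged score yields 0.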
 
     
     
       6. The method according to claim 1, wherein when a quantity of normal examples is the same as a quantity of adversarial examples, the training the classification model according to the feature data and the training example label corresponding to the feature to obtain the detector comprises:
 training a binary classification model according to the feature data and the training example label corresponding to the feature to obtain a detector. 
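
Illustrative note (not part of the granted claim text): claim 6 does not name a specific binary classification model. As one possible choice, the sketch below trains a small logistic regression by stochastic gradient descent on quantile feature rows. The function name, the label encoding (0 = normal, 1 = adversarial), the learning rate, and the epoch count are assumptions, and the usage data after the block is invented purely to show the call pattern.

```python
import math


def train_logistic_detector(features, labels, lr=0.5, epochs=500):
    """Fit a logistic regression and return a detect(row) -> 0 or 1 function.

    Assumed label encoding: 0 = normal example, 1 = adversarial example.
    """
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(adversarial)
            g = p - y  # gradient of the log loss with respect to z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g

    def detect(row):
        z = sum(wi * xi for wi, xi in zip(w, row)) + b
        return 1 if z > 0 else 0

    return detect
```

A balanced toy call, with two made-up normal rows and two made-up adversarial rows, would be `detect = train_logistic_detector(normal_rows + adversarial_rows, [0, 0, 1, 1])`.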
 
     
     
       7. The method according to claim 1, wherein the detecting the input test data according to the detector comprises:
 acquiring the test data; 
 inputting the test data into the detector to obtain a detection result; and 
 identifying the test data as the adversarial examples when a label corresponding to the detection result indicates an adversarial example. 
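
Illustrative note (not part of the granted claim text): a minimal sketch of the detection step in claim 7. It assumes the test data has already been turned into feature rows in the same way as the training examples, and that the detector is a callable returning 1 for adversarial and 0 for normal. The names and the label encoding are hypothetical.

```python
ADVERSARIAL = 1  # assumed label encoding
NORMAL = 0


def flag_adversarial(detector, test_feature_rows):
    """Run the detector on each row and report a readable result per row."""
    results = []
    for row in test_feature_rows:
        label = detector(row)
        results.append("adversarial" if label == ADVERSARIAL else "normal")
    return results
```

Any trained classifier with a one-row prediction function can be passed in as `detector`.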
 
     
     
       8. A computing device, comprising: a processor, a memory, a communication interface, and a communication bus; wherein the processor, the memory, and the communication bus communicate with each other via the communication bus; and
 the memory is configured to store at least one executable instruction, wherein the executable instruction causes the processor to perform the steps of; 
 acquiring training examples and training example labels corresponding thereto, wherein the training example labels comprises normal examples and adversarial examples; 
 inputting the training examples into a target model to obtain a first predicted score vector of the training examples; 
 adding a random perturbation at N times to the training examples to obtain N groups of comparative training examples, wherein N is a natural number greater than 0; 
 respectively inputting the N groups of comparative training examples into the target model to obtain a second predicted score vector of each group of comparative training examples; 
 constructing feature data according to the first predicted score vector and the second predicted score vector of each group of comparative training examples; 
 training a classification model according to the feature data and the training example labels corresponding to the feature data to obtain a detector; and 
 detecting input test data according to the detector; 
 wherein the constructing the feature data according to the first predicted score vector and the second predicted score vector of the each group of training examples comprises: 
 computing a difference vector between the first predicted score vector and the second predicted score vector of the each group of comparative training examples; and 
 constructing feature data according to the N difference vectors of the N groups of comparative training examples; 
 wherein the constructing the feature data according to the N difference vectors of the N groups of comparative training examples comprises: 
 performing denoising and dimension-reduction for the N difference vectors to obtain the feature data; 
 wherein the performing the denoising and dimension-reduction for the N difference vectors to obtain the feature data comprises: 
 constructing an N-column difference matrix by the N difference vectors; 
 ranking elements of each row in the difference matrix in an ascending order to obtain a ranked difference matrix; 
 extracting a predetermined quantile of each row in the ranked difference matrix; and 
 taking the predetermined quantiles of all the rows as the feature data. 
 
     
     
       9. The computing device according to claim 8, wherein the inputting the training example into the target model to obtain the first predicted score vector of the training examples comprises:
 inputting the training examples into the target model to obtain a confidence vector corresponding to each training example; 
 acquiring a maximum value of the confidence vectors to obtain a predicted score of the each training example; and 
 taking a vector constituted by the predicted scores of all the training examples as the first predicted score vector of the training examples. 
 
     
     
       10. The computing device according to claim 8, wherein the adding the random perturbation at the N times to the training examples to obtain the N groups of comparative training examples comprises:
 generating the random perturbation according to a predetermined distribution function, the predetermined distribution function being a probability distribution function having predetermined parameters with an average value of 0 and being symmetrically distributed; and 
 adding the random perturbation at N times to the training examples to obtain the N groups of comparative training examples. 
 
     
     
       11. The computing device according to claim 10, wherein the predetermined distribution function is a Gaussian distribution function having an average value of 0.
     
     
       12. The computing device according to claim 8, wherein the computing the difference vector between the first predicted score vector and the second predicted score vector of the each group of comparative training examples comprises:
 computing a variation rate vector of the second predicted score vector of the each group of comparative training examples relative to the first predicted score vector; and 
 taking the variation rate vector as the difference vector. 
 
     
     
       13. The computing device according to claim 8, wherein when a quantity of normal examples is the same as a quantity of adversarial examples, the training the classification model according to the feature data and the training example label corresponding to the feature to obtain the detector comprises:
 training a binary classification model according to the feature data and the training example label corresponding to the feature to obtain a detector. 
 
     
     
       14. A non-volatile computer-readable storage medium, the storage medium storing at least one executable instruction; wherein the at least one executable instruction, when being executed by a processor, causes the processor to perform the steps of:
 acquiring training examples and training example labels corresponding thereto, wherein the training example labels comprises normal examples and adversarial examples; 
 inputting the training examples into a target model to obtain a first predicted score vector of the training examples; 
 adding a random perturbation at N times to the training examples to obtain N groups of comparative training examples, wherein N is a natural number greater than 0; 
 respectively inputting the N groups of comparative training examples into the target model to obtain a second predicted score vector of each group of comparative training examples; 
 constructing feature data according to the first predicted score vector and the second predicted score vector of each group of comparative training examples; 
 training a classification model according to the feature data and the training example labels corresponding to the feature data to obtain a detector; and 
 detecting input test data according to the detector; 
 wherein the constructing the feature data according to the first predicted score vector and the second predicted score vector of the each group of training examples comprises: 
 computing a difference vector between the first predicted score vector and the second predicted score vector of the each group of comparative training examples; and 
 constructing feature data according to difference vectors of the N groups of comparative training examples; 
 wherein the constructing the feature data according to the N difference vectors of the N groups of comparative training examples comprises: 
 performing denoising and dimension-reduction for the N difference vectors to obtain the feature data; 
 wherein the performing the denoising and dimension-reduction for the N difference vectors to obtain the feature data comprises: 
 constructing an N-column difference matrix by the N difference vectors; 
 ranking elements of each row in the difference matrix in an ascending order to obtain a ranked difference matrix; 
 extracting a predetermined quantile of each row in the ranked difference matrix; and 
 taking the predetermined quantiles of all the rows as the feature data.