Adversarial example detection method and apparatus, computing device, and non-volatile computer-readable storage medium
Abstract
An adversarial example detection method includes: acquiring training examples and corresponding training example labels, wherein the training example labels identify normal examples and adversarial examples; inputting the training examples into a target model to obtain a first predicted score vector of the training examples; adding a random perturbation to the training examples N times to obtain N groups of comparative training examples; respectively inputting the N groups of comparative training examples into the target model to obtain a second predicted score vector of each group of comparative training examples; constructing feature data according to the first predicted score vector and the second predicted score vector of each group of comparative training examples; training a classification model according to the feature data and the training example labels corresponding to the feature data to obtain a detector; and detecting input test data using the detector.
Claims
What is claimed is:
1. An adversarial example detection method, comprising:
acquiring training examples and training example labels corresponding thereto, wherein the training example labels comprises normal examples and adversarial examples;
inputting the training examples into a target model to obtain a first predicted score vector of the training examples;
adding a random perturbation at N times to the training examples to obtain N groups of comparative training examples, wherein N is a natural number greater than 0;
respectively inputting the N groups of comparative training examples into the target model to obtain a second predicted score vector of each group of comparative training examples;
constructing feature data according to the first predicted score vector and the second predicted score vector of each group of comparative training examples;
training a classification model according to the feature data and the training example labels corresponding to the feature data to obtain a detector; and
detecting input test data according to the detector;
wherein the constructing the feature data according to the first predicted score vector and the second predicted score vector of the each group of training examples comprises:
computing a difference vector between the first predicted score vector and the second predicted score vector of the each group of comparative training examples; and
constructing feature data according to the N difference vectors of the N groups of comparative training examples;
wherein the constructing the feature data according to the N difference vectors of the N groups of comparative training examples comprises:
performing denoising and dimension-reduction for the N difference vectors to obtain the feature data;
wherein the performing the denoising and dimension-reduction for the N difference vectors to obtain the feature data comprises:
constructing an N-column difference matrix by the N difference vectors;
ranking elements of each row in the difference matrix in an ascending order to obtain a ranked difference matrix;
extracting a predetermined quantile of each row in the ranked difference matrix; and
taking the predetermined quantiles of all the rows as the feature data.
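The denoising and dimension-reduction step of claim 1 can be illustrated with a minimal sketch. It assumes each difference vector holds one entry per training example, so that the N vectors form a matrix with one row per example and N columns. The function name `quantile_features`, the default quantile levels, and the nearest-rank indexing rule are assumptions made for illustration; the claim only requires "a predetermined quantile" of each ranked row.

```python
def quantile_features(diff_vectors, quantiles=(0.25, 0.5, 0.75)):
    """Build per-example features from N difference vectors.

    diff_vectors: list of N vectors, each with one entry per example.
    quantiles: hypothetical "predetermined" quantile levels in [0, 1].
    """
    n = len(diff_vectors)          # number of perturbation rounds (columns)
    m = len(diff_vectors[0])       # number of examples (rows)

    # N-column difference matrix: row i collects example i's N differences.
    matrix = [[diff_vectors[j][i] for j in range(n)] for i in range(m)]

    # Rank the elements of each row in ascending order.
    ranked = [sorted(row) for row in matrix]

    # Extract the chosen quantiles of each row (nearest-rank rule, assumed).
    features = []
    for row in ranked:
        features.append([row[int(q * (n - 1) + 0.5)] for q in quantiles])
    return features
```

Taking a few quantiles per row reduces N noisy differences to a short, fixed-length feature vector per example, regardless of how large N is.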
2. The method according to claim 1, wherein the inputting the training example into the target model to obtain the first predicted score vector of the training examples comprises:
inputting the training examples into the target model to obtain a confidence vector corresponding to each training example;
acquiring a maximum value of the confidence vectors to obtain a predicted score of the each training example; and
taking a vector constituted by the predicted scores of all the training examples as the first predicted score vector of the training examples.
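As an illustration of claim 2, the following sketch takes the maximum entry of each confidence vector as that example's predicted score. The callable `target_model` and the lookup-table `toy_model` are hypothetical stand-ins; any model that returns a per-class confidence vector would fit the same shape.

```python
def first_predicted_score_vector(target_model, examples):
    """Return one predicted score per example: the maximum confidence."""
    scores = []
    for x in examples:
        confidence = target_model(x)     # per-class confidence vector
        scores.append(max(confidence))   # predicted score of this example
    return scores


def toy_model(x):
    """Hypothetical target model backed by a fixed lookup table."""
    table = {"a": [0.1, 0.7, 0.2], "b": [0.6, 0.3, 0.1]}
    return table[x]


scores = first_predicted_score_vector(toy_model, ["a", "b"])
```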
3. The method according to claim 1, wherein the adding the random perturbation at the N times to the training examples to obtain the N groups of comparative training examples comprises:
generating the random perturbation according to a predetermined distribution function, the predetermined distribution function being a probability distribution function having predetermined parameters with an average value of 0 and being symmetrically distributed; and
adding the random perturbation at N times to the training examples to obtain the N groups of comparative training examples.
4. The method according to claim 3, wherein the predetermined distribution function is a Gaussian distribution function having an average value of 0.
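The perturbation step of claims 3 and 4 might be sketched as below, assuming each training example is a flat list of numeric features. The function name `perturb_n_times`, the standard deviation `sigma`, and the `seed` argument are illustrative assumptions; the claims fix only that the distribution is symmetric with mean 0 (Gaussian in claim 4).

```python
import random


def perturb_n_times(examples, n, sigma=0.05, seed=None):
    """Return N groups of comparative examples, each a noisy copy of the input.

    Noise is drawn independently per feature from a zero-mean Gaussian.
    sigma and seed are assumed parameters, not taken from the claims.
    """
    rng = random.Random(seed)
    groups = []
    for _ in range(n):
        group = [[v + rng.gauss(0.0, sigma) for v in x] for x in examples]
        groups.append(group)
    return groups
```

Fixing `seed` makes the N groups reproducible, which can help when comparing feature data across runs.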
5. The method according to claim 1, wherein the computing the difference vector between the first predicted score vector and the second predicted score vector of the each group of comparative training examples comprises:
computing a variation rate vector of the second predicted score vector of the each group of comparative training examples relative to the first predicted score vector; and
taking the variation rate vector as the difference vector.
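One plausible reading of the variation rate in claim 5 is the relative change of each perturbed score with respect to the unperturbed score. The claims do not state the formula, so the expression `(second - first) / first` and the small guard `eps` against division by zero are assumptions of this sketch.

```python
def variation_rate_vector(first_scores, second_scores, eps=1e-12):
    """Relative change of each second (perturbed) score versus the first.

    Assumed formula: (s1 - s0) / s0, with eps replacing a zero denominator.
    """
    rates = []
    for s0, s1 in zip(first_scores, second_scores):
        denom = s0 if abs(s0) > eps else eps
        rates.append((s1 - s0) / denom)
    return rates
```

Under this reading, a score that halves under perturbation yields a rate of -0.5, independent of the absolute score level.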
6. The method according to claim 1, wherein when a quantity of normal examples is the same as a quantity of adversarial examples, the training the classification model according to the feature data and the training example label corresponding to the feature to obtain the detector comprises:
training a binary classification model according to the feature data and the training example label corresponding to the feature to obtain a detector.
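Claim 6 does not name a particular binary classification model. As one possible instance, the sketch below trains a small logistic-regression classifier by stochastic gradient descent on the quantile features, with label 1 meaning adversarial and 0 meaning normal. The function name, learning rate, epoch count, and label encoding are all assumptions for illustration.

```python
import math


def train_logistic_detector(features, labels, lr=0.5, epochs=500):
    """Fit logistic regression with plain SGD and return a detector function.

    features: list of per-example feature vectors.
    labels: 1 for adversarial, 0 for normal (assumed encoding).
    """
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of label 1
            g = p - y                        # gradient of log loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g

    def detector(x):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        return 1 if z > 0 else 0

    return detector


# Balanced toy data: two normal and two adversarial feature vectors.
train_x = [[-0.02, 0.0, 0.02], [-0.03, -0.01, 0.01],
           [-0.6, -0.4, -0.2], [-0.5, -0.3, -0.1]]
train_y = [0, 0, 1, 1]
detector = train_logistic_detector(train_x, train_y)
```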
7. The method according to claim 1, wherein the detecting the input test data according to the detector comprises:
acquiring the test data;
inputting the test data into the detector to obtain a detection result; and
identifying the test data as the adversarial examples when a label corresponding to the detection result indicates an adversarial example.
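The detection step of claim 7 can be sketched end to end for a single test input, assuming the test data is turned into feature data the same way as the training examples (maximum confidence, N Gaussian perturbations, relative score change, row quantiles). The helper names, the parameter defaults, and the string results are hypothetical; `detector` is any function mapping a feature vector to 1 (adversarial) or 0 (normal).

```python
import random


def extract_features(model, x, n=20, sigma=0.05,
                     quantiles=(0.25, 0.5, 0.75), seed=0):
    """Feature data for one input: quantiles of N relative score changes."""
    rng = random.Random(seed)
    base = max(model(x))                      # first predicted score
    denom = base if base else 1e-12           # guard against a zero score
    diffs = []
    for _ in range(n):
        noisy = [v + rng.gauss(0.0, sigma) for v in x]
        diffs.append((max(model(noisy)) - base) / denom)
    ranked = sorted(diffs)                    # single ranked row of length N
    return [ranked[int(q * (n - 1) + 0.5)] for q in quantiles]


def detect(model, detector, x, **kwargs):
    """Return 'adversarial' when the detector's label indicates one."""
    label = detector(extract_features(model, x, **kwargs))
    return "adversarial" if label == 1 else "normal"
```

A usage example would pass the trained detector and the same target model that produced the training features, for instance `detect(target_model, detector, test_input, n=20)`.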
8. A computing device, comprising: a processor, a memory, a communication interface, and a communication bus; wherein the processor, the memory, and the communication bus communicate with each other via the communication bus; and
the memory is configured to store at least one executable instruction, wherein the executable instruction causes the processor to perform the steps of;
acquiring training examples and training example labels corresponding thereto, wherein the training example labels comprises normal examples and adversarial examples;
inputting the training examples into a target model to obtain a first predicted score vector of the training examples;
adding a random perturbation at N times to the training examples to obtain N groups of comparative training examples, wherein N is a natural number greater than 0;
respectively inputting the N groups of comparative training examples into the target model to obtain a second predicted score vector of each group of comparative training examples;
constructing feature data according to the first predicted score vector and the second predicted score vector of each group of comparative training examples;
training a classification model according to the feature data and the training example labels corresponding to the feature data to obtain a detector; and
detecting input test data according to the detector;
wherein the constructing the feature data according to the first predicted score vector and the second predicted score vector of the each group of training examples comprises:
computing a difference vector between the first predicted score vector and the second predicted score vector of the each group of comparative training examples; and
constructing feature data according to the N difference vectors of the N groups of comparative training examples;
wherein the constructing the feature data according to the N difference vectors of the N groups of comparative training examples comprises:
performing denoising and dimension-reduction for the N difference vectors to obtain the feature data;
wherein the performing the denoising and dimension-reduction for the N difference vectors to obtain the feature data comprises:
constructing an N-column difference matrix by the N difference vectors;
ranking elements of each row in the difference matrix in an ascending order to obtain a ranked difference matrix;
extracting a predetermined quantile of each row in the ranked difference matrix; and
taking the predetermined quantiles of all the rows as the feature data.
9. The computing device according to claim 8, wherein the inputting the training example into the target model to obtain the first predicted score vector of the training examples comprises:
inputting the training examples into the target model to obtain a confidence vector corresponding to each training example;
acquiring a maximum value of the confidence vectors to obtain a predicted score of the each training example; and
taking a vector constituted by the predicted scores of all the training examples as the first predicted score vector of the training examples.
10. The computing device according to claim 8, wherein the adding the random perturbation at the N times to the training examples to obtain the N groups of comparative training examples comprises:
generating the random perturbation according to a predetermined distribution function, the predetermined distribution function being a probability distribution function having predetermined parameters with an average value of 0 and being symmetrically distributed; and
adding the random perturbation at N times to the training examples to obtain the N groups of comparative training examples.
11. The computing device according to claim 10, wherein the predetermined distribution function is a Gaussian distribution function having an average value of 0.
12. The computing device according to claim 8, wherein the computing the difference vector between the first predicted score vector and the second predicted score vector of the each group of comparative training examples comprises:
computing a variation rate vector of the second predicted score vector of the each group of comparative training examples relative to the first predicted score vector; and
taking the variation rate vector as the difference vector.
13. The computing device according to claim 8, wherein when a quantity of normal examples is the same as a quantity of adversarial examples, the training the classification model according to the feature data and the training example label corresponding to the feature to obtain the detector comprises:
training a binary classification model according to the feature data and the training example label corresponding to the feature to obtain a detector.
14. A non-volatile computer-readable storage medium, the storage medium storing at least one executable instruction; wherein the at least one executable instruction, when being executed by a processor, causes the processor to perform the steps of:
acquiring training examples and training example labels corresponding thereto, wherein the training example labels comprises normal examples and adversarial examples;
inputting the training examples into a target model to obtain a first predicted score vector of the training examples;
adding a random perturbation at N times to the training examples to obtain N groups of comparative training examples, wherein N is a natural number greater than 0;
respectively inputting the N groups of comparative training examples into the target model to obtain a second predicted score vector of each group of comparative training examples;
constructing feature data according to the first predicted score vector and the second predicted score vector of each group of comparative training examples;
training a classification model according to the feature data and the training example labels corresponding to the feature data to obtain a detector; and
detecting input test data according to the detector;
wherein the constructing the feature data according to the first predicted score vector and the second predicted score vector of the each group of training examples comprises:
computing a difference vector between the first predicted score vector and the second predicted score vector of the each group of comparative training examples; and
constructing feature data according to difference vectors of the N groups of comparative training examples;
wherein the constructing the feature data according to the N difference vectors of the N groups of comparative training examples comprises:
performing denoising and dimension-reduction for the N difference vectors to obtain the feature data;
wherein the performing the denoising and dimension-reduction for the N difference vectors to obtain the feature data comprises:
constructing an N-column difference matrix by the N difference vectors;
ranking elements of each row in the difference matrix in an ascending order to obtain a ranked difference matrix;
extracting a predetermined quantile of each row in the ranked difference matrix; and
taking the predetermined quantiles of all the rows as the feature data.