US12475258B2 · Active · Utility

Method, electronic device, and computer program product for data anonymization

Assignee: DELL PRODUCTS LP
Priority: Mar 17, 2023
Filed: Apr 28, 2023
Granted: Nov 18, 2025
Est. expiry: Mar 17, 2043 (~16.7 yrs left) · nominal 20-yr term from priority
Inventors: GONG MIN; WANG ZIJIA; LIU ZHISONG
Classifications: G06F 21/6227, G06F 21/6254

Abstract

Embodiments disclosed herein relate to a method, an electronic device, and a computer program product for data anonymization. The method includes: performing classification on data by a classifier to obtain data types of the data. The method further includes: performing anonymization on the data by a first anonymization model to obtain first anonymized data. The method further includes: determining, based on the data types, using an anonymizer whether re-anonymization needs to be performed on the first anonymized data. The method further includes: performing, based on a determination that the re-anonymization needs to be performed, the re-anonymization on the first anonymized data by a second anonymization model to obtain second anonymized data. Accordingly, anonymization processing may be performed on data using different anonymization models for different types of data to obtain the final anonymized data and to ensure that no data leakage occurs.
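
The four steps in the abstract can be sketched as a small pipeline. This is a minimal illustration only: the toy classifier, both toy models, and the `FIRST_MODEL_TYPES` set are hypothetical stand-ins, since the abstract names no concrete implementations.

```python
# Hypothetical stand-ins; the patent text does not specify these functions.

def classify(value: str) -> str:
    """Toy classifier: 'numeric' if the value parses as a number, else 'free_text'."""
    try:
        float(value)
        return "numeric"
    except ValueError:
        return "free_text"


def first_model(value: str) -> str:
    """Toy first anonymization model: masks digits and leaves other characters."""
    return "".join("#" if ch.isdigit() else ch for ch in value)


def second_model(value: str) -> str:
    """Toy second anonymization model: replaces the whole value."""
    return "<REDACTED>"


# Assumption: the first model is only suited to numeric data.
FIRST_MODEL_TYPES = {"numeric"}


def anonymize(value: str) -> str:
    data_type = classify(value)                      # step 1: classification
    first = first_model(value)                       # step 2: first anonymization
    needs_redo = data_type not in FIRST_MODEL_TYPES  # step 3: decision based on data type
    # step 4: re-anonymize the first anonymized data only when needed
    return second_model(first) if needs_redo else first
```

In this sketch a numeric value keeps the digit-masked output of the first model, while free text is passed on to the second model.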

Claims

exact text as granted — not AI-modified
The invention claimed is: 
     
1. A method for data anonymization, comprising:
  performing classification on data that comprises a knowledge graph by a classifier to obtain data types of the data,
    wherein, upon receiving a request from a querying party, the data is obtained from a graphical database using a client device because the data matches a query criteria,
    wherein the data comprises company's sales data, customer data, and inventory data,
    wherein the customer data specifies a unique identifier of the customer, an address of the customer, and a phone number of the customer, and
    wherein the data types comprise a numeric data type, an enumerated data type, and a free text data type;
  performing anonymization on the data by a first anonymization model of a plurality of anonymization models to obtain first anonymized data;
  determining, based on the data types, using an anonymizer whether re-anonymization needs to be performed on the first anonymized data; and
  performing, based on a determination that the re-anonymization needs to be performed, the re-anonymization on the first anonymized data by a second anonymization model of the plurality of anonymization models to obtain second anonymized data and to prevent a risk of the first anonymized data being leaked,
    wherein the re-anonymization needs to be performed because the first anonymization model is not suitable for processing the data.
   
     
     
2. The method according to claim 1, wherein performing the classification on the data using the classifier comprises at least one of:
  performing the classification on the data using a regular expression;
  performing the classification on the data using a dictionary base; or
  performing the classification on the data using a machine learning model.
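
Illustrative note, not part of the claim text: a minimal sketch of the regular-expression and dictionary options, mapping values onto the three data types named in claim 1. The pattern `NUMERIC_RE`, the contents of `ENUM_DICTIONARY`, and the majority-vote rule for columns are assumptions made for this example; the claims do not specify them.

```python
import re
from collections import Counter

# Assumed pattern for numeric values (optional sign, optional decimal part).
NUMERIC_RE = re.compile(r"^[+-]?\d+(\.\d+)?$")

# Hypothetical dictionary base of known enumerated values.
ENUM_DICTIONARY = {"red", "green", "blue", "small", "medium", "large"}


def classify_value(value: str) -> str:
    """Return 'numeric', 'enumerated', or 'free_text' for a single value."""
    text = value.strip()
    if NUMERIC_RE.match(text):
        return "numeric"
    if text.lower() in ENUM_DICTIONARY:
        return "enumerated"
    return "free_text"


def classify_column(values: list[str]) -> str:
    """Label a non-empty column by the most common type among its values."""
    counts = Counter(classify_value(v) for v in values)
    return counts.most_common(1)[0][0]
```

A machine learning classifier, the third option in the claim, could replace `classify_value` behind the same interface.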
     
     
3. The method according to claim 1, wherein the data comprises tabular data, the tabular data comprising a plurality of data columns that have different ones of said data types.
     
     
4. The method according to claim 3, wherein performing the anonymization on the data by the first anonymization model comprises:
  learning data patterns of the plurality of data columns by training a generative adversarial network model; and
  performing the anonymization on the plurality of data columns separately using the trained generative adversarial network model, so as to generate the first anonymized data for each of the plurality of data columns.
     
     
5. The method according to claim 4, wherein determining using the anonymizer whether the re-anonymization needs to be performed on the first anonymized data comprises:
  obtaining, through a data profile, a data anonymization level for each of the plurality of data columns;
  obtaining, by the anonymizer, a query level for the querying party that queries the plurality of data columns; and
  determining, based on the data types, the data anonymization level, and the query level, whether the re-anonymization needs to be performed on the first anonymized data of each of the plurality of data columns.
     
     
6. The method according to claim 5, wherein determining, based on the data types, the data anonymization level, and the query level, whether the re-anonymization needs to be performed on the first anonymized data of each of the plurality of data columns comprises:
  determining that the re-anonymization does not need to be performed based on a determination that the data type of each of the plurality of data columns conforms to a data processing type of the first anonymization model; and
  determining that the re-anonymization does not need to be performed based on a determination that the data anonymization level of each of the plurality of data columns is lower than the query level.
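
Illustrative note, not part of the claim text: one possible reading of the decision in claims 5 and 6, sketched in code. It assumes that either condition alone is enough to skip re-anonymization and that levels are comparable integers; the `Column` type and all field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    data_type: str            # "numeric", "enumerated", or "free_text"
    anonymization_level: int  # assumed integer scale, read from the data profile


def needs_reanonymization(columns, first_model_types, query_level):
    """Return True when neither skip condition from claim 6 holds."""
    # Skip condition 1: every column's type conforms to the first model's processing types.
    types_conform = all(c.data_type in first_model_types for c in columns)
    # Skip condition 2: every column's anonymization level is lower than the query level.
    levels_below = all(c.anonymization_level < query_level for c in columns)
    # Assumption: either condition on its own is sufficient to skip.
    return not (types_conform or levels_below)
```

For example, a table with a numeric column at level 1 and a free-text column at level 3, a first model that handles only numeric data, and a query level of 2 satisfies neither condition, so this sketch reports that re-anonymization is needed.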
     
     
7. The method according to claim 6, wherein performing the re-anonymization by using the second anonymization model comprises:
  obtaining a profile of the plurality of anonymization models, wherein the profile indicates each anonymization model of the plurality of anonymization models and the data processing type corresponding to said each anonymization model;
  selecting, based on the profile and the data type, the second anonymization model from the plurality of anonymization models for the data type of each of the plurality of data columns; and
  performing the re-anonymization on the first anonymized data of the plurality of data columns using the second anonymization model.
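
Illustrative note, not part of the claim text: the profile lookup in claim 7 might be sketched as a mapping from model to data processing type. The model names echo claim 8, but the pairing of each model with a data type below is an assumption made for this example and is not stated in the claims.

```python
# Hypothetical profile: anonymization model -> data processing type it handles.
MODEL_PROFILE = {
    "pseudo_data_generation": "enumerated",
    "statistical": "numeric",
    "text_gan": "free_text",
}


def select_second_model(profile: dict, data_type: str) -> str:
    """Return the first model in the profile whose processing type matches."""
    for model_name, processing_type in profile.items():
        if processing_type == data_type:
            return model_name
    raise LookupError(f"no anonymization model handles data type {data_type!r}")


def plan_reanonymization(profile: dict, column_types: dict) -> dict:
    """Map each column name to the model selected for that column's data type."""
    return {col: select_second_model(profile, t) for col, t in column_types.items()}
```

Raising an error for an unmatched type is a design choice of this sketch; the claims do not say what happens when no model in the profile fits.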
     
     
8. The method according to claim 6, wherein the plurality of anonymization models comprises at least two of:
  a pseudo-data generation model;
  a statistical model; or
  a text generative adversarial network model.
     
     
9. An electronic device, comprising:
  a processor; and
  a memory coupled to the processor, wherein the memory has instructions stored therein which, when executed by the processor, cause the device to perform actions comprising:
    performing classification on data that comprises a knowledge graph by a classifier to obtain data types of the data,
      wherein, upon receiving a request from a querying party, the data is obtained from a graphical database using a client device because the data matches a query criteria,
      wherein the data comprises company's sales data, customer data, and inventory data,
      wherein the customer data specifies a unique identifier of the customer, an address of the customer, and a phone number of the customer, and
      wherein the data types comprise a numeric data type, an enumerated data type, and a free text data type;
    performing anonymization on the data by a first anonymization model of a plurality of anonymization models to obtain first anonymized data;
    determining, based on the data types, using an anonymizer whether re-anonymization needs to be performed on the first anonymized data; and
    performing, based on a determination that the re-anonymization needs to be performed, the re-anonymization on the first anonymized data by a second anonymization model of the plurality of anonymization models to obtain second anonymized data and to prevent a risk of the first anonymized data being leaked,
      wherein the re-anonymization needs to be performed because the first anonymization model is not suitable for processing the data.
 
   
     
     
10. The electronic device according to claim 9, wherein performing the classification on the data using the classifier comprises at least one of:
  performing the classification on the data using a regular expression;
  performing the classification on the data using a dictionary base; or
  performing the classification on the data using a machine learning model.
     
     
11. The electronic device according to claim 9, wherein the data comprises tabular data, the tabular data comprising a plurality of data columns that have different ones of said data types.
     
     
12. The electronic device according to claim 11, wherein performing the anonymization on the data by the first anonymization model comprises:
  learning data patterns of the plurality of data columns by training a generative adversarial network model; and
  performing the anonymization on the plurality of data columns separately using the trained generative adversarial network model, so as to generate the first anonymized data for each of the plurality of data columns.
     
     
13. The electronic device according to claim 12, wherein determining using the anonymizer whether the re-anonymization needs to be performed on the first anonymized data comprises:
  obtaining, through a data profile, a data anonymization level for each of the plurality of data columns;
  obtaining, by the anonymizer, a query level for the querying party that queries the plurality of data columns; and
  determining, based on the data types, the data anonymization level, and the query level, whether the re-anonymization needs to be performed on the first anonymized data of each of the plurality of data columns.
     
     
14. The electronic device according to claim 13, wherein determining, based on the data types, the data anonymization level, and the query level, whether the re-anonymization needs to be performed on the first anonymized data of each of the plurality of data columns comprises:
  determining that the re-anonymization does not need to be performed based on a determination that the data type of each of the plurality of data columns conforms to a data processing type of the first anonymization model; and
  determining that the re-anonymization does not need to be performed based on a determination that the data anonymization level of each of the plurality of data columns is lower than the query level.
     
     
15. The electronic device according to claim 14, wherein performing the re-anonymization by using the second anonymization model comprises:
  obtaining a profile of the plurality of anonymization models, wherein the profile indicates each anonymization model of the plurality of anonymization models and the data processing type corresponding to said each anonymization model;
  selecting, based on the profile and the data type, the second anonymization model from the plurality of anonymization models for the data type of each of the plurality of data columns; and
  performing the re-anonymization on the first anonymized data of the plurality of data columns using the second anonymization model.
     
     
16. The electronic device according to claim 14, wherein the plurality of anonymization models comprises at least two of:
  a pseudo-data generation model;
  a statistical model; or
  a text generative adversarial network model.
     
     
17. A computer program product that is tangibly stored on a non-volatile non-transitory computer-readable medium and comprises machine-executable instructions, wherein the machine-executable instructions, when executed, cause a machine to perform the following actions:
  performing classification on data that comprises a knowledge graph by a classifier to obtain data types of the data,
    wherein, upon receiving a request from a querying party, the data is obtained from a graphical database using a client device because the data matches a query criteria,
    wherein the data comprises company's sales data, customer data, and inventory data,
    wherein the customer data specifies a unique identifier of the customer, an address of the customer, and a phone number of the customer, and
    wherein the data types comprise a numeric data type, an enumerated data type, and a free text data type;
  performing anonymization on the data by a first anonymization model to obtain first anonymized data;
  determining, based on the data types, using an anonymizer whether re-anonymization needs to be performed on the first anonymized data; and
  performing, based on a determination that the re-anonymization needs to be performed, the re-anonymization on the first anonymized data by a second anonymization model to obtain second anonymized data and to prevent a risk of the first anonymized data being leaked,
    wherein the re-anonymization needs to be performed because the first anonymization model is not suitable for processing the data.
   
     
     
18. The computer program product according to claim 17, wherein performing the classification on the data using the classifier comprises at least one of:
  performing the classification on the data using a regular expression;
  performing the classification on the data using a dictionary base; or
  performing the classification on the data using a machine learning model.
     
     
19. The computer program product according to claim 17, wherein the data comprises tabular data, the tabular data comprising a plurality of data columns that have different ones of said data types.
     
     
20. The computer program product according to claim 19, wherein performing the anonymization on the data by the first anonymization model comprises:
  learning data patterns of the plurality of data columns by training a generative adversarial network model; and
  performing the anonymization on the plurality of data columns separately using the trained generative adversarial network model, so as to generate the first anonymized data for each of the plurality of data columns.
