US9984006B2 · Active · Utility · PatentIndex 83

Data storage systems and methods

Assignee: COMMVAULT SYSTEMS INC
Priority: Sep 17, 2014
Filed: Jun 23, 2017
Granted: May 29, 2018
Est. expiry: Sep 17, 2034 (~8.2 yrs left) · nominal 20-yr term from priority
Inventors: AMARENDRAN ARUN PRASAD; CHATTERJEE TIRTHANKAR; YUAN YUN; LIU YONGTAO
G06F 2212/402, G06F 2221/2107, G06F 21/6218, H04L 67/10, G06F 16/164, G06F 21/602, H04L 63/0428, G06F 2212/1052, G06N 20/00, G06N 5/02, G06F 17/27, G06N 99/005, G06F 12/1408, G06F 17/3012, G06F 40/20
PatentIndex Score: 83 · Cited by: 8 · References: 295 · Claims: 20

Abstract

Data storage systems are disclosed for automatically generating encryption rules based on a set of training files that are known to include sensitive information. The system may use a number of heuristic algorithms to generate one or more encryption rules for determining whether a file includes sensitive information. Further, the system may apply the heuristic algorithms to the content of the files, as determined by using natural language processing algorithms, to generate the encryption rules. Moreover, systems are disclosed that are capable of automatically determining whether to encrypt a file based on the generated encryption rules. The content of the file may be determined using natural language processing algorithms and then the encryption rules may be applied to the content of the file to determine whether to encrypt the file.
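The rule-generation loop summarized above can be illustrated with a minimal Python sketch. It is not the patented implementation: plain word tokenization stands in for the natural language processing step, and the names `Rule`, `portions`, `generate_rule`, and `stop_portions` are hypothetical. The sketch aggregates file portions that appear in more than one training file, builds a prospective rule from the most common one, and widens the rule until it flags at least a threshold number of training files.

```python
import re
from collections import Counter
from dataclasses import dataclass


def portions(text):
    """Split text into lowercase word tokens (a stand-in for NLP-derived file portions)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class Rule:
    """Hypothetical encryption rule: a file is flagged if it contains any listed term."""
    terms: set

    def matches(self, text):
        return bool(self.terms & portions(text))


def generate_rule(training_files, threshold, stop_portions=frozenset()):
    """Build a prospective rule, then widen it until `threshold` training files are flagged."""
    counts = Counter()
    for text in training_files:
        # Drop portions already known to be non-sensitive, then count per file.
        counts.update(portions(text) - stop_portions)

    # Aggregated set: portions seen in more than one training file,
    # ordered by frequency, with ties broken alphabetically for determinism.
    shared = sorted((p for p, n in counts.items() if n > 1),
                    key=lambda p: (-counts[p], p))
    if not shared:
        raise ValueError("no file portion appears in more than one training file")

    rule = Rule(terms={shared[0]})
    remaining = shared[1:]

    def flagged():
        return sum(rule.matches(text) for text in training_files)

    # Iteratively modify the prospective rule until the threshold is met
    # or there are no further shared portions to add.
    while flagged() < threshold and remaining:
        rule.terms.add(remaining.pop(0))
    return rule


training = [
    "ssn 123 for alice",
    "ssn 456 for bob",
    "salary of carol",
    "salary of dave",
]
rule = generate_rule(training, threshold=4, stop_portions={"for", "of"})
```

In this sketch, "modifying" the rule means adding the next most common shared portion; other modification strategies would fit the same loop. Persisting the result to a non-volatile repository is omitted.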

Claims

exact text as granted — not AI-modified
What is claimed is: 
     
       1. A data storage system comprising:
 a computing system comprising one or more hardware processors programmed to:
 determine a set of file portions from a plurality of training files, at least some of the set of file portions comprising content designated as sensitive information, each of the file portions comprising a subset of content of at least one file from the plurality of training files; 
 generate a prospective encryption rule for addition to a set of available encryption rules based at least in part on an aggregated set of the file portions, the aggregated set of the file portions including file portions that appear in more than one file from the plurality of training files; 
 determine a number of files from the plurality of training files identified for encryption by performance of the prospective encryption rule; and 
 when the number of files identified for encryption does not satisfy a threshold number of files, iteratively modify the prospective encryption rule until the threshold number of files of the plurality of training files are identified for encryption by performance of the modified prospective encryption rule, and store the modified prospective encryption rule at a non-volatile repository. 
 
 
     
     
       2. The data storage system of claim 1, wherein the one or more hardware processors are further programmed to add the prospective encryption rule to the set of available encryption rules when the number of files identified for encryption satisfies the threshold number of files.
     
     
       3. The data storage system of claim 1, further comprising the non-volatile repository that stores the set of available encryption rules.
     
     
       4. The data storage system of claim 1, wherein the one or more hardware processors are further programmed to:
 apply the set of available encryption rules to a set of files; 
 identify a file from the set of files based on the application of the set of available encryption rules, wherein the file is identified based at least in part on a correspondence between a portion of the file and one or more file portions from the plurality of training files used to generate at least one encryption rule from the set of available encryption rules; and 
 encrypt the file using the at least one encryption rule. 
 
     
     
       5. The data storage system of claim 1, wherein the one or more hardware processors are further programmed to:
 monitor creation of a file; and 
 determine whether the file satisfies an encryption rule from the set of available encryption rules. 
 
     
     
       6. The data storage system of claim 1, wherein the one or more hardware processors are further programmed to:
 determine a context condition for the prospective encryption rule, the context condition specifying when to apply the prospective encryption rule to a file; and 
 associate the context condition with the prospective encryption rule. 
 
     
     
       7. The data storage system of claim 6, wherein the context condition comprises at least one of an identity of a user, an identity of a department that includes the user within an entity, a geographic location of a computing device storing the file, a network location of the computing device storing the file, and a device type of the computing device.
     
     
       8. The data storage system of claim 1, wherein the one or more hardware processors are further programmed to:
 present the prospective encryption rule to a user; 
 receive an input from the user responsive to presenting the prospective encryption rule to the user; and 
 determine whether to include the prospective encryption rule in the set of available encryption rules based at least in part on the input received from the user. 
 
     
     
       9. The data storage system of claim 1, wherein the one or more hardware processors are further programmed to remove a file portion from the set of file portions based at least in part on an identified set of non-sensitive file portions.
     
     
       10. A method of automatically generating encryption rules, the method comprising:
 by a rules generation system comprising one or more hardware processors,
 determining a set of file portions from a plurality of training files, at least some of the set of file portions comprising content designated as sensitive information, each of the file portions comprising a subset of content of at least one file from the plurality of training files; 
 generating a prospective encryption rule for addition to a set of available encryption rules based at least in part on an aggregated set of the file portions, the aggregated set of the file portions including at least one file portion that appears in more than one file from the plurality of training files; 
 determining that a number of files from the plurality of training files identified for encryption by performance of the prospective encryption rule does not satisfy a threshold number of files; and 
 in response to said determining, iteratively modifying the prospective encryption rule until the threshold number of files of the plurality of training files are identified for encryption by performance of the modified prospective encryption rule, and storing the modified prospective encryption rule at a non-volatile repository. 
 
 
     
     
       11. The method of claim 10, further comprising applying one or more natural language processing algorithms or heuristic algorithms to the plurality of training files to determine the set of file portions.
     
     
       12. The method of claim 10, further comprising adding the prospective encryption rule to the set of available encryption rules when the number of files identified for encryption satisfies the threshold number of files.
     
     
       13. The method of claim 10, further comprising:
 applying the set of available encryption rules to a set of files; 
 identifying a file from the set of files based on the application of the set of available encryption rules, wherein the file is identified based at least in part on a correspondence between a portion of the file and one or more file portions from the plurality of training files used to generate at least one encryption rule from the set of available encryption rules; and 
 encrypting the file using the at least one encryption rule. 
 
     
     
       14. The method of claim 10, further comprising:
 monitoring creation of a file; and 
 determining whether the file satisfies an encryption rule from the set of available encryption rules. 
 
     
     
       15. The method of claim 10, further comprising:
 determining a context condition for the prospective encryption rule, the context condition specifying when to apply the prospective encryption rule to a file; and 
 associating the context condition with the prospective encryption rule. 
 
     
     
       16. The method of claim 10, further comprising:
 presenting the prospective encryption rule to a user; 
 receiving an input from the user responsive to presenting the prospective encryption rule to the user; and 
 determining whether to include the prospective encryption rule in the set of available encryption rules based at least in part on the input received from the user. 
 
     
     
       17. The method of claim 10, further comprising removing a file portion from the set of file portions based at least in part on an identified set of non-sensitive file portions.
     
     
       18. The method of claim 10, further comprising storing the set of encryption rules at the non-volatile repository, the non-volatile repository accessible by a plurality of networked computing systems.
     
     
       19. The method of claim 10, wherein generating the prospective encryption rule comprises filtering file portions identified as non-sensitive from the set of file portions for each file from the plurality of training files.
     
     
       20. The method of claim 10, further comprising:
 presenting a file identified as including sensitive content to a user; 
 receiving confirmation from the user that the file includes sensitive content; and 
 adding the file to the plurality of training files.
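
As an illustration only, and not part of the granted claim text, the context-conditioned rule application described in the claims (a rule paired with a condition such as the user's department) might be sketched as follows. The names `FileContext`, `ContextualRule`, and `should_encrypt` are hypothetical, simple substring matching stands in for content analysis, and the sketch only decides whether a file should be encrypted; it performs no encryption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileContext:
    """Hypothetical context for a newly created file."""
    user: str
    department: str
    device_type: str


@dataclass
class ContextualRule:
    """Hypothetical rule: sensitive terms plus a department-based context condition."""
    terms: frozenset
    departments: frozenset = frozenset()  # empty set means the rule applies everywhere

    def applies_in(self, ctx):
        # Context condition: restrict the rule to the listed departments, if any.
        return not self.departments or ctx.department in self.departments

    def matches(self, text):
        lowered = text.lower()
        return any(term in lowered for term in self.terms)


def should_encrypt(text, ctx, rules):
    """Return True if any rule both applies in this context and matches the content."""
    return any(rule.applies_in(ctx) and rule.matches(text) for rule in rules)


rules = [ContextualRule(terms=frozenset({"salary"}), departments=frozenset({"hr"}))]
hr_ctx = FileContext(user="ann", department="hr", device_type="laptop")
eng_ctx = FileContext(user="raj", department="eng", device_type="laptop")

hr_decision = should_encrypt("Salary review 2017", hr_ctx, rules)
eng_decision = should_encrypt("Salary review 2017", eng_ctx, rules)
```

A file-creation monitor could call `should_encrypt` for each new file and hand flagged files to whatever encryption facility the storage system provides.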
