US7849059B2 · Expired · Utility · PatentIndex 99

Data classification systems and methods for organizing a metabase

Assignee: COMMVAULT SYSTEMS INC · Priority: Nov 28, 2005 · Filed: Nov 28, 2006 · Granted: Dec 7, 2010
Est. expiry: Nov 28, 2025 (expired) · nominal 20-yr term from priority
Inventors: PRAHLAD ANAND; SCHWARTZ JEREMY ALAN; NGO DAVID; BROCKWAY BRIAN; MULLER MARCUS S
Classifications: G06F 3/0605, G06F 16/119, G06F 11/1471, G06F 16/22, G06F 11/1464, G06F 16/182, G06F 16/1734, G06F 16/134, G06F 3/0604, G06F 16/11, G06F 11/1451, G06F 16/1727, G06F 11/1461, G06F 2201/80, G06F 3/0649, G06F 11/1435, G06F 16/2228, G06F 16/16, G06F 3/0685, G06F 16/122, G06F 16/285, G06F 16/14, G06F 16/24575, G06F 16/245, Y10S707/99948, G06F 11/1466, G06F 16/148, Y10S707/99942, Y10S707/99955
PatentIndex Score: 99 · Cited by: 81 · References: 250 · Claims: 23

Abstract

Systems and methods for managing electronic data are disclosed. Various data management operations can be performed based on a metabase formed from metadata. Such metadata can be identified from an index of data interactions generated by a journaling module, and obtained from their associated data objects stored in one or more storage devices. In various embodiments, such processing of the index and storing of the metadata can facilitate, for example, enhanced data management operations, enhanced data identification operations, enhanced storage operations, data classification for organizing and storing the metadata, cataloging of metadata for the stored metadata, and/or user interfaces for managing data. In various embodiments, the metabase can be configured in different ways. For example, the metabase can be stored separately from the data objects so as to allow obtaining of information about the data objects without accessing the data objects or a data structure used by a file system.
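
The flow the abstract describes (a journal index of data interactions, tagging by a user-defined expression, and a metabase that can answer queries without touching the data objects) can be illustrated with a small sketch. This is not the patented implementation; the names `JournalEntry`, `Metabase`, `tag_entries`, `build_metabase`, and `demo`, the record layout, and the in-memory dictionary standing in for a file system are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class JournalEntry:
    """One index entry about a data interaction (hypothetical layout)."""
    usn: int                      # sequence number assigned by the journal
    path: str                     # data object the interaction touched
    action: str                   # e.g. "create", "modify"
    tag_id: Optional[str] = None  # set when the entry meets a tag expression


@dataclass
class Metabase:
    """Metadata store kept separately from the data objects."""
    records: Dict[str, dict] = field(default_factory=dict)

    def update(self, entry: JournalEntry, object_metadata: dict) -> None:
        record = self.records.setdefault(entry.path, {})
        record["last_action"] = entry.action  # metadata taken from the index
        record["last_usn"] = entry.usn
        record.update(object_metadata)        # metadata read from the object
        record["tag_id"] = entry.tag_id       # tag identifier from the index

    def query(self, tag_id: str) -> List[str]:
        # Answered from the metabase alone; no data object is opened here.
        return sorted(p for p, r in self.records.items()
                      if r.get("tag_id") == tag_id)


def tag_entries(index: List[JournalEntry], tag_id: str,
                expression: Callable[[JournalEntry], bool]) -> None:
    """Attach tag_id to every index entry that meets the expression."""
    for entry in index:
        if expression(entry):
            entry.tag_id = tag_id


def build_metabase(index: List[JournalEntry], tag_id: str,
                   read_object_metadata: Callable[[str], dict]) -> Metabase:
    """Scan the index for tagged entries and record their metadata."""
    metabase = Metabase()
    for entry in index:
        if entry.tag_id == tag_id:
            metabase.update(entry, read_object_metadata(entry.path))
    return metabase


def demo() -> List[str]:
    index = [
        JournalEntry(1, "/docs/a.txt", "modify"),
        JournalEntry(2, "/tmp/b.log", "create"),
        JournalEntry(3, "/docs/c.txt", "create"),
    ]
    # Stand-in for reading properties from the stored data objects.
    fake_fs = {
        "/docs/a.txt": {"owner": "alice", "size": 120},
        "/tmp/b.log": {"owner": "root", "size": 9},
        "/docs/c.txt": {"owner": "bob", "size": 42},
    }
    tag_entries(index, "docs", lambda e: e.path.startswith("/docs/"))
    metabase = build_metabase(index, "docs", lambda p: fake_fs[p])
    return metabase.query("docs")
```

In this sketch the data objects are read once, while the metabase is being updated; later queries by tag consult only `Metabase.records`.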

Claims

exact text as granted — not AI-modified
1. A method of identifying data to store in a metabase, the method comprising:
 monitoring with a journaling module, data interactions between at least one application and one or more of a plurality of data objects stored in a file system, wherein the journaling module is separate from the application and wherein the journaling module populates an index with entries about the data interactions; 
 displaying with one or more computer processors a user interface that allows a user to input a user-defined tag expression, wherein the user-defined tag expression comprises information associated with data interactions the user desires to track; 
 tagging entries in the index that meet the user-defined tag expression, wherein the tagging associates a tag identifier with the entries; 
 scanning entries in the index to identify at least a first entry from the index associated with the tag identifier, wherein the first entry corresponds to a first data interaction with a first data object meeting the tag expression; 
 obtaining from the index first metadata about the data interaction associated with the first entry; 
 accessing the first data object associated with the first entry, to obtain second metadata, wherein the second metadata comprises information about the first data object that the user desires to track; 
 obtaining from the index third metadata, wherein the third metadata comprises the tag identifier associated with the first entry; 
 updating a metabase with the first, second and third metadata such that the metabase associates the tag identifier with first metadata obtained from the index and the second metadata obtained from the first data object, wherein the metabase is stored separately from the first data object and separately from the file system containing the first data object, wherein said updating further comprises determining which of a plurality of metabases comprises records storing first, second or third metadata associated with the first data object; and 
 in response to a user request for information about data interactions associated with the user-defined tag expression, accessing the first, the second or the third metadata in the metabase to determine data interactions that meet the user-defined tag expression without accessing either the plurality of data objects or the file system. 
 
     
     
       2. The method of  claim 1 , wherein information about the selected entry comprises information indicative of modifications to the first data object. 
     
     
       3. The method of  claim 1 , wherein the first, second and third metadata in the metabase is stored separately from the entire contents of the data objects. 
     
     
       4. The method of  claim 1 , further comprising accessing one or more of the first, second or third metadata associated with the data objects one or more times to update the metabase. 
     
     
       5. The method of  claim 1 , wherein said updating comprises:
 determining whether the selected entry in the index of data interactions has an existing record in the metabase; 
 if no record exists corresponding to the selected entry, creating a new record in the metabase; and 
 updating the existing record or the new record with at least a part of the information obtained from the selected entry. 
 
     
     
       6. The method of  claim 1 , wherein said selecting comprises determining whether the entry is a new entry in the index of data interactions. 
     
     
       7. The method of  claim 6 , wherein the entry is considered to be new if a time stamp of the entry is later than a time at which a previous entry was analyzed. 
     
     
       8. The method of  claim 6 , wherein the entry is considered to be new based on an identifier of the entry. 
     
     
       9. The method of  claim 8 , wherein the identifier comprises an update sequence number that identifies the entry in the index of data interactions. 
     
     
       10. The method of  claim 1 , further comprising initially populating the metabase by accessing the data objects so as to access available first, second or third metadata associated with the data objects. 
     
     
       11. The method of  claim 10 , additionally comprising:
 quiescing the data interactions associated with the at least one storage device; and 
 performing said populating during said quiescing. 
 
     
     
       12. The method of  claim 11 , wherein said populating is performed during operation of the at least one storage device. 
     
     
       13. The method of  claim 12 , additionally comprising queuing the data interactions generated during said populating to allow capture of the data interactions during the accessing process. 
     
     
       14. The method of  claim 1 , additionally comprising receiving input regarding the user-defined tag expression, wherein said obtaining information is based at least in part on said user-defined tag expression. 
     
     
       15. A system for managing electronic data in a storage network, the system comprising:
 a journaling module executing in one or more processors that is configured to monitor data interactions between at least one application and one or more the plurality of data objects associated with a file system, wherein the journaling module is separate from the application and wherein the journaling module is further configured to populate an index with entries about the data interactions; 
 a user interface executing in one or more computer processors, wherein the user interface allows a user to input a user-defined tag expression that comprises information associated with data interactions the user desires to track; 
 a data classification module executing in one or more processors configured to:
 entries in the index that meet the user-defined tag expression, wherein the data classification module associates a tag identifier with the entries; 
 scan entries in the index to identify at least a first entry from the index associated with the tag identifier, wherein the first entry corresponds to a first data interaction with a first data object meeting the tag expression; 
 obtain from the index first metadata about the data interaction associated with the first entry; 
 access the first data object associated with the first entry, to obtain second metadata, wherein the second metadata comprises information about the first data object that the user desires to track; 
 obtain from the index third metadata, wherein the third metadata comprises the tag identifier associated with the first entry; 
 update in a metabase the first, second and third metadata such that the metabase associates the tag identifier with the first metadata obtained from the index and the second metadata obtained from the first data object, wherein the metabase is stored separately from the first data object and separately from the file system containing the first data object, 
 wherein said updating further comprises determining which of a plurality of metabases comprises records storing first, second or third metadata associated with the first data object; 
 in response to a user request for information about data interactions associated with the user-defined tag expression, the data classification module is configured to access the first, the second or the third metadata in the metabase to determine data interactions that meet the user-defined tag expression without accessing either the plurality of data objects or the file system. 
 
 
     
     
       16. The system of  claim 15 , wherein the journal file is populated by a monitoring module. 
     
     
       17. The system of  claim 15 , wherein the data classification module is further configured to access the one or more data objects one or more times to update the metabase. 
     
     
       18. The system of  claim 15 , wherein the properties of the data objects are stored in the metabase separately from entire content of the data objects. 
     
     
       19. The system of  claim 15 , wherein the information obtained from the selected entry is indicative of modifications to metadata of the first data object resulting from the first data interaction. 
     
     
       20. The system of  claim 19 , wherein the first, second or third metadata comprises at least one of: a data owner, a last modified time, a last accessed time, a data object size and an application type. 
     
     
       21. The system of  claim 15 , wherein the data classification module is further configured to classify the one or more properties of the data object based on the user-defined tag expression. 
     
     
       22. The system of  claim 15 , wherein the data classification module is further configured to periodically scan the entries in the index. 
     
     
       23. The system of  claim 22 , wherein the data classification module is further configured to allow analysis of the one or more properties of the data objects based on a selected criteria without accessing the data objects.
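
Claims 5 through 9 describe updating the metabase only from index entries that are new, with newness judged by a time stamp or by an identifier such as an update sequence number, and creating a record when none exists. A minimal sketch of that idea follows, assuming a hypothetical entry layout of `(usn, path, action)` tuples and a plain dictionary as the metabase; the function name `scan_new_entries` is likewise an illustration, not something taken from the patent.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical index entry: (update sequence number, object path, action)
Entry = Tuple[int, str, str]


def scan_new_entries(index: Iterable[Entry],
                     metabase: Dict[str, dict],
                     last_usn: int) -> int:
    """Apply entries newer than last_usn and return the new high-water mark."""
    high_water = last_usn
    for usn, path, action in index:
        if usn <= last_usn:
            continue  # already analyzed in an earlier scan
        record = metabase.get(path)
        if record is None:
            # No existing record for this data object: create one.
            record = {}
            metabase[path] = record
        # Update the existing or new record from the index entry.
        record["last_action"] = action
        record["last_usn"] = usn
        high_water = max(high_water, usn)
    return high_water
```

A caller would store the returned value and pass it back as `last_usn` on the next periodic scan, so each index entry is processed once.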
