US6728752B1 · Expired · Utility · PatentIndex 99

System and method for information browsing using multi-modal features

Assignee: XEROX CORP · Priority: Jan 26, 1999 · Filed: Oct 19, 1999 · Granted: Apr 27, 2004
Est. expiry: Jan 26, 2019 (expired) · nominal 20-yr term from priority
Inventors: CHEN FRANCINE R; SCHUETZE HINRICH; GARGI ULLAS
Classifications: G06F 16/5838, G06F 16/353, G06F 16/34
PatentIndex Score: 99 · Cited by: 214 · References: 7 · Claims: 26

Abstract

A system and method for browsing, retrieving, and recommending information from a collection uses multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior. The system and method are premised on various disclosed methods for quantitatively representing documents in a document collection as vectors in multi-dimensional vector spaces, quantitatively determining similarity between documents, and clustering documents according to those similarities. The system and method also rely on methods for quantitatively representing users in a user population, quantitatively determining similarity between users, clustering users according to those similarities, and visually representing clusters of users by analogy to clusters of documents.
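
As an illustration of the abstract's three building blocks (vector representation, a quantitative similarity measure, and clustering by similarity), the following is a minimal sketch and not the patent's disclosed method. It assumes cosine similarity as the measure and a simple single-pass "leader" clustering as the grouping step; the abstract names neither. The vocabulary, the document ids, and the 0.7 threshold are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def leader_cluster(vectors, threshold):
    """Single-pass clustering (an assumed stand-in technique).

    Each document joins the first cluster whose leader (first member)
    is at least `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []
    for doc_id, vec in vectors.items():
        for members in clusters:
            if cosine_similarity(vectors[members[0]], vec) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters

# Hypothetical term-count vectors over the vocabulary (car, engine, recipe, flour).
docs = {
    "doc_a": [3, 2, 0, 0],
    "doc_b": [2, 3, 0, 0],
    "doc_c": [0, 0, 4, 1],
    "doc_d": [0, 1, 3, 2],
}

clusters = leader_cluster(docs, threshold=0.7)
print(clusters)  # [['doc_a', 'doc_b'], ['doc_c', 'doc_d']]
```

The same two functions would apply unchanged to user vectors, which is one way to read the abstract's analogy between clusters of users and clusters of documents.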

Claims

exact text as granted — not AI-modified
What is claimed is:  
     
       1. A method for information browsing using multi-modal features, comprising the steps of: 
       automatically isolating a plurality of surface multi-modal features associated with each of a first plurality of objects in a collection, each object in the collection being untagged, the plurality of surface multi-modal features including at least one image mode feature;  
       generating a feature vector for each of the first plurality of objects from the isolated surface multi-modal features;  
       searching the collection using a first feature of the plurality of surface multi-modal features to obtain search results;  
       clustering a second plurality of objects into a plurality of clusters using a second feature of the plurality of surface multi-modal features; and  
       presenting the clusters to a user.  
     
     
       2. The method of  claim 1 , wherein the clustering step uses a vector similarity measure to identify similarities between the feature vectors for the second plurality of objects. 
     
     
       3. The method of  claim 2 , wherein the second plurality of objects consists of the search results. 
     
     
       4. The method of  claim 2 , wherein the vector similarity measure employs a single feature for each object. 
     
     
       5. The method of  claim 2 , wherein the vector similarity measure employs a plurality of features for each object in an aggregate similarity measure. 
     
     
       6. The method of  claim 1 , further comprising the step of selecting at least one cluster from the plurality of clusters to obtain second results. 
     
     
       7. The method of  claim 1 , further comprising the step of identifying at least one object in a subcollection excluding the second plurality of objects having a first surface multi-modal feature with a first value that is similar to at least a second object in a cluster having the first surface multi-modal feature with a second value similar to the first value. 
     
     
       8. The method of  claim 7 , wherein the identifying step uses a vector similarity measure to identify similarities between values for the first multi-modal surface features. 
     
     
       9. The method of  claim 1  wherein the image mode feature is a one of a color histogram feature and a complexity feature. 
     
     
       10. The method of  claim 1  wherein the plurality of surface multi-modal features includes at least a text mode feature, a document genre mode feature, and an image mode feature. 
     
     
       11. A system adapted for information browsing using multi-modal features, comprising: 
       storage for a document collection, wherein the document collection includes a plurality of untagged documents each having a plurality of multi-modal surface features, the plurality of multi-modal surface features including at least one image mode feature;  
       a database adapted to store a quantitative representation of each feature corresponding to each document in the document collection;  
       a processor adapted to execute instructions; and  
       a computer readable memory storing instructions for causing the processor to browse information, the instructions comprising:  
       automatically generating a multi-modal feature vector for each document in the collection of untagged documents using the multi-modal surface features;  
       associating the multi-modal feature vector for each document with the document;  
       searching the collection using a first feature of the plurality of surface multi-modal features and the multi-modal feature vectors associated with each document in the collection to obtain search results;  
       clustering a second plurality of documents into a plurality of clusters using a second feature of the plurality of surface multi-modal surface features;  
       adding documents from the collection to the second plurality of documents based upon closeness of values for the second feature of the plurality of surface multi-modal features;  
       presenting the clusters to a user.  
     
     
       12. The system of  claim 11 , further comprising a communication network interface. 
     
     
       13. The system of  claim 12 , wherein the communication network interface couples the storage, the database, and the processor to a communication network. 
     
     
       14. The system of  claim 13 , wherein the communication network comprises the Internet. 
     
     
       15. The system of  claim 13 , wherein the communication network comprises an intranet. 
     
     
       16. The system of  claim 11  wherein the image mode feature is a one of a color histogram feature and a complexity feature. 
     
     
       17. The system of  claim 11  wherein the plurality of multi-modal surface features further includes at least a text mode feature and a document genre mode feature. 
     
     
       18. The system of  claim 17  wherein the first feature comprises an image mode feature. 
     
     
       19. The system of  claim 18  wherein the second feature comprises a document genre feature. 
     
     
       20. A computer readable medium storing instructions for causing a computer system to browse information in a collection of untagged documents using multi-modal surface features, the instructions comprising: 
       automatically generating a multi-modal feature vector for each document in the collection of untagged documents using the multi-modal surface features, the plurality of multi-modal surface features including at least an image mode feature;  
       associating the multi-modal feature vector for each document with the document;  
       searching the collection using a first feature of the plurality of surface multi-modal features and the multi-modal feature vectors associated with each document in the collection to obtain search results;  
       clustering a second plurality of documents into a plurality of clusters using a second feature of the plurality of surface multi-modal surface features;  
       adding documents to the second plurality of documents based upon closeness of values for the second feature of the plurality of surface multi-modal features;  
       presenting the clusters to a user.  
     
     
       21. The computer readable medium of  claim 20  wherein the multi-modal surface features include document genre, and image modes. 
     
     
       22. The computer readable medium of  claim 21  wherein the first feature is a text mode feature. 
     
     
       23. The computer readable medium of  claim 22  wherein the second feature is a one of the document genre and the image mode feature. 
     
     
       24. The computer readable medium of  claim 23  wherein the second feature is an image mode feature. 
     
     
       25. The method of  claim 24  wherein the image mode feature is a one of a color histogram feature and a complexity feature. 
     
     
       26. The computer readable medium of  claim 24  wherein the instructions further comprise: 
       clustering a third plurality of documents into clusters using a document genre mode feature.
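
The flow recited in the independent claims (search on a first feature, cluster on a second feature, and, in claims 11 and 20, add documents by closeness of the second feature) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the text feature is taken as the first feature and a three-bin color histogram as the second, cosine similarity stands in for the "vector similarity measure" of claim 2, and the clustering is a simple leader scheme. All ids, vectors, function names, and thresholds are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical collection: one vector per mode for each untagged object.
collection = {
    "page1": {"text": [2, 0, 1], "color": [0.8, 0.1, 0.1]},
    "page2": {"text": [1, 0, 3], "color": [0.1, 0.8, 0.1]},
    "page3": {"text": [3, 1, 0], "color": [0.7, 0.2, 0.1]},
    "page4": {"text": [0, 4, 0], "color": [0.1, 0.7, 0.2]},
}

def search(collection, query_vec, feature, min_sim):
    """Ids whose `feature` vector is at least `min_sim` similar to the query, best first."""
    scored = [(cosine(obj[feature], query_vec), oid) for oid, obj in collection.items()]
    return [oid for sim, oid in sorted(scored, reverse=True) if sim >= min_sim]

def cluster(collection, ids, feature, threshold):
    """Leader clustering of `ids` on `feature` (an assumed stand-in technique)."""
    clusters = []
    for oid in ids:
        for members in clusters:
            leader = collection[members[0]][feature]
            if cosine(leader, collection[oid][feature]) >= threshold:
                members.append(oid)
                break
        else:
            clusters.append([oid])
    return clusters

def expand(collection, clusters, feature, threshold):
    """Add not-yet-clustered objects to the first cluster whose leader is close in `feature`."""
    clustered = {oid for members in clusters for oid in members}
    for oid, obj in collection.items():
        if oid in clustered:
            continue
        for members in clusters:
            if cosine(collection[members[0]][feature], obj[feature]) >= threshold:
                members.append(oid)
                break
    return clusters

# Step 1: search on the first feature (text).
results = search(collection, query_vec=[1, 0, 1], feature="text", min_sim=0.5)
# Step 2: cluster the search results on the second feature (color histogram).
clusters = cluster(collection, results, feature="color", threshold=0.9)
# Step 3 (claims 11 and 20 only): pull in close documents the search missed.
clusters = expand(collection, clusters, feature="color", threshold=0.9)
# Step 4: present the clusters to the user.
for i, members in enumerate(clusters):
    print(f"cluster {i}: {members}")
```

In this toy data, `page4` shares no query terms and is missed by the text search, but the expansion step places it with `page2` because their color histograms are close. That is the practical point of searching on one mode and clustering on another.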
