US10366126B2 · Active · Utility · PatentIndex 41

Data extraction based on multiple meta-algorithmic patterns

Assignee: HEWLETT PACKARD DEVELOPMENT CO
Priority: May 28, 2014 · Filed: May 28, 2014 · Granted: Jul 30, 2019
Est. expiry: May 28, 2034 (~7.9 yrs left) · nominal 20-yr term from priority
Inventors: SIMSKE STEVEN J · VANS A MARIE · STURGILL MALGORZATA M
Classifications: G06F 16/90335 · G06F 16/24535 · G06F 16/93 · G06F 16/345 · G06F 16/14
PatentIndex Score: 41 · Cited by: 0 · References: 22 · Claims: 14

Abstract

One example is a system including a plurality of combinations of summarization engines and/or meta-algorithmic patterns used to combine a plurality of summarizers, an extractor, an evaluator, and a selector. Each of the plurality of combinations of summarization engines and/or meta-algorithmic patterns receives content to provide a meta-summary of the content. The extractor generates a collection of search queries based on the content. The evaluator determines a similarity value of each combination of summarization engines and/or meta-algorithmic patterns for the collection of search queries. The selector selects an optimal combination of summarization engines and/or meta-algorithmic patterns based on the similarity value.
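
As a rough illustration of the evaluator and selector described above, the following Python sketch scores each candidate combination by how differently a set of search queries ranks the full documents versus the meta-summaries, then keeps the combination with the smallest difference. The term-count ranking, the rank-displacement measure, the toy data, and all names (`rank_documents`, `behavior_difference`, `select_combination`) are assumptions made for this example; the text shown here does not specify them.

```python
from collections import Counter


def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w]


def rank_documents(query, texts):
    """Return text indices ordered by how often the query terms occur."""
    terms = tokenize(query)
    scores = []
    for text in texts:
        counts = Counter(tokenize(text))
        scores.append(sum(counts[t] for t in terms))
    # sorted() is stable, so ties keep the original document order.
    return sorted(range(len(texts)), key=lambda i: -scores[i])


def behavior_difference(queries, documents, meta_summaries):
    """Sum of absolute rank displacements between the two search runs.

    Assumed measure: 0 means the queries rank the meta-summaries exactly
    as they rank the original documents.
    """
    total = 0
    for query in queries:
        on_docs = rank_documents(query, documents)
        on_sums = rank_documents(query, meta_summaries)
        position = {doc: rank for rank, doc in enumerate(on_sums)}
        total += sum(abs(rank - position[doc]) for rank, doc in enumerate(on_docs))
    return total


def select_combination(queries, documents, candidates):
    """candidates maps a combination name to its list of meta-summaries."""
    scores = {
        name: behavior_difference(queries, documents, sums)
        for name, sums in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical toy data: two candidate combinations, "A" and "B".
documents = [
    "solar panels convert sunlight into power. panels need cleaning.",
    "wind turbines convert wind into power. turbines need maintenance.",
    "coal plants burn coal. coal causes emissions.",
]
queries = ["solar panels", "wind turbines", "coal emissions"]
candidates = {
    "A": ["solar panels convert sunlight", "wind turbines convert wind", "coal plants burn coal"],
    "B": ["need cleaning", "need maintenance", "causes emissions"],
}

best, scores = select_combination(queries, documents, candidates)
print(best, scores)  # A {'A': 0, 'B': 2}
```

Combination "A" wins here because its meta-summaries keep the terms the queries depend on, so searching the summaries behaves like searching the originals.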

Claims

exact text as granted — not AI-modified
The invention claimed is: 
     
       1. A system comprising:
 a processor; and 
 a non-transitory computer readable medium storing instructions that are executed by the processor, the instructions comprising instructions to:
 receive, at each summarization engine of a plurality of summarization engines, a collection of documents to provide a summary of each document of the collection of documents; 
 provide, via a plurality of meta-algorithmic patterns, each meta-algorithmic pattern to be applied to at least two summaries, a collection of meta-summaries, each meta-summary of the collection of meta-summaries provided using at least two summaries; 
 to generate a plurality of search queries from the collection of documents; 
 determine a similarity score for each combination of meta-algorithmic patterns and summarization engines, the similarity score indicative of a difference in search behaviors of the plurality search queries when applied to the collection of documents and the collection of meta-summaries; and 
 select for deployment in a data mining application, via the processing system, a combination of the meta-algorithmic patterns and the summarization engines, the selection based on a minimum similarity score. 
 
 
     
     
       2. The system of  claim 1 , wherein the instructions are further to generate a meta-summary of a given document of the collection of documents by applying the selected combination of the meta-algorithmic patterns and summarization engines to the given document. 
     
     
       3. The system of  claim 1 , wherein the evaluation of each combination of meta-algorithmic patterns and summarization engines comprises comparing each combination of meta-algorithmic patterns and summarization engines to training data. 
     
     
       4. The system of  claim 1 , wherein the similarity score is based on a difference between a first action of the plurality of search queries on the collection of documents, and a second action of the plurality of search queries on the collection of meta-summaries. 
     
     
       5. The system of  claim 4 , wherein the first action and the second action are based on a ranking of the plurality of search queries. 
     
     
       6. The system of  claim 4 , wherein the first action and the second action are based on a weighting of the plurality of search queries. 
     
     
       7. The system of  claim 1 , wherein the plurality of meta-algorithmic patterns are selected from the group comprising weighted voting, predictive selection, tessellation and recombination, tessellation and recombination with a decisioner, predictive selection with a secondary engine, and majority voting. 
     
     
       8. A method to extract data from documents based on meta-algorithm patterns, the method comprising:
 filtering content to provide a collection of documents; 
 generating a plurality of search queries from the collection of documents; 
 applying a plurality of combinations of meta-algorithmic patterns and summarization engines, wherein:
 each summarization engine provides a summary of each document of the collection of documents, 
 each meta-algorithmic pattern is applied to at least two summaries to provide, via a processor, a collection of meta-summaries, each meta-summary of the collection of meta-summaries provided using the at least two summaries; 
 
 evaluating the plurality of combinations to determine a similarity score of each combination, the similarity score based on a difference between a first action of the plurality of search queries on the collection of documents, and a second action of the plurality of search queries on the collection of meta-summaries; and 
 selecting a combination of the meta-algorithmic patterns and the summarization engines having a minimum similarity score for a data mining application. 
 
     
     
       9. The method of  claim 8 , further comprising:
 generating a meta-summary of a given document of the collection of documents by applying the selected combination of the meta-algorithmic patterns and summarization engines to the given document; and 
 associating, in a database, the generated meta-summary with the given document. 
 
     
     
       10. The method of  claim 9 , further comprising:
 receiving a search query directed at a document; 
 retrieving, from the database, a meta-summary associated with the document; and 
 generating, based on the retrieved meta-summary, search results responsive to the search query. 
 
     
     
       11. The method of  claim 8 , wherein the plurality of meta-algorithmic patterns are selected from the group comprising weighted voting, predictive selection, tessellation and recombination, tessellation and recombination with a decisioner, predictive selection with a secondary engine, and majority voting. 
     
     
       12. A non-transitory computer readable medium comprising executable instructions to:
 receive a collection of documents via a processor; 
 summarize the collection of documents to provide a plurality of summaries via the processor; 
 summarize the plurality of summaries using a plurality of meta-algorithmic patterns to provide a collection of meta-summaries via the processor; 
 generate a plurality of search queries from the collection of documents; 
 determine a similarity score of each combination of a plurality of combinations of meta-algorithmic patterns and summarization engines, the similarity score based on a difference between a first action of the plurality of search queries on the collection of documents, and a second action of the plurality of search queries on the collection of meta-summaries; and 
 select for deployment in a data mining application, via the processor, a combination of the meta-algorithmic patterns and the summarization engines having a minimum similarity score. 
 
     
     
       13. The non-transitory computer readable medium of  claim 12 , wherein the first action and the second action are based on a ranking of the collection of search queries. 
     
     
       14. The non-transitory computer readable medium of  claim 12 , wherein the plurality of meta-algorithmic patterns are selected from the group comprising weighted voting, predictive selection, tessellation and recombination, tessellation and recombination with a decisioner, predictive selection with a secondary engine, and majority voting.
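
Claims 7, 11, and 14 name weighted voting among the meta-algorithmic patterns but do not define it in the text shown here. One plausible reading, sketched below in Python, treats each summarization engine's output as a ranked list of extracted sentences and lets each engine vote for its sentences with a per-engine weight. The rank-based point scheme, the weights, and the function name `weighted_voting` are assumptions for illustration only.

```python
def weighted_voting(summaries, weights, k):
    """Combine extractive summaries into a meta-summary of k sentences.

    summaries: one ranked list of sentences per engine, most important first.
    weights:   one weight per engine (hypothetical confidence values).
    """
    if len(summaries) < 2:
        raise ValueError("the pattern is applied to at least two summaries")
    if len(summaries) != len(weights):
        raise ValueError("need exactly one weight per summary")

    votes = {}
    for ranked, weight in zip(summaries, weights):
        n = len(ranked)
        for position, sentence in enumerate(ranked):
            # Higher-ranked sentences earn more points: n, n-1, ..., 1.
            votes[sentence] = votes.get(sentence, 0.0) + weight * (n - position)

    # Highest vote first; ties are broken alphabetically for determinism.
    ordered = sorted(votes, key=lambda s: (-votes[s], s))
    return ordered[:k]


# Three hypothetical engines ranking sentence identifiers s1..s5.
engine_outputs = [
    ["s1", "s2", "s3"],
    ["s2", "s4", "s1"],
    ["s4", "s2", "s5"],
]
engine_weights = [0.5, 0.3, 0.2]

print(weighted_voting(engine_outputs, engine_weights, k=2))  # ['s2', 's1']
```

Sentence `s2` comes first because all three engines selected it, even though the most heavily weighted engine ranked `s1` higher.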

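
Claims 9 and 10 describe associating a generated meta-summary with its document in a database and then answering search queries from the stored meta-summaries. A minimal sketch of that flow follows, assuming an in-memory dictionary in place of a real database and simple term overlap as the matching rule; the class `MetaSummaryIndex` and its methods are hypothetical names, not part of the patent.

```python
class MetaSummaryIndex:
    """Toy stand-in for the database that links documents to meta-summaries."""

    def __init__(self):
        self._store = {}  # document id -> meta-summary text

    def associate(self, doc_id, meta_summary):
        """Record the meta-summary generated for a document."""
        self._store[doc_id] = meta_summary

    def retrieve(self, doc_id):
        """Return the meta-summary associated with a document."""
        return self._store[doc_id]

    def search(self, query):
        """Return ids of documents whose meta-summary shares query terms.

        Assumed matching rule: rank by number of shared terms, then by id.
        """
        terms = set(query.lower().split())
        hits = []
        for doc_id, summary in self._store.items():
            overlap = len(terms & set(summary.lower().split()))
            if overlap:
                hits.append((overlap, doc_id))
        hits.sort(key=lambda hit: (-hit[0], hit[1]))
        return [doc_id for _, doc_id in hits]


index = MetaSummaryIndex()
index.associate("doc-1", "solar panels convert sunlight")
index.associate("doc-2", "wind turbines convert wind")

print(index.search("convert wind"))  # ['doc-2', 'doc-1']
```

Searching the short meta-summaries instead of the full documents is only useful if the selected combination preserves search behavior, which is what the minimum similarity score in the independent claims is meant to check.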
Cited by (0)

No later patents cite this yet.
