P
US7430504B2ExpiredUtilityPatentIndex 86

Method and system for ranking words and concepts in a text using graph-based ranking

Assignee: MICROSOFT CORPPriority: Mar 2, 2004Filed: Apr 15, 2004Granted: Sep 30, 2008
Est. expiryMar 2, 2024(expired)· nominal 20-yr term from priority
Inventors:VANDERWENDE LUCRETIA HMENEZES AURL ABANKO MICHELE L
G06F 16/367G06F 17/00
86
PatentIndex Score
30
Cited by
5
References
25
Claims

Abstract

The present invention is a method and system for identifying words, text fragments, or concepts of interest in a corpus of text. A graph is built which covers the corpus of text. The graph includes nodes and links, where nodes represent a word or a concept and links between the nodes represent directed relation names. A score is then computed for each node in the graph. Scores can also be computed for larger sub-graph portions of the graph (such as tuples) The scores are used to identify desired sub-graph portions of the graph, those sub-graph portions being referred to as graph fragments.

Claims

exact text as granted — not AI-modified
1. A method of identifying a characteristic of interest represented by a textual input, comprising:
 building a graph with nodes and links corresponding to the textual input, a pair of nodes and a link between the nodes comprising a tuple; 
 scoring sub-graph components of the graph by assigning a score to each node and each tuple in the graph, the score for each tuple being based on a score of an initial node in the tuple, scores for nodes linking to a target node in the tuple, and a frequency of the tuple in the textual input; 
 identifying graph fragments of interest based on the scores; and 
 performing text manipulation based on the identified graph fragments. 
 
   
   
     2. The method of  claim 1  wherein the nodes correspond to words in the textual input or concepts represented by the textual input. 
   
   
     3. The method of  claim 2  wherein building the graph further comprises generating the links as directed semantic relation names. 
   
   
     4. The method of  claim 3  wherein building the graph further comprises generating a set of abstract analyses for the textual input. 
   
   
     5. The method of  claim 4  wherein generating a set of abstract analyses comprises:
 generating a set of directed acyclic graphs based on the textual input; and 
 connecting the set of directed acyclic graphs to one another. 
 
   
   
     6. The method of  claim 1  wherein building the graph comprises:
 generating a syntactic parse for text portions in the textual input; 
 generating a dependency structure from the syntactic parse; and 
 generating the graph from the syntactic parse. 
 
   
   
     7. The method of  claim 1  wherein building the graph comprises:
 identifying the nodes as adjacent or collocated words; and 
 identifying the links between the nodes. 
 
   
   
     8. The method of  claim 7  wherein identifying the links comprises:
 assigning directionality of the links arbitrarily. 
 
   
   
     9. The method of  claim 7  wherein identifying the links comprises identifying the links and assigning directionality of the links based on a given part-of-speech associated with the nodes, using a heuristic. 
   
   
     10. The method of  claim 7  wherein identifying the links comprises identifying the links and assigning directionality of the links based on a given part-of-speech associated with the nodes, using a machine learned method. 
   
   
     11. The method of  claim 1  wherein identifying graph fragments of interest comprises:
 matching sub-graph components of the graph to nodes and tuples having a sufficient score. 
 
   
   
     12. The method of  claim 11  wherein identifying graph fragments of interest comprises:
 identifying nodes, having a sufficient score, that are linked to the matched sub-graph components. 
 
   
   
     13. The method of  claim 12  wherein identifying graph fragments comprises:
 identifying a node outside a matched sub-graph component that has a predetermined relation to a node in the matched sub-graph component. 
 
   
   
     14. The method of  claim 13  wherein identifying graph fragments comprises:
 identifying certain relations, given a predetermined specific node type. 
 
   
   
     15. The method of  claim 14  wherein all the matched sub-graph components and identified nodes and relations comprise the graph fragment. 
   
   
     16. The method of  claim 15  wherein performing text manipulation comprises:
 extracting the set of sub-graph components identified for a given portion of the textual input as a graph fragment. 
 
   
   
     17. The method of  claim 16  wherein building a graph comprises:
 generating a separate graph for each sentence in the textual input; and 
 connecting the separate graphs together to form an overall graph. 
 
   
   
     18. The method of  claim 17  wherein extracting comprises:
 extracting sub-graph portions, that have a sufficient score, from the overall graph. 
 
   
   
     19. The method of  claim 17  wherein high scoring sub-graph portions of the overall graph comprise sub-graph portions of the overall graph that have a score that meets a threshold score value, and wherein extracting sub-graph portions comprises:
 extracting portions of the separate graphs that spawned the high scoring sub-graph portions of the overall graph. 
 
   
   
     20. The method of  claim 1  wherein performing text manipulation comprises one of summarization, information retrieval, question answering, document clustering, and indexing. 
   
   
     21. The method of  claim 1  wherein performing text manipulation comprises: generating a textual output based on the extracted graph fragments. 
   
   
     22. The method of  claim 1  and further comprising:
 ordering the graph fragments based on scores corresponding to the graph fragments. 
 
   
   
     23. The method of  claim 22  wherein ordering further comprises:
 ordering the graph fragments based on factors in addition to the scores. 
 
   
   
     24. The method of  claim 23  wherein the factors comprise one of placement of nodes and the order in which two nodes related through part of speech will occur, an event timeline determined from the textual input, and a topic determined for the textual input. 
   
   
     25. A method of identifying a characteristic of interest comprising one of words, text fragments, concepts, events, entities and topics, said characteristic of interest represented by a textual input, said method comprising:
 building a graph comprising nodes linked by links corresponding to the textual input; 
 scoring sub-graph components of the graph; 
 identifying graph fragments of interest based on the scores; 
 ordering the graph fragments based on factors in addition to the scores, the factors comprising at least one of placement of nodes and an order in which two nodes related through part-of-speech will occur, an event timeline determined from the textual input, and a topic determined for the textual input; and 
 performing text manipulation based on the identified graph fragments.

Cited by (0)

No later patents cite this yet.

References (0)

No backward citations on record.