US9830381B2 · Active · Utility · PatentIndex 83

Scoring candidates using structural information in semi-structured documents for question answering systems

Assignee: FAN JAMES J
Priority: Sep 24, 2010
Filed: Sep 6, 2012
Granted: Nov 28, 2017
Est. expiry: Sep 24, 2030 (~4.2 yrs left) · nominal 20-yr term from priority
Inventors: FAN JAMES J; FERRUCCI DAVID A
G06F 16/334; G06F 17/30675
PatentIndex Score: 83 · Cited by: 6 · References: 114 · Claims: 16

Abstract

A system, program product, and methodology automatically scores candidate answers to questions in a question and answer system. In the candidate answer scoring method, a processor device performs one or more of receiving one or more candidate answers associated with a query string, the candidates obtained from a data source having semi-structured content; identifying one or more documents with semi-structured content from the data source having a candidate answer; and for each identified document: extracting one or more entity structures embedded in the identified document; determining a number of the entity structures in the identified document that appear in the received input query; and, computing a score for a candidate answer in the document as a function of the number. Overall system efficiency is improved by giving the correct candidate answers higher scores through leveraging context-dependent structural information such as links to other documents and embedded tags.
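
The extract-and-count steps in the abstract can be illustrated with a minimal sketch. It is not taken from the patent: it assumes wiki-style markup in which [[Target]] is a link to another document and [[Category:Name]] is a user-embedded tag, and the function names, the sample document, and the query are all hypothetical. Matching here is a plain shared-term test with no stopword filtering.

```python
import re

# Assumption: wiki-style markup. [[Target]] or [[Target|label]] is a link to
# another document; [[Category:Name]] is a user-embedded tag.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_entity_structures(text):
    """Return the link targets and tag names embedded in a document."""
    entities = []
    for match in LINK_RE.finditer(text):
        target = match.group(1).strip()
        if target.lower().startswith("category:"):
            target = target[len("category:"):].strip()
        entities.append(target)
    return entities


def count_matching_entities(entities, query):
    """Count entity structures that share at least one term with the query."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    count = 0
    for entity in entities:
        entity_terms = set(re.findall(r"\w+", entity.lower()))
        if entity_terms & query_terms:
            count += 1
    return count


# Hypothetical document and query, for illustration only.
doc = (
    "[[Aleksander Kwasniewski]] became the [[President of Poland|president]] "
    "in 1995. [[Category:Polish politicians]]"
)
entities = extract_entity_structures(doc)
match_count = count_matching_entities(
    entities, "Who became president of Poland in 1995?"
)
print(entities, match_count)
```

In this sketch only the "President of Poland" link shares terms with the query, so the count for the document is 1; a candidate answer found in that document would then be scored as a function of that count.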

Claims

exact text as granted — not AI-modified
What is claimed is: 
     
       1. A computer program product for automatically scoring candidate answers to questions in a question and answer system, the computer program product comprising a storage medium readable by a processing circuit and storing instructions run by the processing circuit for performing a method, the method comprising:
 receiving plural candidate answers associated with a query string, said plural candidate answers obtained from at least one document in a data corpus using query terms; 
 identifying one or more entity structures embedded in said at least one document; and for each at least one document: 
 extracting said one or more entity structures embedded in said at least one document, said embedded entity structures comprising user embedded tags or embedded links to other documents; 
 determining a number of said entity structures having terms in said embedded tags or embedded links to other documents that match query terms in the received input query string; 
 computing a score for each of said plural candidate answers in said document as a function of a count of said number of entity structures having terms in said embedded tags or said embedded links to other documents that match query terms in the query string; 
 said score computing comprising: assigning an associated weight to a count of said matching query terms associated with each said score for each said plural candidate answer; and 
 computing a final score by combining each weighted match count associated with each of the candidate answers. 
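
As an illustration of the last two steps of claim 1, the sketch below combines per-feature match counts into one final score. The claim does not say how the weighted counts are combined; a weighted sum is assumed here, and the feature names, weight values, and candidate data are hypothetical.

```python
def final_score(match_counts, weights):
    """Combine match counts into one score (assumed form: weighted sum)."""
    return sum(weights[name] * count for name, count in match_counts.items())


# Hypothetical weights; the claim leaves their values open.
weights = {"link_matches": 1.0, "tag_matches": 0.5}

# Hypothetical match counts per candidate answer.
candidates = {
    "Aleksander Kwasniewski": {"link_matches": 3, "tag_matches": 1},
    "Lech Walesa": {"link_matches": 1, "tag_matches": 0},
}

scores = {name: final_score(counts, weights) for name, counts in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```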
 
     
     
       2. The computer program product as in  claim 1 , wherein said method further comprises:
 accessing a table having entries, each entry including an identified document's corresponding identifier information, and a corresponding value of entity identifiers from the one or more entity structures embedded in that identified document, said determining a number of said entity structures comprises: traversing said table to identify said number of associated entity structures associated with a document. 
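
The table of claim 2 could be pictured as a mapping from a document identifier to the entity identifiers embedded in that document. The sketch below is an assumption about one possible layout, not the patent's data structure; the identifiers and the helper name are hypothetical.

```python
# Hypothetical table: document identifier -> entity identifiers extracted
# from that document ahead of time.
entity_table = {
    "doc-001": ["President of Poland", "Warsaw", "Polish politicians"],
    "doc-002": ["Solidarity", "Gdansk"],
}


def count_from_table(table, doc_id, query_terms):
    """Traverse a document's table entry and count entities sharing a query term."""
    entities = table.get(doc_id, [])
    return sum(1 for entity in entities if set(entity.lower().split()) & query_terms)


query_terms = {"president", "poland"}
print(count_from_table(entity_table, "doc-001", query_terms))
```

Precomputing the table means the document text does not have to be re-parsed for every question; an unknown document identifier simply yields a count of zero in this sketch.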
 
     
     
       3. The computer program product as in  claim 1 , wherein said determining a number of said entity structures having terms in said embedded tags or out-links that match query terms comprises:
 determining a similarity of a term in an embedded tag or said embedded link to another document and said query terms; and 
 estimating a relevance of the candidate answer to the question based on the determined similarity. 
 
     
     
       4. The computer program product as in  claim 3 , wherein said determining a similarity comprises: applying a comparison function to generate a resulting comparison score, said candidate score comprising said comparison score. 
     
     
       5. A system for automatically scoring candidate answers to questions in a question and answer system comprising:
 a memory storage device; 
 a processor device in communication with the memory device that performs a method comprising: 
 receiving plural candidate answers associated with a query string, said plural candidate answers obtained from at least one document in a data corpus using query terms; 
 identifying one or more entity structures embedded in said at least one document; and for each at least one document: 
 extracting said one or more entity structures embedded in said at least one document, said embedded entity structures comprising user embedded tags or embedded links to other documents; 
 determining a number of said entity structures having terms in said embedded tags or said embedded links to other documents that match query terms in the received input query string; 
 computing a score for each of said plural candidate answers in said document as a function of a count of said number of entity structures having terms in said embedded tags or said embedded links to other documents that match query terms in the query string; 
 said score computing comprising: assigning an associated weight to a count of said matching query terms associated with each said score for each said plural candidate answer; and 
 computing a final score by combining each weighted match count associated with each of the candidate answers. 
 
     
     
       6. The system as in  claim 5 , wherein said processor further performs:
 accessing a table having entries, each entry including an identified document's corresponding identifier information, and a corresponding value of entity identifiers from the one or more entity structures embedded in that identified document, wherein said determining a number of said entity structures comprises: traversing said table to identify said number of associated entity structures associated with a document. 
 
     
     
       7. The system as in  claim 5 , wherein said determining a number of said entity structures having terms in said embedded tags or links that match query terms comprises:
 determining a similarity of a term in an embedded tag or said embedded link to another document and said query terms; and 
 estimating a relevance of the candidate answer to the question based on the determined similarity. 
 
     
     
       8. The system as in  claim 7 , wherein said determining a similarity comprises: applying a comparison function to generate a resulting comparison score, said candidate score comprising said comparison score. 
     
     
       9. The computer program product of  claim 1 , wherein each said query string associated with an untyped question. 
     
     
       10. The system of  claim 5 , wherein each said query string associated with an untyped question. 
     
     
       11. The computer program product of  claim 1 , wherein said assigning comprises: implementing a machine learning algorithm for associating said weights with a match count. 
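
Claim 11 does not name a particular machine learning algorithm. As one hedged illustration, the sketch below fits weights with logistic regression trained by plain gradient descent; the choice of algorithm, the feature layout [link_matches, tag_matches], and the toy training data are all assumptions made for the example.

```python
import math


def train_weights(examples, labels, epochs=500, lr=0.1):
    """Fit logistic-regression weights over match counts by gradient descent."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(examples, labels):
            z = bias + sum(w * x for w, x in zip(weights, features))
            pred = 1.0 / (1.0 + math.exp(-z))
            error = pred - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def probability(features, weights, bias):
    """Estimated probability that a candidate with these counts is correct."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Toy training data: [link_matches, tag_matches] -> 1 if the candidate was correct.
examples = [[3, 1], [2, 2], [0, 0], [1, 0], [0, 1], [4, 0]]
labels = [1, 1, 0, 0, 0, 1]

weights, bias = train_weights(examples, labels)
print(weights, bias)
```

The learned weights can then replace hand-picked ones when the weighted match counts are combined into a final score.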
     
     
       12. The system of  claim 5 , wherein said assigning comprises: implementing a machine learning algorithm for associating said weights with a match count. 
     
     
       13. The computer program product of  claim 4 , wherein said comparison function comprises one or more of: a string equal function, an edit distance or a synonym matching. 
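
The three comparison functions named in claim 13 can be sketched as follows. This is an illustrative reading, not the patent's implementation: the normalization of edit distance to a 0..1 similarity, the tiny synonym dictionary, and the rule of taking the best of the three scores are assumptions.

```python
def string_equal(a, b):
    """Return 1.0 when the terms are equal ignoring case, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """Map edit distance to a 0..1 similarity (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a.lower(), b.lower()) / longest


# Hypothetical synonym dictionary; a real system would use a larger resource.
SYNONYMS = {"president": {"head of state", "leader"}}


def synonym_match(a, b):
    """Return 1.0 when the terms are equal or listed as synonyms, else 0.0."""
    a, b = a.lower(), b.lower()
    if a == b or b in SYNONYMS.get(a, set()) or a in SYNONYMS.get(b, set()):
        return 1.0
    return 0.0


def compare(a, b):
    """Comparison score: the best of the three functions (an assumption)."""
    return max(string_equal(a, b), edit_similarity(a, b), synonym_match(a, b))


print(compare("Poland", "Polland"), compare("President", "leader"))
```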
     
     
       14. The system of  claim 8 , wherein said comparison function comprises one or more of: a string equal function, an edit distance or a synonym matching. 
     
     
       15. The computer program product of  claim 1 , further comprising:
 receiving a specification of a type of entity structure to look for in a document; and 
 implementing search engine and retrieval functions to specifically search the document for said entity structure type. 
 
     
     
       16. The system of  claim 1 , wherein the method further comprises:
 receiving a specification of a type of entity structure to look for in a document; and 
 implementing search engine and retrieval functions to specifically search the document for said entity structure type.
