US11216739B2 · Active · Utility · PatentIndex 50

System and method for automated analysis of ground truth using confidence model to prioritize correction options

Assignee: IBM · Priority: Jul 25, 2018 · Filed: Jul 25, 2018 · Granted: Jan 4, 2022
Est. expiry: Jul 25, 2038 (~12.1 yrs left) · nominal 20-yr term from priority
Inventors: FREED ANDREW R; CHRISTIANSON KYLE G; PHIPPS CHRISTOPHER
CPC: G06N 3/042, G06N 20/00, G06N 3/006, G06N 5/048, G06N 5/022, G06N 5/02, G06N 5/041
PatentIndex Score: 50 · Cited by: 0 · References: 23 · Claims: 15

Abstract

A method, system and computer-usable medium are disclosed for automated analysis of ground truth using confidence model to prioritize correction options. In certain embodiments, the ground truth data is analyzed to identify review-candidates. A confidence level may be assigned to each of the identified review-candidates and the review-candidates are prioritized, at least in part, using the assigned confidence levels. The review-candidates are electronically presented in prioritized order to solicit verification or correction feedback for updating the ground truth data.
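
The flow summarized in the abstract (identify review-candidates, assign a confidence level, present in prioritized order) can be illustrated with a minimal sketch. The `ReviewCandidate` class, its field names, the 0.0 to 1.0 confidence scale, and the choice to show the highest confidence first are assumptions made for illustration; the patent text does not prescribe them.

```python
from dataclasses import dataclass


@dataclass
class ReviewCandidate:
    entry_id: str      # hypothetical identifier of a ground truth entry
    issue: str         # short description of the suspected problem
    confidence: float  # assumed scale: 0.0 (unsure) to 1.0 (very sure)


def prioritize(candidates):
    """Return candidates ordered from highest to lowest confidence."""
    return sorted(candidates, key=lambda c: c.confidence, reverse=True)


candidates = [
    ReviewCandidate("q17", "answer type differs from peers", 0.90),
    ReviewCandidate("q03", "possible duplicate question", 0.40),
    ReviewCandidate("q42", "attribute name looks misspelled", 0.75),
]

# Present the candidates to a reviewer in prioritized order.
ordered = prioritize(candidates)
for c in ordered:
    print(f"{c.confidence:.2f}  {c.entry_id}  {c.issue}")
```

A real system would replace the hard-coded list with the output of its own analysis step and would collect the reviewer's verification or correction for each item.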

Claims

exact text as granted — not AI-modified
What is claimed is: 
     
       1. A computer-implemented method for automated analysis of ground truth using an information processing system having a processor and a memory, the method comprising:
 receiving, by the information processing system, ground truth data; 
 analyzing, by the information processing system, the ground truth data to identify review-candidates; 
 assigning, by the information processing system, a confidence level to each of the identified review-candidates; 
 prioritizing, by the information processing system, the review-candidates based at least on the assigned confidence levels; 
 electronically presenting, by the information processing system, the review-candidates in prioritized order to solicit corrective feedback for updating the ground truth data; 
 generating, by the information processing system, suggested fixes for the review-candidates; and 
 grouping identified review candidates having the same suggested fixes; 
 electronically presenting the grouped review-candidates in prioritized order along with the suggested fixes to solicit corrective feedback for updating the ground truth data using the suggested fixes; and, 
 training a question answer (QA) system using the suggested fixes. 
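
The grouping step recited in claim 1 (review-candidates that share the same suggested fix are presented together) might be sketched as follows. The dictionary layout, the tuple encoding of a fix, and ranking each group by its highest member confidence are hypothetical choices, not details taken from the claim.

```python
from collections import defaultdict


def group_by_fix(candidates):
    """Group candidates by suggested fix, highest-confidence groups first."""
    groups = defaultdict(list)
    for cand in candidates:
        groups[cand["suggested_fix"]].append(cand)
    # Assumption: a group's priority is the best confidence among its members.
    return sorted(
        groups.items(),
        key=lambda item: max(c["confidence"] for c in item[1]),
        reverse=True,
    )


candidates = [
    {"entry": "e1", "suggested_fix": ("rename", "colour", "color"), "confidence": 0.8},
    {"entry": "e2", "suggested_fix": ("rename", "wieght", "weight"), "confidence": 0.9},
    {"entry": "e3", "suggested_fix": ("rename", "colour", "color"), "confidence": 0.7},
]

grouped = group_by_fix(candidates)
for fix, members in grouped:
    entries = [m["entry"] for m in members]
    print(fix, "->", entries)
```

Grouping lets a reviewer accept or reject one fix for many entries at once instead of answering the same question repeatedly.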
 
     
     
       2. The computer-implemented method of  claim 1 , wherein prioritizing the review-candidates further comprises:
 prioritizing a review-candidate based on an impact of changing the review-candidate in the ground truth data using one or more of the respective suggested fixes. 
 
     
     
       3. The computer-implemented method of  claim 2 , wherein
 the impact of changing the review-candidate in the ground truth data is based, at least in part, on a number of ground truth data entries that would be changed using the respective suggested fixes. 
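
Claims 2 and 3 tie priority to the impact of a fix, measured at least in part by how many ground truth entries it would change. One possible reading is sketched below. Multiplying confidence by the affected-entry count is an assumed scoring rule, and the sample data and field names are invented for illustration.

```python
def impact(ground_truth, old_name):
    """Count ground truth entries that a rename of old_name would change."""
    return sum(1 for entry in ground_truth if old_name in entry)


ground_truth = [
    {"colour": "red"},
    {"colour": "blue"},
    {"color": "green"},
    {"wieght": 3},
    {"weight": 4},
]

fixes = [
    {"old": "colour", "new": "color", "confidence": 0.8},
    {"old": "wieght", "new": "weight", "confidence": 0.9},
]

# Assumed scoring rule: confidence weighted by the number of affected entries.
scored = sorted(
    fixes,
    key=lambda f: f["confidence"] * impact(ground_truth, f["old"]),
    reverse=True,
)
for f in scored:
    print(f["old"], "->", f["new"], "affects", impact(ground_truth, f["old"]))
```

Under this rule the `colour` rename ranks first even though its confidence is lower, because it would change two entries rather than one.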
 
     
     
       4. The computer-implemented method of  claim 1 , further comprising:
 identifying, by the information processing system, review-candidates based on similarities between different attribute names; and 
 assigning, by the information processing system, a high confidence level to review-candidates having different attribute names within a predetermined edit distance. 
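
Claim 4 flags different attribute names that fall within a predetermined edit distance of each other. The sketch below uses Levenshtein distance as the edit distance and a threshold of 2; both are assumptions, since the claim names neither a specific metric nor a value. The function names are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similar_attribute_pairs(names, max_distance=2):
    """Return pairs of distinct names within max_distance, marked high confidence."""
    pairs = []
    for a, b in combinations(sorted(set(names)), 2):
        d = edit_distance(a, b)
        if d <= max_distance:
            pairs.append({"names": (a, b), "distance": d, "confidence": "high"})
    return pairs


names = ["color", "colour", "weight", "wieght", "price"]
pairs = similar_attribute_pairs(names)
for p in pairs:
    print(p)
```

Short attribute names can sit within a small edit distance of each other while being legitimately different, which is one reason the candidates are still presented to a reviewer rather than merged automatically.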
 
     
     
       5. The computer-implemented method of  claim 1 , further comprising:
 identifying, by the information processing system, review-candidates based on differences in data types in ground truth entries for a given attribute; and 
 assigning, by the information processing system, a high confidence level to review-candidates having different data types for the given attribute. 
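
Claim 5 flags attributes whose values appear with different data types across ground truth entries. A minimal sketch, assuming entries are plain dictionaries and that Python type names are an adequate stand-in for "data type", could look like this. The function name and sample data are hypothetical.

```python
from collections import defaultdict


def mixed_type_attributes(entries):
    """Map each attribute seen with more than one value type to its type names."""
    types_seen = defaultdict(set)
    for entry in entries:
        for attr, value in entry.items():
            types_seen[attr].add(type(value).__name__)
    return {
        attr: sorted(kinds)
        for attr, kinds in types_seen.items()
        if len(kinds) > 1
    }


entries = [
    {"weight": 3.5, "color": "red"},
    {"weight": "heavy", "color": "blue"},
    {"weight": 4.0, "color": "green"},
]

flagged = mixed_type_attributes(entries)
# Each flagged attribute would become a high-confidence review-candidate.
for attr, kinds in flagged.items():
    print(attr, kinds)
```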
 
     
     
       6. A system comprising:
 a processor; 
 a data bus coupled to the processor; and 
 a non-transitory, computer-readable storage medium embodying computer program code, the non-transitory, computer-readable storage medium being coupled to the data bus, the computer program code interacting with a plurality of computer operations and comprising instructions executable by the processor and configured for: 
 receiving ground truth data; 
 analyzing the ground truth data to identify review-candidates; 
 assigning a confidence level to each of the identified review-candidates; 
 prioritizing the review-candidates based at least on the assigned confidence levels; 
 electronically presenting the review-candidates in prioritized order to solicit corrective feedback for updating the ground truth data; 
 generating, by the information processing system, suggested fixes for the review-candidates; and 
 grouping identified review candidates having the same suggested fixes; 
 electronically presenting the grouped review-candidates in prioritized order along with the suggested fixes to solicit corrective feedback for updating the ground truth data using the suggested fixes; and, 
 training a question answer (QA) system using the suggested fixes. 
 
     
     
       7. The system of  claim 6 , wherein prioritizing the review-candidates further comprises:
 prioritizing a review-candidate based on an impact of changing the review-candidate in the ground truth data using one or more of the respective suggested fixes. 
 
     
     
       8. The system of  claim 7 , wherein:
 the impact of changing the review-candidate in the ground truth data is based, at least in part, on a number of ground truth data entries that would be changed using the respective suggested fixes. 
 
     
     
       9. The system of  claim 6 , wherein the instructions are further configured for:
 identifying review-candidates based on similarities between different attribute names; and 
 assigning a high confidence level to review-candidates having different attribute names within a predetermined edit distance. 
 
     
     
       10. The system of  claim 6 , wherein the instructions are further configured for:
 identifying review-candidates based on differences in data types in ground truth entries for a given attribute; and 
 assigning a high confidence level to review-candidates having different data types for the given attribute. 
 
     
     
       11. A non-transitory, computer-readable storage medium embodying computer program code, the computer program code comprising computer executable instructions configured for:
 receiving ground truth data; 
 analyzing the ground truth data to identify review-candidates; 
 assigning a confidence level to each of the identified review-candidates; 
 prioritizing the review-candidates based at least on the assigned confidence levels; 
 electronically presenting the review-candidates in prioritized order to solicit corrective feedback for updating the ground truth data; 
 generating, by the information processing system, suggested fixes for the review-candidates; and 
 grouping identified review candidates having the same suggested fixes; 
 electronically presenting the grouped review-candidates in prioritized order along with the suggested fixes to solicit corrective feedback for updating the ground truth data using the suggested fixes; and, 
 training a question answer (QA) system using the suggested fixes. 
 
     
     
       12. The non-transitory, computer-readable storage medium of  claim 11 , wherein prioritizing the review-candidates further comprises:
 prioritizing a review-candidate based on an impact of changing the review-candidate in the ground truth data using one or more of the respective suggested fixes. 
 
     
     
       13. The non-transitory, computer-readable storage medium of  claim 12 , wherein
 the impact of changing the review-candidate in the ground truth data is based, at least in part, on a number of ground truth data entries that would be changed using the respective suggested fixes. 
 
     
     
       14. The non-transitory, computer-readable storage medium of  claim 11 , wherein the instructions are further configured for:
 identifying review-candidates based on similarities between different attribute names; and 
 assigning a high confidence level to review-candidates having different attribute names within a predetermined edit distance. 
 
     
     
       15. The non-transitory, computer-readable storage medium of  claim 11 , wherein the instructions are further configured for:
 identifying review-candidates based on differences in data types in ground truth entries for a given attribute; and 
 assigning a high confidence level to review-candidates having different data types for the given attribute.
