US7330811B2 · Expired · Utility · PatentIndex 85

Method and system for adapting synonym resources to specific domains

Assignee: AXONWAVE SOFTWARE INC
Priority: Sep 29, 2000
Filed: Sep 28, 2001
Granted: Feb 12, 2008
Est. expiry: Sep 29, 2020 (expired) · nominal 20-yr term from priority
Inventors: TURCATO DAVIDE; POPOWICH FREDERICK P; TOOLE JANINE T; FASS DANIEL C; NICHOLSON JAMES DEVLAN; TISHER GORDON W
Classifications: G06F 16/3344, G06F 40/253, G06F 40/268, G06F 16/3329, G06F 40/169, G06F 40/30, Y10S707/99933, G06F 40/247, G06F 40/295, G06F 40/211, G06F 16/38, Y10S707/99932, Y10S707/99934, Y10S707/99931, Y10S707/99935
Cited by: 20
References: 11
Claims: 60

Abstract

A method and system for processing synonyms that adapts a general-purpose synonym resource to a specific domain. The method selects out a domain-specific subset of synonyms from the set of general-purpose synonyms. The synonym processing method in turn comprises two methods that can be used either together or on their own. A method of synonym pruning eliminates those synonyms that are inappropriate in a specific domain. A method of synonym optimization eliminates those synonyms that are unlikely to be used in a specific domain. The method has many applications including, but not limited to, information retrieval and domain-specific thesauri as a writer's aid.
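The pruning idea in the abstract can be illustrated with a minimal sketch. It assumes single-word terms, a tiny in-memory domain corpus, and synonymy relations stored as (target, synonym, meaning) tuples; the function names, the tokenizer, and the sample data are hypothetical and are not taken from the patent. Relations are ranked by how often the synonym occurs in the domain corpus, and those whose frequency is equal to or less than the threshold are removed.

```python
import re
from collections import Counter

def term_frequencies(corpus_texts):
    """Count lower-cased word occurrences across the domain corpus."""
    counts = Counter()
    for text in corpus_texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts

def prune_synonyms(relations, corpus_texts, threshold=0):
    """Rank relations by synonym frequency in the domain, then drop
    those whose frequency is equal to or less than the threshold.

    relations: list of (target, synonym, meaning) tuples (assumed layout).
    """
    freq = term_frequencies(corpus_texts)
    ranked = sorted(relations, key=lambda r: freq[r[1]], reverse=True)
    return [r for r in ranked if freq[r[1]] > threshold]

# Hypothetical aviation-domain example.
corpus = [
    "The aircraft landed safely.",
    "Inspect the aircraft wing before the airplane departs.",
]
relations = [
    ("plane", "airplane", "vehicle"),
    ("plane", "sheet", "flat surface"),
    ("plane", "aircraft", "vehicle"),
]
pruned = prune_synonyms(relations, corpus, threshold=0)
```

With a threshold of 0, only synonyms that never occur in the domain corpus are dropped; a larger threshold prunes more aggressively.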

Claims

exact text as granted — not AI-modified
1. A method of adapting a linguistic resource to a specific knowledge domain, wherein said linguistic resource comprises:
 a plurality of target terms each having one or more meanings and 
 a plurality of synonymy relations where each synonymy relation forms a relation between two synonymous terms with respect to a meaning, 
 said method comprising the steps of: 
 ranking said synonymy relations in relation to said domain; 
 identifying in said linguistic resource one or more of said synonymy relations from a group comprising: (1) irrelevant or (2) redundant or (3) likely not to be used in said knowledge domain; 
 setting a threshold value wherein said setting of said threshold value occurs either prior or subsequent to said identifying step; and 
 removing said synonymy relations from said linguistic resource according to said threshold value. 
 
   
   
     2. The method of  claim 1  wherein said ranking is a binary judgment. 
   
   
     3. The method of  claim 1  wherein said ranking step comprises:
 ranking said synonymy relations according to the frequency of occurrence of said synonymous terms in said synonymy relations in said domain. 
 
   
   
     4. The method of  claim 3  wherein said ranking step comprises:
 automatic ranking of said synonymy relations according to the frequency of occurrence of said synonymous terms in said synonymy relations in said domain. 
 
   
   
     5. The method of  claim 4  wherein said ranking step comprises:
 providing a first automatic ranking of said synonymy relations according to the frequency of occurrence of said synonymous terms in said synonymy relations in said domain; and 
 human evaluators, acting on the rankings produced by said first automatic ranking, further rank said synonymy relations in relation to said domain. 
 
   
   
     6. The method of  claim 4  wherein said threshold value is selected from one of:
 a pre-determined value; 
 a produced value; or 
 a value set by one or more users of the method. 
 
   
   
     7. The method of  claim 3  wherein said ranking step comprises:
 ranking said synonymy relations according to a numerical value for each said synonymy relation, where said numerical value for each said synonymy relation is produced from the frequency of occurrence of said synonymous terms in said synonymy relations in said domain. 
 
   
   
     8. The method of  claim 7  wherein said ranking step comprises:
 automatic ranking of said synonymy relations according to a numerical value for each said synonymy relation, where said numerical value for each said synonymy relation is produced from the frequency of occurrence of said synonymous terms in said synonymy relations in said domain. 
 
   
   
     9. The method of  claim 8  wherein said ranking step comprises:
 automatic ranking of said synonymy relations according to a numerical value for each said synonymy relation, where said numerical value for each said synonymy relation is produced from the frequency of occurrence of said synonymous terms in said synonymy relations in a plurality of corpora of data in said domain. 
 
   
   
     10. The method of  claim 9  wherein said plurality of corpora of data of said ranking step comprises an inventory of previous queries and a searchable corpus of data. 
   
   
     11. The method of  claim 8  wherein said ranking step comprises:
 providing a first automatic ranking of said synonymy relations according to a numerical value for each said synonymy relation, where said numerical value for each said synonymy relation is produced from the frequency of occurrence of said synonymous terms in said synonymy relations in said domain; 
 human evaluators, acting on the rankings produced by said first automatic ranking, further rank said synonymy relations in relation to said domain; and further ranking said synonymy relations according to a numerical value for each said synonymy relation, where said numerical value for each said synonymy relation is produced from: 
 a) the frequency of occurrence of the synonymous terms in each said synonymy relation in said domain; and 
 b) the frequency of occurrence of words which are semantically related to the target term in said synonymy relation in said domain. 
 
   
   
     12. The method of  claim 7  wherein said ranking step comprises:
 providing a first ranking of said synonymy relations according to a numerical value for each said synonymy relation, where said numerical value for each said synonymy relation is produced from the frequency of occurrence of said synonymous terms in said synonymy relations in said domain; 
 human evaluators, acting on the rankings produced by said first ranking, further rank said synonymy relations in relation to said domain; and 
 further ranking said synonymy relations according to a numerical value for each said synonymy relation, where said numerical value for each said synonymy relation is produced from 
 a) the frequency of occurrence of the synonymous terms in each said synonymy relation in said domain; and 
 b) the frequency of occurrence of words which are semantically related to the target term in said synonymy relation in said domain. 
 
   
   
     13. The method of  claim 3  wherein said ranking step comprises:
 providing a first ranking of said synonymy relations according to the frequency of occurrence of said synonymous terms in said synonymy relations in said domain; and 
 human evaluators, acting on the rankings produced by said first ranking, further rank said synonymy relations in relation to said domain. 
 
   
   
     14. The method of  claim 3  wherein said threshold value is selected from one of:
 a pre-determined value; 
 a produced value; or 
 a value set by one or more users of the method. 
 
   
   
     15. The method of  claim 1  wherein said ranking step comprises:
 ranking said synonymy relations according to a numerical value for each said synonymy relation, where said numerical value is produced from 
 a) the frequency of occurrence of the synonymous terms in each said synonymy relation in said domain, and 
 b) the frequency of occurrence of words which are semantically related to the target term in said synonymy relation in said domain. 
 
   
   
     16. The method of  claim 15  wherein said semantically related words in said ranking step are selected from:
 the sets of synonymous terms associated with said target term; 
 the set of words contained in dictionary definitions of said target term; and 
 superordinate and subordinate terms for said target term. 
 
   
   
     17. The method of  claim 16  wherein said semantically related words in said ranking step come from linguistic resources including one or more machine-readable dictionaries or machine-readable thesauri. 
   
   
     18. The method of  claim 15  wherein the frequency of occurrence of said synonymous terms and semantically related words in said ranking step come from a plurality of corpora of data and linguistic resources including one or more machine-readable dictionaries or machine-readable thesauri. 
   
   
     19. The method of  claim 15  wherein said ranking step comprises:
 ranking said synonymy relations according to a numerical value for each said synonymy relation, where said numerical value is produced from: 
 a) the frequency of occurrence of the synonymous terms in each said synonymy relation in said domain, and 
 b) the frequency of occurrence of words which are semantically related to the target term in said synonymy relation in said domain; and 
 human evaluators, acting on the rankings produced by said ranking step, further ranking said synonymy relations in relation to said domain. 
 
   
   
     20. The method of  claim 19  wherein said threshold value is selected from one of:
 a pre-determined value; 
 a produced value; or 
 a value set by one or more users of the method. 
 
   
   
     21. The method of  claim 15  wherein said threshold value is selected from one of:
 a pre-determined value; 
 a produced value; or 
 a value set by one or more users of the method. 
 
   
   
     22. The method of  claim 1  wherein said ranking step comprises:
 human evaluators ranking said synonymy relations in relation to said domain. 
 
   
   
     23. The method of  claim 22  wherein said ranking is a binary judgment. 
   
   
     24. The method of  claim 1  wherein said threshold value is selected from one of:
 a pre-determined value; 
 a produced value; 
 a value set by one or more users of the method. 
 
   
   
     25. The method of  claim 1  wherein said linguistic resource adapted by said method is a machine-readable dictionary or a machine-readable thesaurus. 
   
   
     26. The method of  claim 1  wherein said linguistic resource produced by the method is used for information retrieval or as a writer's aid. 
   
   
     27. The method of  claim 1  carried out at least in part on a computer. 
   
   
     28. The method of  claim 1  wherein said identifying step comprises:
 identifying as either (1) redundant or (2) likely not to be used said synonymy relations in said linguistic resource which contain a single term that is the same as the target term. 
 
   
   
     29. The method of  claim 1  wherein said identifying step comprises:
 identifying as either (1) redundant or (2) likely not to be used said synonymy relations which are identical to each other in said linguistic resource; 
 and wherein removing said synonymy relations comprises removing all but one of said synonymy relations from said linguistic resource. 
 
   
   
     30. The method of  claim 1  wherein said identifying step comprises:
 identifying said synonymy relations that are irrelevant in said linguistic resource by producing the frequency of occurrence of said synonymous terms in synonymy relations in said domain. 
 
   
   
     31. The method of  claim 1  carried out at least in part on a computer. 
   
   
     32. The method of  claim 1  wherein said removal step comprises:
 removing said synonymy relations from said linguistic resource if said frequency of occurrence is equal to or less than said threshold value. 
 
   
   
     33. The method of  claim 24  wherein said pre-determined threshold value is selected from one of:
 the value is set at 0; 
 the value is variable depending on the size of the domain. 
 
   
   
     34. A method of adapting a linguistic resource to a specific knowledge domain, wherein said linguistic resource comprises:
 a plurality of target terms each having one or more meanings and 
 a plurality of synonymy relations where each synonymy relation forms a relation between two synonymous terms with respect to a meaning, 
 said method comprising the steps of: 
 identifying one or more of said synonymy relations from a group comprising: (1) irrelevant or (2) redundant or (3) likely not to be used in said knowledge domain; and 
 removing said synonymy relations from said linguistic resource; 
 wherein said identifying includes identifying as either (1) redundant or (2) likely not to be used said synonymy relations in said linguistic resources which contain a single term that is the same as the largest term. 
 
   
   
     35. The method of  claim 34  comprising the steps of:
 identifying as either (1) redundant or (2) likely not to be used said synonymy relations which are identical to each other in said linguistic resource; 
 and wherein removing said synonymy relations comprises 
 removing all but one of said synonymy relations from said linguistic resource. 
 
   
   
     36. The method of  claim 34  wherein said identifying step comprises:
 identifying said synonymy relations that are irrelevant in said linguistic resource by producing the frequency of occurrence of said synonymous terms in synonymy relations in said domain. 
 
   
   
     37. The method of  claim 34  wherein said linguistic resource adapted by said method is a machine-readable dictionary or a machine-readable thesaurus. 
   
   
     38. The method of  claim 34  wherein said linguistic resource produced by said method is used for information retrieval or as a writer's aid. 
   
   
     39. The method of  claim 34  carried out at least in part on a computer. 
   
   
     40. The method of  claim 34  further comprising the steps of:
 setting a threshold value wherein said setting of said threshold value occurs either prior or subsequent to said identifying step; and 
 wherein said removal step further comprises removing said synonymy relations from said linguistic resource according to said threshold value. 
 
   
   
     41. The method of  claim 40  wherein said identifying step comprises:
 identifying said synonymy relations that are irrelevant in said linguistic resource by producing the frequency of occurrence of said synonymous terms in synonymy relations in said domain. 
 
   
   
     42. The method of  claim 41  wherein said removal step comprises:
 removing said synonymy relations from said linguistic resource if said frequency of occurrence is equal to or less than said threshold value. 
 
   
   
     43. The method of  claim 40  wherein said threshold value is selected from one of:
 a pre-determined value; 
 a produced value; or 
 a value set by users of the method. 
 
   
   
     44. The method of  claim 43  wherein said pre-determined threshold value is selected from one of:
 the value is value set at 0; 
 the value is variable depending on the size of the domain. 
 
   
   
     45. The method of  claim 40  wherein said linguistic resource adapted by said method is a machine-readable dictionary or a machine-readable thesaurus. 
   
   
     46. The method of  claim 40  wherein said linguistic resource produced by said method is used for information retrieval or as a writer's aid. 
   
   
     47. The method of  claim 40  carried out at least in part on a computer. 
   
   
     48. The method of  claim 34  further comprising the steps of:
 identifying said synonymy relations that are irrelevant in said knowledge domain; 
 removing said synonymy relations from said linguistic resource; 
 identifying in said linguistic resource from said removal step said synonymy relations that are either (1) redundant or (2) likely not to be used in said knowledge domain; and 
 removing said synonymy relations from said linguistic resource; 
 wherein the order of the two identifying steps can be transposed. 
 
   
   
     49. The method of  claim 48  carried out at least in part on a computer. 
   
   
     50. A computer program product for adapting a linguistic resource to a specific knowledge domain, wherein said linguistic resource comprises:
 a plurality of target terms each having one or more meanings, and 
 a plurality of synonymy relations where each synonymy relation forms a relation between two synonymous terms with respect to a meaning, 
 said computer program product comprising: 
 a computer usable medium having computer readable program code means embodied in said medium for the steps of: 
 ranking said synonymy relations in relation to said domain; 
 identifying one or more of said synonymy relations from a group comprising: (1) irrelevant or (2) redundant or (3) likely not to be used in said knowledge domain; 
 setting a threshold value wherein said setting of said threshold value occurs either prior or subsequent to said identifying step; and 
 removing said synonymy relations from said linguistic resource according to said threshold value. 
 
   
   
     51. The computer program product of  claim 50  wherein said linguistic resource adapted by said computer program product comprises a machine-readable dictionary or a machine-readable thesaurus. 
   
   
     52. The computer program product of  claim 51  wherein said linguistic resource produced by said computer program product is used for information retrieval or as a writer's aid. 
   
   
     53. The computer program product of  claim 50  wherein said linguistic resource adapted by said computer program product comprises a machine-readable dictionary or a machine-readable thesaurus. 
   
   
     54. The computer program product of  claim 50  wherein said linguistic resource produced by said computer program product is used for information retrieval or as a writer's aid. 
   
   
     55. A computer program product for adapting a linguistic resource to a specific knowledge domain, wherein said linguistic resource comprises:
 a plurality of target terms each having one or more, and 
 a plurality of synonymy relations where each synonymy relation forms a relation between two synonymous terms with respect to a meaning, 
 said computer program product comprising: 
 a computer usable medium having computer readable program code means embodied in said medium for: 
 identifying one or more of said synonymy relations from a group comprising: (1) irrelevant or (2) redundant or (3) likely not to be used in said knowledge domain; 
 setting a threshold value wherein said setting of said threshold value occurs either prior or subsequent to said identifying step; and 
 removing said synonymy relations from said linguistic resource according to said threshold value. 
 
   
   
     56. The computer program product of  claim 55  wherein said linguistic resource adapted by said computer program product comprises a machine-readable dictionary or a machine-readable thesaurus. 
   
   
     57. The computer program product of  claim 55  wherein said linguistic resource produced by said computer program product is used for information retrieval or as a writer's aid. 
   
   
     58. The computer program product of  claim 55  further comprising:
 identifying said synonymy relations that are irrelevant in said knowledge domain; 
 removing said synonymy relations from said linguistic resource; 
 identifying in said linguistic resource from said removal step said synonymy relations that are either (1) redundant or (2) likely not to be used in said knowledge domain; and 
 removing said synonymy relations from said linguistic resource; 
 wherein the order of the two said identifying steps can be transposed. 
 
   
   
     59. The computer program product of  claim 58  wherein said linguistic resource adapted by said computer program product comprises a machine-readable dictionary or a machine-readable thesaurus. 
   
   
     60. The computer program product of  claim 58  wherein said linguistic resource produced by said computer program product is used for information retrieval or as a writer's aid.
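
Two of the claimed ideas can be sketched briefly: removing redundant relations (a synonym that is the same as its target term, or relations identical to each other) and producing a numerical value from the frequency of the synonymous terms together with the frequency of words semantically related to the target term. The claims do not state how the two frequencies are combined, so the weighted sum below, the `weight` parameter, the tuple layout, and all names and sample data are assumptions made for illustration only.

```python
from collections import Counter

def remove_redundant(relations):
    """Drop relations whose synonym equals the target term, and keep
    only the first copy of relations that are identical to each other."""
    seen = set()
    kept = []
    for target, synonym, meaning in relations:
        if synonym == target:
            continue  # synonym is the same as the target term
        key = (target, synonym, meaning)
        if key in seen:
            continue  # identical relation already kept once
        seen.add(key)
        kept.append(key)
    return kept

def score_relation(relation, freq, related_words, weight=0.5):
    """Assumed scoring: frequency of the two synonymous terms plus a
    weighted frequency of words related to the target term's meaning.

    freq: Counter of domain word frequencies (missing words count as 0).
    related_words: dict mapping (target, meaning) to related words.
    """
    target, synonym, meaning = relation
    direct = freq[target] + freq[synonym]
    context = sum(freq[w] for w in related_words.get((target, meaning), ()))
    return direct + weight * context

# Hypothetical finance-domain example.
relations = [
    ("bank", "bank", "finance"),
    ("bank", "depository", "finance"),
    ("bank", "depository", "finance"),
    ("bank", "shore", "river"),
]
freq = Counter({"bank": 5, "depository": 2, "loan": 4})
related = {
    ("bank", "finance"): ["loan", "deposit"],
    ("bank", "river"): ["river", "water"],
}
cleaned = remove_redundant(relations)
scores = {r: score_relation(r, freq, related) for r in cleaned}
```

In this sketch the related words act as evidence for a meaning: the river sense of "bank" receives no support from a finance corpus and so scores lower, which a threshold could then use to remove it.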
