US7330811B2 · Expired · Utility · PatentIndex 85

Method and system for adapting synonym resources to specific domains

Assignee: AXONWAVE SOFTWARE INC · Priority: Sep 29, 2000 · Filed: Sep 28, 2001 · Granted: Feb 12, 2008

Est. expiry: Sep 29, 2020 (expired) · nominal 20-yr term from priority

Inventors: TURCATO DAVIDE; POPOWICH FREDERICK P; TOOLE JANINE T; FASS DANIEL C; NICHOLSON JAMES DEVLAN; TISHER GORDON W

Classifications: G06F 16/3344, G06F 40/253, G06F 40/268, G06F 16/3329, G06F 40/169, G06F 40/30, Y10S707/99933, G06F 40/247, G06F 40/295, G06F 40/211, G06F 16/38, Y10S707/99932, Y10S707/99934, Y10S707/99931, Y10S707/99935

Abstract

A method and system for processing synonyms that adapts a general-purpose synonym resource to a specific domain. The method selects out a domain-specific subset of synonyms from the set of general-purpose synonyms. The synonym processing method in turn comprises two methods that can be used either together or on their own. A method of synonym pruning eliminates those synonyms that are inappropriate in a specific domain. A method of synonym optimization eliminates those synonyms that are unlikely to be used in a specific domain. The method has many applications including, but not limited to, information retrieval and domain-specific thesauri as a writer's aid.
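The two methods named in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a synonymy relation is a (target, synonym, meaning) triple, that the domain is represented by a plain-text corpus split on whitespace, and that "unlikely to be used" covers self-synonyms and duplicate relations. The names `SynonymyRelation`, `domain_frequencies`, `prune`, and `optimize`, and the aviation-flavoured sample data, are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class SynonymyRelation(NamedTuple):
    target: str   # target term
    synonym: str  # term treated as synonymous with the target
    meaning: str  # label for the meaning the relation holds under


def domain_frequencies(corpus_text: str) -> Counter:
    """Count lowercase whitespace-separated tokens in a domain corpus."""
    return Counter(corpus_text.lower().split())


def prune(relations, freqs, threshold=0):
    """Keep relations whose synonym occurs more than `threshold` times."""
    return [r for r in relations if freqs[r.synonym.lower()] > threshold]


def optimize(relations):
    """Drop self-synonyms and keep only the first copy of duplicates."""
    seen = set()
    kept = []
    for r in relations:
        if r.target.lower() == r.synonym.lower() or r in seen:
            continue
        seen.add(r)
        kept.append(r)
    return kept


# Hypothetical general-purpose resource and domain corpus.
general = [
    SynonymyRelation("plane", "aircraft", "flying vehicle"),
    SynonymyRelation("plane", "airplane", "flying vehicle"),
    SynonymyRelation("plane", "sheet", "flat surface"),
    SynonymyRelation("plane", "plane", "flying vehicle"),     # self-synonym
    SynonymyRelation("plane", "aircraft", "flying vehicle"),  # duplicate
]
corpus = "the aircraft landed while the airplane and another plane waited"

adapted = prune(optimize(general), domain_frequencies(corpus))
print([r.synonym for r in adapted])  # ['aircraft', 'airplane']
```

Because the abstract says the two methods can be used together or on their own, `prune` and `optimize` are kept as independent functions that each take and return a list of relations.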

Claims

exact text as granted — not AI-modified

1. A method of adapting a linguistic resource to a specific knowledge domain, wherein said linguistic resource comprises:
a plurality of target terms each having one or more meanings and
a plurality of synonymy relations where each synonymy relation forms a relation between two synonymous terms with respect to a meaning,
said method comprising the steps of:
ranking said synonymy relations in relation to said domain;
identifying in said linguistic resource one or more of said synonymy relations from a group comprising: (1) irrelevant or (2) redundant or (3) likely not to be used in said knowledge domain;
setting a threshold value wherein said setting of said threshold value occurs either prior or subsequent to said identifying step; and
removing said synonymy relations from said linguistic resource according to said threshold value.

2. The method of claim 1 wherein said ranking is a binary judgment.

3. The method of claim 1 wherein said ranking step comprises:
ranking said synonymy relations according to the frequency of occurrence of said synonymous terms in said synonymy relations in said domain.

4. The method of claim 3 wherein said ranking step comprises:
automatic ranking of said synonymy relations according to the frequency of occurrence of said synonymous terms in said synonymy relations in said domain.

5. The method of claim 4 wherein said ranking step comprises:
providing a first automatic ranking of said synonymy relations according to the frequency of occurrence of said synonymous terms in said synonymy relations in said domain; and
human evaluators, acting on the rankings produced by said first automatic ranking, further rank said synonymy relations in relation to said domain.

6. The method of claim 4 wherein said threshold value is selected from one of:
a pre-determined value;
a produced value; or
a value set by one or more users of the method.

7. The method of claim 3 wherein said ranking step comprises:
ranking said synonymy relations according to a numerical value for each said synonymy relation, where said numerical value for each said synonymy relation is produced from the frequency of occurrence of said synonymous terms in said synonymy relations in said domain.

8. The method of claim 7 wherein said ranking step comprises:
automatic ranking of said synonymy relations according to a numerical value for each said synonymy relation, where said numerical value for each said synonymy relation is produced from the frequency of occurrence of said synonymous terms in said synonymy relations in said domain.

9. The method of claim 8 wherein said ranking step comprises:
automatic ranking of said synonymy relations according to a numerical value for each said synonymy relation, where said numerical value for each said synonymy relation is produced from the frequency of occurrence of said synonymous terms in said synonymy relations in a plurality of corporal of data in said domain.

10. The method of claim 9 wherein said plurality of corporal of data of said ranking step comprises an inventory of previous queries and a searchable corpus of data.

11. The method of claim 8 wherein said ranking step comprises:
providing a first automatic ranking of said synonymy relations according to a numerical value for each said synonymy relation, where said numerical value for each said synonymy relation is produced from the frequency of occurrence of said synonymous terms in said synonymy relations in said domain;
human evaluators, acting on the rankings produced by said first automatic ranking, further rank said synonymy relations in relation to said domain; and further ranking said synonymy relations according to a numerical value for each said synonymy relation, where said numerical value for each said synonymy relation is produced from:
a) the frequency of occurrence of the synonymous terms in each said synonymy relation in said domain; and
b) the frequency of occurrence of words which are semantically related to the target term in said synonymy relation in said domain.

12. The method of claim 7 wherein said ranking step comprises:
providing a first ranking of said synonymy relations according to a numerical value for each said synonymy relation, where said numerical value for each said synonymy relation is produced from the frequency of occurrence of said synonymous terms in said synonymy relations in said domain;
human evaluators, acting on the rankings produced by said first ranking, further rank said synonymy relations in relation to said domain; and
further ranking said synonymy relations according to a numerical value for each said synonymy relation, where said numerical value for each said synonymy relation is produced from
a) the frequency of occurrence of the synonymous terms in each said synonymy relation in said domain; and
b) the frequency of occurrence of words which are semantically related to the target term in said synonymy relation in said domain.

13. The method of claim 3 wherein said ranking step comprises:
providing a first ranking of said synonymy relations according to the frequency of occurrence of said synonymous terms in said synonymy relations in said domain; and
human evaluators, acting on the rankings produced by said first ranking, further rank said synonymy relations in relation to said domain.

14. The method of claim 3 wherein said threshold value is selected from one of:
a pre-determined value;
a produced value; or
a value set by one or more users of the method.

15. The method of claim 1 wherein said ranking step comprises:
ranking said synonymy relations according to a numerical value for each said synonymy relation, where said numerical value is produced from
a) the frequency of occurrence of the synonymous terms in each said synonymy relation in said domain, and
b) the frequency of occurrence of words which are semantically related to the target term in said synonymy relation in said domain.

16. The method of claim 15 wherein said semantically related words in said ranking step are selected from:
the sets of synonymous terms associated with said target term;
the set of words contained in dictionary definitions of said target term; and
superordinate and subordinate terms for said target term.

17. The method of claim 16 wherein said semantically related words in said ranking step come from linguistic resources including one or more machine-readable dictionaries or machine-readable thesauri.

18. The method of claim 15 wherein the frequency of occurrence of said synonymous terms and semantically related words in said ranking step come from a plurality of corporal of data and linguistic resources including one or more machine-readable dictionaries or machine-readable thesauri.

19. The method of claim 15 wherein said ranking step comprises:
ranking said synonymy relations according to a numerical value for each said synonymy relation, where said numerical value is produced from:
a) the frequency of occurrence of the synonymous terms in each said synonymy relation in said domain, and
b) the frequency of occurrence of words which are semantically related to the target term in said synonymy relation in said domain; and
human evaluators, acting on the rankings produced by said ranking step, further ranking said synonymy relations in relation to said domain.

20. The method of claim 19 wherein said threshold value is selected from one of:
a pre-determined value;
a produced value; or
a value set by one or more users of the method.

21. The method of claim 15 wherein said threshold value is selected from one of:
a pre-determined value;
a produced value; or
a value set by one or more users of the method.

22. The method of claim 1 wherein said ranking step comprises:
human evaluators ranking said synonymy relations in relation to said domain.

23. The method of claim 22 wherein said ranking is a binary judgment.

24. The method of claim 1 wherein said threshold value is selected from one of:
a pre-determined value;
a produced value;
a value set by one or more users of the method.

25. The method of claim 1 wherein said linguistic resource adapted by said method is a machine-readable dictionary or a machine-readable thesaurus.

26. The method of claim 1 wherein said linguistic resource produced by the method is used for information retrieval or as a writer's aid.

27. The method of claim 1 carried out at least in part on a computer.

28. The method of claim 1 wherein said identifying step comprises:
identifying as either (1) redundant or (2) likely not to be used said synonymy relations in said linguistic resource which contain a single term that is the same as the target term.

29. The method of 1 wherein said identifying step comprises:
identifying as either (1) redundant or (2) likely not to be used said synonymy relations which are identical to each other in said linguistic resource;
and wherein removing said synonymy relations comprises removing all but one of said synonymy relations from said linguistic resource.

30. The method of 1 wherein said identifying step comprises:
identifying said synonymy relations that are irrelevant in said linguistic resource by producing the frequency of occurrence of said synonymous terms in synonymy relations in said domain.

31. The method of claim 1 carried out at least in part on a computer.

32. The method of claim 1 wherein said removal step comprises:
removing said synonymy relations from said linguistic resource if said frequency of occurrence is equal to or less than said threshold value.

33. The method of claim 24 wherein said pre-determined threshold value is selected from one of:
the value is set at 0;
the value is variable depending on the size of the domain.

34. A method of adapting a linguistic resource to a specific knowledge domain, wherein said linguistic resource comprises:
a plurality of target terms each having one or more meanings and
a plurality of synonymy relations where each synonymy relation forms a relation between two synonymous terms with respect to a meaning,
said method comprising the steps of:
identifying one or more of said synonymy relations from a group comprising: (1) irrelevant or (2) redundant or (3) likely not to be used in said knowledge domain; and
removing said synonymy relations from said linguistic resource;
wherein said identifying includes identifying as either (1) redundant or (2) likely not to be used said synonymy relations in said linguistic resources which contain a single term that is the same as the largest term.

35. The method of claim 34 comprising the steps of:
identifying as either (1) redundant or (2) likely not to be used said synonymy relations which are identical to each other in said linguistic resource;
and wherein removing said synonymy relations comprises
removing all but one of said synonymy relations from said linguistic resource.

36. The method of claim 34 wherein said identifying step comprises:
identifying said synonymy relations that are irrelevant in said linguistic resource by producing the frequency of occurrence of said synonymous terms in synonymy relations in said domain.

37. The method of claim 34 wherein said linguistic resource adapted by said method is a machine-readable dictionary or a machine-readable thesaurus.

38. The method of claim 34 wherein said linguistic resource produced by said method is used for information retrieval or as a writer's aid.

39. The method of claim 34 carried out at least in part on a computer.

40. The method of claim 34 further comprising the steps of:
setting a threshold value wherein said setting of said threshold value occurs either prior or subsequent to said identifying step; and
wherein said removal step further comprises removing said synonymy relations from said linguistic resource according to said threshold value.

41. The method of claim 40 wherein said identifying step comprises:
identifying said synonymy relations that are irrelevant in said linguistic resource by producing the frequency of occurrence of said synonymous terms in synonymy relations in said domain.

42. The method of claim 41 wherein said removal step comprises:
removing said synonymy relations from said linguistic resource if said frequency of occurrence is equal to or less than said threshold value.

43. The method of claim 40 wherein said threshold value is selected from one of:
a pre-determined value;
a produced value; or
a value set by users of the method.

44. The method of claim 43 wherein said pre-determined threshold value is selected from one of:
the value is value set at 0;
the value is variable depending on the size of the domain.

45. The method of claim 40 wherein said linguistic resource adapted by said method is a machine-readable dictionary or a machine-readable thesaurus.

46. The method of claim 40 wherein said linguistic resource produced by said method is used for information retrieval or as a writer's aid.

47. The method of claim 40 carried out at least in part on a computer.

48. The method of claim 34 further comprising the steps of:
identifying said synonymy relations that are irrelevant in said knowledge domain;
removing said synonymy relations from said linguistic resource;
identifying in said linguistic resource from said removal step said synonymy relations that are either (1) redundant or (2) likely not to be used in said knowledge domain; and
removing said synonymy relations from said linguistic resource;
wherein the order of the two identifying steps can be transposed.

49. The method of claim 48 carried out at least in part on a computer.

50. A computer program product for adapting a linguistic resource to a specific knowledge domain, wherein said linguistic resource comprises:
a plurality of target terms each having one or more meanings, and
a plurality of synonymy relations where each synonymy relation forms a relation between two synonymous terms with respect to a meaning,
said computer program product comprising:
a computer usable medium having computer readable program code means embodied in said medium for the steps of:
ranking said synonymy relations in relation to said domain;
identifying one or more of said synonymy relations from a group comprising: (1) irrelevant or (2) redundant or (3) likely not to be used in said knowledge domain;
setting a threshold value wherein said setting of said threshold value occurs either prior or subsequent to said identifying step; and
removing said synonymy relations from said linguistic resource according to said threshold value.

51. The computer program product of claim 50 wherein said linguistic resource adapted by said computer program product comprises a machine-readable dictionary or a machine-readable thesaurus.

52. The computer program product of claim 51 wherein said linguistic resource produced by said computer program product is used for information retrieval or as a writer's aid.

53. The computer program product of claim 50 wherein said linguistic resource adapted by said computer program product comprises a machine-readable dictionary or a machine-readable thesaurus.

54. The computer program product of claim 50 wherein said linguistic resource produced by said computer program product is used for information retrieval or as a writer's aid.

55. A computer program product for adapting a linguistic resource to a specific knowledge domain, wherein said linguistic resource comprises:
a plurality of target terms each having one or more, and
a plurality of synonymy relations where each synonymy relation forms a relation between two synonymous terms with respect to a meaning,
said computer program product comprising:
a computer usable medium having computer readable program code means embodied in said medium for:
identifying one or more of said synonymy relations from a group comprising: (1) irrelevant or (2) redundant or (3) likely not to be used in said knowledge domain;
setting a threshold value wherein said setting of said threshold value occurs either prior or subsequent to said identifying step; and
removing said synonymy relations from said linguistic resource according to said threshold value.

56. The computer program product of claim 55 wherein said linguistic resource adapted by said computer program product comprises a machine-readable dictionary or a machine-readable thesaurus.

57. The computer program product of claim 55 wherein said linguistic resource produced by said computer program product is used for information retrieval or as a writer's aid.

58. The computer program product of claim 55 further comprising:
identifying said synonymy relations that are irrelevant in said knowledge domain;
removing said synonymy relations from said linguistic resource;
identifying in said linguistic resource from said removal step said synonymy relations that are either (1) redundant or (2) likely not to be used in said knowledge domain; and
removing said synonymy relations from said linguistic resource;
wherein the order of the two said identifying steps can be transposed.

59. The computer program product of claim 58 wherein said linguistic resource adapted by said computer program product comprises a machine-readable dictionary or a machine-readable thesaurus.

60. The computer program product of claim 58 wherein said linguistic resource produced by said computer program product is used for information retrieval or as a writer's aid.
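Illustrative sketch (not part of the granted text): claim 15 ranks each synonymy relation by a numerical value produced from (a) the frequency of the synonymous terms and (b) the frequency of words semantically related to the target term, and claim 32 removes relations whose value is equal to or less than the threshold. The claims do not state how (a) and (b) are combined, so the weighted sum below, the `related_weight` parameter, the function names, and the sample counts are all assumptions made for illustration only.

```python
from collections import Counter


def relation_score(synonym, related_words, freqs, related_weight=0.5):
    """Assumed scoring: synonym frequency plus weighted related-word frequency."""
    direct = freqs[synonym]                          # part (a)
    support = sum(freqs[w] for w in related_words)   # part (b)
    return direct + related_weight * support


def rank_relations(relations, freqs):
    """Return (synonym, score) pairs, highest score first, ties broken by name."""
    scored = [
        (synonym, relation_score(synonym, related, freqs))
        for _target, synonym, related in relations
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def remove_below(ranked, threshold):
    """Remove relations whose score is equal to or less than the threshold."""
    return [(synonym, score) for synonym, score in ranked if score > threshold]


# Hypothetical domain counts and relations; each relation carries the words
# assumed to be semantically related to the target under that meaning.
freqs = Counter({"aircraft": 3, "flight": 4, "pilot": 2, "wood": 1})
relations = [
    ("plane", "planer", ["wood", "tool"]),
    ("plane", "aircraft", ["flight", "pilot", "wing"]),
]

ranked = rank_relations(relations, freqs)
kept = remove_below(ranked, threshold=1.0)
print(ranked)  # [('aircraft', 6.0), ('planer', 0.5)]
print(kept)    # [('aircraft', 6.0)]
```

In this sketch the related words let a rarely seen synonym still earn some score when its meaning is well attested in the domain, which is one plausible reading of part (b). The threshold could equally be a pre-determined value, a produced value, or a value set by a user, as the threshold claims allow.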
