US8489385B2 · Active · Utility · PatentIndex 57

Use of lexical translations for facilitating searches

Assignee: ETZIONI OREN · Priority: Nov 21, 2007 · Filed: Jun 25, 2012 · Granted: Jul 16, 2013
Est. expiry: Nov 21, 2027 (~1.4 yrs left) · nominal 20-yr term from priority
Inventors: ETZIONI OREN, REITER KOBI, SAMMER MARCUS, SCHMITZ MICHAEL, SODERLAND STEPHEN
Classifications: G06F 16/3332, G06F 16/951, G06F 40/40, G06F 40/242
PatentIndex Score: 57
Cited by: 4
References: 101
Claims: 15

Abstract

A translation graph is created using a plurality of reference sources that include translations between a plurality of different languages. Each entry in a source is used to create a wordsense entry, and each new word in a source is used to create a wordnode entry. A pair of wordnode and wordsense entries corresponds to a translation. In addition, a probability is determined for each wordsense entry and is decreased for each translation entry that includes more than a predefined number of translations into the same language. Bilingual translation entries are removed if subsumed by a multilingual translation entry. Triangulation is employed to identify pairs of common wordsense translations between a first, second, and third language. Translations not found in reference sources can also be inferred from the data comprising the translation graph. The translation graph can then be used for searches of a data collection in different languages.
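
As a rough illustration of the construction described in the abstract, the following Python sketch builds a toy translation graph from dictionary entries. The class `TranslationGraph`, the method `add_entry`, and the sample entries are hypothetical and do not come from the patent. Probability adjustment, subsumption of bilingual entries, and triangulation are left out.

```python
from collections import defaultdict


class TranslationGraph:
    """Toy translation graph; wordnodes are (language, word) pairs (an assumption)."""

    def __init__(self):
        self.senses_of = defaultdict(set)  # wordnode -> wordsense IDs
        self.nodes_of = defaultdict(set)   # wordsense ID -> wordnodes
        self._next_sense = 0

    def add_entry(self, translations):
        """Add one source entry: (language, word) pairs that share one sense.

        Each entry creates a new wordsense; each unseen word creates a wordnode.
        """
        sense = self._next_sense
        self._next_sense += 1
        for node in translations:
            self.senses_of[node].add(sense)
            self.nodes_of[sense].add(node)
        return sense

    def edges(self):
        """Yield (node_a, node_b, sense) for each pair of wordnodes sharing a sense."""
        for sense, nodes in self.nodes_of.items():
            ordered = sorted(nodes)
            for i, a in enumerate(ordered):
                for b in ordered[i + 1:]:
                    yield a, b, sense


# Hypothetical sample entries: two senses of the English word "spring".
graph = TranslationGraph()
graph.add_entry([("en", "spring"), ("fr", "printemps")])                   # season
graph.add_entry([("en", "spring"), ("fr", "ressort"), ("es", "resorte")])  # coil
all_edges = list(graph.edges())
```

In this sketch the English wordnode ends up with two wordsenses, and every edge carries the wordsense shared by the two wordnodes it connects.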

Claims

exact text as granted — not AI-modified
The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows: 
     
       1. A method for using a translation graph to search for objects, entities, or resources related to an input word input by a user in a given language,
 wherein the translation graph is usable to find words in a plurality of languages different from the given language that have wordsense meanings corresponding to wordsense meanings of the input word, and wherein the translation graph includes:
 a plurality of wordnodes; and 
 a plurality of edges; 
 wherein each wordnode is associated with a word, a language, and one or more wordsenses; and 
 wherein each edge connects tow wordnodes and is associated with a wordsense with which the two connected wordnodes are the both associated; 
 
 the method comprising:
 searching the translation graph for candidate wordnodes, wherein each candidate wordnode is associated with a language different from the given language and is connected to a wordnode of the input word via a path comprising one or more edges of the translation graph; 
 for one or more of the candidate wordnodes, determining a probability that the candidate wordnode is associated with a wordsense corresponding to a wordsense associated with the wordnode of the input word; 
 returning a set of result wordnodes that is determined based on a comparison of the determined probabilities of the one or more candidate wordnodes to a predetermined threshold; 
 supplying the result wordnodes for creating a query of a search engine; and 
 using the query, employing the search engine to search a collection of data to identify objects, entities, or resources included in the data that are relevant to a wordsense meaning of the input word, the search engine searching for tags assigned to objects, entities, or resources that include at least one word of the result wordnodes. 
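
The steps recited in claim 1 can be sketched roughly as follows. This is an illustrative Python outline, not the patented implementation: the adjacency-list layout, the function names, the precomputed `probability` table, the `>=` comparison against the threshold, and the sample tags are all assumptions made for the example.

```python
from collections import deque


def find_candidates(adjacency, start):
    """Breadth-first search for wordnodes in another language reachable from start."""
    seen, queue, candidates = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for neighbor, _sense in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
                if neighbor[0] != start[0]:  # language differs from the given one
                    candidates.append(neighbor)
    return candidates


def cross_lingual_search(adjacency, start, probability, threshold, tagged_objects):
    """Filter candidates by probability, then match their words against object tags."""
    candidates = find_candidates(adjacency, start)
    results = [n for n in candidates if probability.get(n, 0.0) >= threshold]
    query = {word for _lang, word in results}
    hits = sorted(obj for obj, tags in tagged_objects.items() if tags & query)
    return results, hits


# Hypothetical data: wordnodes are (language, word); edges carry a wordsense ID.
adjacency = {
    ("en", "spring"): [(("fr", "printemps"), 0), (("fr", "ressort"), 1)],
    ("fr", "printemps"): [(("en", "spring"), 0), (("it", "primavera"), 0)],
    ("fr", "ressort"): [(("en", "spring"), 1)],
    ("it", "primavera"): [(("fr", "printemps"), 0)],
}
# Assumed probabilities that each candidate shares the intended (season) sense.
probability = {("fr", "printemps"): 0.9, ("it", "primavera"): 0.8, ("fr", "ressort"): 0.3}
tagged = {
    "img1.jpg": {"printemps", "fleurs"},
    "img2.jpg": {"ressort"},
    "img3.jpg": {"primavera"},
}
results, hits = cross_lingual_search(adjacency, ("en", "spring"), probability, 0.5, tagged)
```

With a threshold of 0.5, the low-probability candidate "ressort" is dropped, so only objects tagged with the remaining result words are returned.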
 
 
     
     
       2. The method of claim 1, further comprising building a wordsense cluster for each wordsense associated with at least one candidate wordnode by combining all of the candidate wordnodes associated with each such wordsense that can be reached by a path starting at the wordnode of the input word and comprising no greater than a predefined maximum number of edges of the translation graph.
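
One way to picture the cluster building in claim 2 is a depth-limited breadth-first search followed by grouping on wordsense. The sketch below is a simplified illustration under assumed data structures; `build_wordsense_clusters`, `senses_of`, and the sample graph are hypothetical names and data.

```python
from collections import defaultdict, deque


def build_wordsense_clusters(adjacency, senses_of, start, max_edges):
    """Group candidate wordnodes within max_edges of start by their wordsenses."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_edges:
            continue  # do not expand past the path-length limit
        for neighbor, _sense in adjacency.get(node, []):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    clusters = defaultdict(set)
    for node in depth:
        if node[0] != start[0]:  # candidates are in a different language
            for sense in senses_of[node]:
                clusters[sense].add(node)
    return dict(clusters)


# Hypothetical data: wordnodes are (language, word); wordsenses are integer IDs.
adjacency = {
    ("en", "spring"): [(("fr", "printemps"), 0), (("fr", "ressort"), 1)],
    ("fr", "printemps"): [(("en", "spring"), 0), (("it", "primavera"), 0)],
    ("fr", "ressort"): [(("en", "spring"), 1)],
    ("it", "primavera"): [(("fr", "printemps"), 0)],
}
senses_of = {
    ("en", "spring"): {0, 1},
    ("fr", "printemps"): {0},
    ("fr", "ressort"): {1},
    ("it", "primavera"): {0},
}
one_hop = build_wordsense_clusters(adjacency, senses_of, ("en", "spring"), 1)
two_hops = build_wordsense_clusters(adjacency, senses_of, ("en", "spring"), 2)
```

Raising the edge limit from 1 to 2 lets the Italian wordnode join the season cluster, because it is only reachable through the French wordnode.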
     
     
       3. The method of claim 2, further comprising removing wordnodes in each wordsense cluster that have a probability less than a predetermined threshold value.
     
     
       4. The method of claim 2, further comprising merging together wordsense clusters based upon a size of the wordsense clusters and upon a number of wordnodes that the wordsense clusters have in common.
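
Claim 4 does not state a specific merge rule, so the following Python sketch assumes one for illustration only: two clusters are merged when the wordnodes they share cover at least a fraction `min_overlap` of the smaller cluster. The function name, the parameter, and the sample clusters are hypothetical.

```python
def merge_clusters(clusters, min_overlap=0.5):
    """Greedily merge clusters whose overlap covers min_overlap of the smaller one."""
    merged = [set(c) for c in clusters]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                a, b = merged[i], merged[j]
                smaller = min(len(a), len(b))
                if smaller and len(a & b) / smaller >= min_overlap:
                    merged[i] = a | b  # fold cluster j into cluster i
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan after every merge
    return merged


# Hypothetical clusters of (language, word) wordnodes.
season_a = {("fr", "printemps"), ("it", "primavera"), ("nl", "lente")}
season_b = {("fr", "printemps"), ("it", "primavera")}
coil = {("fr", "ressort"), ("es", "resorte")}
merged = merge_clusters([season_a, season_b, coil])
```

Here the two season clusters collapse into one because the smaller is fully contained in the larger, while the coil cluster shares no wordnodes and stays separate.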
     
     
       5. The method of claim 1, wherein determining the probability that the candidate wordnode is associated with a wordsense corresponding to a wordsense associated with the wordnode of the input word comprises:
 determining a path probability for each path in the translation graph that ends at a candidate wordnode by multiplying together probabilities associated with each wordnode of the path; and 
 using either maximum of the path probabilities or a noisy-or calculation to determine the probability that the candidate wordnode is associated with a wordsense corresponding to a wordsense associated with the wordnode of the input word. 
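
The two combination options named in claim 5 can be illustrated with a short Python sketch. The function names and the sample probabilities are assumptions; the noisy-or form used here is the standard one, 1 minus the product of (1 - p) over the path probabilities, which treats the paths as independent evidence.

```python
from math import prod


def path_probability(node_probs):
    """Multiply the probabilities associated with the wordnodes along one path."""
    return prod(node_probs)


def combine_max(path_probs):
    """Take the single most probable path."""
    return max(path_probs, default=0.0)


def combine_noisy_or(path_probs):
    """Noisy-or: probability that at least one path is a correct translation chain."""
    return 1.0 - prod(1.0 - p for p in path_probs)


# Hypothetical example: two paths reach the same candidate wordnode.
paths = [[0.9, 0.8], [0.5, 0.5]]
path_probs = [path_probability(p) for p in paths]
best = combine_max(path_probs)
noisy = combine_noisy_or(path_probs)
```

For these sample values the path probabilities are about 0.72 and 0.25, so the maximum is about 0.72 while the noisy-or result is about 0.79; noisy-or rewards a candidate for being reachable by several paths.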
 
     
     
       6. The method of claim 1, wherein employing the search engine includes:
 using the search engine to search for relevant images based upon keyword tags associated with the images that are in languages which are different from the given language. 
 
     
     
       7. The method of claim 1, wherein the collection of data comprises an indexed database of the Internet, and wherein employing the search engine includes searching the indexed database of the Internet to identify objects, entities, or resources relevant to the input word based upon the result wordnodes.
     
     
       8. The method of claim 1, wherein employing the search engine includes:
 using the search engine to search for one or more ads associated with a keyword, wherein the keyword has been identified as having a common wordsense with the input word, and wherein the keyword is associated with a language other than the given language. 
 
     
     
       9. A system for using a translation graph to search for objects; entities, or resources related to an input word input by a user in a given language, wherein the translation graph is usable to find words in a plurality of languages different from the given language that have wordsense meanings corresponding to wordsense meanings of the input word
 wherein the translation graph includes:
 a plurality of wordnodes; and 
 a plurality of edges; 
 wherein each wordnode is associated with a word, a language, and one or more wordsenses; and 
 wherein each edge connects two wordnodes and is associated with a wordsense with which the two connected wordnodes are both associated; 
 
 and wherein the system comprises:
 a memory for storing data and machine instructions; 
 a user input device enabling a user to input text and control the system; 
 a display for displaying text and graphics; and 
 a processor that is coupled to the memory, the user input device, and the display, the processor configured to execute the machine instructions stored in the memory to cause the system to carry out a plurality of functions, including:
 searching the translation graph for candidate wordnodes, wherein each candidate wordnode is associated with a language different from the given language and is connected to a wordnode of the input word via a path comprising one or more edges of the translation graph; 
 for one or more of the candidate wordnodes determining a probability that the candidate wordnode has a wordsense corresponding to a wordsense of the wordnode of the input word; 
 returning a set of result wordnodes that is determined based on a comparison of the determined probabilities of the one or more candidate wordnodes to a predetermined threshold; 
 supplying the result wordnodes for creating a query of a search engine; and 
 using the query, employing the search engine to search a collection of data to identify objects, entities, or resources included in the data that are relevant to a wordsense meaning of the input word, the search engine searching for tags assigned to objects, entities, or resources that include at least one word of the result wordnodes. 
 
 
 
     
     
       10. The system of claim 9, wherein the plurality of functions further include building a wordsense cluster for each wordsense associated with at least one candidate wordnode by combining all of the candidate wordnodes associated with each such wordsense that can be reached by a path starting at the wordnode of the input word and comprising no greater than a predefined maximum number of edges of the translation graph.
     
     
       11. The system of claim 10, wherein the plurality of functions further include removal of wordnodes in each wordsense cluster that have a probability less than a predetermined threshold value.
     
     
       12. The system of claim 10, wherein the plurality of functions further include merging together wordsense clusters based upon a size of the wordsense clusters and upon a number of wordnodes that the wordsense clusters have in common.
     
     
       13. The system of claim 9, wherein determining the probability that the candidate wordnode has a wordsense corresponding to a wordsense of the wordnode of the input word comprises:
 determining a path probability for each path in the translation graph that ends at a candidate wordnode by multiplying together probabilities associated with each wordnode of the path; and 
 using either a maximum of the path probabilities or a noisy-or calculation to determine the probability that the candidate wordnode has a wordsense corresponding to a wordsense of the wordnode of the input word. 
 
     
     
       14. The system of claim 9, wherein employing the search engine includes:
 using the search engine to search for relevant images based upon keyword tags associated with the images that are in languages which are different from the given language. 
 
     
     
       15. The system of claim 9, wherein employing the search engine includes:
 using the search engine to search for one or more ads associated with a keyword, wherein the keyword has been identified as having a common wordsense with the input word, and wherein the keyword is associated with a language other than the given language.
