P
US8290945B2ActiveUtilityPatentIndex 59

Web searching

Assignee: CHELLAPILLA KUMARPriority: Mar 27, 2008Filed: Sep 29, 2010Granted: Oct 16, 2012
Est. expiryMar 27, 2028(~1.7 yrs left)· nominal 20-yr term from priority
Inventors:CHELLAPILLA KUMARMITYAGIN ANTONWANG XUANHUI
G06F 16/953G06F 16/951G06F 16/9566
59
PatentIndex Score
2
Cited by
24
References
20
Claims

Abstract

A human or hand-labeled ranking of URL results for a search query is compared against actual click data for the respective query/URL pairs (e.g., which URLs were actually clicked on by users when the URLs were presented to users when the search query was run in the real world). The human ranking or ordering of the URL results (e.g., pre-existing relevance ranking) for the query can then be adjusted, if necessary, based upon the real world click data (e.g., click relevance ranking). The modified pre-existing relevance ranking can be used in providing future search results.

Claims

exact text as granted — not AI-modified
1. A method for improving relevance of web searches for a query, comprising:
 computing a click relevance ranking of a plurality of query and URL pairs based upon user log data comprising user click data, the computing comprising:
 aggregating the user log data by query and URL; 
 reducing click bias by determining a normalized click rate; 
 creating a click relevance ordering of the plurality of query and URL pairs; 
 creating a directed acyclic graph of a relevance relationship between the plurality of query and URL pairs; and 
 mapping the directed acyclic graph into a linear ordering; and 
 
 identifying and correcting mislabeled query and URL pairs within a pre-existing relevance ranking based upon the click relevance ranking, at least some of at least one of the computing or the identifying performed at least in part via a processing unit. 
 
     
     
       2. The method of  claim 1 , the user log data comprising at least one of:
 respective ranks assigned to query and URL pairs at one or more times; 
 respective total numbers of impressions associated with the ranks; or 
 respective total number of clicks associated with the ranks. 
 
     
     
       3. The method of  claim 1 , the identifying and correcting comprising evaluating at least one of: total number of clicks, click rates, normalized click rates, or total number of impressions corresponding to a first URL and a second URL to determine whether the first URL or the second URL is more relevant to a query. 
     
     
       4. The method of  claim 1 , comprising modifying the pre-existing relevance ranking based upon the corrected query and URL pairs. 
     
     
       5. The method of  claim 1 , comprising mapping the directed acyclic graph into the linear ordering using a flooding technique. 
     
     
       6. The method of  claim 1 , the user log data associated with a specific individual. 
     
     
       7. The method of  claim 1 , comprising:
 determining a longest common subsequence (LCS) of query and URL pairs that is decreasing in both the pre-existing and click relevance ranking; and 
 removing labels from query and URL pairs which are not in the LCS. 
 
     
     
       8. The method of  claim 1 , comprising:
 determining a longest common subsequence (LCS) of query and URL pairs that is decreasing in both the pre-existing and click relevance ranking; 
 assigning pre-existing relevance ranking labels associated with the LCS of query and URL pairs to the click relevance ranking; and 
 relabeling a label associated with a query and URL pair not in the LCS with a new label interpolated from the click relevance ranking. 
 
     
     
       9. The method of  claim 1 , comprising:
 computing a distribution of labels in the pre-existing relevance ranking; and 
 relabeling one or more labels associated with the query and URL pairs in the click relevance ranking according to the distribution of labels in the pre-existing relevance ranking. 
 
     
     
       10. A system for improving relevance of web searches for a query, comprising:
 one or more processing units; and 
 memory comprising instructions that when executed by at least some of the one or more processing units implement the following alone or in combination with hardware:
 a click relevance ranking component configured to:
 compute a click relevance ranking of a plurality of query and URL pairs based upon user log data comprising user click data, the computing comprising:
 aggregating the user log data by query and URL pair; 
 reducing click bias by determining normalized click rate; 
 creating a click relevance ordering of the plurality of query and URL pairs; 
 creating a directed acyclic graph of a relevance relationship between the plurality of query and URL pairs; and 
 mapping the directed acyclic graph into a linear ordering; and 
 
 
 a dynamic program component configured to:
 identify and correct mislabeled query and URL pairs within a pre-existing relevance ranking based upon the click relevance ranking. 
 
 
 
     
     
       11. The system of  claim 10 , the user log data comprising at least one of:
 respective ranks assigned to query and URL pairs at one or more times; 
 respective total numbers of impressions associated with the ranks; or 
 respective total number of clicks associated with the ranks. 
 
     
     
       12. The system of  claim 10 , the dynamic program component configured to:
 evaluate at least one of: total number of clicks, click rates, normalized click rates, or total number of impressions corresponding to a first URL and a second URL to determine whether the first URL or the second URL is more relevant to a query. 
 
     
     
       13. The system of  claim 10 , the dynamic program component configured to:
 modify the pre-existing relevance ranking based upon the corrected query and URL pairs. 
 
     
     
       14. The system of  claim 10 , the click relevance ranking component configured to:
 map the directed acyclic graph into the linear ordering using a flooding technique. 
 
     
     
       15. The system of  claim 10 , the dynamic program component configured to:
 determine a longest common subsequence (LCS) of query and URL pairs that is decreasing in both the pre-existing and click relevance ranking; and 
 remove a label from a query and URL pair which is not in the LCS. 
 
     
     
       16. The system of  claim 10 , the dynamic program component configured to:
 determine a longest common subsequence (LCS) of query and URL pairs that is decreasing in both the pre-existing and click relevance ranking; 
 assign pre-existing relevance ranking labels associated with the LCS of query and URL pairs to the click relevance ranking; and 
 relabel a label associated with a query and URL pair not in the LCS with a new label interpolated from the click relevance ranking. 
 
     
     
       17. The system of  claim 10 , the dynamic program component configured to:
 compute a distribution of labels in the pre-existing relevance ranking; and 
 relabel one or more labels associated with the query and URL pairs in the click relevance ranking according to the distribution of labels in the pre-existing relevance ranking. 
 
     
     
       18. A tangible computer-readable storage medium comprising processor-executable instructions that when executed perform a method for improving relevance of web searches for a query, comprising:
 computing a click relevance ranking of a plurality of query and URL pairs based upon user log data comprising user click data, the computing comprising:
 aggregating the user log data by query and URL pair; 
 reducing click bias by determining normalized click rate; 
 creating a click relevance ordering of the plurality of query and URL pairs; 
 creating a directed acyclic graph of a relevance relationship between the plurality of query and URL pairs; and 
 mapping the directed acyclic graph into a linear ordering; and 
 
 identifying and correcting mislabeled query and URL pairs within a pre-existing relevance ranking based upon the click relevance ranking. 
 
     
     
       19. The tangible computer-readable storage medium of  claim 18 , the user log data comprising at least one of:
 respective ranks assigned to query and URL pairs at one or more times; 
 respective total numbers of impressions associated with the ranks; or 
 respective total number of clicks associated with the ranks. 
 
     
     
       20. The tangible computer-readable storage medium of  claim 18 , the identifying and correcting comprising:
 evaluating at least one of: total number of clicks, click rates, normalized click rates, or total number of impressions corresponding to a first URL and a second URL to determine whether the first URL or the second URL is more relevant to a query.

Cited by (0)

No later patents cite this yet.

References (0)

No backward citations on record.