P
US7937264B2ExpiredUtilityPatentIndex 84

Leveraging unlabeled data with a probabilistic graphical model

Assignee: MICROSOFT CORPPriority: Jun 30, 2005Filed: Jun 30, 2005Granted: May 3, 2011
Est. expiryJun 30, 2025(expired)· nominal 20-yr term from priority
Inventors:BURGES CHRISTOPHER J CPLATT JOHN C
G06F 16/355
84
PatentIndex Score
12
Cited by
21
References
17
Claims

Abstract

A general probabilistic formulation referred to as ‘Conditional Harmonic Mixing’ is provided, in which links between classification nodes are directed, a conditional probability matrix is associated with each link, and where the numbers of classes can vary from node to node. A posterior class probability at each node is updated by minimizing a divergence between its distribution and that predicted by its neighbors. For arbitrary graphs, as long as each unlabeled point is reachable from at least one training point, a solution generally always exists, is unique, and can be found by solving a sparse linear system iteratively. In one aspect, an automated data classification system is provided. The system includes a data set having at least one labeled category node in the data set. A semi-supervised learning component employs directed arcs to determine the label of at least one other unlabeled category node in the data set.

Claims

exact text as granted — not AI-modified
1. An automated data classification system, comprising:
 a processor; 
 a memory accessed by the processor, the memory comprising:
 a data set stored in the memory and having at least one labeled category node in the data set; and 
 a semi-supervised learning component stored in the memory and configured to be executed by the processor that employs directed arcs to determine at least one other category node from the data set, the category nodes configured for determining the posterior of any adjacent category node, the directed arcs having an associated conditional probability matrix, the semi-supervised learning component further employing multiple classifier outputs and a smoothing component to alter a weight applied to the multiple classifier outputs by adjusting conditionals of the directed arcs, the semi-supervised learning component further computing a distribution at a selected node of the category nodes, given the distributions of neighbor nodes of the selected node and the conditional probability matrix associated with each of the directed arcs from the neighbor nodes to the selected node, such that a number of bits to describe the distribution is minimized, wherein a subset of the directed arcs comprise loops. 
 
 
     
     
       2. The system of  claim 1 , the learning component is employed to perform at least one of data classification, data regression, data clustering, and data ranking. 
     
     
       3. The system of  claim 1 , further comprising a plurality of nodes where each node in the plurality of nodes are associated with a differing or similar number of classes. 
     
     
       4. The system of  claim 1 , wherein each directed arc is associated with a conditional probability matrix. 
     
     
       5. The system of  claim 1 , wherein each node is coupled to one or more other nodes, each coupling performed by the directed arcs. 
     
     
       6. The system of  claim 1 , further comprising one or more learning models that are associated with inductive or transductive algorithms. 
     
     
       7. The system of  claim 6 , the one or more learning models are associated with at least one classifier. 
     
     
       8. The system of  claim 1 , wherein the classifier outputs are mapped to posterior probabilities. 
     
     
       9. The system of  claim 8 , wherein the classifier outputs are thresholded. 
     
     
       10. The system of  claim 1 , further comprising a plurality of graphs. 
     
     
       11. The system of  claim 10 , further comprising a model averaging component that is applied over posteriors assigned to each node across the plurality of graphs. 
     
     
       12. The system of  claim 1 , the learning component is applied to single-class or multi-class training sets. 
     
     
       13. The system of  claim 1 , wherein the semi-supervised learning component computes the distribution such that the number of bits to describe the distribution is minimized by minimizing a divergence. 
     
     
       14. A method comprising:
 determining, by one or more processors configured with executable instructions, at least one labeled category node from a data set; 
 determining, by the one or more processors, directed arcs to at least one other non-labeled node in the data set; 
 assigning, by the one or more processors, a conditional probability matrix to each directed arc, wherein the category nodes are configured for determining the posterior of any adjacent category node; 
 applying, by the one or more processors, model smoothing over a plurality of graphs by altering a weight applied to multiple classifier outputs by adjusting conditionals of the directed arcs; and 
 computing, by the one or more processors, a distribution at a selected node of the category nodes, given the distributions of neighbor nodes of the selected node and the conditional probability matrix associated with each of the directed arcs from the neighbor nodes to the selected node, such that a number of bits to describe the distribution is minimized, wherein a subset of the directed arcs comprise loops. 
 
     
     
       15. The method of  claim 14 , the program further comprising incorporating the outputs of at least one classifier from a set of classifiers. 
     
     
       16. The method of  claim 14 , the program further comprising applying model averaging over a plurality of graphs. 
     
     
       17. A system to facilitate automated data categorization, comprising:
 a processor; 
 a memory accessed by the processor, the memory comprising:
 a data set stored in the memory and having at least one labeled category node in the data set, each set of the data set associated with a differing class; and 
 a semi-supervised learning component stored in the memory and configured to be executed by the processor to:
 determine at least one labeled node from the data set; 
 determine a respective conditional probability matrix for each of the directed arcs in order to automatically determine categories for a set of unlabeled data nodes, the respective matrix being rectangular, wherein the category nodes are configured for determining the posterior of any adjacent category node thereby automatically categorizing the unlabeled data node; 
 apply model smoothing over a plurality of graphs by altering a weight applied to multiple classifier outputs by adjusting conditionals of the directed arcs; and 
 compute a distribution at a selected node of the category nodes, given the distributions of neighbor nodes of the selected node and the conditional probability matrix associated with each of the directed arcs from the neighbor nodes to the selected node, such that a number of bits to describe the distribution is minimized, wherein a subset of the directed arcs comprise loops.

Cited by (0)

No later patents cite this yet.

References (0)

No backward citations on record.