P
US8370326B2ActiveUtilityPatentIndex 52

System and method for parallel computation of frequency histograms on joined tables

Assignee: IBMPriority: Mar 24, 2009Filed: Mar 24, 2009Granted: Feb 5, 2013
Est. expiryMar 24, 2029(~2.7 yrs left)· nominal 20-yr term from priority
Inventors:BENDEL PETERDRAESE OLIVERRAMAN VIJAYSHANKARSTOLZE KNUT
G06F 16/284
52
PatentIndex Score
0
Cited by
39
References
21
Claims

Abstract

According to one embodiment of the present invention, a method for the parallel computation of frequency histograms in joined tables is provided. The method includes reading data in a table row-by-row from a database system using a coordinator unit and distributing each read row to separate worker units. Each worker unit computes a partial frequency histogram for each column in the table in parallel. The partial histograms from the worker units are then merged and the coordinator unit sends the merged frequency histograms to the worker units.

Claims

exact text as granted — not AI-modified
1. A method comprising:
 reading rows from a table at a first join depth by a coordinator unit, said table part of a database system; 
 distributing each read row to separate worker units; 
 computing a partial frequency histogram for each column in said table at said first join depth using each worker unit in parallel; 
 merging partial histograms from said worker units; 
 sending said merged frequency histograms to said worker units using said coordinator unit; 
 reading rows from said table at a second join depth by said coordinator unit; 
 distributing each read row at said second join depth to one of said worker units; 
 said worker units computing frequency histograms for each column in said table at said second join depth using said merged frequency histograms from said first join depth tables; and 
 outputting said computed frequency histograms at said second join depth. 
 
     
     
       2. A method according to  claim 1 , wherein said table at a first join depth is a fact table and said table at a second join depth is a dimension table. 
     
     
       3. A method according to  claim 1  wherein said coordinator unit distributes rows evenly to worker units regardless of the data therein. 
     
     
       4. A method according to  claim 1  wherein said coordinator unit distributes rows to worker units based on the data in each row. 
     
     
       5. A method according to  claim 1  wherein said worker units begin said computing as soon as a first row is received from said coordinator unit. 
     
     
       6. A method according to  claim 1  wherein aid worker units perform said computing before a join operation is performed on said table. 
     
     
       7. A method according to  claim 1  wherein said computing further comprises, determining if a column value is already in said histogram, and if so, increasing a frequency count by one, and if not, adding said column value to said histogram value with a frequency count equal to one. 
     
     
       8. A method according to  claim 1  wherein said merging comprises summing up the frequencies from said partial histograms. 
     
     
       9. A method according to  claim 1 , wherein said worker units compute frequency histograms for each column in said second join depth by determining if a value in said column is already in said histogram, and if so, then increasing the frequency count for the column value in the histogram by a dependent-frequency value, and if not, adding said column value to said histogram with a frequency count equal to said dependent-frequency value. 
     
     
       10. A method according to  claim 9 , wherein said dependent-frequency value is the frequency of a primary key value in a frequency histogram of a foreign key column in said first join depth table. 
     
     
       11. A method according to  claim 1 , wherein said table is at a second join depth, said method further comprising:
 said coordinator unit reading rows from a table at a third join depth and distributing each said third join depth table to one of said worker units; and 
 said worker units computing frequency histograms for each column in said third join depth table using said merged frequency histograms from said second join depth tables. 
 
     
     
       12. A method comprising:
 reading rows in a table at a first join depth by a coordinator unit, said table part of a database system, wherein said table at a first join depth is a fact table; 
 distributing each read row to separate worker units; 
 computing a partial frequency histogram for each column in said table at said first join depth using each worker in parallel; 
 merging partial histograms from said worker units; 
 sending said merged frequency histograms to said worker units using said coordinator unit; 
 reading rows from said table at a second join depth by said coordinator unit, wherein said table at a second join depth is a dimension table; 
 distributing each read row at said second join depth to one of said worker units; 
 said worker units computing frequency histograms for each column in said table at said second join depth using said merged frequency histograms from said first join depth tables; and 
 outputting said computed frequency histograms at said second join depth. 
 
     
     
       13. A method according to  claim 12  further comprising
 determining column partitions based on said histograms; 
 determining a dictionary for each for each column partition; 
 making said dictionaries available to each worker units; 
 encoding data from which said frequency histograms were originally generated. 
 
     
     
       14. A method according to  claim 13  wherein said coordinator unit distributes rows evenly to worker units regardless of the data therein. 
     
     
       15. A method according to  claim 13  wherein said coordinator unit distributes rows to worker units based on the data in each row. 
     
     
       16. A database system comprising:
 a processor; and 
 computer storage storing computer readable program code, said computer readable program code executed by processor to:
 read rows from a table at a first join depth by a coordinator unit; 
 distribute each read row to separate worker units; 
 compute a partial frequency histogram for each column in said table at said first join depth using each worker unit in parallel; 
 merge partial histograms from said worker units; and 
 send said merged frequency histograms to said worker units using said coordinator unit; 
 read rows from said table at a second join depth by said coordinator unit; 
 distribute each read row at said second join depth to one of said worker units; 
 compute frequency histograms for each column in said table at said second join depth using said merged frequency histograms from said first join depth tables; and 
 output said computed frequency histograms at said second join depth. 
 
 
     
     
       17. A computer program product comprising a non-transitory computer useable medium having computer readable program code embodied therein which, when executed by a computer, implements a method for computing frequency histograms, said medium comprising:
 a computer usable medium having computer usable program code embodied therewith, said computer usable program code comprising: 
 computer usable program code configured to: 
 read rows in a table at a first join depth using a coordinator unit, said table part of a database system; 
 distribute each read row to separate worker units; 
 compute a partial frequency histogram for each column in said table at said first join depth using each worker unit in parallel; 
 merge partial histograms from said worker units; 
 send said merged frequency histograms to said worker units using said coordinator unit; 
 read rows from said table at a second join depth by said coordinator unit; 
 distribute each read row at said second join depth to one of said worker units; 
 compute frequency histograms for each column in said table at said second join depth using said merged frequency histograms from said first join depth tables; and 
 output said computed frequency histograms at said second join depth. 
 
     
     
       18. A computer program product according to  claim 17  wherein said coordinator unit distributes rows evenly to worker units regardless of the data therein. 
     
     
       19. A computer program product according to  claim 17  wherein said coordinator unit distributes rows to worker units based on the data in each row. 
     
     
       20. A computer program product according to  claim 17  wherein said worker units begin said computing as soon as a first row is received from said coordinator unit. 
     
     
       21. A computer program product according to  claim 17  wherein said worker units perform said computing before a join operation is performed on said table.

Cited by (0)

No later patents cite this yet.

References (0)

No backward citations on record.