P
US10831747B2ActiveUtilityPatentIndex 52

Multi stage aggregation using digest order after a first stage of aggregation

Assignee: IBMPriority: Apr 7, 2014Filed: Oct 26, 2018Granted: Nov 10, 2020
Est. expiryApr 7, 2034(~7.8 yrs left)· nominal 20-yr term from priority
Inventors:DICKIE GARTH A
G06F 7/36G06F 16/244
52
PatentIndex Score
0
Cited by
16
References
15
Claims

Abstract

According to embodiments of the present invention, methods, systems and computer-readable media are presented for processing a database query. The query may specify an arrangement for resulting data. A digest is generated for each of a plurality of database object elements. The plurality of database object elements are grouped or mapped into one or more groups based on the digest to arrange the database object elements in digest order. The database object elements from the one or more groups are extracted and/or processed in order of the digest, in accordance with the specified arrangement.

Claims

exact text as granted — not AI-modified
What is claimed is: 
     
       1. A method of processing a database query including an aggregation operation and table grouping columns by a plurality of processing nodes each including a processor, comprising:
 generating, at each processing node, a digest for each of a plurality of database object elements of that processing node based on a first mathematical hash function applied to the table grouping columns that provides unordered digests across the database object elements; 
 mapping, at each processing node, the plurality of database object elements of that processing node into a plurality of groups within a hash table based on a second mathematical hash function applied to a corresponding digest, wherein the second mathematical hash function preserves order of the digests within the hash table; 
 extracting, at each processing node, the database object elements of that processing node sequentially from the plurality of groups in the hash table in digest order; 
 performing, at each processing node, the aggregation operation on the database object elements of that processing node in digest order to produce aggregation information; 
 transferring the database object elements and aggregation information in digest order between the plurality of processing nodes, wherein the database object elements within a same group and including a same digest are transferred to a same processing node; 
 applying, at each processing node, a merge sort to the transferred database object elements in digest order and aggregating the sorted database object elements with a same digest; and 
 producing resulting data for the database query grouped by the database table grouping columns based on the aggregated sorted database object elements. 
 
     
     
       2. The method of  claim 1 , wherein the query specifies an arrangement for resulting data. 
     
     
       3. The method of  claim 2 , wherein mapping the plurality of database object elements at each of the plurality of processing nodes includes:
 applying data of the database table grouping columns from the plurality of database object elements of that processing node to the first mathematical hash function to determine the digest; and 
 determining groups for the database object elements based on the digest. 
 
     
     
       4. The method of  claim 1 , further comprising:
 applying, at each processing node, data of the database object elements of that processing node to the hash table to determine database object elements within a same aggregation bucket. 
 
     
     
       5. The method of  claim 1 , further comprising:
 applying, at each processing node, data of additional database object elements of that processing node to the hash table to determine database object elements within a same aggregation bucket, wherein the hash table is in an approximate digest order based on the additional database object elements; and 
 extracting, at each processing node, the database object elements of that processing node from the hash table and placing the extracted database object elements in a fully-sorted digest order. 
 
     
     
       6. A system for processing a database query including an aggregation operation and table grouping columns comprising:
 a memory; and 
 a plurality of processing nodes each including a processor, wherein the plurality of processing nodes is configured to:
 generate at each processing node a digest for each of a plurality of database object elements of that processing node based on a first mathematical hash function applied to the table grouping columns that provides unordered digests across the database object elements; 
 map at each processing node the plurality of database object elements of that processing node into a plurality of groups within a hash table based on a second mathematical hash function applied to a corresponding digest, wherein the second mathematical hash function preserves order of the digests within the hash table; 
 extract at each processing node the database object elements of that processing node sequentially from the plurality of groups in the hash table in digest order; 
 perform at each processing node the aggregation operation on the database object elements of that processing node in digest order to produce aggregation information; 
 transfer the database object elements and aggregation information in digest order between the plurality of processing nodes, wherein the database object elements within a same group and including a same digest are transferred to a same processing node; 
 apply at each processing node a merge sort to the transferred database object elements in digest order and aggregate the sorted database object elements with a same digest; and 
 produce resulting data for the database query grouped by the database table grouping columns based on the aggregated sorted database object elements. 
 
 
     
     
       7. The system of  claim 6 , wherein the query specifies an arrangement for resulting data. 
     
     
       8. The system of  claim 7 , wherein each of the plurality of processing nodes is further configured to:
 apply data of the database table grouping columns from the plurality of database object elements of that processing node to the first mathematical hash function to determine the digest; and 
 determine groups for the database object elements based on the digest. 
 
     
     
       9. The system of  claim 6 , wherein each of the plurality of processing nodes is configured to apply data of the database object elements of that processing node to the hash table to determine database object elements within a same aggregation bucket. 
     
     
       10. The system of  claim 6 , wherein each of the plurality of processing nodes is configured to:
 apply data of additional database object elements of that processing node to the hash table to determine database object elements within a same aggregation bucket, wherein the hash table is in an approximate digest order based on the additional database object elements; and 
 extract the database object elements of that processing node from the hash table and place the extracted database object elements in a fully-sorted digest order. 
 
     
     
       11. A computer program product for processing a database query including an aggregation operation and table grouping columns, comprising one or more non-transitory computer readable storage media collectively having computer readable program code embodied therewith, the computer readable program code, when executed by a processor of a plurality of processing nodes, causes the plurality of processing nodes to: generate at each processing node a digest for each of a plurality of database object elements of that processing node based on a first mathematical hash function applied to the table grouping columns that provides unordered digests across the database object elements; map at each processing node the plurality of database object elements of that processing node into a plurality of groups within a hash table based on a second mathematical hash function applied to a corresponding digest, wherein the second mathematical hash function preserves order of the digests within the hash table; extract at each processing node the database object elements of that processing node sequentially from the plurality of groups in the hash table in digest order; perform at each processing node the aggregation operation on the database object elements of that processing node in digest order to produce aggregation information; transfer the database object elements and aggregation information in digest order between the plurality of processing nodes, wherein the database object elements within a same group and including a same digest are transferred to a same processing node; apply at each processing node a merge sort to the transferred database object elements in digest order and aggregate the sorted database object elements with a same digest; and produce resulting data for the database query grouped by the database table grouping columns based on the aggregated sorted database object elements. 
     
     
       12. The computer program product of  claim 11 , wherein the query specifies an arrangement for resulting data. 
     
     
       13. The computer program product of  claim 12 , wherein the computer readable program code is further configured to cause each of the plurality of processing nodes to:
 apply data of the database table grouping columns from the plurality of database object elements of that processing node to the first mathematical hash function to determine the digest; and 
 determine groups for the database object elements based on the digest. 
 
     
     
       14. The computer program product of  claim 11 , wherein the computer readable program code is further configured to cause each of the plurality of processing nodes to:
 apply data of the database object elements of that processing node to the hash table to determine database object elements within a same aggregation bucket. 
 
     
     
       15. The computer program product of  claim 11 , wherein the computer readable program code is further configured to cause each of the plurality of processing nodes to:
 apply data of additional database object elements of that processing node to the hash table to determine database object elements within a same aggregation bucket, wherein the hash table is in an approximate digest order based on the additional database object elements; and 
 extract the database object elements of that processing node from the hash table and place the extracted database object elements in a fully-sorted digest order.

Cited by (0)

No later patents cite this yet.

References (0)

No backward citations on record.