US8856071B2 · Active · Utility

Minimizing staleness in real-time data warehouses

Assignee: GOLAB LUKASZ · Priority: Aug 11, 2009 · Filed: Aug 11, 2009 · Granted: Oct 7, 2014
Est. expiry: Aug 11, 2029 (~3.1 yrs left) · nominal 20-yr term from priority
Inventors: GOLAB LUKASZ, BATENI MOHAMMAD HOSSEIN, HAJIAGHAYI MOHAMMAD, KARLOFF HOWARD
Classifications: G06F 16/24532, G06F 17/30445
PatentIndex Score: 58 · Cited by: 3 · References: 29 · Claims: 16

Abstract

Data tables in data warehouses are updated to minimize staleness and stretch of the data tables. New data is received from external sources and, in response, update requests are generated. Accumulated update requests may be batched. Data tables may be weighted to affect the order in which update requests are serviced.
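The abstract's two central ideas, staleness and per-table weighting, can be made concrete with a small calculation. The sketch below assumes a definition that this page does not spell out: a table's staleness at time t is t minus the arrival time of the newest data it already contains, and the quantity a scheduler would minimize is the integral of that sawtooth over a time horizon, with each table's integral multiplied by a weight (the "priority weighted staleness" of claims 8 and 9). The function names and the table names `orders` and `clicks` are hypothetical.

```python
def total_staleness(completions, horizon, start_freshness=0.0):
    """Integrate staleness S(t) = t - F(t) over [0, horizon].

    completions: (completion_time, new_freshness) pairs sorted by time; from
    completion_time on, the table holds data up to new_freshness. F(t) is
    assumed to equal start_freshness until the first completion.
    """
    total = 0.0
    seg_start, f = 0.0, start_freshness
    for done, new_f in completions:
        if done >= horizon:
            break
        # On [seg_start, done) staleness grows linearly as t - f,
        # so its integral is ((done - f)^2 - (seg_start - f)^2) / 2.
        total += ((done - f) ** 2 - (seg_start - f) ** 2) / 2
        seg_start, f = done, new_f
    # Last segment runs from the final applied update to the horizon.
    total += ((horizon - f) ** 2 - (seg_start - f) ** 2) / 2
    return total


def weighted_staleness(tables, weights, horizon):
    """Priority-weighted staleness: each table's staleness times its weight."""
    return sum(weights[name] * total_staleness(comps, horizon)
               for name, comps in tables.items())


# Hypothetical history: `orders` is refreshed once, `clicks` never.
history = {"orders": [(4, 3)], "clicks": []}
print(total_staleness(history["orders"], 6))                           # 12.0
print(weighted_staleness(history, {"orders": 2.0, "clicks": 1.0}, 6))  # 42.0
```

With one update finishing at t = 4 and bringing `orders` up to data from t = 3, the table accumulates 8 units of staleness before the update and 4 after it. Because `orders` carries twice the weight of `clicks`, a scheduler minimizing this sum would tend to favor refreshing `orders` when both compete for the same processor.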

Claims

Exact text as granted (not AI-modified).
What is claimed is: 
     
1. A method of updating data tables stored in a data warehouse, the method comprising:
   storing, in memory, a plurality of data tables;
   detecting, by a processor, incoming data for updating the plurality of data tables;
   generating, by the processor, an update request associated with each data table in the plurality of data tables;
   determining a calculated staleness for a portion of the plurality of data tables;
   scheduling updates to the portion of the plurality of data tables based on the calculated staleness;
   determining a stretch value for each one of the data tables in the portion of the plurality of data tables, the stretch value indicative of a maximum ratio between a duration of time a corresponding one of the updates waits until processing is finished and a length of the corresponding one of the updates, wherein scheduling data table updates is based at least in part on the stretch value; and
   distributing the updates among a plurality of processors to minimize the calculated staleness;
   transforming the portion of the plurality of data tables to include a different portion of the incoming data based on the scheduling.

2. The method of claim 1, further comprising determining a previous update of the portion of the plurality of data tables.

3. The method of claim 1, further comprising determining the update request is non-preemptible.

4. The method of claim 1, further comprising batching an accumulation of update requests.

5. The method of claim 1, further comprising performing the updates in a real-time data.

6. The method of claim 5, further comprising responsive to receiving a user request for the portion of the plurality of data tables.

7. The method of claim 1, further comprising weighting a first portion of the plurality of data tables higher than a second portion of the plurality of data tables.

8. The method of claim 1, further comprising determining the calculated staleness is a priority weighted staleness.

9. The method of claim 8, further comprising multiplying a first data table staleness by a first weight and multiplying a second data table staleness by a second weight.

10. The method of claim 1, further comprising appending new data to one of the plurality of data tables.

11. The method of claim 1, further comprising performing the updates at variable intervals.

12. A non-transitory computer readable medium storing computer instructions that when executed cause a processor to perform a method for managing a plurality of data tables in a data warehouse, the method comprising:
   maintaining the plurality of data tables in the data warehouse;
   receiving requests to update, with incoming data, a portion of the plurality of data tables;
   generating update requests corresponding to the requests to update;
   determining calculated stalenesses for individual data tables of the portion of the plurality of data tables;
   ranking the calculated stalenesses;
   scheduling updates to the portion of the plurality of data tables based on the calculated stalenesses;
   determining a stretch value for each one of the individual data tables in the portion of the plurality of data tables, the stretch value indicative of a maximum ratio between a duration of time a corresponding one of the updates waits until processing is finished and a length of the corresponding one of the updates, wherein scheduling data table updates is based at least in part on the stretch value;
   distributing the updates among a plurality of processors to minimize the calculated stalenesses; and
   transforming the portion of the plurality of data tables to include a different portion of the incoming data based on scheduling of the updates and the update requests.

13. The non-transitory computer readable medium of claim 12, further comprising weighting a first portion of the plurality of data tables higher than a second portion of the plurality of data tables, and wherein the scheduling is at least in part responsive to a result of the weighting.

14. The non-transitory computer readable medium of claim 12, further comprising appending the incoming data to the portion of the plurality of data tables.

15. A server for managing a data warehouse, the server comprising:
   a processor; and
   a memory storing instructions that when executed cause the processor to perform operations, the operations comprising:
   receiving incoming data for updating a plurality of data tables;
   an interface for receiving incoming data for updating the plurality of data tables
   determining calculated stalenesses for a portion of the plurality of data tables;
   weighting a portion of the calculated stalenesses;
   scheduling updates to the portion of the plurality of data tables based on the calculated stalenesses;
   determining a stretch value for each one of the individual data tables in the portion of the plurality of data tables, the stretch value indicative of a maximum ratio between a duration of time a corresponding one of the updates waits until processing is finished and a length of the corresponding one of the updates, wherein scheduling data table updates is based at least in part on the stretch value; and
   distributing the updates among a plurality of processors to minimize the calculated stalenesses.

16. The server of claim 9, wherein the operations further comprise batching the updates.
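The independent claims combine several mechanisms: per-table staleness, weighting, non-preemptible updates (claim 3), a per-table stretch value, and distribution of updates across several processors. The sketch below is a hypothetical greedy heuristic that exercises all of them; it is not the scheduling procedure the patent claims. Whenever a processor is idle, it starts the released update whose table has the highest weighted staleness, breaks ties by the update's stretch so far, never runs two updates of the same table at once, and reports each table's stretch as the maximum of (finish time - arrival time) / length, following the definition in the claims. Update lengths are assumed known in advance, all tables are assumed fully fresh at time 0, and batching (claims 4 and 16) is left out for brevity. `Update` and `schedule` are illustrative names.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class Update:
    table: str
    release: float  # arrival time of the new data (update request generated)
    length: float   # processing time, assumed known in advance


def schedule(updates, weights, num_procs):
    """Greedy non-preemptive scheduling of table updates on num_procs processors.

    Hypothetical heuristic for illustration. Returns (log, stretch): log lists
    (table, start, finish) in start order; stretch maps each table to
    max((finish - release) / length) over its updates.
    """
    pending = sorted(updates, key=lambda u: u.release)
    freshness = {u.table: 0.0 for u in updates}  # assume all tables fresh at t = 0
    busy = set()      # tables with an update currently running
    waiting = []      # released but not yet started
    running = []      # min-heap of (finish_time, tiebreak, update)
    tiebreak = itertools.count()
    log, stretch = [], {}
    free, now, i = num_procs, 0.0, 0

    while i < len(pending) or waiting or running:
        # Move every update that has arrived by `now` into the waiting list.
        while i < len(pending) and pending[i].release <= now:
            waiting.append(pending[i])
            i += 1
        # Fill idle processors; a started update runs to completion (non-preemptive).
        while free > 0:
            ready = [u for u in waiting if u.table not in busy]
            if not ready:
                break
            best = max(ready, key=lambda u: (
                weights[u.table] * (now - freshness[u.table]),  # weighted staleness
                (now - u.release) / u.length,                    # stretch so far
            ))
            waiting.remove(best)
            busy.add(best.table)
            free -= 1
            log.append((best.table, now, now + best.length))
            heapq.heappush(running, (now + best.length, next(tiebreak), best))
        # Jump to the next event: the earliest completion or arrival.
        events = [running[0][0]] if running else []
        if i < len(pending):
            events.append(pending[i].release)
        if not events:
            break
        now = min(events)
        # Retire every update finishing at `now`; the table now holds data up to its release.
        while running and running[0][0] <= now:
            finish, _, u = heapq.heappop(running)
            freshness[u.table] = max(freshness[u.table], u.release)
            stretch[u.table] = max(stretch.get(u.table, 0.0),
                                   (finish - u.release) / u.length)
            busy.discard(u.table)
            free += 1
    return log, stretch


jobs = [Update("a", 1, 2), Update("b", 1, 1), Update("a", 2, 1)]
print(schedule(jobs, {"a": 1.0, "b": 3.0}, num_procs=1))
print(schedule(jobs, {"a": 1.0, "b": 3.0}, num_procs=2))
```

With one processor, table `b` (weight 3) is refreshed first even though both tables received data at t = 1, and the second update to `a` waits until t = 4, giving `a` a stretch of 3.0. With two processors the updates to `a` and `b` overlap and the worst stretch for `a` drops to 2.0, which is the effect the claims attribute to distributing updates among a plurality of processors.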

Cited by (0)

No later patents cite this yet.

References (0)

No backward citations on record.