US8856071B2 · Active · Utility patent · PatentIndex 58
Minimizing staleness in real-time data warehouses
Est. expiry: Aug 11, 2029 (~3.1 yrs left) · nominal 20-yr term from priority
G06F 16/24532 · G06F 17/30445
Cited by: 3 · References: 29 · Claims: 16
Abstract
Data tables in data warehouses are updated to minimize staleness and stretch of the data tables. New data is received from external sources and, in response, update requests are generated. Accumulated update requests may be batched. Data tables may be weighted to affect the order in which update requests are serviced.
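The flow summarized in the abstract (update requests generated per table, accumulated requests batched, table weights changing the service order) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. It assumes that a table's staleness is the current time minus the timestamp of the newest data already loaded into it, and that weighted staleness is weight times staleness, in the spirit of claims 8 and 9. The rule "service the table with the largest weighted staleness next" is a hypothetical policy. `Table`, `enqueue`, `weighted_staleness`, `pick_next`, and `run_batch` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """A warehouse table tracked for update scheduling (hypothetical structure)."""
    name: str
    weight: float = 1.0       # assumed priority weight; larger means more important
    freshness: float = 0.0    # timestamp of the newest data already loaded
    pending: list = field(default_factory=list)  # arrival times of queued update requests

    def staleness(self, now: float) -> float:
        # Assumed definition: how far the table's contents lag behind `now`.
        return now - self.freshness


def enqueue(table: Table, arrival: float) -> None:
    """Record an update request generated when new data for `table` arrives."""
    table.pending.append(arrival)


def weighted_staleness(tables, now: float) -> float:
    """Priority-weighted staleness: the sum of weight * staleness over all tables."""
    return sum(t.weight * t.staleness(now) for t in tables)


def pick_next(tables, now: float):
    """Return the table with pending requests whose weighted staleness is largest."""
    candidates = [t for t in tables if t.pending]
    if not candidates:
        return None
    return max(candidates, key=lambda t: t.weight * t.staleness(now))


def run_batch(table: Table) -> int:
    """Service every accumulated request for `table` as one batched update."""
    if not table.pending:
        return 0
    count = len(table.pending)
    table.freshness = max(table.pending)  # data up to the newest request is now loaded
    table.pending.clear()
    return count


# Usage: the weight lets "orders" (raw staleness 30) beat "clicks" (raw staleness 50).
orders = Table("orders", weight=2.0, freshness=70.0)
clicks = Table("clicks", weight=1.0, freshness=50.0)
enqueue(orders, 95.0)
enqueue(orders, 98.0)
enqueue(clicks, 99.0)

now = 100.0
print(weighted_staleness([orders, clicks], now))  # 2*30 + 1*50 = 110.0
nxt = pick_next([orders, clicks], now)            # orders: 60 vs clicks: 50
print(nxt.name, run_batch(nxt), nxt.freshness)    # orders 2 98.0
```

Batching here simply collapses all queued requests for a table into one update that advances its freshness to the newest arrival; a real warehouse would also have to load the corresponding data.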
Claims
What is claimed is:
1. A method of updating data tables stored in a data warehouse, the method comprising:
storing, in memory, a plurality of data tables;
detecting, by a processor, incoming data for updating the plurality of data tables;
generating, by the processor, an update request associated with each data table in the plurality of data tables;
determining a calculated staleness for a portion of the plurality of data tables;
scheduling updates to the portion of the plurality of data tables based on the calculated staleness;
determining a stretch value for each one of the data tables in the portion of the plurality of data tables, the stretch value indicative of a maximum ratio between a duration of time a corresponding one of the updates waits until processing is finished and a length of the corresponding one of the updates, wherein scheduling data table updates is based at least in part on the stretch value; and
distributing the updates among a plurality of processors to minimize the calculated staleness;
transforming the portion of the plurality of data tables to include a different portion of the incoming data based on the scheduling.
2. The method of claim 1, further comprising determining a previous update of the portion of the plurality of data tables.
3. The method of claim 1, further comprising determining the update request is non-preemptible.
4. The method of claim 1, further comprising batching an accumulation of update requests.
5. The method of claim 1, further comprising performing the updates in a real-time data.
6. The method of claim 5, further comprising responsive to receiving a user request for the portion of the plurality of data tables.
7. The method of claim 1, further comprising weighting a first portion of the plurality of data tables higher than a second portion of the plurality of data tables.
8. The method of claim 1, further comprising determining the calculated staleness is a priority weighted staleness.
9. The method of claim 8, further comprising multiplying a first data table staleness by a first weight and multiplying a second data table staleness by a second weight.
10. The method of claim 1, further comprising appending new data to one of the plurality of data tables.
11. The method of claim 1, further comprising performing the updates at variable intervals.
12. A non-transitory computer readable medium storing computer instructions that when executed cause a processor to perform a method for managing a plurality of data tables in a data warehouse, the method comprising:
maintaining the plurality of data tables in the data warehouse;
receiving requests to update, with incoming data, a portion of the plurality of data tables;
generating update requests corresponding to the requests to update;
determining calculated stalenesses for individual data tables of the portion of the plurality of data tables;
ranking the calculated stalenesses;
scheduling updates to the portion of the plurality of data tables based on the calculated stalenesses;
determining a stretch value for each one of the individual data tables in the portion of the plurality of data tables, the stretch value indicative of a maximum ratio between a duration of time a corresponding one of the updates waits until processing is finished and a length of the corresponding one of the updates, wherein scheduling data table updates is based at least in part on the stretch value;
distributing the updates among a plurality of processors to minimize the calculated stalenesses; and
transforming the portion of the plurality of data tables to include a different portion of the incoming data based on scheduling of the updates and the update requests.
13. The non-transitory computer readable medium of claim 12, further comprising weighting a first portion of the plurality of data tables higher than a second portion of the plurality of data tables, and wherein the scheduling is at least in part responsive to a result of the weighting.
14. The non-transitory computer readable medium of claim 12, further comprising appending the incoming data to the portion of the plurality of data tables.
15. A server for managing a data warehouse, the server comprising:
a processor; and
a memory storing instructions that when executed cause the processor to perform operations, the operations comprising:
receiving incoming data for updating a plurality of data tables;
an interface for receiving incoming data for updating the plurality of data tables
determining calculated stalenesses for a portion of the plurality of data tables;
weighting a portion of the calculated stalenesses;
scheduling updates to the portion of the plurality of data tables based on the calculated stalenesses;
determining a stretch value for each one of the individual data tables in the portion of the plurality of data tables, the stretch value indicative of a maximum ratio between a duration of time a corresponding one of the updates waits until processing is finished and a length of the corresponding one of the updates, wherein scheduling data table updates is based at least in part on the stretch value; and
distributing the updates among a plurality of processors to minimize the calculated stalenesses.
16. The server of claim 9, wherein the operations further comprise batching the updates.
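Claims 1, 12, and 15 define a per-table stretch value as the maximum ratio between the time an update waits until its processing finishes and the update's length, and they distribute updates among several processors. The Python sketch below shows one way to compute that quantity while dispatching updates. It assumes non-preemptive execution (compare claim 3), measures the wait from the moment the update request is generated (so stretch = (finish − release) / length), and uses a greedy rule, handing each idle processor the highest-priority update already released, that is an illustrative choice rather than the patent's stated algorithm. `Update`, `schedule`, and the `priority` field are hypothetical names.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Update:
    table: str
    release: float   # time the update request was generated
    length: float    # processing time the update needs
    priority: float  # assumed: e.g. the table's weighted staleness


def schedule(updates, num_procs):
    """Non-preemptive dispatch of updates onto `num_procs` processors.

    Whenever a processor becomes idle it takes the highest-priority update
    that has already been released. Returns {table: stretch}, where stretch
    is the maximum over that table's updates of (finish - release) / length.
    """
    pending = sorted(updates, key=lambda u: u.release)
    ready = []                   # heap of (-priority, release, seq, update)
    free_at = [0.0] * num_procs  # time at which each processor becomes idle
    stretch = {}
    i = 0
    while i < len(pending) or ready:
        p = min(range(num_procs), key=free_at.__getitem__)  # earliest-idle processor
        now = free_at[p]
        if not ready:
            now = max(now, pending[i].release)  # stay idle until the next request arrives
        while i < len(pending) and pending[i].release <= now:
            u = pending[i]
            heapq.heappush(ready, (-u.priority, u.release, i, u))
            i += 1
        _, _, _, u = heapq.heappop(ready)
        finish = now + u.length
        free_at[p] = finish
        ratio = (finish - u.release) / u.length
        stretch[u.table] = max(stretch.get(u.table, 0.0), ratio)
    return stretch


jobs = [
    Update("A", release=0.0, length=4.0, priority=1.0),
    Update("B", release=0.0, length=2.0, priority=5.0),
    Update("C", release=1.0, length=1.0, priority=3.0),
]
print(schedule(jobs, num_procs=2))  # {'B': 1.0, 'A': 1.0, 'C': 2.0}
```

The sketch only measures stretch. Feeding it back into the priority, for example by boosting tables whose stretch exceeds a bound, would be one way to make scheduling depend on it as the claims require, but the claim text does not say how that is done.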