Performing cache update adaptation
Abstract
A database system includes an original data source storing pieces of original data and a cache source storing pieces of cached data, each associated with an accuracy value. A method of processing queries in the system includes: for each randomly selected client query, retrieving a first piece of cached data matching the query, and retrieving a first piece of original data matching the query; for non-selected client queries, retrieving a second piece of cached data matching the query; evaluating the accuracy value of the second piece of cached data; if the accuracy value is below a given threshold, retrieving a second piece of original data matching the query, and updating the second piece of cached data by the second piece of original data; and adapting a probabilistic model based on the first piece of cached data and the first piece of original data using a machine learning algorithm.
Claims
exact text as granted — not AI-modifiedThe invention claimed is:
1. A method of processing queries in a database system, the database system comprising an original data source storing a plurality of pieces of original data and a cache source storing a plurality of pieces of cached data, each piece of cached data being associated with an accuracy value, the method comprising:
randomly selecting queries from among a plurality of queries handled by the database system, at the time the respective query is received from a client;
for each of the queries randomly selected,
retrieving a first piece of the cached data matching the randomly selected query from the cache source, and
retrieving a first piece of the original data matching the randomly selected query from the original data source;
for queries not being randomly selected,
retrieving a second piece of the cached data matching the query from the cache source;
evaluating the accuracy value of the second piece of the cached data;
if the accuracy value is below a given threshold,
retrieving a second piece of the original data matching the query from the original data source, and
updating the second piece of the cached data by the second piece of the original data; and
adapting a probabilistic model based on the first piece of the cached data and the first piece of the original data using a machine learning algorithm,
wherein the accuracy value is derived from the probabilistic model and indicates a probability that a piece of the cached data coincides with a corresponding piece of the original data.
2. The method of claim 1 , further comprising:
for the queries randomly selected,
comparing the first piece of the cached data and the first piece of the original data;
updating the first piece of the cached data by the first piece of the original data, and returning the first piece of the original data to the client.
3. The method of claim 1 , further comprising:
for queries not being randomly selected,
if the accuracy value is below the given threshold, returning the second piece of the original data as a result to the client; and
otherwise, returning the second piece of the cached data as the result to the client.
4. The method of claim 1 , further comprising:
for each of the queries randomly selected,
sending information about the query, the first piece of the cached data and the first piece of the original data to a history database of the database system.
5. The method of claim 1 , further comprising:
comparing the first piece of the original data with the first piece of the cached data;
storing a first indicator value indicating whether the first piece of the original data coincides with the first piece of the cached data;
comparing the accuracy value of the first piece of the cached data and the given threshold;
storing a second indicator value indicating whether the accuracy value is below the given threshold; and
updating the given threshold based on pairs of the first and the second indicator values.
6. The method of claim 5 , wherein updating the given threshold comprises:
determining a polling rate and a polling efficiency based on the pairs of the first and second indicator values; and
updating the given threshold based on the polling rate and the polling efficiency.
7. The method of claim 6 , wherein
the polling rate is determined based on equation: (TP+FP)/(TP+FP+TN+FN), and
the polling efficiency is determined based on equation: TP/(TP+FP),
where:
TP is a number of said pairs wherein the first indicator value indicates that the piece of the original data does not coincide with the piece of the cached data, and the second indicator value indicates that the accuracy value is below the given threshold,
TN is a number of said pairs wherein the first indicator value indicates that the piece of the original data coincides with the piece of the cached data, and the second indicator value indicates that the accuracy value is not below the given threshold,
FP is a number of said pairs wherein the first indicator value indicates that the piece of the original data coincides with the piece of the cached data, and the second indicator value indicates that the accuracy value is below the given threshold, and
FN is a number of said pairs wherein the first indicator value indicates that the piece of the original data does not coincide with the piece of the cached data, and the second indicator value indicates that the accuracy value is not below the given threshold.
8. A database system for processing queries, comprising:
a memory containing an original data source storing a plurality of pieces of original data and a cache source storing a plurality of pieces of cached data, each piece of cached data being associated with an accuracy value
a processor coupled to the memory, the processor configured to:
randomly select queries from among a plurality of queries handled by the database system, at the time the respective query is received from a client;
for each of the queries randomly selected,
retrieve a first piece of the cached data matching the randomly selected query from the cache source, and
retrieve a first piece of the original data matching the randomly selected query from the original data source;
for queries not being randomly selected,
retrieve a second piece of the cached data matching the query from the cache source;
evaluate the accuracy value of the second piece of the cached data;
if the accuracy value is below a given threshold,
retrieve a second piece of the original data matching the query from the original data source, and
update the second piece of the cached data by the second piece of the original data; and
adapt a probabilistic model based on the first piece of the cached data and the first piece of the original data using a machine learning algorithm,
wherein the accuracy value is derived from the probabilistic model and indicates a probability that a piece of the cached data coincides with a corresponding piece of the original data.
9. The database system of claim 8 , wherein the processor is further configured to:
for the queries randomly selected,
compare the first piece of the cached data and the first piece of the original data;
update the first piece of the cached data by the first piece of the original data, and return the first piece of the original data to the client.
10. The database system of claim 8 , wherein the processor is further configured to:
for queries not being randomly selected,
if the accuracy value is below the given threshold, return the second piece of the original data as a result to the client; and
otherwise, return the second piece of the cached data as the result to the client.
11. The database system of claim 8 , wherein the processor is further configured to:
for each of the queries randomly selected,
send information about the query, the first piece of the cached data and the first piece of the original data to a history database of the database system.
12. The database system of claim 8 , wherein the processor is further configured to:
compare the first piece of the original data with the first piece of the cached data;
store a first indicator value indicating whether the first piece of the original data coincides with the first piece of the cached data;
compare the accuracy value of the first piece of the cached data and the given threshold;
store a second indicator value indicating whether the accuracy value is below the given threshold; and
update the given threshold based on pairs of the first and the second indicator values.
13. The database system of claim 12 , wherein the processor is configured to update the given threshold by:
determining a polling rate and a polling efficiency based on the pairs of the first and second indicator values; and
updating the given threshold based on the polling rate and the polling efficiency.
14. The database system of claim 13 , wherein the polling rate is determined based on equation: (TP+FP)/(TP+FP+TN+FN), and
wherein the polling efficiency is determined based on equation: TP/(TP+FP),
where:
TP is a number of said pairs wherein the first indicator value indicates that the piece of the original data does not coincide with the piece of the cached data, and the second indicator value indicates that the accuracy value is below the given threshold,
TN is a number of said pairs wherein the first indicator value indicates that the piece of the original data coincides with the piece of the cached data, and the second indicator value indicates that the accuracy value is not below the given threshold,
FP is a number of said pairs wherein the first indicator value indicates that the piece of the original data coincides with the piece of the cached data, and the second indicator value indicates that the accuracy value is below the given threshold, and
FN is a number of said pairs wherein the first indicator value indicates that the piece of the original data does not coincide with the piece of the cached data, and the second indicator value indicates that the accuracy value is not below the given threshold.
15. A computer program product comprising instructions which, when executed by a computer, cause the computer to perform the method according to claim 1 .Cited by (0)
No later patents cite this yet.
References (0)
No backward citations on record.