US11088925B2 · Active · Utility · PatentIndex 53

Technologies for capacity remediation in multi-tenant cloud environments

Assignee: SALESFORCE COM INC · Priority: Dec 4, 2017 · Filed: Jan 22, 2018 · Granted: Aug 10, 2021

Est. expiry: Dec 4, 2037 (~11.4 yrs left) · nominal 20-yr term from priority

Inventors: BERTRAN ANA; MORGENSTERN CARL; KAWAMOTO DAISUKE; ROAN NICHOLAS; BOBROWSKI STEVE; IYER SUDHISH; LEE CHIN; VASHI KUNAL; RAHMAN ZAHID

H04L 41/147, H04L 41/16, H04L 41/5009, H04L 41/5096, H04L 41/5032, G06F 16/254, H04L 41/5025, G06F 16/283

Abstract

A multitier, multitenant architecture of pods comprises multiple stacks with different metrics and workload compositions that change constantly over time. A computer system may identify an overall pod time-to-live (TTL) based on the changing metrics and workloads. The TTL may be a forecasted time at which pod remediation is needed to avoid a negative impact on pod performance and customer experience. Additionally, the computer system may identify the appropriate remediation(s) for each pod. The computer system may compare and prioritize remediations across a collection of pods with different configurations and workload characteristics based on the TTLs. Other embodiments may be described and/or claimed.
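
The prioritization step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `PodForecast` record, the pod identifiers, the remediation labels, and the use of days as the TTL unit are all hypothetical, chosen only to show pods with smaller TTLs being scheduled for remediation ahead of pods with larger TTLs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodForecast:
    """Hypothetical record pairing a pod with its forecast (names are assumptions)."""
    pod_id: str        # illustrative pod identifier
    ttl_days: float    # forecasted time until remediation is needed, in days (assumed unit)
    remediation: str   # illustrative remediation label


def prioritize(forecasts):
    """Order forecasts so the pod with the smallest TTL is remediated first."""
    return sorted(forecasts, key=lambda f: f.ttl_days)


# Made-up example values, for illustration only.
forecasts = [
    PodForecast("pod-a", 90.0, "add application servers"),
    PodForecast("pod-b", 14.0, "split database"),
    PodForecast("pod-c", 45.0, "expand SAN capacity"),
]

queue = prioritize(forecasts)
for f in queue:
    print(f"{f.pod_id}: remediate within {f.ttl_days} days ({f.remediation})")
```

Because `sorted` is stable, pods with equal TTLs keep their input order; a real scheduler would presumably need an explicit tie-breaking rule, which the source does not specify.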

Claims

exact text as granted — not AI-modified

The invention claimed is: 
     
       1. One or more non-transitory computer-readable storage media (NTCRSM) having instructions stored thereon, wherein execution of the instructions by one or more processors of a computer system is operable to cause the computer system to:
 identify a set of performance metrics of a pod, the pod comprising a set of servers and data storage devices to provide on-demand services for one or more tenants of a database system; 
 correlate individual performance metrics of the set of performance metrics with corresponding ones of a set of service-level agreement (SLA) metrics of the pod to generate workload characteristics for the pod; 
 determine a pod remediation and a time-to-live (TTL) forecast for the pod based on the workload characteristics, the TTL forecast being an estimated time until the pod remediation is expected to occur; and 
 initiate performance of the pod remediation according to the TTL forecast such that the pod remediation is to take place before other pod remediations of other pods with other TTL forecasts that are greater than the TTL forecast. 
 
     
     
       2. The one or more NTCRSM of claim 1, wherein, to identify the set of performance metrics, execution of the instructions is operable to:
 identify a first set of performance metrics of an application tier of the pod; 
 identify a second set of performance metrics of a database (db) tier of the pod; 
 identify a third set of performance metrics of a storage tier of the pod; and 
 identify a fourth set of performance metrics of a search tier of the pod, 
 wherein the first set of performance metrics, second set of performance metrics, third set of performance metrics, and fourth set of performance metrics are different from one another. 
 
     
     
       3. The one or more NTCRSM of claim 2, wherein:
 the first set of performance metrics comprise application processor system utilization across individual servers of the set of servers; 
 the second set of performance metrics comprise a database processor system utilization of a plurality of db servers in relation to one another, a load balance among the set of servers in the pod, and a size of each database in the pod; 
 the third set of performance metrics comprise storage area network (SAN) input/output (IO) accesses and/or SAN utilization; and 
 the fourth set of performance metrics comprise a number of system accesses, an average page time (APT), and a number of transactions. 
 
     
     
       4. The one or more NTCRSM of claim 3, wherein, to correlate the set of performance metrics with the corresponding ones of a set of service-level agreement (SLA) metrics, execution of the instructions is operable to:
 identify the workload characteristics for each of the application tier, the db tier, the storage tier, and the search tier. 
 
     
     
       5. The one or more NTCRSM of claim 2, wherein, to determine the TTL forecast, execution of the instructions is operable to:
 determine an application tier TTL based on the first set of performance metrics; 
 determine a database tier TTL based on the second set of performance metrics; 
 determine a storage tier TTL based on the third set of performance metrics; 
 determine a search tier TTL based on the fourth set of performance metrics; and 
 combine the application tier TTL, the database tier TTL, the storage tier TTL, and the search tier TTL to obtain an overall TTL. 
 
     
     
       6. The one or more NTCRSM of claim 5, wherein, to determine the pod remediation, execution of the instructions is operable to:
 determine one or more driver metrics of the first set of performance metrics, the second set of performance metrics, the third set of performance metrics, and/or the fourth set of performance metrics, 
 the one or more driver metrics being metrics of the first, second, third, or fourth set of performance metrics that impact end-user performance experience greater than other metrics of the first or second set of performance metrics, 
 wherein the TTL forecast is determined based on a correlation between the one or more driver metrics and the respective SLA metrics. 
 
     
     
       7. The one or more NTCRSM of claim 1, wherein execution of the instructions is operable to:
 negate a biasing effect of biasing indications when the TTL forecast is to be determined, and the biasing indications comprise hardware changes, code release dates, code regressions, and holiday schedules. 
 
     
     
       8. The one or more NTCRSM of claim 1, wherein execution of the instructions is operable to:
 apply one or more machine learning (ML) models to the set of performance metrics to generate workload and/or TTL trend models, the one or more ML models comprising decision trees, several levels of quantile regressions, support vector machines, and/or Bayesian networks. 
 
     
     
       9. The one or more NTCRSM of claim 1, wherein execution of the instructions is operable to:
 feedback the determined TTL forecast to be correlated with a next TTL forecast. 
 
     
     
       10. A computer system comprising:
 an interface system to obtain metrics of a plurality of pods, the pod comprising a set of servers and data storage devices to provide on-demand services for one or more tenants of a database system; and 
 one or more-processors and a memory, the memory to store instructions that are executable by the one or more processors to:
 identify, for each pod of the plurality of pods, a set of performance metrics for each of a plurality of pod components; 
 correlate individual performance metrics of the set of performance metrics with corresponding ones of a set of service-level agreement (SLA) metrics of each pod to generate workload characteristics for the pod; 
 determine, for each pod, pod remediations and time-to-live (TTL) forecasts based on the workload characteristics, the TTL forecasts being estimated times until the pod remediation is expected to occur; and 
 initiate performance of the pod remediations according to the TTL forecasts such that an individual pod remediation of a pod is to take place before other pod remediations of other pods with other TTL forecasts that are greater than the TTL forecast of the individual pod. 
 
 
     
     
       11. The computer system of claim 10, wherein, to identify the set of performance metrics, the instructions are further executable by the one or more processors to:
 identify a first set of performance metrics of an application tier of the pod; 
 identify a second set of performance metrics of a database (db) tier of the pod; 
 identify a third set of performance metrics of a storage tier of the pod; and 
 identify a fourth set of performance metrics of a search tier of the pod, and 
 the first set of performance metrics, the second set of performance metrics, the third set of performance metrics, and the fourth set of performance metrics are different from one another. 
 
     
     
       12. The computer system of claim 11, wherein:
 the first set of performance metrics comprise application processor system utilization across the set of servers; 
 the second set of performance metrics comprise a database processor system utilization of a plurality of db servers in relation to one another, a load balance among the set of servers in the pod, and a size of each database in the pod; 
 the third set of performance metrics comprise storage area network (SAN) input/output (IO) accesses and/or SAN utilization; and 
 the fourth set of performance metrics comprise a number of system accesses, an average page time (APT), and a number of transactions. 
 
     
     
       13. The computer system of claim 12, wherein, to correlate the set of performance metrics with the corresponding ones of the set of SLA metrics, the one or more processors are to:
 identify the workload characteristics for each of the application tier, the db tier, the storage tier, and the search tier. 
 
     
     
       14. The computer system of claim 11, wherein, to determine the TTL forecast, the one or more processors to:
 determine an application tier TTL based on the first set of performance metrics; 
 determine a database tier TTL based on the second set of performance metrics; 
 determine a storage tier TTL based on the third set of performance metrics; 
 determine a search tier TTL based on the fourth set of performance metrics; and 
 combine the application tier TTL, the database tier TTL, the storage tier TTL, and the search tier TTL to obtain an overall TTL. 
 
     
     
       15. The computer system of claim 14, wherein, to determine the pod remediation, the one or more processors are to:
 determine one or more driver metrics of the first set of performance metrics, the second set of performance metrics, the third set of performance metrics, and/or the fourth set of performance metrics, 
 the one or more driver metrics being metrics of the first, second, third, or fourth set of performance metrics that impact end-user performance experience greater than other metrics of the first or second set of performance metrics, 
 wherein the TTL forecast is determined based on a correlation between the one or more driver metrics and the respective SLA metrics. 
 
     
     
       16. The computer system of claim 10, wherein the one or more processor systems are to:
 negate a biasing effect of biasing indications when the TTL forecast is to be determined, and the biasing indications comprising hardware changes, code release dates, code regressions, and holiday schedules. 
 
     
     
       17. The computer system of claim 10, wherein the one or more processors are to:
 apply one or more machine learning (ML) models to the set of performance metrics to generate workload and/or TTL trend models, the one or more ML models comprising decision trees, several levels of quantile regressions, support vector machines, and/or Bayesian networks. 
 
     
     
       18. The computer system of claim 10, wherein the one or more processors are to:
 feedback the determined TTL forecast to be correlated with a next TTL forecast. 
 
     
     
       19. The computer system of claim 10, wherein individual pods of the plurality of pods comprise one or more content batch servers, one or more content search servers, one or more query servers, one or more file force servers, one or more access control system (ACS) servers, one or more batch servers, one or more application servers, one or more database instances implemented by one or more data storage systems, one or more quick file systems (QFS), and one or more indexer servers. 
     
     
       20. The computer system of claim 20 placeholder
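
The per-tier TTL determination and combination recited in claims 5 and 14 can be sketched as follows. This is a hedged illustration, not the claimed method: the claims do not state how a tier TTL is computed or how the tier TTLs are combined, so the sketch assumes a plain least-squares linear trend extrapolated to a capacity limit and takes the minimum across tiers as the overall TTL. The function `linear_ttl`, the utilization samples, and the limits are all hypothetical; claims 8 and 17 name other model families (decision trees, quantile regressions, support vector machines, Bayesian networks) for which this linear fit is only a stand-in.

```python
def linear_ttl(samples, limit):
    """Estimate days until a metric reaches `limit` using a least-squares line.

    `samples` is a list of (day, value) pairs. Returns 0.0 if the fitted value
    on the last day already meets the limit, and float('inf') if the trend is
    flat or decreasing (or cannot be fitted).
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    if sxx == 0:
        return float("inf")  # all samples on the same day: no trend to fit
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    last_day = max(x for x, _ in samples)
    current = slope * last_day + intercept
    if current >= limit:
        return 0.0
    if slope <= 0:
        return float("inf")
    return (limit - current) / slope


# Made-up utilization percentages per tier: (samples, assumed capacity limit).
tiers = {
    "application": ([(0, 50), (1, 52), (2, 54), (3, 56)], 80),
    "db":          ([(0, 60), (1, 65), (2, 70), (3, 75)], 90),
    "storage":     ([(0, 40), (1, 40), (2, 40), (3, 40)], 85),
    "search":      ([(0, 30), (1, 31), (2, 32), (3, 33)], 70),
}

tier_ttls = {name: linear_ttl(s, limit) for name, (s, limit) in tiers.items()}

# Assumption: the overall TTL is driven by whichever tier runs out first.
limiting_tier = min(tier_ttls, key=tier_ttls.get)
overall_ttl = tier_ttls[limiting_tier]
print(tier_ttls, limiting_tier, overall_ttl)
```

Taking the minimum reflects the idea that a pod needs remediation as soon as any one tier is exhausted; a weighted combination would be an equally plausible reading of "combine", and the source does not say which is intended.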
