US10474501B2 · Active · Utility · PatentIndex 92

Serverless execution of code using cluster resources

Assignee: DATABRICKS INC · Priority: Apr 28, 2017 · Filed: Apr 28, 2017 · Granted: Nov 12, 2019
Est. expiry: Apr 28, 2037 (~10.8 yrs left) · nominal 20-yr term from priority
Inventors: GHODSI ALI; SHANKAR SRINATH; PARANJPYE SAMEER; XIN SHI; ZAHARIA MATEI
G06F 2209/5011 · G06F 9/5061 · G06F 2209/505 · G06F 9/5005
PatentIndex Score: 92 · Cited by: 50 · References: 4 · Claims: 22

Abstract

A system for cluster resource allocation includes an interface and a processor. The interface is configured to receive a process and input data. The processor is configured to determine an estimate for resources required for the process to process the input data; determine existing available resources in a cluster for running the process; determine whether the existing available resources are sufficient for running the process; in the event it is determined that the existing available resources are not sufficient for running the process, indicate to add new resources; determine an allocated share of resources in the cluster for running the process; and cause execution of the process using the share of resources.
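The sequence of decisions in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Cluster` class, the `plan_execution` function, the use of a single "worker count" as the resource unit, and the equal-share policy are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical cluster model: resources are counted in whole workers."""
    total_workers: int      # workers currently in the cluster
    allocated_workers: int  # workers already allocated to other processes

    def available(self) -> int:
        # Existing available resources: workers not allocated to other processes
        return self.total_workers - self.allocated_workers


def plan_execution(estimated_workers: int, cluster: Cluster, running_processes: int) -> dict:
    """Walk through the abstract's steps for one incoming process."""
    available = cluster.available()

    # Are the existing available resources sufficient for the estimate?
    sufficient = available >= estimated_workers

    # If not, indicate how many new workers to add
    workers_to_add = 0 if sufficient else estimated_workers - available

    # Assumed policy: an equal share for each running process,
    # counting the new process and any workers about to be added
    total_after_scaling = cluster.total_workers + workers_to_add
    allocated_share = total_after_scaling // (running_processes + 1)

    return {
        "sufficient": sufficient,
        "workers_to_add": workers_to_add,
        "allocated_share": allocated_share,
    }


plan = plan_execution(
    estimated_workers=6,
    cluster=Cluster(total_workers=10, allocated_workers=7),
    running_processes=3,
)
print(plan)
```

In a real system the final step would hand the share, the process, and the input data to a cluster driver; the sketch stops at producing the plan.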

Claims

exact text as granted — not AI-modified
What is claimed is: 
     
       1. A system for cluster resource allocation, comprising:
 an interface configured to:
 receive a process and input data; and 
 
 a processor configured to:
 determine an estimate for resources required for the process to process the input data, comprising to:
 determine a parallelism level for one or more steps of the process, the parallelism level including a quantity of data per cluster worker, a number of cluster workers, or a quantity of data per processor; 
 determine an implementation type for one or more steps of the process, the implementation type including a sort algorithm, a join algorithm, or a filter algorithm; and 
 determine, based on historical usage data, the estimate for resources required for the process using the parallelism level and the implementation type; 
 
 determine existing available resources in a cluster for running the process; 
 determine whether the existing available resources are sufficient for running the process based at least in part on the estimate for resources required; 
 in response to determining that the existing available resources are not sufficient for running the process, indicate to add new resources to a cluster; 
 determine an allocated share of resources in the cluster for running the process; 
 cause execution of the process using the allocated share of resources, comprising to:
 provide an indication of the share of resources, the process, and the input data to a cluster driver; and 
 provide an indication to the cluster driver to execute the process of the input data using the share of resources. 
 
 
 
     
     
       2. The system of  claim 1 , wherein determining existing available resources in a cluster for running the process comprises determining resources that are not allocated to other processes. 
     
     
       3. The system of  claim 1 , wherein determining existing available resources in a cluster for running the process comprises determining resources that are not currently being used by other processes. 
     
     
       4. The system of  claim 1 , wherein determining existing available resources in a cluster for running the process comprises determining the estimated required resources for other processes running in the cluster. 
     
     
       5. The system of  claim 1 , wherein the estimate for resources required for the process to process the input data is based at least in part on one or more of the following: the process, the input data, or the historical usage data. 
     
     
       6. The system of  claim 1 , wherein the processor is further configured to:
 in response to determining that the existing available resources are not sufficient for running the process, indicate to add new resources if possible. 
 
     
     
       7. The system of  claim 1 , wherein the processor is further configured to:
 in response to determining that the existing available resources are more than sufficient for running the process, indicate to remove resources. 
 
     
     
       8. The system of  claim 1 , wherein the process comprises an associated scheduled running time. 
     
     
       9. The system of  claim 8 , wherein the processor is further configured to:
 in response to determining that the existing available resources are not sufficient for running the process, indicate to add new resources in advance of the scheduled running time. 
 
     
     
       10. The system of  claim 1 , wherein the allocated share of resources in the cluster comprises one or more of the following: a share determined by dividing the cluster resources evenly among users, a share determined by dividing the cluster resources according to a predetermined user allocation, an equal share for each running process, or a maximum amount of cluster resources. 
     
     
       11. The system of  claim 1 , wherein the allocated share of resources comprises one or more of the following: a number of worker machines, a number of processors, a number of processes, an amount of memory, a number of disks, a number of virtual machines, or a number of containers. 
     
     
       12. The system of  claim 1 , wherein the allocated share of resources comprises one of the following: an amount of resources, a fraction of resources, or a priority. 
     
     
       13. The system of  claim 1 , wherein the processor is further configured to:
 determine that the share of resources being used is greater than the allocated share of resources. 
 
     
     
       14. The system of  claim 13 , wherein the processor is further configured to:
 reduce a second process share of resources associated with a second process. 
 
     
     
       15. The system of  claim 14 , wherein the second process share of resources associated with the second process is not reduced below a threshold share. 
     
     
       16. The system of  claim 13 , wherein the processor is further configured to:
 indicate to add resources. 
 
     
     
       17. The system of  claim 16 , wherein resources are limited to a customer maximum allocation. 
     
     
       18. The system of  claim 1  wherein the driver is shared between users of a customer. 
     
     
       19. The system of  claim 1  wherein the driver is shared between users of different customers. 
     
     
       20. The system of  claim 1 , wherein the threshold comprises one or more of a percentage of the allocated resources or a number of resources. 
     
     
       21. A method for cluster resource allocation, comprising:
 receiving a process and input data; 
 determining, using a processor, an estimate for resources required for the process to process the input data, comprising:
 determining a parallelism level for one or more steps of the process, the parallelism level including a quantity of data per cluster worker, a number of cluster workers, or a quantity of data per processor; 
 determining an implementation type for one or more steps of the process, the implementation type including a sort algorithm a join algorithm or a filter algorithm; and 
 determining, based on historical usage data, the estimate for resources required for the process using the parallelism level and the implementation type; 
 
 determining existing available resources in a cluster for running the process; 
 determining whether the existing available resources are sufficient for running the process based at least in part on the estimate for resources required; 
 in response to determining that the existing available resources are not sufficient for running the process, indicating to add new resources to a cluster; 
 determining an allocated share of resources in the cluster for running the process; 
 causing execution of the process using the allocated share of resources, comprising:
 providing an indication of the share of resources, the process, and the input data to a cluster driver; and 
 providing an indication to the cluster driver to execute the process of the input data using the share of resources. 
 
 
     
     
       22. A computer program product for cluster resource allocation, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for:
 receiving a process and input data; 
 determining an estimate for resources required for the process to process the input data, comprising:
 determining a parallelism level for one or more steps of the process, the parallelism level including a quantity of data per cluster worker, a number of cluster workers, or a quantity of data per processor; 
 determining an implementation type for one or more steps of the process, the implementation type including a sort algorithm a join algorithm or a filter algorithm; and 
 determining, based on historical usage data, the estimate for resources required for the process using the parallelism level and the implementation type; 
 
 determining existing available resources in a cluster for running the process; 
 determining whether the existing available resources are sufficient for running the process based at least in part on the estimate for resources required; 
 in response to determining that the existing available resources are not sufficient for running the process, indicating to add new resources to a cluster; 
 determining an allocated share of resources in the cluster for running the process; 
 causing execution of the process using the allocated share of resources, comprising:
 providing an indication of the share of resources, the process, and the input data to a cluster driver; and 
 providing an indication to the cluster driver to execute the process of the input data using the share of resources.
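
The estimation step recited in the independent claims (a parallelism level, an implementation type, and historical usage data) might be sketched as follows. This is an illustrative assumption, not the method disclosed in the patent: the `HISTORY` records, the memory-per-gigabyte ratio, and the function names are hypothetical, and the parallelism level is taken to be a quantity of data per cluster worker.

```python
import math
from statistics import mean

# Hypothetical historical usage records:
# (implementation type, input size in GB, memory used in GB)
HISTORY = [
    ("sort", 10.0, 25.0),
    ("sort", 20.0, 50.0),
    ("join", 10.0, 40.0),
    ("filter", 10.0, 5.0),
]


def memory_per_input_gb(implementation_type: str) -> float:
    """Average memory used per GB of input in past runs of this implementation type."""
    ratios = [mem / size for impl, size, mem in HISTORY if impl == implementation_type]
    if not ratios:
        raise ValueError(f"no historical usage for {implementation_type!r}")
    return mean(ratios)


def estimate_step(implementation_type: str, input_gb: float, data_per_worker_gb: float) -> dict:
    """Estimate the resources for one step of a process."""
    # Parallelism level: quantity of data per cluster worker determines the worker count
    workers = math.ceil(input_gb / data_per_worker_gb)
    # Implementation type plus historical usage determines the memory estimate
    memory_gb = memory_per_input_gb(implementation_type) * input_gb
    return {"workers": workers, "memory_gb": memory_gb}


estimate = estimate_step("sort", input_gb=40.0, data_per_worker_gb=8.0)
print(estimate)
```

The resulting estimate is what a sufficiency check would compare against the cluster's existing available resources before deciding whether to add more.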
