US11238372B2 · Active · Utility · PatentIndex 71

Simulator-training for automated reinforcement-learning-based application-managers

Assignee: VMWARE INC · Priority: Aug 27, 2018 · Filed: Jul 22, 2019 · Granted: Feb 1, 2022
Est. expiry: Aug 27, 2038 (~12.1 yrs left) · nominal 20-yr term from priority
Inventors: NAG DEV; YANKOV YANISLAV; WANG DONGNI; BURK GREGORY T; STEPHEN NICHOLAS MARK GRANT
G06F 30/27, G06N 20/00, G06F 18/24, G06N 5/01, G06N 3/092, G06N 3/0499, G06N 3/02, G06F 30/20, G06N 3/08, G06N 3/006, G06K 9/6267
Cited by: 2 · References: 8 · Claims: 20

Abstract

The current document is directed to methods and systems for simulation-based training of automated reinforcement-learning-based application managers. Simulators are generated from data collected from controlled computing environments and may employ any of a variety of different machine-learning models to learn state-transition and reward models. The currently disclosed methods and systems provide facilities for visualizing aspects of the models learned by a simulator and for initializing simulator models using domain information. In addition, the currently disclosed simulators employ weighted differences, computed from simulator-generated and training-data state transitions, as feedback to the machine-learning models. The weighting addresses various biases and deficiencies of commonly employed difference metrics in the context of training automated reinforcement-learning-based application managers.

Claims

exact text as granted — not AI-modified
The invention claimed is: 
     
       1. A simulation manager that generates and trains simulators that are used to train automated reinforcement-learning-based application managers, the simulation manager comprising:
 one or more computer systems, each having one or more processors, one or more memories, one or more data-storage devices, and one or more communications subsystems; and 
 processor instructions, stored in one or more of the one or more memories and one or more data-storage devices that, when executed by one or more of the processors, control the one or more computer systems to
 generate simulators that train automated reinforcement-learning-based application managers; 
 train the generated simulators to simulate a computing environment controlled by an automated reinforcement-learning-based application manager; and 
 provide a management interface to human domain experts for providing simulator-configuration input. 
 
 
     
     
       2. The simulation manager of  claim 1  wherein the simulator repeatedly receives a next action a and returns, in response, a next state s′ and a reward r. 
     
     
       3. The simulation manager of  claim 2  wherein the simulator implements a first parametrized function that receives a current state s and a next action a and returns the next state s′ and a second parametrized function that receives a state s and returns a reward r. 
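
The interface described in claims 2 and 3 can be illustrated with a minimal sketch. The linear models, the class name `LinearSimulator`, and the choice to compute the reward on the new state are assumptions made for illustration only; the claims leave the form of both parametrized functions open.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]


@dataclass
class LinearSimulator:
    """Hypothetical simulator built from two parametrized functions."""
    A: List[Vector]   # assumed transition parameters applied to the state
    B: List[Vector]   # assumed transition parameters applied to the action
    w: Vector         # assumed reward parameters (reward-function vector)
    state: Vector     # current state s held by the simulator

    def transition(self, s: Vector, a: Vector) -> Vector:
        """First parametrized function: (s, a) -> s'."""
        return [
            sum(x * y for x, y in zip(row_a, s))
            + sum(x * y for x, y in zip(row_b, a))
            for row_a, row_b in zip(self.A, self.B)
        ]

    def reward(self, s: Vector) -> float:
        """Second parametrized function: s -> r."""
        return sum(w_i * s_i for w_i, s_i in zip(self.w, s))

    def step(self, a: Vector) -> Tuple[Vector, float]:
        """Receive the next action a; return the next state s' and reward r."""
        self.state = self.transition(self.state, a)
        # Assumption: the reward is evaluated on the new state.
        return self.state, self.reward(self.state)


sim = LinearSimulator(
    A=[[1.0, 0.0], [0.0, 1.0]],
    B=[[1.0], [0.0]],
    w=[2.0, -1.0],
    state=[0.0, 3.0],
)
next_state, r = sim.step([1.0])
```

An application manager under training would call `step` repeatedly, once per action, in place of acting on a live computing environment.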
     
     
       4. The simulation manager of  claim 3  wherein the simulation manager generates a simulator by:
 choosing one or more machine-learning models to implement the first parametrized function and the second parameterized function; and 
 initializes the machine-learning models. 
 
     
     
       5. The simulation manager of  claim 3  wherein the simulation manager trains the generated simulators to simulate a computing environment controlled by an automated reinforcement-learning-based application manager by:
 receiving data collected from a computing environment controlled by an automated reinforcement-learning-based application manager, the data including action/current-state/next-state triples; and 
 iteratively selecting a next action/current-state/next-state triple, 
 inputting the action to the simulator, 
 receiving an estimated next state and estimated reward from the simulator, 
 computing a difference metric from the current state and next state, 
 feeding the difference metric, action, and current state to the simulator, 
 which adjusts one or more parameters of the first parametrized function to improve estimation of the next state. 
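
The training iteration in claim 5 might look like the following sketch. Several things here are assumptions rather than anything stated in the claims: the scalar action, the model form s'_i = s_i + theta_i * a, the plain gradient-descent update, and the reading (taken from the abstract) that the difference is computed between the simulator's estimated next state and the recorded next state.

```python
from typing import List, Tuple

Vector = List[float]
# (action, current state, next state); a scalar action is an assumption.
Triple = Tuple[float, Vector, Vector]


def estimate_next_state(theta: Vector, s: Vector, a: float) -> Vector:
    """Hypothetical first parametrized function: s'_i = s_i + theta_i * a."""
    return [s_i + t_i * a for s_i, t_i in zip(s, theta)]


def train_transition_model(
    triples: List[Triple], dim: int, lr: float = 0.1, epochs: int = 200
) -> Tuple[Vector, float]:
    """Iterate over recorded triples, adjusting theta to shrink the difference."""
    theta = [0.0] * dim
    metric = 0.0
    for _ in range(epochs):
        for a, s, s_next in triples:
            est = estimate_next_state(theta, s, a)
            # Per-element error between estimated and recorded next state.
            err = [e - n for e, n in zip(est, s_next)]
            # Unweighted squared difference, used here as the difference metric.
            metric = sum(e * e for e in err)
            # Gradient of the metric with respect to theta_i is 2 * err_i * a.
            theta = [t - lr * 2.0 * e * a for t, e in zip(theta, err)]
    return theta, metric


# Synthetic triples generated with theta = [0.5, -1.0].
triples = [
    (1.0, [0.0, 0.0], [0.5, -1.0]),
    (2.0, [1.0, 1.0], [2.0, -1.0]),
    (-1.0, [0.0, 2.0], [-0.5, 3.0]),
]
theta, last_metric = train_transition_model(triples, dim=2)
```

On this synthetic data the learned `theta` approaches the values used to generate the triples, and the final metric approaches zero.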
 
     
     
       6. The simulation manager of  claim 5 
 wherein the current state and the next state are vectors containing metric and configuration elements; and 
 wherein the distance metric is computed as the sum of terms, each term i comprising the product of a weight w_i and the squared difference of the i-th elements of the current state and next state. 
 
     
     
       7. The simulation manager of  claim 6  wherein the weight w_i, is the absolute value of the i-th element of a reward-function vector, where the second parameterized function uses a dot product of the reward-function vector and a state vector to estimate the reward r corresponding to the state. 
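
The weighted metric of claims 6 and 7 can be sketched in a few lines. The function names and sample vectors are hypothetical. The function accepts any two state vectors; the example compares an estimated next state with a recorded one, which is an assumption based on the abstract's description of weighted differences.

```python
from typing import List

Vector = List[float]


def estimate_reward(reward_vector: Vector, state: Vector) -> float:
    """Reward as the dot product of the reward-function vector and a state."""
    return sum(c * s for c, s in zip(reward_vector, state))


def weighted_distance(x: Vector, y: Vector, reward_vector: Vector) -> float:
    """Sum over i of w_i * (x_i - y_i)^2, with w_i = |reward_vector_i|."""
    weights = [abs(c) for c in reward_vector]
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))


reward_vector = [-2.0, 0.0, 0.5]   # hypothetical reward-function vector
estimated = [1.0, 10.0, 4.0]       # hypothetical simulator estimate
recorded = [2.0, 0.0, 2.0]         # hypothetical training-data state

d_weighted = weighted_distance(estimated, recorded, reward_vector)
d_unweighted = sum((a - b) ** 2 for a, b in zip(estimated, recorded))
```

Here the second element differs by 10 but has zero weight in the reward, so it contributes nothing to the weighted metric (4.0), while it dominates the unweighted squared difference (105.0). This is one way to see how the weighting keeps training focused on state elements that actually influence the reward.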
     
     
       8. The simulation manager of  claim 5  wherein the simulator learns the parameter values for the second parameterized function during training by optimizing the second parameterized function to produce rewards that would produce the action/current-state/next-state triples of the data collected from the computing environment controlled by the automated reinforcement-learning-based application manager. 
     
     
       9. The simulation manager of  claim 5 
 wherein the data collected from the computing environment controlled by the automated reinforcement-learning-based application manager includes rewards corresponding to the action/current-state/next-state triples; and 
 wherein the simulator adjusts one or more parameters of the second parameterized function in response to computed differences between the data rewards and corresponding estimated rewards. 
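
When the collected data includes rewards, as in claim 9, the reward parameters can be adjusted from the difference between recorded and estimated rewards. The sketch below assumes a linear reward function and a plain gradient step on the squared reward difference; neither choice is specified by the claim, and the names and sample data are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def estimate_reward(w: Vector, s: Vector) -> float:
    """Hypothetical second parametrized function: r = w . s."""
    return sum(w_i * s_i for w_i, s_i in zip(w, s))


def adjust_reward_parameters(
    w: Vector, s: Vector, r_data: float, lr: float = 0.05
) -> Vector:
    """One gradient step on (estimated reward - recorded reward)^2."""
    diff = estimate_reward(w, s) - r_data
    return [w_i - lr * 2.0 * diff * s_i for w_i, s_i in zip(w, s)]


# Synthetic (state, reward) pairs generated with w = [1.0, -2.0].
samples: List[Tuple[Vector, float]] = [
    ([1.0, 0.0], 1.0),
    ([0.0, 1.0], -2.0),
    ([1.0, 1.0], -1.0),
]

w = [0.0, 0.0]
for _ in range(500):
    for s, r in samples:
        w = adjust_reward_parameters(w, s, r)
```

After the loop, `w` is close to the vector used to generate the sample rewards.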
 
     
     
       10. The simulation manager of  claim 4  wherein the management interface provides, for one or more of the machine-learning models used to implement the first and second parameterized functions, simulator-configuration-input features through which machine-learning model parameters can be specified. 
     
     
       11. The simulation manager of  claim 10  wherein model parameters may include;
 the number of layers in a neural network or decision tree; 
 functions or logic associated with decision-tree nodes; 
 initial weights of neural-network nodes; 
 initial values of weights that multiple terms of linear combinations of terms; 
 the size and contents of state vectors; and 
 the number and contents of classifications. 
 
     
     
       12. The simulation manager of  claim 4  wherein the management interface provides a visualization feature that displays the reward surface for two selected elements of the state vector. 
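
The data behind a reward-surface display like the one in claim 12 could be produced as sketched below: two selected state elements are varied over a grid while the remaining elements are held at fixed values. The function name, the fixed base state, and the linear reward function are assumptions; rendering the grid (for example as a 3-D surface or heat map) is left out.

```python
from typing import Callable, List

Vector = List[float]


def reward_surface(
    reward_fn: Callable[[Vector], float],
    base_state: Vector,
    i: int,
    j: int,
    i_values: Vector,
    j_values: Vector,
) -> List[Vector]:
    """Return surface[p][q] = reward with element i = i_values[p], j = j_values[q]."""
    surface = []
    for v_i in i_values:
        row = []
        for v_j in j_values:
            s = list(base_state)   # copy so the base state is not modified
            s[i] = v_i
            s[j] = v_j
            row.append(reward_fn(s))
        surface.append(row)
    return surface


reward_vector = [1.0, 0.0, -2.0]   # hypothetical reward-function vector


def linear_reward(s: Vector) -> float:
    return sum(c * x for c, x in zip(reward_vector, s))


surface = reward_surface(
    linear_reward,
    base_state=[0.0, 5.0, 0.0],
    i=0,
    j=2,
    i_values=[0.0, 1.0],
    j_values=[0.0, 1.0, 2.0],
)
```

Each row of `surface` corresponds to one value of the first selected element, and each column to one value of the second.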
     
     
       13. A method for training an automated reinforcement-learning-based application manager, the method comprising:
 generating a simulator; 
 training the generated simulator to simulate a computing environment controlled by an automated reinforcement-learning-based application manager; and 
 connecting the automated reinforcement-learning-based application manager to the simulator. 
 
     
     
       14. The method of  claim 13  wherein the simulator repeatedly receives a next action a from the automated reinforcement-learning-based application manager and returns, in response, a next state s′ and a reward r to the automated reinforcement-learning-based application manager. 
     
     
       15. The method of  claim 14  wherein the simulator implements a first parametrized function that receives a current state s and a next action a and returns the next state s′ and a second parametrized function that receives a state s and returns a reward r, both the first and second parametrized functions implemented by more machine-learning models. 
     
     
       16. The method of  claim 14  wherein training the generated simulator to simulate a computing environment controlled by an automated reinforcement-learning-based application manager further comprises:
 receiving data collected from a computing environment controlled by an automated reinforcement-learning-based application manager, the data including action/current-state/next-state triples; and 
 iteratively selecting a next action/current-state/next-state triple, 
 inputting the action to the simulator, 
 receiving an estimated next state and estimated reward from the simulator, 
 computing a difference metric from the current state and next state, 
 feeding the difference metric, action, and current state to the simulator, which adjusts one or more parameters of the first parametrized function to improve estimation of the next state. 
 
     
     
       17. The method of  claim 16 
 wherein the current state and the next state are vectors containing metric and configuration elements; and 
 wherein the distance metric is computed as the sum of terms, each term i comprising the product of a weight w_i and the squared difference of the i-th elements of the current state and next state. 
 
     
     
       18. The method of  claim 17  wherein the weight w_i is the absolute value of the i-th element of a reward-function vector, where the second parameterized function uses a dot product of the reward-function vector and a state vector to estimate the reward r corresponding to the state. 
     
     
       19. The method of  claim 16  wherein the simulator learns the parameter values for the second parameterized function during training by optimizing the second parameterized function to produce rewards that would produce the action/current-state/next-state triples of the data collected from the computing environment controlled by the automated reinforcement-learning-based application manager. 
     
     
       20. A physical data-storage device encoded with computer instructions that, when executed by one or more processors of a computer system that implements a simulation manager having one or more processors, one or more memories, one or more data-storage devices, and one or more communications subsystems, controls the simulation manager to:
 generate simulators for training train automated reinforcement-learning-based application managers; 
 train the generated simulators to simulate a computing environment controlled by an automated reinforcement-learning-based application manager; and 
 provide a management interface to human domain experts for providing simulator-configuration input.
