Technology system auto-recovery and optimality engine and techniques
Abstract
Disclosed are hardware and techniques for correcting computer process faults by identifying risk associated with correcting a computer process fault and computer processes that may depend on the corrected computer process. The interdependent computer processes in a network may be determined by evaluating a stream of process break flags from a monitoring component coupled to the network. Each computer process break flag in the stream of computer process break flags indicates a process fault detected by the monitoring component and is correlated to a corrective response. The break flag and the corrective response are assigned a risk. A risk matrix accounts for interdependencies between computer processes and identified corrective actions. A final response strategy that corrects the computer process faults is determined using the assigned risk and computer system interdependence. A runbook stores the final response strategy, which may be updated based on changing computer process interdependencies and assigned risk.
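The flow the abstract describes (assign risks, populate a matrix, derive a response strategy) can be illustrated with a minimal sketch. It is not the patented implementation: the names `MatrixRow`, `score`, and `response_strategy`, the 0-to-1 value ranges, the multiplicative scoring rule, and the 0.2 threshold are all hypothetical choices made for illustration, since the abstract does not specify how the values are correlated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatrixRow:
    """One row of a hypothetical risk assessment matrix."""
    action: str             # corrective action taken from the runbook
    break_risk: float       # assumed 0..1 likelihood that the breakdown occurs
    fix_risk: float         # assumed 0..1 likelihood that this action fixes it
    interdependency: float  # assumed 0..1 exposure of dependent processes


def score(row: MatrixRow) -> float:
    # Hypothetical correlation rule: favor actions that are likely to fix a
    # likely breakdown, and discount actions that touch many dependent processes.
    return row.break_risk * row.fix_risk * (1.0 - row.interdependency)


def response_strategy(matrix: list[MatrixRow], threshold: float = 0.2) -> list[str]:
    # Rank the corrective actions by score and keep those at or above the
    # (assumed) threshold; the first entry plays the role of the flagged
    # "optimal" corrective action.
    ranked = sorted(matrix, key=score, reverse=True)
    return [row.action for row in ranked if score(row) >= threshold]


# Illustrative values only; none of these numbers come from the source.
matrix = [
    MatrixRow("restart_service", break_risk=0.8, fix_risk=0.9, interdependency=0.6),
    MatrixRow("clear_cache", break_risk=0.8, fix_risk=0.6, interdependency=0.1),
    MatrixRow("reboot_host", break_risk=0.8, fix_risk=0.95, interdependency=0.9),
]
strategy = response_strategy(matrix)
```

In this sketch the action with the highest fix likelihood (`reboot_host`) is excluded because its assumed interdependency rating is high, which mirrors the idea of weighing a fix against its effect on dependent processes.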
Claims
What is claimed is:
1. An apparatus, comprising:
a memory storing programming code; and
a triage processing component coupled with the memory, the triage processing component to:
populate, for one or more corrective actions in a list of corrective actions, a risk assessment matrix with a break risk assessment value and a fix risk assessment value, the one or more corrective actions identified to correct a possible cause of a current event, the current event to indicate a potential operational breakdown of a computer process; and
generate, based on correlation of the break risk assessment values and fix risk assessment values in the risk assessment matrix, a response strategy incorporating at least one corrective action from the list of corrective actions.
2. The apparatus of claim 1, the triage processing component further to identify interdependency risk patterns that indicate risks related to each corrective action in a runbook and an effect of applying each corrective action on the computer process in a network, wherein the runbook comprises the list of corrective actions.
3. The apparatus of claim 1, wherein the break risk assessment values indicate a likelihood of occurrence of the potential operational breakdown of the computer process in a network and the triage processing component assigns at least one of the fix risk assessment values to each of the one or more corrective actions.
4. The apparatus of claim 3, wherein:
the break risk assessment value has a range from a high likelihood value indicating the potential operational breakdown has a high likelihood of occurring to a low likelihood value indicating the potential operation breakdown has a low likelihood of occurring; and
the fix risk assessment value assigned to each of the identified corrective actions has a range from a value indicating the potential operational breakdown has a high likelihood of being fixed to a different value indicating the potential operation breakdown has a low likelihood of being fixed by the respective identified corrective action.
5. The apparatus of claim 1, the triage processing component further to:
assign an interdependency rating to each corrective action in the list of corrective actions, wherein the interdependency rating quantifies a level of interdependence of the computer process on other computer processes potentially affected by application of each of the one or more corrective action in the list of corrective actions;
populate the risk assessment matrix with the assigned interdependency rating of each corrective action in the list of corrective actions; and
evaluate the risk assessment matrix, based on the assigned interdependency rating of each corrective action in the list of corrective actions to one another.
6. The apparatus of claim 1, the triage processing component further to:
after an evaluation of the risk assessment matrix, flag a respective corrective action from the list of corrective actions as an optimal corrective action for use in the response strategy.
7. The apparatus of claim 6, the triage processing component further to:
apply the flagged respective corrective action to the computer process experiencing a process fault associated with a computer process break flag in a network.
8. The apparatus of claim 1, the triage processing component further to:
receive successive process break flags that follow previous process break flags from a monitoring circuit coupled to the triage processing component;
generate additional break risk assessment values and fix risk assessment values of the successive process break flags;
populate a copy of the risk assessment matrix using the additional break risk assessment values and the fix risk assessment values to produce a revised risk assessment matrix;
analyze the break risk assessment values and the fix risk assessment values of the previous process break flags in the populated risk assessment matrix with reference to the additional break risk assessment values and the fix risk assessment values of the successive process break flags in the revised risk assessment matrix; and
update, based on results of the analysis of the additional break risk assessment values and the fix risk assessment values and the break risk assessment values and the fix risk assessment values of the previous process break flags, a runbook to identify a corresponding corrective action in the list of corrective actions for implementing the corresponding corrective action to fix the potential operational breakdown.
9. A method, comprising:
populating, for one or more corrective actions in a list of corrective actions, a risk assessment matrix with a break risk assessment value and a fix risk assessment value, the one or more corrective actions identified to correct a possible cause of a current event, the current event to indicate a potential operational breakdown of a computer process; and
generating, based on correlation of the break risk assessment values and fix risk assessment values in the risk assessment matrix, a response strategy incorporating at least one corrective action from the list of corrective actions.
10. The method of claim 9, further comprising:
applying the at least one corrective action in the response strategy to the computer process experiencing a process fault associated with a computer process break flag associated with the current event in a network environment.
11. The method of claim 9, further comprising identifying interdependency risk patterns to indicate risks related to each corrective action in a runbook and an effect of applying each corrective action on the computer process in a network.
12. The method of claim 9, further comprising:
assigning the break risk assessment value indicating a likelihood of occurrence of the potential operational breakdown of the computer process;
assigning a respective fix risk assessment value to each of the identified corrective actions; and
populating the risk assessment matrix with the assigned break risk assessment value of the computer process to each of the identified corrective actions and the assigned fix risk assessment value to each of the identified corrective actions.
13. The method of claim 9, wherein:
the break risk assessment value has a range from a high likelihood value indicating the potential operational breakdown has a high likelihood of occurring to a low likelihood value indicating the potential operation breakdown has a low likelihood of occurring; and
the fix risk assessment value assigned to each of the identified corrective actions has a range from a value indicating the potential operational breakdown has a high likelihood of being fixed to a different value indicating the potential operation breakdown has a low likelihood of being fixed by a respective identified corrective action.
14. The method of claim 9, further comprising:
assigning an interdependency rating to each corrective action in the list of corrective actions, wherein the interdependency rating quantifies a level of interdependence of each of the computer processes potentially affected by application of each corrective action in the list of corrective actions;
populating the risk assessment matrix with the assigned interdependency rating of each corrective action in the list of corrective actions;
evaluating the risk assessment matrix, based on the assigned interdependency rating of each corrective action in the list of corrective actions to one another; and
based on the evaluation of the risk assessment matrix, flagging a respective corrective action from the list of corrective actions as an optimal corrective action.
15. The method of claim 14, wherein the interdependency rating assigned to each corrective action quantifies a level of interdependence of each respective individual computer process affected by the application of each of corrective action in the list of corrective actions.
16. A non-transitory computer-readable storage medium storing computer-readable program code executable by a processor, wherein execution of the computer-readable program code causes the processor to:
populate, for one or more corrective actions in a list of corrective actions, a risk assessment matrix with a break risk assessment value and a fix risk assessment value, the one or more corrective actions identified to correct a possible cause of a current event, the current event to indicate a potential operational breakdown of a computer process; and
generate, based on correlation of the break risk assessment values and fix risk assessment values in the risk assessment matrix, a response strategy incorporating at least one corrective action from the list of corrective actions.
17. The non-transitory computer-readable storage medium of claim 16, wherein execution of the computer-readable program code further causes the processor to:
apply the response strategy to the computer process experiencing a process fault associated with a process break flag of the current event in a network environment.
18. The non-transitory computer-readable storage medium of claim 16, wherein execution of the computer-readable program code further causes the processor to:
assign the break risk assessment value indicating a likelihood of occurrence of the potential operational breakdown of the computer process; and
assign a fix risk assessment value to each of the identified corrective actions.
19. The non-transitory computer-readable storage medium of claim 18, wherein execution of the computer-readable program code further causes the processor to:
identify interdependency risk patterns indicate risks related to each corrective action in a runbook and an effect of applying each corrective on the computer process in a network.
20. The non-transitory computer-readable storage medium of claim 16, wherein execution of the computer-readable program code further causes the processor to:
assign an interdependency rating to each corrective action in the list of corrective actions, wherein the interdependency rating quantifies a level of interdependence of each of the computer processes potentially affected by application of each corrective action in the list of corrective actions;
populate the risk assessment matrix with the assigned interdependency rating of each corrective action in the list of corrective actions;
evaluate the risk assessment matrix, based on the assigned interdependency rating of each corrective action in the list of corrective actions to one another; and
based on the evaluation of the risk assessment matrix, flag a respective corrective action from the list of corrective actions as an optimal corrective action for use in the response strategy.