US5872990AExpiredUtilityPatentIndex 92
Reordering of memory reference operations and conflict resolution via rollback in a multiprocessing environment

Assignee: IBMPriority: Jan 7, 1997Filed: Jan 7, 1997Granted: Feb 16, 1999
Est. expiryJan 7, 2017(expired)· nominal 20-yr term from priority
Inventors:LUICK DAVID ARNOLD WILLIS JOHN CHRISTOPHER WINTERFIELD PHILIP BRAUN
G06F 9/3863G06F 9/3824G06F 8/445G06F 9/3834G06F 9/52
PatentIndex Score
Cited by
References
Claims
Abstract

Compile and/or run time instruction scheduling is used in a multiprocessing system to reorder memory access instructions such that a strongly consistent programming model is emulated in a fashion transparent to the programmer. The multiprocessing system detects potential shared memory conflicts, avoiding these conflicts by restarting operation of the affected processing unit at a predetermined previous state, previously archived in a rollback register set, and resuming instruction execution from that state.
Claims

exact text as granted — not AI-modified
What is claimed is: 
     
       1. A method of expediting memory reference operations in a multiprocessing apparatus that includes (1) multiple processors including first and second processors associated with respective first and second processor instruction streams each stream containing an ordered sequence of processor instructions for execution by the respective processor, (2) first and second machine register sets coupled to the first and second processors, respectively, each machine register set having contents that define a state of the respective processor, (3) first and second rollback register sets coupled to the first and second processors, respectively, and (4) a shared cache accessible by each of the multiple processors and having a plurality of storage locations, the method comprising the steps of: reordering instructions of the first instruction stream to optimize execution of the first instruction stream by the first processor, the reordering including shifting of a LOAD instruction from a later position to an earlier position in the first instruction stream, the LOAD instruction directing the first processor to access a first one of the locations in the shared cache;   establishing checkpoints in the first instruction stream according to a predetermined schedule, intervening instructions between successive checkpoints defining rollback windows;   defining a load percolation window including all instructions between the LOAD instruction&#39;s earlier and later positions in the first instruction stream;   sequentially executing instructions of the first instruction stream while concurrently sequentially executing instructions of the second instruction stream;   during execution of the first instruction stream, at each checkpoint backing-up contents of the first machine register set into the first rollback register set; and   determining whether the second processor performed any STORE operations to the first location during the load percolation window, and if so, operating the first processor to perform steps comprising: halting execution of the first instruction stream;   restoring the first machine register set to its state at the beginning of the rollback window containing the earlier LOAD by copying corresponding contents of the first rollback register set;   re-executing instructions of the first instruction stream in the rollback window containing the earlier LOAD; and   resuming execution of the first instruction stream starting at a next instruction immediately after the rollback window containing the earlier LOAD.     
     
     
       2. The method of claim 1, further comprising the steps of: reordering instructions of the second instruction stream to optimize execution of the second instruction stream by the second processor, the reordering including shifting of a second LOAD instruction from a later position to an earlier position in the second instruction stream, the second LOAD instruction directing the second processor to access a second one of the locations in the shared cache;   establishing second checkpoints according to a predetermined schedule, intervening instructions between successive checkpoints defining rollback windows;   defining a second load percolation window including all instructions between the second LOAD instruction&#39;s earlier and later positions in the second instruction stream;   sequentially executing instructions of the second instruction stream while concurrently sequentially executing instructions of the second instruction stream;   during execution of the second instruction stream, at each second checkpoint backing-up content of the second machine register set into the second rollback register set; and   determining whether the first processor performed any STORE operations to the second location in the shared cache during the second load percolation window, and if so, operating the second processor to perform steps comprising: halting execution of the second instruction stream;   restoring the second machine register set to its state at the beginning of the rollback window containing the earlier second LOAD by copying corresponding contents of the second rollback register set;   re-executing instructions of the second instruction stream in the rollback window containing the earlier second LOAD; and   resuming execution of the second instruction stream starting at a next instruction immediately after the rollback window containing the earlier second LOAD.     
     
     
       3. The method of claim 1, where STORE operations of the first processor are initially made to a temporary STORE buffer and subsequently committed to the shared cache upon satisfaction of a predetermined criteria. 
     
     
       4. The method of claim 3, the predetermined criteria comprising a condition that, for any LOAD operations in the first instruction stream dependent upon results of the buffered STORE operations, said LOAD operations are free from conflict with the second processor. 
     
     
       5. The method of claim 3, the predetermined criteria comprising progression of instruction execution by the first processor past two checkpoints after the STORE instruction. 
     
     
       6. The method of claim 1, the predetermined schedule specifying checkpoints occurring at a constant interval in the instruction stream, said constant interval being a predetermined rollback window length. 
     
     
       7. The method of claim 1, the backing-up step only maintaining copies of machine register set contents corresponding to two preceding checkpoints immediately prior to the current instruction in the first instruction stream. 
     
     
       8. The method of claim 1, the reordering step being performed by a compiler in advance of the establishing, defining, sequentially executing, backing-up, and determining steps. 
     
     
       9. The method of claim 1, the reordering step being performed concurrently with at least one of the establishing, defining, sequentially executing, backing-up, and determining steps. 
     
     
       10. The method of claim 1, the reordering step being performed by load percolation scheduling. 
     
     
       11. The method of claim 1, all rollback windows having a common, fixed length, the reordering step limiting each LOAD percolation window to a maximum length equal to the fixed length of rollback window. 
     
     
       12. The method of claim 1, each location in the shared cache being a cache line. 
     
     
       13. The method of claim 1, the shared cache comprising random access memory. 
     
     
       14. The method of claim 1, wherein executed instructions of the first instruction stream having a predetermined recency are archived by the multiprocessing apparatus, the re-executing step comprising the steps of obtaining archived instructions of the rollback window in the first instruction stream and re-executing the obtained instructions. 
     
     
       15. The method of claim 1, the first processor comprising a microprocessor. 
     
     
       16. A method for rollback conflict recovery emulating strong consistency in a shared cache multiprocessing system having multiple processors each with an associated instruction stream, said method comprising the following steps performed for each processor and its associated instruction stream: reordering instructions of the instruction stream to optimize the processor&#39;s execution of the associated instruction stream, the reordering including shifting of a LOAD instruction to an earlier position in the instruction stream, the LOAD instruction directing the processor to access a first one of the locations in the shared cache;   defining a load percolation window including all instructions between the earlier and later positions of the LOAD instruction in the instruction stream;   selecting an unexecuted instruction next in sequence in the instruction stream, and performing steps comprising: if the selected instruction is a STORE command to store a value in the shared cache, storing the value in a temporary queue;   if the selected instruction is a LOAD command to obtain contents of a first location in the shared cache, determining whether a conflict exists by determining whether another processor has performed any STORE operations to the first location during the load percolation window, and if no conflict exists executing the LOAD command, otherwise resolving the conflict by performing steps comprising: halting the processor&#39;s execution of instructions in the instruction stream;     restoring the processor to a previous state, experienced by the processor upon execution of a previously executed instruction in the instruction stream; and   re-starting the processor&#39;s execution of the instruction stream at a point immediately following the previously executed instruction and then continuing by sequentially executing subsequent instructions of the instruction stream.     
     
     
       17. The method of claim 16, each processor having an associated machine register set and rollback register set, the method further comprising the steps of: establishing checkpoints in the first instruction stream according to a predetermined schedule, intervening instructions between successive checkpoints defining rollback windows; and   during execution of each instruction corresponding to a checkpoint in the instruction stream, backing-up content of the processor&#39;s machine register set into the processor&#39;s rollback register set.   
     
     
       18. The method of claim 16, the restoring step comprising restoring contents of the processor&#39;s machine register set to its state at the beginning of the rollback widow containing the earlier LOAD by copying corresponding contents of the first rollback register set. 
     
     
       19. The method of claim 16, the re-starting step comprising the steps of re-executing instructions of the rollback window containing the earlier LOAD and then resuming execution of the instruction stream starting at a next instruction immediately after the rollback window containing the earlier LOAD. 
     
     
       20. The method of claim 16, where STORE operations of the first processor are initially made to a temporary STORE buffer and subsequently committed to the shared cache upon satisfaction of a predetermined criteria. 
     
     
       21. The method of claim 20, the predetermined criteria comprising a condition that, for any LOAD operations in the first instruction stream dependent upon results of the buffered STORE operations, said LOAD operations are free from conflict with the second processor. 
     
     
       22. The method of claim 20, the predetermined criteria comprising progression of instruction execution by the first processor past two checkpoints after the STORE instruction. 
     
     
       23. The method of claim 16, the predetermined schedule specifying checkpoints occurring at a constant interval in the instruction stream, said constant interval being a predetermined rollback window length. 
     
     
       24. The method of claim 16, the backing-up step only maintaining copies of machine register set contents corresponding to two preceding checkpoints immediately prior to the current instruction in the first instruction stream. 
     
     
       25. The method of claim 16, the reordering step being performed by a compiler in advance of the establishing, defining, sequentially executing, backing-up, and determining steps. 
     
     
       26. The method of claim 16, the reordering step being performed concurrently with at least one of the establishing, defining, sequentially executing, backing-up, and determining steps. 
     
     
       27. The method of claim 16, the reordering step being performed by load percolation scheduling. 
     
     
       28. The method of claim 17, all rollback windows having a common, fixed length, the reordering step limiting each LOAD percolation window to a maximum length equal to the fixed length of rollback window. 
     
     
       29. The method of claim 16, each location in the shared cache being a cache line. 
     
     
       30. The method of claim 16, the shared cache comprising random access memory. 
     
     
       31. The method of claim 17, wherein executed instructions of the first instruction stream having a predetermined recency are archived by the multiprocessing apparatus, the re-executing step comprising the steps of obtaining archived instructions of the rollback window in the first instruction stream and re-executing the obtained instructions. 
     
     
       32. The method of claim 16, each processor comprising a microprocessor. 
     
     
       33. A digital data multiprocessing apparatus, comprising: multiple processors including first and second processors associated with first and second processor instruction streams, respectively, each stream containing an ordered sequence of processor instructions for execution by the respective processor;   a first machine register set coupled to the first processor and having contents that define a state of the first processor;   a first rollback register set coupled to the first processor; and   a shared cache accessible by each of the multiple processors and having a plurality of storage locations;   the first processor being programmed to perform steps comprising: receiving a reordered version of the first instruction stream, said reordering having been performed by a compiler to optimize execution of the first instruction stream by the first processor, the reordering having shifted of a LOAD instruction to an earlier position in the first instruction stream, the LOAD instruction directing the first processor to access a first one of the locations in the shared cache;     establishing checkpoints in the first instruction stream according to a predetermined schedule, intervening instructions between successive checkpoints defining rollback windows;   defining a load percolation window including all instructions between the LOAD instruction&#39;s earlier and later positions in the first instruction stream;   sequentially executing instructions of the first instruction stream while concurrently sequentially executing instructions of the second instruction stream;   during execution of the first instruction stream, at each checkpoint backing-up contents of the first machine register set into the first rollback register set; and   determining whether the second processor performed any STORE operations to the first location during the load percolation window, and if so, operating the first processor to perform steps comprising: halting execution of the first instruction stream;   restoring the first machine register set to its state at the beginning of the rollback window containing the earlier LOAD by copying corresponding contents of the first rollback register set;   re-executing instructions of the first instruction stream in the rollback window containing the earlier LOAD; and   resuming execution of the first instruction stream starting at a next instruction immediately after the rollback window containing the earlier LOAD.     
     
     
       34. The apparatus of claim 33, further comprising: a second machine register set coupled to the second processor and having contents that define a state of the second processor; and   a second rollback register set coupled to the second processor;   the second processor being programmed to perform method steps comprising: receiving a reordered version of the second instruction stream, said reordering having been performed by a compiler to optimize execution of the second instruction stream by the second processor, the reordering including shifting of a LOAD instruction from a later position to an earlier position in the second instruction stream, the LOAD instruction directing the second processor to access a second one of the locations in the shared cache;     establishing second checkpoints according to a predetermined schedule, backing-up content of the second machine register set into the second rollback register set, intervening instructions between successive checkpoints defining rollback windows;   defining a second load percolation window including all instructions between the second LOAD instruction&#39;s earlier and later positions in the second instruction stream;   sequentially executing instructions of the second instruction stream while concurrently sequentially executing instructions of the second instruction stream;   during execution of the second instruction stream, at each second checkpoint backing-up content of the second machine register set into the second rollback register set; and   determining whether the first processor performed any STORE operations to the second location in the shared cache during the second load percolation window, and if so, operating the second processor to perform steps comprising: halting execution of the second instruction stream;   restoring the second machine register set to its state at the beginning of the rollback window containing the earlier second LOAD by copying corresponding contents of the second rollback register set;   re-executing instructions of the second instruction stream in the rollback window containing the earlier second LOAD; and   resuming execution of the second instruction stream starting at a next instruction immediately after the rollback window containing the earlier second LOAD.     
     
     
       35. The apparatus of claim 33, where STORE operations of the first processor are initially made to a temporary STORE buffer and subsequently committed to the shared cache upon satisfaction of a predetermined criteria. 
     
     
       36. The apparatus of claim 35, the predetermined criteria comprising a condition that, for any LOAD operations in the first instruction stream dependent upon results of the buffered STORE operations, said LOAD operations are free from conflict with the second processor. 
     
     
       37. The apparatus of claim 35, the predetermined criteria comprising progression of instruction execution by the first processor past two checkpoints after the STORE instruction. 
     
     
       38. The apparatus of claim 33, the predetermined schedule specifying checkpoints occurring at a constant interval in the instruction stream, said constant interval being a predetermined rollback window length. 
     
     
       39. The apparatus of claim 33, the backing-up step only maintaining copies of machine register set contents corresponding to two preceding checkpoints immediately prior to the current instruction in the first instruction stream. 
     
     
       40. The apparatus of claim 33, further comprising a compiler to perform the reordering step in advance of the establishing, defining, sequentially executing, backing-up, and determining steps. 
     
     
       41. The apparatus of claim 40, the compiler implementing load percolation scheduling. 
     
     
       42. The apparatus of claim 33, the reordering step being performed concurrently with at least one of the establishing, defining, sequentially executing, backing-up, and determining steps. 
     
     
       43. The apparatus of claim 33, all rollback windows having a common, fixed length, the reordering step limiting each LOAD percolation window to a maximum length equal to the fixed length of rollback window. 
     
     
       44. The apparatus of claim 33, each location in the shared cache being a cache line. 
     
     
       45. The apparatus of claim 33, the shared cache comprising random access memory. 
     
     
       46. The apparatus of claim 33, wherein executed instructions of the first instruction stream having a predetermined recency are archived by the multiprocessing apparatus, the re-executing step comprising the steps of obtaining archived instructions of the rollback window in the first instruction stream and re-executing the obtained instructions. 
     
     
       47. The apparatus of claim 32, the first processor comprising a microprocessor. 
     
     
       48. A shared cache multiprocessing system emulating strong consistency, said multiprocessing system having multiple processors each with an associated instruction stream, each said processor being programmed to perform the following steps: reordering instructions of the instruction stream to optimize the processor&#39;s execution of the associated instruction stream, the reordering including shifting of a LOAD instruction to an earlier position in the instruction stream, the LOAD instruction directing the processor to access a first one of the locations in the shared cache;   defining a load percolation window including all instructions between the earlier and later positions of the LOAD instruction in the instruction stream;   selecting an unexecuted instruction next in sequence in the instruction stream, and performing steps comprising: if the selected instruction is a STORE command to store a value in the shared cache, storing the value in a temporary queue;   if the selected instruction is a LOAD command to obtain contents of a first location in the shared cache, determining whether a conflict exists by determining whether another processor has performed any STORE operations to the first location during the load percolation window, and if no conflict exists executing the LOAD command, otherwise resolving the conflict by performing steps comprising: halting the processor&#39;s execution of instructions in the instruction stream;   restoring the processor to a previous state experienced by the processor upon execution of a previously executed instruction in the instruction stream; and   re-starting the processor&#39;s execution of the instruction stream at a point immediately following the previously executed instruction and then continuing by sequentially executing subsequent instructions of the instruction stream.       
     
     
       49. The system of claim 48, each processor having an associated machine register set and rollback register set, the method further comprising the steps of: establishing checkpoints in the first instruction stream according to a predetermined schedule, backing-up content of the processor&#39;s machine register set into the processor&#39;s rollback register set, intervening instructions between successive checkpoints defining rollback windows; and   during execution of each instruction corresponding to a checkpoint in the instruction stream, backing-up content of the processor&#39;s machine register set into the processor&#39;s rollback register set.   
     
     
       50. The system of claim 48, the restoring step comprising restoring contents of the processor&#39;s machine register set to its state at the beginning of the rollback widow containing the earlier LOAD by copying corresponding contents of the first rollback register set. 
     
     
       51. The system of claim 48, the re-starting step comprising the steps of re-executing instructions of the rollback window containing the earlier LOAD and then resuming execution of the instruction stream starting at a next instruction immediately after the rollback window containing the earlier LOAD. 
     
     
       52. The system of claim 48, where STORE operations of the first processor are initially made to a temporary STORE buffer and subsequently committed to the shared cache upon satisfaction of a predetermined criteria. 
     
     
       53. The system of claim 52, the predetermined criteria comprising a condition that, for any LOAD operations in the first instruction stream dependent upon results of the buffered STORE operations, said LOAD operations are free from conflict with the second processor. 
     
     
       54. The system of claim 52, the predetermined criteria comprising progression of instruction execution by the first processor past two checkpoints after the STORE instruction. 
     
     
       55. The system of claim 48, the predetermined schedule specifying checkpoints occurring at a constant interval in the instruction stream, said constant interval being a predetermined rollback window length. 
     
     
       56. The system of claim 48, the backing-up step only maintaining copies of machine register set contents corresponding to two preceding checkpoints immediately prior to the current instruction in the first instruction stream. 
     
     
       57. The system of claim 48, the reordering step being performed by a compiler in advance of the establishing, defining, sequentially executing, backing-up, and determining steps. 
     
     
       58. The system of claim 57, the compiler implementing load percolation scheduling. 
     
     
       59. The system of claim 48, the reordering step being performed concurrently with at least one of the backing-up, defining, sequentially executing, and determining steps. 
     
     
       60. The system of claim 49, all rollback windows having a common, fixed length, the reordering step limiting each LOAD percolation window to a maximum length equal to the fixed length of rollback window. 
     
     
       61. The system of claim 48, each location in the shared cache being a cache line. 
     
     
       62. The system of claim 48, the shared cache comprising random access memory. 
     
     
       63. The system of claim 49, wherein executed instructions of the first instruction stream of a predetermined recency are stored by the multiprocessing apparatus, the re-executing step comprising the steps of obtaining instructions of the first instruction stream from the first rollback window and re-executing the obtained instructions. 
     
     
       64. The system of claim 48, each processor comprising a microprocessor.
Cited by (0)

No later patents cite this yet.
References (0)

No backward citations on record.