US12436897B2 · Active · Utility

Cache management using eviction priority based on memory reuse

Assignee: NVIDIA CORP
Priority: Mar 7, 2023
Filed: Mar 7, 2023
Granted: Oct 7, 2025
Est. expiry: Mar 7, 2043 (~16.7 yrs left) · nominal 20-yr term from priority
Inventors: KOREM NOAM DOR; PHARRIS BRIAN SCOTT; SUBAG JACOB
Classifications: Y02D 10/00, G06N 3/0464, G06F 12/126, G06N 3/045, G06N 3/063
PatentIndex Score: 53 · Cited by: 0 · References: 19 · Claims: 20

Abstract

Apparatuses, systems, and techniques to manage a cache located on a processor of a computing system using eviction priority based on memory reuse. Memory addresses associated with a workload of an application executing using the processor are identified. An amount of reuse of the memory addresses corresponding to the workload is determined. A cache management policy for the workload is determined based on the amount of reuse. The cache management policy is applied to the cache.
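
The flow in the abstract (identify buffers, estimate reuse, derive a policy) can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes reuse is approximated by the number of neural-network layers that read each buffer, which is the metric the dependent claims mention for activation data. The names `count_reuse`, `choose_priority`, `build_policy`, the buffer labels, and the `low`/`high` thresholds are all hypothetical.

```python
from enum import Enum


class EvictionPriority(Enum):
    EVICT_FIRST = "evict-first"
    EVICT_NORMAL = "evict-normal"
    EVICT_LAST = "evict-last"


def count_reuse(layer_reads):
    """Return {buffer: number of layers that read it}."""
    reuse = {}
    for buffers in layer_reads.values():
        for buf in set(buffers):  # a layer counts at most once per buffer
            reuse[buf] = reuse.get(buf, 0) + 1
    return reuse


def choose_priority(reuse_count, low=1, high=3):
    """Map a reuse count to a priority; thresholds are illustrative assumptions."""
    if reuse_count <= low:
        return EvictionPriority.EVICT_FIRST   # read once: cheap to drop early
    if reuse_count >= high:
        return EvictionPriority.EVICT_LAST    # heavily reused: keep resident
    return EvictionPriority.EVICT_NORMAL


def build_policy(layer_reads):
    """Derive a per-buffer eviction priority from the layer access map."""
    return {buf: choose_priority(n) for buf, n in count_reuse(layer_reads).items()}


# Hypothetical workload: which buffers each layer reads.
layer_reads = {
    "conv1": ["input"],
    "conv2": ["act1"],
    "conv3": ["act1", "act2"],
    "add": ["act1", "act2"],
    "head": ["act3"],
}
policy = build_policy(layer_reads)
for buf in sorted(policy):
    print(buf, policy[buf].value)
```

In this example `act1` is read by three layers and is marked evict-last, `act2` is read by two and stays evict-normal, and `input` and `act3` are read once and are marked evict-first.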

Claims

exact text as granted — not AI-modified
What is claimed is: 
     
       1. A method of managing a cache located on a processor, the method comprising:
 identifying a plurality of memory addresses associated with a workload of an application executing using the processor; 
 determining a characteristic of the workload that corresponds to an amount of traffic between the cache and off-chip memory generated at one or more memory addresses of the plurality of memory addresses; 
 determining an amount of reuse of the plurality of memory addresses using the characteristic; 
 determining a cache management policy for the workload based on the amount of reuse; and 
 applying the cache management policy to the cache. 
 
     
     
       2. The method of  claim 1 , wherein the application is a neural network (NN) inference application, and the plurality of memory addresses are a plurality of virtual addresses. 
     
     
       3. The method of  claim 2 , wherein one or more virtual addresses of the plurality of virtual addresses are associated with activation data of the workload of the NN inference application. 
     
     
       4. The method of  claim 3 , wherein the determining the amount of reuse includes computing the amount of reuse according to a quantitative metric, the quantitative metric comprising a number of layers associated with the NN inference application that access the one or more virtual addresses associated with the activation data of the workload. 
     
     
       5. The method of  claim 4 , wherein applying the cache management policy to the cache comprises applying one or more eviction priority controls to one or more memory addresses of the plurality of memory addresses based on the quantitative metric. 
     
     
       6. The method of  claim 5 , wherein applying the one or more eviction priority controls to the one or more memory addresses of the plurality of memory addresses comprises designating at least one of an evict-last eviction priority control, an evict-normal eviction priority control, or an evict-first eviction priority control for at least one memory address of the one or more memory addresses. 
     
     
       7. The method of  claim 2 , wherein one or more virtual addresses of the plurality of virtual addresses are associated with weight data of the workload of the NN inference application. 
     
     
       8. A system comprising:
 a processor, having a cache located thereon, to perform operations comprising:
 identifying a plurality of memory addresses associated with a workload of an application executing using the processor; 
 determining a characteristic of the workload that corresponds to an amount of traffic between the cache and off-chip memory generated at one or more memory addresses of the plurality of memory addresses; 
 determining an amount of reuse of the memory addresses using the characteristic; 
 determining a cache management policy for the workload based on the amount of reuse; and 
 applying the cache management policy to the cache. 
 
 
     
     
       9. The system of  claim 8 , wherein the application is a neural network (NN) inference application, and the plurality of memory addresses are a plurality of virtual addresses. 
     
     
       10. The system of  claim 9 , wherein one or more virtual addresses of the plurality of virtual addresses are associated with activation data of the workload of the NN inference application. 
     
     
       11. The system of  claim 10 , wherein the determining the amount of reuse includes computing the amount of reuse according to a quantitative metric, the quantitative metric comprising a number of layers associated with the NN inference application that access the one or more virtual addresses associated with the activation data of the workload. 
     
     
       12. The system of  claim 11 , wherein applying the cache management policy for the workload based on the quantitative metric comprises applying one or more eviction priority controls to one or more memory addresses of the plurality of memory addresses based on the quantitative metric. 
     
     
       13. The system of  claim 12 , wherein applying the eviction priority controls to the one or more memory addresses of the plurality of memory addresses comprises designating at least one of an evict-last eviction priority control, an evict-normal eviction priority control, or an evict first eviction priority control for at least one memory address of the one or more memory addresses. 
     
     
       14. The system of  claim 9 , wherein one or more virtual addresses of the plurality of virtual addresses are associated with weight data of the workload of the NN inference application. 
     
     
       15. A processor comprising:
 one or more processing units to:
 identify a plurality of memory addresses associated with a workload of an application; 
 determine a characteristic of the workload that corresponds to an amount of traffic between a cache corresponding to the processor and off-chip memory generated at one or more memory addresses of the plurality of memory addresses; 
 determine an amount of reuse of the plurality of memory addresses using the characteristic; 
 determine a cache management policy to apply to the cache for the workload based at least on the amount of reuse; and 
 apply the cache management policy to the cache. 
 
 
     
     
       16. The processor of  claim 15 , wherein the application is a neural network (NN) inference application, and the plurality of memory addresses are a plurality of virtual addresses. 
     
     
       17. The processor of  claim 16 , wherein one or more virtual addresses of the plurality of virtual addresses are associated with activation data of the workload of the NN inference application. 
     
     
       18. The processor of  claim 17 , wherein the one or more processing units are to determine the amount of reuse according to a quantitative metric, the quantitative metric comprising a number of layers associated with the NN inference application that access the one or more virtual addresses associated with the activation data of the workload. 
     
     
       19. The processor of  claim 15 , wherein the one or more processing units are to apply the cache management policy to the cache by applying one or more eviction priority controls to at least one memory address of the one or more memory addresses of the plurality of memory addresses based on the amount of reuse of the plurality of memory addresses corresponding to the workload. 
     
     
       20. The processor of  claim 19 , wherein the applying the one or more eviction priority controls to the at least one memory address of the one or more memory addresses comprises designating at least one of an evict-last eviction priority control, an evict-normal eviction priority control, or an evict-first eviction priority control for at least one memory address of the one or more memory addresses.
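
To illustrate how evict-first, evict-normal, and evict-last designations could affect traffic between a cache and off-chip memory, the following toy Python model may help. It is a sketch under stated assumptions, not the claimed hardware: the cache is fully associative, the victim is the least recently used line among those with the lowest priority rank, and the class `PriorityCache`, the function `run`, and the trace are hypothetical.

```python
from collections import OrderedDict

# Lower rank is evicted sooner (assumed ordering for this sketch).
RANK = {"evict-first": 0, "evict-normal": 1, "evict-last": 2}


class PriorityCache:
    """Toy fully associative cache with priority-aware LRU eviction."""

    def __init__(self, capacity, priorities=None):
        self.capacity = capacity
        self.priorities = priorities or {}  # address -> priority label
        self.lines = OrderedDict()          # ordered from least to most recently used
        self.offchip_fetches = 0

    def _rank(self, addr):
        return RANK[self.priorities.get(addr, "evict-normal")]

    def access(self, addr):
        """Access one address; return True on a hit, False on a miss."""
        if addr in self.lines:
            self.lines.move_to_end(addr)    # refresh recency on a hit
            return True
        self.offchip_fetches += 1           # a miss costs one off-chip fetch
        if len(self.lines) >= self.capacity:
            # min() returns the first minimal key in iteration order,
            # i.e. the least recently used line among the lowest rank.
            victim = min(self.lines, key=self._rank)
            del self.lines[victim]
        self.lines[addr] = None
        return False


def run(trace, capacity, priorities=None):
    """Replay a trace and return the number of off-chip fetches."""
    cache = PriorityCache(capacity, priorities)
    for addr in trace:
        cache.access(addr)
    return cache.offchip_fetches


# Hypothetical trace: "w" is reused, the "s*" addresses are streamed once.
trace = ["w", "s0", "s1", "w", "s2", "s3", "w", "s4", "s5", "w"]
lru_fetches = run(trace, capacity=2)
hinted_fetches = run(trace, capacity=2, priorities={"w": "evict-last"})
print(lru_fetches, hinted_fetches)
```

With a two-line cache, plain LRU misses on every access in this trace (10 fetches), while marking the reused address evict-last keeps it resident and reduces the count to 7. Real caches are set-associative and apply such hints in hardware, so the numbers only illustrate the direction of the effect.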