US12436897B2 · Active · Utility · PatentIndex 53

Cache management using eviction priority based on memory reuse

Assignee: NVIDIA CORP · Priority: Mar 7, 2023 · Filed: Mar 7, 2023 · Granted: Oct 7, 2025

Est. expiry: Mar 7, 2043 (~16.7 yrs left) · nominal 20-yr term from priority

Inventors: KOREM NOAM DOR PHARRIS BRIAN SCOTT SUBAG JACOB

Y02D 10/00 · G06N 3/0464 · G06F 12/126 · G06N 3/045 · G06N 3/063

Abstract

Apparatuses, systems, and techniques to manage a cache located on a processor of a computing system using eviction priority based on memory reuse. Memory addresses associated with a workload of an application executing using the processor are identified. An amount of reuse of the memory addresses corresponding to the workload is determined. A cache management policy for the workload is determined based on the amount of reuse. The cache management policy is applied to the cache.
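
The flow described in the abstract (identify addresses, estimate reuse, derive a policy, apply it to the cache) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented implementation: `estimate_reuse`, `choose_policy`, `ToyCache`, and `run` are hypothetical names, the reuse estimate is a plain access count over a recorded trace, the policy is reduced to a set of addresses to protect from eviction, and the cache is a tiny fully associative LRU model in which each miss stands in for off-chip traffic.

```python
from collections import Counter, OrderedDict

def estimate_reuse(trace):
    """Reuse per address: number of accesses after the first touch."""
    counts = Counter(trace)
    return {addr: n - 1 for addr, n in counts.items()}

def choose_policy(reuse, threshold=2):
    """Hypothetical policy: protect addresses reused at least `threshold` times."""
    return {addr for addr, r in reuse.items() if r >= threshold}

class ToyCache:
    """Fully associative LRU cache that evicts unprotected lines first."""

    def __init__(self, capacity, protected=frozenset()):
        self.capacity = capacity
        self.protected = set(protected)
        self.lines = OrderedDict()  # least recently used line comes first
        self.misses = 0             # each miss models one off-chip fetch

    def access(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # hit: mark as most recently used
            return
        self.misses += 1
        if len(self.lines) >= self.capacity:
            # Prefer the oldest unprotected line; fall back to plain LRU.
            victim = next((a for a in self.lines if a not in self.protected), None)
            if victim is None:
                victim = next(iter(self.lines))
            del self.lines[victim]
        self.lines[addr] = True

def run(trace, capacity, protected=frozenset()):
    """Replay a trace and return the number of misses."""
    cache = ToyCache(capacity, protected)
    for addr in trace:
        cache.access(addr)
    return cache.misses
```

For a trace such as `[0x100, 1, 2, 0x100, 3, 4, 0x100, 5, 6, 0x100]` on a two-line cache, plain LRU misses on all 10 accesses, while protecting the one heavily reused address (`0x100`) reduces that to 7 misses in this toy model.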

Claims

exact text as granted — not AI-modified

What is claimed is:

1. A method of managing a cache located on a processor, the method comprising:
identifying a plurality of memory addresses associated with a workload of an application executing using the processor;
determining a characteristic of the workload that corresponds to an amount of traffic between the cache and off-chip memory generated at one or more memory addresses of the plurality of memory addresses;
determining an amount of reuse of the plurality of memory addresses using the characteristic;
determining a cache management policy for the workload based on the amount of reuse; and
applying the cache management policy to the cache.

2. The method of claim 1 , wherein the application is a neural network (NN) inference application, and the plurality of memory addresses are a plurality of virtual addresses.

3. The method of claim 2 , wherein one or more virtual addresses of the plurality of virtual addresses are associated with activation data of the workload of the NN inference application.

4. The method of claim 3 , wherein the determining the amount of reuse includes computing the amount of reuse according to a quantitative metric, the quantitative metric comprising a number of layers associated with the NN inference application that access the one or more virtual addresses associated with the activation data of the workload.

5. The method of claim 4 , wherein applying the cache management policy to the cache comprises applying one or more eviction priority controls to one or more memory addresses of the plurality of memory addresses based on the quantitative metric.

6. The method of claim 5 , wherein applying the one or more eviction priority controls to the one or more memory addresses of the plurality of memory addresses comprises designating at least one of an evict-last eviction priority control, an evict-normal eviction priority control, or an evict-first eviction priority control for at least one memory address of the one or more memory addresses.

7. The method of claim 2 , wherein one or more virtual addresses of the plurality of virtual addresses are associated with weight data of the workload of the NN inference application.

8. A system comprising:
a processor, having a cache located thereon, to perform operations comprising:
identifying a plurality of memory addresses associated with a workload of an application executing using the processor;
determining a characteristic of the workload that corresponds to an amount of traffic between the cache and off-chip memory generated at one or more memory addresses of the plurality of memory addresses;
determining an amount of reuse of the memory addresses using the characteristic;
determining a cache management policy for the workload based on the amount of reuse; and
applying the cache management policy to the cache.

9. The system of claim 8 , wherein the application is a neural network (NN) inference application, and the plurality of memory addresses are a plurality of virtual addresses.

10. The system of claim 9 , wherein one or more virtual addresses of the plurality of virtual addresses are associated with activation data of the workload of the NN inference application.

11. The system of claim 10 , wherein the determining the amount of reuse includes computing the amount of reuse according to a quantitative metric, the quantitative metric comprising a number of layers associated with the NN inference application that access the one or more virtual addresses associated with the activation data of the workload.

12. The system of claim 11 , wherein applying the cache management policy for the workload based on the quantitative metric comprises applying one or more eviction priority controls to one or more memory addresses of the plurality of memory addresses based on the quantitative metric.

13. The system of claim 12 , wherein applying the eviction priority controls to the one or more memory addresses of the plurality of memory addresses comprises designating at least one of an evict-last eviction priority control, an evict-normal eviction priority control, or an evict first eviction priority control for at least one memory address of the one or more memory addresses.

14. The system of claim 9 , wherein one or more virtual addresses of the plurality of virtual addresses are associated with weight data of the workload of the NN inference application.

15. A processor comprising:
one or more processing units to:
identify a plurality of memory addresses associated with a workload of an application;
determine a characteristic of the workload that corresponds to an amount of traffic between a cache corresponding to the processor and off-chip memory generated at one or more memory addresses of the plurality of memory addresses;
determine an amount of reuse of the plurality of memory addresses using the characteristic;
determine a cache management policy to apply to the cache for the workload based at least on the amount of reuse; and
apply the cache management policy to the cache.

16. The processor of claim 15 , wherein the application is a neural network (NN) inference application, and the plurality of memory addresses are a plurality of virtual addresses.

17. The processor of claim 16 , wherein one or more virtual addresses of the plurality of virtual addresses are associated with activation data of the workload of the NN inference application.

18. The processor of claim 17 , wherein the one or more processing units are to determine the amount of reuse according to a quantitative metric, the quantitative metric comprising a number of layers associated with the NN inference application that access the one or more virtual addresses associated with the activation data of the workload.

19. The processor of claim 15 , wherein the one or more processing units are to apply the cache management policy to the cache by applying one or more eviction priority controls to at least one memory address of the one or more memory addresses of the plurality of memory addresses based on the amount of reuse of the plurality of memory addresses corresponding to the workload.

20. The processor of claim 19 , wherein the applying the one or more eviction priority controls to the at least one memory address of the one or more memory addresses comprises designating at least one of an evict-last eviction priority control, an evict-normal eviction priority control, or an evict-first eviction priority control for at least one memory address of the one or more memory addresses.
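
As an illustration of the quantitative metric in claims 4, 11, and 18 and the eviction priority designations in claims 6, 13, and 20, the following minimal sketch counts how many layers access each activation buffer and maps that count to one of the three named priorities. The claims do not specify how counts map to priorities, so the thresholds `first_max` and `last_min`, the function names, and the layer and buffer names in the usage note are all assumptions made for illustration.

```python
from enum import Enum

class EvictionPriority(Enum):
    EVICT_FIRST = "evict-first"
    EVICT_NORMAL = "evict-normal"
    EVICT_LAST = "evict-last"

def layers_per_buffer(layer_accesses):
    """Count, for each buffer address, how many layers access it.

    `layer_accesses` maps a layer name to the buffer addresses that layer
    reads or writes. A layer is counted once per buffer.
    """
    counts = {}
    for addrs in layer_accesses.values():
        for addr in set(addrs):
            counts[addr] = counts.get(addr, 0) + 1
    return counts

def designate(counts, first_max=1, last_min=3):
    """Map layer counts to priorities using hypothetical thresholds."""
    priorities = {}
    for addr, n in counts.items():
        if n <= first_max:
            priorities[addr] = EvictionPriority.EVICT_FIRST   # little reuse
        elif n >= last_min:
            priorities[addr] = EvictionPriority.EVICT_LAST    # heavy reuse
        else:
            priorities[addr] = EvictionPriority.EVICT_NORMAL
    return priorities
```

With a hypothetical residual block `{"conv1": ["A"], "conv2": ["A", "B"], "conv3": ["B", "C"], "add": ["A", "C", "D"]}`, buffer `A` is accessed by three layers and is designated evict-last, `B` and `C` are accessed by two layers each and stay evict-normal, and `D` is accessed by one layer and is designated evict-first.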
