Stride reference prefetcher
Abstract
A processor including a cache memory, processing logic, access logic, stride mask logic, count logic, arbitration logic, and a prefetcher. The processing logic submits load requests to access cache lines of a memory page. The access logic updates an access vector for the memory page, in which the access logic determines a minimum stride value between successive load requests. The stride mask logic provides a mask vector based on the minimum stride value. The count logic combines the mask vector with the access vector to provide an access count. The arbitration logic triggers a prefetch operation when the access count achieves a predetermined count threshold. The prefetcher performs the prefetch operation using a prefetch address determined by combining the minimum stride value with an address of a last one of the load requests. Direction of the stride may be determined, and a stable mode is described.
Claims
exact text as granted — not AI-modifiedWhat is claimed is:
1. A stride reference prefetcher for a processor, comprising:
access logic that detects successive load requests to cache lines of a page of memory, that determines a stride value between each of said successive load requests and that determines a minimum stride value, and that updates an access history vector which identifies each cache line of said page of memory that has been accessed;
a stride mask logic that provides a stride mask vector based on said minimum stride value;
count logic that filters said access history vector with said stride mask vector to determine a positive access count and a negative access count;
arbitration logic that triggers a prefetch operation when either one of said positive access count and said negative access count meets a trigger threshold and that indicates a direction of memory accesses based on a relative size of said positive access count and said negative access count; and
a prefetcher that performs said prefetch operation using a prefetch address when triggered by said arbitration logic, wherein said prefetcher determines said prefetch address by combining a load request address and said minimum stride value in a direction indicated by said arbitration logic.
2. The stride reference prefetcher of claim 1 , wherein said access logic comprises a reference table that stores a page address of said page, that stores said minimum stride value, that stores a previous access value that identifies a cache line accessed by said load request address, and that stores said access history vector.
3. The stride reference prefetcher of claim 2 , wherein said access logic further comprises:
an adder that determines a difference between a next access value and said previous access value to determine a next stride value;
a comparator that compares said minimum stride value with said next stride value and that replaces said minimum stride value with said next stride value as an updated minimum stride value in said reference table when said next stride value is less than said minimum stride value; and
wherein said previous access value is replaced by said next access value as an updated previous access value in said reference table.
4. The stride reference prefetcher of claim 2 , wherein said access logic further comprises:
a decoder that decodes a next access value of said load request address to provide a decoded access value that identifies a next cache line being accessed; and
OR logic that logically OR's said decoded access value with said access history vector to update said access history vector.
5. The stride reference prefetcher of claim 1 , wherein said access logic comprises a reference table comprising a column of page address entries, a column of corresponding minimum stride value entries, a column of corresponding previous access values that identify a cache line last accessed for a corresponding page address, and a column of corresponding access history vectors.
6. The stride reference prefetcher of claim 5 , wherein said access logic further comprises:
a comparator that compares a next page address with each valid page address stored in said column of page address entries of said reference table;
update logic that updates a row of entries corresponding to a page address that matched said next page address; and
replace logic that adds a new row of entries into said reference table when said next page address does not match any valid page addresses stored in said column of page address entries of said reference table.
7. The stride reference prefetcher of claim 6 , wherein said replace logic uses a least-recently used replacement policy to replace a valid row of entries in said reference table including said next page address and corresponding initial values.
8. The stride reference prefetcher of claim 1 , wherein said arbitration logic comprises:
a first comparator that compares said positive access count with said trigger threshold and that asserts a first value indicative thereof;
a second comparator that compares said negative access count with said trigger threshold and that asserts a second value indicative thereof; and
OR logic that logically OR's said first and second values to provide a trigger signal that triggers said prefetch operation.
9. The stride reference prefetcher of claim 8 , wherein said arbitration logic further comprises a third comparator that compares said positive access count with said negative access count and that provides a direction signal, wherein said direction signal indicates a negative direction of memory accesses when said positive access count is less than said negative access count, and otherwise indicates a positive direction of memory accesses.
10. The stride reference prefetcher of claim 1 , wherein:
said arbitration logic asserts a trigger signal to trigger a prefetch operation and asserts a direction signal indicating said direction of said prefetch operation;
wherein said access logic increments only one of a positive value and a negative value with each assertion of said direction signal, and determines a sign value based on a larger one of said positive value and said negative value;
wherein said arbitration logic comprises:
an adder that adds said positive access count and said negative access count to provide an access sum; and
a comparator circuit that compares said access sum to a stable enable threshold and that asserts a stable enable signal when said stable threshold is met; and
wherein said prefetcher performs a plurality of sequential prefetch operations using a corresponding plurality of sequential prefetch addresses in response to assertion of said stable enable signal, wherein said plurality of sequential prefetch addresses is determined by repeatedly combining a multiple of said minimum stride value to said load request address in a direction indicated by said sign value.
11. The stride reference prefetcher of claim 10 , wherein said comparator circuit comprises:
a first comparator that compares said access sum with said stable enable threshold and that asserts a stable enable signal when said stable enable threshold is met;
a second comparator that compares said access sum with a stable disable threshold and that asserts a stable disable signal when said stable disable threshold is met; and
a latch circuit that receives said stable enable and disable signals and that asserts a stable signal indicative of a stable mode of operation.
12. The stride reference prefetcher of claim 10 , wherein said access logic comprises a reference table that stores a page address of said page, a stable value, said minimum stride value, said sign value, said positive value, said negative value, a previous access value that identifies a cache line accessed by said load request address, and said access history vector.
13. The stride reference prefetcher of claim 12 , wherein said access logic further comprises update logic that toggles said stable value when said stable enable signal is asserted, that increments one of said positive value and said negative value in response to assertion of said direction signal, that updates said sign value to reflect only one of said positive value and said negative value.
14. The stride reference prefetcher of claim 10 , wherein said plurality of sequential prefetch operations is limited by said prefetcher to stay ahead of said successive load requests by no more than a predetermined maximum number.
15. A processor, comprising:
a cache memory;
processing logic that submits a plurality of load requests to access cache lines of a memory page;
access logic that receives said plurality of load requests by said processing logic and that updates an access vector for said memory page, wherein said access logic determines a minimum stride value between successive ones of said plurality of load requests;
stride mask logic that provides a mask vector based on said minimum stride value;
count logic that combines said mask vector with said access vector to provide an access count;
arbitration logic that triggers a prefetch operation when said access count achieves a predetermined count threshold; and
a prefetcher that performs said prefetch operation using a prefetch address determined by combining said minimum stride value with an address of a last one of said plurality of load requests.
16. The processor of claim 15 , wherein said access logic comprises:
a reference table that stores a previous access value identifying a first cache line within said memory page and that stores said minimum stride value;
an adder that determines a difference between said previous access value and a next access value identifying a second cache line within said memory page and to determine a stride offset;
a comparator that compares a current minimum stride value with said stride offset to update said minimum stride value; and
wherein said next access value replaces said previous access value in said reference table for each of said plurality of load requests.
17. The processor of claim 15 , wherein said access vector comprises a plurality of bits identifying load request accesses of a corresponding plurality of cache lines of said memory page.
18. The processor of claim 15 , wherein said stride mask logic comprises a lookup table that provides one of a plurality of mask vectors for each of a plurality of possible stride values.
19. The processor of claim 15 , wherein said count logic comprises:
a first shift register that shifts said access count in a first direction by an amount based on a relative location of said last one of said plurality of load requests to provide a positive access history value;
a second shift register that shifts said access count in a second and opposite direction by an amount based on said relative location of said last one of said plurality of load requests to provide a negative access history value;
AND logic that logically AND's said mask vector with said positive access history value and said negative access history value; and
count logic that counts outputs of said AND logic to provide a positive access count and a negative access count.
20. The processor of claim 15 , wherein:
said count logic determines a positive access count and a negative access count; and
wherein said arbitration logic comprises:
a first comparator that compares said positive access count with said predetermined count threshold and that asserts a first value indicative thereof;
a second comparator that compares said negative access count with said predetermined count threshold and that asserts a second value indicative thereof;
OR logic that logically OR's said first and second values to provide a trigger signal indicative of triggering said prefetch operation; and
a third comparator that compares said positive access count with said negative access count and that provides a direction signal indicative of one of a positive direction and a negative direction.
21. The processor of claim 15 , wherein:
said count logic determines positive and negative access vector values, combines said mask vector with each of said positive and negative access vector values and provides positive and negative access counts;
wherein said arbitration logic determines a direction of said prefetch operation based on said positive and negative access counts;
wherein said arbitration logic indicates a stable condition of said load requests based on said positive and negative access counts; and
wherein said prefetcher performs successive prefetch operations using successive prefetch addresses by combining a multiple of said minimum stride value with said address of a last load request when said stable condition is indicated.
22. The processor of claim 21 , wherein said arbitration logic comprises:
an adder that adds said positive access count and said negative access count to provide an access sum;
a first comparator that compares said access sum with a stable enable threshold and that asserts a stable enable signal when said stable enable threshold is met;
a second comparator that compares said access sum with a stable disable threshold and that asserts a stable disable signal when said stable disable threshold is met; and
a latch circuit that receives said stable enable and disable signals and that asserts a stable signal indicative of a stable mode of operation.
23. A method of prefetching data from a memory into a processor, combining:
detecting successive load requests to cache lines of a page of memory;
determining a stride value between each of the successive load requests;
determining a minimum stride value;
updating an access history vector that identifies each cache line of the page of memory that has been accessed;
determining a stride mask vector based on the minimum stride value;
filtering the access history vector with the stride mask vector and determining an access count; and
performing a prefetch operation when the access count achieves a predetermined count threshold.
24. The method of claim 23 , wherein said determining a stride value comprises:
updating a stored previous access value for each successive load request; and
comparing a next access value with the stored previous access value.
25. The method of claim 23 , wherein said determining a minimum stride value comprises:
determining a stride value for each successive load request;
comparing a next stride value with a currently stored minimum stride value; and
updating the minimum stride value with the next stride value if the next stride value is less than the currently stored minimum stride value.
26. The method of claim 23 , wherein said updating an access history vector comprises setting a bit that corresponds with a currently accessed cache line of the page of memory.
27. The method of claim 23 , wherein said determining a stride mask vector comprises selecting one of a plurality of stride mask vectors, each corresponding with a corresponding one of a plurality of possible stride values.
28. The method of claim 23 , further comprising:
shifting the access history vector in a first direction to provide a positive access history vector;
shifting the access history vector in a second, opposite direction to provide a negative access history vector;
wherein said filtering comprises filtering the positive access history vector with the stride mask vector to provide a positive count, and filtering the negative access history vector with the stride mask vector to provide a negative count;
wherein said performing a prefetch operation comprises performing the prefetch operation when either one of the positive and negative counts achieves the predetermined count threshold; and
determining a direction of the prefetch operation based on comparing the positive and negative counts.
29. The method of claim 28 , further comprising:
adding the positive and negative counts to determine a sum count;
comparing the sum count with a stable enable threshold; and
operating in a stable mode when the sum count reaches the stable enable threshold.
30. The method of claim 29 , wherein said operating in a stable mode comprises repeatedly adding the minimum stride value to determine successive prefetch addresses and performing a prefetch operation for each of the successive prefetch addresses.
31. The method of claim 29 , further comprising:
comparing the sum count with a stable disable threshold; and
terminating said operating in a stable mode when the sum count falls to the stable disable threshold.Cited by (0)
No later patents cite this yet.
References (0)
No backward citations on record.