US9448766B2 · Active · Utility

Interconnected arithmetic logic units

Assignee: NVIDIA CORP · Priority: Aug 15, 2007 · Filed: Aug 27, 2013 · Granted: Sep 20, 2016
Est. expiry: Aug 15, 2027 (~1.1 yrs left) · nominal 20-yr term from priority
Inventors: BERGLAND TYSON; TOKSVIG MICHAEL J M; MAHAN JUSTIN MICHAEL
Classifications: G06F 9/30; G06F 7/57; G06F 7/5443; G06F 9/3893; G06F 9/3001
PatentIndex Score: 82 · Cited by: 7 · References: 84 · Claims: 20

Abstract

An arithmetic logic stage in a graphics pipeline includes a number of arithmetic logic units (ALUs). The ALUs each include, for example, a multiplier and an adder. The ALUs are interconnected by circuitry that, for example, routes the output from the multiplier in one ALU to both the adder in that ALU and an adder in another ALU.
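
The routing described in the abstract can be illustrated with a minimal behavioral sketch. It is not the patented circuit: the function name `two_alu_stage`, the flag `route_from_alu0`, and the restriction to two ALUs are assumptions made only for illustration. Each ALU is modeled as a multiplier feeding an adder, and the product from the first ALU goes to its own adder and, optionally, to the adder of the second ALU.

```python
def two_alu_stage(a, b, c, route_from_alu0):
    """Behavioral model of two ALUs, each a multiplier followed by an adder.

    a, b: multiplier operands, one pair per ALU.
    c:    scalar addends, one per ALU.
    route_from_alu0: hypothetical control flag. When True, the adder in
    ALU 1 takes the product from ALU 0 instead of its own addend c[1].
    """
    products = [a[0] * b[0], a[1] * b[1]]

    # ALU 0 always performs a plain multiply-add in this sketch.
    out0 = products[0] + c[0]

    # The product of ALU 0 is also available to the adder in ALU 1.
    addend1 = products[0] if route_from_alu0 else c[1]
    out1 = products[1] + addend1
    return out0, out1


# Two independent multiply-adds: (2*4 + 1, 3*5 + 10)
print(two_alu_stage([2, 3], [4, 5], [1, 10], route_from_alu0=False))
# ALU 1 now yields the 2-D dot product 2*4 + 3*5
print(two_alu_stage([2, 3], [4, 5], [1, 10], route_from_alu0=True))
```

With the flag cleared the stage behaves as two separate multiply-add units. With it set, the second output is a two-dimensional dot product formed in the same pass.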

Claims

exact text as granted — not AI-modified
What is claimed is: 
     
       1. An arithmetic logic stage circuit of a graphics processor unit pipeline, the circuit comprising:
 a plurality of arithmetic logic units (ALUs) coupled in parallel to one another; and 
 programmable interconnecting circuitry coupled between the ALUs and programmable according to programming code, wherein the interconnecting circuitry is operable to allow the plurality of ALUs to implement, on a single pass through the ALUs, a multiply-add operation according to a first programming code and a multidimensional dot product computation according to a second programming code, wherein the interconnecting circuitry comprises multiplexers coupling the parallel ALUs, the multiplexers comprising a multiplexer that routes data from one of the ALUs to another one of the ALUs, the multiplexers also comprising a multiplexer that receives data at one of the ALUs from another one of the ALUs. 
 
     
     
       2. The circuit of claim 1 wherein the programming code comprises a respective code for each of the ALUs and wherein the respective code is symmetrical with respect to each other code. 
     
     
       3. The circuit of  claim 1  wherein the interconnecting circuitry is asymmetric with respect to each of the ALUs. 
     
     
       4. The circuit of  claim 1  wherein each of the ALUs is analogous. 
     
     
       5. The circuit of  claim 1  wherein the multidimensional dot product computation comprises a four-dimensional dot product. 
     
     
       6. A method comprising:
 performing a first type of operation and performing a second type of operation using a plurality of arithmetic logic units (ALUs) comprising a first ALU, a second ALU, a third ALU and a fourth ALU, each of the ALUs comprising a first digital circuit operable for performing the first type of operation and a second digital circuit operable for performing the second type of operation; 
 routing data that is output from the first digital circuit of the first ALU to both the second digital circuit of the second ALU and the second digital circuit of the third ALU, the routing through circuitry interconnecting the ALUs, the circuitry comprising a first multiplexer coupled between the first digital circuit of the first ALU and the second digital circuit of the second ALU, the circuitry further comprising a second multiplexer coupled between the first digital circuit of the first ALU and the second digital circuit of the third ALU; 
 selecting, using the first multiplexer, the data as an operand for the second digital circuit of the second ALU; and 
 forwarding, using the second multiplexer, the data to the second digital circuit of the third ALU. 
 
     
     
       7. The method of  claim 6 , wherein the first digital circuit comprises a multiplier and the second digital circuit comprises an adder, wherein the first type of operation comprises multiplication and the second type of operation comprises addition. 
     
     
       8. The method of  claim 6 , wherein the plurality of ALUs are operable for performing multidimensional computations in a single pass, the multidimensional computations selected from the group consisting of: four-dimensional dot product; three-dimensional dot product with scalar add and multiply-add; three-dimensional dot product and multiply-add; up to four multiply-adds; two two-dimensional dot products with scalar adds; two two-dimensional dot products with scalar add and two multiply-adds; two two-dimensional dot products; three two-dimensional dot products; and four two-dimensional dot products. 
     
     
       9. The method of  claim 6  further comprising receiving, at each of the ALUs, a two-bit control signal for controlling the routing of data. 
     
     
       10. A method comprising:
 receiving, at a first adder of a plurality of adders, a first operand that is an output of a first multiplier of a plurality of multipliers comprising the first multiplier, a second multiplier, a third multiplier and a fourth multiplier; 
 selecting, with a first multiplexer, a second operand for the first adder; 
 selecting, with a second multiplexer, a third operand for the first adder, wherein the third operand comprises an output of one of the second, third and fourth multipliers; and 
 interconnecting the plurality of multipliers to the plurality of adders with software-configurable circuitry, the circuitry configurable to allow an adder to receive data from more than one of the multipliers and to allow data to be sent from a multiplier to more than one of the adders. 
 
     
     
       11. The method of  claim 10 , wherein the circuitry comprises a plurality of multiplexers. 
     
     
       12. The method of  claim 10 , wherein the second operand comprises an input to the arithmetic logic stage that bypasses the multipliers. 
     
     
       13. The method of  claim 10 , wherein the second operand comprises an output of one of the second, third and fourth multipliers. 
     
     
       14. The method of  claim 10  further comprising selecting, with a third multiplexer, a fourth operand for the first adder, wherein the fourth operand comprises an output of one of the second, third and fourth multipliers. 
     
     
       15. The method of  claim 10 , wherein the plurality of multipliers and the plurality of adders in combination are operable for performing multidimensional computations in a single pass through the arithmetic logic stage, the multidimensional computations selected from the group consisting of: four-dimensional dot product; three-dimensional dot product with scalar add and multiply-add; three-dimensional dot product and multiply-add; up to four multiply-adds; two two-dimensional dot products with scalar adds; two two-dimensional dot products with scalar add and two multiply-adds; two two-dimensional dot products; three two-dimensional dot products; and four two-dimensional dot products. 
     
     
       16. In an arithmetic logic stage in a graphics pipeline comprising a first arithmetic logic unit (ALU) comprising a first multiplier and a first adder coupled in series, a second ALU comprising a second multiplier and a second adder coupled in series, a third ALU comprising a third multiplier and a third adder coupled in series, and a fourth ALU comprising a fourth multiplier and a fourth adder coupled in series, a method comprising coupling operations comprising:
 coupling an output of the first multiplier to the second adder; 
 coupling an output of the second multiplier to the third adder; 
 coupling an output of the third multiplier to the fourth adder; 
 coupling an output of the fourth multiplier to the first adder; 
 coupling an output of the second multiplier to the first adder; 
 coupling an output of the third multiplier to the first adder; and 
 coupling an output of the fourth multiplier to the third adder. 
 
     
     
       17. The method of  claim 16 , wherein the circuitry comprises a plurality of multiplexers, the method further comprising:
 selecting, with a first multiplexer coupled to an input of the first adder, between the output of the third multiplier and an input to the first ALU; 
 selecting, with a second multiplexer coupled to the output of the second multiplier, between sending and not sending the output of the second multiplier to the first adder; 
 selecting, with a third multiplexer coupled to an input of the second adder, between the output of the first multiplier and an input to the second ALU; 
 selecting, with a fourth multiplexer coupled to the output of the third multiplier, between sending and not sending the output of the third multiplier to the first adder; 
 selecting, with a fifth multiplexer coupled to an input of the third adder, between the output of the second multiplier and an input to the third ALU; 
 selecting, with a sixth multiplexer coupled to the output of the fourth multiplier, between sending and not sending the output of the fourth multiplier to the third adder; and 
 selecting, with a seventh multiplexer, between the output of the third multiplier and an input to the fourth ALU. 
 
     
     
       18. The method of  claim 16 , wherein the first, second, third and fourth ALUs are operable for performing multidimensional computations in a single pass, the multidimensional computations selected from the group consisting of: four-dimensional dot product; three-dimensional dot product with scalar add and multiply-add; three-dimensional dot product and multiply-add; up to four multiply-adds; two two-dimensional dot products with scalar adds; two two-dimensional dot products with scalar add and two multiply-adds; two two-dimensional dot products; three two-dimensional dot products; and four two-dimensional dot products. 
     
     
       19. The method of  claim 16  further comprising receiving, at each of the first, second, third and fourth ALUs, a two-bit control signal for configuring the coupling operations. 
     
     
       20. The method of  claim 16 , programmable in software and dynamically configurable on the fly.
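
The cross-couplings listed in claim 16 can be modeled in a short behavioral sketch, which shows how one four-ALU stage can yield several of the single-pass computations named in claims 8, 15 and 18. This is an illustration under stated assumptions, not the claimed circuit. The names `COUPLINGS`, `four_alu_stage`, `enabled` and `use_addend` are hypothetical, the two-bit control signals of claims 9 and 19 are not modeled, and the sketch allows combinations that the actual multiplexer arrangement may not.

```python
# Cross-couplings recited in claim 16, written as (multiplier, adder), 1-based.
COUPLINGS = {(1, 2), (2, 3), (3, 4), (4, 1), (2, 1), (3, 1), (4, 3)}


def four_alu_stage(a, b, c, enabled, use_addend):
    """Behavioral model of four multiplier-adder ALUs with cross-couplings.

    a, b:       four multiplier operand pairs.
    c:          four scalar addends (inputs that bypass the multipliers).
    enabled:    set of (multiplier, adder) couplings switched on for this pass
                (a hypothetical stand-in for the real control code).
    use_addend: four booleans; True adds c[i] at adder i+1.
    """
    unknown = set(enabled) - COUPLINGS
    if unknown:
        raise ValueError(f"couplings not listed in claim 16: {sorted(unknown)}")

    products = [x * y for x, y in zip(a, b)]
    outputs = []
    for adder in range(1, 5):
        # Each adder sits in series with its own multiplier.
        total = products[adder - 1]
        if use_addend[adder - 1]:
            total += c[adder - 1]
        # Add every product routed to this adder from another ALU.
        for mult, target in sorted(enabled):
            if target == adder:
                total += products[mult - 1]
        outputs.append(total)
    return outputs


a, b, c = [1, 2, 3, 4], [5, 6, 7, 8], [10, 20, 30, 40]

# Four independent multiply-adds.
mads = four_alu_stage(a, b, c, enabled=set(), use_addend=[True] * 4)

# Four-dimensional dot product, collected at the first adder.
dot4 = four_alu_stage(a, b, c, enabled={(2, 1), (3, 1), (4, 1)},
                      use_addend=[False] * 4)[0]

# Two two-dimensional dot products, at the second and fourth adders.
pair = four_alu_stage(a, b, c, enabled={(1, 2), (3, 4)},
                      use_addend=[False] * 4)
dot2_first, dot2_second = pair[1], pair[3]
```

Only the set of enabled couplings changes between the three calls. That mirrors the claim language about selecting an operation by programming code rather than by making additional passes through the stage.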
