Graphic processor based accelerator system and method
Abstract
An accelerator system is implemented on an expansion card comprising a printed circuit board having (a) one or more graphics processing units (GPUs), (b) two or more associated memory banks (logically or physically partitioned), (c) a specialized controller, and (d) a local bus providing signal coupling compatible with the PCI industry standards. The controller handles most of the primitive operations to set up and control GPU computation. Thus, the computer's central processing unit (CPU) can be dedicated to other tasks. In this case, a few controls (simulation start and stop signals from the CPU and the simulation completion signal back to the CPU), GPU programs, and input/output data are exchanged between the CPU and the expansion card. Moreover, since on every time step of the simulation the results from the previous time step are used but not changed, those results are preferably transferred back to the CPU in parallel with the computation.
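The overlap of computation and result transfer described in the abstract can be illustrated with a minimal double-buffering sketch. It is not the patented implementation. The update rule in `step`, the names `simulate` and `host_results`, and the use of a Python thread to stand in for a bus transfer are all assumptions made for illustration.

```python
import threading


def step(prev, curr):
    """Toy update rule (an assumption): each element becomes the mean of
    itself and its two ring neighbours. Reads only prev, writes only curr."""
    n = len(prev)
    for i in range(n):
        curr[i] = (prev[(i - 1) % n] + prev[i] + prev[(i + 1) % n]) / 3.0


def simulate(initial, num_steps):
    """Run num_steps time steps, copying each step's input buffer to
    host_results (a stand-in for CPU main memory) while the step computes."""
    prev = list(initial)          # results of the previous time step (read-only)
    curr = [0.0] * len(prev)      # results being accumulated in this time step
    host_results = []

    def transfer(buf):
        # Hypothetical stand-in for an accelerator-to-host bus transfer.
        host_results.append(list(buf))

    for _ in range(num_steps):
        # prev is used but not changed during this step, so it can be
        # copied out in parallel with the computation.
        t = threading.Thread(target=transfer, args=(prev,))
        t.start()
        step(prev, curr)
        t.join()                  # finish the copy before prev is overwritten
        prev, curr = curr, prev   # swap roles for the next time step

    host_results.append(list(prev))  # final state after the last step
    return host_results
```

Calling `simulate([3.0, 0.0, 0.0], 2)` returns three snapshots: the initial state and the state after each of the two steps. The `join` before the swap is the design point: the buffer being transferred must not become the write target until its copy has completed.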
Claims
exact text as granted — not AI-modified
What is claimed is:
1. A computer system, comprising:
a central processing unit to receive input data;
main memory, operably coupled to the central processing unit via a bus, to store the input data received by the central processing unit;
an accelerator, operably coupled to the central processing unit and the first main memory via the bus, to receive at least a portion of the input data from the main memory, the accelerator comprising:
at least one graphics processing unit to perform a sequence of computations on the at least a portion of the input data so as to generate output data, the sequence of computations representing an artificial neural network, intermediate computations in the sequence of computations representing respective layers of the artificial neural network and yielding intermediate results; and
accelerator memory, operably coupled to the graphic at least one graphics processing unit, to store the results of the plurality of sequential sequence of computations; and
a controller, operably coupled to the at least one graphics processing unit and the accelerator memory, to initialize textures and shaders in the accelerator memory for performing the sequence of computations, to control performance of the sequence of computations by the at least one graphics processing unit, to transfer the at least a portion of the input data into the accelerator memory during performance of the intermediate computations in the sequence of computations by the at least one graphics processing unit, and to transfer at least a portion of the output data from the accelerator memory to the main memory during performance of the intermediate computations in the sequence of computations by the at least one graphic graphics processing unit.
2. The computer system of claim 1 , wherein the central processing unit is configured to receive the input data in response to a user interaction.
3. The computer system of claim 1 , wherein:
the central processing unit is configured to receive the input data at a first rate; and
the at least one graphics processing unit is configured to perform the sequence of computations at a second rate different than the first rate.
4. The computer system of claim 1 , wherein the main memory is configured to store a copy of the output data stored in the accelerator memory.
5. The computer system of claim 1 , wherein an output of at least one computation in the sequence of computations represents an output of at least one neuron in an artificial neural network.
6. The computer system of claim 1 , wherein accelerator memory comprises:
a first memory bank to store parameters common to all of the computations in the sequence of computations; and
a second memory bank to store data specific to at least one computation in the sequence of computations.
7. The computer system of claim 1 , wherein the controller is configured to transfer the output data from the accelerator memory to the main memory without transferring any of the intermediate results from the accelerator memory to the main memory so as to reduce data transfer via the bus.
8. The computer system of claim 1 , wherein the controller is configured to transfer at least a portion of the output data from the accelerator memory to the main memory after the at least one graphics processing unit has begun to perform another sequence of computations.
9. The computer system of claim 8 , wherein the controller is configured to initiate transfer of the at least a portion of the input data and to transfer the at least a portion of the output data in parallel with performance of at least one computation in the other sequence of computations by the at least one graphics processing unit.
10. The computer system of claim 1 , wherein the controller is configured to control execution of the sequence of computations by the at least one graphics processing unit.
11. The computer system of claim 1 , further comprising:
at least one of a video camera, a microphone, or a cell recording electrode, operably coupled to the central processor processing unit, to acquire the input data in real time.
12. A method of performing a sequence of computations representing an artificial neural network on a computer system comprising a central processing unit (CPU), a main memory operably coupled to the central processing unit via a bus, an accelerator operably coupled to the CPU and the main memory via the bus, the accelerator comprising a graphics processing unit (GPU) and an accelerator memory, the method comprising:
(A) performing, by the GPU, the sequence of computations on a first portion of the input data so as to generate a first portion of the output data, the first portion of the output data representing an output of a neuron in a first layer of the artificial neural network, intermediate computations in the sequence of computations yielding intermediate results, wherein performing the sequence of computations on the first portion of the input data comprises (i) assigning an output variable to a first texture and a second texture, the output variable being included in a first computational element of a plurality of computational elements, the plurality of computational elements representing the sequence of computations and (ii) accumulating a first value for the output variable in the first texture during a first time step;
(B) in parallel with performing the sequence of computations by the GPU in (A), transferring a second portion of the input data from the main memory to the accelerator via the bus; and
(C) in parallel with performing the sequence of computations by the GPU in (A), transferring a second portion of the output data from the accelerator memory to the main memory via the bus, the second portion of the output data representing an output of a neuron in a second layer in the artificial neural network; and
(D) performing, by the GPU, the sequence of computations on the second portion of the input data, wherein performing the sequence of computations on the second portion of the input data comprises (i) accumulating a second value for the output variable in the second texture during a second time step and (ii) making the first value of the output variable in the first texture accessible to other computational elements in the plurality of computational elements during the second time step.
13. The method of claim 12 , further comprising:
storing the input data in the main memory in response to a user interaction.
14. The method of claim 12 , further comprising:
receiving the input data at a first rate; and
wherein (A) comprises performing the sequence of computations at a second rate different than the first rate.
15. The method of claim 12 , wherein (A) comprises:
generating an output representative of an output of at least one neuron in an artificial neural network.
16. The method of claim 12 , wherein (C) comprises:
transferring the second portion of the output data from the accelerator memory to the main memory without transferring any of the intermediate results of the plurality of sequential computations from the accelerator memory to the main memory so as to reduce data transfer via the bus.
17. The method of claim 12 , wherein (C) comprises:
transferring the second portion of the output data from the accelerator memory to the main memory after the GPU has begun to perform another sequence of computations.
18. The method of claim 17 , wherein (C) further comprises:
initiating transfer of the second portion of the output data in parallel with performance of at least one computation in the other sequence of computations.
19. The method of claim 12 , further comprising:
acquiring the input data in real time with at least one of a video camera, a microphone, or a cell recording electrode operably coupled to the CPU.
20. The method of claim 12 , further comprising:
storing parameters common to all of the computations in the sequence of computations in a first memory bank in the accelerator memory; and
storing data specific to at least one computation in the sequence of computations in a second memory bank in the accelerator memory.
21. A method of performing a sequence of computations representing an artificial neural network, the method comprising:
receiving, at a central processing unit (CPU), first input data acquired from an external system in real time; initializing, by a controller operably coupled to a graphics processing unit (GPU), textures and shaders in a memory operably coupled to the GPU; transferring the first input data received by the CPU to the memory operably coupled to the GPU; performing, by the graphics processing unit (GPU), a first computation in the sequence of computations on the first input data based on the textures and shaders to generate first output data, computations in the sequence of computations representing respective layers of neurons in the artificial neural network, an output of the first computation in the sequence of computations representing an output of a first neuron in a first layer in the artificial neural network; storing, in the memory operably coupled to the GPU, the first input data and the first output data; and transferring second input data acquired from the external system in real time into the memory operably coupled to the GPU after the GPU starts the first computation and before the GPU starts a second computation of the sequence of computations, an output of the second computation in the sequence of computations representing an output of a second neuron in a second layer in the artificial neural network.
22. The method of claim 21, wherein transferring the second input data comprises transferring the second input data via a bus operably coupled to the CPU.
23. The method of claim 21, further comprising:
transferring the first output data from the memory to another memory during the second computation in the sequence of computations.
24. The method of claim 23, further comprising:
storing intermediate results of the sequence of computations in the memory, and wherein transferring the first output data from the memory to the other memory occurs without transferring the intermediate results of the sequence of computations.
25. The method of claim 23, wherein transferring the second input data and transferring the first output data occurs in parallel.
26. The method of claim 21, further comprising:
storing, in a first memory partition of the memory, parameters common to all of the computations in the sequence of computations.
27. The method of claim 26, further comprising:
storing, in a second memory partition of the memory, data specific to the first computation in the sequence of computations.
28. The method of claim 27, further comprising:
storing, in the second memory partition, external input data patterns, representations of internal variables, an input of the computation in the sequence of computations, and the output of the computation in the sequence of computations.
29. The method of claim 21, wherein storing the first output data comprises:
accumulating, in the memory, outputs of computational elements executed by the GPU in performing the first computation in the sequence of computations.
30. The method of claim 21, further comprising:
storing, in the memory, an output of a previous computation in the sequence of computations; and accessing, by the GPU, the output of the previous computation during performance of the computation in the sequence of computations.
31. The method of claim 21, wherein performing the first computation comprises executing a plurality of computational elements representing a layer of neurons in an artificial neural network.
32. The method of claim 31, wherein all neurons in the layer of neurons are described by the same equation.
33. The method of claim 21, further comprising:
acquiring the second input data with at least one of a video camera, a microphone, or a cell recording electrode.
34. The method of claim 21, further comprising:
loading the second input data from disk.
35. A system for performing a sequence of computations, the system comprising:
a camera to generate input data in real time; a first memory partition; a second memory partition operably coupled to the first memory partition; and a processing unit, operably coupled to the camera, the first memory partition, and the second memory partition, to perform the sequence of computations on a first portion of the input data so as to generate a first portion of output data, intermediate computations in the sequence of computations yielding intermediate results, the first portion of the output data representing an output of an artificial neural network, wherein the first memory partition is configured to transfer a second portion of the input data to the second memory partition in parallel with performance the sequence of computations by the processing unit, wherein the second memory partition is configured to transfer a second portion of the output data to the first memory partition in parallel with performance the sequence of computations by the processing unit, and wherein the sequence of computations represents the artificial neural network, each neuron in the artificial neural network has an output variable assigned to a first texture and a second texture in the memory, the first texture holds a first value of the output variable computed during a previous time step of the sequence of computations and accessible to other neurons in the neural network during a current time step of the sequence of computations and the second texture accumulates a second value of the output variable computed during the current time step.
36. The system of claim 35, wherein the first memory partition and the second memory partition are logical partitions.
37. The system of claim 35, wherein the processing unit is comprises a graphics processing unit (GPU).
38. The system of claim 35, wherein the processing unit is configured to receive the input data at a first rate and to perform the sequence of computations at a second rate is different than the first rate.
39. The system of claim 35, wherein the second memory partition is configured to transfer the second portion of the output data to the first memory partition without transferring any of the intermediate results to the first memory partition.
40. A system for executing an artificial neural network, the system comprising:
a central processing unit (CPU) to provide first input data; a memory, operably coupled to the CPU, to store the first input data in a first partition, referenced by a first pointer, before computing a first layer of neurons of the artificial neural network; a processing unit, operably coupled to the memory, to perform, during computation of the first layer of neurons, at least one calculation on the first input data so as to generate first output data, the first output data representing an output of at least one neuron in the first layer of neurons; and a controller, operably coupled to the processing unit and the memory, to:
store the first output data in a second partition of the memory, the second partition referenced by a second pointer, and to swap the first pointer with the second pointer at the end of the computation of the first layer of neurons, such that the first output data becomes an input for a second layer of neurons of the artificial neural network,
transfer the first output data to another memory during computation of the second layer of neurons, and
dictate an order of execution of instructions to the processing unit to perform the computation of the first layer of neurons.
41. The system of claim 40, wherein the processing unit comprises a graphics processing unit.
42. The system of claim 40, wherein the controller is configured to send instructions for performing the at least one calculation to the processing unit.
43. The system of claim 40, wherein the memory further comprises:
a third partition to store internal variables; and a fourth partition to store data used as input at a particular layer of neurons of the artificial neural network.
44. A computer system, comprising:
a central processing unit to receive input data acquired from an external system; main memory, operably coupled to the central processing unit via a bus, to store the input data received by the central processing unit; an accelerator, operably coupled to the central processing unit and the main memory via the bus, to receive at least a portion of the input data from the main memory, the accelerator comprising:
at least one processing unit to perform a sequence of computations representing an artificial neural network on the at least a portion of the input data so as to generate output data, intermediate computations in the sequence of computations representing layers of the neural network and yielding intermediate results; and
accelerator memory, operably coupled to the at least one processing unit, to store the results of the sequence of computations; and
a controller, operably coupled to the at least one processing unit and the accelerator memory, to control transfer of the at least a portion of the input data into the accelerator memory during performance of the intermediate computations in the sequence of computations by the at least one processing unit, to control transfer at least a portion of the output data from the accelerator memory to the main memory during performance of the intermediate computations in the sequence of computations by the at least one processing unit, and to control performance of the sequence of computations by the at least one processing unit.
45. The computer system of claim 44, wherein the central processing unit is configured to receive the input data in response to a user interaction.
46. The computer system of claim 44, wherein:
the central processing unit is configured to receive the input data at a first rate; and the at least one processing unit is configured to perform the sequence of computations at a second rate different than the first rate.
47. The computer system of claim 44, wherein the main memory is configured to store a copy of the output data stored in the accelerator memory.
48. The computer system of claim 44, wherein an output of at least one computation in the sequence of computations represents an output of at least one neuron in an artificial neural network.
49. The computer system of claim 44, wherein accelerator memory comprises:
a first memory partition to store parameters common to all of the computations in the sequence of computations; and a second memory partition to store data specific to at least one computation in the sequence of computations.
50. The computer system of claim 44, wherein the controller is configured to transfer the output data from the accelerator memory to the main memory without transferring any of the intermediate results from the accelerator memory to the main memory so as to reduce data transfer via the bus.
51. The computer system of claim 44, wherein the controller is configured to transfer at least a portion of the output data from the accelerator memory to the main memory after the at least one processing unit has begun to perform another sequence of computations.
52. The computer system of claim 51, wherein the controller is configured to initiate transfer of the at least a portion of the input data and to transfer the at least a portion of the output data in parallel with performance of at least one computation in the other sequence of computations by the at least one processing unit.
53. The computer system of claim 44, wherein the controller is configured to control execution of the sequence of computations by the at least one processing unit.
54. The computer system of claim 44, further comprising:
at least one of a video camera, a microphone, or a cell recording electrode, operably coupled to the central processing unit, to acquire the input data in real time.
55. The computer system of claim 1, wherein the controller is configured to inform the central processing unit that the sequence of computations is finished.
56. The computer system of claim 1, wherein the controller is configured to reduce a processing load on the central processing unit.
57. The computer system of claim 1, wherein the controller is configured to reduce interactions between the central processing unit and the accelerator.