US10769748B2 · Active · Utility · PatentIndex 73

Programmable coarse grained and sparse matrix compute hardware with advanced scheduling

Assignee: INTEL CORP · Priority: Apr 28, 2017 · Filed: Nov 21, 2018 · Granted: Sep 8, 2020
Est. expiry: Apr 28, 2037 (~10.8 yrs left) · nominal 20-yr term from priority
Inventors: NURVITADHI ERIKO; VEMBU BALAJI; GALOPPO VON BORRIES NICOLAS C; BARIK RAJKISHORE; LIN TSUNG-HAN; SINHA KAMAL; SATISH NADATHUR RAJAGOPALAN; BOTTLESON JEREMY; AKHBARI FARSHAD; KOKER ALTUG; SRINIVASA NARAYAN; KIM DUKHWAN; BAGHSORKHI SARA S; GOTTSCHLICH JUSTIN E; CHEN FENG; OULD-AHMED-VALL ELMOUSTAPHA; NEALIS KEVIN; CHEN XIAOMING; YAO ANBANG
G06N 3/044, G06N 3/045, G06N 3/0495, G06N 3/0464, G06N 3/098, G06F 9/3888, G06F 9/38885, G06F 9/3851, G06F 9/3887, G06F 9/30196, G06F 9/3895, G06F 9/3001, G06F 9/3017, G06N 3/063, G06N 3/084, G06T 1/20, G06N 3/04, G06N 3/08, G06N 3/0445, G06N 3/0454
Cited by: 3 · References: 23 · Claims: 17

Abstract

One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation.
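
As a rough illustration of the idea in the abstract, the sketch below expands one coarse-grained instruction into several pipeline commands. The opcode `CONV2D`, the `IM2COL` and `GEMM` command names, the lowering of a convolution to those two steps, and every class and field name are assumptions made for illustration only; the text here does not specify an instruction encoding or a lowering strategy.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MacroInstruction:
    """A single coarse-grained instruction (hypothetical encoding)."""
    opcode: str
    params: Dict[str, int]


@dataclass(frozen=True)
class PipelineCommand:
    """One finer-grained command produced by decoding (hypothetical)."""
    name: str
    args: Dict[str, int]


def decode(instr: MacroInstruction) -> List[PipelineCommand]:
    """Expand one macro-instruction into multiple pipeline commands.

    Assumption: a stride-1, unpadded 2D convolution is lowered to an
    im2col rearrangement followed by one matrix multiply.
    """
    if instr.opcode != "CONV2D":
        raise ValueError(f"unsupported opcode: {instr.opcode}")

    p = instr.params
    out_h = p["in_h"] - p["k"] + 1
    out_w = p["in_w"] - p["k"] + 1
    patches = out_h * out_w                 # one row per output position
    patch_len = p["k"] * p["k"] * p["in_c"]  # flattened receptive field

    return [
        PipelineCommand("IM2COL", {"rows": patches, "cols": patch_len}),
        PipelineCommand("GEMM", {"m": patches, "k": patch_len, "n": p["out_c"]}),
    ]


if __name__ == "__main__":
    conv = MacroInstruction(
        "CONV2D", {"in_h": 5, "in_w": 5, "in_c": 3, "k": 3, "out_c": 8}
    )
    for cmd in decode(conv):
        print(cmd.name, cmd.args)
```

The point of the sketch is only the one-to-many relationship: software issues a single instruction, and the decode step determines the set of commands needed to carry it out.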

Claims

exact text as granted — not AI-modified
The invention claimed is:

1. A compute apparatus to perform machine learning operations, the compute apparatus comprising:
a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation, wherein the complex machine learning compute operation includes multiple pipeline commands;
a scheduler controller to schedule the multiple pipeline commands to one or more of multiple types of compute units, wherein the multiple types of compute units include a general-purpose graphics compute unit and a near-data compute unit; and
a micro-controller to execute firmware instructions, the firmware instructions to enable a parameter analyzer to determine a type of machine learning operations to perform for the single instruction, wherein the micro-controller is further to offload a near-data compute kernel to the near-data compute unit.

2. The compute apparatus as in claim 1, wherein the complex machine learning compute operation is to perform a convolution for a layer of a convolutional neural network.

3. The compute apparatus as in claim 2, wherein the convolution includes multiple matrix operations.

4. The compute apparatus as in claim 1, additionally including a fetch unit to fetch the single instruction.

5. The compute apparatus as in claim 4, the fetch unit to store the single instruction to a cache memory.

6. The compute apparatus as in claim 1, wherein the multiple types of compute units include a sparse compute unit.

7. The compute apparatus as in claim 1, additionally including a machine learning accelerator to determine a set of operations to perform to execute the decoded instruction.

8. The compute apparatus as in claim 7, the firmware instructions additionally to enable the machine learning accelerator.
     
9. A non-transitory machine-readable medium storing instructions to cause one or more processors to perform operations comprising:
decoding a single instruction into a decoded instruction, the decoded instruction associated with a set of multiple machine learning operations to be performed via a compute pipeline of a general-purpose graphics processing unit;
determining a set of pipeline commands to perform the set of multiple machine learning operations; and
scheduling the set of pipeline commands to the compute pipeline of the general-purpose graphics processing unit, wherein scheduling the set of pipeline commands to the compute pipeline of the general-purpose graphics processing unit includes scheduling the set of pipeline commands to multiple compute pipelines, the multiple compute pipelines including a general-purpose compute pipeline and a near-data compute pipeline, and scheduling the set of pipeline commands include offloading a near-data compute kernel to the near-data compute pipeline.

10. The non-transitory machine-readable medium as in claim 9, wherein determining the set of pipeline commands to perform the set of multiple machine learning operations includes analyzing parameters associated with the decoded instruction.

11. The non-transitory machine-readable medium as in claim 9, additionally comprising retiring the decoded instruction in response completion of the set of pipeline commands.

12. The non-transitory machine-readable medium as in claim 9, wherein the single instruction is to cause the general-purpose graphics processing unit to perform a convolution for a layer of a convolutional neural network, the convolution including multiple matrix operations.

13. The non-transitory machine-readable medium as in claim 9, wherein scheduling the set of pipeline commands to the compute pipeline of the general-purpose graphics processing unit includes scheduling one or more pipeline commands in the set of pipeline commands to a sparse compute pipeline of the multiple compute pipelines.
     
14. A data processing system comprising:
a general-purpose graphics processing unit including a fetch unit to fetch a single instruction, a decode unit to decode the single instruction into a decoded instruction, a micro-controller to execute firmware instructions to enable a parameter analyzer to determine a type of machine learning operations to perform for the single instruction, and a scheduler controller to schedule multiple matrix operations to one or more of multiple types of compute units, wherein the multiple types of compute units include a general-purpose graphics compute unit and a near-data compute unit, the decoded instruction is to cause the general-purpose graphics processing unit to execute multiple pipeline commands to perform a complex machine learning compute operation, and the micro-controller is to offload a near-data compute kernel to the near-data compute unit; and
a memory coupled to the general-purpose graphics processing unit.

15. The data processing system as in claim 14, wherein the multiple types of compute units include a sparse compute unit.

16. The data processing system as in claim 14, the general-purpose graphics processing unit including a machine learning accelerator to determine the multiple pipeline commands to execute to perform the complex machine learning compute operation.

17. The data processing system as in claim 16, the firmware instructions to enable the machine learning accelerator.
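
The claims describe a scheduler that routes pipeline commands to different types of compute units (general-purpose, near-data, and optionally sparse). A minimal sketch of such routing follows. The selection rules are assumptions, not taken from the claims: the `density` and `memory_bound` hints, the 0.25 threshold, and all names are hypothetical stand-ins for whatever a parameter analyzer might report.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class UnitType(Enum):
    GENERAL_PURPOSE = "general-purpose graphics compute unit"
    NEAR_DATA = "near-data compute unit"
    SPARSE = "sparse compute unit"


@dataclass(frozen=True)
class Command:
    """A pipeline command plus hints (hypothetical fields)."""
    name: str
    density: float       # assumed hint: fraction of non-zero operands
    memory_bound: bool   # assumed hint: dominated by data movement


def pick_unit(cmd: Command, sparse_threshold: float = 0.25) -> UnitType:
    """Choose a compute unit type for one command (illustrative policy)."""
    if cmd.memory_bound:
        # Offload data-movement-heavy kernels to run near the data.
        return UnitType.NEAR_DATA
    if cmd.density < sparse_threshold:
        # Mostly-zero operands go to the sparse unit.
        return UnitType.SPARSE
    return UnitType.GENERAL_PURPOSE


def schedule(commands: List[Command]) -> Dict[UnitType, List[str]]:
    """Distribute commands into one queue per compute unit type."""
    queues: Dict[UnitType, List[str]] = {unit: [] for unit in UnitType}
    for cmd in commands:
        queues[pick_unit(cmd)].append(cmd.name)
    return queues


if __name__ == "__main__":
    work = [
        Command("gemm_dense", density=0.90, memory_bound=False),
        Command("gemm_pruned", density=0.05, memory_bound=False),
        Command("embedding_gather", density=0.90, memory_bound=True),
    ]
    for unit, names in schedule(work).items():
        print(unit.value, "->", names)
```

A real scheduler would also track completion so the originating instruction can be retired once all of its commands finish, as claim 11 describes; that bookkeeping is omitted here.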
