US11599498B1 · Active · Utility patent · PatentIndex score: 61

Device with data processing engine array that enables partial reconfiguration

Assignee: XILINX INC
Priority: Apr 3, 2018
Filed: Oct 12, 2020
Granted: Mar 7, 2023
Est. expiry: Apr 3, 2038 (~11.7 yrs left) · nominal 20-yr term from priority
Inventors: NOGUERA SERRA JUAN J; DATE SNEHA BHALCHANDRA; LANGER JAN; OZGUL BARIS; BILSKI GORAN HK
Classifications: G06F 15/177; G06F 9/4411; G06F 9/4401; G06F 15/7825; G06F 15/80; G06F 15/17306; G06F 15/7867; G06F 1/24
Cited by: 0 · References: 54 · Claims: 20

Abstract

A device may include a processor system and an array of data processing engines (DPEs) communicatively coupled to the processor system. Each of the DPEs includes a core and a DPE interconnect. The processor system is configured to transmit configuration data to the array of DPEs, and each of the DPEs is independently configurable based on the configuration data received at the respective DPE via the DPE interconnect of the respective DPE. The array of DPEs enables, without modifying operation of a first kernel of a first subset of the DPEs of the array of DPEs, reconfiguration of a second subset of the DPEs of the array of DPEs.

Claims

What is claimed is: 
     
       1. A device comprising:
 an array of data processing engines (DPEs), wherein:
 each DPE of the array of DPEs comprises a core and a configuration memory space; 
 the array of DPEs comprises a stream network and a memory mapped network; 
 the stream network is configurable to route application data between DPEs of the array of DPEs; 
 the memory mapped network is configured to route memory mapped transactions, based on addresses contained in the respective memory mapped transactions, among the array of DPEs and to write configuration data of the respective memory mapped transactions to respective configuration memory spaces of DPEs of the array of DPEs based on the addresses contained in the respective memory mapped transactions; 
 each DPE of the array of DPEs is independently configurable based on configuration data received at the respective DPE via the memory mapped network; and 
 the array of DPEs enables, without modifying operation of a first kernel loaded on a first subset of the array of DPEs, reconfiguration of a second subset of the array of DPEs by memory mapped transactions through the memory mapped network. 
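
The address-based routing recited in claim 1 can be illustrated with a small Python model. This is a minimal sketch, not Xilinx's implementation: the bit layout (column, row, offset), the names `make_address`, `decode`, and `MemoryMappedNetwork`, and the choice to enter each column at row 0 and climb north are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical address layout (an assumption, not taken from the patent):
#   bits [27:23] column, bits [22:18] row, bits [17:0] offset into the DPE's
#   configuration memory space.
COL_SHIFT, ROW_SHIFT = 23, 18
COL_MASK, ROW_MASK = 0x1F, 0x1F
OFFSET_MASK = (1 << ROW_SHIFT) - 1


def make_address(col: int, row: int, offset: int) -> int:
    """Pack a (column, row, offset) triple into one flat address."""
    return (col << COL_SHIFT) | (row << ROW_SHIFT) | (offset & OFFSET_MASK)


def decode(addr: int) -> tuple[int, int, int]:
    """Split a flat address back into (column, row, offset)."""
    return (addr >> COL_SHIFT) & COL_MASK, (addr >> ROW_SHIFT) & ROW_MASK, addr & OFFSET_MASK


@dataclass
class DPE:
    col: int
    row: int
    config_memory: dict[int, int] = field(default_factory=dict)


class MemoryMappedNetwork:
    """Toy memory mapped network: one switch per DPE, chained up each column."""

    def __init__(self, cols: int, rows: int) -> None:
        self.grid = [[DPE(c, r) for r in range(rows)] for c in range(cols)]

    def write(self, addr: int, data: int) -> list[tuple[int, int]]:
        """Route one write transaction and return the (col, row) switches it visits.

        Each switch compares the row field with its own row: on a mismatch it
        forwards the transaction north, on a match it writes the payload into
        its DPE's configuration memory space.
        """
        col, row, offset = decode(addr)
        hops = []
        for r in range(row + 1):
            hops.append((col, r))
            if r == row:
                self.grid[col][r].config_memory[offset] = data
        return hops
```

Writing to `make_address(2, 1, 0x40)` visits the switches at (2, 0) and (2, 1) and changes only DPE (2, 1); every other DPE's configuration memory is untouched, which is the property that lets one subset be rewritten without disturbing another.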
 
 
     
     
       2. The device of  claim 1 , wherein each DPE of the array of DPEs includes:
 a stream switch of the stream network comprising a core stream interface connected to the core of the respective DPE and comprising one or more neighboring stream interfaces to respective one or more neighboring DPEs of the array of DPEs; and 
 a memory mapped switch of the memory mapped network comprising one or more memory mapped interfaces connected to the configuration memory space of the respective DPE and comprising one or more neighboring memory mapped interfaces to respective one or more neighboring DPEs of the array of DPEs. 
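
The per-DPE stream switch of claim 2 can be pictured as a small routing table from input interfaces to output interfaces. The sketch below is a hedged illustration: the port names, the single-output (circuit-switched) routes, and the `trace` helper are assumptions; the claim only requires a core stream interface plus interfaces to neighboring DPEs.

```python
from __future__ import annotations

from enum import Enum


class Port(Enum):
    CORE = "core"
    NORTH = "north"
    SOUTH = "south"
    EAST = "east"
    WEST = "west"


# Leaving through an output port arrives on the opposite input port of the neighbor.
OPPOSITE = {Port.NORTH: Port.SOUTH, Port.SOUTH: Port.NORTH,
            Port.EAST: Port.WEST, Port.WEST: Port.EAST}
STEP = {Port.NORTH: (0, 1), Port.SOUTH: (0, -1), Port.EAST: (1, 0), Port.WEST: (-1, 0)}


class StreamSwitch:
    """One DPE's stream switch: a table from input interface to output interface."""

    def __init__(self) -> None:
        self.routes: dict[Port, Port] = {}

    def connect(self, src: Port, dst: Port) -> None:
        self.routes[src] = dst


def trace(switches: dict[tuple[int, int], StreamSwitch], start: tuple[int, int],
          max_hops: int = 64) -> list[tuple[int, int]]:
    """Follow a stream injected by the core at `start` until a switch delivers it to a core."""
    pos, in_port = start, Port.CORE
    path = [pos]
    for _ in range(max_hops):
        out = switches[pos].routes[in_port]
        if out is Port.CORE:
            return path
        dx, dy = STEP[out]
        pos = (pos[0] + dx, pos[1] + dy)
        in_port = OPPOSITE[out]
        path.append(pos)
    raise RuntimeError("route did not reach a core within max_hops")
```

With routes core→east at (0, 0), west→east at (1, 0), and west→core at (2, 0), a stream from the core of (0, 0) reaches the core of (2, 0); the middle DPE only forwards traffic through its switch and its own core is not involved.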
 
     
     
       3. The device of  claim 2 , wherein, for each DPE of the array of DPEs, the configuration memory space comprises:
 program memory configured to store executable program code that is executable by the core of the respective DPE; and 
 configuration registers configured to store interconnect data that configures the stream switch of the respective DPE for routing application data via the stream network. 
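
Claim 3 splits each DPE's configuration memory space into program memory and stream-switch configuration registers. One way to picture that split is an offset decoder like the one below; the base offsets, sizes, little-endian register format, and the `ConfigMemorySpace` name are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical offsets inside one DPE's configuration memory space.
PROGRAM_BASE, PROGRAM_SIZE = 0x20000, 0x4000     # 16 KiB of program memory
SWITCH_REG_BASE, SWITCH_REG_COUNT = 0x3F000, 32  # 32-bit stream-switch registers


@dataclass
class ConfigMemorySpace:
    program: bytearray = field(default_factory=lambda: bytearray(PROGRAM_SIZE))
    switch_regs: list[int] = field(default_factory=lambda: [0] * SWITCH_REG_COUNT)

    def write(self, offset: int, payload: bytes) -> str:
        """Store `payload` at `offset` and return the name of the region written."""
        end = offset + len(payload)
        if PROGRAM_BASE <= offset and end <= PROGRAM_BASE + PROGRAM_SIZE:
            # Executable code for the core (first element of claim 3).
            start = offset - PROGRAM_BASE
            self.program[start:start + len(payload)] = payload
            return "program"
        index, rem = divmod(offset - SWITCH_REG_BASE, 4)
        if rem == 0 and 0 <= index < SWITCH_REG_COUNT and len(payload) == 4:
            # Interconnect data that sets up the stream switch (second element).
            self.switch_regs[index] = int.from_bytes(payload, "little")
            return "switch_reg"
        raise ValueError(f"offset {offset:#x} is not a mapped location")
```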
 
     
     
       4. The device of  claim 2 , wherein the stream switch is configured to be partially reconfigurable while continuing to route application data for a kernel that is not being reconfigured. 
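
Claim 4's partial reconfiguration of a stream switch can be modeled by tagging each route with the kernel that owns it, so that rewriting one kernel's routes never touches another's. This ownership bookkeeping, and the rule that a port belongs to one kernel at a time, are assumptions made for the sketch rather than mechanisms the claim recites.

```python
from __future__ import annotations


class StreamSwitch:
    """Toy stream switch whose routes are tagged with the kernel that owns them."""

    def __init__(self) -> None:
        # input port -> (output port, owning kernel)
        self.routes: dict[str, tuple[str, str]] = {}

    def forward(self, in_port: str) -> str:
        """Return the output port that data arriving on `in_port` is sent to."""
        return self.routes[in_port][0]

    def reconfigure(self, kernel: str, new_routes: dict[str, str]) -> None:
        """Replace `kernel`'s routes with `new_routes`, leaving other kernels' routes live."""
        others = {p: r for p, r in self.routes.items() if r[1] != kernel}
        busy_in = set(others)
        busy_out = {out for out, _ in others.values()}
        clash = (set(new_routes) & busy_in) | (set(new_routes.values()) & busy_out)
        if clash:
            # Reject before changing anything, so the running kernel is unaffected.
            raise ValueError(f"ports owned by another kernel: {sorted(clash)}")
        self.routes = others | {p: (out, kernel) for p, out in new_routes.items()}
```

Re-running `reconfigure("kernel_b", ...)` swaps only kernel_b's entries; a route such as west→east owned by kernel_a keeps forwarding the whole time, and a request that would steal one of kernel_a's ports is refused without side effects.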
     
     
       5. The device of  claim 2 , wherein the array of DPEs enables, after reconfiguring the second subset of the array of DPEs:
 a data flow to or from one of the first subset of the array of DPEs or the second subset of the array of DPEs is through the stream switch of respective one or more DPEs of the other one of the first subset of the array of DPEs or the second subset of the array of DPEs. 
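
Claim 5 covers the case where, after reconfiguration, a flow belonging to one subset is carried through the stream switches of DPEs in the other subset. The sketch assumes dimension-ordered (X-then-Y) routing purely for illustration; the claim does not specify a routing algorithm, and `xy_route` and `pass_through_tiles` are hypothetical helpers.

```python
from __future__ import annotations


def xy_route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """Dimension-ordered path (move in X first, then Y), endpoints included."""
    (x, y), (tx, ty) = src, dst
    path = [(x, y)]
    while x != tx:
        x += 1 if tx > x else -1
        path.append((x, y))
    while y != ty:
        y += 1 if ty > y else -1
        path.append((x, y))
    return path


def pass_through_tiles(src: tuple[int, int], dst: tuple[int, int],
                       other_subset: set[tuple[int, int]]) -> list[tuple[int, int]]:
    """Intermediate tiles on the route whose stream switch belongs to the other subset."""
    return [t for t in xy_route(src, dst)[1:-1] if t in other_subset]
```

If the unchanged first kernel occupies columns 1 and 2 and the reconfigured second subset uses tiles in columns 0 and 3, a stream from (0, 0) to (3, 0) is carried by the first subset's switches at (1, 0) and (2, 0).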
 
     
     
       6. The device of  claim 1 , wherein the array of DPEs further comprises:
 a broadcast network; and 
 event logic communicatively coupled to the broadcast network and configured to:
 detect a triggering event for partial reconfiguration and responsively transmit a stall signal through the broadcast network; and 
 halt execution of a respective DPE when the triggering event is detected and when a stall signal is received from the broadcast network. 
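
The stall mechanism of claim 6 can be approximated as a flood over a 2-D broadcast network: the DPE whose event logic detects the trigger halts and sends a stall signal, every tile relays it, and tiles configured to respond halt as well. The `listeners` set, the four-neighbor topology, and the function name are assumptions for this sketch.

```python
from __future__ import annotations

from collections import deque


def broadcast_stall(cols: int, rows: int, origin: tuple[int, int],
                    listeners: set[tuple[int, int]]) -> set[tuple[int, int]]:
    """Flood a stall signal from `origin` across a cols x rows broadcast network.

    The origin's event logic detected the triggering event, so it halts.
    Every tile relays the signal to its four neighbors, but only tiles in
    `listeners` (those whose event logic is set to react to it) halt.
    """
    halted = {origin}
    seen = {origin}
    queue = deque([origin])
    while queue:
        x, y = queue.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < cols and 0 <= nxt[1] < rows and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
                if nxt in listeners:
                    halted.add(nxt)
    return halted
```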
 
 
     
     
       7. The device of  claim 1  further comprising:
 a processor system; 
 a network-on-chip coupled to the processor system; and 
 a system interface circuit coupled to the network-on-chip and to the array of DPEs, the system interface circuit comprising tiles, each tile being coupled to a column of DPEs of the array of DPEs, the processor system being configured to transmit configuration data to the array of DPEs via the network-on-chip and the system interface circuit. 
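
Claim 7's configuration path (processor system, network-on-chip, one system interface tile per column, then up that column) can be sketched as below. The class names, the dictionary standing in for each DPE's configuration memory, and the `(col, row, offset, value)` write format are illustrative assumptions.

```python
from __future__ import annotations


class InterfaceTile:
    """One system interface tile, serving the column of DPEs above it."""

    def __init__(self, col: int, rows: int) -> None:
        self.col = col
        self.column: list[dict[int, int]] = [{} for _ in range(rows)]

    def forward(self, row: int, offset: int, value: int) -> None:
        # Hand the write into the column; it lands in the DPE at `row`.
        self.column[row][offset] = value


class NetworkOnChip:
    """Delivers a write to the interface tile that owns the target column."""

    def __init__(self, tiles: list[InterfaceTile]) -> None:
        self.tiles = {t.col: t for t in tiles}

    def deliver(self, col: int, row: int, offset: int, value: int) -> None:
        self.tiles[col].forward(row, offset, value)


def load_configuration(noc: NetworkOnChip, writes: list[tuple[int, int, int, int]]) -> set[int]:
    """Processor-system side: send every write over the NoC; return the columns touched."""
    touched = set()
    for col, row, offset, value in writes:
        noc.deliver(col, row, offset, value)
        touched.add(col)
    return touched
```

Only the interface tiles of columns holding the DPEs being configured see any traffic; in this model, columns running other kernels receive no configuration writes.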
 
     
     
       8. The device of  claim 1 , wherein the array of DPEs enables continued operation of the first kernel during reconfiguration of the second subset of the array of DPEs when (i) the first kernel and a second kernel loaded on the second subset of the array of DPEs do not have a shared hardware resource and (ii) no data and/or control dependency exists between the first kernel and the second kernel. 
     
     
       9. The device of  claim 1 , wherein the array of DPEs enables continued operation of the first kernel during reconfiguration of the second subset of the array of DPEs when (i) the first kernel and a second kernel loaded on the second subset of the array of DPEs have a shared hardware resource and (ii) no data and/or control dependency exists between the first kernel and the second kernel. 
     
     
       10. The device of  claim 1 , wherein the array of DPEs enables stalling operation of the first kernel during reconfiguration of the second subset of the array of DPEs when (i) the first kernel and a second kernel loaded on the second subset of the array of DPEs do not have a shared hardware resource and (ii) a data and/or control dependency exists between the first kernel and the second kernel. 
     
     
       11. The device of  claim 1 , wherein the array of DPEs enables stalling operation of the first kernel during reconfiguration of the second subset of the array of DPEs when (i) the first kernel and a second kernel loaded on the second subset of the array of DPEs have a shared hardware resource and (ii) a data and/or control dependency exists between the first kernel and the second kernel. 
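
Claims 8 through 11 enumerate four cases. Read together, the first kernel can keep running whenever there is no data or control dependency on the second kernel (whether or not the two share a hardware resource), and it is stalled whenever such a dependency exists. The lookup below encodes that reading; `Action`, `CASES`, and `first_kernel_action` are hypothetical names.

```python
from __future__ import annotations

from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    STALL = "stall"


# (shared hardware resource, data/control dependency) -> (action, claim number)
CASES = {
    (False, False): (Action.CONTINUE, 8),
    (True, False): (Action.CONTINUE, 9),
    (False, True): (Action.STALL, 10),
    (True, True): (Action.STALL, 11),
}


def first_kernel_action(shared_resource: bool, dependency: bool) -> Action:
    """Return what the first kernel does while the second subset is reconfigured."""
    return CASES[(shared_resource, dependency)][0]
```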
     
     
       12. The device of  claim 1 , wherein the array of DPEs enables, after reconfiguring the second subset of the array of DPEs:
 application and/or control data generated by one of the first subset of the array of DPEs or the second subset of the array of DPEs is received and processed by the other one of the first subset of the array of DPEs or the second subset of the array of DPEs. 
 
     
     
       13. A method for operating a device, the method comprising:
 operating a first kernel loaded on a first subset of an array of data processing engines (DPEs), each DPE of the array of DPEs comprising a core and a configuration memory space, the array of DPEs comprising a stream network and a memory mapped network, the stream network being configurable to route application data between DPEs of the array of DPEs; 
 without modifying operation of the first kernel on the first subset of the array of DPEs, configuring a second subset of the array of DPEs to implement a second kernel, configuring the second subset of the array of DPEs comprising:
 routing memory mapped transactions via the memory mapped network, based on addresses contained in the respective memory mapped transactions, to the second subset of the array of DPEs; and 
 writing configuration data of the respective memory mapped transactions to respective configuration memory spaces of DPEs of the second subset of the array of DPEs based on the addresses contained in the respective memory mapped transactions; and 
 
 operating both the first kernel loaded on the first subset of the array of DPEs and the second kernel loaded on the second subset of the array of DPEs after configuring the second subset of the array of DPEs to implement the second kernel. 
 
     
     
       14. The method of  claim 13 , wherein:
 each DPE of the DPEs includes:
 a stream switch of the stream network comprising a core stream interface connected to the core of the respective DPE and comprising one or more neighboring stream interfaces to respective one or more neighboring DPEs of the array of DPEs; and 
 a memory mapped switch of the memory mapped network comprising one or more memory mapped interfaces connected to the configuration memory space of the respective DPE and comprising one or more neighboring memory mapped interfaces to respective one or more neighboring DPEs of the array of DPEs; and 
 
 routing the memory mapped transactions includes routing the memory mapped transactions by one or more memory mapped switches of the memory mapped network. 
 
     
     
       15. The method of  claim 14 , wherein:
 for each DPE of the array of DPEs, the configuration memory space comprises:
 program memory configured to store executable program code that is executable by the core of the respective DPE; and 
 configuration registers configured to store interconnect data that configures the stream switch of the respective DPE for routing application data via the stream network; and 
 
 writing the configuration data of the respective memory mapped transactions includes:
 writing executable program code to the program memory of respective DPEs of the second subset of the array of DPEs; and 
 writing interconnect data to the configuration registers of the stream switch of respective DPEs of the second subset of the array of DPEs. 
 
 
     
     
       16. The method of  claim 13 , wherein operation of the first kernel on the first subset of the array of DPEs continues while configuring the second subset of the array of DPEs to implement the second kernel. 
     
     
       17. The method of  claim 13  further comprising, before configuring the second subset of the array of DPEs to implement the second kernel, stalling operation of the first kernel, wherein operation of the first kernel on the first subset of the array of DPEs is stalled while configuring the second subset of the array of DPEs to implement the second kernel. 
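
The method of claims 13, 16, and 17 can be traced with a step-level toy model: configuration writes for the second subset are applied one per step, the first kernel either keeps iterating (claim 16) or is stalled (claim 17), and afterwards both kernels run (the final step of claim 13). The one-write-per-step timing and the name `run_method` are assumptions for illustration.

```python
from __future__ import annotations


def run_method(
    config_writes: list[tuple[int, int]],
    stall_first_kernel: bool,
    steps_after: int = 2,
) -> tuple[dict[int, int], int, int]:
    """Return (second-subset config, first-kernel iterations, second-kernel iterations)."""
    subset2_config: dict[int, int] = {}
    first_iters = second_iters = 0

    # Configure the second subset: one memory mapped write per step.
    for offset, value in config_writes:
        subset2_config[offset] = value
        if not stall_first_kernel:
            first_iters += 1  # claim 16: the first kernel keeps running

    # Claim 13, last step: both kernels operate once configuration is done.
    for _ in range(steps_after):
        first_iters += 1
        second_iters += 1

    return subset2_config, first_iters, second_iters
```

With three configuration writes, the first kernel completes five iterations when it keeps running and two when it is stalled; the second kernel runs only after its configuration is complete in both cases.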
     
     
       18. A device comprising:
 an array of data processing engines (DPEs), each DPE of the array of DPEs comprising a core, program memory, configuration registers, a memory mapped switch, and a stream switch, wherein:
 the program memory is configured to store executable program code that is executable by the core of the respective DPE; 
 the configuration registers are configured to store interconnect data that configures the stream switch of the respective DPE for routing communications via the stream switch of the respective DPE; 
 the memory mapped switch of the respective DPE is configured to route memory mapped transactions based on addresses contained in the respective memory mapped transactions; 
 the memory mapped switches of the array of DPEs are communicatively coupled together to form a memory mapped network; and 
 the stream switches of the array of DPEs are communicatively coupled together to form a stream network; 
 the program memory and the configuration registers of each DPE of the array of DPEs are independently writeable based on configuration data contained in a memory mapped transaction received at the respective DPE via the memory mapped network; and 
 the array of DPEs enabling, without modifying operation of a first kernel loaded on a first subset of the array of DPEs, reconfiguration of a second subset of the array of DPEs by memory mapped transactions through the memory mapped network. 
 
 
     
     
       19. The device of  claim 18 , wherein the stream switch of each DPE of the array of DPEs is configured to be partially reconfigurable while continuing to route communications for a kernel that is not being reconfigured. 
     
     
       20. The device of  claim 18 , wherein the array of DPEs further comprises:
 a broadcast network; and 
 event logic communicatively coupled to the broadcast network and configured to:
 detect a triggering event for partial reconfiguration and responsively transmit a stall signal through the broadcast network; and 
 halt execution of a respective DPE when the triggering event is detected and when a stall signal is received from the broadcast network.
