US11567778B2ActiveUtilityPatentIndex 62

Neural network operation reordering for parallel execution

Assignee: AMAZON TECH INCPriority: Jun 26, 2019Filed: Apr 28, 2021Granted: Jan 31, 2023

Est. expiryJun 26, 2039(~13 yrs left)· nominal 20-yr term from priority

Inventors:HUYNH JEFFREY T BORKOVIC DRAZEN ZEJDA JINDRICH HUANG RANDY RENFU DIAMANT RON

G06F 9/3855G06N 3/08G06F 9/5016G06F 9/5027G06N 3/04G06N 3/0464G06F 8/451G06F 8/452G06N 5/01G06F 8/4441G06N 3/063G06F 8/433G06F 9/4881G06F 9/3856

PatentIndex Score

Cited by

References

Claims

Abstract

Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.

Claims

exact text as granted — not AI-modified

What is claimed is:

1. A method comprising:
receiving, by a compiler, a description of a neural network comprising a plurality of operations;
identifying a set of diverging branches within the plurality of operations, the set of diverging branches including a first branch and a second branch;
determining that the first branch is to be performed after the second branch;
determining a first memory usage associated with the first branch and a second memory usage associated with the second branch;
determining that the first memory usage is less than the second memory usage; and
modifying an order of performance of the plurality of operations to cause the first branch to be performed before the second branch.

2. The method of claim 1 , further comprising:
generating machine instructions based on the modified order of performance.

3. The method of claim 1 , further comprising:
grouping the plurality of operations into a set of blocks.

4. The method of claim 3 , wherein the first branch includes one or more first blocks from the set of blocks and the second branch includes one or more second blocks from the set of blocks.

5. The method of claim 4 , wherein determining the first memory usage associated with the first branch includes determining the first memory usage associated with each of the one or more first blocks, and wherein determining the second memory usage associated with the second branch includes determining the second memory usage associated with each of the one or more second blocks.

6. The method of claim 5 , wherein determining that the first memory usage is less than the second memory usage includes determining that the first memory usage associated with each of the one or more first blocks is less than the second memory usage associated with each of the one or more second blocks.

7. The method of claim 1 , wherein determining the first memory usage includes determining a first number of channels that are written to by the plurality of operations that are included in the first branch, and wherein determining the second memory usage includes determining a second number of channels that are written to by the plurality of operations that are included in the second branch.

8. The method of claim 7 , wherein determining that the first memory usage is less than the second memory usage includes determining that the first number of channels is less than the second number of channels.

9. A non-transitory computer-readable medium comprising instructions that, when executed by a processor, cause the processor to perform operations comprising:
receiving, by a compiler, a description of a neural network comprising a plurality of operations;
identifying a set of diverging branches within the plurality of operations, the set of diverging branches including a first branch and a second branch;
determining that the first branch is to be performed after the second branch;
determining a first memory usage associated with the first branch and a second memory usage associated with the second branch;
determining that the first memory usage is less than the second memory usage; and
modifying an order of performance of the plurality of operations to cause the first branch to be performed before the second branch.

10. The non-transitory computer-readable medium of claim 9 , wherein the operations further comprise:
grouping the plurality of operations into a set of blocks.

11. The non-transitory computer-readable medium of claim 10 , wherein the first branch includes one or more first blocks from the set of blocks and the second branch includes one or more second blocks from the set of blocks.

12. The non-transitory computer-readable medium of claim 11 , wherein determining the first memory usage associated with the first branch includes determining the first memory usage associated with each of the one or more first blocks, and wherein determining the second memory usage associated with the second branch includes determining the second memory usage associated with each of the one or more second blocks.

13. The non-transitory computer-readable medium of claim 12 , wherein determining that the first memory usage is less than the second memory usage includes determining that the first memory usage associated with each of the one or more first blocks is less than the second memory usage associated with each of the one or more second blocks.

14. The non-transitory computer-readable medium of claim 9 , wherein determining the first memory usage includes determining a first number of channels that are written to by the plurality of operations that are included in the first branch, and wherein determining the second memory usage includes determining a second number of channels that are written to by the plurality of operations that are included in the second branch.

15. The non-transitory computer-readable medium of claim 14 , wherein determining that the first memory usage is less than the second memory usage includes determining that the first number of channels is less than the second number of channels.

16. A system comprising:
a processor; and
a non-transitory computer-readable medium comprising instructions that, when executed by the processor, cause the processor to perform operations comprising:
receiving, by a compiler, a description of a neural network comprising a plurality of operations;
identifying a set of diverging branches within the plurality of operations, the set of diverging branches including a first branch and a second branch;
determining that the first branch is to be performed after the second branch;
determining a first memory usage associated with the first branch and a second memory usage associated with the second branch;
determining that the first memory usage is less than the second memory usage; and
modifying an order of performance of the plurality of operations to cause the first branch to be performed before the second branch.

17. The system of claim 16 , wherein the operations further comprise:
grouping the plurality of operations into a set of blocks.

18. The system of claim 17 , wherein the first branch includes one or more first blocks from the set of blocks and the second branch includes one or more second blocks from the set of blocks.

19. The system of claim 18 , wherein determining the first memory usage associated with the first branch includes determining the first memory usage associated with each of the one or more first blocks, and wherein determining the second memory usage associated with the second branch includes determining the second memory usage associated with each of the one or more second blocks.

20. The system of claim 19 , wherein determining that the first memory usage is less than the second memory usage includes determining that the first memory usage associated with each of the one or more first blocks is less than the second memory usage associated with each of the one or more second blocks.

Cited by (0)

No later patents cite this yet.

References (0)

No backward citations on record.