Pipelined processor and compiler/scheduler for variable number branch delay slots
Abstract
Different numbers of delay slots are assigned by a compiler/scheduler to each different type of jump operation in a pipelined processor system. The number of delay slots is variable and kept to the minimum needed by each type of jump operation. A compatible processor uses a corresponding number of branch delay slots to exploit the difference in predictability of different types of branch or jump operations. Different types of jump operations resolved their target addresses in different numbers of delay slots. As a result, the compiler/scheduler is able to generate more efficient code than for a processor with a fixed number of delay slots for all jump types, resulting in better processor performance.
Claims
exact text as granted — not AI-modifiedThe invention claimed is:
1. A method, comprising:
determining, with a compiler/scheduler, whether a branch instruction in a sequence of instructions is conditional or non-conditional, wherein the branch instruction is conditional if branching to a target address of the branch instruction is dependent upon a condition of the branch instruction being satisfied during execution;
determining a quantity of associated branch delay slots assigned to the branch instruction, wherein the quantity varies based on whether the branch instruction is conditional or non-conditional and based on an addressing mode used to determine the target address of the branch instruction; and
filling the determined quantity of associated branch delay slots with instructions for which correctness of execution in the sequence of instructions is not dependent on whether execution of the branch instruction results in branching to the target address.
2. The method of claim 1 , wherein said determining a quantity of associated branch delay slots further comprises determining the quantity of associated branch delay slots based on whether the addressing mode uses immediate addressing.
3. The method of claim 1 , wherein said filling the determined quantity of associated branch delay slots comprises:
selecting, from instructions proximate the branch instruction, instructions equal in quantity to the determined quantity of associated branch delay slots; and
moving the selected instructions to the determined quantity of associated branch delay slots.
4. The method of claim 1 , wherein said filling the determined quantity of associated branch delay slots comprises:
moving one or more instructions proximate the branch instruction to one or more of the determined quantity of associated branch delay slots; and
adding a no-op instruction to any non-filled branch delay slot of the determined quantity of associated branch delay slots.
5. The method of claim 1 , wherein said determining a quantity of associated branch delay slots comprises determining that the quantity of associated branch delay slots equals two branch delay slots in response to determining the branch instruction is non-conditional and the addressing mode uses immediate addressing.
6. The method of claim 1 , wherein said determining a quantity of associated branch delay slots comprises determining that the quantity of associated branch delay slots equals four branch delay slots in response to determining the branch instruction is non-conditional and the addressing mode uses register-based addressing.
7. The method of claim 1 , wherein said determining a quantity of associated branch delay slots comprises determining that the quantity of associated branch delay slots equals five branch delay slots in response to determining the branch instruction is conditional.
8. A non-transitory computer readable storage device comprising a plurality of instructions stored therein that, in response to being executed by a computer system, result in a computer system:
determining, with a compiler/scheduler, whether a branch instruction of a sequence of instructions is conditional or non-conditional, wherein the branch instruction is conditional if branching to a target address of the branch instruction is dependent upon a condition of the branch instruction being satisfied during execution;
determining a quantity of associated branch delay slots assigned to the branch instruction, wherein the quantity varies based on whether the branch instruction is conditional or non-conditional and based on an addressing mode used to determine the target address of the branch instruction; and
filling the determined number of branch delay slots with instructions that are selected to maintain a correct result of the sequence of instructions despite execution of the selected instructions regardless of whether execution of the branch instruction results in branching to the target address.
9. The non-transitory computer readable storage device of claim 8 , wherein the plurality of instructions further result in the computer system determining the quantity of associated branch delay slots based further on whether the addressing mode uses immediate addressing.
10. The non-transitory computer readable storage device of claim 8 , wherein the plurality of instructions further result in the computer system:
selecting, from instructions proximate the branch instruction, instructions equal in quantity to the determined quantity of associated branch delay slots; and
moving the selected instructions to the determined quantity of associated branch delay slots.
11. The non-transitory computer readable storage device of claim 8 , wherein the plurality of instructions further result in the computer system:
moving one or more instructions proximate the branch instruction to one or more of the determined quantity of associated branch delay slots; and
adding a no-op instruction to any branch delay slot of the determined quantity of associated branch delay slots that remain empty after said moving.
12. The non-transitory computer readable storage device of claim 8 , wherein the plurality of instructions further result in the computer system determining that the quantity of associated branch delay slots equals two branch delay slots in response to determining the branch instruction is non-conditional and the addressing mode uses immediate addressing.
13. The non-transitory computer readable storage device of claim 8 , wherein the plurality of instructions further result in the computer system determining that the quantity of associated branch delay slots equals four branch delay slots in response to determining the branch instruction is non-conditional and the addressing mode uses register-based addressing.
14. The non-transitory computer readable storage device of claim 8 , wherein the plurality of instructions further result in the computer system determining that the quantity of associated branch delay slots equals five branch delay slots in response to determining the branch instruction is conditional.
15. A computer system, comprising:
a compiler/scheduler configured to:
determine whether a branch instruction in a sequence of instructions is conditional or non-conditional, wherein the branch instruction is conditional if branching to a target address of the branch instruction is dependent upon a condition of the branch instruction being satisfied during execution,
determine a quantity of associated branch delay slots corresponding to instruction slots after an instruction slot for the branch instruction, wherein processing instructions in the corresponding instruction slots begins prior to completion of the branch instruction, and wherein the quantity varies based on whether the branch instruction is conditional or non-conditional and based on an addressing mode used to determine the target address of the branch instruction, and
fill the determined quantity of associated branch delay slots with instructions that are permitted to execute to completion regardless of whether execution of the branch instruction results in branching to the target address; and
a processor pipeline configured to execute the sequence of instructions including the filled branch delay slots.
16. The computer system of claim 15 , wherein the scheduler is further configured to determine the quantity of associated branch delay slots based further on whether the addressing mode uses immediate addressing.
17. The computer system of claim 15 , wherein the scheduler is further configured to select, from instructions proximate the branch instruction, instructions equal in quantity to the determined quantity of associated branch delay slots, and move the selected instructions to the determined quantity of associated branch delay slots.
18. The computer system of claim 15 , wherein the scheduler is further configured to move one or more instructions proximate the branch instruction to one or more of the determined quantity of associated branch delay slots, and add a no-op instruction to any branch delay slots of the determined quantity of associated branch delay slots that remain empty after said move.
19. The computer system of claim 15 , wherein the scheduler is further configured to determine that the quantity of associated branch delay slots equals two branch delay slots in response to determining the branch instruction is an unconditional jump instruction that uses immediate addressing.
20. The computer system of claim 15 , wherein the scheduler is further configured to determine that the quantity of associated branch delay slots equals four branch delay slots in response to determining the branch instruction is an unconditional jump instruction that uses register-based addressing.
21. The computer system of claim 15 , wherein the scheduler is further configured to determine that the quantity of associated branch delay slots equals five branch delay slots in response to determining the branch instruction is a conditional jump instruction.Cited by (0)
No later patents cite this yet.
References (0)
No backward citations on record.