USRE45458E · Expired · Utility · PatentIndex 79
Dual function system and method for shuffling packed data elements
Est. expiry Mar 31, 2018 (expired) · nominal 20-yr term from priority
Inventors: ROUSSEL PATRICE, CHENNUPATY SRINIVAS, CRANFORD MICHEAL D, ABDALLAH MOHAMMED A, COKE JAMES, KONG KATHERINE
CPC: G06F 9/30025 · G06F 9/30036 · G06F 15/7885 · G06F 9/30032
Cited by: 7 · References: 99 · Claims: 56
Abstract
An apparatus and method for performing a shuffle operation on packed data using computer-implemented steps is described. In one embodiment, a first packed data operand having at least two data elements is accessed. A second packed data operand having at least two data elements is accessed. One of the data elements in the first packed data operand is shuffled into a lower destination field of a destination register, and one of the data elements in the second packed data operand is shuffled into an upper destination field of the destination register.
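The abstract's core rule is that the lower half of the destination is filled only from the first operand and the upper half only from the second. The Python sketch below models that rule for operands of any even length. It is a behavioral illustration, not the patented hardware: the function name `shuffle_halves` and the use of plain index lists (`lo_sel`, `hi_sel`) as the selection encoding are hypothetical.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def shuffle_halves(first: Sequence[T], second: Sequence[T],
                   lo_sel: Sequence[int], hi_sel: Sequence[int]) -> list[T]:
    """Build a result whose lower half comes only from `first` and whose
    upper half comes only from `second`.

    lo_sel[i] / hi_sel[i] are element indices (a hypothetical encoding)
    naming which source element lands in destination field i of each half.
    """
    n = len(first)
    if len(second) != n or n < 2 or n % 2:
        raise ValueError("operands need the same even number (>= 2) of elements")
    half = n // 2
    if len(lo_sel) != half or len(hi_sel) != half:
        raise ValueError("need exactly one selector per destination field")
    if not all(0 <= i < n for i in (*lo_sel, *hi_sel)):
        raise ValueError("selector out of range")
    lower = [first[i] for i in lo_sel]    # lower destination fields <- first operand
    upper = [second[i] for i in hi_sel]   # upper destination fields <- second operand
    return lower + upper


# Two-element operands, as in the abstract: one element from each source.
print(shuffle_halves([10, 11], [20, 21], [1], [0]))                # [11, 20]
# Four-element operands: a set of elements from each source.
print(shuffle_halves([0, 1, 2, 3], [4, 5, 6, 7], [3, 0], [2, 1]))  # [3, 0, 6, 5]
```

Passing the same sequence as both `first` and `second` covers the single-operand case that several claims recite.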
Claims
What is claimed is:
1. A computer system comprising:
a hardware unit to transmit data representing graphics to another computer or a display; a processor coupled to the hardware unit; and a storage device coupled to the processor and having stored therein an instruction, which when executed by the processor, causes the processor to at least, access a first packed data operand having at least two data elements; access a second packed data operand having at least two data elements; select a first set of data elements from the first packed data operand; copy each of the data elements in the first set to specified data fields located in the lower half of a destination operand; select a second set of data elements from the second packed data operand; and copy each of the data elements in the second set to specified data fields located in the upper half of the destination operand.
2. The computer system of claim 1 wherein the storage device further comprises a packing device for packing floating point data into the data elements.
3. The computer system of claim 1 wherein the storage device further comprises a packing device for packing integer data into the data elements.
4. A system as claimed in claim 1 wherein the first and second packed data operands are the same operand.
5. A method comprising the computer-implemented steps of:
decoding a single instruction; in response to the step of decoding the single instruction, accessing a first packed data operand having at least two data elements; accessing a second packed data operand having at least two data elements; selecting a first set of data elements from the first packed data operand; copying each of the data elements in the first set to specified data fields located in the lower half of a destination operand; selecting a second set of data elements from the second packed data operand; and copying each of the data elements in the second set to specified data fields located in the upper half of the destination operand.
6. The method of claim 5 further comprising the step of packing floating point data into the data elements.
7. The method of claim 5 further comprising the step of packing integer data into the data elements.
8. A method as claimed in claim 5 wherein the first and second packed data operands are the same operand.
9. A method comprising the computer implemented steps of:
accessing data representative of a first three-dimensional image; altering the data using three-dimensional geometry to generate a second three-dimensional image, the step of altering at least including, accessing a first packed data operand having at least two data elements; accessing a second packed data operand having at least two data elements; selecting a first set of data elements from the first packed data operand; copying each of the data elements in the first set to specified data fields located in the lower half of a destination operand; selecting a second set of data elements from the second packed data operand; copying each of the data elements in the second set to specified data fields located in the upper half of the destination operand; and displaying the second three-dimensional image.
10. The method of claim 9 wherein the step of altering includes the performance of a three-dimensional transformation.
11. The method of claim 9 wherein the step of altering includes the step of packing floating point data into the data elements.
12. The method of claim 9 wherein the step of altering includes the step of packing integer data into the data elements.
13. A method as claimed in claim 9 wherein the first and second packed data operands are the same operand.
14. A method comprising the computer implemented steps of:
accessing data representative of a first three-dimensional image; altering the data using three-dimensional geometry to generate a second three-dimensional image, the step of altering at least including, accessing a first packed data operand having at least two data elements; accessing a second packed data operand having at least two data elements; selecting a first set of data elements from the first packed data operand; copying each of the data elements in the first set to specified data fields located in the lower half of a destination operand; selecting a second set of data elements from the second packed data operand; copying each of the data elements in the second set to specified data fields located in the upper half of the destination operand; and displaying the second three-dimensional image.
15. The method of claim 14 wherein the step of altering includes the performance of a three-dimensional transformation.
16. The method of claim 14 wherein the step of altering includes the step of packing floating point data into the data elements.
17. The method of claim 14 wherein the step of altering includes the step of packing integer data into the data elements.
18. A method as claimed in claim 14 wherein the first and second packed data operands are the same operand.
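Claims 9 through 18 place the shuffle inside a three-dimensional geometry step, and claims 13 and 18 allow both source operands to be the same operand. One common pattern where that matters is broadcasting one coordinate of a vertex across all fields before a matrix multiply. The sketch below is illustrative only: it assumes four-element operands, a column-major 4x4 matrix, and the 8-bit selector layout recited in claims 54 through 56 (bits 0-1 and 2-3 choose from the first source for the lower half, bits 4-5 and 6-7 from the second source for the upper half). The names `shuffle4` and `transform_vertex` are hypothetical, and this is not presented as the patent's transformation method.

```python
def shuffle4(a, b, imm):
    """Four-element two-source shuffle; assumed immediate layout:
    bits 0-1 and 2-3 pick from `a` for result[0] and result[1] (lower half),
    bits 4-5 and 6-7 pick from `b` for result[2] and result[3] (upper half)."""
    return [a[imm & 3], a[(imm >> 2) & 3], b[(imm >> 4) & 3], b[(imm >> 6) & 3]]


def transform_vertex(cols, v):
    """Multiply a homogeneous vertex v = [x, y, z, w] by a 4x4 matrix stored as
    four column vectors: cols[0]*x + cols[1]*y + cols[2]*z + cols[3]*w."""
    acc = [0.0] * 4
    for k in range(4):
        # Same operand as both sources, every 2-bit field set to k: broadcasts v[k].
        splat = shuffle4(v, v, k * 0b01010101)
        acc = [s + c * m for s, c, m in zip(acc, cols[k], splat)]
    return acc


# Translate the point (5, 6, 7) by (1, 2, 3).
translate = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [1, 2, 3, 1]]
print(transform_vertex(translate, [5.0, 6.0, 7.0, 1.0]))  # [6.0, 8.0, 10.0, 1.0]
```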
19. A processor-implemented method for reducing the number of control bits required to shuffle packed data elements from first and second source operands, comprising the steps of:
decoding a single instruction specifying first and second source operands and a field of control bits; and
responsive to the field of control bits, generating a resultant packed data operand comprised of packed data elements from the first and second source operands,
wherein the control bits are limited to specifying for the upper and lower halves of the resultant packed data operand, data elements from the first and second source operands, respectively.
20. The method as claimed in claim 19 wherein the first and second packed data source operands and the resultant packed data operand are comprised of four packed data elements, and the field of control bits is an 8-bit field.
21. The method as claimed in claim 19 wherein the first and second packed data source operands are the same operand.
22. The method as claimed in claim 19 wherein the first and second packed data source operands are packed with floating point data.
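Claims 19 and 20 tie the restriction (each half of the result draws from one fixed source operand) to a smaller control field. The arithmetic below, written as a short Python function, shows why four-element operands fit an 8-bit field: each of the four destination fields needs 2 bits to pick one of four elements, whereas letting any field pick any of the eight elements from both sources would need 3 bits per field, 12 bits in total. The function name `control_bits` and the comparison scheme are assumptions made for illustration.

```python
from math import ceil, log2


def control_bits(n_elems: int, restricted: bool) -> int:
    """Selector bits for a two-source shuffle of n_elems-element operands.

    restricted=True : each half of the result draws from one fixed source,
                      so each field picks 1 of n_elems elements.
    restricted=False: any field may pick any of the 2 * n_elems elements
                      of either source.
    """
    if n_elems < 2:
        raise ValueError("need at least two elements per operand")
    choices = n_elems if restricted else 2 * n_elems
    return n_elems * ceil(log2(choices))


for n in (2, 4, 8):
    print(n, control_bits(n, restricted=True), control_bits(n, restricted=False))
# 2 2 4
# 4 8 12
# 8 24 32
```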
23. A processor for performing a shuffle operation in response to a shuffle instruction comprising:
a decoder which decodes a single instruction specifying first and second source operands and a field of control bits; and
an execution unit which, responsive to the field of control bits, generates a resultant packed data operand comprised of packed data elements from the first and second source operands,
wherein the control bits are limited to specifying for the upper and lower halves of the resultant packed data operand, data elements from the first and second source operands, respectively.
24. The processor as claimed in claim 23 wherein the first and second source operands are the same operand.
25. The method as claimed in claim 19 wherein the first and second packed data source operands and the resultant packed data operand are each comprised of at least two packed data elements.
26. The method as claimed in claim 19 wherein the field of control bits is an 8-bit field.
27. The method as claimed in claim 26 wherein an 8-bit immediate to fill the field of control bits is decoded with the single instruction.
28. The processor of claim 23 wherein said field of control bits comprises of an 8-bit immediate value.
29. The processor of claim 23 wherein said field of control bits comprises 8 bits.
30. The processor of claim 29 wherein said first and second source operands comprise of double-precision floating-point values.
31. The processor of claim 29 wherein said first and second source operands comprise single-precision floating-point values.
32. The processor of claim 29 wherein said packed data elements comprise of packed double words.
33. The processor of claim 29 wherein said packed data elements comprise of packed words.
34. The processor of claim 29 wherein said packed data elements comprise of packed bytes.
35. The processor of claim 29 wherein said first and said second operands comprise of 128-bits of packed data.
36. An apparatus comprising:
a decode unit to decode a shuffle instruction into control signals, said shuffle instruction to include a first operand, a second operand, and a third operand wherein said third operand comprises of an 8-bit immediate value; said first operand to identify a first register to hold at least two packed data elements; said second operand to identify a memory location to hold at least two packed data elements; said third operand is to provide selection bits to indicate which of said packed data elements in said first operand and said second operand to select and copy to a resultant register; and an execution unit coupled to said decode unit, said execution unit responsive to said control signals and said selection bits to select a first set of data elements from said first register and to copy said first set of data elements to one or more lower destination fields of said resultant register, said execution unit further responsive to said control signals and said selection bits to select a second set of data elements from said memory location and to copy said second set of data elements to one or more upper destination fields of said resultant register.
37. The apparatus of claim 36 wherein said data elements of said first register and said second register comprise double-precision floating-point values.
38. The apparatus of claim 36 wherein said data elements of said first register and said second register comprise of single-precision floating-point values.
39. The apparatus of claim 36 wherein said packed data elements comprise of packed double words.
40. The apparatus of claim 36 wherein said packed data elements comprise of packed words.
41. The apparatus of claim 36 wherein said packed data elements comprise of packed bytes.
42. The apparatus of claim 36 wherein said first register is also said resultant register.
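Claims 36 through 42 take the first operand from a register and the second from a memory location, and claim 42 lets the first register double as the result register. The Python sketch below models that register/memory form. Its data model is hypothetical: `regs` is a dictionary standing in for a register file, the register name `xmm0` is only a label, and memory is a byte buffer holding four little-endian single-precision values read with the standard `struct` module. It reuses the selector layout recited in claims 54 through 56.

```python
import struct


def shuffle_reg_mem(regs, dst, src_reg, memory, addr, imm):
    """Shuffle with a register first operand and a memory second operand.

    Hypothetical model: `regs` maps register names to four floats and
    `memory` is a bytes-like buffer holding four little-endian
    single-precision values at `addr`.
    """
    a = regs[src_reg]                                   # first operand: register
    m = struct.unpack_from("<4f", memory, addr)         # second operand: memory
    regs[dst] = [a[imm & 3], a[(imm >> 2) & 3],         # lower half <- register
                 m[(imm >> 4) & 3], m[(imm >> 6) & 3]]  # upper half <- memory


regs = {"xmm0": [1.0, 2.0, 3.0, 4.0]}
memory = struct.pack("<4f", 5.0, 6.0, 7.0, 8.0)
# Destination is the first source register, as in claim 42.
shuffle_reg_mem(regs, "xmm0", "xmm0", memory, 0, 0x1B)
print(regs["xmm0"])  # [4.0, 3.0, 6.0, 5.0]
```

Because the source register is read before the result is written back, the in-place form of claim 42 behaves the same as writing to a separate register.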
43. An apparatus comprising:
an instruction decoder to receive and decode a shuffle instruction, said shuffle instruction to include an immediate operand comprising two or more sets of control bits; a first source register to hold a first packed data, said first packed data comprising of a first data element and a second data element; a second source register to hold a second packed data, said second packed data comprising of a third data element and a fourth data element; a destination register to hold a third packed data; an execution unit coupled to said first source register to receive said first packed data, and to said second source register to receive said second packed data; and wherein said execution unit is further coupled to said instruction decoder to receive said two or more sets of control bits, said execution unit to select from said first source register at least one of said first and second data elements in response to a first one of said two or more sets of control bits and to copy said selected data element from said first source register to a first data field in a lower half of said destination register, and said execution unit to select from said second source register at least one of said third and fourth data elements in response to a second one of said two or more sets of control bits and to copy said selected data element from said second source register to a second data field in an upper half of said destination register.
44. The apparatus of claim 43 wherein said immediate operand is an 8-bit immediate operand.
45. The apparatus of claim 43 wherein said data elements of said first source register and said second source register comprise of double-precision floating-point values.
46. The apparatus of claim 43 wherein said data elements of said first source register and said second source register comprise of single-precision floating-point values.
47. The apparatus of claim 43 wherein said packed data comprise of packed double words.
48. The apparatus of claim 43 wherein said packed data comprise of packed words.
49. The apparatus of claim 43 wherein said packed data comprise of packed bytes.
50. The apparatus of claim 43 wherein said apparatus is defined by machine readable data on a machine readable medium.
51. The apparatus of claim 43 wherein said first source register is also said destination register.
52. The apparatus of claim 43 wherein said first source register is the same as said second source register.
53. The apparatus of claim 43 wherein said two or more sets of control bits comprise bits 0 and 1 of the immediate operand.
54. The apparatus of claim 44 wherein said 8-bit immediate operand comprises bits 0 and 1 to select from said first source register which data element is copied into the lowest data field in the lower half of the destination register, and bits 4 and 5 to select from said second source register which data element is copied into the lowest data field in the upper half of the destination register.
55. The apparatus of claim 44 wherein said 8-bit immediate operand comprises bits 0 through 3 to select from said first source register which data elements are copied into the lower half of the destination register, and bits 4 through 7 to select from said second source register which data elements are copied into the upper half of the destination register.
56. The apparatus of claim 55 wherein said 8-bit immediate operand comprises bits 2 and 3 to select from said first source register which data element is copied into the highest data field in the lower half of the destination register, and bits 6 and 7 to select from said second source register which data element is copied into the highest data field in the upper half of the destination register.
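Claims 53 through 56 fix the meaning of each bit pair in the 8-bit immediate for four-element operands. The Python sketch below encodes and decodes that layout. The helper names are hypothetical; the argument order of `encode_imm` (highest field first) is chosen to mirror the `_MM_SHUFFLE` macro in `<xmmintrin.h>`, which packs its arguments the same way, but this is an illustration rather than any vendor's definition.

```python
def encode_imm(hi_hi: int, hi_lo: int, lo_hi: int, lo_lo: int) -> int:
    """Pack four 2-bit selectors into an 8-bit immediate, highest field first."""
    for sel in (hi_hi, hi_lo, lo_hi, lo_lo):
        if not 0 <= sel <= 3:
            raise ValueError("each selector must fit in 2 bits")
    return (hi_hi << 6) | (hi_lo << 4) | (lo_hi << 2) | lo_lo


def decode_imm(imm: int) -> dict[str, int]:
    """Split an 8-bit immediate into the four selectors of claims 54-56."""
    if not 0 <= imm <= 0xFF:
        raise ValueError("immediate must fit in 8 bits")
    return {
        "lower_lowest": imm & 0b11,          # bits 0-1, from first source
        "lower_highest": (imm >> 2) & 0b11,  # bits 2-3, from first source
        "upper_lowest": (imm >> 4) & 0b11,   # bits 4-5, from second source
        "upper_highest": (imm >> 6) & 0b11,  # bits 6-7, from second source
    }


def shuffle(first, second, imm):
    """Apply the decoded selectors to two four-element operands."""
    s = decode_imm(imm)
    return [first[s["lower_lowest"]], first[s["lower_highest"]],
            second[s["upper_lowest"]], second[s["upper_highest"]]]


imm = encode_imm(0, 1, 2, 3)
print(hex(imm))                                   # 0x1b
print(shuffle([1, 2, 3, 4], [5, 6, 7, 8], imm))   # [4, 3, 6, 5]
```

Every value from 0 to 255 is a valid immediate under this layout, so the 8-bit field of claim 44 is used with no wasted encodings.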