Difference between revisions of "Tests of a SIMD version of DVector3"

From GlueXWiki
 
Line 1:
- * SIMD (Single-Instruction Multiple-Data) instructions provide means of doing multiple calculations or manipulations of several pieces of data with a single operation
+ * SIMD (Single-Instruction Multiple-Data) instructions provide means of doing multiple calculations with or manipulations of several pieces of data with a single operation
  ** Size of special registers is 128 bits:  can do 4 single-precision floating point calculations (SSE) or 2 double-precision floating point calculations (SSE2) at the same time
  ** Basic arithmetic and logical operations are supported, as are square roots and reciprocal square roots
Line 5:
  ** Three pairs of coordinates: ''xy'',''yz'',''zx''
  ** gcc compiler flags:  -msse2 -mfpmath=sse -O2
- * Compare to TVector3 on ifarml6:
+ * Compare to TVector3 on ifarml6: 10 million events with randomly generated vectors
+ <pre>
+ Operation        DVector3 time (s)    TVector3 time (s)    T3/D3
+ -----------------------------------------------------------------
+ v2=-v1              0.054                0.171            3.17
+ v3=v1+v2            0.092                0.191            2.08
+ v3+=v1              0.056                0.048            0.86 <-
+ v3=v1-v2            0.092                0.194            2.11
+ v2=k v1              0.054                0.174            3.24
+ v2*=k                0.030                0.018            0.60 <-
+ SetMag              0.198                0.198            1.00
+ SetXYZ              0.007                0.008            1.14
+ SetMagThetaPhi      0.680                0.890            1.31  Expensive!
+ Rotate(a,v)          0.625                1.528            2.44  "
+ RotateZ(a)          0.353                0.402            1.14
+ make orthogonal v    0.142                0.273            1.92
+ v3 = v1 x v2        0.107                0.193            1.80
+ </pre>

Revision as of 08:05, 4 May 2010

  • SIMD (Single-Instruction Multiple-Data) instructions provide a means of performing calculations on, or manipulations of, several pieces of data with a single operation
    • The special SIMD registers are 128 bits wide: one instruction can do 4 single-precision floating-point calculations (SSE) or 2 double-precision floating-point calculations (SSE2) at the same time
    • Basic arithmetic and logical operations are supported, as are square roots and reciprocal square roots
  • Implemented DVector3 with SSE2 instructions
    • Three pairs of coordinates: xy,yz,zx
    • gcc compiler flags: -msse2 -mfpmath=sse -O2
  • Compare to TVector3 on ifarml6: 10 million events with randomly generated vectors
Operation        DVector3 time (s)    TVector3 time (s)     T3/D3
-----------------------------------------------------------------
 v2=-v1               0.054                 0.171            3.17
 v3=v1+v2             0.092                 0.191            2.08
 v3+=v1               0.056                 0.048            0.86 <-
 v3=v1-v2             0.092                 0.194            2.11
 v2=k v1              0.054                 0.174            3.24   
 v2*=k                0.030                 0.018            0.60 <-
 SetMag               0.198                 0.198            1.00
 SetXYZ               0.007                 0.008            1.14
 SetMagThetaPhi       0.680                 0.890            1.31  Expensive!
 Rotate(a,v)          0.625                 1.528            2.44   "
 RotateZ(a)           0.353                 0.402            1.14
 make orthogonal v    0.142                 0.273            1.92
 v3 = v1 x v2         0.107                 0.193            1.80
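
To make the "three pairs of coordinates" idea concrete, here is a minimal sketch of how such a layout could be written with SSE2 intrinsics. It is not the actual DVector3 code: the type PairVec3, the function Cross, and the choice to store each pair as (low, high) doubles in one 128-bit register are assumptions made for illustration only. One property the sketch shows is that, with xy, yz and zx all available, a cross product needs only packed multiplies and subtracts, with no shuffling of elements between registers.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics: __m128d, _mm_add_pd, _mm_mul_pd, ...
#include <cassert>

// Hypothetical illustration type (not the real DVector3). It keeps the three
// coordinate pairs xy, yz, zx, each as two doubles in one 128-bit register.
struct PairVec3 {
    __m128d xy, yz, zx;

    PairVec3(double x, double y, double z)
        : xy(_mm_setr_pd(x, y)), yz(_mm_setr_pd(y, z)), zx(_mm_setr_pd(z, x)) {}
    PairVec3(__m128d xy_, __m128d yz_, __m128d zx_) : xy(xy_), yz(yz_), zx(zx_) {}

    // Each coordinate is the low element of one of the pairs.
    double x() const { return _mm_cvtsd_f64(xy); }
    double y() const { return _mm_cvtsd_f64(yz); }
    double z() const { return _mm_cvtsd_f64(zx); }
};

// v3 = v1 + v2: three packed adds, two doubles per instruction.
inline PairVec3 operator+(const PairVec3& a, const PairVec3& b) {
    return PairVec3(_mm_add_pd(a.xy, b.xy), _mm_add_pd(a.yz, b.yz),
                    _mm_add_pd(a.zx, b.zx));
}

// v2 = -v1: flip the sign bit of every element with a packed XOR.
inline PairVec3 operator-(const PairVec3& a) {
    const __m128d sign = _mm_set1_pd(-0.0);
    return PairVec3(_mm_xor_pd(a.xy, sign), _mm_xor_pd(a.yz, sign),
                    _mm_xor_pd(a.zx, sign));
}

// v2 = k v1: broadcast k to both lanes, then three packed multiplies.
inline PairVec3 operator*(double k, const PairVec3& a) {
    const __m128d kk = _mm_set1_pd(k);
    return PairVec3(_mm_mul_pd(kk, a.xy), _mm_mul_pd(kk, a.yz),
                    _mm_mul_pd(kk, a.zx));
}

// v2 = v1 / k: packed divide; a zero divisor is rejected in debug builds.
inline PairVec3 operator/(const PairVec3& a, double k) {
    assert(k != 0.0);
    const __m128d kk = _mm_set1_pd(k);
    return PairVec3(_mm_div_pd(a.xy, kk), _mm_div_pd(a.yz, kk),
                    _mm_div_pd(a.zx, kk));
}

// v3 = v1 x v2. Because all three pairs are stored, each result pair is a
// difference of two lane-wise products:
//   (cx, cy) = a.yz * b.zx - a.zx * b.yz
//   (cy, cz) = a.zx * b.xy - a.xy * b.zx
//   (cz, cx) = a.xy * b.yz - a.yz * b.xy
inline PairVec3 Cross(const PairVec3& a, const PairVec3& b) {
    return PairVec3(
        _mm_sub_pd(_mm_mul_pd(a.yz, b.zx), _mm_mul_pd(a.zx, b.yz)),
        _mm_sub_pd(_mm_mul_pd(a.zx, b.xy), _mm_mul_pd(a.xy, b.zx)),
        _mm_sub_pd(_mm_mul_pd(a.xy, b.yz), _mm_mul_pd(a.yz, b.xy)));
}
```

A sketch like this can be built with the gcc flags listed above (-msse2 -mfpmath=sse -O2). The redundant storage costs three registers per vector instead of two, which is the trade-off this layout makes in exchange for shuffle-free operations such as the cross product.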