|
Ubuntu 10.04 x64 + CUDA toolkit 3.1 + GTX 280 + 256.35
Running (16 x 16 x 16) blocks of 512 empty threads...done
Running (16 x 16 x 16) blocks of 512 empty threads: 67.129 ms
Running clock() test...
kclock:
( 370976, 371004): 28
kclock_test2: [10 blocks, 1 thread(s)/block]
kclock_test2: [30 blocks, 1 thread(s)/block]
Block 00: start: 00375628, stop: 00378710
Block 01: start: 00375360, stop: 00378442
Block 02: start: 00375588, stop: 00378670
Block 03: start: 00375356, stop: 00378438
Block 04: start: 00375404, stop: 00378486
Block 05: start: 00375384, stop: 00378466
Block 06: start: 00375540, stop: 00378622
Block 07: start: 00375380, stop: 00378462
Block 08: start: 00375468, stop: 00378550
Block 09: start: 00376948, stop: 00380030
Block 00: start: 00384168, stop: 00387250
Block 10: start: 00384176, stop: 00387258
Block 20: start: 00384160, stop: 00387242
Block 01: start: 00384012, stop: 00387094
Block 11: start: 00384020, stop: 00387102
Block 21: start: 00384004, stop: 00387086
Block 02: start: 00383968, stop: 00387050
Block 12: start: 00383960, stop: 00387042
Block 22: start: 00383960, stop: 00387042
Block 03: start: 00383844, stop: 00386926
Block 13: start: 00383836, stop: 00386918
Block 23: start: 00383836, stop: 00386918
Block 04: start: 00383924, stop: 00387006
Block 14: start: 00383916, stop: 00386998
Block 24: start: 00383916, stop: 00386998
Block 05: start: 00383932, stop: 00387014
Block 15: start: 00383924, stop: 00387006
Block 25: start: 00383924, stop: 00387006
Block 06: start: 00384068, stop: 00387150
Block 16: start: 00384076, stop: 00387158
Block 26: start: 00384084, stop: 00387166
Block 07: start: 00384012, stop: 00387094
Block 17: start: 00384020, stop: 00387102
Block 27: start: 00384028, stop: 00387110
Block 08: start: 00384080, stop: 00387162
Block 18: start: 00384088, stop: 00387170
Block 28: start: 00384096, stop: 00387178
Block 09: start: 00385528, stop: 00388610
Block 19: start: 00385512, stop: 00388594
Block 29: start: 00385520, stop: 00388602
Running pipeline tests...
Pipeline latency (512 dependent operations)
mul: 49162 clk (96.020 clk/warp)
Running pipeline tests...
K_ADD_UINT_DEP128 latency: 6154 clk (24.039 clk/warp)
K_RSQRT_FLOAT_DEP128 latency: 7190 clk (28.086 clk/warp)
K_ADD_DOUBLE_DEP128 latency: 12276 clk (47.953 clk/warp)
K_ADD_UINT_DEP128 throughput: 21310 clk (6.151 ops/clk)
K_RSQRT_FLOAT_DEP128 throughput: 65672 clk (1.996 ops/clk)
K_ADD_DOUBLE_DEP128 throughput: 131346 clk (0.998 ops/clk)
K_ADD_UINT_DEP128 latency: 6154 clk (24.039 clk/warp)
K_SUB_UINT_DEP128 latency: 6154 clk (24.039 clk/warp)
K_MAD_UINT_DEP128 latency: 30730 clk (120.039 clk/warp)
K_MUL_UINT_DEP128 latency: 24586 clk (96.039 clk/warp)
K_DIV_UINT_DEP128 latency: 155664 clk (608.062 clk/warp)
K_REM_UINT_DEP128 latency: 186384 clk (728.062 clk/warp)
K_MIN_UINT_DEP128 latency: 12378 clk (48.352 clk/warp)
K_MAX_UINT_DEP128 latency: 12378 clk (48.352 clk/warp)
K_ADD_UINT_DEP128 throughput: 21518 clk (6.091 ops/clk)
K_SUB_UINT_DEP128 throughput: 21518 clk (6.091 ops/clk)
K_MAD_UINT_DEP128 throughput: 107530 clk (1.219 ops/clk)
K_MUL_UINT_DEP128 throughput: 86368 clk (1.518 ops/clk)
K_DIV_UINT_DEP128 throughput: 412540 clk (0.318 ops/clk)
K_REM_UINT_DEP128 throughput: 495040 clk (0.265 ops/clk)
K_MIN_UINT_DEP128 throughput: 42668 clk (3.072 ops/clk)
K_MAX_UINT_DEP128 throughput: 42614 clk (3.076 ops/clk)
K_ADD_INT_DEP128 latency: 6154 clk (24.039 clk/warp)
K_SUB_INT_DEP128 latency: 6154 clk (24.039 clk/warp)
K_MAD_INT_DEP128 latency: 30780 clk (120.234 clk/warp)
K_MUL_INT_DEP128 latency: 24616 clk (96.156 clk/warp)
K_DIV_INT_DEP128 latency: 175120 clk (684.062 clk/warp)
K_REM_INT_DEP128 latency: 200720 clk (784.062 clk/warp)
K_MIN_INT_DEP128 latency: 12378 clk (48.352 clk/warp)
K_MAX_INT_DEP128 latency: 12378 clk (48.352 clk/warp)
K_ABS_INT_DEP128 latency: 10876 clk (42.484 clk/warp)
K_ADD_INT_DEP128 throughput: 21300 clk (6.154 ops/clk)
K_SUB_INT_DEP128 throughput: 21310 clk (6.151 ops/clk)
K_MAD_INT_DEP128 throughput: 106810 clk (1.227 ops/clk)
K_MUL_INT_DEP128 throughput: 82838 clk (1.582 ops/clk)
K_DIV_INT_DEP128 throughput: 523372 clk (0.250 ops/clk)
K_REM_INT_DEP128 throughput: 589288 clk (0.222 ops/clk)
K_MIN_INT_DEP128 throughput: 42648 clk (3.073 ops/clk)
K_MAX_INT_DEP128 throughput: 42672 clk (3.072 ops/clk)
K_ABS_INT_DEP128 throughput: 41808 clk (3.135 ops/clk)
K_ADD_FLOAT_DEP128 latency: 6154 clk (24.039 clk/warp)
K_SUB_FLOAT_DEP128 latency: 6154 clk (24.039 clk/warp)
K_MAD_FLOAT_DEP128 latency: 6154 clk (24.039 clk/warp)
K_MUL_FLOAT_DEP128 latency: 6154 clk (24.039 clk/warp)
K_DIV_FLOAT_DEP128 latency: 33082 clk (129.227 clk/warp)
K_MIN_FLOAT_DEP128 latency: 12378 clk (48.352 clk/warp)
K_MAX_FLOAT_DEP128 latency: 12378 clk (48.352 clk/warp)
K_ADD_FLOAT_DEP128 throughput: 21344 clk (6.141 ops/clk)
K_SUB_FLOAT_DEP128 throughput: 21486 clk (6.100 ops/clk)
K_MAD_FLOAT_DEP128 throughput: 21486 clk (6.100 ops/clk)
K_MUL_FLOAT_DEP128 throughput: 10528 clk (12.450 ops/clk)
K_DIV_FLOAT_DEP128 throughput: 73848 clk (1.775 ops/clk)
K_MIN_FLOAT_DEP128 throughput: 42696 clk (3.070 ops/clk)
K_MAX_FLOAT_DEP128 throughput: 42698 clk (3.070 ops/clk)
K_ADD_DOUBLE_DEP128 latency: 12276 clk (47.953 clk/warp)
K_SUB_DOUBLE_DEP128 latency: 12276 clk (47.953 clk/warp)
K_MAD_DOUBLE_DEP128 latency: 12276 clk (47.953 clk/warp)
K_MUL_DOUBLE_DEP128 latency: 12276 clk (47.953 clk/warp)
K_DIV_DOUBLE_DEP128 latency: 348868 clk (1362.766 clk/warp)
K_MIN_DOUBLE_DEP128 latency: 24564 clk (95.953 clk/warp)
K_MAX_DOUBLE_DEP128 latency: 24564 clk (95.953 clk/warp)
K_ADD_DOUBLE_DEP128 throughput: 131346 clk (0.998 ops/clk)
K_SUB_DOUBLE_DEP128 throughput: 131346 clk (0.998 ops/clk)
K_MAD_DOUBLE_DEP128 throughput: 131346 clk (0.998 ops/clk)
K_MUL_DOUBLE_DEP128 throughput: 131346 clk (0.998 ops/clk)
K_DIV_DOUBLE_DEP128 throughput: 2043848 clk (0.064 ops/clk)
K_MIN_DOUBLE_DEP128 throughput: 262470 clk (0.499 ops/clk)
K_MAX_DOUBLE_DEP128 throughput: 262470 clk (0.499 ops/clk)
K_AND_UINT_DEP128 latency: 6154 clk (24.039 clk/warp)
K_OR_UINT_DEP128 latency: 6154 clk (24.039 clk/warp)
K_XOR_UINT_DEP128 latency: 6154 clk (24.039 clk/warp)
K_SHL_UINT_DEP128 latency: 6154 clk (24.039 clk/warp)
K_SHR_UINT_DEP128 latency: 6154 clk (24.039 clk/warp)
K_AND_UINT_DEP128 throughput: 21300 clk (6.154 ops/clk)
K_OR_UINT_DEP128 throughput: 21516 clk (6.092 ops/clk)
K_XOR_UINT_DEP128 throughput: 21614 clk (6.064 ops/clk)
K_SHL_UINT_DEP128 throughput: 21300 clk (6.154 ops/clk)
K_SHR_UINT_DEP128 throughput: 21300 clk (6.154 ops/clk)
K_UMUL24_UINT_DEP128 latency: 6154 clk (24.039 clk/warp)
K_MUL24_INT_DEP128 latency: 6154 clk (24.039 clk/warp)
K_UMULHI_UINT_DEP128 latency: 36946 clk (144.320 clk/warp)
K_MULHI_INT_DEP128 latency: 46246 clk (180.648 clk/warp)
K_USAD_UINT_DEP128 latency: 6154 clk (24.039 clk/warp)
K_SAD_INT_DEP128 latency: 6154 clk (24.039 clk/warp)
K_UMUL24_UINT_DEP128 throughput: 21300 clk (6.154 ops/clk)
K_MUL24_INT_DEP128 throughput: 21516 clk (6.092 ops/clk)
K_UMULHI_UINT_DEP128 throughput: 109038 clk (1.202 ops/clk)
K_MULHI_INT_DEP128 throughput: 140950 clk (0.930 ops/clk)
K_USAD_UINT_DEP128 throughput: 21576 clk (6.075 ops/clk)
K_SAD_INT_DEP128 throughput: 21574 clk (6.075 ops/clk)
K_FADD_RN_FLOAT_DEP128 latency: 6154 clk (24.039 clk/warp)
K_FADD_RZ_FLOAT_DEP128 latency: 6154 clk (24.039 clk/warp)
K_FMUL_RN_FLOAT_DEP128 latency: 6664 clk (26.031 clk/warp)
K_FMUL_RZ_FLOAT_DEP128 latency: 6664 clk (26.031 clk/warp)
K_FDIVIDEF_FLOAT_DEP128 latency: 13332 clk (52.078 clk/warp)
K_FADD_RN_FLOAT_DEP128 throughput: 21334 clk (6.144 ops/clk)
K_FADD_RZ_FLOAT_DEP128 throughput: 21342 clk (6.142 ops/clk)
K_FMUL_RN_FLOAT_DEP128 throughput: 11724 clk (11.180 ops/clk)
K_FMUL_RZ_FLOAT_DEP128 throughput: 11724 clk (11.180 ops/clk)
K_FDIVIDEF_FLOAT_DEP128 throughput: 66830 clk (1.961 ops/clk)
K_DADD_RN_DOUBLE_DEP128 latency: 12276 clk (47.953 clk/warp)
K_DADD_RN_DOUBLE_DEP128 throughput: 131346 clk (0.998 ops/clk)
K_RCP_FLOAT_DEP128 latency: 13332 clk (52.078 clk/warp)
K_SQRT_FLOAT_DEP128 latency: 14406 clk (56.273 clk/warp)
K_RSQRT_FLOAT_DEP128 latency: 7190 clk (28.086 clk/warp)
K_RCP_FLOAT_DEP128 throughput: 65674 clk (1.996 ops/clk)
K_SQRT_FLOAT_DEP128 throughput: 131208 clk (0.999 ops/clk)
K_RSQRT_FLOAT_DEP128 throughput: 65672 clk (1.996 ops/clk)
K_SINF_FLOAT_DEP128 latency: 12382 clk (48.367 clk/warp)
K_COSF_FLOAT_DEP128 latency: 12382 clk (48.367 clk/warp)
K_TANF_FLOAT_DEP128 latency: 25104 clk (98.062 clk/warp)
K_EXPF_FLOAT_DEP128 latency: 18442 clk (72.039 clk/warp)
K_EXP2F_FLOAT_DEP128 latency: 12382 clk (48.367 clk/warp)
K_EXP10F_FLOAT_DEP128 latency: 18442 clk (72.039 clk/warp)
K_LOGF_FLOAT_DEP128 latency: 13396 clk (52.328 clk/warp)
K_LOG2F_FLOAT_DEP128 latency: 7190 clk (28.086 clk/warp)
K_LOG10F_FLOAT_DEP128 latency: 13396 clk (52.328 clk/warp)
K_POWF_FLOAT_DEP128 latency: 18992 clk (74.188 clk/warp)
K_SINF_FLOAT_DEP128 throughput: 65760 clk (1.993 ops/clk)
K_COSF_FLOAT_DEP128 throughput: 65760 clk (1.993 ops/clk)
K_TANF_FLOAT_DEP128 throughput: 197642 clk (0.663 ops/clk)
K_EXPF_FLOAT_DEP128 throughput: 65850 clk (1.990 ops/clk)
K_EXP2F_FLOAT_DEP128 throughput: 65760 clk (1.993 ops/clk)
K_EXP10F_FLOAT_DEP128 throughput: 65772 clk (1.993 ops/clk)
K_LOGF_FLOAT_DEP128 throughput: 65810 clk (1.992 ops/clk)
K_LOG2F_FLOAT_DEP128 throughput: 65672 clk (1.996 ops/clk)
K_LOG10F_FLOAT_DEP128 throughput: 65810 clk (1.992 ops/clk)
K_POWF_FLOAT_DEP128 throughput: 131800 clk (0.994 ops/clk)
K_INTASFLOAT_UINT_DEP128 latency: 5136 clk (20.062 clk/warp)
K_FLOATASINT_FLOAT_DEP128 latency: 5136 clk (20.062 clk/warp)
K_INTASFLOAT_UINT_DEP128 throughput: 20658 clk (6.345 ops/clk)
K_FLOATASINT_FLOAT_DEP128 throughput: 20770 clk (6.311 ops/clk)
K_POPC_UINT_DEP128 latency: 75296 clk (294.125 clk/warp)
K_CLZ_UINT_DEP128 latency: 31012 clk (121.141 clk/warp)
K_POPC_UINT_DEP128 throughput: 288064 clk (0.455 ops/clk)
K_CLZ_UINT_DEP128 throughput: 71650 clk (1.829 ops/clk)
K_ALL_UINT_DEP128 latency: 38086 clk (148.773 clk/warp)
K_ANY_UINT_DEP128 latency: 38086 clk (148.773 clk/warp)
K_SYNC_UINT_DEP128 latency: 64 clk (0.250 clk/warp)
K_ALL_UINT_DEP128 throughput: 96672 clk (1.356 ops/clk)
K_ANY_UINT_DEP128 throughput: 96486 clk (1.358 ops/clk)
K_SYNC_UINT_DEP128 throughput: 442 clk (296.543 ops/clk)
Pipeline latency/throughput with multiple warps (200 iterations of 256 ops)
K_ADD_UINT_DEP128:
1 warp ( 1 thr) 1230800 clk (24.039 clk/warp, 0.042 ops/clk) Histogram { (24: 200) }
1 warp ( 2 thr) 1230800 clk (24.039 clk/warp, 0.083 ops/clk) Histogram { (24: 200) }
1 warp ( 3 thr) 1230800 clk (24.039 clk/warp, 0.125 ops/clk) Histogram { (24: 200) }
1 warp ( 4 thr) 1230800 clk (24.039 clk/warp, 0.166 ops/clk) Histogram { (24: 200) }
1 warp ( 6 thr) 1230800 clk (24.039 clk/warp, 0.250 ops/clk) Histogram { (24: 200) }
1 warp ( 8 thr) 1230800 clk (24.039 clk/warp, 0.333 ops/clk) Histogram { (24: 200) }
1 warp ( 16 thr) 1230800 clk (24.039 clk/warp, 0.666 ops/clk) Histogram { (24: 200) }
1 warp ( 24 thr) 1230800 clk (24.039 clk/warp, 0.998 ops/clk) Histogram { (24: 200) }
1 warp ( 32 thr) 1230800 clk (24.039 clk/warp, 1.331 ops/clk) Histogram { (24: 200) }
2 warps ( 64 thr) 1231600 clk (24.039 clk/warp, 2.661 ops/clk) Histogram { (24: 400) }
3 warps ( 96 thr) 1234800 clk (24.068 clk/warp, 3.981 ops/clk) Histogram { (24: 600) }
4 warps (128 thr) 1236520 clk (24.079 clk/warp, 5.300 ops/clk) Histogram { (24: 800) }
5 warps (160 thr) 1240540 clk (24.133 clk/warp, 6.604 ops/clk) Histogram { (24: 1000) }
6 warps (192 thr) 1247838 clk (24.201 clk/warp, 7.878 ops/clk) Histogram { (24: 1200) }
7 warps (224 thr) 2107458 clk (25.650 clk/warp, 5.442 ops/clk) Histogram { (24: 1076) (30: 324) }
8 warps (256 thr) 2616310 clk (27.643 clk/warp, 5.010 ops/clk) Histogram { (24: 200) (25: 424) (26: 116) (28: 455) (30: 394) (31: 11) }
9 warps (288 thr) 2681110 clk (30.535 clk/warp, 5.500 ops/clk) Histogram { (24: 184) (25: 16) (27: 600) (33: 956) (34: 44) }
10 warps (320 thr) 2768950 clk (35.554 clk/warp, 5.917 ops/clk) Histogram { (26: 190) (27: 212) (35: 4) (36: 403) (37: 785) (38: 8) (39: 398) }
11 warps (352 thr) 2981822 clk (38.441 clk/warp, 6.044 ops/clk) Histogram { (26: 16) (28: 184) (29: 184) (30: 184) (33: 48) (40: 540) (41: 476) (43: 562) (44: 6) }
12 warps (384 thr) 3418442 clk (41.891 clk/warp, 5.751 ops/clk) Histogram { (25: 200) (41: 1023) (42: 177) (45: 1000) }
13 warps (416 thr) 3486864 clk (45.848 clk/warp, 6.108 ops/clk) Histogram { (28: 185) (30: 15) (39: 400) (40: 200) (48: 1015) (50: 236) (51: 549) }
14 warps (448 thr) 3823702 clk (49.530 clk/warp, 5.999 ops/clk) Histogram { (26: 200) (48: 1065) (49: 135) (52: 468) (53: 597) (55: 285) (56: 50) }
15 warps (480 thr) 3976764 clk (53.439 clk/warp, 6.180 ops/clk) Histogram { (31: 47) (32: 153) (45: 400) (46: 200) (55: 800) (56: 400) (57: 47) (58: 251) (59: 702) }
16 warps (512 thr) 4288294 clk (57.354 clk/warp, 6.113 ops/clk) Histogram { (27: 82) (28: 75) (29: 43) (55: 1082) (56: 118) (60: 497) (61: 648) (62: 298) (63: 322) (64: 35) }
K_MUL_FLOAT_DEP128 throughput: 10634 clk (12.326 ops/clk)
K_MAD_FLOAT_DEP128 throughput: 21504 clk (6.095 ops/clk)
KADD_MUL throughput: 11540 clk (11.358 ops/clk)
KADD_MUL2 throughput: 64 thrds 3150 clk (5.201 ops/clk)
++++++++++++++++++++++++++++++++++++++++++++++++++
K_SYNC_UINT_DEP128 latency: 64 clk (0.250 clk/warp)
K_SYNC_UINT_DEP128 latency: 72 clk (0.281 clk/warp)
K_SYNC_UINT_DEP128 latency: 72 clk (0.281 clk/warp)
K_SYNC_UINT_DEP128 latency: 72 clk (0.281 clk/warp)
K_SYNC_UINT_DEP128 latency: 84 clk (0.328 clk/warp)
K_SYNC_UINT_DEP128 latency: 84 clk (0.328 clk/warp)
K_SYNC_UINT_DEP128 latency: 112 clk (0.438 clk/warp)
K_SYNC_UINT_DEP128 latency: 116 clk (0.453 clk/warp)
K_SYNC_UINT_DEP128 latency: 124 clk (0.484 clk/warp)
K_SYNC_UINT_DEP128 latency: 304 clk (1.188 clk/warp)
K_SYNC_UINT_DEP128 latency: 140 clk (0.547 clk/warp)
K_SYNC_UINT_DEP128 latency: 326 clk (1.273 clk/warp)
K_SYNC_UINT_DEP128 latency: 168 clk (0.656 clk/warp)
K_SYNC_UINT_DEP128 latency: 386 clk (1.508 clk/warp)
K_SYNC_UINT_DEP128 latency: 120 clk (0.469 clk/warp)
K_SYNC_UINT_DEP128 latency: 184 clk (0.719 clk/warp)
Running register file test...
Max threads x regs/thread before kernel spawn failure.
[512 x 4 = 2048]
[512 x 8 = 4096]
[512 x 12 = 6144]
[512 x 16 = 8192]
[512 x 20 = 10240]
[512 x 24 = 12288]
[512 x 28 = 14336]
[512 x 32 = 16384]
[512 x 36 = 18432]
[512 x 40 = 20480]
[512 x 44 = 22528]
[512 x 48 = 24576]
[512 x 52 = 26624]
[512 x 56 = 28672]
[512 x 60 = 30720]
[512 x 64 = 32768]
[512 x 68 = 34816]
[512 x 72 = 36864]
[512 x 76 = 38912]
[512 x 80 = 40960]
[512 x 84 = 43008]
[512 x 88 = 45056]
[512 x 92 = 47104]
[512 x 96 = 49152]
[512 x 100 = 51200]
[512 x 104 = 53248]
[512 x 108 = 55296]
[512 x 112 = 57344]
[512 x 116 = 59392]
[512 x 120 = 61440]
[512 x 124 = 63488]
[512 x 128 = 65536] |
|