Before expansion:
#define DOT_A4B16C4(a, b, c) \
    { \
        c.x += (a.x * b.s0 + a.y * b.s1 + a.z * b.s2 + a.w * b.s3); \
        c.y += (a.x * b.s4 + a.y * b.s5 + a.z * b.s6 + a.w * b.s7); \
        c.z += (a.x * b.s8 + a.y * b.s9 + a.z * b.sa + a.w * b.sb); \
        c.w += (a.x * b.sc + a.y * b.sd + a.z * b.se + a.w * b.sf); \
    }
./test_convolution_ocl 32 128 128 32 3 3 1 1 0
[DEBUG] thread 15285 OCLContext 0x589b080390 constructor start
[DEBUG] thread 15285 try to dlopen libQUALCOMM_Adreno_650_map.so failed, dlopen failed: library "libQUALCOMM_Adreno_650_map.so" not found, create kernel from source code
[DEBUG] thread 15285 gcl_kernel_source 0xb4000074402203c0 constructor
[DEBUG] thread 15285 OCLContext 0x589b080390 constructor end
[DEBUG] thread 15285 get forward run info from cache fail, try to find best forward run info
[DEBUG] thread 15285 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3311 runInfo: ls <0 0 0> executeTime = 2797.056000 us
[DEBUG] thread 15285 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3321 runInfo: ls <0 0 0> executeTime = 1689.088000 us
[DEBUG] thread 15285 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3331 runInfo: ls <0 0 0> executeTime = 1257.984000 us
[DEBUG] thread 15285 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3341 runInfo: ls <0 0 0> executeTime = 1140.992000 us
[DEBUG] thread 15285 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3351 runInfo: ls <0 0 0> executeTime = 1051.136000 us
[DEBUG] thread 15285 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3361 runInfo: ls <0 0 0> executeTime = 1120.000000 us
[DEBUG] thread 15285 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3371 runInfo: ls <0 0 0> executeTime = 1175.040000 us
[DEBUG] thread 15285 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3381 runInfo: ls <0 0 0> executeTime = 1026.048000 us
[DEBUG] thread 15285 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3312 runInfo: ls <0 0 0> executeTime = 2488.832000 us
[DEBUG] thread 15285 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3322 runInfo: ls <0 0 0> executeTime = 1725.952000 us
[DEBUG] thread 15285 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3332 runInfo: ls <0 0 0> executeTime = 1430.016000 us
[DEBUG] thread 15285 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3342 runInfo: ls <0 0 0> executeTime = 1312.000000 us
[DEBUG] thread 15285 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3314 runInfo: ls <0 0 0> executeTime = 5136.896000 us
[DEBUG] thread 15285 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3324 runInfo: ls <0 0 0> executeTime = 3611.136000 us
[DEBUG] thread 15285 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3334 runInfo: ls <0 0 0> executeTime = 3038.976000 us
[DEBUG] thread 15285 enqueue_fill_image runInfo: executeTime = 17.920000 us
[DEBUG] thread 15285 KERNEL>>> unknow_conv_direct_trans_flt_hw_44 runInfo: executeTime = 13.056000 us
[DEBUG] thread 15285 DATATRANS>>> enqueue_write_buffer runInfo: executeTime = 77.056000 us
[DEBUG] thread 15285 KERNEL>>> unknow_mem_trans_om_nchw_to_nchwc4 runInfo: executeTime = 80.128000 us
[DEBUG] thread 15285 Get memory val without allocated, the capacitySize is 0
[DEBUG] thread 15285 Get memory val without allocated, the capacitySize is 0
[DEBUG] thread 15285 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3381 runInfo: ls <0 0 0> executeTime = 1022.976000 us
[DEBUG] thread 15285 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3381 runInfo: ls <0 0 0> executeTime = 1000.960000 us
[DEBUG] thread 15285 SELECT LS KERNEL>>> unknow_conv_direct_sh1_qc_iom_3381 runInfo: best ls = 8 1 8 executeTime = 860.928000 us
[DEBUG] thread 15285 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3381 runInfo: ls <8 1 8> executeTime = 860.160000 us
[INFO] thread 15285 min_time = 0.860160
[INFO] thread 15285 max_time = 0.860160
[INFO] thread 15285 avg_time = -0.000000
[DEBUG] thread 15285 KERNEL>>> unknow_mem_trans_im_nchwc4_to_nchw runInfo: executeTime = 140.032000 us
[DEBUG] thread 15285 DATATRANS>>> enqueue_read_buffer runInfo: executeTime = 79.104000 us
[INFO] thread 15285 16bit, Convolution, (1 32 1 128 128)+(32 32 1 3 3)/(1 1 1 1 0 0 1 1 1 1)=(1 32 1 128 128), TIME 0.860ms, GFLOPS 351.695
abs(diff) >= 1.000000e+00f, number = 0
abs(diff) >= 1.000000e-01f, number = 0
abs(diff) >= 1.000000e-02f, number = 13129
abs(diff) >= 1.000000e-03f, number = 339363
abs(diff) >= 1.000000e-04f, number = 123968
abs(diff) >= 1.000000e-05f, number = 681
abs(diff) >= 0.000000e+00f, number = 47147
maxabs = 0.046875, a = 4.781250, b = 4.828125 @ 357254
maxrel = 10498.046875, a = 0.002625, b = -0.002625 @ 278147
[DEBUG] thread 15285 OCLContext 0x589b080390 deconstructor start
[DEBUG] thread 15285 gcl_kernel_source 0xb4000074402203c0 constructor
[DEBUG] thread 15285 OCLContext 0x589b080390 deconstructor end
After expansion:
#define DOT_A4B16C4(a, b, c) \
    { \
        c.x += (a.x * b.s0); \
        c.x += (a.y * b.s1); \
        c.x += (a.z * b.s2); \
        c.x += (a.w * b.s3); \
        c.y += (a.x * b.s4); \
        c.y += (a.y * b.s5); \
        c.y += (a.z * b.s6); \
        c.y += (a.w * b.s7); \
        c.z += (a.x * b.s8); \
        c.z += (a.y * b.s9); \
        c.z += (a.z * b.sa); \
        c.z += (a.w * b.sb); \
        c.w += (a.x * b.sc); \
        c.w += (a.y * b.sd); \
        c.w += (a.z * b.se); \
        c.w += (a.w * b.sf); \
    }
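For readers comparing the two versions of DOT_A4B16C4, here is a minimal host-side sketch in plain C that checks they compute the same dot products. The struct types `vec4` and `vec16`, the macro names `DOT_FUSED` and `DOT_SPLIT`, and the helper functions are hypothetical stand-ins for the OpenCL vector types and are not part of the project. The two forms add the products in a different order, so in half or single precision they are algebraically equal but need not be bit-identical; the check below uses small integers so that every intermediate value is exact.

```c
#include <assert.h>

/* Hypothetical host-side stand-ins for the OpenCL vector types. */
typedef struct { float x, y, z, w; } vec4;
typedef struct {
    float s0, s1, s2, s3, s4, s5, s6, s7;
    float s8, s9, sa, sb, sc, sd, se, sf;
} vec16;

/* Form 1: sum the four products, then add the sum to the accumulator. */
#define DOT_FUSED(a, b, c) \
    { \
        c.x += (a.x * b.s0 + a.y * b.s1 + a.z * b.s2 + a.w * b.s3); \
        c.y += (a.x * b.s4 + a.y * b.s5 + a.z * b.s6 + a.w * b.s7); \
        c.z += (a.x * b.s8 + a.y * b.s9 + a.z * b.sa + a.w * b.sb); \
        c.w += (a.x * b.sc + a.y * b.sd + a.z * b.se + a.w * b.sf); \
    }

/* Form 2: add each product to the accumulator in its own statement. */
#define DOT_SPLIT(a, b, c) \
    { \
        c.x += (a.x * b.s0); c.x += (a.y * b.s1); \
        c.x += (a.z * b.s2); c.x += (a.w * b.s3); \
        c.y += (a.x * b.s4); c.y += (a.y * b.s5); \
        c.y += (a.z * b.s6); c.y += (a.w * b.s7); \
        c.z += (a.x * b.s8); c.z += (a.y * b.s9); \
        c.z += (a.z * b.sa); c.z += (a.w * b.sb); \
        c.w += (a.x * b.sc); c.w += (a.y * b.sd); \
        c.w += (a.z * b.se); c.w += (a.w * b.sf); \
    }

/* c is passed by value, so each function returns the updated accumulator. */
vec4 dot_fused(vec4 a, vec16 b, vec4 c) { DOT_FUSED(a, b, c); return c; }
vec4 dot_split(vec4 a, vec16 b, vec4 c) { DOT_SPLIT(a, b, c); return c; }

/* Small-integer inputs keep every intermediate exact, so both forms match. */
void check_forms_agree(void) {
    vec4 a = {1, 2, 3, 4};
    vec16 b = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
    vec4 c0 = {0.5f, 0, 0, 0};
    vec4 f = dot_fused(a, b, c0);
    vec4 s = dot_split(a, b, c0);
    assert(f.x == s.x && f.y == s.y && f.z == s.z && f.w == s.w);
    assert(f.x == 30.5f);
}
```

Calling `check_forms_agree()` from a `main` of your own exercises both forms. This only confirms arithmetic equivalence on the host; it says nothing about how a GPU compiler schedules either form.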
./test_convolution_ocl 32 128 128 32 3 3 1 1 0
[DEBUG] thread 17343 OCLContext 0x5e124b4390 constructor start
[DEBUG] thread 17343 try to dlopen libQUALCOMM_Adreno_650_map.so failed, dlopen failed: library "libQUALCOMM_Adreno_650_map.so" not found, create kernel from source code
[DEBUG] thread 17343 gcl_kernel_source 0xb400007ab98203c0 constructor
[DEBUG] thread 17343 OCLContext 0x5e124b4390 constructor end
[DEBUG] thread 17343 get forward run info from cache fail, try to find best forward run info
[DEBUG] thread 17343 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3311 runInfo: ls <0 0 0> executeTime = 2744.832000 us
[DEBUG] thread 17343 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3321 runInfo: ls <0 0 0> executeTime = 1667.072000 us
[DEBUG] thread 17343 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3331 runInfo: ls <0 0 0> executeTime = 1198.080000 us
[DEBUG] thread 17343 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3341 runInfo: ls <0 0 0> executeTime = 1105.920000 us
[DEBUG] thread 17343 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3351 runInfo: ls <0 0 0> executeTime = 1036.032000 us
[DEBUG] thread 17343 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3361 runInfo: ls <0 0 0> executeTime = 944.896000 us
[DEBUG] thread 17343 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3371 runInfo: ls <0 0 0> executeTime = 958.976000 us
[DEBUG] thread 17343 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3381 runInfo: ls <0 0 0> executeTime = 907.008000 us
[DEBUG] thread 17343 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3312 runInfo: ls <0 0 0> executeTime = 2529.024000 us
[DEBUG] thread 17343 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3322 runInfo: ls <0 0 0> executeTime = 1652.992000 us
[DEBUG] thread 17343 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3332 runInfo: ls <0 0 0> executeTime = 1390.848000 us
[DEBUG] thread 17343 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3342 runInfo: ls <0 0 0> executeTime = 1227.008000 us
[DEBUG] thread 17343 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3314 runInfo: ls <0 0 0> executeTime = 5095.936000 us
[DEBUG] thread 17343 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3324 runInfo: ls <0 0 0> executeTime = 3202.048000 us
[DEBUG] thread 17343 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3334 runInfo: ls <0 0 0> executeTime = 2576.896000 us
[DEBUG] thread 17343 enqueue_fill_image runInfo: executeTime = 17.920000 us
[DEBUG] thread 17343 KERNEL>>> unknow_conv_direct_trans_flt_hw_44 runInfo: executeTime = 12.800000 us
[DEBUG] thread 17343 DATATRANS>>> enqueue_write_buffer runInfo: executeTime = 68.864000 us
[DEBUG] thread 17343 KERNEL>>> unknow_mem_trans_om_nchw_to_nchwc4 runInfo: executeTime = 78.080000 us
[DEBUG] thread 17343 Get memory val without allocated, the capacitySize is 0
[DEBUG] thread 17343 Get memory val without allocated, the capacitySize is 0
[DEBUG] thread 17343 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3381 runInfo: ls <0 0 0> executeTime = 914.944000 us
[DEBUG] thread 17343 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3381 runInfo: ls <0 0 0> executeTime = 895.232000 us
[DEBUG] thread 17343 SELECT LS KERNEL>>> unknow_conv_direct_sh1_qc_iom_3381 runInfo: best ls = 8 1 8 executeTime = 760.064000 us
[DEBUG] thread 17343 KERNEL>>> unknow_conv_direct_sh1_qc_iom_3381 runInfo: ls <8 1 8> executeTime = 768.000000 us
[INFO] thread 17343 min_time = 0.768000
[INFO] thread 17343 max_time = 0.768000
[INFO] thread 17343 avg_time = -0.000000
[DEBUG] thread 17343 KERNEL>>> unknow_mem_trans_im_nchwc4_to_nchw runInfo: executeTime = 139.008000 us
[DEBUG] thread 17343 DATATRANS>>> enqueue_read_buffer runInfo: executeTime = 77.056000 us
[INFO] thread 17343 16bit, Convolution, (1 32 1 128 128)+(32 32 1 3 3)/(1 1 1 1 0 0 1 1 1 1)=(1 32 1 128 128), TIME 0.768ms, GFLOPS 393.899
abs(diff) >= 1.000000e+00f, number = 0
abs(diff) >= 1.000000e-01f, number = 0
abs(diff) >= 1.000000e-02f, number = 7769
abs(diff) >= 1.000000e-03f, number = 349884
abs(diff) >= 1.000000e-04f, number = 118162
abs(diff) >= 1.000000e-05f, number = 814
abs(diff) >= 0.000000e+00f, number = 47659
maxabs = 0.039062, a = -3.292969, b = -3.253906 @ 68999
maxrel = 11718.750000, a = 0.002930, b = -0.002930 @ 386530
[DEBUG] thread 17343 OCLContext 0x5e124b4390 deconstructor start
[DEBUG] thread 17343 gcl_kernel_source 0xb400007ab98203c0 constructor
[DEBUG] thread 17343 OCLContext 0x5e124b4390 deconstructor end
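To put the two runs side by side, the sketch below recomputes the reported GFLOPS from the logged kernel times (0.860160 ms before, 0.768000 ms after). The FLOP formula is an assumption: counting 2 operations per multiply-accumulate plus 1 per output element happens to reproduce both logged figures (351.695 and 393.899), but the log does not show how the test tool actually counts. The function names are hypothetical.

```c
#include <assert.h>

/* Assumed FLOP count: 2 per multiply-accumulate plus 1 per output element.
   This is inferred from the logged numbers, not taken from the tool. */
double conv_flops(int n, int ic, int oh, int ow, int oc, int fh, int fw) {
    double outputs = (double)n * oc * oh * ow;
    double macs = outputs * ic * fh * fw;
    return 2.0 * macs + outputs;
}

/* Convert a FLOP count and a time in milliseconds to GFLOPS. */
double gflops(double flops, double time_ms) {
    return flops / (time_ms * 1e-3) / 1e9;
}

/* Compare against the figures printed in the two logs. */
void check_against_log(void) {
    /* Shapes from the log: input (1 32 1 128 128), filter (32 32 1 3 3). */
    double flops = conv_flops(1, 32, 128, 128, 32, 3, 3);
    double before = gflops(flops, 0.860160);
    double after = gflops(flops, 0.768000);
    double speedup = 0.860160 / 0.768000;

    assert(before > 351.69 && before < 351.70);  /* log: GFLOPS 351.695 */
    assert(after > 393.89 && after < 393.90);    /* log: GFLOPS 393.899 */
    assert(speedup > 1.119 && speedup < 1.121);  /* about 1.12x */
}
```

On these two single runs, the expanded macro is about 1.12x faster for the selected kernel (`unknow_conv_direct_sh1_qc_iom_3381` with local size 8 1 8). Each figure comes from one timed execution, so run-to-run variation is not captured.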
./test_convolution_ocl 64 256 256 32 3 3 1 1 0