Why does tanh give different results in OpenCL and in a C++ function? -


Here is the OpenCL code:

    #include <iostream>
    #include <vector>
    #include <cmath>
    #include <cstdio>
    #include <CL/cl.hpp>

    int main(){
        std::vector<cl::Platform> all_platforms;
        cl::Platform::get(&all_platforms);
        cl::Platform default_platform = all_platforms[0];

        std::vector<cl::Device> all_devices;
        default_platform.getDevices(CL_DEVICE_TYPE_ALL, &all_devices);
        cl::Device default_device = all_devices[0];
        std::cout << "using device: " << default_device.getInfo<CL_DEVICE_NAME>() << "\n";

        cl_context_properties properties[] = { CL_CONTEXT_PLATFORM, (cl_context_properties)(default_platform)(), 0 };
        cl::Context context = cl::Context(CL_DEVICE_TYPE_ALL, properties);

        cl::Program::Sources sources;
        std::string kernel_code =
            "   void __kernel simple_tanh(__global const float *a, __global float *b){ "
            "       b[get_global_id(0)] = tanh(a[get_global_id(0)]);                   "
            "   }                                                                      ";
        sources.push_back({kernel_code.c_str(), kernel_code.length()});

        cl::Program program(context, sources);
        if(program.build({default_device}) != CL_SUCCESS){
            std::cout << " error building: " << program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(default_device) << "\n";
            exit(1);
        }

        cl::Buffer buffer_a(context, CL_MEM_READ_WRITE, sizeof(float));
        cl::Buffer buffer_b(context, CL_MEM_READ_WRITE, sizeof(float));

        float a[1];
        a[0] = 0.0595172755420207977294921875000000000000f;

        cl::CommandQueue queue(context, default_device);
        queue.enqueueWriteBuffer(buffer_a, CL_TRUE, 0, sizeof(float), a);
        queue.finish();

        cl::Kernel kernel = cl::Kernel(program, "simple_tanh");
        kernel.setArg(0, buffer_a);
        kernel.setArg(1, buffer_b);

        queue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(1), cl::NullRange);
        queue.finish();

        float b[1];
        queue.enqueueReadBuffer(buffer_b, CL_TRUE, 0, sizeof(float), b);

        printf("result: %.40f %.40f\n", tanh(a[0]), b[0]);
        return 0;
    }

After compiling with g++ -std=c++0x hello.cc -lOpenCL -o hello and running it, I got different results from the tanh function:

    using device: Tahiti
    result: 0.0594470988394579374913817559900053311139 0.0594470985233783721923828125000000000000

The first is the CPU result, the second the OpenCL result. Which one should I trust?

When the (OpenCL) compiler is unable to vectorize a kernel, the generated instructions use scalar types, and the x87 FPU then computes at 80-bit precision. SSE has precision more comparable to a GPU, so you may need float4 or float8 types in the kernel so that the compiler can produce SSE/AVX code whose precision is closer to the GPU's.

Generally, Intel's OpenCL compiler vectorizes better (for old CPUs at least). Which implementation are you using? There can also be differences between GPUs, but they all obey the rule of not crossing the ULP limit. If you need more precision than the GPU (and SSE/AVX) gives, why not write your own series-expansion function? That could also make slow learning faster than a single FPU, at least.

What is your CPU, and which OpenCL platform are you using? Did you check the code generated for the kernel with a profiler or kernel analyzer?

Above all, you shouldn't be doing this:

 cl::NDRange(1) 

unless it's for learning purposes. You have ~99% kernel-launch overhead, ~1% data-copy overhead, and close to zero compute latency. Maybe that's why it's using the 80-bit FPU instead of SSE (on the CPU). Try computing with a multiple-of-8 NDRange value, or use float8 types in the kernel, to let the compiler use vectorized instructions.
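For illustration, a float8 version of the kernel could look like the sketch below (my own example: each work-item then processes 8 consecutive values, and the global NDRange shrinks by a factor of 8; it needs an input size that is a multiple of 8):

```c
// Hypothetical vectorized kernel: one work-item handles 8 floats.
// Launch with cl::NDRange(n / 8), where n is a multiple of 8.
__kernel void simple_tanh8(__global const float8 *a, __global float8 *b)
{
    int i = get_global_id(0);
    b[i] = tanh(a[i]);   // OpenCL C built-ins like tanh are overloaded for vector types
}
```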

When the global NDRange value is in the millions, it will have a significant effect on learning time, though not on the number of learning iterations needed. If the CPU can finish learning in 1 day with 1M iterations, maybe the GPU can finish in 1 hour even if it needs 10M iterations. Transcendental functions have a high compute-to-data ratio, so the speed-up versus the CPU is higher the more of them you use.

Even if you derive your own series-expansion function to achieve more precision, it will still be faster than a single CPU core when run in embarrassingly parallel kernel code.

If the neural network has only a few neurons, maybe you can train N networks at the same time and pick the best learner (if the learning has randomization)? That way the GPU picks better results than the CPU would.

