Spark standalone cluster tries to access local python.exe


When executing a Python application on the Spark cluster, the following exception is raised:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/04/07 10:57:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[Stage 0:>                                                          (0 + 2) / 2]
17/04/07 10:57:07 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.2.113, executor 0): java.io.IOException: Cannot run program "C:\Users\<local-user>\Anaconda2\python.exe": CreateProcess error=2, The system cannot find the file specified
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:120)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:67)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:116)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:128)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
    at java.lang.ProcessImpl.create(Native Method)
    at java.lang.ProcessImpl.<init>(ProcessImpl.java:386)
    at java.lang.ProcessImpl.start(ProcessImpl.java:137)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 13 more

17/04/07 10:57:07 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
Traceback (most recent call last):
  File "<redacted-absolute-path>.py", line 49, in <module>
    eco_rdd
  File "C:\spark\python\pyspark\rdd.py", line 2139, in zipWithIndex
    nums = self.mapPartitions(lambda it: [sum(1 for i in it)]).collect()
  File "C:\spark\python\pyspark\rdd.py", line 809, in collect
    port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "C:\spark\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py", line 1133, in __call__
  File "C:\spark\python\pyspark\sql\utils.py", line 63, in deco
    return f(*a, **kw)
  File "C:\spark\python\lib\py4j-0.10.4-src.zip\py4j\protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, 192.168.2.113, executor 0): java.io.IOException: Cannot run program "C:\Users\<local-user>\Anaconda2\python.exe": CreateProcess error=2, The system cannot find the file specified
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:120)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:67)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:116)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:128)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
    at java.lang.ProcessImpl.create(Native Method)
    at java.lang.ProcessImpl.<init>(ProcessImpl.java:386)
    at java.lang.ProcessImpl.start(ProcessImpl.java:137)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 13 more

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:935)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:934)
    at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:453)
    at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Cannot run program "C:\Users\<local-user>\Anaconda2\python.exe": CreateProcess error=2, The system cannot find the file specified
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:120)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:67)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:116)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:128)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    ... 1 more
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
    at java.lang.ProcessImpl.create(Native Method)
    at java.lang.ProcessImpl.<init>(ProcessImpl.java:386)
    at java.lang.ProcessImpl.start(ProcessImpl.java:137)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 13 more

Somehow the cluster (on a remote PC in the same network) tries to access the local Python (the one installed on the local workstation that executes the driver). As far as I can tell, the executors spawn whatever interpreter path the driver has configured, so the driver-side path ends up being executed on the remote machines:

Caused by: java.io.IOException: Cannot run program "C:\Users\<local-user>\Anaconda2\python.exe": CreateProcess error=2, The system cannot find the file specified
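
A quick way to see which interpreter the workers actually launch is to collect sys.executable from a trivial job. A minimal sketch (the master host comes from the executor address in the log above; the port is the standalone default and the app name is made up). If the executor Python is misconfigured, even this tiny job fails with the exception above; once it is fixed, it prints the worker-side interpreter paths:

    import sys
    from pyspark import SparkConf, SparkContext

    # Master host taken from the executor address in the log; adjust as needed.
    conf = SparkConf().setAppName("which-python").setMaster("spark://192.168.2.113:7077")
    sc = SparkContext(conf=conf)

    # Each task reports the Python binary its worker process was started with.
    print(sc.parallelize(range(2), 2).map(lambda _: sys.executable).collect())
    sc.stop()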
My setup:

  • Spark 2.1.0
  • The Spark standalone cluster is running on Windows 10
  • The workstation is running on Windows 7
  • Connecting to the cluster and executing tasks with spark-shell (interactive) works without problems
  • Connecting to the cluster and executing tasks with pyspark (interactive) works without problems
  • Running from PyCharm directly causes the exception above
  • Using spark-submit to execute causes a similar problem (=> it also tries to access the local Python)
  • findspark is used
  • The PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables are set (see the sketch after this list)
  • Python on cluster and workstation: Python 2.7.13 :: Anaconda custom (64-bit)
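
For reference, this is how the two variables are meant to interact; a sketch with assumed paths. In Spark 2.1 the driver reads PYSPARK_PYTHON and ships that string to the executors, which spawn it verbatim, so it has to name a path that exists on every worker node and must be set before the SparkContext is created:

    import os

    # Must point at a path that exists on every *worker* node (assumed here);
    # Spark ships this string to the executors, which spawn it as-is.
    os.environ["PYSPARK_PYTHON"] = r"C:\Anaconda2\python.exe"
    # PYSPARK_DRIVER_PYTHON only matters when launching through bin\pyspark or
    # spark-submit, so it is normally set in the shell, not inside the script.

    from pyspark import SparkConf, SparkContext
    conf = SparkConf().setAppName("pinned-python").setMaster("spark://192.168.2.113:7077")
    sc = SparkContext(conf=conf)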

Thanks in advance for your help.

So I found the solution to my problem now. The problem in my setup was the findspark library. Removing it allowed me to run the tasks on the cluster.
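
For context, a sketch of what changed. In the findspark versions from around that time, findspark.init() also exports PYSPARK_PYTHON, defaulting it to the driver's own sys.executable, which would explain the workstation-local Anaconda path showing up on the executors. Without findspark, pyspark can be made importable by hand; the SPARK_HOME, py4j zip, and master host below are taken from the traceback and log above, the rest is an assumption:

    # Before (the suspected culprit): findspark bootstraps pyspark but may also
    # set PYSPARK_PYTHON to the local interpreter, i.e.
    #   import findspark
    #   findspark.init()   # PYSPARK_PYTHON can default to sys.executable here

    # After: add the pyspark sources to sys.path manually instead.
    import os
    import sys

    os.environ["SPARK_HOME"] = r"C:\spark"  # install dir, per the traceback
    sys.path.insert(0, r"C:\spark\python")
    sys.path.insert(0, r"C:\spark\python\lib\py4j-0.10.4-src.zip")

    from pyspark import SparkContext
    sc = SparkContext(master="spark://192.168.2.113:7077", appName="no-findspark")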

If you have a similar problem and it is not findspark-related, I'd point you to the comments from Samson Scharfrichter above.

