java - Spark: rdd.collect() runs fine but rdd.take(5) throws an exception


I am relatively new to Spark. I installed Spark in standalone mode on Windows. When I try to use rdd.take(5), it throws the following exception:

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 7, localhost): java.net.SocketException: Connection reset by peer: socket write error

Even though rdd.collect() runs successfully, rdd.take() throws this error. What could be the reason for this?

Edit: adding the code:

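    import findspark

    # Point findspark at the local Spark installation (raw string so the
    # backslashes in the Windows path are not treated as escapes).
    findspark.init(r"c:\users\polestaremployee\spark-1.6.3-bin-hadoop2.6")

    from pyspark import SparkContext, SparkConf

    sc = SparkContext()
    data = sc.textFile("train.csv")  # one RDD element per line of the CSV
    data.take(20)                    # this is the call that fails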

Edit 2: the first few rows of the data look like this (pasting part of the result I got using collect()):

[u'animalid,name,datetime,outcometype,outcomesubtype,animaltype,sexuponoutcome,ageuponoutcome,breed,color',
 u'a671945,hambone,2014-02-12 18:22:00,return_to_owner,,dog,neutered male,1 year,shetland sheepdog mix,brown/white',
 u'a656520,emily,2013-10-13 12:44:00,euthanasia,suffering,cat,spayed female,1 year,domestic shorthair mix,cream tabby',
 u'a686464,pearce,2015-01-31 12:28:00,adoption,foster,dog,neutered male,2 years,pit bull mix,blue/white',
 u'a683430,,2014-07-11 19:09:00,transfer,partner,cat,intact male,3 weeks,domestic shorthair mix,blue cream',
 u'a667013,,2013-11-15 12:52:00,transfer,partner,dog,neutered male,2 years,lhasa apso/miniature poodle,tan',

