Merge two columns in a Spark Dataset using Java


I am trying to merge 2 columns in an Apache Spark Dataset. I tried the following, but it did not work. Can anyone suggest a solution?

    Dataset<Row> df1 = spark.read().json("src/test/resources/datasets/dataset.json");
    Dataset<Row> df1map = df1.select(functions.array("begintime", "endtime"));
    df1map.show();

The input dataset.json is as follows:

    {"ipsrc":"abc", "ipdst":"def", "begintime": 1, "endtime":1}
    {"ipsrc":"def", "ipdst":"abc", "begintime": 2, "endtime":2}
    {"ipsrc":"abc", "ipdst":"def", "begintime": 3, "endtime":3}
    {"ipsrc":"def", "ipdst":"abc", "begintime": 4, "endtime":4}

Please note that I am doing this in Java 8.

The above results in the following error:

    com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'begintime': was expecting ('true', 'false' or 'null')
     at [Source: begintime; line: 1, column: 19]
        at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1581)
        at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:533)
        at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._reportInvalidToken(ReaderBasedJsonParser.java:2462)
        at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleOddValue(ReaderBasedJsonParser.java:1621)
        at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:689)
        at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3776)
        at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3721)
        at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2726)
        at org.json4s.jackson.JsonMethods$class.parse(JsonMethods.scala:20)
        at org.json4s.jackson.JsonMethods$.parse(JsonMethods.scala:50)
        at org.apache.spark.sql.types.DataType$.fromJson(DataType.scala:104)
        at org.apache.spark.sql.types.DataType.fromJson(DataType.scala)

The initial table output of df1.show() is as follows:

    +-----+-----+---------+-------+
    |ipdst|ipsrc|begintime|endtime|
    +-----+-----+---------+-------+
    |  def|  abc|        1|      1|
    |  abc|  def|        2|      2|
    |  def|  abc|        3|      3|
    |  abc|  def|        4|      4|
    +-----+-----+---------+-------+

The schema of df1 is as follows:

    root
     |-- ipdst: string (nullable = true)
     |-- ipsrc: string (nullable = true)
     |-- begintime: long (nullable = true)
     |-- endtime: long (nullable = true)

If you want to concatenate the 2 columns, you can use:

    Dataset<Row> df1map = df1.select(functions.concat(df1.col("begintime"), df1.col("endtime")));
    df1map.show();
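Note that concat joins the raw values with nothing in between, so begintime 1 and endtime 1 come out as "11". If you want a separator, concat_ws takes one as its first argument. A minimal sketch, assuming the same dataset.json as above (the class name, alias "interval", and the "-" separator are my own choices):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;

public class ConcatWsExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local")
                .appName("concat-ws-example")
                .getOrCreate();

        Dataset<Row> df1 = spark.read().json("src/test/resources/datasets/dataset.json");

        // concat_ws inserts the separator between the column values,
        // so begintime 1 and endtime 1 become "1-1" instead of "11"
        Dataset<Row> df1map = df1.select(
                functions.concat_ws("-", df1.col("begintime"), df1.col("endtime"))
                        .alias("interval"));
        df1map.show();

        spark.stop();
    }
}
```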

Edit: I tried this and it worked. The code:

    SparkSession spark = SparkSession
            .builder()
            .master("local")
            .appName("so")
            .getOrCreate();

    Dataset<Row> df1 = spark.read().json("src/main/resources/json/dataset.json");
    df1.printSchema();
    df1.show();

    Dataset<Row> df1map = df1.select(functions.array("begintime", "endtime"));
    df1map.show();
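As an alternative to functions.array, the two columns can also be merged into a single struct column, which keeps the original field names addressable (as times.begintime and times.endtime) instead of flattening them into array positions. A minimal sketch, assuming the same df1 as above (the class name and the alias "times" are my own choices):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;

public class StructMergeExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local")
                .appName("struct-merge-example")
                .getOrCreate();

        Dataset<Row> df1 = spark.read().json("src/main/resources/json/dataset.json");

        // struct() packs begintime and endtime into one nested column;
        // unlike array(), each field keeps its name and type
        Dataset<Row> df1map = df1.select(
                functions.struct("begintime", "endtime").alias("times"));
        df1map.printSchema();
        df1map.show();

        spark.stop();
    }
}
```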
