Merge two columns in a Spark Dataset using Java
I want to merge 2 columns in an Apache Spark Dataset. I tried the following and it did not work; can anyone suggest a solution?
Dataset<Row> df1 = spark.read().json("src/test/resources/datasets/dataset.json");
Dataset<Row> df1map = df1.select(functions.array("begintime", "endtime"));
df1map.show();
The input dataset.json is as follows:
{"ipsrc":"abc", "ipdst":"def", "begintime": 1, "endtime":1}
{"ipsrc":"def", "ipdst":"abc", "begintime": 2, "endtime":2}
{"ipsrc":"abc", "ipdst":"def", "begintime": 3, "endtime":3}
{"ipsrc":"def", "ipdst":"abc", "begintime": 4, "endtime":4}
Please note I am doing this in Java 8.
The above results in the following error:
com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'begintime': was expecting ('true', 'false' or 'null')
 at [Source: begintime; line: 1, column: 19]
 at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1581)
 at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:533)
 at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._reportInvalidToken(ReaderBasedJsonParser.java:2462)
 at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleOddValue(ReaderBasedJsonParser.java:1621)
 at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:689)
 at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3776)
 at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3721)
 at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2726)
 at org.json4s.jackson.JsonMethods$class.parse(JsonMethods.scala:20)
 at org.json4s.jackson.JsonMethods$.parse(JsonMethods.scala:50)
 at org.apache.spark.sql.types.DataType$.fromJson(DataType.scala:104)
 at org.apache.spark.sql.types.DataType.fromJson(DataType.scala)
The initial table output of df1.show() is as follows:
+-----+-----+---------+-------+
|ipdst|ipsrc|begintime|endtime|
+-----+-----+---------+-------+
|  def|  abc|        1|      1|
|  abc|  def|        2|      2|
|  def|  abc|        3|      3|
|  abc|  def|        4|      4|
+-----+-----+---------+-------+
The schema of df1 is as follows:
root
 |-- ipdst: string (nullable = true)
 |-- ipsrc: string (nullable = true)
 |-- begintime: long (nullable = true)
 |-- endtime: long (nullable = true)
If you want to concatenate the 2 columns instead, you can do:
Dataset<Row> df1map = df1.select(functions.concat(df1.col("begintime"), df1.col("endtime")));
df1map.show();
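If the concatenated value needs a separator between the two timestamps, concat_ws can be used instead of concat. Here is a self-contained sketch; the class name, the "-" separator, and the "timerange" alias are my own choices, and the sample data is built in memory so the sketch runs without the JSON file:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;

public class ConcatWithSeparator {

    // Builds the sample frame in memory (same rows as dataset.json above).
    static Dataset<Row> sampleFrame(SparkSession spark) {
        List<String> jsonLines = Arrays.asList(
            "{\"ipsrc\":\"abc\",\"ipdst\":\"def\",\"begintime\":1,\"endtime\":1}",
            "{\"ipsrc\":\"def\",\"ipdst\":\"abc\",\"begintime\":2,\"endtime\":2}");
        return spark.read().json(spark.createDataset(jsonLines, Encoders.STRING()));
    }

    // concat_ws joins the two columns with a separator, e.g. begintime=1, endtime=1 -> "1-1".
    static Dataset<Row> mergeWithSeparator(Dataset<Row> df1) {
        return df1.select(
            functions.concat_ws("-", df1.col("begintime"), df1.col("endtime"))
                .alias("timerange"));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local")
            .appName("concat-ws-sketch")
            .getOrCreate();
        mergeWithSeparator(sampleFrame(spark)).show();
        spark.stop();
    }
}
```

Unlike plain concat, this keeps the two values distinguishable after the merge.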
Edit: I tried the following and it worked. Code:
SparkSession spark = SparkSession
    .builder()
    .master("local")
    .appName("so")
    .getOrCreate();
Dataset<Row> df1 = spark.read().json("src/main/resources/json/dataset.json");
df1.printSchema();
df1.show();
Dataset<Row> df1map = df1.select(functions.array("begintime", "endtime"));
df1map.show();
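One follow-up worth noting: the array column produced above gets an auto-generated name like array(begintime, endtime). A sketch of aliasing it and reading the elements back with getItem; the class name and the "times" alias are my own choices, and the sample data is again built in memory rather than read from disk:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;

public class ArrayColumnAlias {

    // Same sample rows as the post, built in memory instead of read from a file.
    static Dataset<Row> sampleFrame(SparkSession spark) {
        List<String> jsonLines = Arrays.asList(
            "{\"ipsrc\":\"abc\",\"ipdst\":\"def\",\"begintime\":1,\"endtime\":1}",
            "{\"ipsrc\":\"def\",\"ipdst\":\"abc\",\"begintime\":2,\"endtime\":2}");
        return spark.read().json(spark.createDataset(jsonLines, Encoders.STRING()));
    }

    // Merge the two columns into one array column with a readable name.
    static Dataset<Row> mergeAsArray(Dataset<Row> df1) {
        return df1.select(functions.array("begintime", "endtime").alias("times"));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local")
            .appName("array-alias-sketch")
            .getOrCreate();
        Dataset<Row> df1map = mergeAsArray(sampleFrame(spark));
        df1map.printSchema(); // times is an array<long> column
        // Individual elements can be pulled back out of the array with getItem.
        df1map.select(functions.col("times").getItem(0).alias("begintime")).show();
        spark.stop();
    }
}
```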