python - How to convert a PySpark RDD to a DataFrame with unknown columns?


I am creating an RDD by loading data from a text file in PySpark. I want to convert this RDD to a DataFrame, but I do not know how many columns, or which columns, are present in the RDD. I am trying to use createDataFrame(), and the syntax shown is sqlDataFrame = sqlContext.createDataFrame(rdd, schema). I tried to see how to create the schema, but most of the examples show a hardcoded schema. Since I do not know the columns, how can I convert the RDD to a DataFrame? Here is my code so far:

    from pyspark.sql import SQLContext
    sqlContext = SQLContext(sc)

    # raw string so the backslashes in the path are not read as escape sequences
    example_rdd = (sc.textFile(r"\..\file1.csv")
                     .map(lambda line: line.split(",")))

    # convert RDD to DataFrame
    # df = sqlContext.createDataFrame()  # DataFrame conversion here

Note 1: The reason I do not know the columns is that I am trying to create a general script that can build a DataFrame from an RDD read from any file with any number of columns.

Note 2: I know there is another function called toDF() that can convert an RDD to a DataFrame, but with that I have the same issue of how to pass the unknown columns.

Note 3: The file format is not only CSV. I have shown that as an example, but it can be a file of any format.

Spark 2.0.0 onwards supports reading CSV directly into a DataFrame. To read a CSV file, use the DataFrameReader.csv method:

    df = spark.read.csv(r"\..\file1.csv", header=True)
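Because the column names come from the header row, no schema has to be written by hand. As a minimal sketch of that idea, assuming a SparkSession named `spark` already exists and the file has a header line: the helper name `load_csv` is hypothetical, and `inferSchema=True` is an optional extra, not part of the call above, that asks Spark to guess column types instead of reading everything as strings.

```python
def load_csv(spark, path, sep=","):
    """Read a delimited file into a DataFrame without declaring a schema.

    `spark` is assumed to be an existing SparkSession; `load_csv` is a
    hypothetical helper name used only for this illustration.
    """
    return spark.read.csv(
        path,
        header=True,       # take column names from the first line
        inferSchema=True,  # let Spark guess types (costs an extra pass over the data)
        sep=sep,
    )

# Usage (needs a running Spark session; the path is a placeholder):
# df = load_csv(spark, "file1.csv")
# print(df.columns)   # column names discovered at read time
# df.printSchema()
```

The number and names of the columns are then available from `df.columns` after the read, which is probably what a general-purpose script needs.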

If you do not have access to the spark object, you can use:

    from pyspark.sql import SQLContext
    sqlContext = SQLContext(sc)
    df = sqlContext.read.csv(r"\..\file1.csv", header=True)

If your file has a different separator, you can specify that too.

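    # e.g. if the separator is ::
    # (multi-character separators are only accepted from Spark 3.0 onwards, as far as I know)
    df = spark.read.csv(r"\..\file1.csv", header=True, sep="::")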
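For files that are not CSV (Note 3 in the question), the RDD route can still work without a hardcoded schema by counting the fields in one record and generating positional column names. This is only a rough sketch under some assumptions: every record is a list or tuple of the same length, the RDD is not empty, and a SparkSession or SQLContext is already active (that is what makes `toDF` available on RDDs). The function name `rdd_to_dataframe` and the `col_` prefix are made up for illustration.

```python
def rdd_to_dataframe(rdd, prefix="col_"):
    """Convert an RDD of equal-length lists/tuples to a DataFrame.

    Column names are generated by position because the real names are
    unknown. `rdd_to_dataframe` and `prefix` are hypothetical names.
    """
    first_row = rdd.first()  # peek at one record to count the fields
    names = [f"{prefix}{i}" for i in range(len(first_row))]
    # toDF accepts a list of column names; Spark infers the types by sampling
    return rdd.toDF(names)

# Usage with the RDD from the question (needs a running Spark session):
# df = rdd_to_dataframe(example_rdd)
# df.show()
```

If the first line of the file holds the real column names, the same idea applies with `names = rdd.first()` and a filter that drops that header row before calling `toDF`.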
