hive - Parquet creation: conversion from pandas DataFrame to pyarrow Table not working for object dtype


I want to create a Parquet file from a CSV file. For test purposes, I have the piece of code below, which reads the file, converts it to a pandas DataFrame first, and then to a pyarrow Table. The table is stored on AWS S3, and I want to run a Hive query on it.

Input file contents:

    year|word
    2017|word 1
    2018|word 2

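Code:

    import pandas as pd
    import pyarrow as pa
    import pyarrow.parquet as pq

    # inputfile and outputfile are defined elsewhere
    dataframe = pd.read_csv(inputfile, sep='|')
    print(dataframe)
    print(dataframe.dtypes)
    print(dataframe.columns)
    # Cast the text column to str explicitly
    dataframe['word'] = dataframe['word'].astype('str')
    print(dataframe.dtypes)
    table = pa.Table.from_pandas(dataframe)  # , schema=pa.string())
    pq.write_table(table, outputfile)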

After writing the pyarrow Table, I queried the Parquet file to make sure the data was stored in S3. The results are weird:

    +--------+--------------+
    | year   |     word     |
    +--------+--------------+
    | 2017   | [B@60716d4f  |
    | 2018   | [B@36bf8f00  |
    +--------+--------------+

Somehow the int values show up fine, but the object/str values don't get converted properly.

I would appreciate any help with this.

Thanks.

This round-trips fine for me. Please specify your platform and the versions of Python, pandas, and pyarrow.

This is on Python 3.6 / Mac OS X (it also worked on 2.7):

    In [1]: import pandas as pd

    In [2]: import pyarrow as pa

    In [3]: pd.__version__
    Out[3]: '0.19.2'

    In [4]: pa.__version__
    Out[4]: '0.2.0'

    In [5]: data = """year|word
       ...: 2017|word 1
       ...: 2018|word 2
       ...: """

    In [6]: df = pd.read_csv(StringIO(data), sep='|')

    In [7]: df
    Out[7]:
       year    word
    0  2017  word 1
    1  2018  word 2

    In [8]: df.dtypes
    Out[8]:
    year     int64
    word    object
    dtype: object

    In [9]: table = pa.Table.from_pandas(df)

    In [10]: import pyarrow.parquet as pq

    In [12]: pq.write_table(table, 'foo.pk')

    In [13]: pq.read_table('foo.pk').to_pandas()
    Out[13]:
       year    word
    0  2017  word 1
    1  2018  word 2

    In [14]: pq.read_table('foo.pk').to_pandas().dtypes
    Out[14]:
    year     int64
    word    object
    dtype: object
