hadoop - Merge large number of spark dataframes into one -


i'm querying cached hive temp table using different queries satisfying different conditions on more 1500 times inside loop. need merge them using unionall inside loop. stackoverflow error due fact spark cannot keep rdd lineage.

pseudo code:

df=[from hive table] tablea=[from hive table] tablea.registertemptable("tablea") hivecontext.sql('cache table tablea')  in range(0,2000):     if (list[0]['column1']=='xyz'):         df1=query tablea         df=df.unionall(df1)     elif ():         df1=query tablea         df=df.unionall(df1)     elif ():         df1=query tablea         df=df.unionall(df1)     elif ():         df1=query tablea         df=df.unionall(df1)     else:         df1=query tablea         df=df.unionall(df1) 

this throws stackoverflow error due rdd lineage becoming hard. tried checkpointing follows:

for in range(0,2000):     if (list[0]['column1']=='xyz'):         df1=query tablea         df=df.unionall(df1)     elif ():         df1=query tablea         df=df.unionall(df1)     else:         df1=query tablea         df=df.unionall(df1)     df.rdd.checkpoint     df = sqlcontext.createdataframe(df.rdd, df.schema) 

i got same error. tried saveastable wanted avoid because of lag in job submission between each hql queries , hive io inside loop. approach worked well.

for in range(0,2000):     if (list[0]['column1']=='xyz'):         df=query tablea         df.write.saveastable('output', mode='append')     elif ():         df=query tablea         df.write.saveastable('output', mode='append')  

i need in avoiding saving dataframe hive inside loop. want merge dfs in manner that's in-memory , efficient. 1 of other options tried insert query result directly temp table error: cannot insert rdd based table.

maybe, temp table result work.

df1="query tablea".registertemptable("result") sqlcontext.sql("insert result query tablea") 

Comments

Popular posts from this blog

Command prompt result in label. Python 2.7 -

javascript - How do I use URL parameters to change link href on page? -

amazon web services - AWS Route53 Trying To Get Site To Resolve To www -