hadoop - To overwrite hive table with an updated file -


i have csv file:

name,age,city,country  sachin,44,pune,india tendulkar,45,mumbai,india sourav,45,new york,usa ganguly,45,chicago,usa 

i created hive table , loaded data it.

i found above file wrong , corrected file below:

name,age,city,country  sachin,44,pune,india tendulkar,45,mumbai,india sourav,45,new jersey,usa ganguly,45,chicago,usa 

i need update main table correct file.

i have tried below approaches.

1- created main table partitioned table on city , dynamically loaded first file.

step1- creating temp table , loading old.csv file without partitioning. step doing insert data in main table dyn dynamically not creating separate input files per partition.

create table temp( name string, age int, city string, country string) row format delimited fields terminated ',' stored textfile; 

step2- loaded old file temporary table.

load data local inpath '/home/test_data/old.csv' table temp;

step3- creating main partitioned table.

create table dyn( name string, age int) partitioned by(city string,country string) row format delimited fields terminated ',' stored textfile; 

step4- inserting dynamically old.csv file partitioned table temporary table.

insert table dyn partition(city,country) select name,age,city,country temp; 

old recorded dynamically inserted main table. in next steps trying correct main table dyn old.csv new.csv

step5- creating temporary table new , correct input file.

create table temp1( name string, age int, city string, country string) row format delimited fields terminated ',' stored textfile; 

step6- loading new , correct input file second temp table used overwrite main table row data wrong in old.csv. sourav,45,new york,usa sourav,45,new jersey,usa.

load data local inpath '/home/test_data/new.csv' table temp1;

overwriting main table row data wrong in old.csv. sourav,45,new york,usa sourav,45,new jersey,usa.

final overwrite step7 attempt 1-

insert overwrite table dyn partition(country='usa' , city='new york') select city,country temp1 t t.city='new jersey' , t.country='usa'; 

result:- inserted null in name column.

new jersey      null    new york        usa 

final overwrite step7 attempt 2-

insert overwrite table dyn partition(country='usa' , city='new york') select name,age temp1 t t.city='new jersey' , t.country='usa'; 

result:- no change in dyn table. same before. new york did not update new jersey

final overwrite step7 attempt3 -

insert overwrite table dyn partition(country='usa' , city='new york') select * temp1 t t.city='new jersey' , t.country='usa'; 

error:- failed: semanticexception [error 10044]: line 1:23 cannot insert target table because column number/types different. table insclause-0 has 2 columns,but query has 4 columns

what correct approach problem.


Comments

Popular posts from this blog

Command prompt result in label. Python 2.7 -

javascript - How do I use URL parameters to change link href on page? -

amazon web services - AWS Route53 Trying To Get Site To Resolve To www -