hadoop - To overwrite hive table with an updated file -
i have csv file:
name,age,city,country sachin,44,pune,india tendulkar,45,mumbai,india sourav,45,new york,usa ganguly,45,chicago,usa
i created hive table , loaded data it.
i found above file wrong , corrected file below:
name,age,city,country sachin,44,pune,india tendulkar,45,mumbai,india sourav,45,new jersey,usa ganguly,45,chicago,usa
i need update main table correct file.
i have tried below approaches.
1- created main table partitioned table on city , dynamically loaded first file.
step1- creating temp table , loading old.csv file without partitioning. step doing insert data in main table dyn dynamically not creating separate input files per partition.
create table temp( name string, age int, city string, country string) row format delimited fields terminated ',' stored textfile;
step2- loaded old file temporary table.
load data local inpath '/home/test_data/old.csv' table temp;
step3- creating main partitioned table.
create table dyn( name string, age int) partitioned by(city string,country string) row format delimited fields terminated ',' stored textfile;
step4- inserting dynamically old.csv file partitioned table temporary table.
insert table dyn partition(city,country) select name,age,city,country temp;
old recorded dynamically inserted main table. in next steps trying correct main table dyn
old.csv new.csv
step5- creating temporary table new , correct input file.
create table temp1( name string, age int, city string, country string) row format delimited fields terminated ',' stored textfile;
step6- loading new , correct input file second temp table used overwrite main table row data wrong in old.csv. sourav,45,new york,usa
sourav,45,new jersey,usa
.
load data local inpath '/home/test_data/new.csv' table temp1;
overwriting main table row data wrong in old.csv. sourav,45,new york,usa
sourav,45,new jersey,usa
.
final overwrite step7 attempt 1-
insert overwrite table dyn partition(country='usa' , city='new york') select city,country temp1 t t.city='new jersey' , t.country='usa';
result:- inserted null in name column.
new jersey null new york usa
final overwrite step7 attempt 2-
insert overwrite table dyn partition(country='usa' , city='new york') select name,age temp1 t t.city='new jersey' , t.country='usa';
result:- no change in dyn table. same before. new york did not update new jersey
final overwrite step7 attempt3 -
insert overwrite table dyn partition(country='usa' , city='new york') select * temp1 t t.city='new jersey' , t.country='usa';
error:- failed: semanticexception [error 10044]: line 1:23 cannot insert target table because column number/types different. table insclause-0 has 2 columns,but query has 4 columns
what correct approach problem.
Comments
Post a Comment