scala - How do I copy 3 different types of files from different directories into 3 separate folders?
If you want to know how I solved it, go here.
The structure of the directories is as follows. Inside the data directory there are folders named date_letter. Inside those, there are gz-compressed files. I need the files from one date (the current one) that end in mta.gz and dfr.gz. Every date_letter folder has a different number of these files. I need to copy the mta files into a folder named mta and the dfr files into a dfr folder. I would prefer that they remain compressed, but if not, that's okay too. (A rough sketch of what I mean follows the tree below.)
data
--20170202_a
----20170202a(some_random_string)mta.gz
----20170202a(some_random_string)dfr.gz
----20170202a(some_random_string)crr.gz
--20170202_b
----20170202b(some_random_string)mta.gz
----20170202b(some_random_string)dfr.gz
----20170202b(some_random_string)crr.gz
--20170202_c
--20170203_a
--20170203_b
--20170203_c
--20170201_a
--20170201_b
--20170201_c
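To make it concrete, something like the following is what I have in mind: a rough, untested sketch that uses the plain Hadoop FileSystem API instead of Spark, so the files are copied byte-for-byte and stay gzipped. The object name is made up, and the paths are just the same ones as in my code below.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

object CopyBySuffix {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    val fs = FileSystem.get(conf)

    // One pass per suffix: copy every matching .gz file for the current date
    // into its own target folder, without decompressing anything.
    for (suffix <- Seq("mta", "dfr")) {
      val sources = fs.globStatus(new Path("/user/comverse/data/20170404_*/*" + suffix + ".gz"))
      val target = new Path("/apps/hive/warehouse/adoc.db/fct_evkuzmin/file_" + suffix)
      fs.mkdirs(target)
      for (status <- sources) {
        // FileUtil.copy does a plain byte copy, so the .gz files stay compressed.
        FileUtil.copy(fs, status.getPath, fs, new Path(target, status.getPath.getName), false, conf)
      }
    }
  }
}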
Currently I have:
def main(args: Array[String]) = {
    val conf = new Configuration()
    val sparkConf = new SparkConf()
    val sc = new SparkContext(sparkConf)
    val mta = "mta"
    val rcr = "dfr"
    val sub = "sub"
    val inpath = "/user/comverse/data/20170404_*/*" + mta + ".gz"
    val fs = FileSystem.get( conf )
    val fromPath = fs.globStatus(new Path(inpath))
    for(path <- fromPath) {
        sc.textFile(path.getPath.toString).saveAsTextFile("/apps/hive/warehouse/adoc.db/fct_evkuzmin/file_" + mta)
//        println(path.getPath)
    }
    sc.stop()
}
When I try to run println(path.getPath) it prints the paths to the mta files, but when I try to copy them I get
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://nameservice1/apps/hive/warehouse/adoc.db/fct_evkuzmin/file_mta already exists
and fct_evkuzmin looks like this:
--/fct_evkuzmin/file_mta/_temporary/0/_temporary
----many folders named `attempt_201704071536_0000_m_000000_0` with empty files inside
------attempt_201704071536_0000_m_000000_0 (empty)
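If I understand the error correctly, saveAsTextFile refuses to write into an output directory that already exists, and my loop passes the same file_mta path on every iteration; the _temporary folders are the staging directories left behind by the first attempt. A variant that at least avoids the collision would be to give every source file its own output directory, roughly like the untested sketch below (the object name is made up, and textFile/saveAsTextFile would still decompress the .gz and write a directory of part files rather than a single compressed file):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

object CopyMtaPerFile {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf())
    val fs = FileSystem.get(new Configuration())
    val mta = "mta"
    val fromPath = fs.globStatus(new Path("/user/comverse/data/20170404_*/*" + mta + ".gz"))
    for (path <- fromPath) {
      // One output directory per source file, so saveAsTextFile never sees an existing path.
      val out = "/apps/hive/warehouse/adoc.db/fct_evkuzmin/file_" + mta + "/" + path.getPath.getName
      sc.textFile(path.getPath.toString).saveAsTextFile(out)
    }
    sc.stop()
  }
}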
In the end I want to get
fct_evkuzmin
--file_mta
----20170202a(some_random_string)mta.gz
----20170202b(some_random_string)mta.gz
fct_evkuzmin
--file_dfr
----20170202a(some_random_string)dfr.gz
----20170202b(some_random_string)dfr.gz