scala - How do I copy 3 different types of files from different directories into 3 separate folders?


If you want to know how I solved it, go here.

The directory structure is as follows. Inside the data directory I have folders named date_letter. Inside them there are gz-compressed files. I need the files for one date (the current one) that end with mta.gz and dfr.gz. Every date_letter folder has a different number of these files. I need to copy the mta files into a folder named mta, and the dfr files into a folder named dfr. I would prefer that they remain compressed, but if not, that's okay too.

data
--20170202_a
----20170202a(some_random_string)mta.gz
----20170202a(some_random_string)dfr.gz
----20170202a(some_random_string)crr.gz
--20170202_b
----20170202b(some_random_string)mta.gz
----20170202b(some_random_string)dfr.gz
----20170202b(some_random_string)crr.gz
--20170202_c
--20170203_a
--20170203_b
--20170203_c
--20170201_a
--20170201_b
--20170201_c

Currently I have:

    def main(args: Array[String]) = {
      val conf = new Configuration()
      val sparkConf = new SparkConf()
      val sc = new SparkContext(sparkConf)

      val mta = "mta"
      val rcr = "dfr"
      val sub = "sub"

      val inPath = "/user/comverse/data/20170404_*/*" + mta + ".gz"

      val fs = FileSystem.get(conf)
      val fromPath = fs.globStatus(new Path(inPath))
      for (path <- fromPath) {
        sc.textFile(path.getPath.toString).saveAsTextFile("/apps/hive/warehouse/adoc.db/fct_evkuzmin/file_" + mta)
        // println(path.getPath)
      }
      sc.stop()
    }

When I run it with println(path.getPath), I get the paths of all the mta files, but when I try to copy them I get:

org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://nameservice1/apps/hive/warehouse/adoc.db/fct_evkuzmin/file_mta already exists

And fct_evkuzmin looks like this:

--/fct_evkuzmin/file_mta/_temporary/0/_temporary
----many folders named `attempt_201704071536_0000_m_000000_0` with empty files inside
------attempt_201704071536_0000_m_000000_0 (empty)

In the end I want:

fct_evkuzmin
--file_mta
----20170202a(some_random_string)mta.gz
----20170202b(some_random_string)mta.gz
fct_evkuzmin
--file_dfr
----20170202a(some_random_string)dfr.gz
----20170202b(some_random_string)dfr.gz
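
For context on the error: saveAsTextFile refuses to write into an output directory that already exists, and the loop targets the same file_mta directory on every iteration, so the second file fails. One possible way around this, if the goal is only to copy the files unchanged, is to skip Spark and copy them with the Hadoop FileSystem API, which also keeps them compressed. The following is a minimal sketch under that assumption, not necessarily how the linked solution does it. The object name CopyByType and the helper globFor are made up for illustration; the paths are the ones from the question, and the date is assumed to be today's date in yyyyMMdd form.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, FileSystem, FileUtil, Path}

// Hypothetical helper object, named here only for illustration.
object CopyByType {

  // Builds a glob for one date and one file type,
  // e.g. /user/comverse/data/20170404_*/*mta.gz
  def globFor(dataDir: String, date: String, fileType: String): String =
    s"$dataDir/${date}_*/*$fileType.gz"

  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    val fs = FileSystem.get(conf)

    // Paths taken from the question.
    val dataDir = "/user/comverse/data"
    val outDir = "/apps/hive/warehouse/adoc.db/fct_evkuzmin"

    // Assumption: "current date" means today, formatted as yyyyMMdd.
    val date = LocalDate.now().format(DateTimeFormatter.BASIC_ISO_DATE)

    // Add further suffixes here if more file types are needed.
    for (fileType <- Seq("mta", "dfr")) {
      val target = new Path(s"$outDir/file_$fileType")
      fs.mkdirs(target) // succeeds even if the directory already exists

      // globStatus can return null when nothing matches, so wrap it.
      val matches: Array[FileStatus] =
        Option(fs.globStatus(new Path(globFor(dataDir, date, fileType))))
          .getOrElse(Array.empty[FileStatus])

      for (status <- matches) {
        val src = status.getPath
        // Copy the .gz file byte for byte; deleteSource = false keeps the original.
        FileUtil.copy(fs, src, fs, new Path(target, src.getName), false, conf)
      }
    }
  }
}
```

Because the files are copied as-is rather than read through textFile, nothing is decompressed or rewritten, and each file keeps its original name inside file_mta or file_dfr.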

