kerberos - Running Spark job to query Hive HBase tables in a Kerberized cluster


I am trying to run a Spark 1.6 job (written in Java) on a Kerberized cluster.

Through this job I am trying to read data from a Hive table that uses HBase as its storage handler.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

// Build the Spark and Hive contexts, then run the query against the Hive table
SparkConf conf = new SparkConf();
JavaSparkContext context = new JavaSparkContext(conf);
HiveContext hiveContext = new HiveContext(context.sc());
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict");
hiveContext.setConf("spark.sql.hive.convertMetastoreOrc", "false");
hiveContext.setConf("spark.sql.caseSensitive", "false");
DataFrame df = hiveContext.sql(task.getQuery());
df.show(100);
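Since the driver runs inside YARN in cluster mode and does not see the TGT from my local kinit, one thing I am considering is an explicit keytab login in the driver before the HiveContext is built, so that the HBase storage handler has credentials when it tries to obtain its delegation token. This is only a rough sketch; the principal and keytab name below are placeholders, and the keytab would have to be shipped with the job so it is present in the container's working directory:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

// Placeholder principal and keytab: the keytab must be distributed with the
// job (e.g. via --files) so the relative path resolves inside the container.
Configuration hadoopConf = new Configuration();
hadoopConf.set("hadoop.security.authentication", "kerberos");
UserGroupInformation.setConfiguration(hadoopConf);
// loginUserFromKeytab throws IOException
UserGroupInformation.loginUserFromKeytab("refiner@EXAMPLE.COM", "refiner.keytab");

I am not sure whether this is the right approach, so suggestions are welcome.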

I am using the below spark-submit command to run the job on YARN:

spark-submit --master yarn --deploy-mode cluster \
  --class <main class name> \
  --num-executors 2 --executor-cores 1 \
  --executor-memory 1g --driver-memory 1g \
  --jars application.json,/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar,/usr/hdp/current/hbase-client/lib/hbase-client.jar,/usr/hdp/current/hbase-client/lib/hbase-common.jar,/usr/hdp/current/hbase-client/lib/hbase-protocol.jar,/usr/hdp/current/hbase-client/lib/hbase-server.jar,/usr/hdp/current/hive-client/lib/hive-hbase-handler.jar,/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar,/etc/hbase/conf/hbase-site.xml,/usr/hdp/current/spark-client/conf/hive-site.xml \
  data-refiner-1.0.jar
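In case it matters, I am also wondering whether shipping the Kerberos credentials and the HBase/Hive configuration with the application itself would help, i.e. using --principal and --keytab and passing the XML files via --files rather than listing them under --jars. Something along these lines, where the principal and keytab path are placeholders and the jar list is the same as above:

spark-submit --master yarn --deploy-mode cluster \
  --class <main class name> \
  --principal <principal>@<REALM> \
  --keytab /path/to/user.keytab \
  --files /etc/hbase/conf/hbase-site.xml,/usr/hdp/current/spark-client/conf/hive-site.xml \
  --jars <hbase, hive-hbase-handler and datanucleus jars as above> \
  data-refiner-1.0.jar

I have not confirmed that this is sufficient, so please correct me if this is the wrong direction.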

I have performed kinit before running the job. The job is able to communicate with the Hive metastore and parse the query.

17/04/05 06:15:23 INFO ParseDriver: Parsing command: select * from <db_name>.<table_name>
17/04/05 06:15:24 INFO ParseDriver: Parse Completed

But when it tries to communicate with HBase for the data, it fails with the below exception:

17/04/05 06:15:26 WARN AbstractRpcClient: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
17/04/05 06:15:26 ERROR AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
    at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:611)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:156)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:737)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:734)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:734)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:887)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:856)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1199)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.execService(ClientProtos.java:32765)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.execService(ProtobufUtil.java:1627)
    at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:104)
    at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:94)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
    at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.callExecService(RegionCoprocessorRpcChannel.java:107)
    at org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callBlockingMethod(CoprocessorRpcChannel.java:73)
    at org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$AuthenticationService$BlockingStub.getAuthenticationToken(AuthenticationProtos.java:4512)
    at org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:86)
    at org.apache.hadoop.hbase.security.token.TokenUtil$1.run(TokenUtil.java:111)
    at org.apache.hadoop.hbase.security.token.TokenUtil$1.run(TokenUtil.java:108)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:313)
    at org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:108)
    at org.apache.hadoop.hbase.security.token.TokenUtil.addTokenForJob(TokenUtil.java:329)
    at org.apache.hadoop.hive.hbase.HBaseStorageHandler.addHBaseDelegationToken(HBaseStorageHandler.java:496)
    at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureTableJobProperties(HBaseStorageHandler.java:441)
    at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureInputJobProperties(HBaseStorageHandler.java:342)
    at org.apache.spark.sql.hive.HiveTableUtil$.configureJobPropertiesForStorageHandler(TableReader.scala:304)
    at org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:323)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:276)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:276)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:174)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:174)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:174)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:190)
    at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:165)
    at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
    at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
    at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
    at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086)
    at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1498)
    at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1505)
    at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1375)
    at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1374)
    at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2099)
    at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1374)
    at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1456)
    at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:170)
    at org.apache.spark.sql.DataFrame.show(DataFrame.scala:350)
    at org.apache.spark.sql.DataFrame.show(DataFrame.scala:311)
    at com.hpe.eap.batch.EapDataRefinerMain.main(EapDataRefinerMain.java:88)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)

The job runs fine when I query a normal Hive table, and also on a non-Kerberized cluster.

Kindly suggest if I need to modify any configuration parameters or make code changes to resolve this issue.

