mapreduce - Hadoop MR - HLL: Union of two HLLs of different sizes throws an exception
I'm using HLL in MapReduce code. On the combiner side, an HLL with log2m=8 and registerWidth=8 is used to store the data. On the reducer side, an HLL with log2m=26 and registerWidth=8 is used to union the map-side HLLs. I'm getting an exception when trying to union them. My two questions are: 1) Is it possible to union two HLLs of different sizes? 2) If the answer to question 1 is yes, what am I missing?
Combiner:
    HLL mapHll = new HLL(8, 8);
    mapHll.addRaw(murmurHash64(userId));
    valueObject.setHll(mapHll);

Reducer:
    HLL reduceHll = new HLL(26, 8);
    for (Iterator itr : values) {
        reduceHll.union(itr.getHll());
    }
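To make the two parameter sets concrete, here is a minimal, self-contained sketch (plain Java, no HLL library involved) of what log2m=8 and log2m=26 imply at a register width of 8 bits. The class and method names are hypothetical, and the index rule (the low log2m bits of the hashed value select the register) is an assumption for illustration, not something confirmed from the library's source.

```java
// Hypothetical helper, not part of net.agkn.hll: it only does the arithmetic
// implied by the log2m and register-width parameters used above.
final class HllSizing {

    /** Number of registers for a given log2m: m = 2^log2m. */
    static long registerCount(int log2m) {
        return 1L << log2m;
    }

    /** Bytes needed if every register is stored densely at regwidth bits each. */
    static long denseBytes(int log2m, int regwidth) {
        return (registerCount(log2m) * regwidth + 7) / 8;
    }

    /**
     * Register index for a hashed value.
     * ASSUMPTION: the low log2m bits of the hash select the register.
     */
    static long registerIndex(long hashed, int log2m) {
        return hashed & (registerCount(log2m) - 1);
    }

    public static void main(String[] args) {
        long hashed = 0x9E3779B97F4A7C15L; // arbitrary stand-in for murmurHash64(userId)
        for (int log2m : new int[] {8, 26}) {
            System.out.printf("log2m=%d registers=%d denseBytes=%d index=%d%n",
                    log2m,
                    registerCount(log2m),
                    denseBytes(log2m, 8),
                    registerIndex(hashed, log2m));
        }
    }
}
```

With these parameters, the combiner-side HLL has 256 registers (256 bytes when stored densely), while the reducer-side HLL has 67,108,864 registers (64 MiB). Under the stated assumption, the same hashed value generally lands on a different register index in each.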
Also, I'm using the HLL toBytes and fromBytes methods to serialize the HLL data.
Below is the exception stack trace:

17/04/06 10:56:20 INFO mapreduce.Job: Task Id : attempt_1490152294761_13920_r_000040_1, Status : FAILED
Error: java.lang.ArrayIndexOutOfBoundsException: 4512149
    at net.agkn.hll.util.BitVector.setMaxRegister(BitVector.java:201)
    at net.agkn.hll.HLL.addRawProbabilistic(HLL.java:466)
    at net.agkn.hll.HLL.addRaw(HLL.java:373)
    at net.agkn.hll.HLL.heterogenousUnion(HLL.java:747)
    at net.agkn.hll.HLL.union(HLL.java:634)
    at com.yumecorp.yfa.yfaimsuniqueviewerhllmr$yfaimsuniqueviewerhllreducer.reduce(yfaimsuniqueviewerhllmr.java:338)
    at com.yumecorp.yfa.yfaimsuniqueviewerhllmr$yfaimsuniqueviewerhllreducer.reduce(yfaimsuniqueviewerhllmr.java:1)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Let me know if any additional information is required to understand the issue.