java - Hadoop Text Object toString() Split Problems
My Hadoop mapper emits each CSV line as a Text object, as follows:

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the incoming value into rows on LF or CRLF.
        String[] dataRows = value.toString().trim().split("\\r?\\n");
        for (int i = 0; i < dataRows.length; i++) {
            // Assign each row to one of three random partitions.
            Random r = new Random();
            int partition = r.nextInt(3);
            hc.set(Integer.toString(partition));
            data.set(dataRows[i]);
            context.write(hc, data);
        }
    }

Within the reducer, I need to split each CSV line and use the resulting strings for further processing. Here is the reducer code:

    public static class IntSumReducer extends Reducer<Text, Text, Text, Text> {
        private Text data = new Text();

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            List<Coord> coords = new ArrayList<Coord>();
            Iterator<Text> iter = values.iterator();
            while (iter.hasNext()) {
                // Split one CSV line into its fields.
                String[] elems = iter.next().toString().split(",");
                double[] x = new double[elems.length];
                try {
                    for (int i = 0; i < x.length; i++) {
                        x[i] = Integer.parseInt(elems[i]);
                    }
                } catch (Exception e) {
                    // Skip rows that fail to parse.
                    continue;
                }
                coords.add(new Coord(x));
            }
            try {
                Cluster cluster = runClusterer(coords);
                data.set(cluster.toNewick() + ";");
                context.write(key, data);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

Here is the strange thing: the elems string array has a length of 1 and contains only the left-hand element of each line of the CSV.
For example, suppose the CSV contains 2 rows: the first row is {1,2} and the second row is {3,4}.
The elems array is getting populated with {1,3}.
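To show what I expect, here is a minimal standalone sketch that applies the same two split calls to the example above, outside Hadoop. The names SplitCheck, splitRows, parseRow and parseAll are hypothetical and used only for illustration; they are not part of my job, and the input string is just the two example rows joined by a newline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone check; not part of the MapReduce job.
final class SplitCheck {

    // Same row split as in the mapper: break the input on LF or CRLF.
    static String[] splitRows(String input) {
        return input.trim().split("\\r?\\n");
    }

    // Same field split and parse as in the reducer, applied to one row.
    static double[] parseRow(String row) {
        String[] elems = row.split(",");
        double[] x = new double[elems.length];
        for (int i = 0; i < x.length; i++) {
            x[i] = Integer.parseInt(elems[i]);
        }
        return x;
    }

    // Split the input into rows, then parse every row into its fields.
    static List<double[]> parseAll(String input) {
        List<double[]> rows = new ArrayList<>();
        for (String row : splitRows(input)) {
            rows.add(parseRow(row));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Assumed input: the two example rows separated by a newline.
        for (double[] row : parseAll("1,2\n3,4")) {
            System.out.println(row.length + " fields: " + Arrays.toString(row));
        }
    }
}
```

Run on its own, this gives two fields per row ({1,2} and {3,4}), which is what I expected to see in the reducer as well.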
Any help is appreciated.