python - Keep element data when extracting sessions
Similarly to the top Wikipedia sessions example, I have the following test data:
import json

edits = [
    json.dumps({'timestamp': 0, 'username': 'user1', 'action': 'a'}),
    json.dumps({'timestamp': 1, 'username': 'user1', 'action': 'b'}),
    json.dumps({'timestamp': 20, 'username': 'user1', 'action': 'a'}),
    json.dumps({'timestamp': 132, 'username': 'user2', 'action': 'a'}),
    json.dumps({'timestamp': 500, 'username': 'user2', 'action': 'b'}),
    json.dumps({'timestamp': 3601, 'username': 'user2', 'action': 'b'}),
    json.dumps({'timestamp': 3602, 'username': 'user2', 'action': 'a'}),
    json.dumps({'timestamp': 8004, 'username': 'user2', 'action': 'a'}),
    json.dumps({'timestamp': 9320, 'username': 'user1', 'action': 'b'}),
]
I want to split this dataset into sessions per username and, for each user session, count the user's actions. For the dataset above, with a maximum gap of one hour (3600 seconds), I want the following result:
expected = [
    'user1 : [0.0, 3620.0), a: 2, b: 1',
    'user2 : [132.0, 7202.0), a: 2, b: 2',
    'user2 : [8004.0, 11604.0), a: 1, b: 0',
    'user1 : [9320.0, 12920.0), a: 0, b: 1',
]
Contrary to the Wikipedia sessions example, I need to keep the complete element data, not only the key, in order to use it within a custom combiner function.
You should be able to write a CombineFn that counts the number of actions of each type, using a dictionary of counts as the accumulator. Then you can use session windows on a collection keyed by user ID, together with that combiner.
See the Beam Programming Guide section on CombineFns for ideas on how to write one.
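The approach described above could look roughly like the sketch below. It is only an illustration, not a tested pipeline: the names `ActionCountFn` and `count_actions_per_session` are hypothetical, and the `try`/`except` fallback is an assumption added so the combiner logic can be tried without Beam installed. The key point is that the collection is keyed as `(username, full_edit_dict)`, so the combiner receives the complete element and not just the key.

```python
import json
from collections import Counter

try:
    import apache_beam as beam
    _CombineFnBase = beam.CombineFn
except ImportError:
    # Assumption: fall back to a plain class so the combiner logic can
    # still be exercised on its own when Beam is not installed.
    beam = None
    _CombineFnBase = object


class ActionCountFn(_CombineFnBase):
    """Counts actions per type; the accumulator is a dict of counts."""

    def create_accumulator(self):
        return {}

    def add_input(self, accumulator, element):
        # `element` is the full edit dict, so any field is available here.
        action = element['action']
        accumulator[action] = accumulator.get(action, 0) + 1
        return accumulator

    def merge_accumulators(self, accumulators):
        merged = Counter()
        for acc in accumulators:
            merged.update(acc)  # adds counts key by key
        return dict(merged)

    def extract_output(self, accumulator):
        return accumulator


def count_actions_per_session(edits_pcoll, gap_seconds=3600):
    """Hypothetical helper: applies session windows, then the combiner."""
    from apache_beam.transforms import window
    return (
        edits_pcoll
        | 'Parse' >> beam.Map(json.loads)
        # Key by username but keep the whole element as the value.
        | 'KeyByUser' >> beam.Map(
            lambda e: window.TimestampedValue(
                (e['username'], e), e['timestamp']))
        | 'Sessions' >> beam.WindowInto(window.Sessions(gap_seconds))
        | 'Count' >> beam.CombinePerKey(ActionCountFn())
    )


# Exercising the combiner directly on user1's first three edits.
fn = ActionCountFn()
acc = fn.create_accumulator()
for edit in [
    {'timestamp': 0, 'username': 'user1', 'action': 'a'},
    {'timestamp': 1, 'username': 'user1', 'action': 'b'},
    {'timestamp': 20, 'username': 'user1', 'action': 'a'},
]:
    acc = fn.add_input(acc, edit)
first_session = fn.extract_output(acc)
```

The output of the combine step is a `(username, counts)` pair per session. An action that never occurs in a session is simply absent from the dict, so a formatting step would need something like `counts.get('b', 0)` to print `b: 0`. To include the session bounds in the output, a downstream function can take the window as an extra parameter via `beam.DoFn.WindowParam`.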