Conditional summing with multiple columns in R
I want to plot points per user versus time, but I am not sure how to combine the columns to achieve this. My data looks like:
```
> head(data, n=3)
  points user       time
1     25    1 02/22/2017
2      0    2 02/26/2017
3     15    3 02/27/2017
> dput(data)
structure(list(points = c(25, 0, 15), user = c(1, 2, 3), time = c("02/22/2017",
"02/26/2017", "02/27/2017")), .Names = c("points", "user", "time"),
row.names = c(NA, -3L), class = "data.frame")
```

FYI, there are multiple user IDs (I think 15). I want to sum the total points per user (the number in the `user` column corresponds to the user's ID number), and then plot these values over time (by day, specifically).
This is the code I use to generate total points per user:

```r
library(data.table)
ppu = setkey(setDT(df), user_id)[, list(points = sum(points)), by = list(user_id)]
```

which gives the following result:
But I want to find the total points per user per day! I'd appreciate any guidance.
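For comparison, the per-user totals from the snippet above can be sketched in base R with `aggregate()`. This is a minimal sketch, assuming the grouping column is named `user` as in the `dput()` output (not `user_id` as in the snippet); the data frame is rebuilt inline so the block runs on its own, and `ppu_total` is just an illustrative name.

```r
# Sample data from the question's dput() output
df <- data.frame(
  points = c(25, 0, 15),
  user   = c(1, 2, 3),
  time   = c("02/22/2017", "02/26/2017", "02/27/2017")
)

# One row per user with the summed points; aggregate() returns the groups sorted
ppu_total <- aggregate(points ~ user, data = df, FUN = sum)
print(ppu_total)
```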
Please try this (with `df` given as the result of `dput()` in the question):
```r
library(data.table)   # version 1.10.4 used
ppu <- setDT(df)[, .(points = sum(points)), by = .(user, time)]
ppu
#    user       time points
#1:    1 02/22/2017     25
#2:    2 02/26/2017      0
#3:    3 02/27/2017     15
```

This returns `user` and `time` in the order they appear in `df`. If you want the result sorted, you have 2 choices:
For printing, use
```r
ppu[order(user, time)]
# or
ppu[order(time, user)]
```

or, if the result should be keyed, try `keyby`:
```r
ppu <- setDT(df)[, .(points = sum(points)), keyby = .(user, time)]
```

Some remarks:
- Your code snippet uses `user_id` while the data sample uses `user`.
- Also, the data sample includes a column named `time` which contains dates as character strings, while the text uses the term "day".
- `by` accepts more than one grouping variable. You can also create expressions on the fly.
- As a simplification, `time` doesn't need to be coerced to class `Date` as long as the same dates are always written the same way.
- In `data.table` syntax, `.()` is an abbreviation for `list()`.
- Recent versions of `data.table` have lifted the requirement to set keys.
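The remarks about grouping on several variables and about dates stored as strings can be illustrated with a base-R sketch using `aggregate()`, swapped in here for `data.table` so the block has no package dependencies. It assumes two hypothetical extra rows that are not in the question: a second entry for user 1 on 02/22/2017, and the same day written as "2/22/2017" without the leading zero. In `data.table`, the date conversion could go straight into the grouping, e.g. `by = .(user, day = as.Date(time, "%m/%d/%Y"))`; the sketch instead adds a helper column `day` (a name chosen for illustration).

```r
# Question data plus two hypothetical rows (not from the question)
df <- data.frame(
  points = c(25, 0, 15, 10, 5),
  user   = c(1, 2, 3, 1, 1),
  time   = c("02/22/2017", "02/26/2017", "02/27/2017", "02/22/2017", "2/22/2017")
)

# Two grouping variables, analogous to by = .(user, time)
ppu <- aggregate(points ~ user + time, data = df, FUN = sum)
# Base-R counterpart of ppu[order(user, time)]
ppu <- ppu[order(ppu$user, ppu$time), ]
print(ppu)
# The raw strings "02/22/2017" and "2/22/2017" form two separate groups for user 1

# Parsing to Date first merges differently written copies of the same day
df$day <- as.Date(df$time, format = "%m/%d/%Y")
ppu_day <- aggregate(points ~ user + day, data = df, FUN = sum)
print(ppu_day)
```

So leaving `time` as character is safe only while every date is spelled identically; converting to `Date` also makes sorting chronological across years, which lexical sorting of "MM/DD/YYYY" strings is not.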
In a comment, the OP asked how to plot the amount of points per user vs time (per day).
This requires some modifications to `ppu` so it works better with `ggplot2`:
```r
# coerce user to factor to get a discrete colour scale
# (required here because user is given as numeric)
ppu[, user := factor(user)]
# coerce time from character to Date class
# to get a nicely scaled x-axis instead of discrete values
ppu[, time := lubridate::mdy(time)]
```

Now, the points are plotted versus time as a separate, colour-coded line for each user:
```r
library(ggplot2)
ggplot(ppu, aes(time, points, group = user, colour = user)) +
  geom_point() +
  geom_line()
```

Well, you would only see lines here if there were enough sample data ...
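To actually get lines, each user needs points on more than one day. The sketch below uses a small made-up data set (not from the question) and, as a swapped-in alternative to the `ggplot2` call, draws one colour-coded line per user with base R graphics; `as.Date()` with an explicit format stands in for `lubridate::mdy()`. The names `daily` and the sample values are hypothetical, and `pdf(NULL)` is only there so the sketch runs without an interactive device.

```r
# Hypothetical sample data (not from the question): several days per user
df <- data.frame(
  points = c(25, 10, 5, 0, 20, 15, 30),
  user   = c(1, 1, 1, 2, 2, 3, 3),
  time   = c("02/22/2017", "02/23/2017", "02/23/2017",
             "02/26/2017", "02/27/2017", "02/27/2017", "02/28/2017")
)

# Total points per user per day, with time converted to a real Date
daily <- aggregate(points ~ user + time, data = df, FUN = sum)
daily$time <- as.Date(daily$time, format = "%m/%d/%Y")
daily <- daily[order(daily$user, daily$time), ]

pdf(NULL)  # null device: nothing is written, but plotting calls still run
plot(points ~ time, data = daily, type = "n", xlab = "day", ylab = "points")
for (u in unique(daily$user)) {
  d <- daily[daily$user == u, ]
  lines(d$time, d$points, type = "b", col = u)  # one colour-coded line per user
}
legend("topright", legend = unique(daily$user), col = unique(daily$user), lty = 1)
invisible(dev.off())
```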

