Conditional Summing with multiple columns in R


I want to plot points per user versus time, but I'm not sure how to use the columns in order to achieve this result. My data looks like:

    > head(data, n=3)
    points   user       time
    25        1      02/22/2017
    0         2      02/26/2017
    15        3      02/27/2017

    > dput(data)
    structure(list(points = c(25, 0, 15), user = c(1, 2, 3), time = c("02/22/2017", "02/26/2017", "02/27/2017")), .Names = c("points", "user", "time"), row.names = c(NA, -3L), class = "data.frame")

FYI, there are multiple user IDs (I think 15). I want to sum the total points per user (the number in the user column corresponds to the user's ID number) and then plot these values over time (by day, specifically).

This is the code I use to generate the total points per user:

    library(data.table)
    ppu = setkey(setDT(df), user_id)[, list(points=sum(points)), by=list(user_id)]

This gives the total points per user (the result was posted as an image).

But I want to find the total points per user per day! I'd appreciate any guidance.

Please, try this (with df being the result of dput() given in the question):

    library(data.table)   # version 1.10.4 used
    ppu <- setDT(df)[, .(points = sum(points)), by = .(user, time)]
    ppu
    #   user       time points
    #1:    1 02/22/2017     25
    #2:    2 02/26/2017      0
    #3:    3 02/27/2017     15

This returns user and time in the order in which they appear in df. If you want the result sorted, you have two choices:

For printing, for example, use

    ppu[order(user, time)]
    # or
    ppu[order(time, user)]

Or, if the result should be keyed, try keyby:

    ppu <- setDT(df)[, .(points = sum(points)), keyby = .(user, time)]
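The three-row sample is too small to show the difference between by and keyby. The following is a minimal sketch with made-up values (they are not from the question): by keeps groups in order of first appearance, while keyby sorts the result by the grouping columns and sets them as the key.

```r
library(data.table)

# Hypothetical sample data (not from the question): two users, repeated days
df <- data.frame(
  points = c(5, 10, 20, 1, 2),
  user   = c(2, 1, 2, 1, 2),
  time   = c("02/27/2017", "02/22/2017", "02/27/2017", "02/22/2017", "02/26/2017"),
  stringsAsFactors = FALSE
)

# as.data.table() copies df, so df itself is left untouched
# (setDT(df), as used above, converts df by reference instead)

# by: groups appear in the order in which they first occur in df
by_res <- as.data.table(df)[, .(points = sum(points)), by = .(user, time)]

# keyby: same sums, but sorted by user, then time, and keyed on both
keyby_res <- as.data.table(df)[, .(points = sum(points)), keyby = .(user, time)]

print(by_res)
print(keyby_res)
print(key(keyby_res))
```

Note that time is still a character column here, so keyby sorts it as text, not chronologically.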

Some remarks:

  • Your code snippet uses user_id while the data sample uses user. Also, the data sample includes a column named time that contains dates as character strings, while the text uses the term "day".
  • by accepts more than one grouping variable. You can even create expressions on the fly.
  • As a simplification, time doesn't need to be coerced to class Date as long as the same dates are always written the same way.
  • In data.table syntax, .() is an abbreviation for list().
  • Recent versions of data.table have lifted the requirement to set keys.
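Two of the remarks above can be illustrated with a short sketch. The extra row and the month grouping are assumptions made for illustration only. The first part groups by an expression computed on the fly. The second part shows why character dates are fine for grouping but should be converted before sorting or plotting.

```r
library(data.table)

# Data from the question plus one hypothetical extra row for user 1
dt <- data.table(
  points = c(25, 0, 15, 5),
  user   = c(1, 2, 3, 1),
  time   = c("02/22/2017", "02/26/2017", "02/27/2017", "03/01/2017")
)

# Grouping expression created on the fly: the month part of the character date
per_month <- dt[, .(points = sum(points)), by = .(user, month = substr(time, 1, 2))]
print(per_month)

# Character dates in mm/dd/yyyy format sort as text, not chronologically
x <- c("12/31/2016", "01/01/2017")
chr_sorted  <- sort(x)                                 # text order
date_sorted <- sort(as.Date(x, format = "%m/%d/%Y"))   # chronological order
print(chr_sorted)
print(date_sorted)
```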

In this comment, the OP asked how

to plot the amount of points per user vs time (per day).

This requires some modifications to ppu so that it works better with ggplot2.

    # coerce user to factor to get a discrete colour scale
    # (required here because user is given as numeric)
    ppu[, user := factor(user)]
    # coerce time from character to Date class
    # to get a nicely scaled x-axis instead of discrete values
    ppu[, time := lubridate::mdy(time)]

Now, points can be plotted versus time with a separate, colour-coded line for each user:

    library(ggplot2)
    ggplot(ppu, aes(time, points, group = user, colour = user)) +
      geom_point() + geom_line()


Well, we would see lines in the plot if there were enough sample data ...
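To see actual lines, each user needs points on more than one day. The following is a minimal sketch with made-up data (three hypothetical users, several entries per day). It uses base R's as.Date() with an explicit format as a stand-in for lubridate::mdy(), which is an assumption of this sketch and not part of the answer above.

```r
library(data.table)
library(ggplot2)

# Hypothetical sample data: 3 users, some with several entries on the same day
df <- data.frame(
  points = c(25, 5, 0, 10, 15, 5, 20, 10),
  user   = c(1, 1, 2, 2, 3, 1, 2, 3),
  time   = c("02/22/2017", "02/22/2017", "02/22/2017", "02/26/2017",
             "02/26/2017", "02/27/2017", "02/27/2017", "02/27/2017"),
  stringsAsFactors = FALSE
)

# Total points per user per day
ppu <- as.data.table(df)[, .(points = sum(points)), by = .(user, time)]

# Factor for a discrete colour scale, Date class for a continuous x-axis
ppu[, user := factor(user)]
ppu[, time := as.Date(time, format = "%m/%d/%Y")]
setorder(ppu, user, time)

# One colour-coded line per user
p <- ggplot(ppu, aes(time, points, group = user, colour = user)) +
  geom_point() + geom_line()
print(p)
```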

