r - Using a tibble to apply linear models by group to new data
Let's say I have two datasets for the same group of irises over two years:
    library(dplyr)
    library(tidyr)
    library(purrr)

    # Create data for reproducible results.
    iris.2007 <- iris
    iris.2008 <- iris
    iris.2008[1:4] <- 2 * iris.2008[1:4]  # Let's make the 2008 data different.
I want to fit a separate linear model for each species in the 2007 data, and I can do this as follows:
    # First nest by species.
    iris.2007.nested <- iris.2007 %>%
      group_by(Species) %>%
      nest()

    # Apply the linear model call to each group using its data.
    iris.2007.nested <- iris.2007.nested %>%
      mutate(models = map(data, ~ lm(Petal.Length ~ Petal.Width, data = .)))
When I look at the results, they make sense as a nicely organized tibble.
    head(iris.2007.nested)
    # A tibble: 3 × 3
         Species              data   models
          <fctr>            <list>   <list>
    1     setosa <tibble [50 × 4]> <S3: lm>
    2 versicolor <tibble [50 × 4]> <S3: lm>
    3  virginica <tibble [50 × 4]> <S3: lm>
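Each entry in the `models` list-column is an ordinary `lm` object, so it can be queried per group. The following is only a sketch of one way to pull a summary number out of each model; the column names `slope` and `r.squared` are made up for illustration and are not part of the original question.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Same nesting and fitting steps as above, repeated so the block runs on its own.
iris.2007.nested <- iris %>%
  group_by(Species) %>%
  nest() %>%
  ungroup() %>%
  mutate(models = map(data, ~ lm(Petal.Length ~ Petal.Width, data = .x)))

# `slope` and `r.squared` are hypothetical column names used for illustration.
iris.2007.coefs <- iris.2007.nested %>%
  mutate(
    slope     = map_dbl(models, ~ coef(.x)[["Petal.Width"]]),
    r.squared = map_dbl(models, ~ summary(.x)$r.squared)
  )

# One row per species, with plain numeric columns next to the list-columns.
iris.2007.coefs %>% select(Species, slope, r.squared)
```

Using `map_dbl()` rather than `map()` here returns plain numeric columns, which print directly in the tibble.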
Now let's do the same thing for the 2008 data.
    # First nest by species.
    iris.2008.nested <- iris.2008 %>%
      group_by(Species) %>%
      nest()

    # Apply the linear model call to each species using its data.
    iris.2008.nested <- iris.2008.nested %>%
      mutate(models = map(data, ~ lm(Petal.Length ~ Petal.Width, data = .)))
Again, we end up with a nice tibble.
    head(iris.2008.nested)
    # A tibble: 3 × 3
         Species              data   models
          <fctr>            <list>   <list>
    1     setosa <tibble [50 × 4]> <S3: lm>
    2 versicolor <tibble [50 × 4]> <S3: lm>
    3  virginica <tibble [50 × 4]> <S3: lm>
Now I want to use the linear models from the 2008 data to predict results using the 2007 data. I was thinking the best way to do this would be to combine the two datasets (retaining the group structure), but here is what happens when I try to merge the two nested tibbles:
    iris.both.nested <- merge(iris.2007.nested, iris.2008.nested, by = 'Species')
As you can see below, the result no longer seems to have the same format as the individual tibbles above. Specifically, the organization is hard to discern (note that I am not including the full output in this chunk, but you get the idea).
    head(iris.both.nested)
         Species
    1     setosa
    2 versicolor
    3  virginica
    data.x
    1 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, ...
    ... <truncated>
    1 1.327563, 0.5464903, -0.03686145, -0.03686145, -0.1368614, 0.06313855, ...
And although I can still apparently use the models fitted to the 2008 data (as models.y) on the data from 2007 (as data.x):
    iris.both.nested.pred <- iris.both.nested %>%
      mutate(pred = map2(models.y, data.x, predict))
the result is again not a nicely organized tibble (again, not showing the full output):
    head(iris.both.nested.pred)
         Species
    1     setosa
    2 versicolor
    3  virginica
    data.x
    1 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, ...
    ... <truncated>
    1 1.327563, 0.5464903, -0.03686145, -0.03686145, -0.1368614, ...
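For what it is worth, the `map2(models.y, data.x, predict)` call works because the second positional argument of `predict()` for an `lm` object is `newdata`. Here is a minimal sketch to check that on a single model fitted to the built-in `iris` data; the names `fit` and `new_data` are made up for illustration.

```r
# One model on the full iris data, just to inspect how predict() treats its arguments.
fit <- lm(Petal.Length ~ Petal.Width, data = iris)

# Hypothetical new observations; only the predictor column is needed.
new_data <- data.frame(Petal.Width = c(0.2, 1.5))

# Positional form (what map2(models, data, predict) ends up calling) ...
pred_positional <- predict(fit, new_data)

# ... and the explicit form, which is easier to read.
pred_named <- predict(fit, newdata = new_data)

isTRUE(all.equal(pred_positional, pred_named))
```

Writing the call as `map2(models.y, data.x, ~ predict(.x, newdata = .y))` would make the intent explicit without changing the result.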
So my question is: is this process working even though the tibbles become strangely organized after the merge? Or am I missing something? Thanks!
I would double nest first, and apply the models later:
    # Data
    iris.2007 <- iris
    iris.2008 <- iris
    iris.2008[1:4] <- 2 * iris.2008[1:4]

    joined <- bind_rows(
      cbind(dset = rep("iris.2007", length(iris.2007$Species)), iris.2007),
      cbind(dset = rep("iris.2008", length(iris.2008$Species)), iris.2008)
    )

    # Double nesting: first by dataset, then by species within each dataset.
    joined_nested <- joined %>%
      group_by(dset) %>%
      nest(.key = "data1") %>%
      mutate(data1 = map(data1, ~ .x %>% group_by(Species) %>% nest()))

    # Apply the linear model call to each group using its data.
    joined_nested_models <- joined_nested %>%
      mutate(data1 = map(data1, ~ .x %>%
        mutate(models = map(data, ~ lm(Petal.Length ~ Petal.Width, data = .)))))

    joined_nested_models %>% unnest(data1)
    # # A tibble: 6 × 4
    #        dset    Species              data   models
    #       <chr>     <fctr>            <list>   <list>
    # 1 iris.2007     setosa <tibble [50 × 4]> <S3: lm>
    # 2 iris.2007 versicolor <tibble [50 × 4]> <S3: lm>
    # 3 iris.2007  virginica <tibble [50 × 4]> <S3: lm>
    # 4 iris.2008     setosa <tibble [50 × 4]> <S3: lm>
    # 5 iris.2008 versicolor <tibble [50 × 4]> <S3: lm>
    # 6 iris.2008  virginica <tibble [50 × 4]> <S3: lm>
which is a tidier version of inner_join.
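For comparison, here is a rough sketch of the `inner_join` route. Base `merge()` converts its inputs with `as.data.frame()` and returns a plain data frame, which is why the list-columns print in full; `dplyr::inner_join()` on two tibbles returns a tibble. The helper `nest_and_fit()` and the `.2007`/`.2008` suffixes are made up for illustration and are not from the original posts.

```r
library(dplyr)
library(tidyr)
library(purrr)

iris.2007 <- iris
iris.2008 <- iris
iris.2008[1:4] <- 2 * iris.2008[1:4]

# Hypothetical helper: nest by species and fit one model per group.
nest_and_fit <- function(df) {
  df %>%
    group_by(Species) %>%
    nest() %>%
    ungroup() %>%
    mutate(models = map(data, ~ lm(Petal.Length ~ Petal.Width, data = .x)))
}

# Join the two nested tibbles on Species; the result keeps the tibble class,
# and the shared column names get the suffixes given here.
iris.both <- inner_join(
  nest_and_fit(iris.2007),
  nest_and_fit(iris.2008),
  by = "Species",
  suffix = c(".2007", ".2008")
)

# Use the models fitted on the 2008 data to predict from the 2007 data.
iris.both.pred <- iris.both %>%
  mutate(pred = map2(models.2008, data.2007, ~ predict(.x, newdata = .y)))

iris.both.pred
```

Each element of `pred` is a numeric vector with one prediction per row of the matching 2007 group, so it can be unnested alongside `data.2007` if a flat table is needed.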