"Embedded" data.frame in R: what is it, what is it called, and why does it behave the way it does?
I have the following data structure in R:
    df <- structure(
      list(
        id = c(1L, 2L, 3L, 4L, 5L),
        var1 = c('a', 'b', 'c', 'd', 'e'),
        var2 = structure(
          list(
            var2a = c('v', 'w', 'x', 'y', 'z'),
            var2b = c('vv', 'ww', 'xx', 'yy', 'zz')),
          .Names = c('var2a', 'var2b'),
          row.names = c(NA, 5L),
          class = 'data.frame'),
        var3 = c('aa', 'bb', 'cc', 'dd', 'ee')),
      .Names = c('id', 'var1', 'var2', 'var3'),
      row.names = c(NA, 5L),
      class = 'data.frame')

    # Looks like this:
    #   id var1 var2.var2a var2.var2b var3
    # 1  1    a          v         vv   aa
    # 2  2    b          w         ww   bb
    # 3  3    c          x         xx   cc
    # 4  4    d          y         yy   dd
    # 5  5    e          z         zz   ee
This looks like a normal data frame, and it behaves like one for the most part, but see the length and class properties of its columns below:
    class(df)
    # [1] "data.frame"

    df[1,]
    #   id var1 var2.var2a var2.var2b var3
    # 1  1    a          v         vv   aa

    dim(df)
    # [1] 5 4
    # 1 less than expected, due to the embedded data frame

    lapply(df, class)
    # $id
    # [1] "integer"
    #
    # $var1
    # [1] "character"
    #
    # $var2
    # [1] "data.frame"
    #
    # $var3
    # [1] "character"

    lapply(df, length)
    # $id
    # [1] 5
    #
    # $var1
    # [1] 5
    #
    # $var2
    # [1] 2
    #
    # $var3
    # [1] 5

    str(df)
    # 'data.frame': 5 obs. of 4 variables:
    #  $ id  : int 1 2 3 4 5
    #  $ var1: chr "a" "b" "c" "d" ...
    #  $ var2:'data.frame': 5 obs. of 2 variables:
    #   ..$ var2a: chr "v" "w" "x" "y" ...
    #   ..$ var2b: chr "vv" "ww" "xx" "yy" ...
    #  $ var3: chr "aa" "bb" "cc" "dd" ...
My questions:
1) What is this?
I've never come across this before. Is it a common data format out there? What are potential use cases?
2) What is it called?
I called it "embedded" for lack of a better word. Some have suggested "nested", but I don't think that's right; see the separate section on tidyverse tibbles below.
3) Why is it allowed?
I would have expected the structure command above to fail, because even though data.frames are lists, each element (column) must have the same number of elements (rows). This rule seems to be violated in the example, since var2 has length = 2 (its number of columns!). Yet subsetting df surprisingly succeeds in the usual way:
    df[3,]
    #   id var1 var2.var2a var2.var2b var3
    # 3  3    c          x         xx   cc

What's going on?
Why I don't think we can call it a "nested" structure: that terminology is already used for nested data.frames, which behave like this:
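    library(tidyverse)
    # tibble() replaces the deprecated data_frame(); the inner column is named y
    # because recent tidyr versions refuse to unnest into a duplicate name x
    df <- tibble(
      x = c(1L, 2L, 3L),
      nested = list(tibble(y = c('a', 'b', 'c')),
                    tibble(y = c('a', 'b', 'c')),
                    tibble(y = c('d', 'e', 'f'))))
    unnest(df, cols = nested)
    # # A tibble: 9 × 2
    #       x y
    #   <int> <chr>
    # 1     1 a
    # 2     1 b
    # 3     1 c
    # 4     2 a
    # 5     2 b
    # 6     2 c
    # 7     3 d
    # 8     3 e
    # 9     3 f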
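For contrast, here is a base-R sketch of the "nested" layout without the tidyverse. The names nested_df, inner and y are hypothetical. The column is an ordinary list holding one data.frame per row, so its length matches the number of rows, unlike the embedded var2, whose length is its column count.

```r
# One outer row per element of the list column
nested_df <- data.frame(x = 1:3)
nested_df$inner <- list(
  data.frame(y = c('a', 'b', 'c')),
  data.frame(y = c('a', 'b', 'c')),
  data.frame(y = c('d', 'e', 'f'))
)

class(nested_df$inner)         # "list", not "data.frame"
length(nested_df$inner)        # 3: one element per outer row
sapply(nested_df$inner, nrow)  # each outer row carries its own 3-row data.frame
```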
I think str() makes the structure pretty clear:
    str(df)
    # 'data.frame': 5 obs. of 4 variables:
    #  $ id  : int 1 2 3 4 5
    #  $ var1: chr "a" "b" "c" "d" ...
    #  $ var2:'data.frame': 5 obs. of 2 variables:
    #   ..$ var2a: chr "v" "w" "x" "y" ...
    #   ..$ var2b: chr "vv" "ww" "xx" "yy" ...
    #  $ var3: chr "aa" "bb" "cc" "dd" ...
It's a data.frame where one column (var2) contains a data.frame. This isn't super easy to create, and I'm not quite sure how you did it, but it isn't technically "illegal" in R.
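Since writing the structure() call by hand is awkward, here is a minimal base-R sketch of one way to build such a column; the object name df2 is hypothetical. Assigning with $<- stores the whole data.frame as a single column, whereas passing it straight to data.frame() would splice it into separate var2.var2a and var2.var2b columns.

```r
# Build the outer data.frame from the ordinary columns first
df2 <- data.frame(
  id   = 1:5,
  var1 = c('a', 'b', 'c', 'd', 'e'),
  var3 = c('aa', 'bb', 'cc', 'dd', 'ee')
)

# $<- stores the data.frame as ONE column instead of splicing its columns in
df2$var2 <- data.frame(
  var2a = c('v', 'w', 'x', 'y', 'z'),
  var2b = c('vv', 'ww', 'xx', 'yy', 'zz')
)

# Reorder so var2 sits between var1 and var3, as in the question;
# single-index [ keeps the embedded column intact
df2 <- df2[c('id', 'var1', 'var2', 'var3')]

str(df2)
```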
data.frames can contain matrices and other data.frames. R doesn't look at the length() of the elements; it looks at their dim() to see if they have the right number of "rows".
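A minimal base-R sketch of that row check, using a hypothetical data.frame emb with an embedded column inner. length() reports the inner data.frame's column count, while NROW(), which uses dim() when the object has one, reports its row count. Row subsetting follows the same idea: [.data.frame subsets any two-dimensional column with xj[i, , drop = FALSE] rather than xj[i], which is why df[3, ] works in the question.

```r
emb <- data.frame(id = 1:5)
# Embed a 5-row, 2-column data.frame as a single column
emb$inner <- data.frame(a = letters[1:5], b = LETTERS[1:5])

# length() counts the inner data.frame's columns, NROW() counts its rows
col_lengths <- sapply(emb, length)
col_rows    <- sapply(emb, NROW)
print(col_lengths)  # id = 5, inner = 2
print(col_rows)     # id = 5, inner = 5

# Row subsetting reaches into the inner data.frame row-wise
row3 <- emb[3, ]
print(row3$inner)   # a one-row data.frame: a = "c", b = "C"

# A replacement whose row count does not fit is rejected
res <- try(emb$bad <- data.frame(a = 1:3), silent = TRUE)
print(inherits(res, "try-error"))  # TRUE
```

The last step shows the check in action: $<- compares NROW() of the replacement with the data's row count and stops when they do not fit.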
I "fix" (expand) these data.frames using

    fixed <- do.call("data.frame", df)
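As a sketch of what that call does, here is a smaller hypothetical emb rebuilt with $<-. do.call() passes each top-level column to data.frame() as a separate named argument, and data.frame() splits the two-column var2 argument into columns prefixed with "var2.".

```r
emb <- data.frame(id = 1:3, var1 = c('a', 'b', 'c'))
emb$var2 <- data.frame(var2a = c('v', 'w', 'x'), var2b = c('vv', 'ww', 'xx'))

# Each top-level column becomes one argument to data.frame()
fixed <- do.call("data.frame", emb)

names(fixed)  # "id" "var1" "var2.var2a" "var2.var2b"
dim(fixed)    # 3 4
```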