dataframe - "Embedded" data.frame in R. What is it, what is it called, why does it behave the way it does?


I have the following data structure in R:

df <- structure(
  list(
    id = c(1L, 2L, 3L, 4L, 5L),
    var1 = c('a', 'b', 'c', 'd', 'e'),
    var2 = structure(
      list(
        var2a = c('v', 'w', 'x', 'y', 'z'),
        var2b = c('vv', 'ww', 'xx', 'yy', 'zz')),
      .Names = c('var2a', 'var2b'),
      row.names = c(NA, 5L),
      class = 'data.frame'),
    var3 = c('aa', 'bb', 'cc', 'dd', 'ee')),
  .Names = c('id', 'var1', 'var2', 'var3'),
  row.names = c(NA, 5L),
  class = 'data.frame')

# It looks like this:
#   id var1 var2.var2a var2.var2b var3
# 1  1    a          v         vv   aa
# 2  2    b          w         ww   bb
# 3  3    c          x         xx   cc
# 4  4    d          y         yy   dd
# 5  5    e          z         zz   ee

This looks like a normal data frame, and it behaves like one for the most part; but see the length and class properties of the columns below:

class(df)
# [1] "data.frame"

df[1,]
#   id var1 var2.var2a var2.var2b var3
# 1  1    a          v         vv   aa

dim(df)
# [1] 5 4
# 1 less than expected due to the embedded data frame

lapply(df, class)
# $id
# [1] "integer"
#
# $var1
# [1] "character"
#
# $var2
# [1] "data.frame"
#
# $var3
# [1] "character"

lapply(df, length)
# $id
# [1] 5
#
# $var1
# [1] 5
#
# $var2
# [1] 2
#
# $var3
# [1] 5

str(df)
# 'data.frame':   5 obs. of  4 variables:
#  $ id  : int  1 2 3 4 5
#  $ var1: chr  "a" "b" "c" "d" ...
#  $ var2:'data.frame':   5 obs. of  2 variables:
#   ..$ var2a: chr  "v" "w" "x" "y" ...
#   ..$ var2b: chr  "vv" "ww" "xx" "yy" ...
#  $ var3: chr  "aa" "bb" "cc" "dd" ...

My questions:

1) What is this?

I've never come across this before. Is it a common format out there? What are the potential use cases?

2) What is it called?

I called it "embedded" for lack of a better word. Someone suggested "nested", but I don't think that's right; see the separate section on tidyverse tibbles below.

3) Why is this allowed?

I would have expected the structure command above to fail, because even though data.frames are lists, each element (column) must have the same number of elements (rows). That rule seems to be violated in this example, where var2 has length = 2 (its number of columns!). Yet subsetting df surprisingly succeeds in the usual way:

df[3,]
#   id var1 var2.var2a var2.var2b var3
# 3  3    c          x         xx   cc

What's going on?


I don't think I would call this a "nested" structure, because that terminology is already used for nested data.frames, which look and behave like this:

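library(tidyverse)
# Note: data_frame() is deprecated in current tibble releases (tibble() replaces it),
# and newer tidyr versions expect unnest(df, cols = nested); the output below is
# from the versions in use when this was written.
df <- data_frame(
  x = c(1L, 2L, 3L),
  nested = list(data_frame(x = c('a', 'b', 'c')),
                data_frame(x = c('a', 'b', 'c')),
                data_frame(x = c('d', 'e', 'f'))))
unnest(df)
# # A tibble: 9 × 2
#       x     x
#   <int> <chr>
# 1     1     a
# 2     1     b
# 3     1     c
# 4     2     a
# 5     2     b
# 6     2     c
# 7     3     d
# 8     3     e
# 9     3     f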

I think the structure makes it pretty clear:

str(df)
# 'data.frame':   5 obs. of  4 variables:
#  $ id  : int  1 2 3 4 5
#  $ var1: chr  "a" "b" "c" "d" ...
#  $ var2:'data.frame':   5 obs. of  2 variables:
#   ..$ var2a: chr  "v" "w" "x" "y" ...
#   ..$ var2b: chr  "vv" "ww" "xx" "yy" ...
#  $ var3: chr  "aa" "bb" "cc" "dd" ...

It's a data.frame with a column (var2) that itself contains a data.frame. This isn't super easy to create, so I'm not quite sure how you did it, but it isn't technically "illegal" in R.

data.frames can contain matrices and other data.frames as columns. So R doesn't just look at the length() of the elements; it looks at the dim() of the elements to see whether each one has the right number of "rows".
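As a rough illustration of that rows-versus-length point, here is a minimal sketch in base R. The names `parent`, `m`, `inner` and `short` are made up for this example and are not from the question; assigning with `$<-` is just one way I know of to end up with a data.frame column.

```r
# A small parent data.frame (hypothetical names, for illustration only)
parent <- data.frame(id = 1:3)

# A matrix column and a data.frame column, each with 3 rows
parent$m     <- matrix(1:6, nrow = 3)
parent$inner <- data.frame(a = c("a", "b", "c"),
                           b = c("A", "B", "C"),
                           stringsAsFactors = FALSE)

length(parent$inner)  # 2: length() of a data.frame counts its columns
NROW(parent$inner)    # 3: the row count is what has to match the parent
dim(parent)           # 3 rows, 3 columns (id, m, inner)

# A replacement whose row count does not fit is rejected
bad <- try(parent$short <- data.frame(z = 1:2), silent = TRUE)
inherits(bad, "try-error")  # TRUE

# Row subsetting keeps the inner data.frame in step with the outer rows
parent[2, ]$inner$a   # "b"
```

So the column is accepted because it has 3 rows, even though its length() is 2, which matches the behaviour seen with var2 in the question.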

I "fix" or expand these data.frames using:

fixed <- do.call("data.frame", df) 
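To show what that call does, here is a small self-contained sketch. It rebuilds a data.frame of the same shape as the one in the question (using `$<-` rather than `structure()`, which is my own shortcut) and then flattens it.

```r
# Rebuild a data.frame with a data.frame column, same shape as in the question
df <- data.frame(id = 1:5,
                 var1 = c("a", "b", "c", "d", "e"),
                 stringsAsFactors = FALSE)
df$var2 <- data.frame(var2a = c("v", "w", "x", "y", "z"),
                      var2b = c("vv", "ww", "xx", "yy", "zz"),
                      stringsAsFactors = FALSE)
df$var3 <- c("aa", "bb", "cc", "dd", "ee")

dim(df)      # 5 4: var2 counts as a single column

# Pass each column as a separate argument to data.frame(), which expands
# multi-column arguments into ordinary columns
fixed <- do.call("data.frame", df)

names(fixed) # "id" "var1" "var2.var2a" "var2.var2b" "var3"
dim(fixed)   # 5 5

# No column of the result is itself a data.frame any more
any(vapply(fixed, is.data.frame, logical(1)))  # FALSE
```

The inner column names are prefixed with the outer column name, which is the same naming that print() already shows for the unflattened version.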
