performance - Speeding up count of pairwise observations in R


I have a dataset in which a random subset of measurements is missing from each entry:

    dat <- matrix(runif(100), nrow = 10)
    rownames(dat) <- letters[1:10]
    colnames(dat) <- paste("time", 1:10)
    dat[sample(100, 25)] <- NA

I am interested in calculating correlations between each pair of rows in this dataset (i.e., a-a, a-b, a-c, a-d, ...). However, I would like to exclude correlations where there are fewer than 5 pairwise non-NA observations, by setting their value to NA in the resulting correlation matrix.

I am currently doing this as follows:

    cor <- cor(t(dat), use = 'pairwise.complete.obs')
    names <- rownames(dat)
    filter <- sapply(names, function(x1) sapply(names, function(x2)
        sum(!is.na(dat[x1, ]) & !is.na(dat[x2, ])) < 5))
    cor[filter] <- NA

However, this operation is very slow, as the actual dataset contains >1,000 entries.

Is there a way to filter cells based on the number of non-NA pairwise observations in a vectorized manner, instead of within nested loops?

You can count the number of non-NA pairwise observations using a matrix approach.

Let's use your data generation code. I made the data larger and added more NAs.

    nr = 1000
    nc = 900
    dat = matrix(runif(nr*nc), nrow = nr)
    rownames(dat) = paste(1:nr)
    colnames(dat) = paste("time", 1:nc)
    dat[sample(nr*nc, nr*nc*0.9)] = NA

With this data, your filter code takes 85 seconds:

    tic = proc.time()
    names = rownames(dat)
    filter = sapply(names, function(x1) sapply(names, function(x2)
        sum(!is.na(dat[x1, ]) & !is.na(dat[x2, ])) < 5))
    toc = proc.time()
    show(toc - tic)
    # 85.50 seconds

My version creates a matrix with a 1 for each non-NA value in the original data and a 0 elsewhere. Matrix multiplication then calculates the number of pairwise non-NAs for every pair of rows at once. It ran in a fraction of a second.

    tic = proc.time()
    namat = matrix(0, nrow = nr, ncol = nc)
    namat[!is.na(dat)] = 1
    filter2 = (tcrossprod(namat) < 5)
    toc = proc.time()
    show(toc - tic)
    # 0.09 seconds
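To see why the matrix product gives pairwise counts, here is a small worked sketch on a made-up 3 x 4 matrix. The names `toy`, `obs`, and `pair_counts` are hypothetical and used only for illustration. Entry [i, j] of the product sums `obs[i, k] * obs[j, k]` over the columns k, and each term is 1 only when both rows are observed at column k.

```r
# Hypothetical toy data: 3 entries (rows) by 4 time points (columns).
toy <- rbind(a = c(1, NA, 3, 4),
             b = c(NA, 2, 3, 4),
             c = c(1, 2, NA, NA))

# Indicator matrix: 1 where a value is observed, 0 where it is NA.
obs <- (!is.na(toy)) * 1

# tcrossprod(obs) is obs %*% t(obs): entry [i, j] counts the columns
# where rows i and j are both non-NA.
pair_counts <- tcrossprod(obs)
print(pair_counts)

# Same counts computed the slow way, for comparison.
slow_counts <- sapply(rownames(toy), function(x1)
    sapply(rownames(toy), function(x2)
        sum(!is.na(toy[x1, ]) & !is.na(toy[x2, ]))))
print(all(pair_counts == slow_counts))
```

The diagonal holds each row's own count of non-NA values, and the off-diagonal entries hold the number of shared observations, for example 2 for the pair a-b (columns 3 and 4).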

A simple check shows that the results are the same:

    all(filter == filter2)
    # TRUE
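For completeness, the vectorized filter could be applied to the correlation matrix from the question along these lines. This is only a sketch on the small 10 x 10 example data. The seed and the names `min_obs`, `cor_mat`, and `pair_counts` are assumptions added for illustration, and the result is stored in `cor_mat` rather than `cor` so that the base function is not masked.

```r
set.seed(1)  # hypothetical seed, only so the sketch is reproducible

# Example data from the question: 10 entries, 10 time points, 25 NAs.
dat <- matrix(runif(100), nrow = 10)
rownames(dat) <- letters[1:10]
colnames(dat) <- paste("time", 1:10)
dat[sample(100, 25)] <- NA

min_obs <- 5  # threshold from the question

# Pairwise correlations between rows, using all available pairs.
cor_mat <- cor(t(dat), use = "pairwise.complete.obs")

# Number of shared non-NA observations for every pair of rows.
pair_counts <- tcrossprod((!is.na(dat)) * 1)

# Blank out correlations based on fewer than min_obs shared observations.
cor_mat[pair_counts < min_obs] <- NA
```

Since `pair_counts` has the same dimensions and row order as `cor_mat`, the logical matrix `pair_counts < min_obs` can index it directly.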
