Getting percentages of a character variable and regrouping small categories in R
I have been trying to write a small piece of code that can:
- take a character variable,
- get the percentages of the possible values taken by the variable,
- rename values with small percentages to "other" instead of keeping their original value.
I am working in R. For example:
# toy data x:
x <- c("other","other","other","","office","other","other",
       "other","other","sales","","office","other",
       "mgr","other","other","mgr","","other","office",
       "other","profexe","mgr","mgr","other")
x_freq <- plyr::count(x)
names(x_freq) <- c("modality","count")
x_freq$prob <- x_freq$count/sum(x_freq$count)
small <- x_freq$modality[...]
The `...` part is meant to say: if the probability does not reach a certain level, the value counts as small, so take that value's name and rename it "other". The code is not neat and clean, and I wonder if there is a simpler way to write it.
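For the percentage step on its own, base R can do the counting without plyr. This is a minimal sketch, assuming `x` is the toy vector above; `freq_table` is just an illustrative name, and the 5% cutoff is an assumed threshold, not one stated in the question:

```r
# toy data from the question
x <- c("other","other","other","","office","other","other",
       "other","other","sales","","office","other",
       "mgr","other","other","mgr","","other","office",
       "other","profexe","mgr","mgr","other")

# count of each distinct value, then the proportion of the total
counts <- table(x)
props  <- prop.table(counts)

# same information as a data frame: value, count, proportion
freq_table <- data.frame(
  modality = names(counts),
  count    = as.vector(counts),
  prob     = as.vector(props),
  stringsAsFactors = FALSE
)

# values whose share falls below the (assumed) 5% threshold
small <- names(props)[props < 0.05]
small
```

`prop.table()` divides each count by the total, so `freq_table$prob` matches the `x_freq$prob` column built with plyr.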
How about writing a function:
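small_to_other <- function(x, min.fraction = .05) {
  # share of each distinct value among all elements of x
  counts <- table(x) / length(x)
  # replace every value whose share is below the threshold with "other"
  x[x %in% names(counts)[counts < min.fraction]] <- "other"
  x
}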
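Here I set the default to 5%, so any category that makes up less than 5% of the values gets turned into "other". You can call it as
small_to_other(x)  # changes "sales" and "profexe" to "other"
If you wanted to get rid of everything below 15%, you can do
small_to_other(x, .15)  # changes "sales", "profexe", "office" and "" to "other"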
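One caveat: if `x` is a factor rather than a character vector, assigning "other" fails whenever "other" is not already a level; R warns about an invalid factor level and inserts `NA`. A hedged sketch of a variant that coerces to character first; the name `small_to_other_chr` and the `other_label` argument are hypothetical additions, not part of the answer above:

```r
small_to_other_chr <- function(x, min.fraction = .05, other_label = "other") {
  # work on plain character values so a new label never becomes NA
  x <- as.character(x)
  # share of each distinct value among all elements of x
  props <- table(x) / length(x)
  # names of the values whose share is below the threshold
  rare <- names(props)[props < min.fraction]
  # collapse those values into the single label
  x[x %in% rare] <- other_label
  x
}

# a factor whose levels do not include "other"
f <- factor(c("a", "a", "a", "b", "c"))
small_to_other_chr(f, min.fraction = 0.25)
# returns c("a", "a", "a", "other", "other")
```

The result is a character vector; wrap it in `factor()` if you need a factor back.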