dataframe - Match a sentence with a sentence in R?
I have two data frames, occupation and data. I want to match each occupation in data against occupation and assign the corresponding class by adding a column to the occupation data frame.
occupation <- c("i civil engineer human being", "graphic designer late",
                "architect profession", "sales manager bank",
                "love profession of professor", "na")
occupation <- data.frame(occupation)
data <- data.frame(
  class = c("engineers", "designer", "artist", "designer", "poetry", "banker , prof"),
  occupation = c("civil engineer", "graphic designer", "painter", "poetry",
                 "architect(prof)", "sales manager bank"))
I want output like this:
occupation                     class
civil engineer human being     engineers
painter architect poetry       artists
graphic designer late          designers
architect painter profession   architect
sales manager bank             banker , prof
love profession of professor   na
na                             na
I tried the following, but it does not return anything useful:
occupation$value <- sapply(data$occupation, grepl, x = occupation)
I don't know how complex your data is, but this is useful for low-complexity strings. Use agrep. This function lets you set a tolerance parameter (max.distance), so it can match strings that are not exactly equal:
occupation <- data.frame(
  occupation = c("i civil engineer human being", "graphic designer late",
                 "architect profession", "sales manager bank"),
  stringsAsFactors = FALSE)
data <- data.frame(
  class = c("engineers", "designer", "architect", "banker , prof"),
  occupation = c("civil engineer", "graphic designer", "architect(prof)",
                 "sales manager bank"),
  stringsAsFactors = FALSE)

# For each sentence, fuzzy-match every class against it with agrep
occupation$value <- sapply(occupation$occupation, function(x) {
  match.class <- sapply(data$class, function(y) agrep(y, x, max.distance = 0.2))
  data$class[which(match.class == 1)]
})
If you raise max.distance, it can also detect the last text, as it does the previous strings.
  occupation                   value
1 civil engineer human being   civil engineer
2 graphic designer late        graphic designer
3 architect profession         architect(prof)
4 sales manager bank
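To see how max.distance changes what counts as a match, here is a minimal sketch using agrepl, the logical counterpart of agrep in base R. The variable names pattern, text, strict and relaxed are only illustrative. A fractional max.distance is read as a fraction of the pattern length, so 0.2 on a 9-character pattern allows up to 2 edits.

```r
pattern <- "engineers"
text    <- "i civil engineer human being"

# max.distance = 0: the pattern must occur exactly as a substring
strict <- agrepl(pattern, text, max.distance = 0)

# max.distance = 0.2: up to ceiling(0.2 * 9) = 2 edits are tolerated,
# so "engineers" can match "engineer" by dropping the final "s"
relaxed <- agrepl(pattern, text, max.distance = 0.2)

print(c(strict = strict, relaxed = relaxed))
```

A larger tolerance finds more matches, but it also makes false positives more likely, so it is worth checking a few rows by hand after changing it.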
A second option is to match every word separately, but in the case of 'i civil engineer human being' the words 'i' and 'am' match everything.
occupation$value <- sapply(occupation$occupation, function(x) {
  match.class <- sapply(data$class, function(y) {
    any(sapply(strsplit(x, ' ')[[1]], function(z) any(agrep(z, y, max.distance = 0.2))))
  })
  data$class[which(match.class)]
})
So the result is:
  occupation                   value
1 civil engineer human being   civil engineer, graphic designer, architect(prof), sales manager bank
2 graphic designer late        graphic designer
3 architect profession         architect(prof)
4 sales manager bank           sales manager bank
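If the goal is the class column from the question rather than the matched text, one possible variation is sketched below. It is not the code above: it searches for each lookup phrase inside the sentence, takes the class of the first phrase that matches, and falls back to NA. The names lookup, sentences, assign_class and result are hypothetical, and the data is a reduced subset of the question's values.

```r
# Lookup table: phrase to search for and the class to assign
lookup <- data.frame(
  class      = c("engineers", "designer", "banker , prof"),
  occupation = c("civil engineer", "graphic designer", "sales manager bank"),
  stringsAsFactors = FALSE
)

sentences <- c("i civil engineer human being",
               "graphic designer late",
               "love profession of professor")

# Hypothetical helper: class of the first lookup phrase found
# (approximately) inside the sentence, or NA when nothing matches
assign_class <- function(sentence, lookup, max.distance = 0.1) {
  hit <- vapply(lookup$occupation,
                function(p) agrepl(p, sentence, max.distance = max.distance),
                logical(1))
  if (any(hit)) lookup$class[which(hit)[1]] else NA_character_
}

result <- data.frame(
  occupation = sentences,
  class = vapply(sentences, assign_class, character(1),
                 lookup = lookup, USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
print(result)
```

Matching whole phrases instead of single words avoids the problem of short words such as 'i' matching everything. The trade-off is that a sentence must contain something close to the full phrase to get a class.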