string - R pattern matching multiple combinations of rows in columns for replacement?


I'm trying to figure out how to identify instances of strings from one column of a data frame in another column of the same data frame, in order to replace them. In this case I have forum postings I've pulled in where people reference other users' names, and I want to get rid of the names for my analyses; otherwise they are counted as high-frequency words. Below is the dput of the data frame:

structure(list(uber_name = structure(c(9L, 2L, 1L, 2L, 3L, 10L,  3L, 9L, 11L), .Label = c("aluber1968", "bigdreamslittlemoney",  "fubernyc", "jamesm", "jonnyplastic", "justdre", "king d", "klimarov",  "nycgirl705", "shumacker", "spike69", "theitalian", "uberman8263",  "ez2dj", "manhmptn", "nycdriver", "staytune", "ubs", "ubured",  "jme10", "lennyyellowcab", "mir", "eagle88", "ibuys4730", "nousername",  "bathotrask", "douglas", "lgc", "jakeinny098", "rustyshackelford",  "shabbyroch", "ubershiza", "drbrkln", "elys123", "bossdriver",  "herbyherb", "jim1985", "malik38", "stidriver", "vxlon7", "waqar",  "tohunt4me", "dogpound", "sulib", "albrklyn", "john cunningham",  "mreeves", "pinkfoot", "alextheboss", "luisannalui", "censoredbythefcc",  "kony", "cieru", "jorlev", "smooth954", "marcusguber", "nyc321",  "tony new jersey", "vanstaal", "bkrah", "brunoamat2", "gebbels6",  "kevin7889", "uanic", "uber og", "uberkilledmymarriage", "ya mon me",  "hunkawestchester", "mr affinito", "ninja warrior", "nononsense",  "notacabdriver", "notauberhater", "twofiddymile", "bilyvh", "cybertec69",  "johnnyblanco", "sobe", "ubernyc"), class = "factor"), uber_write = c("i see people post getting w",  "you have 2 choices either drive", "more year ago didnt drive ",  "yeah stopped driving them for", "ive been getting promotions la",  "fubernyc saidive been getting ", "shumacker saidand feel importan",  "fubernyc saidive been getting ", "they start coming after few months " ), uber_date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L ), .Label = c("jan 19, 2017", "mar 30, 2017", "jan 23, 2017",  "jan 12, 2017", "jan 9, 2017", "jan 1, 2017", "dec 31, 2016",  "nov 26, 2016", "nov 3, 2016", "dec 22, 2016", "dec 13, 2016",  "dec 2, 2016", "nov 15, 2016", "oct 31, 2016", "oct 20, 2016",  "mar 14, 2017", "sep 1, 2016", "jul 26, 2016", "mar 1, 2017",  "feb 25, 2017", "sep 8, 2016", "sep 9, 2016", "apr 21, 2015"), class = "factor")), .Names = c("uber_name",  "uber_write", "uber_date"), class = c("data.table", "data.frame" ), row.names = c(NA, -9L))

I've used gsub before but can't figure out how to apply it in this instance. I want to take the names in the "uber_name" column and remove these user names from the "uber_write" postings.

You can make a vector uber_names of the user names in the data.table (dt) and generate a regular expression of the form (name1|name2|name3) to replace matching user names with "", like:

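library(data.table)

# uber_name is a factor: take each name once, as a plain string
uber_names <- unique(as.character(dt$uber_name))

# Build (name1|name2|...) and strip every match from the postings
dt[, uber_write_filtered := gsub(
    pattern = paste0("(", paste(uber_names, collapse = "|"), ")"),
    replacement = "",
    uber_write)]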
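One caveat with the alternation pattern is that each name is interpreted as a regular expression, so a name containing characters such as "." or "(" could match more than intended. Below is a minimal base R sketch of an alternative that removes each name as a literal string instead. The helper strip_names and the toy posts data frame are made up for illustration and are not part of the original answer; the values only mirror the question's columns.

```r
# Toy data mirroring the question's columns (illustrative values only)
posts <- data.frame(
  uber_name  = c("nycgirl705", "fubernyc", "ubernyc", "shumacker"),
  uber_write = c("i see people post getting w",
                 "ive been getting promotions la",
                 "fubernyc saidive been getting ",
                 "shumacker saidand feel importan"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: remove every user name from `text` as a literal string
strip_names <- function(text, user_names) {
  # Works for factors too; drop duplicates, NAs and empty strings
  user_names <- unique(as.character(user_names))
  user_names <- user_names[!is.na(user_names) & nzchar(user_names)]
  # Longest names first, so "fubernyc" is removed before "ubernyc"
  user_names <- user_names[order(nchar(user_names), decreasing = TRUE)]
  for (nm in user_names) {
    # fixed = TRUE matches the name literally, with no regex interpretation
    text <- gsub(nm, "", text, fixed = TRUE)
  }
  text
}

posts$uber_write_filtered <- strip_names(posts$uber_write, posts$uber_name)
```

The length ordering matters only for the loop: removing "ubernyc" first would leave a stray "f" behind from "fubernyc". Note that both approaches also remove a name when it occurs inside a longer word, which may matter for very short user names.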
