string - R pattern matching multiple combinations of rows in columns for replacement?

January 15, 2014

I'm trying to figure out how to identify instances of strings from one column of a data frame in another column of the same data frame, in order to replace them. In this case I have forum postings that I've pulled in, where people reference other users by name, and I want to get rid of those names for my analyses, since they would otherwise be counted as high-frequency words. Below is the dput of the data frame:

structure(list(uber_name = structure(c(9L, 2L, 1L, 2L, 3L, 10L, 3L, 9L, 11L),
  .Label = c("aluber1968", "bigdreamslittlemoney", "fubernyc", "jamesm", "jonnyplastic", "justdre", "king d", "klimarov", "nycgirl705", "shumacker", "spike69", "theitalian", "uberman8263", "ez2dj", "manhmptn", "nycdriver", "staytune", "ubs", "ubured", "jme10", "lennyyellowcab", "mir", "eagle88", "ibuys4730", "nousername", "bathotrask", "douglas", "lgc", "jakeinny098", "rustyshackelford", "shabbyroch", "ubershiza", "drbrkln", "elys123", "bossdriver", "herbyherb", "jim1985", "malik38", "stidriver", "vxlon7", "waqar", "tohunt4me", "dogpound", "sulib", "albrklyn", "john cunningham", "mreeves", "pinkfoot", "alextheboss", "luisannalui", "censoredbythefcc", "kony", "cieru", "jorlev", "smooth954", "marcusguber", "nyc321", "tony new jersey", "vanstaal", "bkrah", "brunoamat2", "gebbels6", "kevin7889", "uanic", "uber og", "uberkilledmymarriage", "ya mon me", "hunkawestchester", "mr affinito", "ninja warrior", "nononsense", "notacabdriver", "notauberhater", "twofiddymile", "bilyvh", "cybertec69", "johnnyblanco", "sobe", "ubernyc"), class = "factor"),
  uber_write = c("i see people post getting w", "you have 2 choices either drive", "more year ago didnt drive ", "yeah stopped driving them for", "ive been getting promotions la", "fubernyc saidive been getting ", "shumacker saidand feel importan", "fubernyc saidive been getting ", "they start coming after few months "),
  uber_date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
  .Label = c("jan 19, 2017", "mar 30, 2017", "jan 23, 2017", "jan 12, 2017", "jan 9, 2017", "jan 1, 2017", "dec 31, 2016", "nov 26, 2016", "nov 3, 2016", "dec 22, 2016", "dec 13, 2016", "dec 2, 2016", "nov 15, 2016", "oct 31, 2016", "oct 20, 2016", "mar 14, 2017", "sep 1, 2016", "jul 26, 2016", "mar 1, 2017", "feb 25, 2017", "sep 8, 2016", "sep 9, 2016", "apr 21, 2015"), class = "factor")),
  .Names = c("uber_name", "uber_write", "uber_date"),
  class = c("data.table", "data.frame"), row.names = c(NA, -9L))

I've used gsub before, but I can't figure out how to apply it in this instance. I want to take the names in the "uber_name" column and remove those user names from the postings in the "uber_write" column.

You can make a vector uber_names of the user names in the data.table (dt), build a regular expression of the form (name1|name2|name3) from it, and replace every matching user name with "", like this:

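library(data.table)

# Collect the user names (paste() below converts the factor to its labels)
uber_names <- dt$uber_name

# Build "(name1|name2|...)" and strip every match from the postings
dt[, uber_write_filtered := gsub(
    pattern = paste0("(", paste(uber_names, collapse = "|"), ")"),
    replacement = "",
    uber_write)]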
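A plain alternation like the one above also matches inside longer words, so a short user name such as "mir" would be removed from "admire" as well, and a name containing regex metacharacters would be misread as pattern syntax. The following is only a sketch of one possible refinement, in base R so that it runs without data.table. The toy data frame `posts` and its contents are made up for illustration and are not the real forum data. It quotes each name literally with `\Q...\E` and anchors the alternation with `\b` word boundaries, both of which require `perl = TRUE`.

```r
# Toy data modelled on the question; the values are illustrative only.
posts <- data.frame(
  uber_name  = c("fubernyc", "shumacker", "mir"),
  uber_write = c("shumacker said i admire mir",
                 "fubernyc said ive been getting promotions",
                 "they start coming after few months"),
  stringsAsFactors = FALSE
)

# Unique names, longest first, so a longer name is tried before a shorter
# one that happens to be its prefix.
user_names <- unique(as.character(posts$uber_name))
user_names <- user_names[order(nchar(user_names), decreasing = TRUE)]

# \Q...\E makes PCRE treat each name literally; \b restricts matches to
# whole words, so "mir" does not match inside "admire".
pattern <- paste0("\\b(?:",
                  paste0("\\Q", user_names, "\\E", collapse = "|"),
                  ")\\b")

cleaned <- gsub(pattern, "", posts$uber_write, perl = TRUE)

# Collapse the whitespace left behind by removed names and trim the ends.
posts$uber_write_filtered <- trimws(gsub("[[:space:]]+", " ", cleaned))

print(posts$uber_write_filtered)
```

With a data.table, the same `pattern` could be passed to `gsub(..., perl = TRUE)` inside `dt[, uber_write_filtered := ...]`. Note that `\b` only helps when a name is separated from the surrounding text, so whether it suits the real postings depends on how the quoted names were scraped.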
