regex - Python: UserWarning: This pattern has match groups. To actually get the groups, use str.extract

June 15, 2013

I have a DataFrame and am trying to search for strings in one of its columns. The DataFrame looks like this:

member_id,event_path,event_time,event_duration
30595,"2016-03-30 12:27:33",yandex.ru/,1
30595,"2016-03-30 12:31:42",yandex.ru/,0
30595,"2016-03-30 12:31:43",yandex.ru/search/?lr=10738&msid=22901.25826.1459330364.89548&text=%d1%84%d0%b8%d0%bb%d1%8c%d0%bc%d1%8b+%d0%be%d0%bd%d0%bb%d0%b0%d0%b9%d0%bd&suggest_reqid=168542624144922467267026838391360&csg=3381%2c3938%2c2%2c3%2c1%2c0%2c0,0
30595,"2016-03-30 12:31:44",yandex.ru/search/?lr=10738&msid=22901.25826.1459330364.89548&text=%d1%84%d0%b8%d0%bb%d1%8c%d0%bc%d1%8b+%d0%be%d0%bd%d0%bb%d0%b0%d0%b9%d0%bd&suggest_reqid=168542624144922467267026838391360&csg=3381%2c3938%2c2%2c3%2c1%2c0%2c0,0
30595,"2016-03-30 12:31:45",yandex.ru/search/?lr=10738&msid=22901.25826.1459330364.89548&text=%d1%84%d0%b8%d0%bb%d1%8c%d0%bc%d1%8b+%d0%be%d0%bd%d0%bb%d0%b0%d0%b9%d0%bd&suggest_reqid=168542624144922467267026838391360&csg=3381%2c3938%2c2%2c3%2c1%2c0%2c0,0
30595,"2016-03-30 12:31:46",yandex.ru/search/?lr=10738&msid=22901.25826.1459330364.89548&text=%d1%84%d0%b8%d0%bb%d1%8c%d0%bc%d1%8b+%d0%be%d0%bd%d0%bb%d0%b0%d0%b9%d0%bd&suggest_reqid=168542624144922467267026838391360&csg=3381%2c3938%2c2%2c3%2c1%2c0%2c0,0
30595,"2016-03-30 12:31:49",kinogo.co/,1
30595,"2016-03-30 12:32:11",kinogo.co/melodramy/,0

and a second DataFrame, urls:

url
003\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/mobilnyj_telefon_bq_phoenix
003\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/mobilnyj_telefon_fly_
003\.ru\/sonyxperia
003\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/mobilnye_telefony_smartfony
003\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/mobilnye_telefony_smartfony\/brands5d5bbr_23
1click\.ru\/sonyxperia
1click\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/chasy-motorola

I use:

import pandas as pd
# note: error_bad_lines was deprecated in pandas 1.3 (use on_bad_lines='skip') and removed in 2.0
urls = pd.read_csv('relevant_url1.csv', error_bad_lines=False)
substr = urls.url.values.tolist()
data = pd.read_csv('data_nts2.csv', error_bad_lines=False, chunksize=50000)
result = pd.DataFrame()
for i, df in enumerate(data):
    res = df[df['event_time'].str.contains('|'.join(substr), regex=True)]

but it returns:

UserWarning: This pattern has match groups. To actually get the groups, use str.extract.

How can I fix that?

At least one of the regex patterns in urls must use a capturing group. str.contains only returns True or False for each row in df['event_time']; it does not make use of the capturing group. Thus, the UserWarning is alerting you that the regex uses a capturing group but the match is not used.
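To see the difference concretely, the sketch below runs the same one-group pattern through str.contains, which only reports a boolean per row (and emits the warning), and through str.extract, which returns the text the group captured. The Series and the pattern g(ou|ru) are illustrative and not taken from the question's data.

```python
import warnings
import pandas as pd

s = pd.Series(['gouda', 'stilton', 'gruyere'])
pattern = r'g(ou|ru)'  # illustrative pattern with one capturing group

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')  # record the warning instead of printing it
    # str.contains only answers "does this row match?": a boolean Series.
    mask = s.str.contains(pattern, regex=True)

print(mask.tolist())  # [True, False, True]
# The captured text is thrown away, which is what the UserWarning is about.
print([str(w.message) for w in caught if issubclass(w.category, UserWarning)])

# str.extract returns what the group captured; expand=False gives a Series
# for a single group, with a missing value where the row does not match.
captured = s.str.extract(pattern, expand=False)
print(captured.tolist())
```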

If you wish to remove the UserWarning, find and remove the capturing group from the regex pattern(s). It is not shown in the regex patterns you posted, but it must be there in your actual file. Look for parentheses outside of character classes.
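With many patterns in a file, one way to find the offenders is to compile each one with Python's re module and read its .groups attribute, which counts capturing groups. Parentheses inside a character class such as [...()] and non-capturing (?:...) groups are not counted, so the patterns posted above report zero. A sketch, where patterns_with_groups is a hypothetical helper and g(.*) stands in for whatever the real file contains:

```python
import re

def patterns_with_groups(patterns):
    """Return (index, pattern, group_count) for each pattern with capturing groups."""
    offenders = []
    for i, pat in enumerate(patterns):
        n = re.compile(pat).groups  # number of capturing groups in this pattern
        if n > 0:
            offenders.append((i, pat, n))
    return offenders

substr = [
    r'003\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/mobilnyj_telefon_fly_',  # () inside [...] are literal
    r'1click\.ru\/sonyxperia',
    r'g(.*)',    # hypothetical pattern with a capturing group
    r'g(?:.*)',  # non-capturing group: not counted
]

for i, pat, n in patterns_with_groups(substr):
    print(i, pat, n)  # 2 g(.*) 1
```

Against the real data this would be called as patterns_with_groups(urls.url.tolist()).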

Alternatively, you could suppress this particular UserWarning by putting

import warnings
warnings.filterwarnings("ignore", 'This pattern has match groups')

before the call to str.contains.
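A filterwarnings call at module level stays in effect for the rest of the process. A narrower option is to scope it with warnings.catch_warnings(), sketched below on toy data (df and pattern are illustrative). The message argument is a regular expression matched case-insensitively against the start of the warning text, and newer pandas releases appear to word this warning as "This pattern is interpreted as a regular expression, and has match groups...", so the sketch accepts both wordings and restricts the filter to UserWarning.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'event_time': ['gouda', 'stilton', 'gruyere']})
pattern = 'g(.*)'  # capturing group: normally triggers the UserWarning

with warnings.catch_warnings():
    # Filters added inside this block are discarded when it exits.
    # The optional part of the regex covers the newer message wording.
    warnings.filterwarnings(
        'ignore',
        message=r'This pattern (is interpreted as a regular expression, and )?has match groups',
        category=UserWarning,
    )
    res = df[df['event_time'].str.contains(pattern, regex=True)]

print(res['event_time'].tolist())  # ['gouda', 'gruyere']
```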

Here is a simple example which demonstrates the problem (and solution):

# import warnings
# warnings.filterwarnings("ignore", 'This pattern has match groups') # uncomment to suppress the UserWarning
import pandas as pd
df = pd.DataFrame({'event_time': ['gouda', 'stilton', 'gruyere']})
urls = pd.DataFrame({'url': ['g(.*)']})   # With a capturing group, there is a UserWarning
# urls = pd.DataFrame({'url': ['g.*']})   # Without a capturing group, there is no UserWarning. Uncommenting this line avoids the UserWarning.
substr = urls.url.values.tolist()
df[df['event_time'].str.contains('|'.join(substr), regex=True)]

This prints:

  script.py:10: UserWarning: This pattern has match groups. To actually get the groups, use str.extract.
    df[df['event_time'].str.contains('|'.join(substr), regex=True)]

Removing the capturing group from the regex pattern:

urls = pd.DataFrame({'url': ['g.*']})

avoids the UserWarning.
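If the parentheses are needed for grouping, for example to apply an alternation to only part of the pattern, an alternative to deleting them is to make the group non-capturing with (?:...). Python's re does not count (?:...) as a capturing group, so str.contains has nothing to warn about. A sketch on the same toy data; the pattern g(?:ou|ru) is made up for illustration.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'event_time': ['gouda', 'stilton', 'gruyere']})

# "g" followed by "ou" or "ru": the group is still needed for the alternation,
# but (?:...) groups without capturing.
pattern = r'g(?:ou|ru)'

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')  # record every warning that is raised
    res = df[df['event_time'].str.contains(pattern, regex=True)]

print(res['event_time'].tolist())  # ['gouda', 'gruyere']
print(any('match groups' in str(w.message) for w in caught))  # False
```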
