regex - Python: UserWarning: This pattern has match groups. To actually get the groups, use str.extract


I have a DataFrame in which one of the columns contains strings. The df looks like this:

member_id,event_path,event_time,event_duration
30595,"2016-03-30 12:27:33",yandex.ru/,1
30595,"2016-03-30 12:31:42",yandex.ru/,0
30595,"2016-03-30 12:31:43",yandex.ru/search/?lr=10738&msid=22901.25826.1459330364.89548&text=%d1%84%d0%b8%d0%bb%d1%8c%d0%bc%d1%8b+%d0%be%d0%bd%d0%bb%d0%b0%d0%b9%d0%bd&suggest_reqid=168542624144922467267026838391360&csg=3381%2c3938%2c2%2c3%2c1%2c0%2c0,0
30595,"2016-03-30 12:31:44",yandex.ru/search/?lr=10738&msid=22901.25826.1459330364.89548&text=%d1%84%d0%b8%d0%bb%d1%8c%d0%bc%d1%8b+%d0%be%d0%bd%d0%bb%d0%b0%d0%b9%d0%bd&suggest_reqid=168542624144922467267026838391360&csg=3381%2c3938%2c2%2c3%2c1%2c0%2c0,0
30595,"2016-03-30 12:31:45",yandex.ru/search/?lr=10738&msid=22901.25826.1459330364.89548&text=%d1%84%d0%b8%d0%bb%d1%8c%d0%bc%d1%8b+%d0%be%d0%bd%d0%bb%d0%b0%d0%b9%d0%bd&suggest_reqid=168542624144922467267026838391360&csg=3381%2c3938%2c2%2c3%2c1%2c0%2c0,0
30595,"2016-03-30 12:31:46",yandex.ru/search/?lr=10738&msid=22901.25826.1459330364.89548&text=%d1%84%d0%b8%d0%bb%d1%8c%d0%bc%d1%8b+%d0%be%d0%bd%d0%bb%d0%b0%d0%b9%d0%bd&suggest_reqid=168542624144922467267026838391360&csg=3381%2c3938%2c2%2c3%2c1%2c0%2c0,0
30595,"2016-03-30 12:31:49",kinogo.co/,1
30595,"2016-03-30 12:32:11",kinogo.co/melodramy/,0

and a second DataFrame, urls:

url
003\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/mobilnyj_telefon_bq_phoenix
003\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/mobilnyj_telefon_fly_
003\.ru\/sonyxperia
003\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/mobilnye_telefony_smartfony
003\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/mobilnye_telefony_smartfony\/brands5d5bbr_23
1click\.ru\/sonyxperia
1click\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/chasy-motorola

I use:

import pandas as pd

urls = pd.read_csv('relevant_url1.csv', error_bad_lines=False)
substr = urls.url.values.tolist()
data = pd.read_csv('data_nts2.csv', error_bad_lines=False, chunksize=50000)
result = pd.DataFrame()
for i, df in enumerate(data):
    res = df[df['event_time'].str.contains('|'.join(substr), regex=True)]

but it returns:

UserWarning: This pattern has match groups. To actually get the groups, use str.extract.

How can I fix that?

At least one of the regex patterns in urls must use a capturing group. str.contains only returns True or False for each row in df['event_time']; it does not make use of the capturing group. Thus, the UserWarning is alerting you that the regex uses a capturing group but the match is not used.

If you wish to remove the UserWarning, you could find and remove the capturing group from the regex pattern(s). They are not shown in the regex patterns you posted, but they must be there in your actual file. Look for parentheses outside of the character classes.
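One way to track down the offending patterns, sketched here on the assumption that the patterns are available as a list of strings, is to compile each one with the standard re module and check its groups attribute, which counts capturing groups. The helper name patterns_with_groups and the second and third sample patterns are made up for illustration.

```python
import re

def patterns_with_groups(patterns):
    """Return the patterns that contain at least one capturing group."""
    return [p for p in patterns if re.compile(p).groups > 0]

patterns = [
    # Parentheses appear only inside the character class, so no group here.
    r'003\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/mobilnyj_telefon_fly_',
    # Hypothetical pattern with a capturing group (parentheses outside [...]).
    r'1click\.ru\/(sony|motorola)',
    # Hypothetical pattern with a non-capturing group, which does not count.
    r'1click\.ru\/(?:sony|motorola)',
]

offenders = patterns_with_groups(patterns)
print(offenders)
```

With the real file, the same check could be run on urls.url.values.tolist() before joining the patterns with '|'.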

Alternatively, you could suppress this particular UserWarning by putting

import warnings
warnings.filterwarnings("ignore", 'This pattern has match groups')

before the call to str.contains.
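If silencing the warning for the whole program feels too broad, the filter can be limited to a single call with warnings.catch_warnings. The sketch below uses only the standard library: contains_with_warning is a hypothetical stand-in for Series.str.contains, written so the example runs without pandas, and quiet_contains is likewise an illustrative name. The message filter is a regular expression matched from the start of the warning text, so '.*has match groups' is used here on the assumption that the exact wording may differ between pandas versions.

```python
import re
import warnings

def contains_with_warning(values, pattern):
    """Hypothetical stand-in for Series.str.contains: warns if the pattern has groups."""
    regex = re.compile(pattern)
    if regex.groups > 0:
        warnings.warn(
            "This pattern has match groups. To actually get the groups, use str.extract.",
            UserWarning,
        )
    return [bool(regex.search(v)) for v in values]

def quiet_contains(values, pattern):
    """Run the check with the match-groups warning ignored for this call only."""
    with warnings.catch_warnings():
        # The filter is dropped again when the with-block exits.
        warnings.filterwarnings("ignore", message=".*has match groups", category=UserWarning)
        return contains_with_warning(values, pattern)

mask = quiet_contains(['gouda', 'stilton', 'gruyere'], 'g(.*)')
print(mask)  # [True, False, True]
```

Other warnings, and the same warning raised elsewhere in the program, are left untouched.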


Here is a simple example which demonstrates the problem (and solution):

# import warnings
# warnings.filterwarnings("ignore", 'This pattern has match groups')  # uncomment to suppress the UserWarning

import pandas as pd

df = pd.DataFrame({'event_time': ['gouda', 'stilton', 'gruyere']})

urls = pd.DataFrame({'url': ['g(.*)']})   # With a capturing group, there is a UserWarning
# urls = pd.DataFrame({'url': ['g.*']})   # Without a capturing group, there is no UserWarning. Uncommenting this line avoids the UserWarning.

substr = urls.url.values.tolist()
df[df['event_time'].str.contains('|'.join(substr), regex=True)]

prints

  script.py:10: UserWarning: This pattern has match groups. To actually get the groups, use str.extract.
  df[df['event_time'].str.contains('|'.join(substr), regex=True)]

Removing the capturing group from the regex pattern:

urls = pd.DataFrame({'url': ['g.*']})

avoids the UserWarning.
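When the parentheses are needed for grouping (for example around an alternation), a non-capturing group (?:...) is another option: it groups without capturing. The short sketch below uses the standard re module and two made-up patterns to show that both forms match the same strings, while only the first reports a capturing group. As far as I know the warning depends on that group count, but treat that as an assumption about pandas internals.

```python
import re

# Hypothetical patterns: same alternation, with and without a capturing group.
capturing = re.compile(r'g(ou|ru).*')
non_capturing = re.compile(r'g(?:ou|ru).*')

words = ['gouda', 'stilton', 'gruyere']
matches_capturing = [bool(capturing.search(w)) for w in words]
matches_non_capturing = [bool(non_capturing.search(w)) for w in words]

print(capturing.groups, non_capturing.groups)      # 1 0
print(matches_capturing == matches_non_capturing)  # True
```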

