python - Removing duplicates in pandas data frame if one column differs, but is in a given list

February 15, 2011

I have a dataframe with duplicate entries coming from two sources. The values should be unique, but one column is not formatted the same way in both sources, so I need to remove duplicates that differ only by the name in that one column, provided the names are within a given list.

Technically, I would like to remove a row in a pandas dataframe if there exists another row with the same a and b values, where this row’s z value is 'bar' and the other’s z value is 'foo'.

An example might be clearer:

I have the given dataframe df

     a     b     z
    'a'   'a'   'foo'
    'a'   'a'   'bar'
    'b'   'a'   'bar'
    'c'   'c'   'foo'
    'd'   'd'   'blb'

and I would like to get

     a     b     z
    'a'   'a'   'foo'
    'b'   'a'   'bar'
    'c'   'c'   'foo'
    'd'   'd'   'blb'

Note that:

the rows with values other than 'foo' and 'bar' in the z column should not be touched.
it’s not important whether 'foo' or 'bar' stays, because they will be changed to the same value afterwards.
it would be great to generalize the duo 'foo', 'bar' to a list.

My attempts so far: here is my best guess, which doesn’t work though… I don’t understand what groupby returns. I’m sure there is some magical pandas one-liner that I can’t find.

new_df = []
for row in df.groupby('a'):
    if row.loc['z'].isin('foo'):
        if not row['z'].isin('bar'):
            new_df.append(row)

Thanks!
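On the groupby question: as far as I can tell, the loop above fails because iterating over df.groupby('a') does not yield rows. It yields (key, sub-DataFrame) pairs, one per distinct value of a. The sketch below rebuilds the example frame by hand (the variable names groups and mask are mine, not from the question) to show what comes out of the iteration:

```python
import pandas as pd

# Example frame from the question, rebuilt by hand.
df = pd.DataFrame({
    "a": ["a", "a", "b", "c", "d"],
    "b": ["a", "a", "a", "c", "d"],
    "z": ["foo", "bar", "bar", "foo", "blb"],
})

# Each iteration yields a (key, sub-DataFrame) pair, one per distinct value of 'a'.
groups = list(df.groupby("a"))
for key, group in groups:
    print(key, list(group["z"]))

# isin expects a list-like argument, so wrap a single name in a list.
mask = df["z"].isin(["foo"])
```

So row in the original loop is really a tuple, and row['z'] cannot work on it. Recent pandas versions also reject a bare string passed to isin.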

I think you can get the expected result by concatenating two subsets of the original dataframe:

one where the z values are neither foo nor bar
and the other one where z is foo or bar, with duplicates according to a and b dropped

Here's an example that gives me the expected output:

import pandas as pd
from io import StringIO

data = """a     b     z
a     a     foo
a     a     bar
b     a     bar
c     c     foo
d     d     blb"""
df = pd.read_csv(StringIO(data), sep=r'\s+')

ls = ['foo', 'bar']
df1 = pd.concat((df.loc[~(df.z.isin(ls))],  # no foos or bars here
                 df.loc[df.z.isin(ls)].drop_duplicates(subset=['a', 'b'])
                 )).sort_index()
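Since the question asks for something that works with any list of names, the same idea can be wrapped in a small function. This is only a sketch: the name drop_listed_duplicates and its parameters (names, keys, col) are made up for illustration, and it assumes the first occurrence within each (a, b) pair is the one to keep, which is the default behaviour of drop_duplicates.

```python
import pandas as pd


def drop_listed_duplicates(df, names, keys=("a", "b"), col="z"):
    """Hypothetical helper: dedupe on `keys` only among rows whose `col` is in `names`."""
    listed = df[col].isin(names)
    # Among the listed rows, keep the first row of each key combination.
    kept = df.loc[listed].drop_duplicates(subset=list(keys))
    # Rows outside the list pass through untouched; sort_index restores the order.
    return pd.concat([df.loc[~listed], kept]).sort_index()


df = pd.DataFrame({
    "a": ["a", "a", "b", "c", "d"],
    "b": ["a", "a", "a", "c", "d"],
    "z": ["foo", "bar", "bar", "foo", "blb"],
})
result = drop_listed_duplicates(df, ["foo", "bar"])
print(result)
```

On the example frame this drops only the second row (a, a, bar) and leaves the blb row alone.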

A simpler option might be to replace foo with bar everywhere in z, and then drop duplicates:

df1 = df.replace({'z':{'foo':'bar'}}).drop_duplicates()

You could also replace both foo and bar with whatever other value you're going to use:

df1 = df.replace({'z':{'foo':'xyz', 'bar':'xyz'}}).drop_duplicates()
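The replacement mapping can be built from a list, so the approach is not tied to foo and bar. A minimal sketch, where names and canonical are placeholder variables of mine and 'xyz' stands in for whatever final value you intend to use:

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["a", "a", "b", "c", "d"],
    "b": ["a", "a", "a", "c", "d"],
    "z": ["foo", "bar", "bar", "foo", "blb"],
})

names = ["foo", "bar"]   # interchangeable names (assumed list)
canonical = "xyz"        # placeholder for the value used afterwards

# Build {'z': {'foo': 'xyz', 'bar': 'xyz'}} from the list, then drop duplicates.
mapping = {"z": {name: canonical for name in names}}
df1 = df.replace(mapping).drop_duplicates()
print(df1)
```

One difference from the concat version: drop_duplicates() with no subset compares whole rows, so it would also remove exact duplicates among rows whose z is not in the list. If those rows must stay untouched, the concat approach is the safer of the two.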
