numpy - Pandas: How to calculate turnover rate?


I want to calculate the turnover rate of a group of people using pandas. The size of the group may change, and I want to know the percentage of people who left each year.

It is easier to explain with an example. Here is some sample data:

      teachers  year
    0     john  2007
    1     paul  2007
    2     mary  2007
    3     john  2008
    4     paul  2008
    5     abel  2008
    6     watt  2008
    7     john  2009
    8     mary  2009
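For anyone who wants to reproduce this, here is a minimal sketch that builds the sample data as a DataFrame. The variable name df matches the answer's code; the name headcount is only an illustration and is not part of the original question.

```python
import pandas as pd

# Sample data from the question: one row per (teacher, year) pair.
df = pd.DataFrame({
    "teachers": ["john", "paul", "mary",
                 "john", "paul", "abel", "watt",
                 "john", "mary"],
    "year": [2007, 2007, 2007,
             2008, 2008, 2008, 2008,
             2009, 2009],
})

# Number of teachers present in each year (illustrative helper variable).
headcount = df.groupby("year")["teachers"].count()
print(headcount)
```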

I'd like to arrive at this dataset:

    year  turnover
    2008  .33333
    2009  .75

In 2008, Mary left (1 of the 3 teachers present in 2007). In 2009, Paul, Abel, and Watt left (3 of the 4 teachers present in 2008). I know this has a kind of bias: if the group is shrinking, the turnover rate is bigger.
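The expected numbers can be checked by hand with plain Python sets. This is only a sanity-check sketch, not the pandas solution; the names roster and rates are made up for the illustration. A teacher counts as having left if they were present in the prior year and are absent in the current one, so Mary coming back in 2009 does not affect the rate.

```python
import pandas as pd

df = pd.DataFrame({
    "teachers": ["john", "paul", "mary",
                 "john", "paul", "abel", "watt",
                 "john", "mary"],
    "year": [2007, 2007, 2007,
             2008, 2008, 2008, 2008,
             2009, 2009],
})

# Set of teachers present in each year (hypothetical helper name).
roster = {int(year): set(names) for year, names in df.groupby("year")["teachers"]}

years = sorted(roster)
rates = {}
for prev, curr in zip(years, years[1:]):
    left = roster[prev] - roster[curr]          # present last year, absent now
    rates[curr] = len(left) / len(roster[prev])  # share of last year's group
    print(curr, sorted(left), round(rates[curr], 6))
# 2008 ['mary'] 0.333333
# 2009 ['abel', 'paul', 'watt'] 0.75
```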

The plan

  • I'm going to set the index to 'year' and 'teachers', assigning a dummy variable of x=1 ahead of time.
  • I want 'year' as the index, so I unstack to put 'teachers' in the columns. I use the fill_value=0 option to fill in zeros for teachers who weren't there in a particular year.
  • Using diff and checking whether the result equals -1 identifies a turnover event (a teacher going from 1 to 0). sum(1) sums the turnover events for each year.
  • d1.sum(1).shift() counts the teachers in the prior year.
  • Divide the turnover events by the prior-year count to get the turnover rate.
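The steps in the plan can be sketched one at a time with named intermediates, which makes each stage easy to inspect. The names presence, departures, prior_headcount, and turnover are illustrative and do not appear in the original answer, and the sample data is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "teachers": ["john", "paul", "mary",
                 "john", "paul", "abel", "watt",
                 "john", "mary"],
    "year": [2007, 2007, 2007,
             2008, 2008, 2008, 2008,
             2009, 2009],
})

# Steps 1-2: dummy x=1, index on (year, teachers), then move teachers to columns.
# Result: one row per year, one column per teacher, 1 if present and 0 if not.
presence = (
    df.assign(x=1)
      .set_index(["year", "teachers"])["x"]
      .unstack(fill_value=0)
)

# Step 3: a 1 -> 0 change between consecutive years shows up as diff == -1.
departures = presence.diff().eq(-1).sum(axis=1)

# Step 4: headcount of the prior year, aligned to the current year's row.
prior_headcount = presence.sum(axis=1).shift()

# Step 5: departures as a share of the prior year's headcount.
# The first year has no prior year, so its NaN is dropped.
turnover = (departures / prior_headcount).dropna()
print(turnover)
```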

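    import pandas as pd

    # Presence matrix: one row per year, one column per teacher (1 = present, 0 = absent)
    d1 = pd.Series(1, [df.year, df.teachers]).unstack(fill_value=0)
    # Departures (diff == -1) divided by the prior year's headcount
    d1.diff().eq(-1).sum(1).div(d1.sum(1).shift(), 0).dropna()

    year
    2008    0.333333
    2009    0.750000
    dtype: float64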

As pointed out by @jrjc in the comments, the first line is a crosstab. With that in mind, we can reduce the code to:

    d1 = pd.crosstab(df.year, df.teachers)
    d1.diff().eq(-1).sum(1).div(d1.sum(1).shift(), 0).dropna()
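As a quick check that the crosstab really matches the unstacked presence matrix on this sample, the two can be compared directly. This sketch assumes each (year, teacher) pair appears at most once: with duplicate rows, crosstab would return counts above 1, while unstack would raise on the duplicated index. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "teachers": ["john", "paul", "mary",
                 "john", "paul", "abel", "watt",
                 "john", "mary"],
    "year": [2007, 2007, 2007,
             2008, 2008, 2008, 2008,
             2009, 2009],
})

# Presence matrix built by hand: dummy column, two-level index, unstack.
via_unstack = (
    df.assign(x=1)
      .set_index(["year", "teachers"])["x"]
      .unstack(fill_value=0)
)

# Same table from crosstab, which counts rows per (year, teacher) pair.
via_crosstab = pd.crosstab(df["year"], df["teachers"])

same_labels = (
    list(via_unstack.index) == list(via_crosstab.index)
    and list(via_unstack.columns) == list(via_crosstab.columns)
)
same_values = np.array_equal(via_unstack.to_numpy(), via_crosstab.to_numpy())
print(same_labels, same_values)
```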

One line using pipe

    pd.crosstab(df.year, df.teachers).pipe(
        lambda c: c.diff().eq(-1).sum(1).div(c.sum(1).shift(), 0).dropna()
    )
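If the calculation is needed in more than one place, the pipe version could be wrapped in a small function. The function name turnover_rate and its parameters period_col and member_col are hypothetical and not part of the original answer. The sketch assumes each member appears at most once per period and that every period has at least one row.

```python
import pandas as pd


def turnover_rate(frame, period_col="year", member_col="teachers"):
    """Share of the prior period's members who are absent in the current period.

    Hypothetical helper: assumes at most one row per (period, member) pair.
    """
    return pd.crosstab(frame[period_col], frame[member_col]).pipe(
        lambda c: c.diff().eq(-1).sum(axis=1).div(c.sum(axis=1).shift()).dropna()
    )


df = pd.DataFrame({
    "teachers": ["john", "paul", "mary",
                 "john", "paul", "abel", "watt",
                 "john", "mary"],
    "year": [2007, 2007, 2007,
             2008, 2008, 2008, 2008,
             2009, 2009],
})

rates = turnover_rate(df)
print(rates)
```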
