numpy - Pandas: How to calculate turnover rate?
I want to calculate the turnover rate of a group of persons using pandas. The size of the group may change, and I want to know the percentage of people who left each year.
It is better to explain with an example. Here is some sample data:
      teachers  year
    0     john  2007
    1     paul  2007
    2     mary  2007
    3     john  2008
    4     paul  2008
    5     abel  2008
    6     watt  2008
    7     john  2009
    8     mary  2009

I'd like to arrive at this dataset:
    year  turnover
    2008    .33333
    2009    .75

In the first year, mary left (1 of the 3 teachers from 2007); in the second year, paul, abel and watt left (3 of the 4 teachers from 2008). This has a kind of bias: if the group is shrinking, the turnover rate is bigger.
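To make the definition concrete, here is a minimal sketch in plain Python that reproduces the two numbers with set differences. The names `records`, `staff_by_year` and `turnover` are illustrative and not part of the original question. It assumes that each year is compared with the previous year present in the data, and that the rate is the number of leavers divided by the previous year's headcount.

```python
# Sample data from the question as (teacher, year) pairs.
records = [
    ("john", 2007), ("paul", 2007), ("mary", 2007),
    ("john", 2008), ("paul", 2008), ("abel", 2008), ("watt", 2008),
    ("john", 2009), ("mary", 2009),
]

# Collect the set of teachers present in each year.
staff_by_year = {}
for teacher, year in records:
    staff_by_year.setdefault(year, set()).add(teacher)

# Assumed definition: leavers are teachers present in the previous year
# but absent in the current one; divide by the previous year's headcount.
years = sorted(staff_by_year)
turnover = {}
for prev, curr in zip(years, years[1:]):
    left = staff_by_year[prev] - staff_by_year[curr]
    turnover[curr] = len(left) / len(staff_by_year[prev])

print(turnover)
```

A teacher who returns after a gap (mary in 2009) counts as a new arrival here, not as a correction to the earlier departure.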
The plan
- I'm going to set the index to 'year', 'teachers' and assign a dummy variable of x=1 ahead of time.
- I want to have 'year' in the index, so I unstack to put 'teachers' in the columns. Use the fill_value=0 option to fill in zeros for teachers who weren't there in a particular year.
- Using diff and checking whether the result is equal to -1 identifies a turnover event. sum(1) sums the turnover events for each year.
- d1.sum(1).shift() counts the teachers in the prior year.
- Divide to get the turnover.
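The steps above can be sketched one at a time, so each intermediate result can be inspected. This is a sketch under assumptions: the DataFrame `df` is rebuilt from the sample data in the question, the dummy column is called `x` as in the plan, and the names `left` and `prior` are illustrative only.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "teachers": ["john", "paul", "mary", "john", "paul",
                 "abel", "watt", "john", "mary"],
    "year": [2007, 2007, 2007, 2008, 2008, 2008, 2008, 2009, 2009],
})

# Steps 1-2: dummy x=1 indexed by (year, teachers), then unstack so that
# each teacher becomes a column; fill_value=0 marks absent years.
d1 = df.assign(x=1).set_index(["year", "teachers"])["x"].unstack(fill_value=0)

# Step 3: a year-over-year change of -1 means the teacher left.
left = d1.diff().eq(-1).sum(axis=1)

# Step 4: headcount of the prior year, shifted onto the current year.
prior = d1.sum(axis=1).shift()

# Step 5: leavers divided by prior headcount. The first year has no
# prior year, so its NaN is dropped.
turnover = (left / prior).dropna()
print(turnover)
```

Printing `d1`, `left` and `prior` separately shows where each number in the final ratio comes from.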
    import pandas as pd

    d1 = pd.Series(1, [df.year, df.teachers]).unstack(fill_value=0)
    d1.diff().eq(-1).sum(1).div(d1.sum(1).shift(), 0).dropna()

    year
    2008    0.333333
    2009    0.750000
    dtype: float64

As pointed out by @jrjc in the comments, the first line is a crosstab. With that in mind, we can reduce the code to:
    d1 = pd.crosstab(df.year, df.teachers)
    d1.diff().eq(-1).sum(1).div(d1.sum(1).shift(), 0).dropna()

One line using pipe:
    pd.crosstab(df.year, df.teachers).pipe(
        lambda c: c.diff().eq(-1).sum(1).div(c.sum(1).shift(), 0).dropna()
    )
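As a quick check, the following sketch compares the unstack and crosstab constructions on the sample data and then runs the pipe version end to end. The names `via_unstack`, `via_crosstab`, `same_table` and `rate` are illustrative. It assumes each teacher appears at most once per year; `pd.crosstab` counts occurrences, so duplicated rows would give counts above 1 and the -1 test would no longer mean "left".

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "teachers": ["john", "paul", "mary", "john", "paul",
                 "abel", "watt", "john", "mary"],
    "year": [2007, 2007, 2007, 2008, 2008, 2008, 2008, 2009, 2009],
})

# The two ways of building the year-by-teacher presence table.
via_unstack = pd.Series(1, index=[df.year, df.teachers]).unstack(fill_value=0)
via_crosstab = pd.crosstab(df.year, df.teachers)

# Compare labels and values; dtypes and axis names are ignored here.
same_table = (
    list(via_unstack.index) == list(via_crosstab.index)
    and list(via_unstack.columns) == list(via_crosstab.columns)
    and via_unstack.to_numpy().tolist() == via_crosstab.to_numpy().tolist()
)

# The pipe version, with the axis written out as a keyword.
rate = via_crosstab.pipe(
    lambda c: c.diff().eq(-1).sum(axis=1).div(c.sum(axis=1).shift()).dropna()
)
print(same_table)
print(rate)
```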