numpy - Pandas: How to calculate turnover rate?
I want to calculate the turnover rate of a group of people using pandas. The size of the group may change, and I want to know the percentage of people who left each year.
It's easier to explain with an example. Here is some sample data:
      teachers  year
    0     john  2007
    1     paul  2007
    2     mary  2007
    3     john  2008
    4     paul  2008
    5     abel  2008
    6     watt  2008
    7     john  2009
    8     mary  2009
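To reproduce this, the sample can be built as a DataFrame roughly like this. The name df is the one the answer's code uses; the column order is simply copied from the table above.

```python
import pandas as pd

# Sample data from the question: one row per (teacher, year) observation.
df = pd.DataFrame({
    "teachers": ["john", "paul", "mary",
                 "john", "paul", "abel", "watt",
                 "john", "mary"],
    "year": [2007, 2007, 2007,
             2008, 2008, 2008, 2008,
             2009, 2009],
})

# Headcount per year: 3 teachers in 2007, 4 in 2008, 2 in 2009.
headcount = df.groupby("year").size()
```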
I'd like to arrive at this dataset:

    year  turnover
    2008    .33333
    2009    .75
In the first year, mary left; in the second year, paul, abel, and watt left. This has a kind of bias: if the group is shrinking, the turnover rate is bigger.
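To make the definition concrete: the expected numbers imply that each year's rate is the number of people who were present the year before and are now gone, divided by the previous year's headcount (1/3 for 2008, 3/4 for 2009). A minimal sketch of that definition using plain sets, as a cross-check rather than the recommended approach (the variable names here are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "teachers": ["john", "paul", "mary",
                 "john", "paul", "abel", "watt",
                 "john", "mary"],
    "year": [2007, 2007, 2007, 2008, 2008, 2008, 2008, 2009, 2009],
})

# Set of teachers present in each year.
present = df.groupby("year")["teachers"].apply(set)

rates = {}
years = sorted(present.index)
for prev, curr in zip(years, years[1:]):
    # Teachers who were there in the previous year but not in the current one.
    left = present.loc[prev] - present.loc[curr]
    rates[curr] = len(left) / len(present.loc[prev])

turnover = pd.Series(rates, name="turnover")
```

Note that mary counts as having left in 2008 even though she comes back in 2009; the definition only compares consecutive years.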
The plan

- I'm going to set the index to 'year', 'teachers', and assign a dummy variable of x=1 ahead of time.
- I want to have 'year' in the index, so I unstack to put 'teachers' in the columns. Use the fill_value=0 option to fill in zeros where teachers weren't there in a particular year.
- Using diff and checking whether the result equals -1 identifies a turnover event. sum(1) sums the turnover events. d1.sum(1).shift() counts the teachers in the prior year.
- Divide to get the turnover.
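The steps above can be sketched one at a time so the intermediate tables are visible. This is my reading of the plan, using set_index and assign literally; the names left, n_left, and n_prev are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "teachers": ["john", "paul", "mary",
                 "john", "paul", "abel", "watt",
                 "john", "mary"],
    "year": [2007, 2007, 2007, 2008, 2008, 2008, 2008, 2009, 2009],
})

# Steps 1-2: index by (year, teachers), attach a dummy x=1, then unstack so
# each teacher becomes a column; 0 marks a year in which a teacher is absent.
d1 = (
    df.set_index(["year", "teachers"])
      .assign(x=1)["x"]
      .unstack(fill_value=0)
)

# Step 3: a year-over-year difference of -1 means present (1) -> absent (0).
left = d1.diff().eq(-1)
n_left = left.sum(axis=1)          # turnover events per year
n_prev = d1.sum(axis=1).shift()    # headcount in the prior year

# Step 4: divide; the first year has no prior year, so it drops out as NaN.
turnover = n_left.div(n_prev).dropna()
```

Here d1 has one row per year (2007, 2008, 2009) and one column per teacher, and n_left counts 0, 1, and 3 departures respectively.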
    import pandas as pd

    d1 = pd.Series(1, [df.year, df.teachers]).unstack(fill_value=0)
    d1.diff().eq(-1).sum(1).div(d1.sum(1).shift(), 0).dropna()

    year
    2008    0.333333
    2009    0.750000
    dtype: float64
As pointed out by @jrjc in the comments, the first line is a crosstab. With that in mind, we can reduce the code to:
    d1 = pd.crosstab(df.year, df.teachers)
    d1.diff().eq(-1).sum(1).div(d1.sum(1).shift(), 0).dropna()
Or in one line, using pipe:
    pd.crosstab(df.year, df.teachers).pipe(
        lambda c: c.diff().eq(-1).sum(1).div(c.sum(1).shift(), 0).dropna()
    )
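If this is needed in more than one place, the same logic could be wrapped in a small function. The sketch below is an assumption on my part, not something from the answer: the name turnover_rate and its parameters are hypothetical, and the clip(upper=1) step is added because crosstab counts rows, so a teacher listed twice in one year would otherwise show up as 2 rather than 1.

```python
import pandas as pd

def turnover_rate(df, period="year", member="teachers"):
    """Hypothetical helper: share of the previous period's members who left.

    Assumes the periods in `df[period]` sort into consecutive order.
    """
    # crosstab counts rows; clip to 0/1 so duplicate rows still mean "present".
    present = pd.crosstab(df[period], df[member]).clip(upper=1)
    n_left = present.diff().eq(-1).sum(axis=1)
    return n_left.div(present.sum(axis=1).shift()).dropna()

df = pd.DataFrame({
    "teachers": ["john", "paul", "mary",
                 "john", "paul", "abel", "watt",
                 "john", "mary"],
    "year": [2007, 2007, 2007, 2008, 2008, 2008, 2008, 2009, 2009],
})

rates = turnover_rate(df)
# Duplicating every row should not change the result.
rates_dup = turnover_rate(pd.concat([df, df], ignore_index=True))
```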