Python/Pandas: How to combine cumsum and cumcount with agg function? -

June 15, 2014

i have dataframe grouping internal score , issue date (by quarter). want create statistical table includes cumulative counts of number of loans (represented distinct count of loan #), cumulative sum of loan amounts, , sums of actual loss , outstanding principal. cumulative sum , cumulative count should include snapshot of first date until particular point in time. (i.e. cumulative sum of q1 2015 q2 2015, q1 2015 q3 2015, q1 2015 q4 2015, etc.)

sample dataset:

   loan #   amount issue date tu status list internal score  last actual paid  \ 0   57144  3337.76 2017-04-03              b                            0.0    1   57145  5536.46 2017-04-03              b              c               0.0    2   57160  3443.91 2017-04-03              b              b               0.0    3   57161  1162.79 2017-04-03              b              b               0.0    4   57162  3845.98 2017-04-03              b              b               0.0    5   57163  3441.50 2017-04-03              b              b               0.0    6   57164  2039.96 2017-04-03              b              c               0.0    7   57165  4427.53 2017-04-03              b                            0.0    8   57166  4427.53 2017-04-03              b                            0.0    9   57167  1617.77 2017-04-03              b              b               0.0        outstanding-principal  actual loss   0                3337.76          0.0   1                5536.46          0.0   2                3443.91          0.0   3                1162.79          0.0   4                3845.98          0.0   5                3441.50          0.0   6                2039.96          0.0   7                4427.53          0.0   8                4427.53          0.0   9                1617.77          0.0

i tried this:

container = [] in ['a', 'b', 'c', 'd']:      subdf = df[df['internal score'].str.contains(i)]      # calculate quarterly vintages     subdf.set_index('issue date', inplace=true)     df2 = subdf.groupby(pd.timegrouper('q')).agg({'outstanding-principal': np.sum, 'actual loss': np.sum,                                                   'amount': cumsum, 'loan #': cumcount})     df2['internal score'] =     container.append(df2)  ddf = pd.concat(container)

you can first use groupby , apply cumsum afterwards.

i modified dummy data while changing dates span across quarters make example more clear:

print(df)      loan #  amount      issue date  internal score  outstanding principal   actual loss 0   57144   3337.76     2017-04-03                3337.76                 0.0 1   57145   5536.46     2017-04-03  c               5536.46                 0.0 2   57160   3443.91     2017-04-03  b               3443.91                 0.0 3   57161   1162.79     2017-04-03  b               1162.79                 0.0 4   57162   3845.98     2017-04-03  b               3845.98                 0.0 5   57163   3441.50     2017-07-03  b               3441.50                 0.0 6   57164   2039.96     2017-07-03  c               2039.96                 0.0 7   57165   4427.53     2017-07-03                4427.53                 0.0 8   57166   4427.53     2017-07-03                4427.53                 0.0 9   57167   1617.77     2017-07-03  b               1617.77                 0.0

first, create column containing key identifies quarter , year of given timestamp:

# in case not timestamp df["issue date"] = pd.to_datetime(df["issue date"])  dt = df["issue date"].dt df["quarter"] = dt.strftime("%y").str.cat(dt.quarter.astype(str), " q")  print(df["quarter"])  0    2017 q2 1    2017 q2 2    2017 q2 3    2017 q2 4    2017 q2 5    2017 q3 6    2017 q3 7    2017 q3 8    2017 q3 9    2017 q3 name: quarter, dtype: object

now, aggreagte:

funcs = {'outstanding principal': np.sum,           'actual loss': np.sum,           'amount': np.sum,           'loan #': len}  result = df.groupby(['internal score', "quarter"]).agg(funcs) print(result)                              outstanding principal   amount      actual loss     loan # internal score  quarter                                2017 q2     3337.76                 3337.76     0.0             1                 2017 q3     8855.06                 8855.06     0.0             2              b  2017 q2     8452.68                 8452.68     0.0             3                 2017 q3     5059.27                 5059.27     0.0             2              c  2017 q2     5536.46                 5536.46     0.0             1                 2017 q3     2039.96                 2039.96     0.0             1

and use transform cumsum:

cum_cols = ["amount", "loan #"] cumsums = result.groupby(level="internal score")[cum_cols].transform(lambda x: x.cumsum()) result.loc[:, cum_cols] = cumsums  print(result)                             outstanding principal   amount      actual loss     loan # internal score  quarter                                2017 q2     3337.76                 3337.76     0.0             1                 2017 q3     8855.06                12192.82     0.0             3              b  2017 q2     8452.68                 8452.68     0.0             3                 2017 q3     5059.27                13511.95     0.0             5              c  2017 q2     5536.46                 5536.46     0.0             1                 2017 q3     2039.96                 7576.42     0.0             2

Search This Blog

MOno

Python/Pandas: How to combine cumsum and cumcount with agg function? -

Comments

Post a Comment

Popular posts from this blog

javascript - Confirm a form & display message if form is valid with JQuery -

Retrieving ETA (estimated time of arrival) with Google Distance Matrix API and public transit as transport mode -

android - ConstraintLayout: Realign baseline constraint in case if dependent view visibility was set to GONE -