Python/Pandas: How to combine cumsum and cumcount with agg function? -


i have dataframe grouping internal score , issue date (by quarter). want create statistical table includes cumulative counts of number of loans (represented distinct count of loan #), cumulative sum of loan amounts, , sums of actual loss , outstanding principal. cumulative sum , cumulative count should include snapshot of first date until particular point in time. (i.e. cumulative sum of q1 2015 q2 2015, q1 2015 q3 2015, q1 2015 q4 2015, etc.)

sample dataset:

   loan #   amount issue date tu status list internal score  last actual paid  \ 0   57144  3337.76 2017-04-03              b                            0.0    1   57145  5536.46 2017-04-03              b              c               0.0    2   57160  3443.91 2017-04-03              b              b               0.0    3   57161  1162.79 2017-04-03              b              b               0.0    4   57162  3845.98 2017-04-03              b              b               0.0    5   57163  3441.50 2017-04-03              b              b               0.0    6   57164  2039.96 2017-04-03              b              c               0.0    7   57165  4427.53 2017-04-03              b                            0.0    8   57166  4427.53 2017-04-03              b                            0.0    9   57167  1617.77 2017-04-03              b              b               0.0        outstanding-principal  actual loss   0                3337.76          0.0   1                5536.46          0.0   2                3443.91          0.0   3                1162.79          0.0   4                3845.98          0.0   5                3441.50          0.0   6                2039.96          0.0   7                4427.53          0.0   8                4427.53          0.0   9                1617.77          0.0 

i tried this:

container = [] in ['a', 'b', 'c', 'd']:      subdf = df[df['internal score'].str.contains(i)]      # calculate quarterly vintages     subdf.set_index('issue date', inplace=true)     df2 = subdf.groupby(pd.timegrouper('q')).agg({'outstanding-principal': np.sum, 'actual loss': np.sum,                                                   'amount': cumsum, 'loan #': cumcount})     df2['internal score'] =     container.append(df2)  ddf = pd.concat(container) 

you can first use groupby , apply cumsum afterwards.

i modified dummy data while changing dates span across quarters make example more clear:

print(df)      loan #  amount      issue date  internal score  outstanding principal   actual loss 0   57144   3337.76     2017-04-03                3337.76                 0.0 1   57145   5536.46     2017-04-03  c               5536.46                 0.0 2   57160   3443.91     2017-04-03  b               3443.91                 0.0 3   57161   1162.79     2017-04-03  b               1162.79                 0.0 4   57162   3845.98     2017-04-03  b               3845.98                 0.0 5   57163   3441.50     2017-07-03  b               3441.50                 0.0 6   57164   2039.96     2017-07-03  c               2039.96                 0.0 7   57165   4427.53     2017-07-03                4427.53                 0.0 8   57166   4427.53     2017-07-03                4427.53                 0.0 9   57167   1617.77     2017-07-03  b               1617.77                 0.0 

first, create column containing key identifies quarter , year of given timestamp:

# in case not timestamp df["issue date"] = pd.to_datetime(df["issue date"])  dt = df["issue date"].dt df["quarter"] = dt.strftime("%y").str.cat(dt.quarter.astype(str), " q")  print(df["quarter"])  0    2017 q2 1    2017 q2 2    2017 q2 3    2017 q2 4    2017 q2 5    2017 q3 6    2017 q3 7    2017 q3 8    2017 q3 9    2017 q3 name: quarter, dtype: object 

now, aggreagte:

funcs = {'outstanding principal': np.sum,           'actual loss': np.sum,           'amount': np.sum,           'loan #': len}  result = df.groupby(['internal score', "quarter"]).agg(funcs) print(result)                              outstanding principal   amount      actual loss     loan # internal score  quarter                                2017 q2     3337.76                 3337.76     0.0             1                 2017 q3     8855.06                 8855.06     0.0             2              b  2017 q2     8452.68                 8452.68     0.0             3                 2017 q3     5059.27                 5059.27     0.0             2              c  2017 q2     5536.46                 5536.46     0.0             1                 2017 q3     2039.96                 2039.96     0.0             1 

and use transform cumsum:

cum_cols = ["amount", "loan #"] cumsums = result.groupby(level="internal score")[cum_cols].transform(lambda x: x.cumsum()) result.loc[:, cum_cols] = cumsums  print(result)                             outstanding principal   amount      actual loss     loan # internal score  quarter                                2017 q2     3337.76                 3337.76     0.0             1                 2017 q3     8855.06                12192.82     0.0             3              b  2017 q2     8452.68                 8452.68     0.0             3                 2017 q3     5059.27                13511.95     0.0             5              c  2017 q2     5536.46                 5536.46     0.0             1                 2017 q3     2039.96                 7576.42     0.0             2 

Comments

Popular posts from this blog

How to understand 2 main() functions after using uftrace to profile the C++ program? -

c# - Update a combobox from a presenter (MVP) -

How to put a lock and transaction on table using spring 4 or above using jdbcTemplate and annotations like @Transactional? -