excel - How to do conditional count based on row value in SAS/SQL? -
re-uploading since there problems last post, , did not know supposed post sample data. i'm new sas, , have problem know how solve in excel not sas. however, dataset large reasonably use in excel.
i have 4 variables: id, year_start, groupname, test_score.
sample data:
id year_start group_name test_score 1 19931231 red 90 1 19941230 red 89 1 19951231 red 91 1 19961231 red 92 2 19930630 red 85 2 19940629 red 87 2 19950630 red 95 3 19950931 blue 90 3 19960931 blue 90 4 19930331 red 95 4 19940331 red 97 4 19950330 red 98 4 19960331 red 95 5 19931231 red 96 5 19941231 red 97
my goal achieve ranked list (fractional) test_score each year. hoped able achieve using proc rank fraction. function calculate order test_score (highest 1, 2nd highest 2 , on) , divide total number of observations provide fractional rank. unfortunately, year_start differs row row. each id/year combo, want perform one-year look-back year-start, , rank observation compared other id's have year_start in 1 year range. i'm not interested in comparing calendar year, , rank of each id should relative own year_start. adding level of complication, rank performed groupname.
proc sql totally fine if has sql solution.
using above data, ranks this:
id year_start group_name test_score rank 1 19931231 red 90 0.75 1 19941230 red 89 0.8 1 19951231 red 91 1 1 19961231 red 92 1 2 19930630 red 85 1 2 19940629 red 87 0.8 2 19950630 red 95 0.75 3 19950931 blue 90 1 3 19960931 blue 90 1 4 19930331 red 95 1 4 19940331 red 97 0.2 4 19950330 red 98 0.2 4 19960331 red 95 0.333 5 19931231 red 96 0.25 5 19941231 red 97 0.667
in order calculate rank row 1,
- we first exclude blue observations.
- then count number of observations fall within year before year_start, 19931231 (so have 4 observations).
- we count how many of these observations have higher test_score, , add 1 find order of current observation (so 3rd highest).
- then, divide order total number rank (3/4= 0.75).
in excel, formula variable this. assume formula row 1 , there 100 rows. id=a, year_start=b, groupname=c, , test_score=d:
=(1+countifs(d1:d100,">"&d1, b1:b100,"<="&b1, b1:b100,">"&b1-365.25, c1:c100, c1))/ countifs(b1:b100,"<="&b1, b1:b100,">"&b1-365.25, c1:c100, c1)
thanks help!
ahammond428
your example isn't correct if i'm reading correctly, it's hard know you're trying do. try following , see if works. may need tweak inequalities open or closed depending on whether want include 1 year date. note year_start column needs imported in sas date format work. otherwise can change on input(year_start, yymmdd8.).
proc sql; select distinct a.id, a.year_start, a.group_name, a.test_score, 1+sum(case when b.test_score > a.test_score 1 else 0 end) rank_num, count(b.id) rank_denom, calculated rank_num / calculated rank_denom rank testdata left join testdata b on a.group_name = b.group_name , intnx('year',a.year_start,-1,'s') le b.year_start le a.year_start group a.id, a.year_start, a.group_name, a.test_score order id, year_start; quit;
note changed dates of 9/31 9/30 (since there no 9/31), left 3/30, 6/29, , 12/30 alone since perhaps intended, though other dates seem quarter-end.
Comments
Post a Comment