sas - Excel - How to split data into train and test sets that are equally distributed


I've got a data set (in Excel) that I'm going to import into SAS to undertake modelling.

I've got a method for randomly splitting the Excel dataset (using the =RAND() function), but is there a way (at the splitting stage) to ensure an even distribution of the samples (other than to keep randomly splitting and testing the distribution until it becomes acceptable)?

Otherwise, if this is best performed in SAS, what is the most efficient approach to testing the sample randomness?

The dataset contains 35 variables, a mixture of binary, continuous, and categorical variables.

In SAS, you can use PROC SURVEYSELECT for this.

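/* OUTALL keeps every row and adds Selected = 1 for the sampled 70%. */
proc surveyselect data=sashelp.cars out=cars_out outall samprate=0.7;
run;

/* As written, the selected 70% goes to test and the remaining 30% to train; */
/* swap the two OUTPUT statements if train should be the larger set. */
data train test;
  set cars_out;
  if selected then output test;
  else output train;
run;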

If there are particular variable[s] you want to make sure the train and test sets are balanced on, you can use either STRATA or CONTROL, depending on what sort of thing you're talking about. CONTROL makes an approximate attempt to even things out by the control variables (it sorts by the control variable, then pulls every 3rd row or whatever, so you get a sort of approximate balance; if you have 2+ control variables it snake-sorts, ascending then descending etc. inside, but that reduces the randomness).
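A minimal sketch of the CONTROL approach, assuming systematic selection (METHOD=SYS, since control sorting applies to systematic or sequential selection). Origin and Type are used here only as illustrative control variables from sashelp.cars, and the seed value is arbitrary:

```sas
/* Systematic 70% sample; rows are sorted by the control variables first. */
proc surveyselect data=sashelp.cars out=cars_out outall
    method=sys samprate=0.7 seed=12345;
  control origin type;  /* illustrative choice of control variables */
run;
```

The output has the same Selected flag as before, so the same splitting data step applies.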

If you use STRATA, it guarantees the sample rate inside each stratum, so if you did:

/* STRATA requires the input to be sorted by the strata variable. */
proc sort data=sashelp.cars out=cars;
  by origin;
run;

proc surveyselect data=cars out=cars_out outall samprate=0.7;
  strata origin;
run;

(and the final splitting data step is the same), you'd get 70% of each separate Origin pulled (which would end up being 70% of the total, of course).
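One way to confirm the per-stratum rate, assuming cars_out was produced with OUTALL as above so that it carries the Selected flag, is a simple crosstab. The row percentages show the share of each Origin that was selected:

```sas
/* Row percentages give the selection rate within each Origin. */
proc freq data=cars_out;
  tables origin*selected / nocol nopercent;
run;
```

Because each stratum holds a whole number of rows, the rates should come out close to 70% rather than exactly equal to it.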

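Which one you use depends on what you care about being balanced by. The more things you do this with, the less balanced the split will be on everything else, so be cautious; it may be that a simple random sample is best, if you have enough N.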
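As a rough check of how balanced a given split is on the other variables, you could compare the two groups directly. This is only a sketch: it assumes the cars_out dataset with its Selected flag from the steps above, and the variables listed (MSRP, Horsepower, MPG_City, Type) are just examples from sashelp.cars, not a recommended set:

```sas
/* Continuous variables: compare summary statistics across the two groups. */
proc means data=cars_out n mean std median;
  class selected;
  var msrp horsepower mpg_city;
run;

/* Categorical variable: chi-square test of association with the split. */
proc freq data=cars_out;
  tables selected*type / chisq;
run;
```

Large differences in the means, or a small chi-square p-value, would suggest the split is unbalanced on that variable and may be worth redrawing or stratifying.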

If you don't have enough N, you can use bootstrapping techniques, meaning you take a 70% sample with replacement, and then take maybe 100 of those samples, each with a higher N than the original. Then you run your test or whatever on each sample selected, and the variation in the results tells you how you're doing even if your N is not enough in one pass.
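A minimal sketch of drawing the bootstrap samples with PROC SURVEYSELECT, assuming unrestricted random sampling (METHOD=URS, i.e. with replacement) and 100 replicates. The dataset names boot_samples and boot_means, the seed, and the use of MSRP as the statistic of interest are all illustrative:

```sas
/* 100 with-replacement samples at a 70% rate; OUTHITS writes one row per hit. */
proc surveyselect data=sashelp.cars out=boot_samples
    method=urs samprate=0.7 reps=100 seed=12345 outhits;
run;

/* Compute the statistic of interest once per replicate. */
proc means data=boot_samples noprint;
  by replicate;
  var msrp;
  output out=boot_means mean=mean_msrp;
run;

/* The spread of the replicate-level results shows how stable the estimate is. */
proc means data=boot_means mean std min max;
  var mean_msrp;
run;
```

In practice you would replace the PROC MEANS step with whatever model or test you are running, still processed BY Replicate.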

