Hi there,
I have a data set containing several variables. I would like to split the data into 10 groups (preferably of roughly equal size). The data within each group should have a distribution of the variables similar to that of the original data set. Is there a way to do this in Stata?
For example, I am using the Cattaneo data set. I have created a binary outcome (i.e., lbw). There are multiple variables, but to keep things simple I have chosen one variable (i.e., mbsmoke). The distribution of lbw in the original data set is 6.0% (lbw=1) and 94.0% (lbw=0). The distribution of mbsmoke in the original data set is 18.6% smokers and 81.4% non-smokers.
Code:
* Load the data
use "http://www.stata-press.com/data/r14/cattaneo2.dta", clear

* View the data and recode variables
gen lbw = cond(bweight<2500,1,0)
lab var lbw "Low birthweight, <2500 g"
tab mbsmoke

I have tried using the sample command with the by() option, but I am not sure whether this is the best way to go about it.
Any help is much appreciated.
