Hello!
I am trying to calculate the percentage of missing observations per variable by dividing the number of missing observations by the number of expected observations per variable. Because the denominator will vary across most variables (due to skip patterns in the survey), for each variable, I have created two corresponding unique variables, one to store the number of missing observations (with suffix _m attached to the original variable name) and one to store the number of expected observations (with suffix _exp attached to the original variable name).
Here's what I did to create the missing observations variables:
/* For each variable, create a corresponding variable with suffix _m that holds the number of missing values for that variable. */
foreach var of varlist _all {
    capture confirm string variable `var'
    if !_rc {
        * String variable: count empty or blank-only values.
        quietly count if trim(`var') == ""
    }
    else {
        * Numeric variable: count system/extended missing values and the -99 code.
        quietly count if missing(`var') | `var' == -99
    }
    gen `var'_m = r(N)
}
And here's a sample of the code used to create the variables for the expected number of observations per variable. The first line counts the number of observations meeting the conditions of the skip pattern, and the subsequent lines store that count in a new variable with suffix _exp for the variables that are dependent on the skip pattern.
quietly count if stage == 1 | stage == 2
gen ii1_exp = r(N)
gen ii2_exp = r(N)
gen ii3_exp = r(N)
quietly count if everotp_7 != 1
gen weekotp_1_exp = r(N)
gen weekotp_2_exp = r(N)
gen weekotp_3_exp = r(N)
gen weekotp_4_exp = r(N)
gen weekotp_5_exp = r(N)
gen weekotp_6_exp = r(N)
gen weekotp_7_exp = r(N)
(I realize this is probably not the best way to do this, as it creates a ton of extra variables that have the same value for each observation. I didn't know how to store single numbers though - feedback on that is welcome, as well!).
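On storing single numbers: one option is a Stata scalar, which holds one value for the whole dataset instead of repeating it in every observation. The lines below are a minimal sketch of that idea, reusing the stage, ii1, ii2 and ii3 names from the sample above; the scalar names simply mirror the _exp variable names and are otherwise arbitrary.

```stata
* Sketch only: keep the expected count in scalars rather than constant variables.
quietly count if stage == 1 | stage == 2
* Copy r(N) into a local macro before running any other command.
local n = r(N)
foreach v in ii1 ii2 ii3 {
    scalar `v'_exp = `n'
}
* scalar() makes sure the scalar is used even if a variable has the same name.
display scalar(ii1_exp)
```

A scalar and a variable can share a name, and Stata gives the variable priority in expressions, so referring to the value as scalar(ii1_exp) avoids ambiguity.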
To create the percentage variable, I would like to create a foreach loop like this:
foreach var of varlist _all {
    gen `var'_per = `var'_m/`var'_exp
}
This doesn't work, as the `var' names now include the suffixes _m and _exp, so Stata reads this as variables var2_m_m and var2_exp_exp, which don't exist. Is there a way to create the percentage variable in a loop such that the beginning of the variable before the _m and _exp can be used to match on which variables are supposed to be included in the division equation, and would also determine what precedes _per in creating the new percentage variable?
Thanks for any help you can provide!
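One possible way to do the matching, sketched under the assumption that every count variable ends in _m, that no original variable name already ends in _m, and that the _exp variables were created after the _m loop ran (so no ii1_exp_m style variables exist): loop over the *_m variables only, strip the suffix to recover the base name, and divide only where a matching _exp variable exists. The macro names m and base are illustrative.

```stata
* Sketch only: build <base>_per from <base>_m and <base>_exp.
foreach m of varlist *_m {
    * Drop the trailing "_m" (2 characters) to recover the original name.
    local base = substr("`m'", 1, length("`m'") - 2)
    * Skip variables that have no expected-count counterpart.
    capture confirm variable `base'_exp, exact
    if !_rc {
        * Proportion missing; multiply by 100 for a percentage.
        gen `base'_per = `m' / `base'_exp
    }
}
```

An alternative is to save the original variable list before any new variables are created, for example with unab orig : _all, and then loop with foreach var of local orig, so that `var' never carries a suffix.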