Match variables on beginning of varname up to the common suffix

Kathryn Peebles

Join Date: Mar 2015

Posts: 4
#1

11 Mar 2015, 09:18

Hello!

I am trying to calculate the percentage of missing observations per variable by dividing the number of missing observations by the number of expected observations per variable. Because the denominator will vary across most variables (due to skip patterns in the survey), for each variable, I have created two corresponding unique variables, one to store the number of missing observations (with suffix _m attached to the original variable name) and one to store the number of expected observations (with suffix _exp attached to the original variable name).

Here's what I did to create the missing observations variables:

/* For each variable, create a corresponding variable with suffix _m that holds the number of missing values for that variable. */
foreach var of varlist _all {
    capture confirm string variable `var'
    if !_rc {
        * string variable: count empty or blank-only values
        count if trim(`var') == ""
    }
    else {
        * numeric variable: count system missing and the -99 code
        count if `var' == . | `var' == -99
    }
    gen `var'_m = r(N)
}

And here's a sample of the code used to create the variables for the expected number of observations per variable. The first line counts the number of observations meeting the conditions of the skip pattern, and the subsequent lines store that count in a new variable with suffix _exp for the variables that are dependent on the skip pattern.

quietly count if stage == 1 | stage == 2
gen ii1_exp = r(N)
gen ii2_exp = r(N)
gen ii3_exp = r(N)

quietly count if everotp_7 != 1
gen weekotp_1_exp = r(N)
gen weekotp_2_exp = r(N)
gen weekotp_3_exp = r(N)
gen weekotp_4_exp = r(N)
gen weekotp_5_exp = r(N)
gen weekotp_6_exp = r(N)
gen weekotp_7_exp = r(N)

(I realize this is probably not the best way to do this, as it creates a ton of extra variables that have the same value for each observation. I didn't know how to store single numbers though - feedback on that is welcome, as well!).
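On storing single numbers: a count can be kept in a scalar or a local macro rather than in a variable that repeats the same value on every row. A minimal sketch, using Stata's shipped auto dataset as a stand-in for the survey data (the name n_exp is made up for illustration):

```stata
* Sketch only: auto.dta stands in for the survey; n_exp is a hypothetical name.
sysuse auto, clear
quietly count if foreign == 1

scalar n_exp = r(N)      // named scalar, stays in memory until dropped
local  n_exp = r(N)      // local macro, visible only within the current do-file run

* scalar() avoids any ambiguity with a variable of the same name
display "expected N (scalar) = " scalar(n_exp)
display "expected N (local)  = `n_exp'"
```

Either form holds one number without adding a column to the dataset. The local has to be used in the same do-file run that defines it.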

To create the percentage variable, I would like to create a foreach loop like this:

foreach var of varlist _all {
    gen `var'_per = `var'_m/`var'_exp
}

This doesn't work, as the `var' names now include the suffixes _m and _exp, so Stata reads this as variables var2_m_m and var2_exp_exp, which don't exist. Is there a way to create the percentage variable in a loop such that the beginning of the variable before the _m and _exp can be used to match on which variables are supposed to be included in the division equation, and would also determine what precedes _per in creating the new percentage variable?
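One possible way to do this kind of matching, as a sketch rather than a tested solution for the survey data: loop over only the variables ending in _m, strip the suffix to recover the stem, and build the other names from that stem. The toy variables x and y below are hypothetical. The sketch assumes no original survey variable already ends in _m.

```stata
* Toy data; x and y are hypothetical stand-ins for survey items.
clear
set obs 4
gen x_m   = 1
gen x_exp = 4
gen y_m   = 2
gen y_exp = 4

foreach v of varlist *_m {
    * drop the trailing "_m" (2 characters) to recover the stem
    local stem = substr("`v'", 1, length("`v'") - 2)

    * only divide when the matching _exp variable exists under that exact name
    capture confirm variable `stem'_exp, exact
    if !_rc {
        gen `stem'_per = `stem'_m / `stem'_exp
    }
}
```

Using substr() on the end of the name, rather than replacing "_m" wherever it occurs, keeps stems such as a_mark intact.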

Thanks for any help you can provide!
daniel klein

Join Date: Mar 2014

Posts: 3845
#2

11 Mar 2015, 09:34

I do not know what the expected number of observations is supposed to be, but this seems like a "reinventing the wheel" approach.

Here is what I would do

Step 1: define different types of missing values.

If an item/question/variable is missing due to the survey design, code it as system missing (i.e., .). Code all other types of missing values with (the same) extended missing value code (see help missing).
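A minimal sketch of Step 1, assuming that -99 marks an item the respondent should have answered but did not, and that system missing (.) is reserved for items skipped by design. The variable q1 and its values are invented for illustration.

```stata
* Toy data; q1 and the meaning of -99 are assumptions for illustration.
clear
input q1
1
-99
.
3
end

mvdecode q1, mv(-99=.a)    // recode -99 to the extended missing value .a

count if q1 == .           // missing by survey design (system missing)
count if q1 == .a          // all other missing values
```

After this step the two kinds of missingness can be told apart with == . and == .a, while missing(q1) still catches both.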

Step 2:

Type in Stata

Code:

misstable summarize

Also, see whether misstable with the generate() option and/or the if qualifier gives you what you seek.
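A short sketch of those two features, using the shipped auto dataset in place of the survey data. The stub miss_ is an arbitrary choice, and foreign == 0 merely stands in for a skip-pattern condition.

```stata
* Sketch only: auto.dta stands in for the survey data.
sysuse auto, clear

misstable summarize                         // report numeric variables with missing values
misstable summarize rep78 if foreign == 0   // restrict the report with an if qualifier

* generate() adds a 0/1 missing indicator per variable that has missing values
misstable summarize, generate(miss_)

* the mean of the indicator within the "expected" group is the share missing
summarize miss_rep78 if foreign == 0
display "share missing among expected respondents = " r(mean)
```

Because the indicator is 0/1, its mean over the observations that were supposed to answer is the missing percentage directly, so no separate _m and _exp variables are needed.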

Best
Daniel

Last edited by daniel klein; 11 Mar 2015, 09:37.
Kathryn Peebles

Join Date: Mar 2015

Posts: 4
#3

11 Mar 2015, 12:32

Thanks, Daniel! Those resources were really helpful.