NHIS (National Health Interview Survey) imputed data

Fred Wolfe

Join Date: Mar 2014

Posts: 10
#1


27 Jun 2014, 03:13

NHIS (National Health Interview Survey) makes available an imputed income file together with a Stata do-file, and I have used the do-file to create 5 Stata files. I wonder if anyone has had experience using these imputed files with Stata's mi commands. I notice there is an -mi import nhanes1- command, but no similar command for NHIS. I'd appreciate any suggestions or experience.

Thanks,

Fred
Konrad Zdeb

Join Date: Apr 2014

Posts: 496
#2

27 Jun 2014, 04:54

Fred, I haven't had the pleasure of working with those files, but if the data are in the public domain I would be keen to have a look at them, if possible.

Kind regards,
Konrad
Version: Stata/IC 13.1
Fred Wolfe

Join Date: Mar 2014

Posts: 10
#3

27 Jun 2014, 09:22

The data are available at http://www.cdc.gov/nchs/nhis/quest_d...97_forward.htm

Fred
Richard Williams

Join Date: Apr 2014

Posts: 4983
#4

27 Jun 2014, 10:33

EDIT: I suspect the strategy outlined in my next post is easier.
-------------------------------------------------------------------------------------------
I am guessing you want to use -mi import flongsep-. See the help. It seems a little tricky, though, because the imputed files only contain a few variables, and they don't seem to include the m = 0 (non-imputed) file.

Just off the top of my head, my guess would be that

* you should create the m = 0 file that matches the vars in the imputed files, i.e. create an extract that has things like id and the few vars that are imputed.
* use -mi import flongsep-, specifying the 5 imputed files
* use -mi convert wide- so each case occupies a single record
* merge the wide version of the imputed data with the original
* use -mi import wide-
* then -mi convert- to whatever format you want, e.g. mlong

Before I did all that, I would probably wait to see if somebody else answered, or do some googling. It seems like they could have made this a lot easier, and maybe they did but I don't know it.

Last edited by Richard Williams; 27 Jun 2014, 10:49.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Richard Williams

Join Date: Apr 2014

Posts: 4983
#5

27 Jun 2014, 10:48

OR, to simplify things, maybe you can merge the 5 imputed files with the original unimputed file, with the imputed variables being tacked on at the end of each record. You might have to rename the variables first in each of the imputed files so they are unique and don't overwrite the original variables. Then, once merged, you can use mi import wide, followed by mi convert to whatever. See -help mi_styles##wide-.

So in other words, my theory is that you merge original and imputed data, with the imputed vars tacked on at the end of each record. Then you tell Stata that it is an mi wide file, and then you can convert it to whatever style you want (or leave it as is, for that matter). If you name the variables correctly, this is hopefully easy to do.
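A minimal sketch of this merge-then-import approach, again on simulated stand-ins so it runs as is. The file and variable names (main, imp1-imp5, id, y, faminc, flag) are hypothetical; with the real NHIS files you would substitute their id variables and the imputed income variables, and the original income variable must be missing wherever a value was imputed.

```stata
* Simulated stand-ins; every name below is hypothetical
clear all
set seed 2014
set obs 200
generate long id = _n
generate byte flag = runiform() < 0.25         // 1 = income was imputed
generate double truth = exp(rnormal(10, 1))

* Imputed files: id plus an imputed variable that keeps the original name
forvalues m = 1/5 {
    preserve
    generate double faminc = cond(flag, exp(rnormal(10, 1)), truth)
    keep id faminc
    save imp`m', replace
    restore
}

* Unimputed main file: income missing where it was imputed
generate double faminc = truth if !flag
generate double y = rnormal()
drop truth
save main, replace

* Rename the imputed variable in each file so the merge cannot overwrite it
forvalues m = 1/5 {
    use imp`m', clear
    rename faminc faminc`m'
    save imp`m'_renamed, replace
}

* Tack the five imputed versions onto the end of each original record
use main, clear
forvalues m = 1/5 {
    merge 1:1 id using imp`m'_renamed, nogenerate assert(match)
}

* Declare the data as mi wide (dropping the now-redundant copies), then convert
mi import wide, imputed(faminc = faminc1 faminc2 faminc3 faminc4 faminc5) drop
mi convert mlong, clear
```

-assert(match)- makes the merge stop with an error if any record fails to find its counterpart, which is a cheap guard against mismatched id variables.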

Richard Williams

Join Date: Apr 2014

Posts: 4983
#6

27 Jun 2014, 11:40

You may also want to consider doing the imputations on your own and ignoring theirs.

* 5 imputations is now considered pretty wimpy; most sources seem to recommend at least 10 and preferably 20 or more. See http://www.statisticalhorizons.com/more-imputations. (Of course, you could keep their imputations and add more).

* You might be able to come up with a better imputation model. If the analytic model is y regressed on x1 through x10, then the imputation model for x10 should probably at least include y and x1 through x9.
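A sketch of doing the imputations yourself, using the auto data that ship with Stata as a stand-in (the 20% missingness in weight is artificial). The analysis model regresses price on mpg and weight, so the imputation model for weight includes the outcome price as well as mpg, in line with the point above.

```stata
* Stand-in data; the missing values in weight are created artificially
clear all
sysuse auto
set seed 20140627
replace weight = . if runiform() < 0.2

mi set mlong
mi register imputed weight

* Imputation model for weight: the outcome (price) plus the other predictor (mpg)
mi impute regress weight price mpg, add(20) rseed(20140627)

* Analysis model, combined across the 20 imputations
mi estimate: regress price mpg weight
```

If the data already hold M = 5 imputations, -add(15)- would bring the total to 20 rather than starting over.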

Of course, they may be able to come up with a better imputation model if, say, they have access to confidential variables that were not released with the data set. Read their documentation.

I at first thought it was neat when places released 5 imputed data sets. Now I wonder if you aren't better off just doing it on your own.

Fred Wolfe

Join Date: Mar 2014

Posts: 10
#7

27 Jun 2014, 12:49

Thanks, Richard. What led me to my question was browsing through some published articles. In one I found this description:

"The NHIS uses the method of multiple imputations for missing family income data [30]. The methods used to create and analyze the multiple imputations are described in the NHIS’ technical document [31]. Briefly, the NHIS provides five imputed income data. Each of the five completed data sets was separately analyzed with STATA. The point estimates (multivariate logits) and the estimated standard errors from the five models were combined to arrive at a single point estimate, its estimated standard error, and the associated confidence interval or significance test. The combined point estimate was calculated using the average of the point estimates obtained from the five completed data sets. The estimated variance of the combined point estimate was computed by adding two components: (a) the average of the five estimated variances; (b) the variation among the five points (representing the uncertainty due to imputing for the missing values). Finally, the 95% confidence intervals and significance tests were constructed using a t reference distribution.

Comparison of the non-imputed results to the imputed results did not show any significant changes. The multivariate analyses using non-imputed income decreased the overall sample size by approximately 25%; therefore, using the imputed data increases the stability of the estimates."
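The procedure quoted above reads like Rubin's combining rules, which is what Stata's -mi estimate- applies for you. As they are usually written, with m imputations, point estimates \(\hat{Q}_i\) and squared standard errors \(U_i\) from each completed data set, the combined estimate, total variance, degrees of freedom, and 95% interval are as below. The quoted passage does not mention the \((1 + 1/m)\) factor on the between-imputation term, so whether the authors applied it is unclear from the text alone.

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m}U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\left(\hat{Q}_i - \bar{Q}\right)^2

T = \bar{U} + \left(1 + \frac{1}{m}\right)B, \qquad
\nu = (m-1)\left[1 + \frac{\bar{U}}{(1 + 1/m)\,B}\right]^2, \qquad
\bar{Q} \pm t_{\nu,\,0.975}\sqrt{T}
```

Here \(\bar{U}\) is component (a) in the quote and \(B\) is component (b); with m = 5 the between-imputation part is inflated by a factor of 1.2.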

Since this seems to deviate from the Stata methodology and, at least for me, was in some respects unclear, I wondered if there was an official Stata position or whether others had routines that might accomplish the task.

Fred
Richard Williams

Join Date: Apr 2014

Posts: 4983
#8

27 Jun 2014, 13:04

How old is this paper? I wonder if it predates the official Stata mi commands. It sort of sounds like they did everything one data set at a time and then combined the results themselves. If so, I don't know why anybody would do that, unless they didn't have the software to do everything for you.

I'm not sure how much their procedure differs from what Stata actually does.

Anyway, unless they explicitly stated that there is some sort of flaw in the way Stata handles MI and that what they did was better, I would go ahead and use Stata's own routines. And if they really do claim that what they do is better I would be highly skeptical!

Fred Wolfe

Join Date: Mar 2014

Posts: 10
#9

27 Jun 2014, 13:34

It's from 2006, but I think you are correct.
Richard Williams

Join Date: Apr 2014

Posts: 4983
#10

27 Jun 2014, 13:49

That is pretty old as these things go. I think Stata's own routines weren't introduced until Stata 11 a few years later. I am guessing they used Royston's programs, or maybe they even just improvised on their own.

Anyway, if you want to use the data, I think my advice about needing to merge the various files still stands, unless somebody knows of a better way. Once you have the file in the right format then just use mi estimate for the analysis.

Samantha Tang

Join Date: Apr 2017

Posts: 1
#11

18 Apr 2017, 12:22

Can anyone explain why the NHIS has not provided one (or several alternative) imputed income values with the annually released public use file, so that users don't have to jump through hoops to get the imputed income value for each record/family? The Medical Expenditure Panel Survey does this:

"MEPS income data are carefully edited to fill in for missing and incomplete data prior to public release. Response rates for selected sources of income are provided in Table 4-a. The editing process relies on a sophisticated sequential hotdeck imputation program guided by regression analysis. The editing process makes use of other sources of information to guide the imputations and attempts to preserve key relationships between different sources of income. In particular, detailed questions about current employment status, current hours worked, number of weeks worked, wage and salary rates are asked of all household members age 16 in the employment section of the MEPS instrument. These data are used to fill in for missing data on annual earned income when necessary."

There must be a good reason...right?
Adam Gaffney

Join Date: May 2018

Posts: 13
#12

04 May 2018, 23:14

Samantha Tang I'm also very puzzled by why the NHIS made that choice; it's a huge headache. Richard Williams, thanks so much for your advice on using the imputed income files to create new variables, one for each imputed file, which can then be tacked onto the end of the original dataset before importing the file in mi wide format.

One additional problem is that, as far as I'm aware, NHIS doesn't provide an "original" (m = 0) variable to match the five imputations (m = 1, m = 2, ...). However, they do provide a flag variable indicating, for each observation, whether the value was imputed. Hence, if I'm not mistaken, one could recreate the original variable from any of the 5 imputed files, using the flag to keep only the unimputed values.
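A sketch of that reconstruction on simulated data. The names flag, povertyratio1, povertyratio2, and povertyratio are hypothetical stand-ins for the NHIS flag and imputed variables. Before trusting any single file, it seems worth checking that the unimputed values really are identical across the imputed files.

```stata
* Simulated stand-ins; every name below is hypothetical
clear all
set seed 42
set obs 50
generate long id = _n
generate byte flag = runiform() < 0.3           // 1 = value was imputed
generate double observed = 4 * runiform()
* Two of the five imputed versions of the ratio
generate double povertyratio1 = cond(flag, 4 * runiform(), observed)
generate double povertyratio2 = cond(flag, 4 * runiform(), observed)

* Sanity check: unimputed cases should agree across imputed files
assert povertyratio1 == povertyratio2 if flag == 0

* Rebuild m = 0: observed value where not imputed, missing otherwise
generate double povertyratio = povertyratio1 if flag == 0
```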

However, I have another question that it would be great if anyone could help me with. One of the imputed variables in the NHIS imputed data set is family income as a percentage of federal poverty. I want to create a new binary variable that classifies all individuals as above or below the federal poverty line using that variable. Could I simply do this for both the original variable and the 5 imputations, and then import this as a passive variable? In other words, let's say povertyratio is the original (m=0), and I've named the five imputations in my wide format dataset povertyratio1, povertyratio2, povertyratio3, povertyratio4, povertyratio5 (m=1, m=2, ... , m=5) using the five data files.

Could I then do:
* Stata treats missing values as larger than any number, so restrict to nonmissing ratios
generate poverty = (povertyratio < 1) if !missing(povertyratio)
forvalues i = 1/5 {
    generate poverty`i' = (povertyratio`i' < 1) if !missing(povertyratio`i')
}

And then do:

mi import wide, imputed(povertyratio = povertyratio1 povertyratio2 povertyratio3 povertyratio4 povertyratio5) ///
    passive(poverty = poverty1 poverty2 poverty3 poverty4 poverty5)

?
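An alternative sketch: register only the imputed ratio when importing, then let -mi passive- create the binary variable in m = 0 and in every imputation, registering it as passive. The data are simulated and the names (flag, ratio0, povertyratio1-povertyratio5, poverty) are hypothetical.

```stata
* Simulated stand-ins; every name below is hypothetical
clear all
set seed 7
set obs 100
generate long id = _n
generate byte flag = runiform() < 0.3            // 1 = ratio was imputed
generate double ratio0 = 3 * runiform()
forvalues m = 1/5 {
    generate double povertyratio`m' = cond(flag, 3 * runiform(), ratio0)
}
generate double povertyratio = ratio0 if !flag   // m = 0: missing where imputed
drop ratio0

* Register only the imputed ratio
mi import wide, imputed(povertyratio = povertyratio1 povertyratio2 ///
    povertyratio3 povertyratio4 povertyratio5) drop

* Build the indicator in m = 0 and each imputation; mi registers it as passive
mi passive: generate byte poverty = (povertyratio < 1) if !missing(povertyratio)
```

Letting -mi passive- do the work keeps the passive variable consistent with the imputed one if the imputations are later updated or added to.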

Many many thanks in advance to anyone who can help me.

Best,

Adam
William Lisowski

Join Date: Dec 2014

Posts: 10150
#13

05 May 2018, 15:10

Post #12 was reposted as a new topic and the discussion continued at

https://www.statalist.org/forums/for...-data-in-stata