Multiple imputation when there is a certain group that did not answer a certain question

SeungYong Han

Join Date: Jul 2015

Posts: 53
#1

29 Oct 2019, 08:27

Hello Statalist,

I have a general question about the multiple imputation (MI) method in Stata.
There are multiple variables with missing data for my analysis, and I want to use MI. The problem is that a certain group of participants did not answer a particular question by design, so the variable is missing for that entire group. Specifically, the question was asked only of respondents who had at least one child at the time of the survey; those without children never answered it.

I am not sure how to handle this variable in MI. It is an important variable in the model, so I cannot exclude it.
One option might be to assign those participants a specific value (e.g., -9), treat it as non-missing, and then run MI. But I am not sure whether this would introduce bias.

I would appreciate any advice on this issue.
Thank you.
daniel klein

Join Date: Mar 2014

Posts: 3912
#2

29 Oct 2019, 08:45

How are you planning to treat the missing values for that particular variable in your analyses?

If the true values for the participants who did not answer the respective questions are known, you could insert those true values and perform conditional imputation.
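A minimal sketch of this idea, written in Python rather than Stata purely for illustration: values that are known by design are plugged in for ineligible respondents, and only the genuinely missing values among eligible respondents are imputed. The names `has_kids` and `pta_member` are hypothetical, and the random draw from observed values (a simple hot deck) is a stand-in for a real imputation model. In Stata, the corresponding tool is conditional imputation via the `conditional()` option of `mi impute chained`; check the manual for the exact syntax.

```python
import random

def conditional_impute(rows, eligible, var, known_value, m=5, seed=1):
    """Return m completed copies of `rows` (a list of dicts).

    Ineligible rows get `known_value` plugged in deterministically.
    Missing values (None) among eligible rows are filled by a random
    draw from the observed eligible values: a hot-deck stand-in for a
    real imputation model.
    """
    rng = random.Random(seed)
    donors = [r[var] for r in rows if eligible(r) and r[var] is not None]
    completed = []
    for _ in range(m):
        completed_rows = []
        for r in rows:
            new = dict(r)  # never modify the original data
            if not eligible(r):
                new[var] = known_value         # known by design, not imputed
            elif new[var] is None:
                new[var] = rng.choice(donors)  # imputed within the eligible group only
            completed_rows.append(new)
        completed.append(completed_rows)
    return completed

# Hypothetical example: PTA membership was asked only of parents.
data = [
    {"has_kids": 1, "pta_member": 1},
    {"has_kids": 1, "pta_member": None},  # eligible, did not answer
    {"has_kids": 1, "pta_member": 0},
    {"has_kids": 0, "pta_member": None},  # not asked: true value is 0
]
imps = conditional_impute(data, lambda r: r["has_kids"] == 1, "pta_member", known_value=0)
```

Each of the `m` completed datasets would then be analyzed and the results combined, as in any MI workflow.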

Best
Daniel

Last edited by daniel klein; 29 Oct 2019, 08:48.
SeungYong Han

Join Date: Jul 2015

Posts: 53
#3

29 Oct 2019, 08:54

Good question! That was not clear.

I am using HINTS, a nationally representative dataset (https://hints.cancer.gov/), and I am only interested in Asian respondents who had at least one child at the time of the survey.
HINTS provides sampling weights to account for the complex survey design, and my understanding is that I need to use the whole sample to get correct standard errors when I use the weights. FYI, that was the case when I used NHANES data (as advised by the CDC), but please correct me if I am wrong. Hence, I am planning a domain analysis with two domains: domain A, participants with at least one child, and domain B, everyone else. However, again, some variables were asked only of participants in domain A, not of those in domain B, and I am not sure how to apply MI in this case.
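A sketch of why domain (subpopulation) estimation keeps the full sample, assuming a simplified design: strata, each row its own PSU, PSUs sampled with replacement, and a Taylor-linearized variance for the weighted domain mean. This is not the HINTS design (HINTS documents its own design variables), and in Stata the usual route is `svy` with the `subpop()` option. Rows outside the domain contribute zero scores, yet they still count toward each stratum's size and mean score, so dropping them changes the variance estimate even though the point estimate stays the same.

```python
from collections import defaultdict

def domain_mean(y, w, d, strata):
    """Weighted mean of y in the domain d == 1, with a linearized variance.

    Simplified design (an assumption): strata, each row its own PSU,
    PSUs sampled with replacement within strata.
    """
    tot_w = sum(wi * di for wi, di in zip(w, d))
    est = sum(wi * di * yi for wi, di, yi in zip(w, d, y)) / tot_w
    # Linearized scores of the ratio estimator; zero outside the domain.
    u = [wi * di * (yi - est) / tot_w for wi, di, yi in zip(w, d, y)]
    scores = defaultdict(list)
    for h, ui in zip(strata, u):
        scores[h].append(ui)
    var = 0.0
    for s in scores.values():
        n_h = len(s)
        if n_h < 2:
            continue  # single-PSU stratum: skipped here, one of several conventions
        mean_h = sum(s) / n_h
        var += n_h / (n_h - 1) * sum((v - mean_h) ** 2 for v in s)
    return est, var

# Hypothetical toy data: two strata, domain indicator d.
y = [2, 4, 6, 8, 10, 12]
w = [1, 2, 1, 1, 2, 1]
d = [1, 1, 0, 1, 0, 1]
strata = [1, 1, 1, 2, 2, 2]
keep = [i for i in range(len(y)) if d[i] == 1]

full = domain_mean(y, w, d, strata)
subset = domain_mean([y[i] for i in keep], [w[i] for i in keep],
                     [1] * len(keep), [strata[i] for i in keep])
# Same point estimate (6.0); variance about 1.76 (full) vs about 0.64 (subset).
```

With real clustered designs the gap between the two variance estimates can be larger than in this toy case.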

Hope this helps.
daniel klein

Join Date: Mar 2014

Posts: 3912
#4

29 Oct 2019, 09:08

I cannot say much about the weighting stuff. I can ask more questions, to which I do not know the answers either. For example, if you account for the weights (and other features of the survey design) during imputation, which you probably should, would you still apply weights in the final (MI) analyses? Given that you have already accounted for the survey design during imputation, would a second, repeated adjustment perhaps "over-correct" and make things worse?

Anyway, if there are no missing values in the indicator for having children or not, that is, in the question used to filter respondents without children, then you can just treat the two sub-samples separately from the beginning.
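If the children indicator is fully observed, the split can be made before imputing. The Python sketch below (hypothetical variable names, with hot-deck draws again standing in for a real imputation model) fills each missing value only from donors in the same subsample, so the two groups never borrow values from each other.

```python
import random
from collections import defaultdict

def impute_by_group(rows, group_var, var, seed=1):
    """Fill missing `var` values using donors from the same group only."""
    if any(r[group_var] is None for r in rows):
        raise ValueError("the group indicator must be fully observed")
    rng = random.Random(seed)
    donors = defaultdict(list)
    for r in rows:
        if r[var] is not None:
            donors[r[group_var]].append(r[var])
    out = []
    for r in rows:
        new = dict(r)
        if new[var] is None:
            new[var] = rng.choice(donors[new[group_var]])  # hot-deck within group
        out.append(new)
    return out

# Hypothetical data: income imputed separately for parents and non-parents.
rows = [
    {"has_kids": 1, "income": 50},
    {"has_kids": 1, "income": None},
    {"has_kids": 0, "income": 20},
    {"has_kids": 0, "income": None},
    {"has_kids": 1, "income": 70},
]
completed = impute_by_group(rows, "has_kids", "income")
```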

If there are missing values in the indicator for having children, you need to think about whether the true values for those who do not have children are known. For example, regardless of whether we ask them, male respondents could not be pregnant. Similarly, non-smokers would smoke 0 cigarettes a day, and respondents who do not have children would not be members of parent-teacher associations.
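Known-by-design values like these can be filled deterministically before any imputation runs. A minimal Python sketch using the three examples above; the variable names and codings (`sex == "male"`, `smoker == 0`, `has_kids == 0`) are hypothetical. A rule fires only when its condition is observed: if `has_kids` itself is missing, `pta_member` stays missing and is left for the imputation model.

```python
def apply_logical_rules(row):
    """Fill values that are known by design; leave everything else untouched."""
    new = dict(row)
    if new.get("sex") == "male" and new.get("pregnant") is None:
        new["pregnant"] = 0      # male respondents cannot be pregnant
    if new.get("smoker") == 0 and new.get("cigs_per_day") is None:
        new["cigs_per_day"] = 0  # non-smokers smoke 0 cigarettes a day
    if new.get("has_kids") == 0 and new.get("pta_member") is None:
        new["pta_member"] = 0    # no children, so no PTA membership
    return new

# Hypothetical usage
print(apply_logical_rules({"has_kids": 0, "pta_member": None}))
print(apply_logical_rules({"has_kids": None, "pta_member": None}))
```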

Best
Daniel

Last edited by daniel klein; 29 Oct 2019, 09:11.
SeungYong Han

Join Date: Jul 2015

Posts: 53
#5

29 Oct 2019, 09:54

I am not applying weights during imputation; I use weights only when I run the analysis models. And yes, I think using weights during imputation and then again in the final model would bias the estimates.
I can separate the whole sample into two groups, and that is what domain analysis is for. What I need to do is use MI on the whole sample to impute missing data for multiple variables, including the variables measured only for a certain group, and then do the domain analysis afterward. I will dig more. Thank you for your insight!
Richard Williams

Join Date: Apr 2014

Posts: 5043
#6

29 Oct 2019, 10:31

A couple of things:

If you code a variable as, say, .a, it will NOT get an imputed value. This is good for things that are Not Applicable, as opposed to not reported.
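The distinction here is between soft missing (Stata's `.`), which `mi impute` fills, and hard missing (extended missing values such as `.a`), which it leaves alone. A Python analogue of that rule, with a hypothetical `NOT_APPLICABLE` marker playing the role of `.a` and a hot-deck draw standing in for the imputation model:

```python
import random

NOT_APPLICABLE = "n/a"  # hypothetical marker playing the role of Stata's .a

def mark_not_applicable(rows, var, applies):
    """Recode structural missing values (question not asked) as NOT_APPLICABLE."""
    out = []
    for r in rows:
        new = dict(r)
        if new[var] is None and not applies(new):
            new[var] = NOT_APPLICABLE
        out.append(new)
    return out

def impute_soft_missing(rows, var, seed=1):
    """Fill only plain missing values (None); NOT_APPLICABLE is left alone."""
    rng = random.Random(seed)
    donors = [r[var] for r in rows if r[var] not in (None, NOT_APPLICABLE)]
    out = []
    for r in rows:
        new = dict(r)
        if new[var] is None:
            new[var] = rng.choice(donors)
        out.append(new)
    return out

rows = [
    {"has_kids": 1, "pta_member": 1},
    {"has_kids": 1, "pta_member": None},  # not reported: will be imputed
    {"has_kids": 0, "pta_member": None},  # not applicable: must stay unimputed
]
marked = mark_not_applicable(rows, "pta_member", lambda r: r["has_kids"] == 1)
done = impute_soft_missing(marked, "pta_member")
```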

Also, for values that are missing because they do not exist (e.g., Father's Education when there is no father), Allison says you can use substituted (plugged-in) values plus missing-data indicators. See the following, especially p. 5:

https://www3.nd.edu/~rwilliam/stats3/MD01.pdf
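The plug-in-plus-indicator approach can be illustrated with ordinary least squares. In the sketch below (Python with NumPy; the data and function name are hypothetical), nonexistent values are replaced by an arbitrary constant and a 0/1 indicator for "value does not exist" enters the model. In this simple model with no other covariates, the slope on `x` equals the slope fitted among respondents who have the value, whatever constant is plugged in, so an arbitrary code such as the -9 mentioned earlier in the thread only shifts the indicator's coefficient. With additional covariates the slope is no longer a pure within-group estimate, because the other coefficients are shared across groups.

```python
import numpy as np

def fit_with_indicator(y, x, has_x, plug=0.0):
    """OLS of y on x with a plugged-in constant and a nonexistence indicator.

    Where has_x is False, x does not exist; it is replaced by `plug`, and a
    0/1 indicator (1 = no value) is added to the model.
    Returns (intercept, slope on x, indicator coefficient).
    """
    y = np.asarray(y, dtype=float)
    has_x = np.asarray(has_x, dtype=bool)
    x_filled = np.where(has_x, np.asarray(x, dtype=float), plug)
    nonexistent = (~has_x).astype(float)
    X = np.column_stack([np.ones_like(y), x_filled, nonexistent])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical data: the last two respondents have no children, so x does not exist.
y = [1, 3, 4, 7, 2, 5]
x = [1, 2, 3, 4, np.nan, np.nan]
has_x = [True, True, True, True, False, False]

b0 = fit_with_indicator(y, x, has_x, plug=0.0)
b9 = fit_with_indicator(y, x, has_x, plug=-9.0)
# Slope on x is about 1.9 in both fits; only the indicator coefficient changes.
```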

-------------------------------------------
Richard Williams
Professor Emeritus of Sociology
University of Notre Dame
StataNow Version: 19.5 MP (2 processor)
WWW: https://academicweb.nd.edu/~rwilliam/
SeungYong Han

Join Date: Jul 2015

Posts: 53
#7

29 Oct 2019, 10:35

This really helps. Thank you so much!
daniel klein

Join Date: Mar 2014

Posts: 3912
#8

29 Oct 2019, 11:13

I am sorry, I do not understand this. Why do you think you need to apply MI to the full sample? If you had no missing data, you would run your analyses separately for the two subsamples. I do not see why you cannot do the same with MI.

Best
Daniel