Clustering with imputation

Garrett Todd

#1


26 Aug 2022, 16:19

I am using a dataset with a school-based sampling design, so I want to cluster on school ID in my models. However, I am also using multiple imputation. Do I need to include the cluster ID variable in the imputation model, or can I just specify it in the subsequent analysis models? If it does need to be in the imputation model, how do I include it, given that it is stored as a string in my dataset? I don't think it would make sense to destring it and then convert it back to a string after imputation, would it?

Currently, without the school cluster ID in the imputation model, I have something like this:

Code:

mi set mlong
mi register imputed y x1 x2 x3
mi impute chained (regress) x1 x2 x3 = y, add(5) rseed(100)
mi estimate: regress y x1 x2 x3, vce(cluster school_id)
Rich Goldstein

#2

26 Aug 2022, 19:01

The following FAQ is relevant: https://www.stata.com/support/faqs/s...and-mi-impute/
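A minimal sketch of one approach to the string-ID question: enter the cluster as a set of indicator predictors in the imputation model, and cluster the standard errors in the analysis model. It runs on simulated data; the data-generating lines, the sample sizes, and the new variable `school_num` are hypothetical, not from the original post. Because `encode` keeps the original strings as value labels on a new numeric variable, nothing has to be destringed and converted back afterwards. With many small clusters this adds many predictors, so it is only practical when the number of clusters is modest.

```stata
* Hypothetical simulated data: 200 students in 40 schools, string school ID
clear
set seed 12345
set obs 200
generate str8 school_id = "S" + string(ceil(_n/5))
generate x1 = rnormal()
generate x2 = rnormal()
generate x3 = rnormal()
generate y  = 1 + x1 + x2 + x3 + rnormal()

* Introduce missing values in the covariates only (y stays complete)
replace x1 = . if runiform() < 0.1
replace x2 = . if runiform() < 0.1
replace x3 = . if runiform() < 0.1

* encode creates a labeled numeric copy; school_id itself is left untouched
encode school_id, generate(school_num)

mi set mlong
mi register imputed x1 x2 x3
mi register regular y school_num

* School indicators (i.school_num) enter the imputation model as predictors
mi impute chained (regress) x1 x2 x3 = y i.school_num, add(5) rseed(100)

* Clustering is also handled in the analysis model
mi estimate: regress y x1 x2 x3, vce(cluster school_num)
```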
Garrett Todd

#3

26 Aug 2022, 20:22

Thanks, Rich Goldstein. I'm a bit stuck, though: I have far too many clusters for option 1, and the clusters are too small for option 2. Option 3 seems best suited to repeated-measures data (as the FAQ specifies), whereas my data are cross-sectional. Moreover, many of my clusters contain only one, two, or three observations, so I'm not sure how imputation within a cluster of one or two observations would even work.

Additionally, I have many variables and many observations, and many of the variables are continuous, so, theoretical concerns aside, reshaping the data wide would (I think) create tens of thousands of new variables. That doesn't seem practical. But I will take a closer look in case I am misunderstanding something.

Last edited by Garrett Todd; 26 Aug 2022, 20:41.
