
  • Clustering with imputation

    I am using a dataset with a school-based sampling design, so I want to cluster on school ID in my models. However, I'm also using multiple imputation. Do I need to include the cluster ID variable in the imputation model, or can I just specify it in the models I estimate afterward? And if I do need to include it, how do I do that given that it is stored in my dataset as a string? I don't think it would make sense to destring it and then convert it back to a string after imputation, would it?

    As of now, without the school cluster id included in the imputation model, I have something like:

    Code:
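    * declare the mi storage style and the variables to be imputed
    mi set mlong
    mi register imputed y x1 x2 x3
    * impute x1-x3 by chained equations, with y as a predictor
    mi impute chained (regress) x1 x2 x3 = y, add(5) rseed(100)

    * pooled estimates with standard errors clustered on school
    mi estimate: regress y x1 x2 x3, vce(cluster school_id)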

  • #2
    The following FAQ is relevant: https://www.stata.com/support/faqs/s...and-mi-impute/
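
    On the string question, a minimal sketch of one way to put cluster indicators into the imputation model without destringing. It assumes school_id is the string cluster variable and y has no missing values; school_num is a hypothetical new variable name, not something in the original dataset. Whether this is practical depends on how many clusters there are and how large they are.

    ```stata
    * Sketch only: school_num is a hypothetical name for a numeric copy of the id
    * egen's group() accepts string variables, so no destring is needed
    egen long school_num = group(school_id)

    mi set mlong
    mi register imputed x1 x2 x3
    * assumption: y and school_num are complete, so they are registered as regular
    mi register regular y school_num

    * cluster indicators enter the imputation model as factor variables
    mi impute chained (regress) x1 x2 x3 = y i.school_num, add(5) rseed(100)

    * the analysis model clusters on the same numeric id
    mi estimate: regress y x1 x2 x3, vce(cluster school_num)
    ```

    The numeric id is created before mi set so that it exists in the original data and carries over to every imputation; the string school_id stays in the dataset unchanged.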



    • #3
      Thanks, Rich Goldstein. I'm a bit stuck, though: I have far too many clusters for option 1, and the clusters are too small for option 2. Option 3 seems best suited to repeated-measures data (as it specifies), and my data are cross-sectional. Plus, many of my clusters contain just 1 (or 2 or 3) observations, so I'm not sure how imputation within a cluster of 1 or 2 would work.

      Additionally, I have a lot of variables and a lot of observations, and many of my variables are continuous, so, theoretical concerns aside, reshaping the data wide would (I think) result in tens of thousands of new variables. That doesn't seem practical, but I will take a closer look in case I am misunderstanding.
      Last edited by Garrett Todd; 26 Aug 2022, 20:41.
