  • Multiple imputation confusion

    Hello, I was hoping that someone could explain something going on behind the scenes with -mi- that I'm not sure I understand. I have a dataset where "treatment_var" is my main predictor variable, and it has no missing values so I do not want/need to impute that variable at all. This is just a 0/1 variable, with 743 total observations (0 = 400, 1 = 343).

    This is the code I'm using for doing mi, which seemingly works perfectly:

    Code:
    mi set mlong
    
    mi register imputed depression race gender gpa p_educ age
    
    mi impute chained (regress) p_educ depression gpa age (logit) gender (ologit) race = treatment_var, add(20) rseed(100)
    
    mi estimate: regress depression i.treatment_var i.race i.gender gpa c.age#c.age p_educ, robust

    When I run the -mi estimate: regress- command, "Number of obs" is 743, which is my original number of observations, so that seems to make sense. But if I then run -tab treatment_var- on this imputed dataset, there are something like 3,000 total observations.

    But I thought I was telling Stata not to impute that variable, as it has no missings, and indeed it seems like the actual regression output itself still has the correct original number of observations.

    Am I just overlooking something? What is happening with what that -tab- is showing me?

    Sorry for not providing data here, it is on a different server that I cannot access at the moment. Hopefully the question will still be clear otherwise.

  • #2
    Your dataset has had observations added to it, based on the observations that had missing values. Open your dataset in the Data Editor window and scroll down past the initial 743 observations. You will see observations with _mi_m = 1 and higher; these are copies of the original observations with missing values (matched by _mi_id), with the missing values filled in. Compare them to the original observation, which has _mi_m = 0.
    Code:
    webuse mheart5
    mi set mlong
    mi register imputed age bmi
    set seed 29390
    mi impute mvn age bmi = attack smokes hsgrad female, add(10)
    list if _mi_id==14, clean
    Code:
    . list if _mi_id==14, clean
    
           attack   smokes        age        bmi   female   hsgrad   _mi_m   _mi_id   _mi_miss  
     14.        0        0          .          .        0        1       0       14          1  
    156.        0        0   38.57524   31.18536        0        1       1       14          .  
    184.        0        0   58.34894   19.88316        0        1       2       14          .  
    212.        0        0   68.46573   22.72963        0        1       3       14          .  
    240.        0        0   48.14063   25.00218        0        1       4       14          .  
    268.        0        0   66.52374   24.27379        0        1       5       14          .  
    296.        0        0   44.67178   23.36431        0        1       6       14          .  
    324.        0        0   60.70895    19.9942        0        1       7       14          .  
    352.        0        0   73.13823   25.92297        0        1       8       14          .  
    380.        0        0   63.83153   24.19435        0        1       9       14          .  
    408.        0        0   50.23097   28.49728        0        1      10       14          .
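
    To see where the extra rows come from without scrolling through the Data Editor, the counts can be tabulated by imputation number. This is only a sketch on the same mheart5 example; -smokes- stands in here for a fully observed variable such as -treatment_var-.

    ```stata
    webuse mheart5, clear
    mi set mlong
    mi register imputed age bmi
    set seed 29390
    mi impute mvn age bmi = attack smokes hsgrad female, add(10)

    * _mi_m == 0 marks the original rows; m = 1..10 hold copies of incomplete rows only
    tabulate _mi_m

    * Restrict to the original data to recover the original tabulation
    tabulate smokes if _mi_m == 0

    * Summary of the mi layout: style, imputations, registered variables
    mi describe
    ```

    With the data from post #1, the analogous check would be -tabulate treatment_var if _mi_m == 0-, which should return the original 743 observations.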


    • #3
      Ah, I think this makes more sense. So the non-missing -treatment_var- is not being imputed; there are just "new" observations that carry the imputed values of the other variables? And when I run -mi estimate: regress-, is it in a sense collapsing those back into the original number of observations, which is why I still see 743 in the regression output?


      • #4
        Your understanding is correct, and I owe you an apology for omitting a critical reference from post #2. As I drafted post #2, I had included a recommendation that you read the discussion at
        Code:
        help mi##example
        to understand the spirit of mi. Somehow that sentence fell off my screen, and I didn't notice the loss.

        But on looking further, the real advice for understanding mi is to look at the Stata Multiple-Imputation Reference Manual PDF included in your Stata installation and accessible from Stata's Help menu. The very first section, although forbiddingly titled "Intro substantive", is in fact an overview of the substance of multiple imputation, a sort of prerequisite reading before leaping into the documentation for the commands.
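
        To make the "collapsing" intuition concrete: -mi estimate- fits the model once on each completed dataset, each with the original number of observations, and then pools the results with Rubin's combination rules. A minimal sketch on the mheart5 example from post #2 (the model here is only illustrative) uses -mi xeq- to look at individual imputations:

        ```stata
        webuse mheart5, clear
        mi set mlong
        mi register imputed age bmi
        set seed 29390
        mi impute mvn age bmi = attack smokes hsgrad female, add(10)

        * Each completed dataset has the full original sample size
        mi xeq 1 2: count

        * Fit the model separately on two imputations to see per-imputation estimates
        mi xeq 1 2: logistic attack smokes bmi female hsgrad age

        * mi estimate repeats this for all 10 imputations and pools the results
        mi estimate: logistic attack smokes bmi female hsgrad age
        ```

        The per-imputation fits differ slightly from one another; the pooled table reports their average, with standard errors that also reflect the between-imputation variation.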


        • #5
          Anne:
          as an aside to William's helpful advice, why did you code your interaction as:
          Code:
          c.age#c.age
          instead of:
          Code:
          c.age##c.age
          ?
          Borrowing William's code, in the following toy example neither the linear nor the squared term for -age- reaches statistical significance (which is neither a good nor a bad finding), but I would investigate a possible turning point in your dataset:
          Code:
          . mi estimate: logistic attack smokes bmi female hsgrad c.age##c.age
          
          Multiple-imputation estimates                   Imputations       =         10
          Logistic regression                             Number of obs     =        154
                                                          Average RVI       =     0.0803
                                                          Largest FMI       =     0.2618
          DF adjustment:   Large sample                   DF:     min       =     142.24
                                                                  avg       =  13,394.40
                                                                  max       =  56,532.86
          Model F test:       Equal FMI                   F(   6, 6678.2)   =       2.84
          Within VCE type:          OIM                   Prob > F          =     0.0093
          
          ------------------------------------------------------------------------------
                attack |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                smokes |   1.206959   .3659681     3.30   0.001     .4895212    1.924398
                   bmi |   .1110131   .0520222     2.13   0.035     .0081766    .2138496
                female |   -.048182   .4162727    -0.12   0.908    -.8640789     .767715
                hsgrad |   .1854987   .4077014     0.45   0.649    -.6136161    .9846135
                   age |   .0987561   .1178324     0.84   0.402    -.1323856    .3298978
                       |
           c.age#c.age |  -.0005969   .0010289    -0.58   0.562    -.0026148    .0014211
                       |
                 _cons |  -7.257825   3.892616    -1.86   0.063    -14.91629    .4006389
          ------------------------------------------------------------------------------
          
          .
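
          For a quadratic in -age-, the turning point on the log-odds scale sits at minus the linear coefficient divided by twice the squared coefficient. A rough sketch using the point estimates copied from the table above (the scalar names are just for illustration, and given the large p-values the result is purely illustrative):

          ```stata
          * Coefficients copied from the mi estimate table above
          scalar b_age  = .0987561
          scalar b_age2 = -.0005969

          * Vertex of b_age*age + b_age2*age^2 on the log-odds scale
          scalar turning_point = -b_age / (2 * b_age2)
          display "Estimated turning point (years of age): " %6.1f turning_point
          ```

          This works out to roughly 83 years. It is worth checking whether that falls inside the observed range of -age- (for example with -mi xeq 0: summarize age-); the imputed ages listed in post #2 run from about 39 to 73, so here the turning point may well lie outside the data.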
          Kind regards,
          Carlo
          (Stata 19.0)


          • #6
            Thank you, William Lisowski, I appreciate the help very much.


            • #7
              Carlo Lazzaro you are correct that it should be ##, as I want to allow age to be a squared term in the model...I just mis-typed it in the original post.
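
              For anyone reading later, the difference between the two operators can be checked on the mheart5 example from post #2. This is only a sketch: the matrix names are made up for illustration.

              ```stata
              webuse mheart5, clear
              mi set mlong
              mi register imputed age bmi
              set seed 29390
              mi impute mvn age bmi = attack smokes hsgrad female, add(10)

              * ## expands to the main effect plus the interaction: age and age squared
              mi estimate: logistic attack smokes bmi female hsgrad c.age##c.age
              matrix b_full = e(b_mi)

              * Equivalent model with both terms written out
              mi estimate: logistic attack smokes bmi female hsgrad c.age c.age#c.age
              matrix b_long = e(b_mi)

              * A single # keeps only the squared term and drops the linear age term
              mi estimate: logistic attack smokes bmi female hsgrad c.age#c.age
              ```

              The first two commands should give the same coefficients, while the last one fits a different model with no linear -age- term.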


              • #8
                Anne:
                thanks for clarifying.
                Kind regards,
                Carlo
                (Stata 19.0)
