  • MI Impute

    Posting again as I did not receive any helpful suggestions on my previous post. I have imputed and registered data for a dataset with a lot of missing values. When I run my mi estimate: regress (or mi estimate: xtreg) commands, the Stata output only provides results for a subset of my data, not the newly imputed data. The commands and results are below. Any assistance is appreciated.

    Code:
     mi query
    mi set mlong
    quietly misstable summarize log_accessions diff_minority deprivationindex pctbachelors pctmasters pctdoctorate pctturnover unemploymentrate15_oecd, generate (miss_)
    describe miss_*
    mi register imputed log_accessions diff_minority pct_maori deprivationindex pctbachelors pctmasters  pctdoctorate pctturnover unemploymentrate15_oecd
    mi register regular employmentrate15_oecd mean_earn_nzstat
    mi impute mvn log_accessions pct_maori pctbachelors pctmasters pctdoctorate deprivationindex pctturnover unemploymentrate15_oecd, add(20) rseed(1234)
    mi estimate, saving(olsest2): reg $ylist diff_minority deprivationindex pctbachelors pctmasters pctdoctorate pctturnover unemploymentrate15_oecd policyscore##funding##postcanterbury
    HTML Code:
     mi estimate, saving (ols): reg $ylist diff_minority deprivationindex pctbachelors pctmasters pctdoctorate pctturnover unemploymentrate15_oecd policyscore#funding#postcanterbury
    
    Multiple-imputation estimates                   Imputations       =         20
    Linear regression                               Number of obs     =         26
                                                    Average RVI       =     0.1554
                                                    Largest FMI       =     0.4478
                                                    Complete DF       =         15
    DF adjustment:   Small sample                   DF:     min       =       7.57
                                                            avg       =      11.25
                                                            max       =      13.28
    Model F test:       Equal FMI                   F(  10,   12.7)   =       4.20
    Within VCE type:          OLS                   Prob > F          =     0.0096
    
    ----------------------------------------------------------------------------------------------------
                        log_accessions |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -----------------------------------+----------------------------------------------------------------
                         diff_minority |  -.3833525   .8673539    -0.44   0.666    -2.253172    1.486467
                      deprivationindex |   .0433965   .0156211     2.78   0.015     .0096986    .0770944
                          pctbachelors |  -.0136013   .0548682    -0.25   0.811    -.1413866     .114184
                            pctmasters |   .1554737   .1071908     1.45   0.175    -.0809909    .3919383
                          pctdoctorate |  -.1303299   .1012635    -1.29   0.225    -.3540136    .0933537
                           pctturnover |  -.0194386   .0758147    -0.26   0.802    -.1843108    .1454336
               unemploymentrate15_oecd |  -.0868002   .1196383    -0.73   0.483    -.3495745    .1759741
                                       |
    policyscore#funding#postcanterbury |
                                0 0 1  |          0  (omitted)
                                0 1 0  |   .6229431   .4451461     1.40   0.185    -.3374752    1.583361
                                0 1 1  |          0  (omitted)
                                1 0 0  |          0  (empty)
                                1 0 1  |   .0101002   .4038377     0.03   0.980    -.8671615     .887362
                                1 1 0  |          0  (empty)
                                1 1 1  |   .6662499   .5448059     1.22   0.246     -.525413    1.857913
                                       |
                                 _cons |   10.27254   2.496735     4.11   0.004     4.482924    16.06216
    ----------------------------------------------------------------------------------------------------
    Code:
    . mi describe
    
      Style:  mlong
              last mi update 15sep2020 10:44:27, approximately 1 minute ago
    
      Obs.:   complete           24
              incomplete        144  (M = 20 imputations)
              ---------------------
              total             168
    
      Vars.:  imputed:  9; log_accessions(24) diff_minority(142) deprivationindex(126) pctbachelors(142) pctmasters(142)
                        pctdoctorate(142) pctturnover(24) unemploymentrate15_oecd(14) pct_maori(142)
    
              passive:  0
    
              regular:  2; employmentrate15_oecd mean_earn_nzstat
    
              system:   3; _mi_m _mi_id _mi_miss
    
             (there are 67 unregistered variables)
    HTML Code:
     . sum $xlist $ylist
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
    diff_minor~y |      2,906    .6226473    .2442557  -.2052834   1.466744
    deprivatio~x |      2,922    6.849442    9.205344  -24.29531   42.14392
    pctbachelors |      2,906    45.54763    5.907764   20.81843    69.3688
      pctmasters |      2,906    7.798592    2.991676  -4.717332   19.29404
    pctdoctorate |      2,906    2.407225    2.249541  -6.408797   11.34198
    -------------+---------------------------------------------------------
     pctturnover |      3,024    16.32659    2.005034    10.6232    22.9954
    unemployme~d |      3,034    5.340202     1.58543  -.0036189   10.17787
     policyscore |      3,048    .5964567    .4906884          0          1
         funding |      3,048    .0695538    .2544353          0          1
    postcanter~y |      3,048    .6929134    .4613613          0          1
    -------------+---------------------------------------------------------
    log_access~s |      3,024    9.936792    .6676036   7.808344   13.75531

  • #2
    You are probably referring to this post, which (together with this post of yours) suggests that the problem lies in the imputation of the data. Show us your imputation steps, including both commands and output.

    My guess is that you are using the force option (which you should never ever use) during the imputation, leading to missing imputed values.

    Moreover, from what you show here, you have 26 complete observations to which you are trying to fit a model with 7 predictors and a three-way interaction (of which you omit the lower-order terms, which is probably not the correct way to achieve what you want). This is a fairly complicated model (which you should get right in the imputation step, too) and you might want to discuss the research questions that you are trying to answer; perhaps in a separate thread.
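    To illustrate the point about lower-order terms, here is a minimal sketch using Stata's shipped auto dataset (purely illustrative; none of these are the poster's variables): the ## operator expands to the main effects plus the interaction, while # fits only the interaction cells.

    ```stata
    * Illustrative only (built-in auto data):
    sysuse auto, clear
    * a##b expands to i.a i.b a#b -- main effects plus interaction
    regress price c.mpg foreign##rep78
    * a#b alone fits the interaction cells without the main effects
    regress price c.mpg foreign#rep78
    ```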

    Comment


    • #3
      Trying this again. You are correct, I did use force previously. I changed this line of code, but the result is unfortunately still the same. Here is each command I've run as well as the corresponding output. I am modeling the impact of minority population concentration, education, turnover, funding (dummy), policy creation (dummy), post-Canterbury (dummy), and unemployment on the log of accessions (i.e., new hiring activity). The untransformed data looks like this:
      HTML Code:
      sum $ylist $xlist
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
      log_access~s |        144    9.937673    .6645318   8.783013   11.66515
      sqdiff_min~y |         26    .4267783    .2059232   .0128739   .7321631
         pct_maori |         26    .1822573    .1052737   .0633117   .4508969
      deprivatio~x |         42    7.071929    8.289129        .21      34.79
      pctbachelors |         26    46.00424    4.241157   37.07946   56.49868
      -------------+---------------------------------------------------------
        pctmasters |         26    7.814456    2.111405   4.385965   12.37723
      pctdoctorate |         26    2.382557    1.310756   .9233611   5.045782
       pctturnover |        144     16.3684    2.021255       13.2     20.575
      unemployme~d |        154    5.318182    1.585121        2.5        8.8
       policyscore |        168    .5833333    .4944805          0          1
      -------------+---------------------------------------------------------
           funding |        168    .0714286    .2583093          0          1
      postcanter~y |        168    .6666667    .4728138          0          1
      Then I run
      Code:
      mi query
      Then,
      Code:
      mi set flong
      Then,
      Code:
      misstable patterns $ylist $xlist
      HTML Code:
      misstable patterns $ylist $xlist
      
                   Missing-value patterns
                     (1 means complete)
      
                    |   Pattern
          Percent   |  1  2  3  4    5  6  7  8    9
        ------------+--------------------------------
             14%    |  1  1  1  1    1  1  1  1    1
                    |
             64     |  1  1  1  0    0  0  0  0    0
             11     |  1  0  0  0    0  0  0  0    0
              7     |  0  1  1  1    0  0  0  0    0
              1     |  0  0  0  1    0  0  0  0    0
              1     |  1  0  0  1    0  0  0  0    0
              1     |  1  0  0  1    1  1  1  1    1
        ------------+--------------------------------
            100%    |
      
        Variables are  (1) unemploymentrate15_oecd  (2) log_accessions  (3) pctturnover  (4) deprivationindex  (5) pct_maori
                       (6) pctbachelors  (7) pctdoctorate  (8) pctmasters  (9) sqdiff_minority
      Then I register the following variables as imputed: log_accessions diff_minority pct_maori deprivationindex pctbachelors pctmasters pctdoctorate pctturnover unemploymentrate15_oecd
      Code:
      mi register imputed log_accessions diff_minority pct_maori deprivationindex pctbachelors pctmasters  pctdoctorate pctturnover unemploymentrate15_oecd
      Then I registered two regular variables which are not included in the analysis, but this seems to be common practice for an MI analysis:
      Code:
      mi register regular employmentrate15_oecd mean_earn_nzstat
      Next, I run mi impute
      Code:
      mi impute chained (regress) log_accessions diff_minority pct_maori deprivationindex pctbachelors pctmasters pctdoctorate pctturnover unemploymentrate15_oecd,  add(20) rseed(1234)
      HTML Code:
      Performing chained iterations ...
      
      Multivariate imputation                     Imputations =       20
      Chained equations                                 added =       20
      Imputed: m=1 through m=20                       updated =        0
      
      Initialization: monotone                     Iterations =      200
                                                      burn-in =       10
      
          log_accessions: linear regression
           diff_minority: linear regression
               pct_maori: linear regression
          deprivationi~x: linear regression
            pctbachelors: linear regression
              pctmasters: linear regression
            pctdoctorate: linear regression
             pctturnover: linear regression
          unemployment~d: linear regression
      
      ------------------------------------------------------------------
                         |               Observations per m             
                         |----------------------------------------------
                Variable |   Complete   Incomplete   Imputed |     Total
      -------------------+-----------------------------------+----------
          log_accessions |        144           24        24 |       168
           diff_minority |         26          142       142 |       168
               pct_maori |         26          142       142 |       168
          deprivationi~x |         42          126       126 |       168
            pctbachelors |         26          142       142 |       168
              pctmasters |         26          142       142 |       168
            pctdoctorate |         26          142       142 |       168
             pctturnover |        144           24        24 |       168
          unemployment~d |        154           14        14 |       168
      ------------------------------------------------------------------
      (complete + incomplete = total; imputed is the minimum across m
       of the number of filled-in observations.)
      Then, I xtset the data
      Code:
      mi xtset $t $id
      HTML Code:
       panel variable:  year (strongly balanced)
              time variable:  region, 1 to 14
                      delta:  1 unit
      After that, I finally run the Fixed Effects model:
      Code:
      mi estimate: xtreg $ylist $xlist, fe i(region)
      HTML Code:
      Multiple-imputation estimates                   Imputations       =         20
      Fixed-effects (within) regression               Number of obs     =         26
      
      Group variable: region                          Number of groups  =         13
                                                      Obs per group:
                                                                    min =          2
                                                                    avg =        2.0
                                                                    max =          2
                                                      Average RVI       =     0.7426
                                                      Largest FMI       =     0.8234
                                                      Complete DF       =          3
      DF adjustment:   Small sample                   DF:     min       =       0.53
                                                              avg       =       1.20
                                                              max       =       1.85
      Model F test:       Equal FMI                   F(  10,    0.6)   =       0.71
      Within VCE type: Conventional                   Prob > F          =     0.7661
      
      -----------------------------------------------------------------------------------------
               log_accessions |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      ------------------------+----------------------------------------------------------------
              sqdiff_minority |  -.3556701   3.698341    -0.10   0.934    -20.33893    19.62759
                    pct_maori |  -4.850423   20.92281    -0.23   0.847    -155.6207    145.9198
             deprivationindex |   .0268486   .2832343     0.09   0.942    -5.610964    5.664661
                 pctbachelors |   .0059091   .1647776     0.04   0.978     -2.83083    2.842649
                   pctmasters |  -.0510626   .1601838    -0.32   0.782    -.7953342     .693209
                 pctdoctorate |   .0419083   .1473924     0.28   0.805    -.6517255     .735542
                  pctturnover |   .0778184   .3257464     0.24   0.872    -37.91117    38.06681
      unemploymentrate15_oecd |  -.0230748   .1231354    -0.19   0.871    -.6599563    .6138067
                    1.funding |          0  (omitted)
                1.policyscore |   .1958001   1.644004     0.12   0.930    -56.72888    57.12048
                              |
          funding#policyscore |
                         1 1  |   .1212798   .4652735     0.26   0.833    -4.064287    4.306847
                              |
               postcanterbury |          0  (omitted)
                        _cons |   9.519705   13.79675     0.69   0.646    -409.3752    428.4146
      ------------------------+----------------------------------------------------------------
                      sigma_u |  2.2232374
                      sigma_e |  .08667858
                          rho |  .99848228   (fraction of variance due to u_i)
      -----------------------------------------------------------------------------------------
      Note: sigma_u and sigma_e are combined in the original metric.
      As you can see, the total number of imputations is correct, but the number of observations is not. Here's the summary of $ylist and $xlist
      HTML Code:
       sum $ylist $xlist
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
      log_access~s |      3,504    9.937929    .6682812   7.310234   11.82433
      sqdiff_min~y |        546    .4267783    .2021095   .0128739   .7321631
         pct_maori |      3,386    .1877219    .1292799  -.3642823   .7171209
      deprivatio~x |      3,402    7.159244     9.25423  -29.52997   42.83181
      pctbachelors |      3,386    45.71936    5.349254    26.1667   66.19154
      -------------+---------------------------------------------------------
        pctmasters |      3,386    7.848877    2.666734  -2.490291   18.43071
      pctdoctorate |      3,386     2.34551    2.007263  -6.392637   10.67832
       pctturnover |      3,504    16.36014     2.02376   10.39309   24.29137
      unemployme~d |      3,514     5.31252    1.582666   .7636836   9.752773
      -------------+---------------------------------------------------------
         1.funding |      3,528    .0714286    .2575759          0          1
      1.policysc~e |      3,528    .5833333    .4930765          0          1
                   |
           funding#|
       policyscore |
              1 1  |      3,528    .0416667    .1998546          0          1
                   |
      postcanter~y |      3,528    .6666667    .4714713          0          1
      Hopefully listing out all of my steps helps to find the problem. For what it's worth, if I run a regular pooled regression without using the mi command, everything works just fine.
      Code:
      reg $ylist $xlist, vce(robust)
      HTML Code:
      Linear regression                               Number of obs     =      3,384
                                                      F(12, 3371)       =     325.61
                                                      Prob > F          =     0.0000
                                                      R-squared         =     0.5576
                                                      Root MSE          =     .44543
      
      -----------------------------------------------------------------------------------------
                              |               Robust
               log_accessions |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      ------------------------+----------------------------------------------------------------
                diff_minority |   .4157945   .2257181     1.84   0.066    -.0267637    .8583526
                    pct_maori |   1.004239   .4267461     2.35   0.019     .1675318    1.840947
             deprivationindex |   .0371159   .0012098    30.68   0.000     .0347439     .039488
                 pctbachelors |  -.0122473   .0026935    -4.55   0.000    -.0175283   -.0069663
                   pctmasters |    .092305   .0068401    13.49   0.000     .0788938    .1057162
                 pctdoctorate |  -.0719911   .0059165   -12.17   0.000    -.0835914   -.0603908
                  pctturnover |  -.0412018   .0057798    -7.13   0.000    -.0525342   -.0298695
      unemploymentrate15_oecd |  -.0752735    .008427    -8.93   0.000    -.0917962   -.0587509
                    1.funding |    .349902   .0354199     9.88   0.000     .2804555    .4193486
                1.policyscore |  -.1009608    .028314    -3.57   0.000    -.1564751   -.0454464
                              |
          funding#policyscore |
                         1 1  |   .0362795   .0450584     0.81   0.421     -.052065     .124624
                              |
               postcanterbury |   .0290124   .0302354     0.96   0.337    -.0302692    .0882939
                        _cons |   10.31721   .2529325    40.79   0.000     9.821296    10.81313
      ----------------------------------------------------------------------------------------

      Comment


      • #4
        I'm not sure that this is causing your problem, but, while it is on rare occasions reasonable to have your time variable as your panel and your id as your "time" variable, I see nothing here that would make that sensible, so try switching them. Also, why the "i(region)" option? What does that mean?

        Comment


        • #5
          Rich Goldstein The i(region) option is used to set the panel variable for an -xt- command. You can use it when you haven't done -xtset- (nor -tsset-) or you can use it to override the specification made in -xtset-. So, notwithstanding having set the time as the panel variable in -xtset-, the regression actually uses region as the panel.

          It doesn't seem to be in the documentation for current Stata. I remember it from a very long time ago, and, I guess it still works. I've never used it myself.
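          A minimal sketch of what that means in practice (y and x are hypothetical placeholders; i() appears to be old syntax that has dropped out of the current documentation):

          ```stata
          * Hypothetical sketch: i() overrides the panel variable set by -xtset-
          xtset year region          // declares year as the panel variable
          xtreg y x, fe i(region)    // ...but the regression treats region as the panel
          ```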

          Comment


          • #6
            You are using multiple imputation that assumes data are missing at random, and fixed effects regression which applies statistical inference to estimate population values based on a random sample. I don't know where your data come from, but I think a much better route for you would be to do something descriptive or qualitative to answer your research questions. Additionally, I do not think you have a large enough sample for fixed effects regression. It is pretty inefficient, so even if your data were perfect I don't think you would have enough power to detect an effect.

            Comment


            • #7
              Thanks for providing the complete output. Rich has already pointed to one potential problem with xtset-ting the data.

              From a quick glance, I can see that you include the variable sqdiff_minority in your analysis model but not in your imputation model. The output from summarize suggests that this variable has many missing values, which may be one cause of the technical problem that you are facing. If sqdiff_minority is the squared term for diff_minority, you must include both the lower-order term and the squared term in the imputation model (and, of course, in the analysis model). Do not create the squared term only after imputation.
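              A hedged sketch of that advice, using variable names from the thread (the full variable list is abbreviated here): treat the squared term as "just another variable" to be imputed alongside its lower-order term, rather than recreating it from imputed diff_minority afterwards.

              ```stata
              * Sketch (abbreviated variable list): impute the squared term
              * together with its lower-order term.
              mi register imputed diff_minority sqdiff_minority log_accessions
              mi impute chained (regress) diff_minority sqdiff_minority log_accessions, ///
                  add(20) rseed(1234)
              ```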

              Also, as I have mentioned earlier, if you are going to use interaction effects in your substantive model, you must somehow account for the interaction effect in the imputation model. You also want to account for the nested (i.e., panel) structure of the data during imputation.
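              One possible sketch of this (my assumption, not something spelled out in the thread): pass the complete dummies and panel indicators to mi impute chained after the = sign, so they enter every imputation equation as predictors and the imputation model is at least as rich as the analysis model.

              ```stata
              * Sketch (abbreviated variable list): complete variables after "="
              * act as predictors in every equation. With only 26 complete
              * observations, i.region may well not be estimable in practice.
              mi impute chained (regress) log_accessions diff_minority sqdiff_minority ///
                  = i.policyscore i.funding i.postcanterbury i.region, ///
                  add(20) rseed(1234)
              ```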

              Unless you address these (and potentially more) points, I agree with Tom that you might accidentally make things worse by using MI. We are happy to assist in getting things right as long as you keep providing relevant information.


              After having edited this post several times now, I note that (a) I am still tired (it is pretty early here), and (b) there seems to be a mix of technical and conceptual confusion. We should address both the technical problems and the conceptual issues.
              Last edited by daniel klein; 15 Sep 2020, 22:20.

              Comment


              • #8
                Clyde Schechter Thank you

                Comment


                • #9
                  I did go back and re-impute the sqdiff_minority variable. The same issue occurs whether or not the sqdiff_minority and diff_minority variables are imputed prior to running the analysis. The same thing still happens when the xtset is reversed.

                  As for the data: the data come from NZStat, which is the NZ Census. They only collect certain data in five-year increments, but the OECD provides certain (economic) data annually, so the data aren't MAR or MCAR; it's actually the full data on all regions in New Zealand for this 10-year period (maybe I understood your question wrong). What I am interested in is the effect of socio-economic variation on job accessions in the period leading up to and following the Canterbury earthquakes. I've used the same modeling techniques for the other chapters of a book I'm writing, so I don't think my model is theoretically misspecified. I'm not sure if I need to impute the dummy variables (postcanterbury, funding, and policyscore) as they don't have missing data, but I suppose I could try that if there's a substantive reason for it. Nonetheless, here are the results from making the adjustments suggested above Rich Goldstein Clyde Schechter Tom Scott

                  Step 1: Summary Data
                  HTML Code:
                  sum $ylist $xlist
                  
                      Variable |        Obs        Mean    Std. Dev.       Min        Max
                  -------------+---------------------------------------------------------
                  log_access~s |        144    9.937673    .6645318   8.783013   11.66515
                  sqdiff_min~y |         26    .4267783    .2059232   .0128739   .7321631
                  diff_minor~y |         26    .6250489    .1937416   .1134632   .8556653
                     pct_maori |         26    .1822573    .1052737   .0633117   .4508969
                  deprivatio~x |         42    7.071929    8.289129        .21      34.79
                  -------------+---------------------------------------------------------
                  pctbachelors |         26    46.00424    4.241157   37.07946   56.49868
                    pctmasters |         26    7.814456    2.111405   4.385965   12.37723
                  pctdoctorate |         26    2.382557    1.310756   .9233611   5.045782
                   pctturnover |        144     16.3684    2.021255       13.2     20.575
                  unemployme~d |        154    5.318182    1.585121        2.5        8.8
                  -------------+---------------------------------------------------------
                   policyscore |        168    .5833333    .4944805          0          1
                       funding |        168    .0714286    .2583093          0          1
                  postcanter~y |        168    .6666667    .4728138          0          1
                  Step 2: MI Query
                  Code:
                  mi query
                  Step 3: MI Set
                  Code:
                  mi set flong
                  Step 4: Review Patterns of Missingness
                  Code:
                  misstable patterns $ylist $xlist
                  HTML Code:
                   Missing-value patterns
                                  (1 means complete)
                  
                                |   Pattern
                      Percent   |  1  2  3  4    5  6  7  8    9 10
                    ------------+-----------------------------------
                         14%    |  1  1  1  1    1  1  1  1    1  1
                                |
                         64     |  1  1  1  0    0  0  0  0    0  0
                         11     |  1  0  0  0    0  0  0  0    0  0
                          7     |  0  1  1  1    0  0  0  0    0  0
                          1     |  0  0  0  1    0  0  0  0    0  0
                          1     |  1  0  0  1    0  0  0  0    0  0
                          1     |  1  0  0  1    1  1  1  1    1  1
                    ------------+-----------------------------------
                        100%    |
                  
                    Variables are  (1) unemploymentrate15_oecd  (2) log_accessions  (3) pctturnover  (4) deprivationindex  (5) diff_minority
                                   (6) pct_maori  (7) pctbachelors  (8) pctdoctorate  (9) pctmasters  (10) sqdiff_minority
                  Step 5: Register Variables
                  Code:
                  mi register imputed log_accessions  sqdiff_minority diff_minority pct_maori deprivationindex pctbachelors pctmasters  pctdoctorate pctturnover unemploymentrate15_oecd
                  Step 6: Register Regular Variables
                  Code:
                    
                   mi register regular employmentrate15_oecd mean_earn_nzstat
                  Step 7: Impute Variables
                  Code:
                  mi impute chained (regress) log_accessions  sqdiff_minority diff_minority pct_maori deprivationindex pctbachelors pctmasters  pctdoctorate pctturnover unemploymentrate15_oecd,  add(20) rseed(1234)
                  HTML Code:
                  Performing chained iterations ...
                  
                  Multivariate imputation                     Imputations =       20
                  Chained equations                                 added =       20
                  Imputed: m=1 through m=20                       updated =        0
                  
                  Initialization: monotone                     Iterations =      200
                                                                  burn-in =       10
                  
                      log_accessions: linear regression
                      sqdiff_minor~y: linear regression
                       diff_minority: linear regression
                           pct_maori: linear regression
                      deprivationi~x: linear regression
                        pctbachelors: linear regression
                          pctmasters: linear regression
                        pctdoctorate: linear regression
                         pctturnover: linear regression
                      unemployment~d: linear regression
                  
                  ------------------------------------------------------------------
                                     |               Observations per m             
                                     |----------------------------------------------
                            Variable |   Complete   Incomplete   Imputed |     Total
                  -------------------+-----------------------------------+----------
                      log_accessions |        144           24        24 |       168
                      sqdiff_minor~y |         26          142       142 |       168
                       diff_minority |         26          142       142 |       168
                           pct_maori |         26          142       142 |       168
                      deprivationi~x |         42          126       126 |       168
                        pctbachelors |         26          142       142 |       168
                          pctmasters |         26          142       142 |       168
                        pctdoctorate |         26          142       142 |       168
                         pctturnover |        144           24        24 |       168
                      unemployment~d |        154           14        14 |       168
                  ------------------------------------------------------------------
                  (complete + incomplete = total; imputed is the minimum across m
                   of the number of filled-in observations.)
                  Step 8: XTSET Data
                  HTML Code:
                  mi xtset $id $t
                         panel variable:  region (strongly balanced)
                          time variable:  year, 2006 to 2017
                                  delta:  1 unit
                  Step 9: Run Fixed Effects
                  Code:
                  mi estimate: xtreg $ylist $xlist, fe i(region)

                  HTML Code:
                  mi estimate: xtreg $ylist $xlist, fe i(region)
                  
                  Multiple-imputation estimates                   Imputations       =         20
                  Fixed-effects (within) regression               Number of obs     =        168
                  
                  Group variable: region                          Number of groups  =         14
                                                                  Obs per group:
                                                                                min =         12
                                                                                avg =       12.0
                                                                                max =         12
                                                                  Average RVI       =     2.3586
                                                                  Largest FMI       =     0.8141
                                                                  Complete DF       =        143
                  DF adjustment:   Small sample                   DF:     min       =      14.82
                                                                          avg       =      22.37
                                                                          max       =      28.45
                  Model F test:       Equal FMI                   F(  11,   92.1)   =       0.83
                  Within VCE type: Conventional                   Prob > F          =     0.6095
                  
                  -----------------------------------------------------------------------------------------
                           log_accessions |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  ------------------------+----------------------------------------------------------------
                          sqdiff_minority |   .7336144   .6594951     1.11   0.277    -.6307881    2.098017
                            diff_minority |  -1.271836   1.449093    -0.88   0.387    -4.238059    1.694387
                                pct_maori |  -1.066012   2.318821    -0.46   0.650    -5.853775    3.721751
                         deprivationindex |   .0141208   .0067174     2.10   0.047     .0002283    .0280133
                             pctbachelors |  -.0018677   .0136014    -0.14   0.892    -.0301815    .0264461
                               pctmasters |   .0190938   .0323527     0.59   0.561    -.0480029    .0861905
                             pctdoctorate |  -.0148487   .0301435    -0.49   0.627    -.0774679    .0477705
                              pctturnover |  -.0119336   .0536681    -0.22   0.827    -.1264454    .1025782
                  unemploymentrate15_oecd |   -.037254   .0574626    -0.65   0.525    -.1582754    .0837674
                              policyscore |    -.05017   .1165506    -0.43   0.671     -.290348    .1900079
                                  funding |          0  (omitted)
                           postcanterbury |   .0505546   .1208875     0.42   0.679    -.1969391    .2980482
                                    _cons |   10.87243    1.65603     6.57   0.000      7.42648    14.31839
                  ------------------------+----------------------------------------------------------------
                                  sigma_u |  .50411314
                                  sigma_e |  .22240221
                                      rho |  .83707538   (fraction of variance due to u_i)
                  -----------------------------------------------------------------------------------------
                  Note: sigma_u and sigma_e are combined in the original metric.
                  
                  Step 10: Summary of MI Data
Code:
                  sum $ylist $xlist
                  
                      Variable |        Obs        Mean    Std. Dev.       Min        Max
                  -------------+---------------------------------------------------------
                  log_access~s |      3,504     9.93955      .66068   8.123594   12.27341
                  sqdiff_min~y |      3,386    .4261969    .2421935   -.470037   1.374426
                  diff_minor~y |      3,386    .6245527    .2326009   -.279737   1.573217
                     pct_maori |      3,386    .1850628    .1263405  -.2826185   .6675549
                  deprivatio~x |      3,402    7.222229    8.785161  -22.11049   42.14102
                  -------------+---------------------------------------------------------
                  pctbachelors |      3,386     45.6051    5.527129   24.38922   65.56276
                    pctmasters |      3,386    7.704817    2.725792  -3.818673   18.96408
                  pctdoctorate |      3,386     2.32867     2.03177  -5.680584   11.46432
                   pctturnover |      3,504    16.36587    2.016278   11.03838   21.68333
                  unemployme~d |      3,514    5.321049    1.583113   .5403764   9.620695
                  -------------+---------------------------------------------------------
                   policyscore |      3,528    .5833333    .4930765          0          1
                       funding |      3,528    .0714286    .2575759          0          1
                  postcanter~y |      3,528    .6666667    .4714713          0          1
Still stumped. I have used this technique before, following these same steps, and have never had a problem with all observations being used in the estimation. Any additional suggestions are welcome.



                  • #10
I am confused. Perhaps there is another misunderstanding about multiple imputation. The output says that 168 observations are used. That is all the observations. How many observations were you expecting to be used?


                    Edit/Add (hopefully more helpful)

mi impute will create m (here: 20) complete datasets, that is, 20 datasets with 168 observations each. Any analyses are then carried out on each of the 20 completed datasets, using all 168 observations, and the results are combined (via Rubin's rules) and displayed.

The way Stata stores the information for the completed datasets depends on the mi style. Because you chose flong, the completed datasets are appended as extra observations. Thus, you end up with, e.g., 3,504 non-missing values for log_accessions: the 144 fully observed values in the original data plus 20 datasets * 168 completed cases.
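To see this structure for yourself, something along these lines should work (a sketch; _mi_m is the system variable that indexes the completed datasets in the flong and mlong styles):

Code:
mi describe
tab _mi_m                                        // m = 0 is the original data; m = 1-20 are the completed datasets
count if _mi_m == 0 & !missing(log_accessions)   // the 144 fully observed values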

Please note that you are still not done. You have not taken the nested (panel) structure into account during imputation. One way of doing this is reshaping the original dataset into wide format. That way, e.g., a region's pct_maori in 2006 will be used to impute that region's pct_maori in 2007, and so on. The way you have set up the imputation model now completely ignores the fact that the variables within one region are probably strongly correlated over time -- which is exactly the within-region variation that the FE regression you are about to carry out relies on. You will have to fix that.
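A rough sketch of that wide-format route, starting over from the original (un-imputed) data and using an illustrative subset of your variables (the remaining variables, the complete predictors, and the choice of imputation method all still need thought):

Code:
* sketch only -- illustrative subset of variables
reshape wide log_accessions pct_maori deprivationindex , i(region) j(year)
mi set flong
mi register imputed log_accessions* pct_maori* deprivationindex*
mi impute chained (regress) log_accessions* pct_maori* deprivationindex* , add(80) rseed(1234)
mi reshape long log_accessions pct_maori deprivationindex , i(region) j(year)
mi xtset region year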

Also, the FMI (fraction of missing information) of 0.81 suggests that you should probably go for 80 or more completed datasets instead of just 20.

                    Edit 2: Almost forgot ... Do include all variables that you use in the analyses in the imputation model. The variables that do not have missing data (you might want to register these as regular but that is not mandatory) go to the right-hand side of the equals sign in

                    Code:
                    mi impute chained ... = complete_varlist , add(80)

                    Disclaimer: There might be more issues to address.
                    Last edited by daniel klein; 16 Sep 2020, 08:39.



                    • #11
                      I might have more to say/suggest.

                      Originally posted by Davia Downey View Post
                      As for the data: [...] They only collect certain data in five-year increments,
                      If the data is not collected in certain years, those missing values definitely qualify as MCAR. But: If this data is not collected for any of the regions, it might not be desirable to impute it. Technically, you could probably do that but even with a lot of trust in MI, this would appear to get close to "making up data". I would argue that imputed data should be based on (at least partly) observed correlations. I might not have fully understood the data collection process but I would suggest sticking with the years where there are at least a couple of observed values.

                      Originally posted by Davia Downey View Post
                      I'm not sure if I need to impute the dummy variables (postcanterbury, funding and policyscore) as they don't have missing data,
If there is no missing data, there is nothing to impute. As I have indicated in my earlier post, you should include these variables as "independent" predictors in the imputation model anyway. If you are still interested in interaction effects, include those in the model, too (there are different ways to do this, so get back if you are not sure how).



                      • #12
                        daniel klein That is more helpful. So in essence, I need to first read in the data, reshape to wide, run mi set wide, then register imputed (missing) and regular variables, then impute chained all of the variables including interaction terms, and finally run the Fixed and Random effects. Is this correct?

                        Interestingly I re-ran this in SAS and it worked just fine, but I'll do as you suggest and report back.



                        • #13
                          Originally posted by Davia Downey View Post
                          Interestingly I re-ran this in SAS and it worked just fine
I will start with this one. Do not be fooled by the absence of an error message. Just because SAS is not throwing errors does not mean that it magically solves all the issues we have touched upon in this thread. As you can see, Stata gives you results, too. Whether those results are trustworthy is another question, and it really is up to you (and the scientific community) to decide. I will get back to this at the end of my post.


                          Originally posted by Davia Downey View Post
                          So in essence, I need to first read in the data, reshape to wide
                          That is one possibility to account for the within-region correlation over time and it is the one that I have mostly used. It is easy to implement and uses all available information. The downside is that you will create many variables; potentially too many given the comparatively few observations. Including interaction-terms poses additional problems, because you would probably want to include the lower-order terms and interaction-terms for all years.

An alternative approach would keep the dataset in long format. To account for within-region correlation, you could create lagged (and, perhaps, lead) variables and include them in the respective (conditional) imputation models. Each variable would then have its own imputation model. This way, you could decide how many lagged (and/or lead) values to include in each imputation model (as opposed to using all lags and leads in the wide format). While this approach is more flexible than the wide-format approach, it is obviously also more cumbersome to implement. Also, excluding some (or most) of the lagged/lead variables raises the question of how compatible the imputation model will be with the fixed-effects estimator, which uses (only) the within-region variation.
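In code, the long-format idea might look roughly like this (a sketch only; note that lags of incomplete variables inherit missingness, which is part of what makes this route cumbersome, and I am relying here on the include() option of mi impute chained to add the terms to a conditional model):

Code:
* sketch only -- explicit lag/lead variables as extra predictors
xtset region year
gen lag1_pct_maori  = L.pct_maori
gen lead1_pct_maori = F.pct_maori
* the lag/lead terms would then enter, e.g., pct_maori's conditional model:
* mi impute chained (regress, include(lag1_pct_maori lead1_pct_maori)) pct_maori ... , add(80)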


                          Originally posted by Davia Downey View Post
                          [...] I need to [...] run mi set wide
                          No. Well, you can do so but, in general, the style in which you store your imputations really does not matter. It does matter when you hit the limits of variables in your flavor of Stata (2,047 in Stata IC). I tend to use flong but the reason is not technical; I merely feel this is the style I can best wrap my head around.


                          Originally posted by Davia Downey View Post
                          [...] I need to [...] register imputed (missing) and regular variables
                          As stated, registering imputed variables is mandatory; registering regular variables does no harm and might even help to avoid mistakes.


                          Originally posted by Davia Downey View Post
                          [...] I need to [...] impute chained all of the variables including interaction terms
                          Yes. In general, you have to account for all correlations between the variables you are interested in. Omitting the interaction terms from the imputation model will bias the coefficients for those interaction terms in the substantive model towards zero. How to best include the interactions is, to the best of my knowledge, still an ongoing discussion. If none of the involved variables has missing values, then you can just include the respective interactions (along with the lower-order terms, of course). With categorical indicators, you could also try to run the imputation model for separate groups (indicated by the categorical predictor). That will allow you to assess interactions of any variable with that categorical predictor. Given the very low number of observations, I do not think this is an option here.
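When none of the interacted variables has missing values, as with the three dummies here, a minimal sketch could look like this (the product-variable names and the subset of imputed variables are illustrative):

Code:
* complete dummies and their interactions as regular predictors
gen policyXfunding = policyscore * funding
gen policyXpost    = policyscore * postcanterbury
mi impute chained (regress) log_accessions pct_maori pctbachelors ///
    = policyscore funding postcanterbury policyXfunding policyXpost , add(80) rseed(1234)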


                          Originally posted by Davia Downey View Post
                          [...] I need to [...] run the Fixed and Random effects.
                          Fittingly, this last step brings us back to my first answer about SAS (apparently) not throwing errors and what that might or might not imply about the quality of the analyses/results.

                          Tom has pointed to some drawbacks of the FE-estimator in #6. I will just leave it at this summary: you essentially have 14 regions (cases) that have been observed over 12 years (but potentially only about two or three times because of the five-year cycle of the Census). You will need to decide how comfortable you are modeling this data within the framework of different parametric methods that often rely on asymptotics.
                          Last edited by daniel klein; 16 Sep 2020, 12:08.



                          • #14
daniel klein I really am only running the FE and RE to determine the Hausman test. Auckland is the main powerhouse in terms of economic activity (it's the largest), but I am interested more generally in the impacts of disasters on job accessions and don't want to restrict the data further to Canterbury alone (additionally, we know from the literature that disasters have secondary effects in other areas).

Thanks again for your comments and for providing input on working around this issue. It would be nice if my SES variables (i.e., minority populations, deprivation, and education) were collected with more regularity, but I have looked and looked, and even made inquiries about where such data can be found, and have come up with nothing. I really wish there were an American Community Survey alternative--I've used it in other research to fill in the years between the decennial censuses in the US, which would make this entire estimation process much less frustrating.
                            Last edited by Davia Downey; 16 Sep 2020, 14:28.



                            • #15
Hello, I am trying to use collapse with MI data. I am imputing categorical variables (the dependent variable is a count). I am trying to aggregate the data across all five imputations, but collapse won't work. Any idea how I can perform collapse manually using mi xeq? I want to aggregate the data by time and a stateID, and by time alone.
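One workaround I have been considering uses mi extract instead of mi xeq: pull each completed dataset out, collapse it, and append the pieces (untested sketch; count_y, stateID, and time are placeholder names, and this assumes the mi dataset has been saved first):

Code:
forvalues m = 1/5 {
    preserve
    mi extract `m', clear
    collapse (sum) count_y , by(stateID time)
    gen byte m = `m'
    tempfile part`m'
    save `part`m''
    restore
}
use `part1', clear
forvalues m = 2/5 {
    append using `part`m''
}
* note: this aggregates within each imputation; for inference, estimates
* should still be combined across m rather than simply averaged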
