  • MI Impute

    Posting again as I did not receive any helpful suggestions on my previous post. I have imputed and registered data for a dataset with a lot of missing values. When I run my mi estimate: regress (or mi estimate: xtreg) commands, the Stata output only provides results for a subset of my data, not the newly imputed data. The commands and results are below. Any assistance is appreciated.

    Code:
     mi query
    mi set mlong
    quietly misstable summarize log_accessions diff_minority deprivationindex pctbachelors pctmasters pctdoctorate pctturnover unemploymentrate15_oecd, generate (miss_)
    describe miss_*
    mi register imputed log_accessions diff_minority pct_maori deprivationindex pctbachelors pctmasters  pctdoctorate pctturnover unemploymentrate15_oecd
    mi register regular employmentrate15_oecd mean_earn_nzstat
    mi impute mvn log_accessions pct_maori pctbachelors pctmasters pctdoctorate deprivationindex pctturnover unemploymentrate15_oecd, add(20) rseed(1234)
    mi estimate, saving(olsest2): reg $ylist diff_minority deprivationindex pctbachelors pctmasters pctdoctorate pctturnover unemploymentrate15_oecd policyscore##funding##postcanterbury
    HTML Code:
     mi estimate, saving (ols): reg $ylist diff_minority deprivationindex pctbachelors pctmasters pctdoctorate pctturnover unemploymentrate15_oecd policyscore#funding#postcanterbury
    
    Multiple-imputation estimates                   Imputations       =         20
    Linear regression                               Number of obs     =         26
                                                    Average RVI       =     0.1554
                                                    Largest FMI       =     0.4478
                                                    Complete DF       =         15
    DF adjustment:   Small sample                   DF:     min       =       7.57
                                                            avg       =      11.25
                                                            max       =      13.28
    Model F test:       Equal FMI                   F(  10,   12.7)   =       4.20
    Within VCE type:          OLS                   Prob > F          =     0.0096
    
    ----------------------------------------------------------------------------------------------------
                        log_accessions |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -----------------------------------+----------------------------------------------------------------
                         diff_minority |  -.3833525   .8673539    -0.44   0.666    -2.253172    1.486467
                      deprivationindex |   .0433965   .0156211     2.78   0.015     .0096986    .0770944
                          pctbachelors |  -.0136013   .0548682    -0.25   0.811    -.1413866     .114184
                            pctmasters |   .1554737   .1071908     1.45   0.175    -.0809909    .3919383
                          pctdoctorate |  -.1303299   .1012635    -1.29   0.225    -.3540136    .0933537
                           pctturnover |  -.0194386   .0758147    -0.26   0.802    -.1843108    .1454336
               unemploymentrate15_oecd |  -.0868002   .1196383    -0.73   0.483    -.3495745    .1759741
                                       |
    policyscore#funding#postcanterbury |
                                0 0 1  |          0  (omitted)
                                0 1 0  |   .6229431   .4451461     1.40   0.185    -.3374752    1.583361
                                0 1 1  |          0  (omitted)
                                1 0 0  |          0  (empty)
                                1 0 1  |   .0101002   .4038377     0.03   0.980    -.8671615     .887362
                                1 1 0  |          0  (empty)
                                1 1 1  |   .6662499   .5448059     1.22   0.246     -.525413    1.857913
                                       |
                                 _cons |   10.27254   2.496735     4.11   0.004     4.482924    16.06216
    ----------------------------------------------------------------------------------------------------
    Code:
    . mi describe
    
      Style:  mlong
              last mi update 15sep2020 10:44:27, approximately 1 minute ago
    
      Obs.:   complete           24
              incomplete        144  (M = 20 imputations)
              ---------------------
              total             168
    
      Vars.:  imputed:  9; log_accessions(24) diff_minority(142) deprivationindex(126) pctbachelors(142) pctmasters(142)
                        pctdoctorate(142) pctturnover(24) unemploymentrate15_oecd(14) pct_maori(142)
    
              passive:  0
    
              regular:  2; employmentrate15_oecd mean_earn_nzstat
    
              system:   3; _mi_m _mi_id _mi_miss
    
             (there are 67 unregistered variables)
    HTML Code:
     . sum $xlist $ylist
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
    diff_minor~y |      2,906    .6226473    .2442557  -.2052834   1.466744
    deprivatio~x |      2,922    6.849442    9.205344  -24.29531   42.14392
    pctbachelors |      2,906    45.54763    5.907764   20.81843    69.3688
      pctmasters |      2,906    7.798592    2.991676  -4.717332   19.29404
    pctdoctorate |      2,906    2.407225    2.249541  -6.408797   11.34198
    -------------+---------------------------------------------------------
     pctturnover |      3,024    16.32659    2.005034    10.6232    22.9954
    unemployme~d |      3,034    5.340202     1.58543  -.0036189   10.17787
     policyscore |      3,048    .5964567    .4906884          0          1
         funding |      3,048    .0695538    .2544353          0          1
    postcanter~y |      3,048    .6929134    .4613613          0          1
    -------------+---------------------------------------------------------
    log_access~s |      3,024    9.936792    .6676036   7.808344   13.75531

  • #2
    You are probably referring to this post, which (together with this post of yours) suggests that the problem lies in the imputation of the data. Show us your imputation steps, including both commands and output.

    My guess is that you are using the force option (which you should never ever use) during the imputation, leading to missing imputed values.

    Moreover, from what you show here, you have 26 complete observations to which you are trying to fit a model with 7 predictors and a three-way interaction (of which you omit the lower-order terms, which is probably not the correct way to achieve what you want). This is a fairly complicated model (which you should get right in the imputation step, too) and you might want to discuss the research questions that you are trying to answer; perhaps in a separate thread.
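    To illustrate the point about lower-order terms, here is a minimal sketch using Stata's shipped auto dataset (purely illustrative; none of these are the poster's variables): the ## operator expands to the main effects plus the interaction, while # fits only the interaction cells.

    ```stata
    * Illustrative only (built-in auto data):
    sysuse auto, clear
    * a##b expands to i.a i.b a#b -- main effects plus interaction
    regress price c.mpg foreign##rep78
    * a#b alone fits the interaction cells without the main effects
    regress price c.mpg foreign#rep78
    ```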

    Comment


    • #3
      Trying this again. You are correct, I did use force previously. I changed this line of code, but the result is unfortunately still the same. Here is each command I've run as well as the corresponding output. I am modeling the impact of minority population concentration, education, turnover, funding (dummy), policy creation (dummy), post-Canterbury (dummy), and unemployment on the log of accessions (i.e., new hiring activity). The untransformed data looks like this:
      HTML Code:
      sum $ylist $xlist
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
      log_access~s |        144    9.937673    .6645318   8.783013   11.66515
      sqdiff_min~y |         26    .4267783    .2059232   .0128739   .7321631
         pct_maori |         26    .1822573    .1052737   .0633117   .4508969
      deprivatio~x |         42    7.071929    8.289129        .21      34.79
      pctbachelors |         26    46.00424    4.241157   37.07946   56.49868
      -------------+---------------------------------------------------------
        pctmasters |         26    7.814456    2.111405   4.385965   12.37723
      pctdoctorate |         26    2.382557    1.310756   .9233611   5.045782
       pctturnover |        144     16.3684    2.021255       13.2     20.575
      unemployme~d |        154    5.318182    1.585121        2.5        8.8
       policyscore |        168    .5833333    .4944805          0          1
      -------------+---------------------------------------------------------
           funding |        168    .0714286    .2583093          0          1
      postcanter~y |        168    .6666667    .4728138          0          1
      Then I run
      Code:
      mi query
      Then,
      Code:
      mi set flong
      Then,
      Code:
      misstable patterns $ylist $xlist
      HTML Code:
      misstable patterns $ylist $xlist
      
                   Missing-value patterns
                     (1 means complete)
      
                    |   Pattern
          Percent   |  1  2  3  4    5  6  7  8    9
        ------------+--------------------------------
             14%    |  1  1  1  1    1  1  1  1    1
                    |
             64     |  1  1  1  0    0  0  0  0    0
             11     |  1  0  0  0    0  0  0  0    0
              7     |  0  1  1  1    0  0  0  0    0
              1     |  0  0  0  1    0  0  0  0    0
              1     |  1  0  0  1    0  0  0  0    0
              1     |  1  0  0  1    1  1  1  1    1
        ------------+--------------------------------
            100%    |
      
        Variables are  (1) unemploymentrate15_oecd  (2) log_accessions  (3) pctturnover  (4) deprivationindex  (5) pct_maori
                       (6) pctbachelors  (7) pctdoctorate  (8) pctmasters  (9) sqdiff_minority
      Then I register the following variables as imputed: log_accessions diff_minority pct_maori deprivationindex pctbachelors pctmasters pctdoctorate pctturnover unemploymentrate15_oecd
      Code:
      mi register imputed log_accessions diff_minority pct_maori deprivationindex pctbachelors pctmasters  pctdoctorate pctturnover unemploymentrate15_oecd
      Then I registered two regular variables which are not included in the analysis, but this seems to be common practice for an MI analysis:
      Code:
      mi register regular employmentrate15_oecd mean_earn_nzstat
      Next, I run mi impute
      Code:
      mi impute chained (regress) log_accessions diff_minority pct_maori deprivationindex pctbachelors pctmasters pctdoctorate pctturnover unemploymentrate15_oecd,  add(20) rseed(1234)
      HTML Code:
      Performing chained iterations ...
      
      Multivariate imputation                     Imputations =       20
      Chained equations                                 added =       20
      Imputed: m=1 through m=20                       updated =        0
      
      Initialization: monotone                     Iterations =      200
                                                      burn-in =       10
      
          log_accessions: linear regression
           diff_minority: linear regression
               pct_maori: linear regression
          deprivationi~x: linear regression
            pctbachelors: linear regression
              pctmasters: linear regression
            pctdoctorate: linear regression
             pctturnover: linear regression
          unemployment~d: linear regression
      
      ------------------------------------------------------------------
                         |               Observations per m             
                         |----------------------------------------------
                Variable |   Complete   Incomplete   Imputed |     Total
      -------------------+-----------------------------------+----------
          log_accessions |        144           24        24 |       168
           diff_minority |         26          142       142 |       168
               pct_maori |         26          142       142 |       168
          deprivationi~x |         42          126       126 |       168
            pctbachelors |         26          142       142 |       168
              pctmasters |         26          142       142 |       168
            pctdoctorate |         26          142       142 |       168
             pctturnover |        144           24        24 |       168
          unemployment~d |        154           14        14 |       168
      ------------------------------------------------------------------
      (complete + incomplete = total; imputed is the minimum across m
       of the number of filled-in observations.)
      Then, I xtset the data
      Code:
      mi xtset $t $id
      HTML Code:
       panel variable:  year (strongly balanced)
              time variable:  region, 1 to 14
                      delta:  1 unit
      After that, I finally run the Fixed Effects model:
      Code:
      mi estimate: xtreg $ylist $xlist, fe i(region)
      HTML Code:
      Multiple-imputation estimates                   Imputations       =         20
      Fixed-effects (within) regression               Number of obs     =         26
      
      Group variable: region                          Number of groups  =         13
                                                      Obs per group:
                                                                    min =          2
                                                                    avg =        2.0
                                                                    max =          2
                                                      Average RVI       =     0.7426
                                                      Largest FMI       =     0.8234
                                                      Complete DF       =          3
      DF adjustment:   Small sample                   DF:     min       =       0.53
                                                              avg       =       1.20
                                                              max       =       1.85
      Model F test:       Equal FMI                   F(  10,    0.6)   =       0.71
      Within VCE type: Conventional                   Prob > F          =     0.7661
      
      -----------------------------------------------------------------------------------------
               log_accessions |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      ------------------------+----------------------------------------------------------------
              sqdiff_minority |  -.3556701   3.698341    -0.10   0.934    -20.33893    19.62759
                    pct_maori |  -4.850423   20.92281    -0.23   0.847    -155.6207    145.9198
             deprivationindex |   .0268486   .2832343     0.09   0.942    -5.610964    5.664661
                 pctbachelors |   .0059091   .1647776     0.04   0.978     -2.83083    2.842649
                   pctmasters |  -.0510626   .1601838    -0.32   0.782    -.7953342     .693209
                 pctdoctorate |   .0419083   .1473924     0.28   0.805    -.6517255     .735542
                  pctturnover |   .0778184   .3257464     0.24   0.872    -37.91117    38.06681
      unemploymentrate15_oecd |  -.0230748   .1231354    -0.19   0.871    -.6599563    .6138067
                    1.funding |          0  (omitted)
                1.policyscore |   .1958001   1.644004     0.12   0.930    -56.72888    57.12048
                              |
          funding#policyscore |
                         1 1  |   .1212798   .4652735     0.26   0.833    -4.064287    4.306847
                              |
               postcanterbury |          0  (omitted)
                        _cons |   9.519705   13.79675     0.69   0.646    -409.3752    428.4146
      ------------------------+----------------------------------------------------------------
                      sigma_u |  2.2232374
                      sigma_e |  .08667858
                          rho |  .99848228   (fraction of variance due to u_i)
      -----------------------------------------------------------------------------------------
      Note: sigma_u and sigma_e are combined in the original metric.
      As you can see, the total number of imputations is correct, but the number of observations is not. Here's the summary of $ylist and $xlist
      HTML Code:
       sum $ylist $xlist
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
      log_access~s |      3,504    9.937929    .6682812   7.310234   11.82433
      sqdiff_min~y |        546    .4267783    .2021095   .0128739   .7321631
         pct_maori |      3,386    .1877219    .1292799  -.3642823   .7171209
      deprivatio~x |      3,402    7.159244     9.25423  -29.52997   42.83181
      pctbachelors |      3,386    45.71936    5.349254    26.1667   66.19154
      -------------+---------------------------------------------------------
        pctmasters |      3,386    7.848877    2.666734  -2.490291   18.43071
      pctdoctorate |      3,386     2.34551    2.007263  -6.392637   10.67832
       pctturnover |      3,504    16.36014     2.02376   10.39309   24.29137
      unemployme~d |      3,514     5.31252    1.582666   .7636836   9.752773
      -------------+---------------------------------------------------------
         1.funding |      3,528    .0714286    .2575759          0          1
      1.policysc~e |      3,528    .5833333    .4930765          0          1
                   |
           funding#|
       policyscore |
              1 1  |      3,528    .0416667    .1998546          0          1
                   |
      postcanter~y |      3,528    .6666667    .4714713          0          1
      Hopefully listing out all of my steps helps to find the problem. For what it's worth, if I run a regular pooled regression without using the mi command, everything works just fine.
      Code:
      reg $ylist $xlist, vce(robust)
      HTML Code:
      Linear regression                               Number of obs     =      3,384
                                                      F(12, 3371)       =     325.61
                                                      Prob > F          =     0.0000
                                                      R-squared         =     0.5576
                                                      Root MSE          =     .44543
      
      -----------------------------------------------------------------------------------------
                              |               Robust
               log_accessions |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      ------------------------+----------------------------------------------------------------
                diff_minority |   .4157945   .2257181     1.84   0.066    -.0267637    .8583526
                    pct_maori |   1.004239   .4267461     2.35   0.019     .1675318    1.840947
             deprivationindex |   .0371159   .0012098    30.68   0.000     .0347439     .039488
                 pctbachelors |  -.0122473   .0026935    -4.55   0.000    -.0175283   -.0069663
                   pctmasters |    .092305   .0068401    13.49   0.000     .0788938    .1057162
                 pctdoctorate |  -.0719911   .0059165   -12.17   0.000    -.0835914   -.0603908
                  pctturnover |  -.0412018   .0057798    -7.13   0.000    -.0525342   -.0298695
      unemploymentrate15_oecd |  -.0752735    .008427    -8.93   0.000    -.0917962   -.0587509
                    1.funding |    .349902   .0354199     9.88   0.000     .2804555    .4193486
                1.policyscore |  -.1009608    .028314    -3.57   0.000    -.1564751   -.0454464
                              |
          funding#policyscore |
                         1 1  |   .0362795   .0450584     0.81   0.421     -.052065     .124624
                              |
               postcanterbury |   .0290124   .0302354     0.96   0.337    -.0302692    .0882939
                        _cons |   10.31721   .2529325    40.79   0.000     9.821296    10.81313
      ----------------------------------------------------------------------------------------

      Comment


      • #4
        I'm not sure that this is causing your problem, but, while it is on rare occasions reasonable to have your time variable as your panel and your id as your "time" variable, I see nothing here that would make that sensible, so try switching them. Also, why the "i(region)" option? What does that mean?

        Comment


        • #5
          Rich Goldstein The i(region) option is used to set the panel variable for an -xt- command. You can use it when you haven't done -xtset- (nor -tsset-) or you can use it to override the specification made in -xtset-. So, notwithstanding having set the time as the panel variable in -xtset-, the regression actually uses region as the panel.

          It doesn't seem to be in the documentation for current Stata. I remember it from a very long time ago, and, I guess it still works. I've never used it myself.
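          A minimal sketch of what that means in practice (y and x are hypothetical placeholders; i() appears to be old syntax that has dropped out of the current documentation):

          ```stata
          * Hypothetical sketch: i() overrides the panel variable set by -xtset-
          xtset year region          // declares year as the panel variable
          xtreg y x, fe i(region)    // ...but the regression treats region as the panel
          ```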

          Comment


          • #6
            You are using multiple imputation that assumes data are missing at random, and fixed effects regression which applies statistical inference to estimate population values based on a random sample. I don't know where your data come from, but I think a much better route for you would be to do something descriptive or qualitative to answer your research questions. Additionally, I do not think you have a large enough sample for fixed effects regression. It is pretty inefficient, so even if your data were perfect I don't think you would have enough power to detect an effect.

            Comment


            • #7
              Thanks for providing the complete output. Rich has already pointed to one potential problem with xtset-ting the data.

              From a quick glance, I can see that you include the variable sqdiff_minority in your analysis model but not in your imputation model. The output from summarize suggests that this variable has many missing values, which may be one cause of the technical problem that you are facing. If sqdiff_minority is the squared term for diff_minority, you must include both the lower-order term and the squared term in the imputation model (and, of course, in the analysis model). Do not create the squared term only after imputation.
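              A hedged sketch of that advice, using variable names from the thread (the full variable list is abbreviated here): treat the squared term as "just another variable" to be imputed alongside its lower-order term, rather than recreating it from imputed diff_minority afterwards.

              ```stata
              * Sketch (abbreviated variable list): impute the squared term
              * together with its lower-order term.
              mi register imputed diff_minority sqdiff_minority log_accessions
              mi impute chained (regress) diff_minority sqdiff_minority log_accessions, ///
                  add(20) rseed(1234)
              ```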

              Also, as I have mentioned earlier, if you are going to use interaction effects in your substantive model, you must somehow account for the interaction effect in the imputation model. You also want to account for the nested (i.e., panel) structure of the data during imputation.
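              One possible sketch of this (my assumption, not something spelled out in the thread): pass the complete dummies and panel indicators to mi impute chained after the = sign, so they enter every imputation equation as predictors and the imputation model is at least as rich as the analysis model.

              ```stata
              * Sketch (abbreviated variable list): complete variables after "="
              * act as predictors in every equation. With only 26 complete
              * observations, i.region may well not be estimable in practice.
              mi impute chained (regress) log_accessions diff_minority sqdiff_minority ///
                  = i.policyscore i.funding i.postcanterbury i.region, ///
                  add(20) rseed(1234)
              ```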

              Unless you address these (and potentially more) points, I agree with Tom that you might accidentally make things worse by using MI. We are happy to assist in getting things right as long as you keep providing relevant information.


              After having edited this post several times now, I note that (a) I am still tired (it is pretty early here), and (b) there seems to be a mix of technical and conceptual confusion. We should address both the technical problems and the conceptual issues.
              Last edited by daniel klein; 15 Sep 2020, 22:20.

              Comment


              • #8
                Clyde Schechter Thank you

                Comment


                • #9
                  I did go back and re-impute the sqdiff_minority variable. The same issue occurs whether or not the sqdiff_minority and diff_minority variables are imputed prior to running the analysis. The same thing still happens when the xtset is reversed.

                  As for the data: the data come from NZStat, which is the NZ Census. They only collect certain data in five-year increments, but the OECD provides certain (economic) data annually, so the data aren't MAR or MCAR; it's actually the full data on all regions in New Zealand for this 10-year period (maybe I understood your question wrong). What I am interested in is the effect of socio-economic variation on job accessions in the period leading up to and following the Canterbury earthquakes. I've used the same modeling techniques for the other chapters of a book I'm writing, so I don't think my model is theoretically misspecified. I'm not sure if I need to impute the dummy variables (postcanterbury, funding, and policyscore) as they don't have missing data, but I suppose I could try that if there's a substantive reason for it. Nonetheless, here are the results from making the adjustments suggested above Rich Goldstein Clyde Schechter Tom Scott

                  Step 1: Summary Data
                  HTML Code:
                  sum $ylist $xlist
                  
                      Variable |        Obs        Mean    Std. Dev.       Min        Max
                  -------------+---------------------------------------------------------
                  log_access~s |        144    9.937673    .6645318   8.783013   11.66515
                  sqdiff_min~y |         26    .4267783    .2059232   .0128739   .7321631
                  diff_minor~y |         26    .6250489    .1937416   .1134632   .8556653
                     pct_maori |         26    .1822573    .1052737   .0633117   .4508969
                  deprivatio~x |         42    7.071929    8.289129        .21      34.79
                  -------------+---------------------------------------------------------
                  pctbachelors |         26    46.00424    4.241157   37.07946   56.49868
                    pctmasters |         26    7.814456    2.111405   4.385965   12.37723
                  pctdoctorate |         26    2.382557    1.310756   .9233611   5.045782
                   pctturnover |        144     16.3684    2.021255       13.2     20.575
                  unemployme~d |        154    5.318182    1.585121        2.5        8.8
                  -------------+---------------------------------------------------------
                   policyscore |        168    .5833333    .4944805          0          1
                       funding |        168    .0714286    .2583093          0          1
                  postcanter~y |        168    .6666667    .4728138          0          1
                  Step 2: MI Query
                  Code:
                  mi query
                  Step 3: MI Set
                  Code:
                  mi set flong
                  Step 4: Review Patterns of Missingness
                  Code:
                  misstable patterns $ylist $xlist
                  HTML Code:
                   Missing-value patterns
                                  (1 means complete)
                  
                                |   Pattern
                      Percent   |  1  2  3  4    5  6  7  8    9 10
                    ------------+-----------------------------------
                         14%    |  1  1  1  1    1  1  1  1    1  1
                                |
                         64     |  1  1  1  0    0  0  0  0    0  0
                         11     |  1  0  0  0    0  0  0  0    0  0
                          7     |  0  1  1  1    0  0  0  0    0  0
                          1     |  0  0  0  1    0  0  0  0    0  0
                          1     |  1  0  0  1    0  0  0  0    0  0
                          1     |  1  0  0  1    1  1  1  1    1  1
                    ------------+-----------------------------------
                        100%    |
                  
                    Variables are  (1) unemploymentrate15_oecd  (2) log_accessions  (3) pctturnover  (4) deprivationindex  (5) diff_minority
                                   (6) pct_maori  (7) pctbachelors  (8) pctdoctorate  (9) pctmasters  (10) sqdiff_minority
                  Step 5: Register Variables
                  Code:
                  mi register imputed log_accessions  sqdiff_minority diff_minority pct_maori deprivationindex pctbachelors pctmasters  pctdoctorate pctturnover unemploymentrate15_oecd
                  Step 6: Register Regular Variables
                  Code:
                    
                   mi register regular employmentrate15_oecd mean_earn_nzstat
                  Step 7: Impute Variables
                  Code:
                  mi impute chained (regress) log_accessions  sqdiff_minority diff_minority pct_maori deprivationindex pctbachelors pctmasters  pctdoctorate pctturnover unemploymentrate15_oecd,  add(20) rseed(1234)
                  HTML Code:
                  Performing chained iterations ...
                  
                  Multivariate imputation                     Imputations =       20
                  Chained equations                                 added =       20
                  Imputed: m=1 through m=20                       updated =        0
                  
                  Initialization: monotone                     Iterations =      200
                                                                  burn-in =       10
                  
                      log_accessions: linear regression
                      sqdiff_minor~y: linear regression
                       diff_minority: linear regression
                           pct_maori: linear regression
                      deprivationi~x: linear regression
                        pctbachelors: linear regression
                          pctmasters: linear regression
                        pctdoctorate: linear regression
                         pctturnover: linear regression
                      unemployment~d: linear regression
                  
                  ------------------------------------------------------------------
                                     |               Observations per m             
                                     |----------------------------------------------
                            Variable |   Complete   Incomplete   Imputed |     Total
                  -------------------+-----------------------------------+----------
                      log_accessions |        144           24        24 |       168
                      sqdiff_minor~y |         26          142       142 |       168
                       diff_minority |         26          142       142 |       168
                           pct_maori |         26          142       142 |       168
                      deprivationi~x |         42          126       126 |       168
                        pctbachelors |         26          142       142 |       168
                          pctmasters |         26          142       142 |       168
                        pctdoctorate |         26          142       142 |       168
                         pctturnover |        144           24        24 |       168
                      unemployment~d |        154           14        14 |       168
                  ------------------------------------------------------------------
                  (complete + incomplete = total; imputed is the minimum across m
                   of the number of filled-in observations.)
                  Step 8: XTSET Data
                  HTML Code:
                  mi xtset $id $t
                         panel variable:  region (strongly balanced)
                          time variable:  year, 2006 to 2017
                                  delta:  1 unit
                  Step 9: Run Fixed Effects
                  Code:
                  mi estimate: xtreg $ylist $xlist, fe i(region)

                  HTML Code:
                  mi estimate: xtreg $ylist $xlist, fe i(region)
                  
                  Multiple-imputation estimates                   Imputations       =         20
                  Fixed-effects (within) regression               Number of obs     =        168
                  
                  Group variable: region                          Number of groups  =         14
                                                                  Obs per group:
                                                                                min =         12
                                                                                avg =       12.0
                                                                                max =         12
                                                                  Average RVI       =     2.3586
                                                                  Largest FMI       =     0.8141
                                                                  Complete DF       =        143
                  DF adjustment:   Small sample                   DF:     min       =      14.82
                                                                          avg       =      22.37
                                                                          max       =      28.45
                  Model F test:       Equal FMI                   F(  11,   92.1)   =       0.83
                  Within VCE type: Conventional                   Prob > F          =     0.6095
                  
                  -----------------------------------------------------------------------------------------
                           log_accessions |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  ------------------------+----------------------------------------------------------------
                          sqdiff_minority |   .7336144   .6594951     1.11   0.277    -.6307881    2.098017
                            diff_minority |  -1.271836   1.449093    -0.88   0.387    -4.238059    1.694387
                                pct_maori |  -1.066012   2.318821    -0.46   0.650    -5.853775    3.721751
                         deprivationindex |   .0141208   .0067174     2.10   0.047     .0002283    .0280133
                             pctbachelors |  -.0018677   .0136014    -0.14   0.892    -.0301815    .0264461
                               pctmasters |   .0190938   .0323527     0.59   0.561    -.0480029    .0861905
                             pctdoctorate |  -.0148487   .0301435    -0.49   0.627    -.0774679    .0477705
                              pctturnover |  -.0119336   .0536681    -0.22   0.827    -.1264454    .1025782
                  unemploymentrate15_oecd |   -.037254   .0574626    -0.65   0.525    -.1582754    .0837674
                              policyscore |    -.05017   .1165506    -0.43   0.671     -.290348    .1900079
                                  funding |          0  (omitted)
                           postcanterbury |   .0505546   .1208875     0.42   0.679    -.1969391    .2980482
                                    _cons |   10.87243    1.65603     6.57   0.000      7.42648    14.31839
                  ------------------------+----------------------------------------------------------------
                                  sigma_u |  .50411314
                                  sigma_e |  .22240221
                                      rho |  .83707538   (fraction of variance due to u_i)
                  -----------------------------------------------------------------------------------------
                  Note: sigma_u and sigma_e are combined in the original metric.
                  
                  Step 10: Summary of MI Data
Code:
                  sum $ylist $xlist
                  
                      Variable |        Obs        Mean    Std. Dev.       Min        Max
                  -------------+---------------------------------------------------------
                  log_access~s |      3,504     9.93955      .66068   8.123594   12.27341
                  sqdiff_min~y |      3,386    .4261969    .2421935   -.470037   1.374426
                  diff_minor~y |      3,386    .6245527    .2326009   -.279737   1.573217
                     pct_maori |      3,386    .1850628    .1263405  -.2826185   .6675549
                  deprivatio~x |      3,402    7.222229    8.785161  -22.11049   42.14102
                  -------------+---------------------------------------------------------
                  pctbachelors |      3,386     45.6051    5.527129   24.38922   65.56276
                    pctmasters |      3,386    7.704817    2.725792  -3.818673   18.96408
                  pctdoctorate |      3,386     2.32867     2.03177  -5.680584   11.46432
                   pctturnover |      3,504    16.36587    2.016278   11.03838   21.68333
                  unemployme~d |      3,514    5.321049    1.583113   .5403764   9.620695
                  -------------+---------------------------------------------------------
                   policyscore |      3,528    .5833333    .4930765          0          1
                       funding |      3,528    .0714286    .2575759          0          1
                  postcanter~y |      3,528    .6666667    .4714713          0          1
Still stumped. I have used this technique before, following these same steps, and have never had a problem with all observations being used in the estimation. Any additional suggestions are welcome.



                  • #10
I am confused. Perhaps there is another misunderstanding about multiple imputation. The output says that 168 observations are used. That is all the observations. How many observations were you expecting to be used?


                    Edit/Add (hopefully more helpful)

mi impute will create m (here: 20) complete datasets, that is, 20 datasets with 168 observations each. Any analyses are then carried out on each of the 20 completed datasets, using all 168 observations, and the results are combined (via Rubin's rules) and displayed.

The way Stata stores the information for the completed datasets depends on the mi style. Because you chose flong, the completed datasets are appended as extra observations. Thus, you end up with, e.g., 3,504 non-missing values for log_accessions: the 144 fully observed values in the original data plus 20 datasets * 168 completed cases.
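To see this structure for yourself, something along these lines should work (a sketch; _mi_m is the system variable that indexes the completed datasets in the flong and mlong styles):

Code:
mi describe
tab _mi_m                                        // m = 0 is the original data; m = 1-20 are the completed datasets
count if _mi_m == 0 & !missing(log_accessions)   // the 144 fully observed values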

Please note that you are still not done. You have not taken the nested (panel) structure into account during imputation. One way of doing this is reshaping the original dataset into wide format. That way, e.g., a region's pct_maori in 2006 will be used to impute that region's pct_maori in 2007, and so on. The way you have set up the imputation model now completely ignores the fact that the variables within one region are probably strongly correlated over time -- which is exactly the within-region variation that the FE regression you are about to carry out relies on. You will have to fix that.
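A rough sketch of that wide-format route, starting over from the original (un-imputed) data and using an illustrative subset of your variables (the remaining variables, the complete predictors, and the choice of imputation method all still need thought):

Code:
* sketch only -- illustrative subset of variables
reshape wide log_accessions pct_maori deprivationindex , i(region) j(year)
mi set flong
mi register imputed log_accessions* pct_maori* deprivationindex*
mi impute chained (regress) log_accessions* pct_maori* deprivationindex* , add(80) rseed(1234)
mi reshape long log_accessions pct_maori deprivationindex , i(region) j(year)
mi xtset region year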

Also, the FMI (fraction of missing information) of 0.81 suggests that you should probably go for 80 or more completed datasets instead of just 20.

                    Edit 2: Almost forgot ... Do include all variables that you use in the analyses in the imputation model. The variables that do not have missing data (you might want to register these as regular but that is not mandatory) go to the right-hand side of the equals sign in

                    Code:
                    mi impute chained ... = complete_varlist , add(80)

                    Disclaimer: There might be more issues to address.
                    Last edited by daniel klein; 16 Sep 2020, 08:39.



                    • #11
                      I might have more to say/suggest.

                      Originally posted by Davia Downey View Post
                      As for the data: [...] They only collect certain data in five-year increments,
                      If the data is not collected in certain years, those missing values definitely qualify as MCAR. But: If this data is not collected for any of the regions, it might not be desirable to impute it. Technically, you could probably do that but even with a lot of trust in MI, this would appear to get close to "making up data". I would argue that imputed data should be based on (at least partly) observed correlations. I might not have fully understood the data collection process but I would suggest sticking with the years where there are at least a couple of observed values.

                      Originally posted by Davia Downey View Post
                      I'm not sure if I need to impute the dummy variables (postcanterbury, funding and policyscore) as they don't have missing data,
If there is no missing data, there is nothing to impute. As I have indicated in my earlier post, you should include these variables as "independent" predictors in the imputation model anyway. If you are still interested in interaction effects, include those in the model, too (there are different ways to do this, so get back if you are not sure how).



                      • #12
                        daniel klein That is more helpful. So in essence, I need to first read in the data, reshape to wide, run mi set wide, then register imputed (missing) and regular variables, then impute chained all of the variables including interaction terms, and finally run the Fixed and Random effects. Is this correct?

                        Interestingly I re-ran this in SAS and it worked just fine, but I'll do as you suggest and report back.



                        • #13
                          Originally posted by Davia Downey View Post
                          Interestingly I re-ran this in SAS and it worked just fine
I will start with this one. Do not be fooled by the absence of an error message. Just because SAS is not throwing errors does not mean that it magically solves all the issues we have touched upon in this thread. As you can see, Stata gives you results, too. Whether those results are trustworthy is another question, and it really is up to you (and the scientific community) to decide. I will get back to this at the end of my post.


                          Originally posted by Davia Downey View Post
                          So in essence, I need to first read in the data, reshape to wide
                          That is one possibility to account for the within-region correlation over time and it is the one that I have mostly used. It is easy to implement and uses all available information. The downside is that you will create many variables; potentially too many given the comparatively few observations. Including interaction-terms poses additional problems, because you would probably want to include the lower-order terms and interaction-terms for all years.

An alternative approach would keep the dataset in long format. To account for within-region correlation, you could create lagged (and, perhaps, lead) variables and include them in the respective (conditional) imputation models. Each variable would then have its own imputation model. This way, you could decide how many lagged (and/or lead) values to include in each imputation model (as opposed to using all lags and leads in the wide format). While this approach is more flexible than the wide-format approach, it is obviously also more cumbersome to implement. Also, excluding some (or most) of the lagged/lead variables raises the question of how compatible the imputation model will be with the fixed-effects estimator, which uses (only) the within-region variation.
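In code, the long-format idea might look roughly like this (a sketch only; note that lags of incomplete variables inherit missingness, which is part of what makes this route cumbersome, and I am relying here on the include() option of mi impute chained to add the terms to a conditional model):

Code:
* sketch only -- explicit lag/lead variables as extra predictors
xtset region year
gen lag1_pct_maori  = L.pct_maori
gen lead1_pct_maori = F.pct_maori
* the lag/lead terms would then enter, e.g., pct_maori's conditional model:
* mi impute chained (regress, include(lag1_pct_maori lead1_pct_maori)) pct_maori ... , add(80)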


                          Originally posted by Davia Downey View Post
                          [...] I need to [...] run mi set wide
                          No. Well, you can do so but, in general, the style in which you store your imputations really does not matter. It does matter when you hit the limits of variables in your flavor of Stata (2,047 in Stata IC). I tend to use flong but the reason is not technical; I merely feel this is the style I can best wrap my head around.


                          Originally posted by Davia Downey View Post
                          [...] I need to [...] register imputed (missing) and regular variables
                          As stated, registering imputed variables is mandatory; registering regular variables does no harm and might even help to avoid mistakes.


                          Originally posted by Davia Downey View Post
                          [...] I need to [...] impute chained all of the variables including interaction terms
                          Yes. In general, you have to account for all correlations between the variables you are interested in. Omitting the interaction terms from the imputation model will bias the coefficients for those interaction terms in the substantive model towards zero. How to best include the interactions is, to the best of my knowledge, still an ongoing discussion. If none of the involved variables has missing values, then you can just include the respective interactions (along with the lower-order terms, of course). With categorical indicators, you could also try to run the imputation model for separate groups (indicated by the categorical predictor). That will allow you to assess interactions of any variable with that categorical predictor. Given the very low number of observations, I do not think this is an option here.
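When none of the interacted variables has missing values, as with the three dummies here, a minimal sketch could look like this (the product-variable names and the subset of imputed variables are illustrative):

Code:
* complete dummies and their interactions as regular predictors
gen policyXfunding = policyscore * funding
gen policyXpost    = policyscore * postcanterbury
mi impute chained (regress) log_accessions pct_maori pctbachelors ///
    = policyscore funding postcanterbury policyXfunding policyXpost , add(80) rseed(1234)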


                          Originally posted by Davia Downey View Post
                          [...] I need to [...] run the Fixed and Random effects.
                          Fittingly, this last step brings us back to my first answer about SAS (apparently) not throwing errors and what that might or might not imply about the quality of the analyses/results.

                          Tom has pointed to some drawbacks of the FE-estimator in #6. I will just leave it at this summary: you essentially have 14 regions (cases) that have been observed over 12 years (but potentially only about two or three times because of the five-year cycle of the Census). You will need to decide how comfortable you are modeling this data within the framework of different parametric methods that often rely on asymptotics.
                          Last edited by daniel klein; 16 Sep 2020, 12:08.



                          • #14
daniel klein I really am only running the FE and RE to determine the Hausman test. Auckland is the main powerhouse in terms of economic activity (it's the largest), but I am interested more generally in the impacts of disasters on job accessions and don't want to restrict the data further to Canterbury alone (additionally, we know from the literature that disasters have secondary effects in other areas).

Thanks again for your comments and for providing input on working around this issue. It would be nice if my SES variables (i.e., minority populations, deprivation, and education) were collected with more regularity, but I have looked and looked, and even made inquiries about where such data can be found, and have come up with nothing. I really wish there were an American Community Survey alternative--I've used it in other research to fill in the years between the decennial censuses in the US, which would make this entire estimation process much less frustrating.
                            Last edited by Davia Downey; 16 Sep 2020, 14:28.



                            • #15
Hello, I am trying to use collapse with MI data. I am imputing categorical variables (the dependent variable is a count). I am trying to aggregate the data across all five imputations, but collapse won't work. Any idea how I can perform collapse manually using mi xeq? I want to aggregate the data by time and a stateID, and by time alone.
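One workaround I have been considering uses mi extract instead of mi xeq: pull each completed dataset out, collapse it, and append the pieces (untested sketch; count_y, stateID, and time are placeholder names, and this assumes the mi dataset has been saved first):

Code:
forvalues m = 1/5 {
    preserve
    mi extract `m', clear
    collapse (sum) count_y , by(stateID time)
    gen byte m = `m'
    tempfile part`m'
    save `part`m''
    restore
}
use `part1', clear
forvalues m = 2/5 {
    append using `part`m''
}
* note: this aggregates within each imputation; for inference, estimates
* should still be combined across m rather than simply averaged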
