  • How to deal with missing values in panel data (one item is missing)

    Dear all,

    I'm using Stata 14.2 and working with panel data from waves 2-8. I'm having trouble generating the dependent variable for the FE model, because the item 'pcr3i5' was not asked in wave 3. I do have the information from the previous wave (2) and the following wave (4). I know there is a way to fill the gap (maybe with means?), but I didn't understand what I found on the internet. If I drop this item, the alpha value of the index variable goes down. Furthermore, I need to keep this item because I'm working with a multi-actor design.

    This is how I generate the dependent variable:

    Code:
    *** RELATIONSHIP QUALITY, ANCHOR'S PERSPECTIVE ***
    
    *** Intimacy: pcr3i1 pcr3i8 // Appreciation: pcr3i2 pcr3i5 // Conflict: pcr3i4 pcr3i6 ***
    
    alpha pcr3i1 pcr3i8 pcr3i2 pcr3i4 pcr3i6, item
    revv pcr3i4 pcr3i6 // reverse the direction of the conflict items
    
    alpha pcr3i1 pcr3i8 pcr3i2 rv_pcr3i4 rv_pcr3i6, item
    
    egen bezqual_anker = rowmean(pcr3i1 pcr3i8 pcr3i2 rv_pcr3i4 rv_pcr3i6) // build the index variable
    label var bezqual_anker "Beziehungsqualität (Elternperspektive)" // relationship quality (parent's perspective)
    label def bezqual_anker 1 "1 niedrig" 5 "5 hoch" // 1 = low, 5 = high
    label val bezqual_anker bezqual_anker
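    A caveat on the index code above: in the data excerpt in this post, some items carry the code -2 ("Keine Angabe", no answer), and -rowmean()- would average such codes as if they were valid answers. The following is only a sketch of one way to guard against that, not necessarily what was done in the actual analysis. The codes -1 and -9, the variable names rv4, rv6, idx and n_valid, and the cut-off of four answered items are assumptions made for illustration; the manual reversal 6 - x assumes the 1-5 answer scale and stands in for the user-written -revv-.

    Code:
    * recode negative codes to extended missing values before any scale building
    mvdecode pcr3i1 pcr3i2 pcr3i4 pcr3i5 pcr3i6 pcr3i8, mv(-2=.a \ -1=.b \ -9=.c)
    
    * reverse the two conflict items on the 1-5 scale (hypothetical names rv4, rv6)
    generate rv4 = 6 - pcr3i4
    generate rv6 = 6 - pcr3i6
    
    * mean of the answered items, plus the number of answered items per row
    egen idx     = rowmean(pcr3i1 pcr3i8 pcr3i2 rv4 rv6)
    egen n_valid = rownonmiss(pcr3i1 pcr3i8 pcr3i2 rv4 rv6)
    
    * assumption: require at least 4 of the 5 items, otherwise set the index to missing
    replace idx = . if n_valid < 4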
    My next 'problem' is that I have left-censored data for the variable ehc28p1 at wave 2. I decided to drop wave 2 and work with waves 3-8. It is no problem to drop wave 2 after using it to fill in the missing values of item pcr3i5, is it?

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long(id cid) byte(wave pcr3i1 pcr3i2 pcr3i4 pcr3i5 pcr3i6 pcr3i8 ehc28p1)
      111000   111203 2  4  4  2  4  2  4 .
      111000   111203 3  4  4  1  .  1  5 0
      111000   111203 4  4  4  3  4  3  5 1
      111000   111203 5  3  3  2  4  2  3 1
      111000   111203 6  4  4  2  5  3  4 1
      111000   111203 7  4  4  2  4  2  4 1
      111000   111203 8  4  4  2  4  3  4 1
      907000   907201 8  3  5  2  5  1  4 1
     1300000  1300202 2  5  4  2  4  2  4 .
     1300000  1300202 3  5  5  2  .  2  3 1
     1300000  1300202 4  5  5  2  5  2  4 1
     1624000  1624201 8  4  3  2  5  2  4 1
     2767000  2767201 2  4  5  2  4  2  4 .
     2767000  2767201 3  5  5  3  .  3  4 1
     2767000  2767201 4  4  4  2  5  3  3 1
     3491000  3491201 8  4  4  2  5  3  4 1
     3902000  3902201 5  4  4  2  5  2  4 1
     3902000  3902201 6  4  4  1  5  2  4 1
     3902000  3902201 7  4  4  1  4  2  3 1
     3902000  3902201 8  4  4  1  5  2  4 1
     4814000  4814203 5  3  4  2  4  3  4 1
     4835000  4835201 2  4  4  2  4  3  4 .
     4835000  4835201 3  4  4  2  .  3  4 1
     4835000  4835201 4  3  4  3  4  4  3 1
     4835000  4835201 5  3  4  3  4  3  2 1
     4858000  4858201 4  5  5  1  5  1  5 1
     4858000  4858201 5  5  5  1  5  1  5 1
     4858000  4858201 6  4  4  3  5  3  4 1
     4858000  4858201 7  3  5  3  5  3  4 1
     4858000  4858201 8  2  5  3  3  1  2 1
     5780000  5780204 2  2  3  3  3  3  2 .
     6151000  6151201 5  4  5  2  5  2  3 1
     6151000  6151201 6  3  4  2  5  2  4 1
     6151000  6151201 7  3  4  1  5  2  3 1
     6151000  6151201 8  4  4  2  5  3  3 1
     6519000  6519201 4  3  3  3  4  3  4 1
     6519000  6519201 5  4  4  2  4  3  4 1
     6519000  6519201 6  4  3  3  4  4  4 1
     6519000  6519201 7  3  4  3  4  4  4 1
     6519000  6519201 8  3  4  3  4  3  3 1
     7631000  7631201 8  5  5  3  5  3  5 1
     8807000  8807203 4  3  3  1  3  3  3 1
     8948000  8948201 2  2  4  2  5  2  3 .
     8948000  8948201 3  2  4  2  .  2  3 1
     8948000  8948201 4  2  5  1  5  2  2 1
     8948000  8948201 5  2  4  2  4  2  2 1
     8948000  8948201 6  2  4  3  4  2  2 1
     8948000  8948201 7  2  4  2  4  2  3 1
     8948000  8948201 8  2  4  2  5  2  2 1
     9657000  9657201 5  5  3  3  3  3  5 1
     9657000  9657201 8  4  4  3  4  3  4 1
     9917000  9917201 5  4  5  3  5  3  4 1
     9917000  9917201 6  4  4  2  5  2  4 1
     9917000  9917201 8  4  4  3  4  3  3 1
     9980000  9980201 4  3  5  1  5  3  2 1
    10208000 10208201 3  4  4  2  .  2  4 1
    10208000 10208201 4  3  4  2  4  2  3 1
    10208000 10208201 5  5  5  2  4  2  4 1
    10208000 10208201 6  4  4  2  4  2  4 1
    10208000 10208201 7  4  4  2  5  3  4 1
    10208000 10208201 8  4  4  3  4  2  4 1
    10250000 10250201 8  4  4  2  5  2  4 1
    10564000 10564202 2  4  4  3  4  3  4 .
    10957000 10957202 2  5  4  1  4  2  4 .
    10957000 10957202 3  4  4  1  .  1  3 1
    10957000 10957202 4  3  4  1  4  1  3 1
    10957000 10957202 5  3  4  2  4  1  3 1
    11295000 11295201 4  4  5  2  5  3  3 0
    11295000 11295201 5  3  4  2  4  2  4 1
    11295000 11295201 6  4  4  2  4  3  4 1
    11295000 11295201 7  4  4  3  4  3  4 1
    11295000 11295201 8  3  4  2  4  3  3 1
    11470000 11470201 6  4  4  2  4  3  3 1
    12266000 12266201 2 -2 -2 -2 -2 -2 -2 .
    12266000 12266201 3  3  4  2  .  2  4 1
    12266000 12266201 6  4  4  3  5  4  3 1
    12471000 12471201 6  3  4  4  3  3  3 1
    12471000 12471201 8  2  4  4  3  4  2 1
    12490000 12490201 3  4  4  3  .  3  4 1
    12490000 12490201 4  4  3  3  4  3  4 1
    12490000 12490201 5  4  4  3  3  3  4 1
    12490000 12490201 6  4  3  4  3  4  4 1
    12490000 12490201 7  4  3  3  3  3  4 1
    12490000 12490201 8  3  3  3  3  3  3 1
    13345000 13345202 2  5  5  2  4  2  5 .
    13345000 13345202 3  5  4  2  .  2  5 1
    13345000 13345202 4  3  4  3  4  3  5 1
    13588000 13588201 6  4  5  2  5  1  4 1
    13588000 13588201 7  4  4  1  5  1  5 1
    13937000 13937201 6  3  5  3  5  2  4 1
    13937000 13937201 8  3  5  2  5  3  4 1
    14660000 14660201 2  5  4  1  5  1  4 .
    14685000 14685202 2  4  4  2  4  2  3 .
    14685000 14685202 3  4  4  2  .  2  4 1
    14722000 14722201 3  2  4  2  .  2  3 1
    14722000 14722201 4  3  3  2  4  1  3 1
    14722000 14722201 5  3  3  2  3  1  3 1
    14722000 14722201 6  3  4  2  4  2  3 1
    14722000 14722201 7  3  4  2  4  1  2 1
    14722000 14722201 8  2  3  2  4  2  2 1
    end
    label values wave WAVE_prt2
    label def WAVE_prt2 2 "2 2009/10", modify
    label values pcr3i1 LABE_prt2
    label values pcr3i2 LABE_prt2
    label values pcr3i4 LABE_prt2
    label values pcr3i5 LABE_prt2
    label values pcr3i6 LABE_prt2
    label values pcr3i8 LABE_prt2
    label def LABE_prt2 -2 "-2 Keine Angabe", modify
    label def LABE_prt2 2 "2 Selten", modify
    label def LABE_prt2 3 "3 Manchmal", modify
    label def LABE_prt2 4 "4 Häufig", modify
    label def LABE_prt2 5 "5 Immer", modify
    label def LABE_prt2 1 "1 Nie", modify
    label values ehc28p1 liste160a_ac3
    label def liste160a_ac3 0 "0 Nein", modify
    label def liste160a_ac3 1 "1 Ja", modify
    label var id "Personennummer Anker"
    label var cid "Personennummer Kind"
    label var wave "Erhebungsjahr"
    label var pcr3i1 "Ihr Kind erzählt Ihnen, was es beschäftigt (Frage 6)"
    label var pcr3i2 "Die Dinge, die Ihr Kind tut, werden von Ihnen anerkannt (Frage 6)"
    label var pcr3i4 "Sie und Ihr Kind sind ärgerlich oder wütend aufeinander (Frage 6)"
    label var pcr3i5 "Sie zeigen Ihrem Kind, dass Sie es gut finden (Frage 6)"
    label var pcr3i6 "Sie und Ihr Kind sind unterschiedlicher Meinung und streiten sich (Frage 6)"
    label var pcr3i8 "Ihr Kind teilt mit Ihnen seine Gefühle und Gedanken (Frage 6)"
    label var ehc28p1 "Jetzt leben in Wohnung Vorwelle [Lebensmittelpunkt] (EHC)"
    Thank you!

    Guest
    Last edited by sladmin; 28 Jan 2019, 09:25. Reason: anonymize original poster

  • #2
    I tried it like this:

    Code:
    *** PARENTING DATA ***
    
    global mergevars id cid wave dropoffvers pcr3i5 pcr3i2 pcr3i1 pcr3i8 pcr3i4 pcr3i6
    use id wave $mergevars using "/Users/Guest/Desktop/MA/Daten/parenting2.dta"  // open wave 2
    drop if dropoffvers==2
    
    mvdecode pcr3i5, mv(-2= .a \  -1= .b \  -9= .c)
    egen fehlend = rowmiss (pcr3i5)
    keep if fehlend==0  
    drop fehlend
    
    save "/Users/Guest/Desktop/MA/Daten/parenting2_a.dta", replace
    
    global mergevars id cid wave dropoffvers pcr3i2 pcr3i1 pcr3i8 pcr3i4 pcr3i6    
    use id wave $mergevars using "/Users/Guest/Desktop/MA/Daten/parenting3.dta"  // open wave 3
    drop if dropoffvers==2
    save "/Users/Guest/Desktop/MA/Daten/parenting3_a.dta", replace
    
    global mergevars id cid wave dropoffvers pcr3i5 pcr3i2 pcr3i1 pcr3i8 pcr3i4 pcr3i6
    use id wave $mergevars using "/Users/Guest/Desktop/MA/Daten/parenting4.dta"  // open wave 4
    drop if dropoffvers==2
    
    mvdecode pcr3i5, mv(-2= .a \  -1= .b \  -9= .c)
    egen fehlend = rowmiss (pcr3i5)
    keep if fehlend==0  
    drop fehlend
    
    save "/Users/Guest/Desktop/MA/Daten/parenting4_a.dta", replace
    
    global mergevars id cid wave dropoffvers pcr3i5 pcr3i2 pcr3i1 pcr3i8 pcr3i4 pcr3i6
    use id wave $mergevars using "/Users/Guest/Desktop/MA/Daten/parenting5.dta"  // open wave 5
    drop if dropoffvers==2
    
    mvdecode pcr3i5, mv(-2= .a \  -1= .b \  -9= .c)
    egen fehlend = rowmiss (pcr3i5)
    keep if fehlend==0  
    drop fehlend
    
    save "/Users/Guest/Desktop/MA/Daten/parenting5_a.dta", replace
    
    global mergevars id cid wave dropoffvers pcr3i5 pcr3i2 pcr3i1 pcr3i8 pcr3i4 pcr3i6
    use id wave $mergevars using "/Users/Guest/Desktop/MA/Daten/parenting6.dta"  // open wave 6
    drop if dropoffvers==2
    
    mvdecode pcr3i5, mv(-2= .a \  -1= .b \  -9= .c)
    egen fehlend = rowmiss (pcr3i5)
    keep if fehlend==0  
    drop fehlend
    
    save "/Users/Guest/Desktop/MA/Daten/parenting6_a.dta", replace
    
    global mergevars id cid wave dropoffvers pcr3i5 pcr3i2 pcr3i1 pcr3i8 pcr3i4 pcr3i6
    use id wave $mergevars using "/Users/Guest/Desktop/MA/Daten/parenting7.dta"  // open wave 7
    drop if dropoffvers==2
    
    mvdecode pcr3i5, mv(-2= .a \  -1= .b \  -9= .c)
    egen fehlend = rowmiss (pcr3i5)
    keep if fehlend==0  
    drop fehlend
    
    save "/Users/Guest/Desktop/MA/Daten/parenting7_a.dta", replace
    
    global mergevars id cid wave dropoffvers pcr3i5 pcr3i2 pcr3i1 pcr3i8 pcr3i4 pcr3i6
    use id wave $mergevars using "/Users/Guest/Desktop/MA/Daten/parenting8.dta"  // open wave 8
    drop if dropoffvers==2
    
    mvdecode pcr3i5, mv(-2= .a \  -1= .b \  -9= .c)
    egen fehlend = rowmiss (pcr3i5)
    keep if fehlend==0  
    drop fehlend
    
    save "/Users/Guest/Desktop/MA/Daten/parenting8_a.dta", replace
    
    use "/Users/Guest/Desktop/MA/Daten/parenting2_a"
    append using "/Users/Guest/Desktop/MA/Daten/parenting3_a"
    append using "/Users/Guest/Desktop/MA/Daten/parenting4_a"
    append using "/Users/Guest/Desktop/MA/Daten/parenting5_a"
    append using "/Users/Guest/Desktop/MA/Daten/parenting6_a"
    append using "/Users/Guest/Desktop/MA/Daten/parenting7_a"
    append using "/Users/Guest/Desktop/MA/Daten/parenting8_a" // stack the waves
    
    sort id wave
    Code:
    ipolate pcr3i5 wave, gen(pcr3i5_imp) epolate by (id)
    But there are still missing values. Is this correct or am I absolutely on the wrong track?
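    One possible reason for the remaining gaps, offered as a hedged sketch rather than a diagnosis: as far as I know, -ipolate- needs at least two non-missing values of pcr3i5 within a panel to draw a line through, so a child with only one valid value (for example, observed only in waves 2 and 3) stays missing even with -epolate-. The code below counts the valid values per panel and cross-tabulates them against what is still missing. The names n_points, pcr3i5_lin and still_missing are hypothetical, and grouping by id and cid together is an assumption (it matters only if an anchor reports on more than one child). Note also that linear interpolation can return non-integer values such as 4.5 for a 1-5 item.

    Code:
    * number of non-missing pcr3i5 values per anchor-child pair
    bysort id cid: egen n_points = count(pcr3i5)
    
    * linear inter-/extrapolation over waves within each pair
    bysort id cid: ipolate pcr3i5 wave, generate(pcr3i5_lin) epolate
    
    * where are gaps left, and how many valid points did those panels have?
    generate byte still_missing = missing(pcr3i5_lin)
    tabulate n_points still_missing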


    • #3
      Guest:
      I would tackle the issue via -mi- (assuming that the data are missing at random, MAR; this assumption should be verified/justified).
      First of all, to avoid biased imputation, I replaced all -2 coded values with missing (.) :
      Code:
      foreach var of varlist pcr3i1 - ehc28p1 {
                      replace `var'=. if `var'==-2
              }
      Then an -mi- model follows (please note that the predictors are the only variables with no missing data in your excerpt; hence, you may want to rethink the whole -mi-model and consider a different/wider set of predictors) :
      Code:
      mi set flong
      set seed 12345
      mi register impute pcr3i1  pcr3i2  pcr3i4  pcr3i5 pcr3i6  pcr3i8  ehc28p1
      mi impute chained (pmm, knn(5)) pcr3i1  pcr3i2  pcr3i4  pcr3i5 pcr3i6  pcr3i8  ehc28p1 = id wave, add(20) noisily
      I decided to create 20 imputed datasets and used the -noisily- option to see what was going on during the imputation process.

      Finally, I looked at the pooled statistics for each variable, which you can compare with those of the original data (_mi_m=0); only two variables are shown below:
      Code:
      . mi estimate: proportion ehc28p1
      
      Multiple-imputation estimates     Imputations     =         20
      Proportion estimation             Number of obs   =        100
                                        Average RVI     =     0.1974
                                        Largest FMI     =     0.1707
                                        Complete DF     =         99
      DF adjustment:   Small sample     DF:     min     =      72.64
                                                avg     =      72.64
      Within VCE type:     Analytic             max     =      72.64
      
            _prop_1: ehc28p1 = 0 Nein
            _prop_2: ehc28p1 = 1 Ja
      
      --------------------------------------------------------------
                   | Proportion   Std. Err.     [95% Conf. Interval]
      -------------+------------------------------------------------
           _prop_1 |      .0255    .017321     -.0090236    .0600236
           _prop_2 |      .9745    .017321      .9399764    1.009024
      --------------------------------------------------------------
      
      . mi estimate: proportion pcr3i8
      
      Multiple-imputation estimates     Imputations     =         20
      Proportion estimation             Number of obs   =        100
                                        Average RVI     =     0.0070
                                        Largest FMI     =     0.0105
                                        Complete DF     =         99
      DF adjustment:   Small sample     DF:     min     =      96.05
                                                avg     =      96.31
      Within VCE type:     Analytic             max     =      97.06
      
            _prop_1: pcr3i8 = 2 Selten
            _prop_2: pcr3i8 = 3 Manchmal
            _prop_3: pcr3i8 = 4 Häufig
            _prop_4: pcr3i8 = 5 Immer
      
      --------------------------------------------------------------
                   | Proportion   Std. Err.     [95% Conf. Interval]
      -------------+------------------------------------------------
           _prop_1 |       .111   .0317272      .0480225    .1739775
           _prop_2 |      .2925   .0459443      .2013021    .3836979
           _prop_3 |      .4965    .050498      .3962631    .5967369
           _prop_4 |         .1   .0301511      .0401588    .1598412
      --------------------------------------------------------------
      The above may be a starting point for your panel data regression with complete data.
      Last edited by sladmin; 28 Jan 2019, 09:27. Reason: anonymize original poster
      Kind regards,
      Carlo
      (Stata 19.0)


      • #4
        Thank you Carlo!!

        Do I have to impute all the variables as you did? I only need it for pcr3i5 and ehc28p1.

        Code:
        mi register impute pcr3i1 pcr3i2 pcr3i4 pcr3i5 pcr3i6 pcr3i8 ehc28p1
        I read the -mi- help but don't get it. And I have to admit that I'm having trouble understanding what you did here:

        Code:
         
         set seed 12345
        and here:

        Code:
        mi impute chained (pmm, knn(5)) pcr3i1 pcr3i2 pcr3i4 pcr3i5 pcr3i6 pcr3i8 ehc28p1 = id wave, add(20) noisily
        I expected to get new variables or values for the gaps in my data set. How do I get those? I'm sorry for bothering you, but if you know a tutorial that explains this in simple terms, please let me know. What I found online is difficult to understand.


        • #5
          Guest:
          - in your dataset excerpt/example some values are coded as -2 (I recall that the label for those values was something like "unavailable" in German): hence, I recoded them to missing values (.);
          - then I created 20 imputed datasets via the chained-equations approach (see -help mi impute chained- for further details). The pretty strong assumption I made is that the data are missing at random.

          Taking variable pcr3i8 as an example, you can see the difference between the starting dataset (i.e., the one with missing values) and the 20 imputed datasets via the following command:
          Code:
          mi xeq 0 (1) 20: tab pcr3i8
          whereas the pooled values of proportion can be retrieved via:
          Code:
           mi estimate: proportion pcr3i8
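          On the question of where the imputed values end up: in the flong style used above they are not stored in new variables. The data set holds the original rows (_mi_m = 0) followed by one full copy per imputation (_mi_m = 1 to 20), and the gaps are filled in within each copy. A minimal sketch for looking at them directly, assuming the imputation commands from #3 have been run on the data excerpt from the first post; the anchor id 111000 and the choice of the first three copies are just examples:

          Code:
          * overview of the mi data: style, number of imputations, registered variables
          mi describe
          
          * wave-3 value of pcr3i5 for one anchor: original (m=0) and first three imputations
          list id wave _mi_m pcr3i5 if id == 111000 & wave == 3 & _mi_m <= 3, noobs
          
          * distribution of pcr3i5 in imputation 1 only
          mi xeq 1: tabulate pcr3i5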
          Last edited by sladmin; 28 Jan 2019, 09:27. Reason: anonymize original poster
          Kind regards,
          Carlo
          (Stata 19.0)


          • #6
            This was the only thing I understood completely:
            in your dataset excerpt/example some values are coded as -2 (I recall that the label for those values was something like "unavailable" in German): hence, I recoded them to missing values (.)
            Taking variable pcr3i8 as an example, you can see the difference between the starting dataset (i.e., the one with missing values) and the 20 imputed datasets via the following command:
            Okay, and how do I know which of the 20 imputed data sets I should use?


            • #7
              All of them!
              That's why :
              Code:
              mi estimate: proportion pcr3i8
              is invoked.
              You can get a clearer picture by taking a look at -mi intro- and -mi intro substantive-. It's hard work, but highly rewarding.
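              To make "all of them" concrete: the point estimate reported by -mi estimate- is the average of the estimates obtained in each imputed data set, while its standard error additionally accounts for the variation between imputations (Rubin's rules). A small sketch, assuming the flong data with 20 imputations from #3 are in memory; the scalar name qsum is hypothetical:

              Code:
              * average the per-imputation means of pcr3i5 by hand
              scalar qsum = 0
              forvalues m = 1/20 {
                  quietly summarize pcr3i5 if _mi_m == `m', meanonly
                  scalar qsum = qsum + r(mean)
              }
              display "hand-pooled mean of pcr3i5: " qsum/20
              
              * the same point estimate, now with a properly pooled standard error
              mi estimate: mean pcr3i5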
              Kind regards,
              Carlo
              (Stata 19.0)


              • #8
                Guest:
                taking the issue a bit further, I've elaborated on my previous replies to post an example of a panel data regression run on incomplete (assumed MAR) and imputed data, respectively.
                I've created a fictitious continuous dependent variable (-y-) and taken the predictors from your excerpt:
                Code:
                . g y=runiform()*1000
                
                . foreach var of varlist pcr3i1 - ehc28p1 {
                  2.                 replace `var'=. if `var'==-2
                  3.         }
                (1 real change made, 1 to missing)
                (1 real change made, 1 to missing)
                (1 real change made, 1 to missing)
                (1 real change made, 1 to missing)
                (1 real change made, 1 to missing)
                (1 real change made, 1 to missing)
                (0 real changes made)
                
                . xtreg y i.pcr3i1 i.pcr3i2 i.pcr3i4 i.pcr3i5 i.pcr3i6 i.pcr3i8 i.ehc28p1
                
                Random-effects GLS regression                   Number of obs     =         76
                Group variable: id                              Number of groups  =         30
                
                R-sq:                                           Obs per group:
                     within  = 0.4850                                         min =          1
                     between = 0.1329                                         avg =        2.5
                     overall = 0.3301                                         max =          5
                
                                                                Wald chi2(17)     =      35.37
                corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0056
                
                ------------------------------------------------------------------------------
                           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                      pcr3i1 |
                 3 Manchmal  |   109.6598   203.2149     0.54   0.589    -288.6341    507.9538
                   4 Häufig  |   95.24959   224.9335     0.42   0.672    -345.6121    536.1112
                    5 Immer  |   254.5397   260.0267     0.98   0.328    -255.1032    764.1826
                             |
                      pcr3i2 |
                   4 Häufig  |   143.4363   103.3024     1.39   0.165    -59.03267    345.9053
                    5 Immer  |  -255.0081   146.8519    -1.74   0.082    -542.8326    32.81646
                             |
                      pcr3i4 |
                   2 Selten  |  -170.4727   114.2503    -1.49   0.136    -394.3991     53.4537
                 3 Manchmal  |    143.308   130.1368     1.10   0.271    -111.7555    398.3715
                   4 Häufig  |   88.51766    270.306     0.33   0.743    -441.2725    618.3078
                             |
                      pcr3i5 |
                   4 Häufig  |   127.3432   138.6514     0.92   0.358    -144.4084    399.0949
                    5 Immer  |   429.9593   161.4702     2.66   0.008     113.4834    746.4352
                             |
                      pcr3i6 |
                   2 Selten  |  -108.6853   137.6404    -0.79   0.430    -378.4555    161.0849
                 3 Manchmal  |  -234.0482   133.4804    -1.75   0.080     -495.665    27.56853
                   4 Häufig  |  -204.9479   202.3308    -1.01   0.311    -601.5091    191.6133
                             |
                      pcr3i8 |
                 3 Manchmal  |  -194.5615   153.6333    -1.27   0.205    -495.6773    106.5543
                   4 Häufig  |  -223.2908   172.1323    -1.30   0.195    -560.6639    114.0822
                    5 Immer  |   -529.096   217.0752    -2.44   0.015    -954.5555   -103.6364
                             |
                     ehc28p1 |
                       1 Ja  |   211.8828     311.09     0.68   0.496    -397.8424     821.608
                       _cons |    313.107   400.7957     0.78   0.435    -472.4381    1098.652
                -------------+----------------------------------------------------------------
                     sigma_u |  206.25062
                     sigma_e |  254.93028
                         rho |  .39560872   (fraction of variance due to u_i)
                ------------------------------------------------------------------------------
                
                . mi set flong
                
                .
                . set seed 12345
                
                .
                . mi register impute pcr3i1  pcr3i2  pcr3i4  pcr3i5 pcr3i6  pcr3i8  ehc28p1
                (24 m=0 obs. now marked as incomplete)
                
                .
                . mi impute chained (pmm, knn(5)) pcr3i1  pcr3i2  pcr3i4  pcr3i5 pcr3i6  pcr3i8  ehc28p1 = id wave, add(20)
                
                Conditional models:
                            pcr3i1: pmm pcr3i1 pcr3i2 pcr3i4 pcr3i6 pcr3i8 ehc28p1 pcr3i5 id wave , knn(5)
                            pcr3i2: pmm pcr3i2 pcr3i1 pcr3i4 pcr3i6 pcr3i8 ehc28p1 pcr3i5 id wave , knn(5)
                            pcr3i4: pmm pcr3i4 pcr3i1 pcr3i2 pcr3i6 pcr3i8 ehc28p1 pcr3i5 id wave , knn(5)
                            pcr3i6: pmm pcr3i6 pcr3i1 pcr3i2 pcr3i4 pcr3i8 ehc28p1 pcr3i5 id wave , knn(5)
                            pcr3i8: pmm pcr3i8 pcr3i1 pcr3i2 pcr3i4 pcr3i6 ehc28p1 pcr3i5 id wave , knn(5)
                           ehc28p1: pmm ehc28p1 pcr3i1 pcr3i2 pcr3i4 pcr3i6 pcr3i8 pcr3i5 id wave , knn(5)
                            pcr3i5: pmm pcr3i5 pcr3i1 pcr3i2 pcr3i4 pcr3i6 pcr3i8 ehc28p1 id wave , knn(5)
                
                Performing chained iterations ...
                
                Multivariate imputation                     Imputations =       20
                Chained equations                                 added =       20
                Imputed: m=1 through m=20                       updated =        0
                
                Initialization: monotone                     Iterations =      200
                                                                burn-in =       10
                
                            pcr3i1: predictive mean matching
                            pcr3i2: predictive mean matching
                            pcr3i4: predictive mean matching
                            pcr3i5: predictive mean matching
                            pcr3i6: predictive mean matching
                            pcr3i8: predictive mean matching
                           ehc28p1: predictive mean matching
                
                ------------------------------------------------------------------
                                   |               Observations per m            
                                   |----------------------------------------------
                          Variable |   Complete   Incomplete   Imputed |     Total
                -------------------+-----------------------------------+----------
                            pcr3i1 |         99            1         1 |       100
                            pcr3i2 |         99            1         1 |       100
                            pcr3i4 |         99            1         1 |       100
                            pcr3i5 |         87           13        13 |       100
                            pcr3i6 |         99            1         1 |       100
                            pcr3i8 |         99            1         1 |       100
                           ehc28p1 |         88           12        12 |       100
                ------------------------------------------------------------------
                (complete + incomplete = total; imputed is the minimum across m
                 of the number of filled-in observations.)
                
                . mi estimate: xtreg y i.pcr3i1 i.pcr3i2 i.pcr3i4 i.pcr3i5 i.pcr3i6 i.pcr3i8 i.ehc28p1
                
                Multiple-imputation estimates                   Imputations       =         20
                Random-effects GLS regression                   Number of obs     =        100
                
                Group variable: id                              Number of groups  =         34
                                                                Obs per group:
                                                                              min =          1
                                                                              avg =        2.9
                                                                              max =          7
                                                                Average RVI       =     0.0507
                                                                Largest FMI       =     0.2447
                DF adjustment:   Large sample                   DF:     min       =     329.44
                                                                        avg       =  27,367.77
                                                                        max       = 109,602.78
                Model F test:       Equal FMI                   F(  17,124393.9)  =       1.89
                Within VCE type: Conventional                   Prob > F          =     0.0146
                
                ------------------------------------------------------------------------------
                           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                      pcr3i1 |
                 3 Manchmal  |   83.84471   150.0855     0.56   0.576    -210.3628    378.0522
                   4 Häufig  |   144.5773   164.3623     0.88   0.379    -177.6082    466.7627
                    5 Immer  |   220.7726   191.7085     1.15   0.250    -155.0751    596.6202
                             |
                      pcr3i2 |
                   4 Häufig  |   100.2005   98.35517     1.02   0.308    -92.57609    292.9771
                    5 Immer  |  -182.5404   126.3146    -1.45   0.148      -430.12    65.03923
                             |
                      pcr3i4 |
                   2 Selten  |  -103.2356   99.77115    -1.03   0.301     -298.788    92.31674
                 3 Manchmal  |   156.8843   121.5121     1.29   0.197    -81.28229     395.051
                   4 Häufig  |   19.19786     263.17     0.07   0.942    -496.6348    535.0305
                             |
                      pcr3i5 |
                   4 Häufig  |   83.33518   134.6154     0.62   0.536    -180.7715    347.4419
                    5 Immer  |   294.8475   149.4258     1.97   0.049     1.461407    588.2337
                             |
                      pcr3i6 |
                   2 Selten  |  -56.32939   111.6178    -0.50   0.614     -275.103    162.4442
                 3 Manchmal  |  -219.9922   115.3463    -1.91   0.056    -446.0693    6.084873
                   4 Häufig  |  -155.1145   185.8078    -0.83   0.404    -519.3017    209.0728
                             |
                      pcr3i8 |
                 3 Manchmal  |  -149.0072   122.8482    -1.21   0.225    -389.7989    91.78452
                   4 Häufig  |   -251.298   138.3298    -1.82   0.069     -522.433    19.83695
                    5 Immer  |   -480.448   176.8839    -2.72   0.007    -827.1543   -133.7416
                             |
                     ehc28p1 |
                       1 Ja  |   223.5938   225.1112     0.99   0.321    -219.2428    666.4304
                       _cons |    339.631   277.7912     1.22   0.222    -205.6719     884.934
                -------------+----------------------------------------------------------------
                     sigma_u |  162.91602
                     sigma_e |  249.27039
                         rho |  .29930559   (fraction of variance due to u_i)
                ------------------------------------------------------------------------------
                Note: sigma_u and sigma_e are combined in the original metric.

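                Interestingly, while the average relative variance increase (RVI) is limited (about 5%), the largest fraction of missing information (FMI) suggests that more imputations should have been created: by the rule of thumb that the number of imputations should be at least 100 times the FMI, 0.2447*100 gives about 24, versus the 20 used here.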
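                The rule of thumb behind that remark (use at least 100 x FMI imputations) is simple arithmetic. The sketch below only reproduces it from the output above; the local macro name fmi is hypothetical, and whether more imputations are worth it in a given analysis remains a judgement call:

                Code:
                * largest FMI as printed by -mi estimate- above
                local fmi = 0.2447
                
                * rule-of-thumb minimum number of imputations, rounded up
                display "suggested minimum number of imputations: " ceil(100 * `fmi')

                If more imputations are wanted, calling -mi impute chained- again with add() appends further imputations to the existing 20 rather than starting over.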
                Last edited by sladmin; 28 Jan 2019, 09:27. Reason: anonymize original poster
                Kind regards,
                Carlo
                (Stata 19.0)


                • #9
                  Originally posted by Carlo Lazzaro View Post
                  Guest:
                  taking the issue a bit further, I've elaborated on my previous replies to post an example of a panel data regression run on incomplete (assumed MAR) and imputed data, respectively.
                  I've created a fictitious continuous dependent variable (-y-) and taken the predictors from your excerpt:
                  Code:
                  . g y=runiform()*1000
                  foreach var of varlist pcr3i1 - ehc28p1 {
                  2.
                  . replace `var'=. if `var'==-2
                  3.
                  . }
                  (1 real change made, 1 to missing)
                  (1 real change made, 1 to missing)
                  (1 real change made, 1 to missing)
                  (1 real change made, 1 to missing)
                  (1 real change made, 1 to missing)
                  (1 real change made, 1 to missing)
                  (0 real changes made)
                  
                  . xtreg y i.pcr3i1 i.pcr3i2 i.pcr3i4 i.pcr3i5 i.pcr3i6 i.pcr3i8 i.ehc28p1
                  
                  Random-effects GLS regression                   Number of obs     =         76
                  Group variable: id                              Number of groups  =         30

                  R-sq:                                           Obs per group:
                       within  = 0.4850                                         min =          1
                       between = 0.1329                                         avg =        2.5
                       overall = 0.3301                                         max =          5

                                                                  Wald chi2(17)     =      35.37
                  corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0056

                  ------------------------------------------------------------------------------
                             y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                        pcr3i1 |
                    3 Manchmal |   109.6598   203.2149     0.54   0.589    -288.6341    507.9538
                      4 Häufig |   95.24959   224.9335     0.42   0.672    -345.6121    536.1112
                       5 Immer |   254.5397   260.0267     0.98   0.328    -255.1032    764.1826
                               |
                        pcr3i2 |
                      4 Häufig |   143.4363   103.3024     1.39   0.165    -59.03267    345.9053
                       5 Immer |  -255.0081   146.8519    -1.74   0.082    -542.8326    32.81646
                               |
                        pcr3i4 |
                      2 Selten |  -170.4727   114.2503    -1.49   0.136    -394.3991     53.4537
                    3 Manchmal |    143.308   130.1368     1.10   0.271    -111.7555    398.3715
                      4 Häufig |   88.51766    270.306     0.33   0.743    -441.2725    618.3078
                               |
                        pcr3i5 |
                      4 Häufig |   127.3432   138.6514     0.92   0.358    -144.4084    399.0949
                       5 Immer |   429.9593   161.4702     2.66   0.008     113.4834    746.4352
                               |
                        pcr3i6 |
                      2 Selten |  -108.6853   137.6404    -0.79   0.430    -378.4555    161.0849
                    3 Manchmal |  -234.0482   133.4804    -1.75   0.080     -495.665    27.56853
                      4 Häufig |  -204.9479   202.3308    -1.01   0.311    -601.5091    191.6133
                               |
                        pcr3i8 |
                    3 Manchmal |  -194.5615   153.6333    -1.27   0.205    -495.6773    106.5543
                      4 Häufig |  -223.2908   172.1323    -1.30   0.195    -560.6639    114.0822
                       5 Immer |   -529.096   217.0752    -2.44   0.015    -954.5555   -103.6364
                               |
                       ehc28p1 |
                          1 Ja |   211.8828     311.09     0.68   0.496    -397.8424     821.608
                         _cons |    313.107   400.7957     0.78   0.435    -472.4381    1098.652
                  -------------+----------------------------------------------------------------
                       sigma_u |  206.25062
                       sigma_e |  254.93028
                           rho |  .39560872   (fraction of variance due to u_i)
                  ------------------------------------------------------------------------------
                  
                  . mi set flong
                  
                  .
                  . set seed 12345
                  
                  .
                  . mi register impute pcr3i1 pcr3i2 pcr3i4 pcr3i5 pcr3i6 pcr3i8 ehc28p1
                  (24 m=0 obs. now marked as incomplete)
                  
                  .
                  . mi impute chained (pmm, knn(5)) pcr3i1 pcr3i2 pcr3i4 pcr3i5 pcr3i6 pcr3i8 ehc28p1 = id wave, add(20)
                  
                  Conditional models:
                  pcr3i1: pmm pcr3i1 pcr3i2 pcr3i4 pcr3i6 pcr3i8 ehc28p1 pcr3i5 id wave , knn(5)
                  pcr3i2: pmm pcr3i2 pcr3i1 pcr3i4 pcr3i6 pcr3i8 ehc28p1 pcr3i5 id wave , knn(5)
                  pcr3i4: pmm pcr3i4 pcr3i1 pcr3i2 pcr3i6 pcr3i8 ehc28p1 pcr3i5 id wave , knn(5)
                  pcr3i6: pmm pcr3i6 pcr3i1 pcr3i2 pcr3i4 pcr3i8 ehc28p1 pcr3i5 id wave , knn(5)
                  pcr3i8: pmm pcr3i8 pcr3i1 pcr3i2 pcr3i4 pcr3i6 ehc28p1 pcr3i5 id wave , knn(5)
                  ehc28p1: pmm ehc28p1 pcr3i1 pcr3i2 pcr3i4 pcr3i6 pcr3i8 pcr3i5 id wave , knn(5)
                  pcr3i5: pmm pcr3i5 pcr3i1 pcr3i2 pcr3i4 pcr3i6 pcr3i8 ehc28p1 id wave , knn(5)
                  
                  Performing chained iterations ...
                  
                  Multivariate imputation                     Imputations =       20
                  Chained equations                                 added =       20
                  Imputed: m=1 through m=20                       updated =        0

                  Initialization: monotone                     Iterations =      200
                                                                  burn-in =       10
                  
                  pcr3i1: predictive mean matching
                  pcr3i2: predictive mean matching
                  pcr3i4: predictive mean matching
                  pcr3i5: predictive mean matching
                  pcr3i6: predictive mean matching
                  pcr3i8: predictive mean matching
                  ehc28p1: predictive mean matching
                  
                  ------------------------------------------------------------------
                                     |               Observations per m
                                     |----------------------------------------------
                            Variable |   Complete   Incomplete   Imputed |     Total
                  -------------------+-----------------------------------+----------
                              pcr3i1 |         99            1         1 |       100
                              pcr3i2 |         99            1         1 |       100
                              pcr3i4 |         99            1         1 |       100
                              pcr3i5 |         87           13        13 |       100
                              pcr3i6 |         99            1         1 |       100
                              pcr3i8 |         99            1         1 |       100
                             ehc28p1 |         88           12        12 |       100
                  ------------------------------------------------------------------
                  (complete + incomplete = total; imputed is the minimum across m
                   of the number of filled-in observations.)
                  
                  . mi estimate: xtreg y i.pcr3i1 i.pcr3i2 i.pcr3i4 i.pcr3i5 i.pcr3i6 i.pcr3i8 i.ehc28p1
                  
                  Multiple-imputation estimates                   Imputations       =         20
                  Random-effects GLS regression                   Number of obs     =        100

                  Group variable: id                              Number of groups  =         34
                                                                  Obs per group:
                                                                                min =          1
                                                                                avg =        2.9
                                                                                max =          7
                                                                  Average RVI       =     0.0507
                                                                  Largest FMI       =     0.2447
                  DF adjustment:   Large sample                   DF:     min       =     329.44
                                                                          avg       =  27,367.77
                                                                          max       = 109,602.78
                  Model F test:       Equal FMI                   F(  17,124393.9)  =       1.89
                  Within VCE type: Conventional                   Prob > F          =     0.0146

                  ------------------------------------------------------------------------------
                             y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                        pcr3i1 |
                    3 Manchmal |   83.84471   150.0855     0.56   0.576    -210.3628    378.0522
                      4 Häufig |   144.5773   164.3623     0.88   0.379    -177.6082    466.7627
                       5 Immer |   220.7726   191.7085     1.15   0.250    -155.0751    596.6202
                               |
                        pcr3i2 |
                      4 Häufig |   100.2005   98.35517     1.02   0.308    -92.57609    292.9771
                       5 Immer |  -182.5404   126.3146    -1.45   0.148      -430.12    65.03923
                               |
                        pcr3i4 |
                      2 Selten |  -103.2356   99.77115    -1.03   0.301     -298.788    92.31674
                    3 Manchmal |   156.8843   121.5121     1.29   0.197    -81.28229     395.051
                      4 Häufig |   19.19786     263.17     0.07   0.942    -496.6348    535.0305
                               |
                        pcr3i5 |
                      4 Häufig |   83.33518   134.6154     0.62   0.536    -180.7715    347.4419
                       5 Immer |   294.8475   149.4258     1.97   0.049     1.461407    588.2337
                               |
                        pcr3i6 |
                      2 Selten |  -56.32939   111.6178    -0.50   0.614     -275.103    162.4442
                    3 Manchmal |  -219.9922   115.3463    -1.91   0.056    -446.0693    6.084873
                      4 Häufig |  -155.1145   185.8078    -0.83   0.404    -519.3017    209.0728
                               |
                        pcr3i8 |
                    3 Manchmal |  -149.0072   122.8482    -1.21   0.225    -389.7989    91.78452
                      4 Häufig |   -251.298   138.3298    -1.82   0.069     -522.433    19.83695
                       5 Immer |   -480.448   176.8839    -2.72   0.007    -827.1543   -133.7416
                               |
                       ehc28p1 |
                          1 Ja |   223.5938   225.1112     0.99   0.321    -219.2428    666.4304
                         _cons |    339.631   277.7912     1.22   0.222    -205.6719     884.934
                  -------------+----------------------------------------------------------------
                       sigma_u |  162.91602
                       sigma_e |  249.27039
                           rho |  .29930559   (fraction of variance due to u_i)
                  ------------------------------------------------------------------------------
                  Note: sigma_u and sigma_e are combined in the original metric.

                  Interestingly, while the average relative variance increase (RVI) is limited (about 5%), the largest fraction of missing information (FMI) indicates that the number of imputations should have been higher: by the usual rule of thumb, 0.2447*100 gives about 24 imputations, versus the 20 used.
                  Thank you!!! I think I understand it (a bit better) now. You helped me a lot. I'm waiting for a reply from my professor about which option she thinks is best.
                  Last edited by sladmin; 28 Jan 2019, 09:28. Reason: anonymize original poster
