how to deal with missing values in panel data (1 Item is missing)

Guest

01 Sep 2017, 07:29

Dear all,

I'm using Stata 14.2 and working with panel data from waves 2-8. I'm having trouble generating the dependent variable for the FE model, because the item 'pcr3i5' was not asked in wave 3. I do have the information from the previous wave (2) and the following wave (4). I know there is a way to handle this (maybe with means?), but I didn't understand what I found on the internet. If I drop this item, the alpha value of the index variable decreases. Furthermore, I need to keep this item because I'm working with a multi-actor design.

This is how I generate the dependent variable:

Code:

*** RELATIONSHIP QUALITY, FROM THE ANCHOR'S PERSPECTIVE ***

*** Intimacy: pcr3i1 pcr3i8 // Appreciation: pcr3i2 pcr3i5 // Conflict: pcr3i4 pcr3i6 ***

alpha pcr3i1 pcr3i8 pcr3i2 pcr3i4 pcr3i6, item
revv pcr3i4 pcr3i6 // reverse-code the conflict items

alpha pcr3i1 pcr3i8 pcr3i2 rv_pcr3i4 rv_pcr3i6, item

egen bezqual_anker = rowmean(pcr3i1 pcr3i8 pcr3i2 rv_pcr3i4 rv_pcr3i6) // create the index variable
label var bezqual_anker "Beziehungsqualität (Elternperspektive)"
label def bezqual_anker 1 "1 niedrig" 5 "5 hoch"
label val bezqual_anker bezqual_anker

The next 'problem' is that I have left-censored data for the variable ehc28p1 in wave 2. I decided to drop wave 2 and work with waves 3-8. It's no problem to drop wave 2 after using it to fill in the missing values for item pcr3i5, is it?

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(id cid) byte(wave pcr3i1 pcr3i2 pcr3i4 pcr3i5 pcr3i6 pcr3i8 ehc28p1)
  111000   111203 2  4  4  2  4  2  4 .
  111000   111203 3  4  4  1  .  1  5 0
  111000   111203 4  4  4  3  4  3  5 1
  111000   111203 5  3  3  2  4  2  3 1
  111000   111203 6  4  4  2  5  3  4 1
  111000   111203 7  4  4  2  4  2  4 1
  111000   111203 8  4  4  2  4  3  4 1
  907000   907201 8  3  5  2  5  1  4 1
 1300000  1300202 2  5  4  2  4  2  4 .
 1300000  1300202 3  5  5  2  .  2  3 1
 1300000  1300202 4  5  5  2  5  2  4 1
 1624000  1624201 8  4  3  2  5  2  4 1
 2767000  2767201 2  4  5  2  4  2  4 .
 2767000  2767201 3  5  5  3  .  3  4 1
 2767000  2767201 4  4  4  2  5  3  3 1
 3491000  3491201 8  4  4  2  5  3  4 1
 3902000  3902201 5  4  4  2  5  2  4 1
 3902000  3902201 6  4  4  1  5  2  4 1
 3902000  3902201 7  4  4  1  4  2  3 1
 3902000  3902201 8  4  4  1  5  2  4 1
 4814000  4814203 5  3  4  2  4  3  4 1
 4835000  4835201 2  4  4  2  4  3  4 .
 4835000  4835201 3  4  4  2  .  3  4 1
 4835000  4835201 4  3  4  3  4  4  3 1
 4835000  4835201 5  3  4  3  4  3  2 1
 4858000  4858201 4  5  5  1  5  1  5 1
 4858000  4858201 5  5  5  1  5  1  5 1
 4858000  4858201 6  4  4  3  5  3  4 1
 4858000  4858201 7  3  5  3  5  3  4 1
 4858000  4858201 8  2  5  3  3  1  2 1
 5780000  5780204 2  2  3  3  3  3  2 .
 6151000  6151201 5  4  5  2  5  2  3 1
 6151000  6151201 6  3  4  2  5  2  4 1
 6151000  6151201 7  3  4  1  5  2  3 1
 6151000  6151201 8  4  4  2  5  3  3 1
 6519000  6519201 4  3  3  3  4  3  4 1
 6519000  6519201 5  4  4  2  4  3  4 1
 6519000  6519201 6  4  3  3  4  4  4 1
 6519000  6519201 7  3  4  3  4  4  4 1
 6519000  6519201 8  3  4  3  4  3  3 1
 7631000  7631201 8  5  5  3  5  3  5 1
 8807000  8807203 4  3  3  1  3  3  3 1
 8948000  8948201 2  2  4  2  5  2  3 .
 8948000  8948201 3  2  4  2  .  2  3 1
 8948000  8948201 4  2  5  1  5  2  2 1
 8948000  8948201 5  2  4  2  4  2  2 1
 8948000  8948201 6  2  4  3  4  2  2 1
 8948000  8948201 7  2  4  2  4  2  3 1
 8948000  8948201 8  2  4  2  5  2  2 1
 9657000  9657201 5  5  3  3  3  3  5 1
 9657000  9657201 8  4  4  3  4  3  4 1
 9917000  9917201 5  4  5  3  5  3  4 1
 9917000  9917201 6  4  4  2  5  2  4 1
 9917000  9917201 8  4  4  3  4  3  3 1
 9980000  9980201 4  3  5  1  5  3  2 1
10208000 10208201 3  4  4  2  .  2  4 1
10208000 10208201 4  3  4  2  4  2  3 1
10208000 10208201 5  5  5  2  4  2  4 1
10208000 10208201 6  4  4  2  4  2  4 1
10208000 10208201 7  4  4  2  5  3  4 1
10208000 10208201 8  4  4  3  4  2  4 1
10250000 10250201 8  4  4  2  5  2  4 1
10564000 10564202 2  4  4  3  4  3  4 .
10957000 10957202 2  5  4  1  4  2  4 .
10957000 10957202 3  4  4  1  .  1  3 1
10957000 10957202 4  3  4  1  4  1  3 1
10957000 10957202 5  3  4  2  4  1  3 1
11295000 11295201 4  4  5  2  5  3  3 0
11295000 11295201 5  3  4  2  4  2  4 1
11295000 11295201 6  4  4  2  4  3  4 1
11295000 11295201 7  4  4  3  4  3  4 1
11295000 11295201 8  3  4  2  4  3  3 1
11470000 11470201 6  4  4  2  4  3  3 1
12266000 12266201 2 -2 -2 -2 -2 -2 -2 .
12266000 12266201 3  3  4  2  .  2  4 1
12266000 12266201 6  4  4  3  5  4  3 1
12471000 12471201 6  3  4  4  3  3  3 1
12471000 12471201 8  2  4  4  3  4  2 1
12490000 12490201 3  4  4  3  .  3  4 1
12490000 12490201 4  4  3  3  4  3  4 1
12490000 12490201 5  4  4  3  3  3  4 1
12490000 12490201 6  4  3  4  3  4  4 1
12490000 12490201 7  4  3  3  3  3  4 1
12490000 12490201 8  3  3  3  3  3  3 1
13345000 13345202 2  5  5  2  4  2  5 .
13345000 13345202 3  5  4  2  .  2  5 1
13345000 13345202 4  3  4  3  4  3  5 1
13588000 13588201 6  4  5  2  5  1  4 1
13588000 13588201 7  4  4  1  5  1  5 1
13937000 13937201 6  3  5  3  5  2  4 1
13937000 13937201 8  3  5  2  5  3  4 1
14660000 14660201 2  5  4  1  5  1  4 .
14685000 14685202 2  4  4  2  4  2  3 .
14685000 14685202 3  4  4  2  .  2  4 1
14722000 14722201 3  2  4  2  .  2  3 1
14722000 14722201 4  3  3  2  4  1  3 1
14722000 14722201 5  3  3  2  3  1  3 1
14722000 14722201 6  3  4  2  4  2  3 1
14722000 14722201 7  3  4  2  4  1  2 1
14722000 14722201 8  2  3  2  4  2  2 1
end
label values wave WAVE_prt2
label def WAVE_prt2 2 "2 2009/10", modify
label values pcr3i1 LABE_prt2
label values pcr3i2 LABE_prt2
label values pcr3i4 LABE_prt2
label values pcr3i5 LABE_prt2
label values pcr3i6 LABE_prt2
label values pcr3i8 LABE_prt2
label def LABE_prt2 -2 "-2 Keine Angabe", modify
label def LABE_prt2 2 "2 Selten", modify
label def LABE_prt2 3 "3 Manchmal", modify
label def LABE_prt2 4 "4 Häufig", modify
label def LABE_prt2 5 "5 Immer", modify
label def LABE_prt2 1 "1 Nie", modify
label values ehc28p1 liste160a_ac3
label def liste160a_ac3 0 "0 Nein", modify
label def liste160a_ac3 1 "1 Ja", modify
label var id "Personennummer Anker"
label var cid "Personennummer Kind"
label var wave "Erhebungsjahr"
label var pcr3i1 "Ihr Kind erzählt Ihnen, was es beschäftigt (Frage 6)"
label var pcr3i2 "Die Dinge, die Ihr Kind tut, werden von Ihnen anerkannt (Frage 6)"
label var pcr3i4 "Sie und Ihr Kind sind ärgerlich oder wütend aufeinander (Frage 6)"
label var pcr3i5 "Sie zeigen Ihrem Kind, dass Sie es gut finden (Frage 6)"
label var pcr3i6 "Sie und Ihr Kind sind unterschiedlicher Meinung und streiten sich (Frage 6)"
label var pcr3i8 "Ihr Kind teilt mit Ihnen seine Gefühle und Gedanken (Frage 6)"
label var ehc28p1 "Jetzt leben in Wohnung Vorwelle [Lebensmittelpunkt] (EHC)"

Thank you!

Guest

Last edited by sladmin; 28 Jan 2019, 09:25. Reason: anonymize original poster

Guest

02 Sep 2017, 05:21

I tried it like this:

Code:

*** PARENTING DATA ***

local path  "/Users/Guest/Desktop/MA/Daten"
local items pcr3i2 pcr3i1 pcr3i8 pcr3i4 pcr3i6

forvalues w = 2/8 {
    if `w' == 3 {
        // wave 3: item pcr3i5 was not asked, so it is not in the file
        use id cid wave dropoffvers `items' using "`path'/parenting`w'.dta", clear
    }
    else {
        use id cid wave dropoffvers pcr3i5 `items' using "`path'/parenting`w'.dta", clear
        mvdecode pcr3i5, mv(-2=.a \ -1=.b \ -9=.c)
        drop if missing(pcr3i5)   // keep only cases with a valid answer
    }
    drop if dropoffvers == 2
    save "`path'/parenting`w'_a.dta", replace
}

// append waves 2-8
use "`path'/parenting2_a", clear
forvalues w = 3/8 {
    append using "`path'/parenting`w'_a"
}

sort id wave

Code:

ipolate pcr3i5 wave, gen(pcr3i5_imp) epolate by(id)

But there are still missing values. Is this correct or am I absolutely on the wrong track?
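
A possible reason, offered as an assumption since the full data are not shown: -ipolate- can only fill a gap for an id that has at least two non-missing values of pcr3i5. Anyone observed in wave 3 plus at most one other wave therefore stays missing, even with -epolate-. A minimal sketch on made-up values (the two ids below are hypothetical):

```stata
* Toy panel: id 1 has valid answers in waves 2 and 4, id 2 only in wave 4
clear
input long id byte wave float pcr3i5
1 2 4
1 3 .
1 4 5
2 3 .
2 4 4
end

* Linear interpolation/extrapolation within each id
ipolate pcr3i5 wave, gen(pcr3i5_imp) epolate by(id)

* id 1, wave 3 is filled from its two neighbours; id 2, wave 3 stays missing
list, sepby(id)
```

Note that the filled value need not be one of the original answer categories, because it is a point on a straight line between the neighbouring waves.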

Comment

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17707

02 Sep 2017, 09:33

Guest:
I would tackle the issue via -mi- (assuming that the data are MAR; this assumption should be verified/justified).
First of all, to avoid biased imputation, I replaced all values coded -2 with missing (.):

Code:

foreach var of varlist pcr3i1-ehc28p1 {
    replace `var' = . if `var' == -2
}
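
As a side note, the loop above can probably be written more compactly with -mvdecode-, which the original poster already used for pcr3i5. A minimal sketch on a made-up three-row dataset (the values are illustrative, not taken from the real data):

```stata
* Toy data: -2 stands for "Keine Angabe" (no answer), as in the excerpt
clear
input byte(pcr3i1 pcr3i2 ehc28p1)
 4  4 1
-2 -2 .
 3  5 0
end

* Recode -2 to system missing (.) in every variable from pcr3i1 through ehc28p1
mvdecode pcr3i1-ehc28p1, mv(-2)

* Check: no -2 codes remain
count if pcr3i1 == -2 | pcr3i2 == -2
```

Unlike the loop, -mvdecode- also reports how many values were recoded per variable, which is handy for a quick plausibility check.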

Then an -mi- model follows (please note that the predictors are the only variables with no missing data in your excerpt; hence, you may want to rethink the whole -mi-model and consider a different/wider set of predictors) :

Code:

mi set flong
set seed 12345
mi register impute pcr3i1  pcr3i2  pcr3i4  pcr3i5 pcr3i6  pcr3i8  ehc28p1
mi impute chained (pmm, knn(5)) pcr3i1  pcr3i2  pcr3i4  pcr3i5 pcr3i6  pcr3i8  ehc28p1 = id wave, add(20) noisily

I decided to create 20 imputed datasets and to watch what was going on during the -mi- process via the -noisily- option.

Finally, I looked at the pooled statistics for each variable, which you can compare with the _mi_m=0 ones (only two variables are shown below):

Code:

mi estimate: proportion ehc28p1

Multiple-imputation estimates     Imputations     =         20
Proportion estimation             Number of obs   =        100
                                  Average RVI     =     0.1974
                                  Largest FMI     =     0.1707
                                  Complete DF     =         99
DF adjustment:   Small sample     DF:     min     =      72.64
                                          avg     =      72.64
Within VCE type:     Analytic             max     =      72.64

      _prop_1: ehc28p1 = 0 Nein
      _prop_2: ehc28p1 = 1 Ja

--------------------------------------------------------------
             | Proportion   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
     _prop_1 |      .0255    .017321     -.0090236    .0600236
     _prop_2 |      .9745    .017321      .9399764    1.009024
--------------------------------------------------------------

. mi estimate: proportion pcr3i8

Multiple-imputation estimates     Imputations     =         20
Proportion estimation             Number of obs   =        100
                                  Average RVI     =     0.0070
                                  Largest FMI     =     0.0105
                                  Complete DF     =         99
DF adjustment:   Small sample     DF:     min     =      96.05
                                          avg     =      96.31
Within VCE type:     Analytic             max     =      97.06

      _prop_1: pcr3i8 = 2 Selten
      _prop_2: pcr3i8 = 3 Manchmal
      _prop_3: pcr3i8 = 4 Häufig
      _prop_4: pcr3i8 = 5 Immer

--------------------------------------------------------------
             | Proportion   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
     _prop_1 |       .111   .0317272      .0480225    .1739775
     _prop_2 |      .2925   .0459443      .2013021    .3836979
     _prop_3 |      .4965    .050498      .3962631    .5967369
     _prop_4 |         .1   .0301511      .0401588    .1598412
--------------------------------------------------------------

The above may be a starting point for your panel data regression with complete data.

Last edited by sladmin; 28 Jan 2019, 09:27. Reason: anonymize original poster

Kind regards,
Carlo
(Stata 19.0)

Comment

Guest
#4

04 Sep 2017, 09:08

Thank you Carlo!!

Do I have to impute for all the variables like you did? Because I only need it for pcr3i5 and ehc28p1.

Code:

mi register impute pcr3i1 pcr3i2 pcr3i4 pcr3i5 pcr3i6 pcr3i8 ehc28p1

I read the -mi- help but don't get it. And I have to admit that I'm having trouble understanding what you did here:

Code:

set seed 12345

and here:

Code:

mi impute chained (pmm, knn(5)) pcr3i1 pcr3i2 pcr3i4 pcr3i5 pcr3i6 pcr3i8 ehc28p1 = id wave, add(20) noisily

I expected to get new variables or values for the gaps in my dataset. How do I get those? I'm sorry for bothering you, but if you know a tutorial that explains this simply, let me know. What I found online is difficult to understand.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#5

04 Sep 2017, 09:30

Guest:
- in your dataset excerpt/example you have coded some values as -2 (labelled "Keine Angabe", i.e. "no answer"); hence, I recoded them to missing values (.);
- then I created 20 imputed datasets via the -chained- equations approach (see -help mi impute chained- for further details). The pretty strong assumption I made is that the data are missing at random.

Taking variable pcr3i8 as an example, you can see the difference between the starting dataset (ie, the one with missing values) and the 20 imputed datasets via the following command:

Code:

mi xeq 0/20: tab pcr3i8

whereas the pooled values of proportion can be retrieved via:

Code:

mi estimate: proportion pcr3i8
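
To see where the imputed values are stored: with -mi set flong-, Stata does not create new variables. It stacks complete copies of the data below the original and marks them with the system variable _mi_m (0 = original data with gaps, 1 to M = imputed copies). A minimal sketch, assuming Stata's shipped auto dataset as a stand-in for the panel data, with rep78 as the variable to impute and only five imputations:

```stata
sysuse auto, clear             // 74 cars; rep78 has 5 missing values
mi set flong
mi register imputed rep78
set seed 12345                 // fixes the random draws so results are reproducible
mi impute pmm rep78 mpg weight, add(5) knn(5)

* 74 rows for m=0 plus 74 rows for each of the 5 imputations
tabulate _mi_m

* m=0 still has the gaps; m=1 has them filled in
mi xeq 0 1: count if missing(rep78)
```

The -set seed- line matters only for reproducibility: imputation involves random draws, and fixing the seed makes a rerun give the same imputed values.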

Last edited by sladmin; 28 Jan 2019, 09:27. Reason: anonymize original poster

Kind regards,
Carlo
(Stata 19.0)
Comment
Guest
#6

04 Sep 2017, 09:58

This was the only thing I understood completely!

in your dataset excerpt/example you have classified some values as -2 (I recall that the label for those values was something like "unavailable" in German): hence, I translated them in missing values (.)

Taking variable pcr3i8 as an example, you can see the difference between the starting dataset (ie, the one with missing values) and the 20 imputed datasets via the following command:

Okay, and how do I know which of the 20 imputed datasets I should use?
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#7

04 Sep 2017, 10:47

All of them!
That's why :

Code:

mi estimate: proportion pcr3i8

is invoked.
It becomes clearer after a look at -mi intro- and -mi intro substantive-. It's hard work, but highly rewarding.
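
As a rough illustration of what using all of them means: the pooled point estimate reported by -mi estimate- is the average of the estimates from the individual imputed datasets (Rubin's rules; the pooled standard error additionally reflects the variation between imputations). A sketch, assuming Stata's shipped auto dataset as a stand-in for the panel data and five imputations instead of 20:

```stata
sysuse auto, clear
mi set flong
mi register imputed rep78
set seed 12345
mi impute pmm rep78 mpg weight, add(5) knn(5)

* Pooled mean across the 5 imputations
mi estimate: mean rep78
matrix b_pooled = e(b_mi)

* Same quantity by hand: average the 5 completed-data means
scalar avg = 0
forvalues m = 1/5 {
    quietly summarize rep78 if _mi_m == `m'
    scalar avg = avg + r(mean)/5
}

display "pooled: " b_pooled[1,1] "   hand-averaged: " avg
```

Picking a single imputed dataset would throw away the between-imputation variation, which is exactly the part that expresses the uncertainty about the missing values.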

Kind regards,
Carlo
(Stata 19.0)
Comment

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17707

05 Sep 2017, 02:06

Guest:
taking the issue a bit further, I've elaborated on my previous replies to post an example of a panel data regression run on incomplete (assumed MAR) and complete data, respectively.
I've created a fictitious continuous dependent variable (-y-) and taken the predictors from your excerpt:

Code:

.  g y=runiform()*1000
foreach var of varlist pcr3i1 - ehc28p1 {
  2.
.                 replace `var'=. if `var'==-2
  3.
.         }
(1 real change made, 1 to missing)
(1 real change made, 1 to missing)
(1 real change made, 1 to missing)
(1 real change made, 1 to missing)
(1 real change made, 1 to missing)
(1 real change made, 1 to missing)
(0 real changes made)

. xtreg y i.pcr3i1 i.pcr3i2 i.pcr3i4 i.pcr3i5 i.pcr3i6 i.pcr3i8 i.ehc28p1

Random-effects GLS regression                   Number of obs     =         76
Group variable: id                              Number of groups  =         30

R-sq:                                           Obs per group:
     within  = 0.4850                                         min =          1
     between = 0.1329                                         avg =        2.5
     overall = 0.3301                                         max =          5

                                                Wald chi2(17)     =      35.37
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0056

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      pcr3i1 |
 3 Manchmal  |   109.6598   203.2149     0.54   0.589    -288.6341    507.9538
   4 Häufig  |   95.24959   224.9335     0.42   0.672    -345.6121    536.1112
    5 Immer  |   254.5397   260.0267     0.98   0.328    -255.1032    764.1826
             |
      pcr3i2 |
   4 Häufig  |   143.4363   103.3024     1.39   0.165    -59.03267    345.9053
    5 Immer  |  -255.0081   146.8519    -1.74   0.082    -542.8326    32.81646
             |
      pcr3i4 |
   2 Selten  |  -170.4727   114.2503    -1.49   0.136    -394.3991     53.4537
 3 Manchmal  |    143.308   130.1368     1.10   0.271    -111.7555    398.3715
   4 Häufig  |   88.51766    270.306     0.33   0.743    -441.2725    618.3078
             |
      pcr3i5 |
   4 Häufig  |   127.3432   138.6514     0.92   0.358    -144.4084    399.0949
    5 Immer  |   429.9593   161.4702     2.66   0.008     113.4834    746.4352
             |
      pcr3i6 |
   2 Selten  |  -108.6853   137.6404    -0.79   0.430    -378.4555    161.0849
 3 Manchmal  |  -234.0482   133.4804    -1.75   0.080     -495.665    27.56853
   4 Häufig  |  -204.9479   202.3308    -1.01   0.311    -601.5091    191.6133
             |
      pcr3i8 |
 3 Manchmal  |  -194.5615   153.6333    -1.27   0.205    -495.6773    106.5543
   4 Häufig  |  -223.2908   172.1323    -1.30   0.195    -560.6639    114.0822
    5 Immer  |   -529.096   217.0752    -2.44   0.015    -954.5555   -103.6364
             |
     ehc28p1 |
       1 Ja  |   211.8828     311.09     0.68   0.496    -397.8424     821.608
       _cons |    313.107   400.7957     0.78   0.435    -472.4381    1098.652
-------------+----------------------------------------------------------------
     sigma_u |  206.25062
     sigma_e |  254.93028
         rho |  .39560872   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. mi set flong

.
. set seed 12345

.
. mi register impute pcr3i1  pcr3i2  pcr3i4  pcr3i5 pcr3i6  pcr3i8  ehc28p1
(24 m=0 obs. now marked as incomplete)

.
. mi impute chained (pmm, knn(5)) pcr3i1  pcr3i2  pcr3i4  pcr3i5 pcr3i6  pcr3i8  ehc28p1 = id wave, add(20)

Conditional models:
            pcr3i1: pmm pcr3i1 pcr3i2 pcr3i4 pcr3i6 pcr3i8 ehc28p1 pcr3i5 id wave , knn(5)
            pcr3i2: pmm pcr3i2 pcr3i1 pcr3i4 pcr3i6 pcr3i8 ehc28p1 pcr3i5 id wave , knn(5)
            pcr3i4: pmm pcr3i4 pcr3i1 pcr3i2 pcr3i6 pcr3i8 ehc28p1 pcr3i5 id wave , knn(5)
            pcr3i6: pmm pcr3i6 pcr3i1 pcr3i2 pcr3i4 pcr3i8 ehc28p1 pcr3i5 id wave , knn(5)
            pcr3i8: pmm pcr3i8 pcr3i1 pcr3i2 pcr3i4 pcr3i6 ehc28p1 pcr3i5 id wave , knn(5)
           ehc28p1: pmm ehc28p1 pcr3i1 pcr3i2 pcr3i4 pcr3i6 pcr3i8 pcr3i5 id wave , knn(5)
            pcr3i5: pmm pcr3i5 pcr3i1 pcr3i2 pcr3i4 pcr3i6 pcr3i8 ehc28p1 id wave , knn(5)

Performing chained iterations ...

Multivariate imputation                     Imputations =       20
Chained equations                                 added =       20
Imputed: m=1 through m=20                       updated =        0

Initialization: monotone                     Iterations =      200
                                                burn-in =       10

            pcr3i1: predictive mean matching
            pcr3i2: predictive mean matching
            pcr3i4: predictive mean matching
            pcr3i5: predictive mean matching
            pcr3i6: predictive mean matching
            pcr3i8: predictive mean matching
           ehc28p1: predictive mean matching

------------------------------------------------------------------
                   |               Observations per m            
                   |----------------------------------------------
          Variable |   Complete   Incomplete   Imputed |     Total
-------------------+-----------------------------------+----------
            pcr3i1 |         99            1         1 |       100
            pcr3i2 |         99            1         1 |       100
            pcr3i4 |         99            1         1 |       100
            pcr3i5 |         87           13        13 |       100
            pcr3i6 |         99            1         1 |       100
            pcr3i8 |         99            1         1 |       100
           ehc28p1 |         88           12        12 |       100
------------------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
 of the number of filled-in observations.)

. mi estimate: xtreg y i.pcr3i1 i.pcr3i2 i.pcr3i4 i.pcr3i5 i.pcr3i6 i.pcr3i8 i.ehc28p1

Multiple-imputation estimates                   Imputations       =         20
Random-effects GLS regression                   Number of obs     =        100

Group variable: id                              Number of groups  =         34
                                                Obs per group:
                                                              min =          1
                                                              avg =        2.9
                                                              max =          7
                                                Average RVI       =     0.0507
                                                Largest FMI       =     0.2447
DF adjustment:   Large sample                   DF:     min       =     329.44
                                                        avg       =  27,367.77
                                                        max       = 109,602.78
Model F test:       Equal FMI                   F(  17,124393.9)  =       1.89
Within VCE type: Conventional                   Prob > F          =     0.0146

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      pcr3i1 |
 3 Manchmal  |   83.84471   150.0855     0.56   0.576    -210.3628    378.0522
   4 Häufig  |   144.5773   164.3623     0.88   0.379    -177.6082    466.7627
    5 Immer  |   220.7726   191.7085     1.15   0.250    -155.0751    596.6202
             |
      pcr3i2 |
   4 Häufig  |   100.2005   98.35517     1.02   0.308    -92.57609    292.9771
    5 Immer  |  -182.5404   126.3146    -1.45   0.148      -430.12    65.03923
             |
      pcr3i4 |
   2 Selten  |  -103.2356   99.77115    -1.03   0.301     -298.788    92.31674
 3 Manchmal  |   156.8843   121.5121     1.29   0.197    -81.28229     395.051
   4 Häufig  |   19.19786     263.17     0.07   0.942    -496.6348    535.0305
             |
      pcr3i5 |
   4 Häufig  |   83.33518   134.6154     0.62   0.536    -180.7715    347.4419
    5 Immer  |   294.8475   149.4258     1.97   0.049     1.461407    588.2337
             |
      pcr3i6 |
   2 Selten  |  -56.32939   111.6178    -0.50   0.614     -275.103    162.4442
 3 Manchmal  |  -219.9922   115.3463    -1.91   0.056    -446.0693    6.084873
   4 Häufig  |  -155.1145   185.8078    -0.83   0.404    -519.3017    209.0728
             |
      pcr3i8 |
 3 Manchmal  |  -149.0072   122.8482    -1.21   0.225    -389.7989    91.78452
   4 Häufig  |   -251.298   138.3298    -1.82   0.069     -522.433    19.83695
    5 Immer  |   -480.448   176.8839    -2.72   0.007    -827.1543   -133.7416
             |
     ehc28p1 |
       1 Ja  |   223.5938   225.1112     0.99   0.321    -219.2428    666.4304
       _cons |    339.631   277.7912     1.22   0.222    -205.6719     884.934
-------------+----------------------------------------------------------------
     sigma_u |  162.91602
     sigma_e |  249.27039
         rho |  .29930559   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Note: sigma_u and sigma_e are combined in the original metric.

Interestingly, while the average relative variance increase (RVI) is limited (about 5%), the largest fraction of missing information (FMI) suggests that the number of imputations should have been higher (0.2447*100 ≈ 24.5, i.e. about 25 imputations rather than the 20 used).
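
If more imputations are wanted, they do not have to be redone from scratch: running -mi impute- again with add() appends further imputations to the existing ones. A sketch, assuming Stata's shipped auto dataset as a stand-in for the panel data:

```stata
sysuse auto, clear
mi set flong
mi register imputed rep78
set seed 12345

* First run: 20 imputations
mi impute pmm rep78 mpg weight, add(20) knn(5)

* Second run: 5 more, stored as m=21 through m=25
mi impute pmm rep78 mpg weight, add(5) knn(5)

* Confirm the total number of imputations
mi query
display r(M)
```

After this, -mi estimate- pools over all 25 imputations without any further changes to the estimation command.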

Last edited by sladmin; 28 Jan 2019, 09:27. Reason: anonymize original poster

Kind regards,
Carlo
(Stata 19.0)

Comment

Guest

06 Sep 2017, 02:32

Thank you!!! I think I get it (a bit better) now. You helped me a lot. I'm waiting to hear from my professor which option she thinks is best.

Last edited by sladmin; 28 Jan 2019, 09:28. Reason: anonymize original poster
