sorting issue with multiple imputations

Oscar Weinzettl

Join Date: Nov 2018
Posts: 70


22 Jun 2019, 20:30

Hello,

I am working with panel data that I have declared with

Code:

mi xtset id survey

Here id is the identification number and survey is the time variable (1 or 2).

The data are also multiply imputed (imputation index _mi_m).

I want to generate a new variable that I will then use in my regression.

Code:

sort id survey _mi_m

Code:

by  id _mi_m: gen treatment=0 if (expectation==2 & f.gift_received==2 & gift_total < 50000000) | (l.expectation==2 & gift_received==2 & gift_total < 50000000)

by id _mi_m: replace treatment=1 if (expectation==2 & f.gift_received==1 & gift_total < 50000000) | (l.expectation==2 & gift_received==1 & gift_total < 50000000)

I cannot put all three variables in the by list, because then the l. and f. operators won't work.

However, for some reason I cannot get the code above to run. Stata always reports "not sorted", even though I have already run "sort id survey _mi_m".

Shouldn't by id _mi_m then work? Or what am I doing wrong? Why does Stata not let me run the code above?

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int id byte(_mi_m survey expectation gift_received)
27 0 2 1 2
27 1 2 1 2
27 2 2 1 2
27 3 2 1 2
27 4 2 1 2
27 5 2 1 2
36 0 1 2 1
36 0 2 2 2
36 1 1 2 1
36 1 2 2 2
36 2 1 2 1
36 2 2 2 2
36 3 1 2 1
36 3 2 2 2
36 4 1 2 1
36 4 2 2 2
36 5 1 2 1
36 5 2 2 2
67 0 1 2 1
67 0 2 2 1
67 1 1 2 1
67 1 2 2 1
67 2 1 2 1
67 2 2 2 1
67 3 1 2 1
67 3 2 2 1
67 4 1 2 1
67 4 2 2 1
67 5 1 2 1
67 5 2 2 1
86 0 1 2 1
86 0 2 2 2
86 1 1 2 1
86 1 2 2 2
86 2 1 2 1
86 2 2 2 2
86 3 1 2 1
86 3 2 2 2
86 4 1 2 1
86 4 2 2 2
86 5 1 2 1
86 5 2 2 2
92 0 1 2 2
92 0 2 2 2
92 1 1 2 2
92 1 2 2 2
92 2 1 2 2
92 2 2 2 2
92 3 1 2 2
92 3 2 2 2
92 4 1 2 2
92 4 2 2 2
92 5 1 2 2
92 5 2 2 2
128 0 1 2 2
128 0 2 2 2
128 1 1 2 2
128 1 2 2 2
128 2 1 2 2
128 2 2 2 2
128 3 1 2 2
128 3 2 2 2
128 4 1 2 2
128 4 2 2 2
128 5 1 2 2
128 5 2 2 2
130 0 1 1 1
130 1 1 1 1
130 2 1 1 1
130 3 1 1 1
130 4 1 1 1
130 5 1 1 1
178 0 1 2 1
178 1 1 2 1
178 2 1 2 1
178 3 1 2 1
178 4 1 2 1
178 5 1 2 1
303 0 1 1 2
303 1 1 1 2
303 2 1 1 2
303 3 1 1 2
303 4 1 1 2
303 5 1 1 2
484 0 1 2 1
484 0 2 2 2
484 1 1 2 1
484 1 2 2 2
484 2 1 2 1
484 2 2 2 2
484 3 1 2 1
484 3 2 2 2
484 4 1 2 1
484 4 2 2 2
484 5 1 2 1
484 5 2 2 2
594 0 1 1 1
594 0 2 2 2
594 1 1 1 1
594 1 2 2 2
end


Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#2

22 Jun 2019, 21:32

However, for some reason I cannot get the code above to run. It always tells me not sorted, even though I have sorted "sort id survey _mi_m"

Shouldn't by id _mi_m then work? Or what am I doing wrong? Why does stata not let me run the code above?

No, it shouldn't. After your sort command, the data are sorted by id survey _mi_m. But that is definitely not sorted by id _mi_m, because within an id the observations with the same value of _mi_m are no longer adjacent: they are separated by the different values of survey.
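
To make the point concrete, here is a minimal sketch on a four-row toy data set that reuses the variable names from #1. The values and the counter variable n are made up for illustration only.

```stata
clear
input int id byte(_mi_m survey)
36 0 1
36 0 2
36 1 1
36 1 2
end

* sorted by id survey _mi_m: within id, _mi_m runs 0 1 0 1
sort id survey _mi_m
* this is not a sort by id _mi_m, so by: stops with "not sorted", r(5)
capture noisily by id _mi_m: gen n = _n

* sorted by id _mi_m survey: within id, _mi_m runs 0 0 1 1
sort id _mi_m survey
* the same command now runs
by id _mi_m: gen n = _n
```

The by list has to match the leading variables of the sort order, which is why putting survey between id and _mi_m breaks it.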

I think your approach to this is wrong. You are trying to include _mi_m in the sorting somehow because you want this to be done separately for each of your multiple imputations, but -xtset- won't allow you to do that. Stata's -mi- commands, however, have a mechanism for carrying out commands separately in each imputation: -mi xeq-. Here's an example of how it works using one of StataCorp's example panel data sets, which I modify to get a multiply imputed data set.

Code:

// CREATE A DEMONSTRATION PANEL DATA SET
// WITH MISSING VALUES AND MULTIPLE IMPUTATIONS
webuse grunfeld, clear
set seed 1234
replace invest = . if runiform() < 0.1
replace mvalue = . if runiform() < 0.1
mi set mlong
mi register imputed invest mvalue
mi register regular company year kstock time
mi impute mvn invest mvalue = kstock time, add(5)

// SHOW HOW TO USE LAG AND LEAD OPERATORS
mi xtset company year
mi xeq: by company (year), sort: gen byte treatment = L.mvalue > F.mvalue
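
Applied to the variable names in #1, the same pattern might look like the sketch below. It is untested against the original data, and it assumes that gift_total exists in the data set and that the code is run from a do-file (the /// line continuations require that). The two commands are separated by a semicolon so that both run, in order, within each imputation.

```stata
mi xtset id survey

* create treatment and fill it in separately within each imputation (m = 0, 1, ..., M)
mi xeq: gen byte treatment = 0 if                                               ///
        (expectation==2 & F.gift_received==2 & gift_total < 50000000) |         ///
        (L.expectation==2 & gift_received==2 & gift_total < 50000000) ;         ///
    replace treatment = 1 if                                                    ///
        (expectation==2 & F.gift_received==1 & gift_total < 50000000) |         ///
        (L.expectation==2 & gift_received==1 & gift_total < 50000000)

* inspect the result in the original data (m = 0) and in the first imputation
mi xeq 0 1: tabulate treatment, missing
```

No by prefix or explicit sort is needed here: once the data are -mi xtset-, the L. and F. operators already work within id, and -mi xeq- takes care of running the commands one imputation at a time.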

All of that said, I'm not really sure that this is the appropriate way to handle your problem. My knowledge of multiple imputation is limited, so I may have this wrong, but I would think that it is better to actually generate the treatment variable before doing multiple imputation, and if it contains missing values, register it as an imputed variable and then impute it.
Oscar Weinzettl

Join Date: Nov 2018

Posts: 70
#3

25 Jun 2019, 05:14

Sorry for the late reply, I just wanted to say thank you, Clyde. You are right, my approach to this was wrong. I think I now have something that works.