operators and how my data is arranged

Oscar Weinzettl

Join Date: Nov 2018
Posts: 70

13 Jun 2019, 06:44

Hello,

I use panel data that I have set via

Code:

mi xtset id survey

where id is the identification number and survey is the time variable (1 or 2).

I also have multiple imputations, and I don't want those imputations to vary when regressed, so I do the following:

Code:

sort id survey _mi_m

Code:

bys id survey _mi_m: gen treatment=0 if (expectation==2 & f.gift_received==2 & gift_total < 50000000) | (l.expectation==2 & gift_received==2 & gift_total < 50000000) 
bys id survey _mi_m: replace treatment=1 if (expectation==2 & f.gift_received==1 & gift_total < 50000000) | (l.expectation==2 & gift_received==1 & gift_total < 50000000)

This is the variable that I later want to be my dependent variable in the regression. However, I was warned that there could be an issue here with my sorting and the use of my l. and f. operators.

However, I cannot find any issue, and no thread I can find on Statalist or elsewhere indicates one. bys and sort are functionally the same, so they shouldn't cause any trouble.

The only thing I can think of is that the l. and f. operators use the presupposed sort from mi xtset, which is id and survey. However, I also want to sort the multiple imputations within each id. Is this the source of the issue? If not, what could be? What am I missing?

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int id byte(_mi_m survey expectation gift_received)
27 0 2 1 2
27 1 2 1 2
27 2 2 1 2
27 3 2 1 2
27 4 2 1 2
27 5 2 1 2
36 0 1 2 1
36 0 2 2 2
36 1 1 2 1
36 1 2 2 2
36 2 1 2 1
36 2 2 2 2
36 3 1 2 1
36 3 2 2 2
36 4 1 2 1
36 4 2 2 2
36 5 1 2 1
36 5 2 2 2
67 0 1 2 1
67 0 2 2 1
67 1 1 2 1
67 1 2 2 1
67 2 1 2 1
67 2 2 2 1
67 3 1 2 1
67 3 2 2 1
67 4 1 2 1
67 4 2 2 1
67 5 1 2 1
67 5 2 2 1
86 0 1 2 1
86 0 2 2 2
86 1 1 2 1
86 1 2 2 2
86 2 1 2 1
86 2 2 2 2
86 3 1 2 1
86 3 2 2 2
86 4 1 2 1
86 4 2 2 2
86 5 1 2 1
86 5 2 2 2
92 0 1 2 2
92 0 2 2 2
92 1 1 2 2
92 1 2 2 2
92 2 1 2 2
92 2 2 2 2
92 3 1 2 2
92 3 2 2 2
92 4 1 2 2
92 4 2 2 2
92 5 1 2 2
92 5 2 2 2
128 0 1 2 2
128 0 2 2 2
128 1 1 2 2
128 1 2 2 2
128 2 1 2 2
128 2 2 2 2
128 3 1 2 2
128 3 2 2 2
128 4 1 2 2
128 4 2 2 2
128 5 1 2 2
128 5 2 2 2
130 0 1 1 1
130 1 1 1 1
130 2 1 1 1
130 3 1 1 1
130 4 1 1 1
130 5 1 1 1
178 0 1 2 1
178 1 1 2 1
178 2 1 2 1
178 3 1 2 1
178 4 1 2 1
178 5 1 2 1
303 0 1 1 2
303 1 1 1 2
303 2 1 1 2
303 3 1 1 2
303 4 1 1 2
303 5 1 1 2
484 0 1 2 1
484 0 2 2 2
484 1 1 2 1
484 1 2 2 2
484 2 1 2 1
484 2 2 2 2
484 3 1 2 1
484 3 2 2 2
484 4 1 2 1
484 4 2 2 2
484 5 1 2 1
484 5 2 2 2
594 0 1 1 1
594 0 2 2 2
594 1 1 1 1
594 1 2 2 2
end

Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#2

13 Jun 2019, 12:24

The only thing I can think of is that the l. and f. operators use the presupposed sort from mi xtset, which is id and survey. However, I also want to sort the multiple imputations within each id. Is this the source of the issue?

Yes, I believe that is the case. In any case, there is no reason you need to sort the data in the order your commands are specifying in order to run those particular commands. So I would leave things in the sort order that -mi xtset- created, and run these commands. If later on you have a need to specifically sort them on id survey _mi_m then you can do so when that need arises. (To be honest, I can't think of any reason you would actually need to sort the data that way, but I haven't spent a lot of time trying to figure out what that might be.)
Oscar Weinzettl

Join Date: Nov 2018

Posts: 70
#3

13 Jun 2019, 16:31

Thanks for the reply Clyde,

My professor told me if I don't also sort by the multiple imputation variable, then Stata doesn't sort within the IDs, which he says gives me wrong outcomes when using the l. and f. operators. That is why I want to sort by this third variable too. He also said I need to watch for the consequences of using bys or by with the l. and f. operators, but I cannot find any difference between them when using the operators. So I am wondering what I could potentially be doing wrong.

As you said, I can run the commands, but he is hinting that there may be an issue within these commands. No video or previous thread I can find indicates what the issue could be.

I also need to sort by the MI variable, so I use

Code:

sort id survey _mi_m

and as far as I can tell by the help file and online sources, bys and by are functionally the same. So what could be the issue here?
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#4

13 Jun 2019, 17:31

My professor told me if I don't also sort by the multiple imputation variable, then Stata doesn't sort within the IDs, which he says gives me wrong outcomes when using the l. and f. operators.

I'm speculating here, because I have never actually used multiple imputation on a data set where I also needed time series operators. So I have no direct experience of this and am just working from general principles. But one of those general principles is that Stata is pretty picky about only using the L. and F. operators when the data organization supports it properly. In the non-MI context, if you have -xtset panelvar timevar- and then sort the data in some other order, the L. and F. operators won't work. By "won't work," I mean that if you try to use them, Stata will refuse and give you an error message (usually, somewhat confusingly, a "data not sorted" message). I am not aware of things working any differently in MI data sets, though, as I said, I have never actually tried it. But my sense is that if you are not getting error messages from those commands, then I would trust that Stata is doing things properly.
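To make the non-MI case concrete, here is a minimal sketch you can run to see how Stata reacts on your own installation. The toy data set and the names x, lag_ok and lag_resorted are made up for illustration; nothing here involves mi data, and I am deliberately not asserting what return code the second -generate- produces.

```stata
* Toy panel (hypothetical values), already in id-survey order
clear
input int id byte survey double x
1 1 10
1 2 20
2 1 30
2 2 40
end

* Declare the panel structure; the data are sorted by id survey at this point
xtset id survey

* With the data in the declared order, the lag operator works
generate lag_ok = L.x

* Put the rows in an order that no longer matches the xtset order
gsort -x

* -capture noisily- shows any error message but lets the script continue
capture noisily generate lag_resorted = L.x
display "return code from the re-sorted generate: " _rc
```

If the last command reports a nonzero return code, Stata refused to compute the lag rather than silently returning wrong values, which is the behavior I was describing.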

I think you need to challenge your professor to be more specific about what his or her concerns are--I can't see any, but maybe he or she knows this area better than I do.

bys and by are functionally the same

-bys- is just an abbreviation of -bysort-, which, in turn, is synonymous with -by varlist, sort-. If you first sort the data with the -sort varlist- command and then run -by varlist: whatever-, that is identical to just running -bys varlist: whatever-.
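A small sketch of that equivalence, using a made-up toy data set (the variables n1 and n2 are hypothetical names for illustration only):

```stata
* Toy data (hypothetical values), deliberately not sorted by id
clear
input int id byte survey
2 2
1 2
2 1
1 1
1 3
end

* Route 1: sort explicitly, then use the by prefix
sort id
by id: generate n1 = _N

* Scramble the order again so Route 2 starts from data not sorted by id
sort survey id

* Route 2: bysort performs the sort and the by in a single step
bysort id: generate n2 = _N

* The two routes give the same result in every observation
assert n1 == n2
```

Because -assert- is silent when the condition holds, no output from the last line means the two approaches agreed.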
