  • sorting issue with multiple imputations

    Hello,

    I use panel data that I have set via

    Code:
    mi xtset id survey
    Here id is the identification number and survey is the time variable (1 or 2).

    I also have multiply imputed data; the imputation number is stored in _mi_m.

    I want to generate a new variable that I will then use in my regression.

    Code:
    sort id survey _mi_m
    Code:
    by  id _mi_m: gen treatment=0 if (expectation==2 & f.gift_received==2 & gift_total < 50000000) | (l.expectation==2 & gift_received==2 & gift_total < 50000000)
    
    by id _mi_m: replace treatment=1 if (expectation==2 & f.gift_received==1 & gift_total < 50000000) | (l.expectation==2 & gift_received==1 & gift_total < 50000000)
    I cannot have all 3 variables or the l. and f. operators won't work.

    However, I cannot get the code above to run. Stata always reports "not sorted", even though I have sorted with "sort id survey _mi_m".

    Shouldn't by id _mi_m then work? What am I doing wrong, and why won't Stata let me run the code above?


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int id byte(_mi_m survey expectation gift_received)
    27 0 2 1 2
    27 1 2 1 2
    27 2 2 1 2
    27 3 2 1 2
    27 4 2 1 2
    27 5 2 1 2
    36 0 1 2 1
    36 0 2 2 2
    36 1 1 2 1
    36 1 2 2 2
    36 2 1 2 1
    36 2 2 2 2
    36 3 1 2 1
    36 3 2 2 2
    36 4 1 2 1
    36 4 2 2 2
    36 5 1 2 1
    36 5 2 2 2
    67 0 1 2 1
    67 0 2 2 1
    67 1 1 2 1
    67 1 2 2 1
    67 2 1 2 1
    67 2 2 2 1
    67 3 1 2 1
    67 3 2 2 1
    67 4 1 2 1
    67 4 2 2 1
    67 5 1 2 1
    67 5 2 2 1
    86 0 1 2 1
    86 0 2 2 2
    86 1 1 2 1
    86 1 2 2 2
    86 2 1 2 1
    86 2 2 2 2
    86 3 1 2 1
    86 3 2 2 2
    86 4 1 2 1
    86 4 2 2 2
    86 5 1 2 1
    86 5 2 2 2
    92 0 1 2 2
    92 0 2 2 2
    92 1 1 2 2
    92 1 2 2 2
    92 2 1 2 2
    92 2 2 2 2
    92 3 1 2 2
    92 3 2 2 2
    92 4 1 2 2
    92 4 2 2 2
    92 5 1 2 2
    92 5 2 2 2
    128 0 1 2 2
    128 0 2 2 2
    128 1 1 2 2
    128 1 2 2 2
    128 2 1 2 2
    128 2 2 2 2
    128 3 1 2 2
    128 3 2 2 2
    128 4 1 2 2
    128 4 2 2 2
    128 5 1 2 2
    128 5 2 2 2
    130 0 1 1 1
    130 1 1 1 1
    130 2 1 1 1
    130 3 1 1 1
    130 4 1 1 1
    130 5 1 1 1
    178 0 1 2 1
    178 1 1 2 1
    178 2 1 2 1
    178 3 1 2 1
    178 4 1 2 1
    178 5 1 2 1
    303 0 1 1 2
    303 1 1 1 2
    303 2 1 1 2
    303 3 1 1 2
    303 4 1 1 2
    303 5 1 1 2
    484 0 1 2 1
    484 0 2 2 2
    484 1 1 2 1
    484 1 2 2 2
    484 2 1 2 1
    484 2 2 2 2
    484 3 1 2 1
    484 3 2 2 2
    484 4 1 2 1
    484 4 2 2 2
    484 5 1 2 1
    484 5 2 2 2
    594 0 1 1 1
    594 0 2 2 2
    594 1 1 1 1
    594 1 2 2 2
    end

  • #2
    However, I cannot get the code above to run. Stata always reports "not sorted", even though I have sorted with "sort id survey _mi_m".

    Shouldn't by id _mi_m then work? What am I doing wrong, and why won't Stata let me run the code above?
    No, it shouldn't. After your sort command, the data are sorted by id survey _mi_m. That is not the same as being sorted by id _mi_m: within an id, the observations that share a value of _mi_m are no longer contiguous, because they are split across the values of survey.
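
    A minimal sketch of the sort-order point on toy data (the variable m and all values are made up for illustration; m stands in for _mi_m):

    ```stata
    * Toy data: 2 ids, 2 surveys, 2 imputations (hypothetical values)
    clear
    input id survey m
    1 1 0
    1 1 1
    1 2 0
    1 2 1
    2 1 0
    2 1 1
    2 2 0
    2 2 1
    end

    * Sorted by id survey m: "id m" is not a leading part of the sort order,
    * so the by prefix stops with "not sorted"
    sort id survey m
    capture noisily by id m: gen n1 = _n

    * Sorted by id m survey: "id m" leads the sort order, so by runs
    sort id m survey
    by id m: gen n2 = _n
    ```

    The second sort makes by run, but it does not by itself make the l. and f. operators work per imputation, which is why a different approach is needed.
    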

    I think your approach to this is wrong. You are trying to include _mi_m in the sorting because you want the calculation done separately in each of your imputations, but -xtset- won't allow that. Stata's -mi- commands do have a mechanism for carrying out commands separately in each imputation: -mi xeq-. Here is an example of how it works, using one of StataCorp's example panel data sets, which I modify to get a multiply imputed data set.

    Code:
    // CREATE A DEMONSTRATION PANEL DATA SET
    // WITH MISSING VALUES AND MULTIPLE IMPUTATIONS
    webuse grunfeld, clear
    
    set seed 1234
    replace invest = . if runiform() < 0.1
    replace mvalue = . if runiform() < 0.1
    
    mi set mlong
    mi register imputed invest mvalue
    mi register regular company year kstock time
    
    mi impute mvn invest mvalue = kstock time, add(5)
    
    // SHOW HOW TO USE LAG AND LEAD OPERATORS
    mi xtset company year
    mi xeq: by company (year), sort: gen byte treatment = L.mvalue > F.mvalue
    All of that said, I'm not really sure that this is the appropriate way to handle your problem. My knowledge of multiple imputation is limited, so I may have this wrong, but I would think that it is better to actually generate the treatment variable before doing multiple imputation, and if it contains missing values, register it as an imputed variable and then impute it.
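
    As a rough, untested sketch, the same -mi xeq- pattern might carry over to the variables from #1 as follows. It assumes the data are already mi set, that gift_total exists in the full data (it is not in the -dataex- extract), and that a treatment variable built this way is acceptable to the later -mi- commands; none of that is confirmed in the thread.

    ```stata
    * Untested sketch; variable names are taken from #1
    mi xtset id survey

    * Same logic as the original gen/replace pair, run within each imputation.
    * The shared condition gift_total < 50000000 is factored out of both branches.
    mi xeq: by id (survey), sort: gen byte treatment = 0 if gift_total < 50000000 & ((expectation==2 & F.gift_received==2) | (L.expectation==2 & gift_received==2))
    mi xeq: by id (survey), sort: replace treatment = 1 if gift_total < 50000000 & ((expectation==2 & F.gift_received==1) | (L.expectation==2 & gift_received==1))
    ```

    Observations that meet neither condition are left missing, as in the original code.
    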

    • #3
      Sorry for the late reply. I just wanted to say thank you, Clyde. You are right, my approach to this was wrong. I think I now have something that works.
