
  • only keep observations used in lagged regression

    Hello,

    I want to run xtreg, fe on a panel dataset with lagged independent variables (lags 1, 2 and 3), while the dependent variable is not lagged. There are a lot of missing values in my dataset, so I want to delete the observations that were not used in the regression and winsorize the remaining ones. I only have data on my dependent variable for 2014-2018, but my independent variable runs from 2010 to 2018. When I use
    Code:
    gen var = e(sample)
    only observations from 2014-2018 are marked, so when I drop all the zeroes, the observations that supply the lags are dropped too. I would like to keep every observation used in the estimation sample together with the earlier observations that provide its lagged values. Could somebody help me, please?

    Kind regards,
    Timea De Wispelaere



  • #2
    Instead of using the lag operator in your xtreg statement, you could generate the lagged variables beforehand. The lagged values are then stored in the same rows that e(sample) marks, so they are retained when you drop everything else. Alternatively, you can certainly do it with a very long, nasty drop if statement. Or you might be able to combine your original generate with some conditions based on the lags in the drop statement.
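
A minimal sketch of the first suggestion, not a definitive implementation. It assumes the Grunfeld example data (loaded with webuse, which needs internet access) as a stand-in for the real panel; the missing early values of invest and the names mvalue_l1 to mvalue_l3 and insample are made up for illustration.

```stata
* Stand-in panel (assumption): replace with your own data and xtset call
webuse grunfeld, clear
xtset company year

* Mimic the situation in #1: the dependent variable is observed only in later years
replace invest = . if year < 1939

* Store lags 1 to 3 as ordinary variables before estimating
forvalues k = 1/3 {
    generate mvalue_l`k' = L`k'.mvalue
}

* Estimate with the stored lags instead of the L. operator
xtreg invest mvalue_l1 mvalue_l2 mvalue_l3, fe
generate byte insample = e(sample)

* Each kept row already carries its own lagged values
keep if insample
```

After the keep, the lag operator would no longer find the earlier years, so any later command should use the stored lag variables rather than L.mvalue.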


    • #3
      Hello Timea,

      I had the same question and stumbled across yours while looking for an answer. After some thinking I found an easy solution, and while it is probably irrelevant for you now, it may help others with the same question.

      This is my solution for dropping the observations that are not needed, while keeping those used only for lags:
      First run your main panel regression, which defines your sample. Then generate a sample variable filled with missing values. Set it to 1 for all observations in e(sample). Set it to 2 for all observations that are still missing on the sample variable but whose one-period lead of the sample variable equals 1. This can be repeated with a two-period lead for the two-period lag, setting the sample variable to 3, and so on. Now check that everything worked by running the regression again, restricted to observations that are not missing on the new sample variable. The number of observations used should be the same, so you can now drop all observations that are missing on your sample variable.
      With this method you can also see how many observations are used only to obtain lags.

      Code:
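      * Define the sample (the data must already be xtset)
      xtreg depvar indepvar1 L.indepvar2
      * Generate the sample variable: 1 = used in the estimation
      cap drop usedvar
      gen usedvar = .
      replace usedvar = 1 if e(sample)
      * 2 = not in the estimation sample, but supplies the one-period lag for a row that is
      replace usedvar = 2 if F.usedvar == 1 & usedvar == .
      tab usedvar
      * Check that the sample is the same (N should match the first regression)
      xtreg depvar indepvar1 L.indepvar2 if usedvar != .
      * Drop all observations not included in the final sample
      drop if usedvar == .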
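
For the three lags asked about in #1, the marking step can be repeated in a loop. The following is only a sketch under stated assumptions: the Grunfeld example data stands in for the real panel, the missing early values of invest are created artificially, and the local N_full is a hypothetical helper for the check.

```stata
* Stand-in panel (assumption): replace with your own data and xtset call
webuse grunfeld, clear
xtset company year
replace invest = . if year < 1939

* Define the sample with lags 1 to 3 of the regressor
xtreg invest L(1/3).mvalue, fe
local N_full = e(N)

* 1 = estimation row; k + 1 = row kept only because it lies k periods before one
generate byte usedvar = 1 if e(sample)
forvalues k = 1/3 {
    replace usedvar = `k' + 1 if F`k'.usedvar == 1 & missing(usedvar)
}
tabulate usedvar

* Check: the restricted regression must use the same number of observations
xtreg invest L(1/3).mvalue if !missing(usedvar), fe
assert e(N) == `N_full'

* Drop everything that is neither an estimation row nor a lag donor
drop if missing(usedvar)
```

A row is marked with the first reason it is kept, so a value of 3 means the row is needed only as a two-period lag. The assert stops the do-file before the drop if the two samples differ.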
      Last edited by Anton Lang; 12 Jul 2023, 07:20.
