  • Stata Omits Observations from regression

    I am running a regression using reghdfe. The regression produces an event-study-style figure. I have longitudinal data at the individual level, and I am using many fixed effects. I am aware that reghdfe drops singletons; however, I identify the singletons myself and account for the fact that they are dropped iteratively. I do this in order to have a balanced panel. Even so, Stata still excludes some of the observations that I expect it to include in the regression. I double-checked that those observations are not singletons, and I also know that none of the variables I use in the regression are missing. My code is below.

    bys tenure_group year orchestra SAMPLE: gen is_sing=_N==1
    bys principal year orchestra SAMPLE is_sing: gen is_sing2= _N==1
    replace is_sing2 = . if is_sing==1
    bys musician_id pos SAMPLE is_sing is_sing2: gen is_sing3= _N==1
    replace is_sing3 = . if is_sing2==1|is_sing==1
    egen is_singall = rowtotal(is_sing is_sing2 is_sing3), missing
    tab is_singall if SAMPLE==1

    drop if inlist(is_singall,1)

    global es5any lag10_any lag9_any lag8_any lag7_any lag6_any lag5_any lag4_any lag3_any lag2_any lag1_any lead0_any lead1_any lead2_any lead3_any lead4_any lead5_any

    reghdfe in990 $es5any, absorb(tenure_1#year#orchestra tenure_2#year#orchestra tenure_3#year#orchestra tenure_4#year#orchestra tenure_5#year#orchestra tenure_6#year#orchestra principal#orchestra#year musician_id#pos)

    Here is an example of observations that are not included in the regression but should be:

    fullname  year  in990  t   principal  tenure_group  _est_reg
    X         2005  0      -3  1          1             0
    X         2006  1      -2  1          1             1
    X         2007  1      -1  1          1             1
    X         2008  1      0   1          1             1
    X         2009  1      1   1          1             1
    X         2010  1      2   1          2             0
    X         2011  1      3   1          2             1
    X         2012  1      4   1          2             1
    X         2013  1      5   1          2             1
    X         2014  1      6   1          2             1
    X         2015  1      7   1          3             0
    X         2016  1      8   1          3             1
    X         2017  0      9   1          3             1

    in990 is the outcome variable, t is years relative to the year of treatment, principal is a dummy used in the FEs, tenure_group is also used in the FEs, and _est_reg is 1 for observations in the regression and 0 otherwise.

    The observations with _est_reg = 0 (years 2005, 2010, and 2015, shown in bold in the original post) are not in the regression even though I verified they are NOT singletons. I am not sure what the problem is!


    Dana



  • #2
    Well, unfortunately, the example data you show does not include the variables that are in the regression, so I'm going to speculate here. The global macro es5any (you should use a local, not a global, for this, but that isn't the cause of your problem) contains a list of variables whose names begin with lag and lead. I'm going to guess that these variables are lagged and forward values of some variable called any.

    Now, let's look at an observation in year N. If N is the first year of the panel, all of the lag*_any variables will be missing, because you cannot compute any lags for the first observation. Similarly, if N is the second year of the panel's data, then lag2_any through lag10_any will be missing. A similar consideration applies at the end: if N is the last year of the panel's data, lead1_any through lead5_any will be missing. If N is the penultimate year, then lead1_any will be non-missing, but lead2_any through lead5_any will be missing.

    Since your variables run from lag10 through lead5, the only observations that will have non-missing values for all of these variables will be the 11th through the (_N-5)th (where _N is the total number of observations in the panel). Note also that unless _N > 15, this means no observations at all will be included!
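
    To make the counting argument concrete, here is a minimal sketch on made-up data: a single panel of 20 consecutive years. The variable names id, year, any, and n_miss are hypothetical, and the lags and leads are built with Stata's L. and F. time-series operators, which is only a guess at how the real lag*_any and lead*_any variables were created.

    ```stata
    * Toy panel: one unit observed for 20 consecutive years (made-up data)
    clear
    set obs 20
    gen id = 1
    gen year = 2000 + _n
    gen any = mod(_n, 2)
    xtset id year

    * The longest lag and the longest lead from the event-study list
    gen lag10_any = L10.any
    gen lead5_any = F5.any

    * Count the missing regressors in each observation
    egen n_miss = rowmiss(lag10_any lead5_any)

    * Only observations 11 through 15 (_N-5 with _N = 20) are complete
    list year lag10_any lead5_any if n_miss == 0
    ```

    Any estimation command that includes both variables would use only those five observations from this panel.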

    Added: And the above doesn't take into account the possibility that after all of the observations with one or more missing leads or lags are removed, you will be left with a singleton!
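
    One way to check this on the real data is sketched below. It assumes the dataset from #1 is in memory, that the global es5any is defined as shown there, and that the commands are run immediately after the reghdfe command so that e(sample) refers to that regression. The names n_miss and in_reg are hypothetical.

    ```stata
    * Number of event-study regressors that are missing in each observation
    egen n_miss = rowmiss($es5any)
    tab n_miss

    * Flag the estimation sample of the reghdfe command that was just run
    gen byte in_reg = e(sample)

    * Cross-tabulate missingness against inclusion in the regression
    tab n_miss in_reg
    ```

    If every excluded observation has n_miss > 0, missing lags or leads account for the omissions. Any observation with n_miss == 0 and in_reg == 0 would instead point to singletons created after the incomplete observations are removed, or to missing values in one of the absorbed variables.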



    • #3
      I'll just add that dataex is great, but you should make sure you can replicate your problem using the extract you've created. People leave out key variables all the time, or else the particular extract doesn't reproduce the problem.
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      Stata Version: 17.0 MP (2 processor)

      WWW: https://www3.nd.edu/~rwilliam

