  • Same observations across all models without using e(sample)

    Hello!
    I'm running a set of regressions using linked employer-employee data (LEED): OLS, worker fixed effects, firm fixed effects, and finally both worker and firm fixed effects. I want to ensure that the same observations are used across all models. Normally, I would solve the problem as in the following example:

    Code:
    webuse nlswork, clear
    xtset idcode
    
    reg ln_w grade age ttl_exp tenure not_smsa south
        gen s1 = (e(sample))
    
    xtreg ln_w grade age ttl_exp tenure not_smsa south, fe
        gen s2 = (e(sample))
        
    reghdfe ln_w grade age ttl_exp tenure not_smsa south, absorb(idcode)    
        gen s3 = (e(sample))
        
    reg ln_w grade age ttl_exp tenure not_smsa south if s1 == 1 & s2 == 1 & s3 ==1
        est store m1
    
    xtreg ln_w grade age ttl_exp tenure not_smsa south if s1 == 1 & s2 == 1 & s3 ==1, fe
        est store m2
        
    reghdfe ln_w grade age ttl_exp tenure not_smsa south if s1 == 1 & s2 == 1 & s3 ==1, absorb(idcode)    
        est store m3
        
        esttab m1 m2 m3
    However, this implies running all the models first, which I'm trying to avoid because of the time it will take to run the reghdfe command using millions of observations.

    One solution I've attempted is to ensure that none of my variables has missing values and that each id appears at least twice, so that the fixed effects can be estimated. Example:

    Code:
    webuse nlswork, clear
    xtset idcode
    
    egen miss = rowmiss(ln_w grade age ttl_exp tenure not_smsa south)
    egen n_all = count(idcode), by(idcode)
    
    reg ln_w grade age ttl_exp tenure not_smsa south if n_all > 1 & miss == 0
        est store m1
        
    xtreg ln_w grade age ttl_exp tenure not_smsa south if n_all > 1 & miss == 0, fe
        est store m2
        
    reghdfe ln_w grade age ttl_exp tenure not_smsa south if n_all > 1 & miss == 0, absorb(idcode)    
        est store m3
        
        esttab m1 m2 m3
    However, when I run this second example, the third column has 17 fewer observations than the others because reghdfe drops singleton observations, and I don't understand where those singletons come from.

    Thank you in advance for your help!
    Hélder

  • #2
    Singleton panels in fixed-effects regressions can be handled in two ways: including them or excluding them. Singleton observations provide no information about the coefficients in a fixed-effects regression because these regressions estimate within-panel effects only, and a singleton observation necessarily has no within-panel variation. They do, however, provide information about the fixed effects themselves. You may notice, for example, that in your second example the coefficients from -xtreg, fe- and -reghdfe- are identical but the constant terms differ, because of the inclusion vs. exclusion of singletons.

    -xtreg, fe- includes the singletons. If you want to exclude them, you have to identify them ahead of time and either drop them from the data set or exclude them with an appropriate -if- qualifier. By contrast, -reghdfe-, by default, excludes them. However, you can easily get -reghdfe- to retain them by adding the -keepsingletons- option.
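
    As a minimal sketch of the -keepsingletons- route, using the nlswork example from the question (this assumes the community-contributed -reghdfe- and -esttab- are installed):

    Code:
    webuse nlswork, clear
    xtset idcode

    * -xtreg, fe- keeps singleton panels by default
    xtreg ln_w grade age ttl_exp tenure not_smsa south, fe
        est store m2

    * ask -reghdfe- to keep them as well instead of dropping them
    reghdfe ln_w grade age ttl_exp tenure not_smsa south, absorb(idcode) keepsingletons
        est store m3

        esttab m2 m3
    With the option added, the two columns should report the same number of observations.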



    • #3
      Thank you for your reply, Clyde.

      Looking at my second example, do you have any suggestions on how I can identify the singletons missed by the filter
      Code:
      if n_all > 1
      on my regressions? I assumed that having at least 2 observations per individual would be enough to deal with this problem, but the reghdfe command still identifies 17.



      • #4
        Your variable n_all is simply a count of all the observations for the idcode. And with -miss == 0-, you exclude observations with missing values for any model variables. But now think about what happens if an idcode has several observations, and all but one of them have a missing value on some model variable. Because there are several observations of that idcode, n_all will be > 1. And when we then filter out every observation with a missing value, only the one complete observation remains, and we are left with a singleton.
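
        To make this concrete, here is a toy data set with made-up values (the names id, y and x are purely illustrative):

        Code:
        clear
        input id y x
        1 1.0 2
        1 1.5 .
        1 2.0 .
        2 1.0 1
        2 2.0 3
        end

        egen miss = rowmiss(y x)
        egen n_all = count(id), by(id)

        * id 1 has n_all == 3, but only one of its rows has miss == 0
        list, sepby(id)
        In this sketch, -if n_all > 1 & miss == 0- keeps a single observation for id 1, which is exactly the kind of singleton that -reghdfe- then drops.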

        What you need, instead of n_all, is a variable that gives a count of the number of observations for idcode that have no missing values on the variables.


        Code:
        clear*
        
        webuse nlswork, clear
        xtset idcode
        
        egen miss = rowmiss(ln_w grade age ttl_exp tenure not_smsa south)
        egen n_usable = total(miss == 0), by(idcode)
        
        reg ln_w grade age ttl_exp tenure not_smsa south if n_usable > 1 & miss == 0
            est store m1
            
        xtreg ln_w grade age ttl_exp tenure not_smsa south if n_usable > 1 & miss == 0, fe
            est store m2
            
            
        reghdfe ln_w grade age ttl_exp tenure not_smsa south if n_usable > 1 & miss == 0, ///
            absorb(idcode)
            est store m3
            
        esttab m1 m2 m3
        You will now see that all three of the regressions omit anything that would be otherwise omitted as a singleton by -reghdfe- but retained by -xtreg, fe-.
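
        If you want to see which observations the original filter let through, a check along these lines should work (a sketch that rebuilds both counts; n_all is the variable from the earlier post):

        Code:
        webuse nlswork, clear

        egen miss = rowmiss(ln_w grade age ttl_exp tenure not_smsa south)
        egen n_all = count(idcode), by(idcode)
        egen n_usable = total(miss == 0), by(idcode)

        * complete observations that pass n_all > 1 but are the only usable one for their idcode
        count if miss == 0 & n_all > 1 & n_usable == 1
        list idcode year if miss == 0 & n_all > 1 & n_usable == 1
        The count should correspond to the observations that -reghdfe- reported as singletons in the second example.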



        • #5
          Thank you Clyde, that was very helpful!
