Same observations across all models without using e(sample)

Helder Costa

22 Apr 2023, 08:39

Hello!
I'm running a set of regressions on linked employer-employee data (LEED): OLS, worker fixed effects, firm fixed effects, and finally worker and firm fixed effects together. I want to ensure that the same observations are used across all models. Normally, I would solve the problem as in the following example:

Code:

webuse nlswork, clear
xtset idcode

reg ln_w grade age ttl_exp tenure not_smsa south
    gen s1 = (e(sample))

xtreg ln_w grade age ttl_exp tenure not_smsa south, fe
    gen s2 = (e(sample))
    
reghdfe ln_w grade age ttl_exp tenure not_smsa south, absorb(idcode)    
    gen s3 = (e(sample))
    
reg ln_w grade age ttl_exp tenure not_smsa south if s1 == 1 & s2 == 1 & s3 ==1
    est store m1

xtreg ln_w grade age ttl_exp tenure not_smsa south if s1 == 1 & s2 == 1 & s3 ==1, fe
    est store m2
    
reghdfe ln_w grade age ttl_exp tenure not_smsa south if s1 == 1 & s2 == 1 & s3 ==1, absorb(idcode)    
    est store m3
    
    esttab m1 m2 m3

However, this implies running all the models first, which I'm trying to avoid because of the time it will take to run the reghdfe command using millions of observations.

One solution I've attempted is to ensure that none of my variables has missing values and that each id appears at least twice, so that the fixed effects can be estimated. Example:

Code:

webuse nlswork, clear
xtset idcode

egen miss = rowmiss(ln_w grade age ttl_exp tenure not_smsa south)
egen n_all = count(idcode), by(idcode)

reg ln_w grade age ttl_exp tenure not_smsa south if n_all > 1 & miss == 0
    est store m1
    
xtreg ln_w grade age ttl_exp tenure not_smsa south if n_all > 1 & miss == 0, fe
    est store m2
    
reghdfe ln_w grade age ttl_exp tenure not_smsa south if n_all > 1 & miss == 0, absorb(idcode)    
    est store m3
    
    esttab m1 m2 m3

However, when I run this second example, the third column has 17 fewer observations than the others because reghdfe drops singleton observations, and I don't understand where those singletons come from.

Thank you in advance for your help!
Hélder

Clyde Schechter

#2

22 Apr 2023, 10:11

Singleton panels in fixed-effects models can be handled in two ways: including them or excluding them. Singleton observations provide no information about the coefficients in a fixed-effects regression because these regressions estimate within-panel effects only, and a singleton observation necessarily has no within-panel variation. They do, however, provide information about the fixed effects themselves. You may notice, for example, that the coefficients in your -xtreg, fe- and -reghdfe- models are identical, but the constant terms differ, because one includes the singletons and the other excludes them.

-xtreg, fe- includes the singletons. If you want to exclude them, you have to identify them ahead of time and either drop them from the data set or exclude them with an appropriate -if- qualifier. By contrast, -reghdfe-, by default, excludes them. However, you can easily get -reghdfe- to retain them by adding the -keepsingletons- option.
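
As a minimal sketch of that last option (assuming the community-contributed -reghdfe- and -estout- packages are installed, and reusing the nlswork example from #1; the names fe_xt and fe_hd are just illustrative), -keepsingletons- should make the -reghdfe- sample line up with the -xtreg, fe- sample:

Code:

webuse nlswork, clear
xtset idcode

* xtreg, fe keeps singleton panels in the estimation sample
xtreg ln_w grade age ttl_exp tenure not_smsa south, fe
est store fe_xt

* keepsingletons stops reghdfe from dropping singleton panels
reghdfe ln_w grade age ttl_exp tenure not_smsa south, absorb(idcode) keepsingletons
est store fe_hd

* compare the observation counts reported at the bottom of the table
esttab fe_xt fe_hd

As I understand the -reghdfe- help file, it cautions that retaining singletons can distort inference, so this is mainly useful for lining up observation counts across commands rather than as a default.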

Helder Costa

#3

22 Apr 2023, 13:27

Thank you for your reply, Clyde.

Looking at my second example, do you have any suggestions on how I can identify the singletons missed by the filter

Code:

if n_all > 1

in my regressions? I assumed that having at least two observations per individual would be enough to deal with this problem, but the reghdfe command still identifies 17 singletons.

Clyde Schechter

#4

22 Apr 2023, 14:36

Your variable n_all is simply a count of all the observations for the idcode. And with -miss == 0-, you exclude observations with missing values for any model variable. But now think about what happens if an idcode has several observations, and all but one of them have a missing value on some model variable. Because there are several observations for that idcode, n_all will be > 1. And when we then filter out everything except the one observation that has no missing values on any model variable, we are left with a singleton.

What you need, instead of n_all, is a variable that gives a count of the number of observations for idcode that have no missing values on the variables.
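
A toy illustration may make the difference concrete. The data below are made up for this sketch (they are not from nlswork), and the variable names id, y, and x are hypothetical:

Code:

clear
input id y x
1 1.0 2
1 .   3
1 .   4
2 2.0 1
2 2.5 2
end

* number of model variables that are missing in each observation
egen miss = rowmiss(y x)

* all observations per id, whether usable or not
egen n_all = count(id), by(id)

* only the observations per id with complete data
egen n_usable = total(miss == 0), by(id)

list, sepby(id)

Here id 1 has n_all equal to 3 but n_usable equal to 1: it passes the -n_all > 1- filter, yet only one of its observations can enter the regression, so it ends up as a singleton.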

Code:

clear*
webuse nlswork, clear
xtset idcode

egen miss = rowmiss(ln_w grade age ttl_exp tenure not_smsa south)
egen n_usable = total(miss == 0), by(idcode)

reg ln_w grade age ttl_exp tenure not_smsa south if n_usable > 1 & miss == 0
est store m1

xtreg ln_w grade age ttl_exp tenure not_smsa south if n_usable > 1 & miss == 0, fe
est store m2

reghdfe ln_w grade age ttl_exp tenure not_smsa south if n_usable > 1 & miss == 0, ///
    absorb(idcode)
est store m3

esttab m1 m2 m3

You will now see that all three of the regressions omit anything that would be otherwise omitted as a singleton by -reghdfe- but retained by -xtreg, fe-.
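
If you would rather have Stata confirm this than compare the table by eye, one possible check is sketched below. It assumes the code above has just been run, so that m1, m2, and m3 are stored and n_usable and miss exist; the scalar names N_m1, N_m2, and N_m3 are made up for this sketch.

Code:

* restore each stored model in turn and save its estimation-sample size
foreach m in m1 m2 m3 {
    estimates restore `m'
    scalar N_`m' = e(N)
}

* stop with an error if the three sample sizes are not identical
assert scalar(N_m1) == scalar(N_m2) & scalar(N_m2) == scalar(N_m3)

* the filter itself should select exactly that many observations
count if n_usable > 1 & miss == 0
assert r(N) == scalar(N_m1)

If both -assert- commands pass silently, the -if- condition reproduces the common estimation sample without having to run the models once beforehand to obtain e(sample).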

Helder Costa

#5

23 Apr 2023, 06:20

Thank you Clyde, that was very helpful!