FE with demeaned variables: Coefficients do not match.

Student

Join Date: Jan 2015

Posts: 6
#1

15 Sep 2016, 14:34

Below is my code. The coefficient I get from xtreg is 3.39. The coefficient I get from the demeaned variables is 3.38. I believe the two coefficients should be exactly the same!
Please correct my code. Much appreciated!!

************************************************** *******************
sysuse auto, clear

drop if missing(rep78)

*Fixed effects by demeaning data

*Demean by foreign

foreach x in price weight {
egen mean_`x' = mean(`x'), by(foreign)
gen c_f_`x' = mean_`x' - `x'
}

*Demean by rep78
foreach x in price weight {
egen mean_c_f_`x' = mean(c_f_`x'), by(rep78)
gen c_f_r_`x' = mean_c_f_`x' - c_f_`x' // r stands for rep78. So centered by foreign, rep78
}

*Compare coefficients I obtained with demeaned data and xtreg.

reg c_f_r_price c_f_r_weight, robust
xtset foreign
xtreg price weight i.rep78, fe
Dimitriy V. Masterov

Join Date: Mar 2014

Posts: 609
#2

15 Sep 2016, 15:20

Please use the octothorpe (#) tags to format your code.

It is not clear to me why you would expect this to produce the same results.

If you want to replicate what xtreg, fe does, try this:

Code:

sysuse auto, clear
drop if missing(rep78)
tab rep78, gen(d_)
drop d_1

foreach x of varlist price weight d_* {
    egen mean_`x' = mean(`x'), by(foreign)
    gen c_f_`x' = mean_`x' - `x'
}

reg c_f_price c_f_weight c_f_d*
xtset foreign
xtreg price weight i.rep78, fe
Student

Join Date: Jan 2015

Posts: 6
#3

15 Sep 2016, 15:52

I am trying to understand what STATA does when you have two or more dummy variables in a regression. In this example the dummy variables are foreign and rep78 (values 1-5). That is, in your example above, suppose you did not have the command "tab rep78, gen(d_)" or other built-in STATA commands like areg, but you could still demean the data. How could you get estimates similar to xtreg in this case?
Dimitriy V. Masterov

Join Date: Mar 2014

Posts: 609
#4

15 Sep 2016, 16:03

Two sets of dummies work the same as one:

Code:

sysuse auto, clear
drop if missing(rep78)
tab rep78, gen(d_)
drop d_1
xtile mpg_q = mpg, nq(4)
tab mpg_q, gen(d2_)
drop d2_1

foreach x of varlist price weight d_* d2_* {
    egen mean_`x' = mean(`x'), by(foreign)
    gen c_f_`x' = mean_`x' - `x'
}

reg c_f_price c_f_weight c_f_d*
xtset foreign
xtreg price weight i.rep78 i.mpg_q, fe

-tab, gen()- is just a convenient way to turn a categorical variable into dummies. This is what Stata (NB the spelling) uses under the hood. Then it demeans the dummies, like I did by hand.

If you want the SEs to match, you can add i.foreign to the -reg-.
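As a minimal sketch of that last point, using the one-set-of-dummies example from #2: adding i.foreign to -reg- does not change the slope estimates (the demeaned variables already have mean zero within each value of foreign), but it uses up the same degrees of freedom that -xtreg, fe- charges for the panel effects, so the standard errors should line up. The variable names are the ones generated above; nothing here is new apart from the extra term.

```stata
sysuse auto, clear
drop if missing(rep78)
tab rep78, gen(d_)
drop d_1

* Demean the outcome, the covariate, and the rep78 dummies within foreign
foreach x of varlist price weight d_* {
    egen mean_`x' = mean(`x'), by(foreign)
    gen c_f_`x' = `x' - mean_`x'
}

* i.foreign only adjusts the residual degrees of freedom here
reg c_f_price c_f_weight c_f_d_* i.foreign

xtset foreign
xtreg price weight i.rep78, fe
```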

Last edited by Dimitriy V. Masterov; 15 Sep 2016, 16:06.
Student

Join Date: Jan 2015

Posts: 6
#5

15 Sep 2016, 16:58

I think my question was misunderstood. Let me put it this way. Suppose that rep78 had so many categories that including dummy variables for rep78 was not feasible. Then answer the question below:

I am trying to understand what STATA does when you have two or more dummy variables in a regression. In this example the dummy variables are foreign and rep78 (values 1-5). That is, in your example above, suppose you did not have the command "tab rep78, gen(d_)" or other built-in STATA commands like areg, but you could still demean the data. How could you get estimates similar to xtreg in this case?
Dimitriy V. Masterov

Join Date: Mar 2014

Posts: 609
#6

15 Sep 2016, 17:35

Sorry, but I don't really follow what you are saying. In the last example, there are two sets of dummies, one for rep78 and one for mpg_q. There is no foreign dummy in either regression: it is removed by the demeaning transformation since it does not vary within panels. Thus you don't need to create dummies for it, demean them, and include them in the regression.

Note that you are not using rep78 as the panel id variable. You are using foreign in -xtset-. If you wanted rep78 fixed effects, you would proceed differently. In either case, you cannot avoid turning your time-varying categorical variables into dummies and demeaning them if you want to add them as regressors.
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#7

15 Sep 2016, 17:37

If the problem is that one of the categorical variables has so many categories that trying to represent it with indicator variables is not feasible due to constraints on memory or the like, then you can't do it with -xtreg, fe-. For that situation, use Sergio Correia's reghdfe.ado (-ssc install reghdfe-).

In terms of understanding what Stata does, Dimitriy has shown you how it works: when a second "fixed effect" is added to the model, it acts just like any other predictor in the varlist of the -xtreg, fe- command and it gets demeaned. This is true whether it's an indicator representing a categorical variable ("dummy") or not.
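One way to see this, offered only as a sketch: the within estimator from -xtreg, fe- gives the same slope coefficients as ordinary least squares with an explicit indicator for the panel variable. The stored-estimate names fe and lsdv are arbitrary labels chosen for this illustration.

```stata
sysuse auto, clear
drop if missing(rep78)

* Within (fixed-effects) estimator with foreign as the panel variable
xtset foreign
xtreg price weight i.rep78, fe
estimates store fe

* Same model with foreign entered as an explicit indicator
regress price weight i.rep78 i.foreign
estimates store lsdv

* The weight coefficient should agree across the two columns
estimates table fe lsdv, b(%10.5f) keep(weight)
```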

Finally, it is the cultural norm in this community that we use our real given and surnames as our username, to promote collegiality, professionalism, and responsibility. At your earliest convenience, click on "Contact Us" in the lower right corner of the page and send a message to the forum administrators requesting that they change your user name.

Thank you.
2 likes
Student

Join Date: Jan 2015

Posts: 6
#8

15 Sep 2016, 17:42

I think this article helps in understanding what I was trying to do http://www.levyinstitute.org/pubs/wp_782.pdf
Student

Join Date: Jan 2015

Posts: 6
#9

15 Sep 2016, 18:04

It provides an explanation and solution.
Dimitriy V. Masterov

Join Date: Mar 2014

Posts: 609
#10

15 Sep 2016, 18:24

Student, now it is much clearer what you want to do. Take a look at the second example in the SJ paper about -reghdfe- for how to do it by hand.
Sergio Correia

Join Date: Apr 2014

Posts: 420
#11

15 Sep 2016, 22:57

Hi Student

Your code is wrong because you are missing an extra for loop outside your two loops. If you look at the article you cited, Fernando mentions that you need to iterate that procedure until it converges (but you only did one iteration).

There is an example on slide 10 of this presentation: http://www.stata.com/meeting/chicago...16_correia.pdf
Note that -areg- there is doing the same job as your -egen- (just demeaning each variable on the two sets of fixed effects), but outside of that you need to keep iterating until convergence (or for 10 iterations in this case).
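A rough sketch of what that outer loop could look like around the two -egen- demeaning steps from #1. The fixed 50 passes, the d_ prefix, and the scratch variable m_tmp are arbitrary choices made for illustration; a proper version would stop on a convergence test instead of a fixed count.

```stata
sysuse auto, clear
drop if missing(rep78)

* Work on double-precision copies so the originals stay intact
foreach x in price weight {
    generate double d_`x' = `x'
}

* Outer loop: repeat the two demeaning steps many times
forvalues iter = 1/50 {
    foreach x in price weight {
        foreach g in foreign rep78 {
            egen double m_tmp = mean(d_`x'), by(`g')
            quietly replace d_`x' = d_`x' - m_tmp
            drop m_tmp
        }
    }
}

* The slope here should now be very close to the xtreg one
regress d_price d_weight, noconstant
xtset foreign
xtreg price weight i.rep78, fe
```

Only the coefficients are expected to agree; the standard errors from the -regress- on demeaned data do not account for the degrees of freedom absorbed by the fixed effects.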

Best,
S

Dimitriy V. Masterov

Join Date: Mar 2014
Posts: 609

#12

15 Sep 2016, 23:02

The iterative FRA algorithm you have in mind is very similar to the G-P one that Clyde Schechter and I suggested. In the FRA approach, you de-mean the covariates that aren't FEs in each iteration, one by one. This is sort of like what -xtreg, fe- does, but only on the right-hand side and repeatedly. In the G-P approach, you update the two generated FE variables that are then included in the final model. I think the main problem with your code is that you only did it once rather than iterating it, but you still got pretty close, which tells you how fast this converges.

The code below implements two-dimensional FEs using both methods as well as with -xtreg, fe- using a dummy. In this simple example, FRA does take about twice as long, probably because of the nested looping.

Here's the output it produces:

Code:

------------------------------------------------------------------
    Variable |  by_hand      xtreg_fe     reg_hdfe     fra_reg    
-------------+----------------------------------------------------
      weight |    3.71236      3.71236      3.71236      3.71236  
    headroom | -711.31042   -711.31042   -711.31042   -711.31033  
------------------------------------------------------------------

As you can see, all the G-P/HDFE coefficients are very close to FRA ones.

Here's the code that spits that out:

Code:

clear all
sysuse auto
drop if missing(rep78)

/* (1) G-P Approach */
/* (A) Initialize parameters */
local rss1           = 1  // current residual sum of squares
local rss2           = 0  // previous RSS, set so the first loop test is well defined
generate double temp = 0  // tempvar to hold the FEs
generate double fe1  = 0  // rep78 FE
generate double fe2  = 0  // foreign FE

/* Alternate between estimating weight and headroom coefficients and the two FEs */
while abs(`rss2'-`rss1')>epsdouble() {
    quietly {
        local rss2=`rss1'
        regress price c.weight c.headroom fe1 fe2
        local rss1=e(rss)
        capture drop res
        predict double res, res
        replace temp = res + _b[fe1]*fe1, nopromote
        capture drop fe1
        egen double fe1 = mean(temp), by(rep78)
        replace temp = res + _b[fe2]*fe2, nopromote
        capture drop fe2
        egen double fe2 = mean(temp), by(foreign)
    }
}

/* (B) Fit the regressions */
quietly {
    regress price c.weight c.headroom fe1 fe2
    estimates store by_hand

    xtset foreign
    xtreg price c.weight c.headroom i.rep78, fe
    estimates store xtreg_fe

    reghdfe price c.weight c.headroom, absorb(rep78 foreign)
    estimates store reg_hdfe
}

/* (2) FRA Procedure */
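local rmse1 = 1  // current root MSE
local rmse0 = 0  // previous root MSE, set so the first loop test is well defined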

while abs(`rmse0'-`rmse1')>epsdouble() {
    quietly {
        local rmse0 = `rmse1'
        reg price c.weight c.headroom
        local rmse1 = e(rmse)

        /* De-mean everything by the fixed effects */
        foreach k of varlist rep78 foreign {
            foreach v of varlist weight headroom {
                capture drop v_prime
                egen double v_prime = mean(`v'), by(`k')
                replace `v' = `v'-v_prime
            }
        }
    }
}

qui reg price weight headroom, nocons
estimates store fra_reg


/* (3) Compare the 4 sets of coefficients */
estimates table *, b(%10.5f) keep(weight headroom)


FernandoRios

Join Date: Apr 2014

Posts: 2471
#13

16 Sep 2016, 07:15

Dear Student,
As has been explained here, I use nested loops for the iterations. A single two-step pass would be enough if and only if you have a fully balanced panel, that is, each combination of rep78 and foreign is observed at least once. In the example you provide, foreign==1 with rep78==1 or rep78==2 is never observed, which causes the observed differences between the methods.
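The empty cells can be checked directly with a cross-tabulation of the two grouping variables on the same estimation sample used in this thread:

```stata
sysuse auto, clear
drop if missing(rep78)

* Cell counts for every rep78-by-foreign combination
tabulate rep78 foreign
```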
It has also been pointed out to you that you should look at the -reghdfe- program written by Sergio Correia. I also have my own version, -regxfe-, which was published in the Stata Journal last year and which also allows for multiple fixed effects and one alternative correction to the degrees of freedom (stj 15(3)).
Best
Fernando Rios Avila
1 like
Dimitriy V. Masterov

Join Date: Mar 2014

Posts: 609
#14

16 Sep 2016, 12:55

Compared to my hand-coded version, -regxfe- seems to be much faster, and it actually matches the others. Not entirely sure why on the last one, but this was an educational thread that makes me grateful for the Statalist community. I am sure glad we are not all bowling alone!
Nick Cox

Join Date: Mar 2014

Posts: 35728
#15

16 Sep 2016, 13:00

Less interestingly, perhaps, but as a supporter of community norms I'd like to echo Clyde's request in #7:

it is the cultural norm in this community that we use our real given and surnames as our username, to promote collegiality, professionalism, and responsibility. At your earliest convenience, click on "Contact Us" in the lower right corner of the page and send a message to the forum administrators requesting that they change your user name.

For more explanation, http://www.statalist.org/forums/help#realnames

http://www.statalist.org/forums/help#adviceextras

The same request was made in January 2015: http://www.statalist.org/forums/foru...d-observations