  • FE with demeaned variables: Coefficients do not match.

    Below is my code. The coefficient I get from xtreg is 3.39. The coefficient I get from demeaned variables is 3.38. I believe the coefficient should be exactly the same!
    Please correct code. Much appreciated!!

    *********************************************************************
    sysuse auto, clear

    drop if missing(rep78)



    *Fixed effects by demeaning data

    *Demean by foreign

    foreach x in price weight {
    egen mean_`x' = mean(`x'), by(foreign)
    gen c_f_`x' = mean_`x' - `x'
    }



    *Demean by rep78
    foreach x in price weight {
    egen mean_c_f_`x' = mean(c_f_`x'), by(rep78)
    gen c_f_r_`x' = mean_c_f_`x' - c_f_`x' // r stands for rep78. So centered by foreign, rep78
    }


    *Compare coefficients I obtained with demeaned data and xtreg.

    reg c_f_r_price c_f_r_weight, robust
    xtset foreign
    xtreg price weight i.rep78, fe


  • #2
    Please use the octothorpe (#) tags to format your code.

    It is not clear to me why you would expect this to produce the same results.

    If you want to replicate what xtreg, fe does, try this:

    Code:
    sysuse auto, clear
    drop if missing(rep78)
    
    tab rep78, gen(d_)
    drop d_1
    
    foreach x of varlist price weight d_* {
    egen mean_`x' = mean(`x'), by(foreign)
    gen c_f_`x' = mean_`x' - `x'
    }
    
    reg c_f_price c_f_weight c_f_d*
    xtset foreign
    xtreg price weight i.rep78, fe


    • #3
      I am trying to understand what STATA does when you have two or more dummy variables in a regression. In this example the dummy variables are foreign and rep78 (Values 1-5). ie with your above example suppose you did not have the command "tab rep78, gen(d_)" and other inbuilt STATA commands like areg. But you could demean the data. How could you get estimates similar to xtreg in this case?


      • #4
        Two sets of dummies work the same as one:

        Code:
        sysuse auto, clear
        drop if missing(rep78)
        
        tab rep78, gen(d_)
        drop d_1
        
        xtile mpg_q =mpg, nq(4)
        tab mpg_q, gen(d2_)
        drop d2_1
        
        foreach x of varlist price weight d_* d2_* {
            egen mean_`x' = mean(`x'), by(foreign)
            gen c_f_`x' = mean_`x' - `x'
        }
         
        reg c_f_price c_f_weight c_f_d*  
        xtset foreign
        xtreg price weight i.rep78 i.mpg_q, fe
        -tab, gen()- is just a convenient way to turn a categorical variable into dummies. This is what Stata (NB the spelling) uses under the hood. Then it demeans the dummies, like I did by hand.

        If you want the SEs to match, you can add i.foreign to the -reg-.
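
        A minimal sketch of that last point, assuming the same auto data as above: with explicit dummies for both rep78 and foreign, plain -regress- should reproduce the -xtreg, fe- coefficient and standard error on weight, because both use the same residual degrees of freedom. The scalar names are made up for illustration.

        ```stata
        sysuse auto, clear
        drop if missing(rep78)

        * LSDV: explicit dummies for both rep78 and foreign
        regress price weight i.rep78 i.foreign
        scalar b_lsdv  = _b[weight]
        scalar se_lsdv = _se[weight]

        * Within estimator with foreign as the panel variable
        xtset foreign
        xtreg price weight i.rep78, fe
        scalar b_fe  = _b[weight]
        scalar se_fe = _se[weight]

        * Relative differences; both should be at rounding-error level
        display "coef: " reldif(scalar(b_lsdv), scalar(b_fe))
        display "SE:   " reldif(scalar(se_lsdv), scalar(se_fe))
        ```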
        Last edited by Dimitriy V. Masterov; 15 Sep 2016, 16:06.


        • #5
          I think my question was misunderstood. Let me put it this way. Suppose that rep78 had too many categories, so that including dummy variables for rep78 was not feasible. Then answer the question below:

          I am trying to understand what STATA does when you have two or more dummy variables in a regression. In this example the dummy variables are foreign and rep78 (Values 1-5). ie with your above example suppose you did not have the command "tab rep78, gen(d_)" and other inbuilt STATA commands like areg. But you could demean the data. How could you get estimates similar to xtreg in this case?


          • #6
            Sorry, but I don't really follow what you are saying. In the last example, there are two sets of dummies, one for rep78 and one for mpg_q. There is no foreign dummy in either regression: it is removed by the demeaning transformation since it does not vary within panels. Thus you don't need to create dummies for it, demean them, and include them in the regression.

            Note that you are not using rep78 as the panel id variable. You are using foreign in -xtset-. If you wanted rep78 fixed effects, you would proceed differently. In either case, you cannot avoid turning your time-varying categorical variables into dummies and demeaning them if you want to add them as regressors.
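
            A minimal sketch of what switching the panel id looks like, assuming the same auto data: -xtset- on rep78 and put foreign in as a dummy instead. Both specifications span the same set of indicators, so the slope on weight should not change. The scalar names are made up for illustration.

            ```stata
            sysuse auto, clear
            drop if missing(rep78)

            * foreign absorbed by the within transformation, rep78 entered as dummies
            xtset foreign
            xtreg price weight i.rep78, fe
            scalar b_by_foreign = _b[weight]

            * rep78 absorbed by the within transformation, foreign entered as a dummy
            xtset rep78
            xtreg price weight i.foreign, fe
            scalar b_by_rep78 = _b[weight]

            display "relative difference: " reldif(scalar(b_by_foreign), scalar(b_by_rep78))
            ```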


            • #7
              If the problem is that one of the categorical variables has so many categories that trying to represent it with indicator variables is not feasible due to constraints on memory or the like, then you can't do it with -xtreg, fe-. For that situation, use Sergio Correia's reghdfe.ado (-ssc install reghdfe-).

              In terms of understanding what Stata does, Dimitriy has shown you how it works: when a second "fixed effect" is added to the model, it acts just like any other predictor in the varlist of the -xtreg, fe- command and it gets demeaned. This is true whether it's an indicator representing a categorical variable ("dummy") or not.

              Finally, it is the cultural norm in this community that we use our real given and surnames as our username, to promote collegiality, professionalism, and responsibility. At your earliest convenience, click on "Contact Us" in the lower right corner of the page and send a message to the forum administrators requesting that they change your user name.

              Thank you.


              • #8
                I think this article helps in understanding what I was trying to do http://www.levyinstitute.org/pubs/wp_782.pdf


                • #9
                  It provides an explanation and solution.


                  • #10
                    Student Now it is much clearer what you want to do. Take a look at the second example in the SJ paper about -reghdfe- for how to do it by hand.


                    • #11
                      Hi Student

                      Your code is wrong because you are missing an extra for loop outside your two loops. If you look at the article you cited, Fernando mentions that you need to iterate that procedure until it converges (but you only did one iteration).

                      There is an example on slide 10 of this presentation: http://www.stata.com/meeting/chicago...16_correia.pdf
                      Note that -areg- there does the same job as your -egen- (it just demeans each variable on the two sets of fixed effects), but you then need an outer loop around it that runs until convergence (or for 10 iterations, in this case).
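
                      A minimal sketch of that outer loop applied to the code in the first post, assuming the same auto data. The cap of 50 passes, the t_ prefix, and the grp_mean helper variable are arbitrary choices for illustration; a proper implementation would stop on a convergence criterion instead of a fixed count.

                      ```stata
                      sysuse auto, clear
                      drop if missing(rep78)

                      * Work on double-precision copies so the original variables stay intact
                      foreach x in price weight {
                          generate double t_`x' = `x'
                      }

                      * Outer loop: repeat the two demeaning steps many times, not once
                      forvalues iter = 1/50 {
                          foreach k in foreign rep78 {
                              foreach x in price weight {
                                  quietly egen double grp_mean = mean(t_`x'), by(`k')
                                  quietly replace t_`x' = t_`x' - grp_mean
                                  drop grp_mean
                              }
                          }
                      }

                      * Regression on the repeatedly demeaned data
                      regress t_price t_weight, noconstant
                      scalar b_iter = _b[t_weight]

                      * Benchmark
                      xtset foreign
                      xtreg price weight i.rep78, fe
                      display "iterated: " scalar(b_iter) "   xtreg, fe: " _b[weight]
                      ```

                      Only the coefficient is expected to line up here; the standard errors from the -regress- on demeaned data do not account for the absorbed fixed effects.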

                      Best,
                      S


                      • #12
                        The iterative FRA algorithm you have in mind is very similar to the G-P one that Clyde Schechter and I suggested. In the FRA approach, you de-mean the covariates that aren't FEs, one by one, in each iteration. This is sort of like what -xtreg, fe- does, but only on the right-hand side and repeatedly. In the G-P approach, you update the two generated FE variables, which are then included in the final model. I think the main problem with your code is that you only did it once rather than iterating, but you still got pretty close, which tells you how fast this converges.

                        The code below implements two-dimensional FEs using both methods as well as with -xtreg, fe- using a dummy. In this simple example, FRA does take about twice as long, probably because of the nested looping.

                        Here's the output it produces:

                        Code:
                        ------------------------------------------------------------------
                            Variable |  by_hand      xtreg_fe     reg_hdfe     fra_reg    
                        -------------+----------------------------------------------------
                              weight |    3.71236      3.71236      3.71236      3.71236  
                            headroom | -711.31042   -711.31042   -711.31042   -711.31033  
                        ------------------------------------------------------------------
                        As you can see, all the G-P/HDFE coefficients are very close to FRA ones.

                        Here's the code that spits that out:

                        Code:
                        clear all
                        sysuse auto
                        drop if missing(rep78)
                        
                        /* (1) G-P Approach */
                        /* (A) Initialize parameters */
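                        local rss1           = 1  // residual sum of squares
                        local rss2           = 0  // previous RSS; set explicitly so the first loop test does not rely on an empty macro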
                        generate double temp = 0  // tempvar to hold the FEs
                        generate double fe1  = 0  // rep78 FE
                        generate double fe2  = 0  // foreign FE
                        
                        /* Alternate between estimating weight and headroom coefficients and the two FEs */
                        while abs(`rss2'-`rss1')>epsdouble() {
                            quietly {
                                local rss2=`rss1'
                                regress price c.weight c.headroom fe1 fe2
                                local rss1=e(rss)
                                capture drop res
                                predict double res, res
                                replace temp = res + _b[fe1]*fe1, nopromote
                                capture drop fe1
                                egen double fe1 = mean(temp), by(rep78)
                                replace temp = res + _b[fe2]*fe2, nopromote
                                capture drop fe2
                                egen double fe2 = mean(temp), by(foreign)
                            }
                        }
                        
                        /* (B) Fit the regressions */
                        quietly {
                            regress price c.weight c.headroom fe1 fe2
                            estimates store by_hand
                        
                            xtset foreign
                            xtreg price c.weight c.headroom i.rep78, fe
                            estimates store xtreg_fe
                        
                            reghdfe price c.weight c.headroom, absorb(rep78 foreign)
                            estimates store reg_hdfe
                        }
                        
                        /* (2) FRA Procedure */
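                        local rmse1 = 1  // current RMSE
                        local rmse0 = 0  // previous RMSE; set explicitly so the first loop test does not rely on an empty macro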
                        
                        while abs(`rmse0'-`rmse1')>epsdouble() {
                            quietly {
                                local rmse0 = `rmse1'
                                reg price c.weight c.headroom
                                local rmse1 = e(rmse)
                        
                                /* De-mean everything by the fixed effects */
                                foreach k of varlist rep78 foreign {
                                    foreach v of varlist weight headroom {
                                        capture drop v_prime
                                        egen double v_prime = mean(`v'), by(`k')
                                        replace `v' = `v'-v_prime
                                    }
                                }
                            }
                        }
                        
                        qui reg price weight headroom, nocons
                        estimates store fra_reg
                        
                        
                        /* (3) Compare the 4 sets of coefficients */
                        estimates table *, b(%10.5f) keep(weight headroom)


                        • #13
                          Dear Student,
                          As has been explained here, I use nested looping for the iterations. A two-step iteration would be enough if and only if you had a fully balanced panel, i.e., each combination of rep78 and foreign is observed at least once. In the example you provide, foreign==1 with rep78==1 or rep78==2 is never observed, which causes the observed differences between the methods.
                          You have also been pointed to the -reghdfe- program written by Sergio Correia. I also have my own version, -regxfe-, published in the Stata Journal last year (stj 15(3)), which also allows for multiple fixed effects and offers one alternative correction to the degrees of freedom.
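
                          A quick way to see those empty cells, as a minimal sketch on the same auto data:

                          ```stata
                          sysuse auto, clear
                          drop if missing(rep78)

                          * Cross-tabulate the two fixed-effect variables to spot empty cells
                          tabulate rep78 foreign

                          * Count foreign cars with rep78 equal to 1 or 2
                          count if foreign == 1 & inlist(rep78, 1, 2)
                          ```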
                          Best
                          Fernando Rios Avila


                          • #14
                            -regxfe- seems to be much faster than my hand-coded version and, unlike it, actually matches the others. Not entirely sure why on the last one, but this was an educational thread that makes me grateful for the Statalist community. I am sure glad we are not all bowling alone!


                            • #15
                              Less interestingly, perhaps, but as a supporter of community norms I'd like to echo Clyde's request in #7:

                              it is the cultural norm in this community that we use our real given and surnames as our username, to promote collegiality, professionalism, and responsibility. At your earliest convenience, click on "Contact Us" in the lower right corner of the page and send a message to the forum administrators requesting that they change your user name.
                              For more explanation, http://www.statalist.org/forums/help#realnames

                              http://www.statalist.org/forums/help#adviceextras

                              The same request was made in January 2015: http://www.statalist.org/forums/foru...d-observations
