  • confidence intervals for individual fixed effects u from xtreg, fe

    I am running a fixed effects model with schools being my level of interest. I want to understand which schools are getting the best marks after adjusting for student characteristics. I have 30 schools. I am doing the following:

    Code:
    xtset school_id
    xtreg marks_end_year marks_beggining_year $student_characteristics i.discipline, fe base vce(robust)
    Then to obtain the individual fixed effects I am doing the following:

    Code:
    predict fe_school, u
    My understanding is that this will give me the deviation of each school's outcome from the regression constant term, which is 1.238747 and corresponds to the average outcome across units. Marks range from 0 to 3 on a continuous scale.

    I have 3 quick questions:

    1- To obtain the average outcome of each school, I can simply add 1.238747 to each individual fixed effect. Am I right?

    2- I thought that with fixed effects we would always need to set a reference unit, so that the fixed effects would be expressed relative to that unit, and that random effects would instead give us effects centred around 0, that is, around a central value. But if I run xtreg, fe and then get the fixed effects as above, I get individual fixed effects that are deviations from the regression constant term, not deviations from any particular unit. Can someone comment on this please?

    3- how can I calculate 95% confidence intervals for those individual fixed effects u?

  • #2
    The fixed effects coefficients are not identified and depend on the constraint that you put on the system, which is arbitrary. For Stata's constraint, see https://www.stata.com/support/faqs/s...effects-model/. Additionally, the fixed effects are in most cases not estimated consistently. Therefore, I do not see the use of saving these estimates. xtreg, fe applies the within-transformation, but you could also estimate the model using least squares dummy variables (LSDV), hence your notion of an omitted (reference) unit. In any case, taking into account the specific constraint, I show here (#2) the equivalence of predicted fixed effects from xtreg, fe and the LSDV dummy coefficients (from regress). LSDV will give you both the estimated dummy coefficients and their 95% confidence intervals, but bear in mind that these estimates are not meaningful.
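
    A minimal sketch of that equivalence, run on the nlswork example data rather than the school data. The names u_fe, level_fe and level_lsdv are made up for illustration, and the model is cut down to a single regressor:

    ```stata
    * Example data shipped with Stata; keep three panel units for readability
    webuse nlswork, clear
    keep if idcode <= 3
    xtset idcode

    * Within estimator: predicted u_i plus the reported constant
    xtreg ln_wage age, fe
    predict double u_fe if e(sample), u
    generate double level_fe = _b[_cons] + u_fe

    * LSDV: constant plus the dummy coefficient (idcode 1 is the omitted unit)
    regress ln_wage age i.idcode
    generate double level_lsdv = _b[_cons] + _b[2.idcode]*(idcode == 2) + _b[3.idcode]*(idcode == 3)

    * The two unit-specific intercepts should agree up to rounding
    tabstat level_fe level_lsdv, by(idcode) statistics(mean)
    ```

    Only the split between "constant" and "unit effect" differs between the two commands; their sum is the same under either constraint.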
    Last edited by Andrew Musau; 06 Jan 2021, 16:04.

    • #3
      Thanks very much Andrew Musau for providing this information. I went through both links - very helpful.

      The first one says 'the reported intercept is the average value of the fixed effects'. I believe that in this case the constraint sets the intercept to the average value of the fixed effects, am I right? This relates to one of my questions: 1- To obtain the average outcome of each school, I can simply add 1.238747 (the constant term) to each individual fixed effect. Am I right?

      If that's correct, it's great, because that's actually what I want. I don't want to set a reference school; ideally I want the fixed effects expressed relative to the average of all fixed effects, which is why I used xtreg, fe and not LSDV. My understanding is that with LSDV I would have to set a reference category. Does this make sense?

      • #4
        If you follow Andrew Musau's suggestion and use LSDV regression, you can then use -predict- to get the complete predicted values, and, following that, -predict- with the -stdp- option to get the standard error of the predicted values. From there you can calculate the confidence interval. While the estimates of the constant term and the "fixed effects" are meaningless numbers depending on an arbitrary constraint, the outputs of -predict- are invariant to the choice of reference category and are meaningful.
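
        A rough sketch of those steps on the nlswork example data (not the school data). The variable names yhat, se_yhat, ci_lo, ci_hi and yhat_b3 are made up for illustration, and using a t critical value with the residual degrees of freedom is one reasonable choice rather than the only one:

        ```stata
        * Example data shipped with Stata; keep three units for readability
        webuse nlswork, clear
        keep if idcode <= 3

        * LSDV regression with idcode 1 as the reference category
        regress ln_wage age i.idcode, vce(robust)
        predict double yhat if e(sample), xb
        predict double se_yhat if e(sample), stdp

        * 95% confidence limits for each predicted value
        generate double ci_lo = yhat - invttail(e(df_r), 0.025) * se_yhat
        generate double ci_hi = yhat + invttail(e(df_r), 0.025) * se_yhat

        * Same model with idcode 3 as the reference category instead
        regress ln_wage age ib3.idcode, vce(robust)
        predict double yhat_b3 if e(sample), xb

        * The predictions should not depend on which unit is omitted
        summarize yhat yhat_b3
        ```

        The dummy coefficients and the constant change when the base level changes, but yhat and yhat_b3 should coincide.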

        • #5
          Many thanks Clyde Schechter. I have been going through the links Andrew Musau shared, but I am still trying to understand the whole concept.

          If I understood your advice correctly, I could do the following:

          Code:
           reg marks_end_year marks_beggining_year $student_characteristics i.discipline i.school_id, base vce(robust)
          
          predict marks_end_year_hat if e(sample), xb
          
          bys school_id: ci means marks_end_year_hat if e(sample)
          This way I get the average adjusted outcome for each school, along with their SEs and CIs. Would this be correct?

          One thing I noticed is that if I run -predict- and then -ci means- after -xtreg, fe-, I get different results. Shouldn't the LSDV and FE estimators give the same predicted outcome values?

          One last thing: I just noticed that my panel is not balanced (some schools have many students, others few). I hope it is still OK to use LSDV in this case?

          • #6
            Sorry to ask another question. I wonder if, after running the regression, instead of predicting the outcome and then calculating the mean for each school as I specified above, I could do the following:

            Code:
            reg marks_end_year marks_beggining_year $student_characteristics i.discipline i.school_id, base vce(robust)
            
            margins school_id if e(sample)
            I was expecting these 2 approaches (predict + mean by school & margins) to give the same results, but that's not the case. Does anyone know why?

            • #7
              Re #5. No, that's not what I recommended. I recommended

              Code:
              reg marks_end_year marks_beggining_year $student_characteristics i.discipline i.school_id, base vce(robust)
              predict marks_end_year_hat if e(sample), xb
              predict marks_end_year_sd if e(sample), stdp
              Now, the approach with -margins- in #6 is something completely different. So you need to get clear on what you want:

              If you use -predict-, you will get an individual predicted value for each observation in the data set. It will differ among different observations in the same school due to differences on other variables. The -predict- outputs are not school-level results, they are single-observation level results. If you use -margins- you will get an expected value for each school that is adjusted for all the other variables in the model. The -margins- result is for the school as a whole.

              • #8
                Many thanks Clyde Schechter. Just to clarify, my objective is to assess the individual effect of each school, so even if I get individual predictions, I ultimately need to average those out up to the school level. I think I will use margins, but just for my understanding:

                Why would it be different to do:
                1) -predict- to get individual predictions and then calculate the average of all individual predictions within each school
                2) -margins- at school level

                I was expecting these 2 approaches to give the same results, since both sets of predictions are adjusted for all other variables, but they give slightly different results. That's my doubt. Do you know why?

                • #9
                  With the first approach, the results are not fully adjusted for differences among the schools on the numerous other variables in your model. In the second approach, they are.
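
                  To make the difference concrete, here is a rough sketch on the nlswork example data (not the school data); insample, idcode_orig, mean_own, yhat_as1 and hand1 are made-up names. In a linear regression with a full set of unit dummies, averaging each unit's own fitted values simply reproduces that unit's raw mean outcome. By default -margins- instead assigns every observation in the estimation sample to the unit in question, leaves the other covariates as observed, and averages those predictions:

                  ```stata
                  * Example data shipped with Stata; drop the panel settings so idcode can be edited freely
                  webuse nlswork, clear
                  keep if idcode <= 3
                  xtset, clear

                  regress ln_wage age i.idcode
                  generate byte insample = e(sample)

                  * (1) Average of each unit's own predictions: covariates stay at that unit's values
                  predict double yhat if insample, xb
                  egen double mean_own = mean(yhat), by(idcode)

                  * (2) By hand, what -margins idcode- does for unit 1:
                  *     pretend every observation belongs to unit 1, predict, average over the whole sample
                  clonevar idcode_orig = idcode
                  replace idcode = 1
                  predict double yhat_as1, xb
                  replace idcode = idcode_orig
                  summarize yhat_as1 if insample, meanonly
                  scalar hand1 = r(mean)

                  * Compare with -margins-
                  margins idcode
                  matrix M = r(b)
                  display "by hand: " scalar(hand1) "   margins: " M[1,1]
                  ```

                  The two approaches coincide only if the covariate distribution is the same in every unit, which is rarely the case.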

                  • #10
                    Daniela:
                    just to have an idea of how fitted values are calculated after -xtreg,fe- and -regress- with -i.panelid- among the set of predictors, you may want to consider the following toy-example:
                    Code:
                    . use "https://www.stata-press.com/data/r16/nlswork.dta"
                    (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
                    
                    . reg ln_wage age i.year i.idcode if idcode<=3, vce(cluster idcode)
                    
                    Linear regression                               Number of obs     =         39
                                                                    F(1, 2)           =          .
                                                                    Prob > F          =          .
                                                                    R-squared         =     0.6843
                                                                    Root MSE          =     .27893
                    
                                                     (Std. Err. adjusted for 3 clusters in idcode)
                    ------------------------------------------------------------------------------
                                 |               Robust
                         ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                             age |   .3010572   .1448271     2.08   0.173    -.3220834    .9241978
                                 |
                            year |
                             69  |  -.0920902   .1448271    -0.64   0.590    -.7152309    .5310504
                             70  |  -.8648493   .4744989    -1.82   0.210    -2.906453    1.176755
                             71  |  -1.248506   .7569034    -1.65   0.241    -4.505199    2.008186
                             72  |   -1.39387   .6116727    -2.28   0.150    -4.025685    1.237945
                             73  |  -1.520276   .7804181    -1.95   0.191    -4.878144    1.837592
                             75  |  -2.049717    1.17383    -1.75   0.223    -7.100302    3.000868
                             77  |  -2.657565   1.424917    -1.87   0.203    -8.788489     3.47336
                             78  |  -2.751196   1.275456    -2.16   0.164    -8.239039    2.736647
                             80  |  -3.324016   1.557037    -2.13   0.166    -10.02341    3.375375
                             82  |  -4.027975    2.05943    -1.96   0.190    -12.88899    4.833039
                             83  |  -4.207353   2.093482    -2.01   0.182    -13.21488    4.800173
                             85  |  -4.730657   2.278496    -2.08   0.174    -14.53423    5.072919
                             87  |  -5.407995   2.621299    -2.06   0.175    -16.68653    5.870545
                             88  |  -5.901929   2.896542    -2.04   0.178    -18.36474    6.560883
                                 |
                          idcode |
                              2  |  -.3898423   .0270794   -14.40   0.005    -.5063556    -.273329
                              3  |  -2.247118   .8544305    -2.63   0.119    -5.923436    1.429199
                                 |
                           _cons |  -2.882579   2.331809    -1.24   0.342    -12.91554    7.150385
                    ------------------------------------------------------------------------------
                    
                    . predict fitted, xb
                    (24 missing values generated)
                    
                    . xtreg ln_wage age i.year if idcode<=3, fe vce(cluster idcode)
                    
                    Fixed-effects (within) regression               Number of obs     =         39
                    Group variable: idcode                          Number of groups  =          3
                    
                    R-sq:                                           Obs per group:
                         within  = 0.5596                                         min =         12
                         between = 0.4744                                         avg =       13.0
                         overall = 0.0413                                         max =         15
                    
                                                                    F(2,2)            =          .
                    corr(u_i, Xb)  = -0.9573                        Prob > F          =          .
                    
                                                     (Std. Err. adjusted for 3 clusters in idcode)
                    ------------------------------------------------------------------------------
                                 |               Robust
                         ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                             age |   .3010572   .1383871     2.18   0.162    -.2943743    .8964887
                                 |
                            year |
                             69  |  -.0920902   .1383871    -0.67   0.574    -.6875217    .5033412
                             70  |  -.8648493   .4533994    -1.91   0.197    -2.815669    1.085971
                             71  |  -1.248506   .7232462    -1.73   0.226    -4.360384    1.863371
                             72  |   -1.39387   .5844735    -2.38   0.140    -3.908656    1.120917
                             73  |  -1.520276   .7457154    -2.04   0.178     -4.72883    1.688278
                             75  |  -2.049717   1.121634    -1.83   0.209    -6.875718    2.776284
                             77  |  -2.657565   1.361556    -1.95   0.190    -8.515866    3.200736
                             78  |  -2.751196    1.21874    -2.26   0.153    -7.995011    2.492619
                             80  |  -3.324016   1.487801    -2.23   0.155    -9.725506    3.077473
                             82  |  -4.027975   1.967854    -2.05   0.177    -12.49497    4.439017
                             83  |  -4.207353   2.000391    -2.10   0.170    -12.81434    4.399636
                             85  |  -4.730657   2.177178    -2.17   0.162     -14.0983    4.636984
                             87  |  -5.407995   2.504738    -2.16   0.163    -16.18501    5.369023
                             88  |  -5.901929   2.767741    -2.13   0.167    -17.81056    6.006701
                                 |
                           _cons |  -3.866807   2.544581    -1.52   0.268    -14.81525     7.08164
                    -------------+----------------------------------------------------------------
                         sigma_u |  1.2007631
                         sigma_e |  .27892564
                             rho |   .9488037   (fraction of variance due to u_i)
                    ------------------------------------------------------------------------------
                    
                    . predict fitted_xtreg, xb
                    (24 missing values generated)
                    
                    . predict res_u_xtreg, u
                    (28,495 missing values generated)
                    
                    . list fitted fitted_xtreg res_u_xtreg in 1/5
                    
                         +--------------------------------+
                         |   fitted   fitted~g   res_u_~g |
                         |--------------------------------|
                      1. | 1.671601   .6873738   .9842277 |
                      2. | 1.589002   .6047739   .9842277 |
                      3. | 1.744695   .7604677   .9842277 |
                      4. | 1.919347   .9351189   .9842277 |
                      5. |  1.99202   1.007792   .9842277 |
                         +--------------------------------+
                    
                    . di .6873738  + .9842277
                    1.6716015
                    
                    .
                    As an aside, to better understand what's going on with regression coefficients, I always find it useful to compare -predict- results with fitted values calculated by hand.
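
                    A minimal sketch of such a hand calculation after -xtreg, fe-, again on nlswork but with a single regressor to keep it short (xb_hand and xbu_hand are made-up names):

                    ```stata
                    * Example data shipped with Stata; same three panel units as above
                    webuse nlswork, clear
                    keep if idcode <= 3
                    xtset idcode

                    xtreg ln_wage age, fe
                    predict double xb_fe if e(sample), xb
                    predict double u_fe if e(sample), u
                    predict double xbu_fe if e(sample), xbu

                    * Linear prediction by hand: constant plus slope times the regressor
                    generate double xb_hand = _b[_cons] + _b[age] * age if e(sample)

                    * Adding the predicted fixed effect gives the LSDV-type fitted value
                    generate double xbu_hand = xb_hand + u_fe

                    list xb_fe xb_hand xbu_fe xbu_hand in 1/5
                    ```

                    Each hand-computed column should match its -predict- counterpart up to rounding.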
                    Kind regards,
                    Carlo
                    (Stata 19.0)

                    • #11
                      Many thanks Clyde Schechter and Carlo Lazzaro - this is very helpful indeed!
