xtreg, fe - formula used by Stata

Bjorn Becker

Join Date: Aug 2021

Posts: 18
#1

14 Nov 2023, 08:00

Dear everyone,

My question sits on the border between a Stata question and a question about econometric understanding. I hope you can help me, but if it leans too far toward econometrics for this forum, I apologize and will take my question elsewhere.

I am using Stata/SE 18 and have a question about the formula used by the xtreg, fe command.

My discrete dependent variable is coded from 0 to 10. My main explanatory variable is a 0/1 dummy that indicates whether someone is a member of a group (member_of_group). The sample contains people who joined the group after not being a member and never left, and people who left the group after being a member and never rejoined. I dropped everyone who both joined and left the group at some point. In other words, only those who joined once, left once, or never changed their status remain in the sample.

The groups I focus on then become:

group_joiner = 1 if joined and never left (leavers and those who never changed group membership status = 0)
group_leaver = 1 if left and never joined (joiners and those who never changed group membership status = 0)

I proceed to use FE as follows:

xtreg dep_var member_of_group controls if group_joiner == 1, fe cluster(pid)
xtreg dep_var member_of_group controls if group_joiner == 0, fe cluster(pid)

xtreg dep_var member_of_group controls if group_leaver == 1, fe cluster(pid)
xtreg dep_var member_of_group controls if group_leaver == 0, fe cluster(pid)

As I understand it, I made a logical mistake in assuming I needed the third and fourth lines of the code, because the fixed-effects coefficient on member_of_group for group_joiner == 0 should be exactly equal to the coefficient on member_of_group for group_leaver == 1, and vice versa. My reasoning is that the only identifying variation comes from those who joined once or left once, so those who never changed their membership status should not matter. However, the coefficient on member_of_group differs across all four regressions.

The respective coefficients on member_of_group become identical, though, as soon as I take all controls out of the regressions.

When I look at the xtreg manual entry, I see that the formula shown on page 28 includes the overall sample means of both the dependent and the independent variables, and I assume that, therefore, those who never changed their member_of_group status may have an influence. But why does the relationship change once I drop the controls from my model? Does that hint at a correlation between my main explanatory variable and at least one of the controls? The formula looks as follows:
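For reference, the equation that xtreg, fe fits by OLS is given below. It is recalled from the Methods and formulas section of the xtreg entry rather than copied, so the notation should be checked against the PDF.

```latex
\[
\left(y_{it} - \bar{y}_i + \bar{\bar{y}}\right)
  = \alpha
  + \left(x_{it} - \bar{x}_i + \bar{\bar{x}}\right)\beta
  + \left(\epsilon_{it} - \bar{\epsilon}_i + \bar{\nu}\right)
  + \bar{\bar{\epsilon}}
\]
```

Here \bar{y}_i is the mean for person i and \bar{\bar{y}} is the overall sample mean. Adding the overall means shifts every transformed variable by a constant. This allows a value for _cons to be reported, but the slope estimates are the same as those from the plain within transformation of y_{it} - \bar{y}_i on x_{it} - \bar{x}_i. The overall means are therefore not the channel through which never-changers could affect beta.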

I would be grateful for some guidance on this matter, as it may change how I interpret fixed-effects results in Stata.

Thank you very much.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17726
#2

14 Nov 2023, 09:59

Bjorn:
please share exactly what you typed and what Stata gave you back, along with an example/excerpt of your data (as per FAQ). Thanks.

Kind regards,
Carlo
(Stata 19.0)

Bjorn Becker

Join Date: Aug 2021
Posts: 18

16 Nov 2023, 10:19

Dear Carlo,

I apologize and will attempt to summarize everything in a more suitable manner:

I look at the relationship between my dependent variable, let’s call it happiness, and being a member of a group that people can join and leave as they please. Using fixed effects, I try to determine whether happiness is significantly different for people who left the group at some point (leaving_group) compared with those who joined the group at some point (joining_group).

Take this data excerpt as an example:

ID	year	happiness	group_member	joining_group	leaving_group	n controls, including age, state, income, etc.
1	2005	9	0	1	0
1	2007	8	1	1	0
1	2009	8	1	1	0
2	1998	6	1	0	1
2	2002	9	0	0	1
3	2002	5	1	0	0
3	2005	4	1	0	0
3	2007	8	1	0	0
4	2007	5	0	0	0
4	2010	5	0	0	0

With this definition of variables, the only IDs that have joining_group == 0 and leaving_group == 0 at the same time are those that never changed their group_member status.

Now, I compare those who joined the group (joining_group == 1) with those who joined the group together with those who never changed their membership status (leaving_group == 0). Basically, I am checking the robustness of the results to a different sample. Only individuals who never changed their membership status make up the difference between the two samples. As I understand it, they should therefore not affect the coefficient on group_member in a fixed-effects regression:

Code:

xtreg dep_var group_member if leaving_group == 0, fe cluster(pid)

Code:

Fixed-effects (within) regression               Number of obs     =     70,000
Group variable: pid                             Number of groups  =     30,000

R-squared:                                      Obs per group:
     Within  = 0.0001                                         min =          1
     Between = 0.0007                                         avg =        2.1
     Overall = 0.0009                                         max =         10

                                                F(1, 29999)       =       3.46
corr(u_i, Xb) = 0.0102                          Prob > F          =     0.0630

                               (Std. err. adjusted for 30,000 clusters in pid)
------------------------------------------------------------------------------
             |               Robust
       happiness| Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
1.group_member|  -.0973402   .0523475    -1.86   0.063    -.1999431    .0052627
       _cons |   7.337787   .0078224   938.05   0.000     7.322455    7.353119

Code:

xtreg dep_var group_member if joining_group == 1, fe cluster(pid)

Code:

Fixed-effects (within) regression               Number of obs     =      4,000
Group variable: pid                             Number of groups  =      1,000

R-squared:                                      Obs per group:
     Within  = 0.0015                                         min =          1
     Between = 0.0004                                         avg =        3.1
     Overall = 0.0007                                         max =         10

                                                F(1, 999)        =       3.45
corr(u_i, Xb) = -0.0053                         Prob > F          =     0.0633

                                (Std. err. adjusted for 1,352 clusters in pid)
------------------------------------------------------------------------------
             |               Robust
       happiness | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
1.group_member|  -.0973402    .052372    -1.86   0.063    -.2000794    .0053991
       _cons |   7.291431   .0291863   249.82   0.000     7.234175    7.348686
-------------+----------------------------------------------------------------
     sigma_u |  1.3954569
     sigma_e |  1.2381409
         rho |  .55952185   (fraction of variance due to u_i)
------------------------------------------------------------------------------

The coefficients on group_member are identical as long as I do not include controls. This is the result I expected, as the two samples differ only by individuals who never changed their group_member status.
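The no-controls result can be written out. With group_member as the only regressor, the within estimator is as follows. The notation is generic and introduced here only for illustration: x stands for group_member and y for happiness.

```latex
\[
\hat{\beta}
  = \frac{\sum_{i}\sum_{t} \left(x_{it} - \bar{x}_i\right)\left(y_{it} - \bar{y}_i\right)}
         {\sum_{i}\sum_{t} \left(x_{it} - \bar{x}_i\right)^{2}}
\]
```

For anyone who never changes status, x_{it} = \bar{x}_i in every period, so that person adds zero to both the numerator and the denominator. Adding or dropping such individuals leaves the estimate unchanged, which matches the identical coefficients above.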

If I now introduce controls, the coefficients differ from each other:

Code:

xtreg dep_var group_member n_number_of_controls if leaving_group == 0, fe cluster(pid)

Code:

Fixed-effects (within) regression               Number of obs     =     70,000
Group variable: pid                             Number of groups  =     30,000

R-squared:                                      Obs per group:
     Within  = 0.0232                                         min =          1
     Between = 0.0195                                         avg =        2.1
     Overall = 0.0232                                         max =         10

                                                F(57, 29999)      =      12.74
corr(u_i, Xb) = -0.0667                         Prob > F          =     0.0000

                                   (Std. err. adjusted for 34,607 clusters in pid)
----------------------------------------------------------------------------------
                 |               Robust
           happiness| Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-----------------+----------------------------------------------------------------
         1.group_member|   .0191756   .0528138     0.36   0.717    -.0843413    .1226924

Code:

xtreg dep_var group_member n_number_of_controls if joining_group == 1, fe cluster(pid)

Code:

Fixed-effects (within) regression               Number of obs     =      4,000
Group variable: pid                             Number of groups  =      1,000

R-squared:                                      Obs per group:
     Within  = 0.0455                                         min =          1
     Between = 0.0006                                         avg =        3.1
     Overall = 0.0043                                         max =         10

                                                F(56, 1351)       =          .
corr(u_i, Xb) = -0.4631                         Prob > F          =          .

                                    (Std. err. adjusted for 999 clusters in pid)
----------------------------------------------------------------------------------
                 |               Robust
           happiness| Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-----------------+----------------------------------------------------------------
         1.group_member|  -.0076371   .0701841    -0.11   0.913    -.1453187    .1300445

As I understand it, which appears to be flawed, the coefficients in a fixed-effects regression should only be affected by individuals with within-person variation in the respective variable, and this should not change when controls are added or removed. Why do they change nonetheless? Does the answer lie in the formula that xtreg, fe uses in Stata?
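One way to reason about this, offered as a sketch rather than a definitive answer, is the Frisch-Waugh-Lovell theorem applied to the within-transformed data. The symbols are assumptions made for illustration: \ddot{x}_{it}, \ddot{w}_{it} and \ddot{y}_{it} denote the person-demeaned values of group_member, the vector of controls, and happiness.

```latex
\[
\hat{\delta}
  = \Bigl(\sum_{i,t} \ddot{w}_{it}\,\ddot{w}_{it}'\Bigr)^{-1}
    \sum_{i,t} \ddot{w}_{it}\,\ddot{x}_{it},
\qquad
\tilde{r}_{it} = \ddot{x}_{it} - \ddot{w}_{it}'\hat{\delta},
\qquad
\hat{\beta}
  = \frac{\sum_{i,t} \tilde{r}_{it}\,\ddot{y}_{it}}
         {\sum_{i,t} \tilde{r}_{it}^{2}}
\]
```

For a never-changer \ddot{x}_{it} = 0, but the controls still vary within person, so \tilde{r}_{it} = -\ddot{w}_{it}'\hat{\delta} is generally not zero. Never-changers therefore enter both sums in \hat{\beta}, and they also enter the matrix that is inverted in \hat{\delta}. They drop out only when \hat{\delta} = 0, that is, when group_member is uncorrelated within person with every control in the sample. Without controls, \tilde{r}_{it} = \ddot{x}_{it} and never-changers contribute nothing.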

I would be very thankful for feedback and hope that my example is clearer now.
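A small simulation can make the pattern concrete. This is only a sketch on made-up data: the variable names (ptype, member, income, happiness), sample sizes, and coefficients are all assumptions chosen for illustration and are not taken from the real data set. It runs xtreg, fe on joiners alone and on joiners plus never-changers, first without and then with a time-varying control.

```stata
* Illustrative simulation; every name and number below is hypothetical.
clear
set seed 12345

* 100 joiners (ptype 1) and 200 never-members (ptype 2), 4 periods each
set obs 300
generate long pid = _n
generate byte ptype = cond(pid <= 100, 1, 2)
generate double u = rnormal()              // person fixed effect
expand 4
bysort pid: generate byte year = _n
xtset pid year

* Joiners switch from 0 to 1 in period 3; never-members stay at 0
generate byte member = (ptype == 1 & year >= 3)

* Time-varying control, correlated with membership within person
generate double income = 0.5*member + rnormal()
generate double happiness = 7 + u + 0.3*member + 0.5*income + rnormal()

* No controls: never-changers have no within variation in member
xtreg happiness member, fe vce(cluster pid)
scalar b_pooled = _b[member]
xtreg happiness member if ptype == 1, fe vce(cluster pid)
scalar b_joiners = _b[member]
display "difference without controls: " (b_pooled - b_joiners)

* With a control: never-changers help estimate the income slope,
* which feeds into the partialled-out coefficient on member
xtreg happiness member income, fe vce(cluster pid)
scalar bc_pooled = _b[member]
xtreg happiness member income if ptype == 1, fe vce(cluster pid)
scalar bc_joiners = _b[member]
display "difference with controls:    " (bc_pooled - bc_joiners)
```

In exact arithmetic the first difference is zero, so any nonzero value shown is rounding error. The second difference is generally nonzero even though the true income slope is the same for everyone, because the estimated income slope depends on which individuals are in the sample.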
