
  • Issue with the command "xtreg, be" on unbalanced data?

    Dear Stata users,


    I am using Stata/MP 13.1 and I believe I have detected an issue with the command "xtreg, be" when applied to an unbalanced panel: I suspect that "xtreg, be" corresponds to WLS on the group-meaned data, and that "xtreg, be wls" corresponds to OLS on the group-meaned data.

    Indeed, if I run these two commands on my dataset (1355 municipalities i over 3 years t, for a total of 3501 observations it), I get:

    . xtreg y_it x_it, be

    Between regression (regression on group means) Number of obs = 3501
    Group variable: munibr Number of groups = 1355

    R-sq: within = 0.0024 Obs per group: min = 1
    between = 0.1772 avg = 2.6
    overall = 0.0867 max = 3

    F(1,1353) = 291.38
    sd(u_i + avg(e_i.))= .0430597 Prob > F = 0.0000

    ------------------------------------------------------------------------------
    y_it | Coef. Std. Err. t P>|t| [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    x_it | -.0335348 .0019646 -17.07 0.000 -.0373887 -.0296809
    _cons | .4034226 .0100747 40.04 0.000 .383659 .4231863
    ------------------------------------------------------------------------------

    . xtreg y_it x_it, be wls

    Between regression (regression on group means) Number of obs = 3501
    Group variable: munibr Number of groups = 1355

    R-sq: within = 0.0024 Obs per group: min = 1
    between = 0.1621 avg = 2.6
    overall = 0.0867 max = 3

    F(1,1353) = 261.75
    sd(u_i + avg(e_i.))= .0414943 Prob > F = 0.0000

    ------------------------------------------------------------------------------
    y_it | Coef. Std. Err. t P>|t| [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    x_it | -.0312599 .0019322 -16.18 0.000 -.0350503 -.0274696
    _cons | .3897733 .0100087 38.94 0.000 .370139 .4094075
    ------------------------------------------------------------------------------

    .


    Now, if I run OLS on the group-meaned data computed manually, the results (on the coefficients) correspond to those of "xtreg, be wls":

    . by id, sort: egen double x_i=mean(x_it)

    . by id, sort: egen double y_i=mean(y_it)

    . reg y_i x_i

    Source | SS df MS Number of obs = 3501
    -------------+------------------------------ F( 1, 3499) = 676.92
    Model | 1.16445331 1 1.16445331 Prob > F = 0.0000
    Residual | 6.01905508 3499 .001720222 R-squared = 0.1621
    -------------+------------------------------ Adj R-squared = 0.1619
    Total | 7.18350839 3500 .002052431 Root MSE = .04148

    ------------------------------------------------------------------------------
    y_i | Coef. Std. Err. t P>|t| [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    x_i | -.0312599 .0012015 -26.02 0.000 -.0336156 -.0289042
    _cons | .3897733 .0062238 62.63 0.000 .3775707 .4019759
    ------------------------------------------------------------------------------



    Am I wrong, or is there really an issue with the "xtreg, be" command? Thank you in advance for getting back to me.


    Best,
    Geoffrey

  • #2
    Welcome to Statalist, Geoffrey.

    Sorting through your posted output, I have two comments.

    1) It appears to me you are computing group means within groups defined by the variable id. However, it appears your xtset used the variable munibr to define the panels. Is this what you intended?

    2) Note that both the xtreg results describe themselves as "regression on group means" and the F-statistic degrees of freedom is consistent with having 1355 observations (the number of groups). However, your regression on the manually-calculated group means neglected to limit itself to a single observation per group.
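
    Point 2) can be illustrated with a minimal sketch. It uses the nlswork example dataset rather than the poster's data, so the dataset and the names insample, y_i, x_i and first are assumptions made for illustration only. Restricting the regression on group means to one row per panel should reproduce the coefficients and standard errors of the default between estimator.

    ```stata
    * Unbalanced example panel (an assumption; not the poster's data)
    webuse nlswork, clear
    xtset idcode

    * Default between estimator
    xtreg ln_wage tenure, be

    * Record the estimation sample so the means use the same rows
    generate byte insample = e(sample)

    * Group means over the estimation sample
    bysort idcode: egen double y_i = mean(ln_wage) if insample
    bysort idcode: egen double x_i = mean(tenure) if insample

    * Flag exactly one row per panel within the estimation sample
    egen byte first = tag(idcode) if insample

    * OLS on one row per panel: compare with the xtreg, be output above
    regress y_i x_i if first == 1
    ```

    Dropping the "if first == 1" qualifier leaves every row of each panel in the regression, which is what inflates the number of observations and the F-statistic degrees of freedom in the manual regression.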

    As you've no doubt noticed, the output you posted is not easily readable, despite having chosen a monospace font, because the software behind the forum does not preserve multiple spaces. To improve future posts, please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

    To assure maximum readability of results that you post, please copy them from the Results window or your log file into a code block in the Forum editor, as explained in section 12 of the Statalist FAQ. For example, the following:

    [code]
    . sysuse auto, clear
    (1978 Automobile Data)

    . describe make price

                  storage   display    value
    variable name   type    format     label      variable label
    -----------------------------------------------------------------
    make            str18   %-18s                 Make and Model
    price           int     %8.0gc                Price
    [/code]

    will be presented in the post as the following:
    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . describe make price
    
                  storage   display    value
    variable name   type    format     label      variable label
    -----------------------------------------------------------------
    make            str18   %-18s                 Make and Model
    price           int     %8.0gc                Price



    • #3
      Hello William,


      Thank you very much for your answer. I have taken note of your comments about the formatting of the output in my post.
      Regarding your comment 1), the variables "id" and "munibr" are the same: I renamed "munibr" to "id" to make the example clearer.

      Regarding your comment 2), my objective was to manually recreate the BE coefficients, not their standard errors (SEs) and associated degrees of freedom. Nevertheless, this triggered my interest, and I have worked out what "xtreg, be" and "xtreg, be wls" do:
      - "xtreg, be" does an OLS regression on the group-meaned data where only one observation per group is kept.
      - "xtreg, be wls" does an OLS regression on the group-meaned data where only one observation per group is kept, but where each group is weighted by its number of appearances (by specifying [aweight=Ti], where Ti is the number of appearances).
      In comparison, the "manual regression" on group means where all observations are kept gives the same coefficients as "xtreg, be wls" (and as "xtreg, be" if the panel is balanced). However, it gives SEs that are too low since most observations are then redundant.
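
      The two descriptions above can be sketched as follows, again assuming the nlswork example dataset in place of the original data; insample, T_i, y_i, x_i and first are hypothetical names. The weights are passed as analytic weights, following the [aweight=Ti] specification mentioned above.

      ```stata
      * Unbalanced example panel (an assumption; not the original data)
      webuse nlswork, clear
      xtset idcode

      * Weighted between estimator; its estimation sample defines the rows used below
      xtreg ln_wage tenure, be wls
      generate byte insample = e(sample)

      * Group means and group sizes over the estimation sample
      bysort idcode: egen double y_i = mean(ln_wage) if insample
      bysort idcode: egen double x_i = mean(tenure) if insample
      bysort idcode: egen long T_i = total(insample)

      * Flag one row per panel
      egen byte first = tag(idcode) if insample

      * (a) unweighted, one row per panel: compare with xtreg, be
      regress y_i x_i if first == 1

      * (b) weighted by T_i, one row per panel: compare with xtreg, be wls
      regress y_i x_i [aweight = T_i] if first == 1

      * (c) all rows, unweighted: same coefficients as (b), smaller SEs
      regress y_i x_i if insample
      ```

      Regressions (b) and (c) minimize the same sum of squares, because a panel observed Ti times contributes its squared residual Ti times in (c) and once with weight Ti in (b); only the number of observations used for inference differs.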

      My (personal) opinion about the two commands and how they are described in Stata's manual is the following.
      First, Stata's manual describes "xtreg, be" as the default option and implicitly uses this command to construct the random-effects estimates from "xtreg, re". I believe this choice is somewhat open to criticism: to some extent, one disregards the panel nature of unbalanced panel data when one gives the same weight to all groups (e.g. suppose one group is observed ten times and another only twice).
      Second, Stata's manual describes "xtreg, be wls" as WLS on the group-meaned data. I find this somewhat misleading: as the "manual regression" shows, the coefficients from "xtreg, be wls" can also be obtained through standard OLS (i.e. without weights) on the group means with all observations kept. Regarding the SEs, I guess (but am not sure) that one could obtain the correct ones by simply adjusting the degrees of freedom.
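
      The guess about the degrees of freedom can be checked against the output posted in #1. The following is a sketch of the arithmetic, assuming that the weighted regression rescales the analytic weights to sum to the number of groups n, so that the weighted fit and the all-rows fit differ only in their residual degrees of freedom. Here N is the number of observations, n the number of groups and k the number of estimated coefficients.

      ```latex
      \[
      \widehat{\mathrm{se}}_{\text{be wls}}
        = \widehat{\mathrm{se}}_{\text{OLS, all rows}}
          \sqrt{\frac{N-k}{n-k}},
      \qquad N = 3501,\; n = 1355,\; k = 2 .
      \]
      \[
      0.0012015 \times \sqrt{\frac{3499}{1353}} \approx 0.0019322,
      \qquad
      0.0062238 \times \sqrt{\frac{3499}{1353}} \approx 0.0100087 .
      \]
      ```

      Both values agree with the standard errors reported by "xtreg, be wls" in #1 to the displayed precision, which is consistent with the guess.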


      Best,
      Geoffrey





      • #4
        Geoffrey, see this post for a similar discussion and explanation:
        https://www.statalist.org/forums/for...ith-wls-option
        The margins bug has been fixed, but the question of how to summarize group effects in unbalanced panel data remains.

