  • Difference between xtreg and xi: reg for panel data analysis

    Hi everyone,

    I am running a panel data analysis.
    In my model, I have 6 continuous independent variables and one categorical independent variable.
    The categorical independent variable is binary and time-invariant: it refers to geographical location and stays constant throughout the 4-year panel.

    Based on what I read from this document,
    I can use either a least squares dummy variable (LSDV) model
    Code:
    xi: reg y x1 x2 x3 x4 x5 x6 i.location i.entity
    OR
    Code:
    xtset id year
    xtreg y x1 x2 x3 x4 x5 x6 i.location, fe
    Both gave me the same coefficients for x1 - x6.
    However, there are three differences:
    1. xtreg dropped the coefficient for the location categorical variable, whereas xi: reg retained it.
    2. xtreg did not give me the fixed-effect intercept for each entity in my panel data, whereas xi: reg did.
    3. xtreg gave R-sq within = 0.4027, R-sq between = 0.9028, and R-sq overall = 0.8938, whereas xi: reg gave R2 = 0.9874.
    My questions:
    1. I am puzzled by difference 1: if xtreg dropped the time-invariant categorical variable, how can it still give me the same coefficients for x1 - x6 as xi: reg?
    2. Which command should I use? I am estimating a cost function, and my dependent variable y is total cost. I would like to use the coefficients to estimate marginal cost and average incremental cost. The average incremental cost derived from xi: reg would differ from the one derived from xtreg, because xi: reg reports one more coefficient (the binary categorical variable) than xtreg does.
    3. What do the differences between the R2 values reported by the two commands imply?
    Please advise.

    Thank you very much.

  • #2
    First, why are you using the outdated -xi:- machinery? Switch to factor variable notation (-help fvvarlist-) so that you can then use -margins- to get easy estimates of the various levels and marginal effects you are interested in.
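
    A minimal sketch of the factor-variable version, assuming the variable names from the original post (y, x1-x6, location) and assuming that id is the panel identifier (the same variable as entity):
    Code:
    * Sketch only: names come from the original post; id is assumed to be the panel variable
    regress y x1 x2 x3 x4 x5 x6 i.location i.id
    * Average marginal effect of x1 on y (in a linear model this equals the coefficient on x1)
    margins, dydx(x1)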

    Question 1. As you state, location is constant for a given entity/id. (I assume these are the same variable from the way you have used them above.) This means that the combination of location and the full panoply of fixed id effects is collinear. So something has to go. With -xi: reg-, -xi- drops one of the fixed effects. You will see that in the output from that regression: there is a missing category for id in the regression table. (The same will be true if you use factor variable notation.) By contrast, -xtreg, fe- will drop location and retain all the fixed effects. But since this suite of variables as a whole contains only as much information as any subset that excludes one of them, the rest of the regression results are completely unaffected by which variable gets dropped.
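
    One way to see this side by side is to store both sets of estimates and tabulate only the x1-x6 rows. This is a sketch, assuming the variable names from the original post, with id as the panel variable and year as the time variable; fe and lsdv are just illustrative names for the stored results:
    Code:
    xtset id year
    * Fixed-effects estimator: location should be reported as omitted
    xtreg y x1 x2 x3 x4 x5 x6 i.location, fe
    estimates store fe
    * Dummy-variable regression: one of the collinear indicators is omitted instead
    regress y x1 x2 x3 x4 x5 x6 i.location i.id
    estimates store lsdv
    * Show coefficients and standard errors for x1-x6 only
    estimates table fe lsdv, keep(x1 x2 x3 x4 x5 x6) b se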

    Question 2. If you are coding these regressions correctly, it does not matter which one you use. All predictions, marginal effects, coefficients, and statistical hypothesis tests will be identical either way. Each model is a simple algebraic transform of the other. If that is not happening for you, please post your exact code and Stata's exact output by pasting directly from the Results window into a code block on this forum. (See the FAQ for how to set up a code block.) Do NOT retype anything: we will need to see exactly what happened to troubleshoot it.
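
    As a rough check of this equivalence, the fitted values from the two models can be compared directly. This is a sketch, assuming the variable names from the original post with id as the panel variable; yhat_lsdv and yhat_fe are hypothetical names for the new variables. After -xtreg, fe- the comparable prediction is the one that includes the estimated fixed effect (-predict, xbu-), not the default -xb-:
    Code:
    xtset id year
    * Fitted values from the dummy-variable regression
    quietly regress y x1 x2 x3 x4 x5 x6 i.id
    predict double yhat_lsdv, xb
    * Fitted values from the fixed-effects model, including the entity effect u_i
    quietly xtreg y x1 x2 x3 x4 x5 x6, fe
    predict double yhat_fe, xbu
    * Summarize any differences between the two sets of fitted values
    compare yhat_lsdv yhat_fe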

    Question 3. The R2 from -reg- and the three R2 values from -xtreg, fe- all refer to different things. The closest of the -xtreg- R2 values to the -reg- R2 (in concept, and usually numerically) will be the "overall R2." However, in -xtreg-, the overall R2 does not count variance attributable only to the fixed effects, whereas the R2 from -reg- does.
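
    To put the numbers next to each other, the stored results can be displayed after each command. This is a sketch, assuming the variable names from the original post with id as the panel variable. After -xtreg, fe-, e(r2_w), e(r2_b) and e(r2_o) hold the within, between and overall R2; after -regress-, e(r2) holds the ordinary R2:
    Code:
    xtset id year
    quietly xtreg y x1 x2 x3 x4 x5 x6, fe
    display "within = " e(r2_w) "  between = " e(r2_b) "  overall = " e(r2_o)
    * The dummy-variable regression counts the variance explained by the id indicators
    quietly regress y x1 x2 x3 x4 x5 x6 i.id
    display "LSDV R2 = " e(r2)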
