  • Removing variable completely changes regression output

    Hi all,

    I am investigating the impact of the degree of competitiveness of English football leagues on fan attendance within the league using a panel data set and have estimated the following model:

    xtreg lnatt scr pts goals lag_pts ticket rgnu lncap lnpop, re

    where att is the attendance of each club in each season, scr is my measure of competitiveness, pts is points acquired by each club in the league, goals is goals scored by each club in the league, lag_pts is points acquired in the previous season, ticket is the minimum ticket price in that season, rgnu is the regional unemployment rate, cap is stadium capacity and pop is population.

    Firstly, I wanted to know whether I am justified in using a random effects model, given that stadium capacity is largely time invariant and so would be omitted from the regression in a fixed effects model, or whether I should perform a Hausman test (and if so, is a 5% significance level appropriate for this test)?

    Additionally, I have 355 observations for all my variables aside from ticket prices for which I only have 155 observations. When I estimate the model excluding ticket prices the outcome is very different and my variable of interest (scr) goes from a p-value of 0.1 to around 0.9. Is this as concerning as it seems? Should I still include ticket prices in my estimated model?

    Any help would be hugely appreciated.

    Thanks in advance,
    Joe

  • #2
    On your first question, yes, I think you should use a Hausman test (at the 5% significance level).
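
    A minimal sketch of the test, assuming the data are already declared as a panel with xtset; the stored-estimate names fe_ticket and re_ticket are just illustrative labels:

    ```stata
    * Fixed-effects fit (treated as the consistent estimator), then store it
    xtreg lnatt scr pts goals lag_pts ticket rgnu lncap lnpop, fe
    estimates store fe_ticket

    * Random-effects fit (efficient if its assumptions hold), then store it
    xtreg lnatt scr pts goals lag_pts ticket rgnu lncap lnpop, re
    estimates store re_ticket

    * H0: the RE estimator is consistent; a small p-value favours FE
    hausman fe_ticket re_ticket
    ```

    With a 5% significance level, a p-value below 0.05 would point towards FE. Both fits here use only the observations with non-missing ticket.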

    When you estimate the model with ticket price included, the regression is run only over the observations for which ticket price is non-missing, so you are using only a subset of your sample. Is there a reason why some ticket prices are missing (e.g. lower-ranked football teams)? If ticket price is missing for non-random reasons, then you are likely biasing your results by including it in the model. Hopefully this explanation makes sense, but if not I can elaborate.

    However, if ticket price is correlated with both lnatt and scr, then excluding it can bias your estimate of the scr coefficient, which may be why your result becomes insignificant. Is there any way you can obtain ticket price data for the rest of the sample?
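
    One way to separate the effect of dropping the variable from the effect of changing the sample is to re-estimate the model without ticket on exactly the observations used when ticket is included. A rough sketch, assuming the data are already xtset; insample and the stored-estimate names are hypothetical labels:

    ```stata
    * Model with ticket: estimated only where all regressors are non-missing
    xtreg lnatt scr pts goals lag_pts ticket rgnu lncap lnpop, re
    generate byte insample = e(sample)
    estimates store with_ticket

    * Same model without ticket, restricted to those same observations
    xtreg lnatt scr pts goals lag_pts rgnu lncap lnpop if insample, re
    estimates store no_ticket_sub

    * Model without ticket on the full sample
    xtreg lnatt scr pts goals lag_pts rgnu lncap lnpop, re
    estimates store no_ticket_full

    * Compare the scr coefficient, standard error and p-value side by side
    estimates table with_ticket no_ticket_sub no_ticket_full, keep(scr) b se p
    ```

    If scr looks similar in the first two columns, the change is coming from the sample rather than from ticket itself.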

    best,
    Rhys



    • #3
      Thank you very much for your reply Rhys.

      Ticket price data is only available from 2011 (so excluded for random reasons), but I have data for all other variables from 2004. Ticket price is not found to be statistically significant, and removing it actually increases the R^2 value (presumably because the sample size is larger), which makes me think I should just exclude it from the model. I just find it very strange that excluding it has such an impact on the coefficient and standard error of scr (there is no reason to suggest scr would be correlated with ticket price), especially when the other variables don't appear to change too much. I have included the Stata output below. Even more confusingly, the Hausman test suggests RE is better for the model with ticket price included and FE is better for the model without ticket price included.


      xtreg lnatt pts goals lag_pts scr ticket rgnu lncap lnpop, re

      Random-effects GLS regression                   Number of obs     =        141
      Group variable: club                            Number of groups  =         50

      R-sq:                                           Obs per group:
           within  = 0.3368                                         min =          1
           between = 0.7785                                         avg =        2.8
           overall = 0.7248                                         max =          6

                                                      Wald chi2(8)      =     217.07
      corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

      ------------------------------------------------------------------------------
             lnatt |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               pts |   .0038539   .0011649     3.31   0.001     .0015708     .006137
             goals |   .0005054   .0013459     0.38   0.707    -.0021326    .0031434
           lag_pts |   .0013124   .0006446     2.04   0.042      .000049    .0025758
               scr |   .4927856   .1863827     2.64   0.008     .1274822     .858089
            ticket |   -.000495   .0036664    -0.14   0.893     -.007681    .0066909
              rgnu |  -.0093582   .0073084    -1.28   0.200    -.0236824    .0049661
             lncap |   .6854141   .0775563     8.84   0.000     .5334066    .8374216
             lnpop |   .2353393   .0958889     2.45   0.014     .0474005    .4232781
             _cons |  -1.030194   1.014356    -1.02   0.310    -3.018295     .957908
      -------------+----------------------------------------------------------------
           sigma_u |  .22253535
           sigma_e |  .08743948
               rho |  .86625892   (fraction of variance due to u_i)

      xtreg lnatt pts goals lag_pts scr rgnu lncap lnpop, re

      Random-effects GLS regression                   Number of obs     =        355
      Group variable: club                            Number of groups  =         69

      R-sq:                                           Obs per group:
           within  = 0.3454                                         min =          1
           between = 0.8692                                         avg =        5.1
           overall = 0.7579                                         max =         14

                                                      Wald chi2(7)      =     533.58
      corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

      ------------------------------------------------------------------------------
             lnatt |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               pts |   .0045351   .0008781     5.16   0.000      .002814    .0062562
             goals |    .001145   .0010504     1.09   0.276    -.0009137    .0032038
           lag_pts |   .0022276   .0005163     4.31   0.000     .0012157    .0032395
               scr |  -.0165243   .1485191    -0.11   0.911    -.3076164    .2745677
              rgnu |  -.0150516   .0051385    -2.93   0.003     -.025123   -.0049803
             lncap |   .6691735   .0509921    13.12   0.000     .5692309    .7691161
             lnpop |    .246759   .0705219     3.50   0.000     .1085385    .3849794
             _cons |  -.8426202   .7192977    -1.17   0.241    -2.252418    .5671774
      -------------+----------------------------------------------------------------
           sigma_u |  .21376137
           sigma_e |  .11919228
               rho |  .76282756   (fraction of variance due to u_i)



      • #4
        Thanks for providing the output. I would be inclined to agree and say it is best to exclude ticket price, given that it limits your dataset. However, make sure there is no omitted variable bias (OVB) concern. I don't know how scr is constructed, but if it measures football team competitiveness then more competitive (so I think "better") teams might be able to charge a higher price, in which case OVB would be a concern.

        In terms of the wild change in sign (and significance), this could just be because you are using a fuller sample and in the population there is no effect.
        Alternatively, it might be that scr is highly correlated with pts, goals and lag_pts. I would suggest checking the correlation matrix and also checking for multicollinearity: run
        Code:
        reg lnatt pts goals lag_pts scr ticket rgnu lncap lnpop
        followed by
        Code:
        vif
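
        For the correlation matrix, something along these lines should work (the choice of variables to include is only a suggestion):

        ```stata
        * Pairwise correlations, showing the number of observations and
        * the significance level behind each coefficient
        pwcorr scr pts goals lag_pts ticket, obs sig

        * The same correlations restricted to observations where every
        * listed variable is non-missing (listwise deletion)
        correlate scr pts goals lag_pts ticket
        ```

        Because ticket is missing for many observations, the two commands can use quite different samples, so it is worth looking at both.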
        Have you run the model using FE then (excluding ticket price)? What does that reveal? It might make more sense conceptually to use FE given that there might be time invariant characteristics driving the results (e.g. "management", "club ethos" etc).

        Best,
        Rhys



        • #5
          Hi Rhys,

          Thanks again for replying. Sorry I should have been more clear in my initial post - scr is a measure of competitive balance in the league (i.e. the share of total points in a league season acquired by the top clubs). I don't think OVB would be a problem in this case as scr is not correlated with ticket prices.

          Multicollinearity also does not appear to be a big issue - I have included the output of the vif test below.

          I agree with you that fixed effects is probably the best approach and there seems to be a less drastic impact of removing ticket price when using fixed effects.

              Variable |       VIF       1/VIF
          -------------+----------------------
                 goals |      3.30    0.303408
                   pts |      3.18    0.314733
                 lncap |      1.91    0.524165
                 lnpop |      1.64    0.608535
                  rgnu |      1.29    0.776616
                ticket |      1.25    0.799745
                   scr |      1.08    0.923455
               lag_pts |      1.07    0.934495
          -------------+----------------------
              Mean VIF |      1.84


          . xtreg lnatt scr pts goals rgnu lag_pts lnpop lncap, fe

          Fixed-effects (within) regression               Number of obs     =        355
          Group variable: club                            Number of groups  =         69

          R-sq:                                           Obs per group:
               within  = 0.3717                                         min =          1
               between = 0.7240                                         avg =        5.1
               overall = 0.5806                                         max =         14

                                                          F(7,279)          =      23.58
          corr(u_i, Xb)  = 0.4792                         Prob > F          =     0.0000

          ------------------------------------------------------------------------------
                 lnatt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   scr |   .2244549   .1563055     1.44   0.152     -.083233    .5321428
                   pts |   .0043311   .0008667     5.00   0.000     .0026249    .0060372
                 goals |   .0009976   .0010312     0.97   0.334    -.0010324    .0030276
                  rgnu |  -.0169503   .0052582    -3.22   0.001     -.027301   -.0065995
               lag_pts |   .0025208   .0005101     4.94   0.000     .0015167    .0035249
                 lnpop |  -.1973254   .1997025    -0.99   0.324    -.5904403    .1957895
                 lncap |   .4099748   .0893885     4.59   0.000     .2340133    .5859362
                 _cons |   6.760811   2.307345     2.93   0.004     2.218795    11.30283
          -------------+----------------------------------------------------------------
               sigma_u |  .43427953
               sigma_e |  .11919228
                   rho |  .92994865   (fraction of variance due to u_i)

          Thanks very much for your help on this. It is much appreciated.



          • #6
            No problem! Someone else here might have a stroke of inspiration which I didn't think of but otherwise good luck with your research project!

            Best,
            Rhys
