  • Year and country dummies in pooled OLS regressions

    Hi,

    I am doing an empirical analysis with a sample of 260 observations (13 years and 20 countries). First of all, is it advisable to run a fixed-effects (FE) model with such a small sample? My supervisor told me that an FE model would deliver unreliable results in my case. Moreover, I don't have any time-invariant variables.

    Secondly, I ran pooled OLS regressions without any dummy variables and would like to compare the results with a specification that includes year and country dummies to absorb country- and time-specific fixed effects. Would it make sense to include these dummies in a pooled OLS regression via i.Year and i.Country, or via -areg-? Or is this recommended only when running an FE or least squares dummy variable (LSDV) model? Furthermore, after running the regression with the dummies, I tested for multicollinearity by computing variance inflation factors (VIF) and found that most of my explanatory and dummy variables then show collinearity problems.
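    To illustrate the question about i.Country versus -areg-, here is a minimal sketch on simulated data (the 20-by-13 panel, the variable names y, x1, x2 and the seed are all hypothetical). Including i.country in a pooled OLS regression is the LSDV fixed-effects model, so the slope on x1 should be identical whether the country effects are entered as dummies, absorbed with -areg-, or swept out with -xtreg, fe-.

```stata
* Hypothetical simulated panel: 20 countries x 13 years = 260 obs
clear
set seed 12345
set obs 260
generate country = ceil(_n/13)           // countries 1..20
generate year = 2000 + mod(_n - 1, 13)   // years 2000..2012
generate x1 = rnormal()
generate x2 = rnormal()
generate y = 1 + 0.5*x1 - 0.3*x2 + country/10 + rnormal()

* (1) Pooled OLS with explicit country and year dummies (LSDV)
regress y x1 x2 i.country i.year
scalar b_lsdv = _b[x1]

* (2) -areg- absorbs the country dummies instead of reporting them
areg y x1 x2 i.year, absorb(country)
scalar b_areg = _b[x1]

* (3) Within (fixed-effects) estimator on the -xtset- panel
xtset country year
xtreg y x1 x2 i.year, fe
scalar b_fe = _b[x1]

display b_lsdv _col(15) b_areg _col(30) b_fe
```

    The three commands differ mainly in whether the country coefficients are displayed and, for some -vce()- options, in small-sample degrees-of-freedom adjustments to the standard errors.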
    Finally, I tested for time and country fixed effects with -testparm- in Stata 12 after running both the pooled OLS and FE models, and found that the dummies for all years and countries are equal to 0, so no time or country fixed effects should be needed. However, Prob > F is lower than 0.05, which gives the opposite inference. Any suggestions about this?

    Thank you in advance for your answers and suggestions!

  • #2
    Algirdas,
    welcome to the list.
    Your chances of getting helpful replies increase if (as per the FAQ) you post what you typed and what Stata gave you back.
    Some asides, though:
    - have you tried to run your model under the -xtreg, re- specification?
    - VIF: what happens if you remove one or more highly collinear variables?
    Kind regards,
    Carlo
    (Stata 19.0)

    • #3
      Thank you for your response, Carlo.
      1. Yes, I have tried the -xtreg, re- specification and applied the Breusch and Pagan Lagrangian multiplier (LM) test for random effects. The test result showed that I do not need to run a random-effects model.
      2. Should I remove the highly collinear variables one by one, starting with the most collinear, and check the VIF results again?
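      A minimal sketch of the check Carlo suggests, on hypothetical simulated data in which x3 is deliberately built as a near copy of x1 (names, seed and data-generating process are assumptions): compute the VIFs with -estat vif-, drop the most collinear regressor, and compute them again.

```stata
* Hypothetical simulated data: x3 is almost collinear with x1
clear
set seed 12345
set obs 260
generate x1 = rnormal()
generate x2 = rnormal()
generate x3 = x1 + 0.1*rnormal()
generate y = 1 + 0.5*x1 - 0.3*x2 + rnormal()

* VIFs with the near-duplicate regressor included
regress y x1 x2 x3
estat vif
scalar vif_before = r(vif_1)

* Drop the most collinear regressor and re-check
regress y x1 x2
estat vif
scalar vif_after = r(vif_1)

display vif_before _col(15) vif_after
```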

      • #4
        As I also have serial correlation in the errors and heteroskedasticity in the residuals, I am reporting cluster-robust standard errors via -reg, cluster(id)-.
        Moreover, when I tried to run a fixed-effects model (also reporting robust standard errors), all country dummy variables were omitted due to collinearity.
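        A minimal sketch of cluster-robust standard errors for pooled OLS and for fixed effects, on hypothetical simulated data with a country-level error component (names, seed and data-generating process are assumptions; -country- plays the role of -id- in the post). -vce(cluster clustvar)- is the documented spelling of the older -cluster()- option.

```stata
* Hypothetical simulated panel: 20 countries x 13 years
clear
set seed 12345
set obs 260
generate country = ceil(_n/13)
generate year = 2000 + mod(_n - 1, 13)
generate x1 = rnormal()
generate x2 = rnormal()
* hypothetical country-level shock, constant within each country
generate u_c = rnormal() if year == 2000
bysort country (year): replace u_c = u_c[1]
generate y = 1 + 0.5*x1 - 0.3*x2 + u_c + rnormal()

* Pooled OLS with standard errors clustered by country
regress y x1 x2, vce(cluster country)
scalar nclust_ols = e(N_clust)

* Fixed effects with standard errors clustered by country
xtset country year
xtreg y x1 x2, fe vce(cluster country)
scalar nclust_fe = e(N_clust)
```

        Adding i.country to the -xtreg, fe- line would only produce "omitted because of collinearity" notes: the country dummies are constant within each panel, so the within transformation removes them.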

        • #5
          Algirdas:
          thanks for these further details.
          1. did you contrast -xtreg, fe- vs -xtreg, re- via the -hausman- test, too? Again on specification: if you -xtset country year- your dataset, the omission of all i.country due to collinearity makes sense to me, because -xtreg, fe- already takes the countries (the panel identifier declared with -xtset-) into account, and hence i.country would be redundant for the estimation;
          2. yes, but be careful not to remove predictors that, according to what other researchers have done in the past on the same research topic, should be kept on the right-hand side of the equation.
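          A minimal sketch of the specification checks discussed in this thread (the Breusch-Pagan LM test after -xtreg, re- and the -hausman- comparison of FE vs RE), on hypothetical simulated data; the variable names, seed and data-generating process are assumptions.

```stata
* Hypothetical simulated panel: 20 countries x 13 years
clear
set seed 12345
set obs 260
generate country = ceil(_n/13)
generate year = 2000 + mod(_n - 1, 13)
generate x1 = rnormal()
generate x2 = rnormal()
generate y = 1 + 0.5*x1 - 0.3*x2 + country/10 + rnormal()
xtset country year

* Fixed effects: no i.country needed, the within transformation removes them
xtreg y x1 x2, fe
estimates store fe

* Random effects, then the Breusch-Pagan LM test for random effects
xtreg y x1 x2, re
xttest0
estimates store re

* Hausman test: consistent estimator (fe) first, efficient estimator (re) second
hausman fe re
```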
          As an aside, in your future posts, please try to report exactly what you typed and what Stata gave you back: it is easy to do by exploiting the code delimiter function (# icon), that you can select from the advanced editor (A icon) options.
          Kind regards,
          Carlo
          (Stata 19.0)

          • #6
            Building on Carlo's remarks, it is always wise to choose your variables first and evaluate their suitability before specifying any model. Secondly, from your post, it seems you are misinterpreting the output from -testparm- in Stata.

            Consider the following OLS regression model with country and time dummies

            Code:
            reg y x1 x2 x3 i.country i.year
            If I run the command

            Code:
            testparm i.country i.year
            This is a test that all country and time dummies are jointly equal to 0. So, when Stata displays 2.country=0....N.country=0 and 2.year=0...T.year=0 (N countries, T years), it is just pointing out which hypothesis is being tested. The F-statistic for this hypothesis is what you look at to either reject or fail to reject it. In your case, you indicate that Prob > F is below 0.05, which means that we can reject poolability across time and across countries at the 5 percent level of significance (which is evidence in favor of fixed effects).
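            A minimal sketch on hypothetical simulated data (names, seed and effect sizes are assumptions) showing where the decision comes from: the "( 1) 2.country = 0 ..." lines only restate the null hypothesis, while r(F) and r(p) (the displayed F and Prob > F) are what you use to reject or not. Country and year dummies can also be tested separately.

```stata
* Hypothetical simulated panel with genuine country effects
clear
set seed 12345
set obs 260
generate country = ceil(_n/13)
generate year = 2000 + mod(_n - 1, 13)
generate x1 = rnormal()
generate x2 = rnormal()
generate x3 = rnormal()
generate y = 1 + 0.5*x1 - 0.3*x2 + 0.2*x3 + country/10 + rnormal()

regress y x1 x2 x3 i.country i.year

* Country dummies only
testparm i.country
scalar p_country = r(p)
* Year dummies only
testparm i.year
scalar p_year = r(p)
* Both sets jointly
testparm i.country i.year
scalar p_joint = r(p)

display "country: p = " p_country "   year: p = " p_year "   joint: p = " p_joint
```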

            Note that whether or not you should rely on this test depends on the structure of your error term (see Chapter 4 of Baltagi's textbook for details and for procedures to check that the assumptions are satisfied). Below is code showing how Stata computes the F-statistic reported by -testparm- (adjust the variable names to match your data).


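            $$F = \frac{(RSS_1 - RSS_2)/(P_2 - P_1)}{RSS_2/(N - P_2)}$$

            where $RSS_1$ and $P_1$ are the residual sum of squares and the number of parameters (including the constant) from model 1, $RSS_2$ and $P_2$ are the same quantities from model 2, and N is the total number of observations (note that model 1, without the dummies, is nested within model 2, with the dummies).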
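            Code:
            * Model 1 (restricted): no country or year dummies
            reg y x1 x2 x3
            scalar rss1 = e(rss)
            scalar p1 = e(df_m) + 1           // parameters, including the constant
            * Model 2 (unrestricted): adds the country and year dummies
            reg y x1 x2 x3 i.country i.year
            scalar rss2 = e(rss)
            scalar p2 = e(df_m) + 1           // base and omitted dummies are not counted
            scalar N = e(N)
            * numerator and denominator degrees of freedom
            scalar df_n = p2 - p1
            scalar df_d = N - p2
            scalar F = ((rss1 - rss2)/df_n) / (rss2/df_d)
            di df_n
            di df_d
            di F
            di Ftail(df_n, df_d, F)           // p-value, i.e. Prob > F
            * for comparison: Stata's own joint test on Model 2
            testparm i.country i.year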

            The F is distributed as F(df_n, df_d) under the null hypothesis.
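            As a self-contained check, here is a minimal sketch that runs the same restricted/unrestricted steps on hypothetical simulated data (names, seed and data-generating process are assumptions) and compares the hand-computed F and its p-value with what -testparm- reports.

```stata
* Hypothetical simulated panel: 20 countries x 13 years
clear
set seed 12345
set obs 260
generate country = ceil(_n/13)
generate year = 2000 + mod(_n - 1, 13)
generate x1 = rnormal()
generate x2 = rnormal()
generate x3 = rnormal()
generate y = 1 + 0.5*x1 - 0.3*x2 + 0.2*x3 + country/10 + rnormal()

* Restricted model
regress y x1 x2 x3
scalar rss1 = e(rss)
scalar p1 = e(df_m) + 1
* Unrestricted model
regress y x1 x2 x3 i.country i.year
scalar rss2 = e(rss)
scalar p2 = e(df_m) + 1
scalar F_manual = ((rss1 - rss2)/(p2 - p1)) / (rss2/(e(N) - p2))
scalar p_manual = Ftail(p2 - p1, e(N) - p2, F_manual)

* Stata's joint Wald test on the unrestricted model
testparm i.country i.year
scalar F_testparm = r(F)
scalar p_testparm = r(p)

display F_manual _col(20) F_testparm
```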
