  • Choosing the right regression model for Panel data

    Good evening,

    For my thesis I would like to examine whether there is a relationship between corporate environmental performance and corporate financial performance for firms in European countries.
    Therefore, I have gathered panel data on 516 firms (id) over a period of 6 years across European countries (microeconomic panel data).

    - I will use the environmental pillar score as a proxy for Environmental performance
    - Furthermore, by looking at other studies, they made use of different control variables (Size of the firm, CurrentRatio, Cashflow, R&D expenses, Market share, Capital…). In total 8 control variables are used.
    - Furthermore, I want to measure whether there are differences within Europe by dividing the countries into 4 regions (North, East, South and West). I added this to my data as a dummy variable, i.Region
    - The dependent variable is financial performance, measured as two different variables (TobinsQ and ROA)

    Next I checked all variables for outliers; to limit their influence I winsorized the variables (at 1% and 99%)
    - I entered the data in long form in Stata
    - I declared the panel structure with -xtset id Year-
    - I checked -xtsum- (more between variation than within; however, I want to examine whether companies that increase their environmental performance can increase their financial performance, so the within variation is more important for me)
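
The preparation steps above could look roughly like the following. This is only a sketch on Stata's bundled nlswork example data, not the thesis data: the names idcode, year, ln_wage and tenure belong to that dataset, and winsorizing by hand is just one way to cap values at the 1st and 99th percentiles.

```stata
* Example data shipped with Stata (a stand-in for the firm panel)
webuse nlswork, clear

* Winsorize ln_wage at the 1st and 99th percentiles by hand
summarize ln_wage, detail
local p1  = r(p1)
local p99 = r(p99)
generate ln_wage_w = ln_wage
replace ln_wage_w = `p1'  if ln_wage < `p1'
replace ln_wage_w = `p99' if ln_wage > `p99' & !missing(ln_wage)

* Declare the panel structure, then compare between and within variation
xtset idcode year
xtsum ln_wage_w tenure
```

In the -xtsum- output, the "between" and "within" rows give the standard deviations across units and over time within units, which is the comparison referred to above.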

    - I checked for multicollinearity (none found) and also examined the VIFs; both indicated no multicollinearity
    - The Breusch-Pagan Lagrange Multiplier test rejected its null hypothesis, indicating panel effects; however, does this mean pooled OLS is not suitable anymore?
    - The modified Wald test indicated that the residuals are heteroskedastic
    - Next, I used the Pesaran test to check for cross-sectional dependence (contemporaneous correlation). I had to reject the null hypothesis, indicating that there is cross-sectional dependence
    - With the Wooldridge test I checked for serial correlation (first-order autocorrelation). I had to reject the null hypothesis, which indicates the presence of serial correlation
    - Lastly, the Hausman test indicated choosing the FE model. However, since I would like to examine the effect of European regions, FE doesn't sound suitable for this (the regions would be omitted in a FE regression). Note that the regions don't change within firms
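
For readers who want to reproduce this sequence of tests, a sketch on Stata's bundled Grunfeld data might look as follows. The variable names (company, year, invest, mvalue, kstock) come from that example dataset, not from the thesis data, and -xttest3-, -xtcsd- and -xtserial- are community-contributed commands that are assumed to be installed already (see -search- for each).

```stata
* Example panel shipped with Stata (10 firms, 20 years)
webuse grunfeld, clear
xtset company year

* Breusch-Pagan LM test: H0 is Var(u_i) = 0, i.e. pooled OLS is adequate
xtreg invest mvalue kstock, re
xttest0
estimates store re

* Modified Wald test for groupwise heteroskedasticity (after -xtreg, fe-)
xtreg invest mvalue kstock, fe
estimates store fe
xttest3

* Pesaran test for cross-sectional dependence (after -xtreg-)
quietly xtreg invest mvalue kstock, fe
xtcsd, pesaran abs

* Wooldridge test for first-order serial correlation
xtserial invest mvalue kstock

* Hausman test: H0 is that RE is consistent (default standard errors only)
hausman fe re
```

Each test addresses a different assumption, so rejecting one null does not by itself say which estimator to use; it mostly informs the choice of standard errors.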


    Concluding,
    - There is no multicollinearity present
    - The Breusch-Pagan LM test indicated panel effects
    - There is heteroscedasticity present in the residuals
    - Pesaran test indicated cross-sectional dependence
    - Wooldridge test indicated serial correlation present
    >> Therefore OLS, FE, RE don't seem suitable for me since, this would give biased estimations (could someone confirm this?)

    The problem for me at the moment is deciding how to go on from here
    I've read many STATA discussions but don't seem to come to a clear answer of what I should do.

    Since N > T (516 > 6), neither XTGLS nor PCSE seems suitable for me (could someone confirm this?)
    Another study, which is strongly related, suggested adjusting standard errors for clustering by both firm (id) and year (this is often referred to as Petersen's approach). Could this method be a possible solution? I would therefore use either reghdfe or ivreg2 (could someone confirm this?)

    reghdfe Y (dependent variable) X1 (independent variable) X2…X19 (control variables) i.Region, noabsorb cluster(id Year)
    ivreg2 Y (dependent variable) X1 (independent variable) X2…X19 (control variables) i.Region, cluster(id Year)

    I hope someone can help me,
    Kind regards!



  • #2
    Veeckman:

    A couple of comments. First, the Breusch-Pagan LM test suggesting that ui != 0 indicates that a pooled OLS estimation is not suitable for your study. Level differences between panels can definitely throw off your coefficient estimates. Second, if you want to check for regional effects in your main regression, you will indeed need to find a method outside of -xtreg, fe- to do so, as region, being time-invariant, will be dropped from the fixed effects estimation. Random effects specifications, however, will estimate coefficients for time-invariant regressors. I've seen -re- used for this purpose even if a Hausman test suggests that -fe- is the better specification (it's not ideal, but could be a route taken to avoid overcomplicating your regression).
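
The point about time-invariant regressors can be seen on Stata's bundled nlswork data, where race never changes within a person. This is an illustrative sketch only; ln_wage, tenure, age, race and idcode are that dataset's variables, standing in for the thesis variables.

```stata
webuse nlswork, clear
xtset idcode year

* Fixed effects: race is constant within each person, so its dummies are omitted
xtreg ln_wage tenure age i.race, fe

* Random effects: coefficients for the race dummies are estimated
xtreg ln_wage tenure age i.race, re
```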

    If I understand correctly, you're concerned about cross-sectional dependence biasing -fe- or -re- estimates. I think this article (De Hoyos and Sarafidis, 2006) is a good discussion of the issue. If the factors leading to cross-sectional dependence are correlated with your regressors, this can indeed be a problem. However, if the factors are uncorrelated with your regressors, fe/re are very usable as long as they are calculated with Driscoll-Kraay standard errors (see the help page for -xtscc-). -xtscc- can also help correct for your heteroskedasticity and serial correlation problems. The only caution here would be that your T is pretty small, which might interfere with the calculation of the standard errors. Perhaps other forum users have suggestions on testing whether the factors behind the cross-sectional dependence are correlated with your regressors. You may be able to make a priori/theory-based arguments one way or the other (market share strikes me as being possibly correlated).
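
A minimal sketch of the -xtscc- route, again on the bundled Grunfeld data rather than the thesis data. -xtscc- is community-contributed and assumed to be installed (for example via -ssc install xtscc-); the lag length shown is purely illustrative.

```stata
webuse grunfeld, clear
xtset company year

* Fixed-effects (within) estimates with Driscoll-Kraay standard errors;
* lag(1) is an arbitrary illustrative choice, not a recommendation
xtscc invest mvalue kstock, fe lag(1)
```

The coefficients match those from -xtreg, fe-; only the standard errors change.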

    I think your -reghdfe- estimation has promise. I'm curious whether the study you looked at suggested that the two-way cluster would help solve the cross-sectional dependence issue?




    • #3
      Dear Sano,

      Thank you for your response, this has already given me more clarity.

      From your post I can conclude that:
      - Pooled OLS is not suitable, since the Breusch-Pagan LM test rejected H0 (pooled OLS being adequate); therefore look at panel data models.
      - Since I want to check whether there are differences between European regions, FE is not suitable because Region (dummy var) is time-invariant, so FE would not take this variable into account.
      - The Hausman test is an indication of which specification (RE/FE) to apply; however, it is only an indication, and you should use logical sense here as well (meaning that FE would not apply in this case)

      Further,
      Indeed, I'm concerned about biased estimates since there is cross-sectional dependence and serial correlation present. However, a presentation on panel data analysis (from Princeton University) mentioned that cross-sectional dependence and serial correlation are nothing to worry about when working with micro panels (very few years). But what is considered very few years? (Would 6 years be considered very few?) I will surely read the article you posted.

      What should I do to check whether the cross-sectional dependence is correlated with the regressors? (What test should I apply here?)

      Okay thank you, from other posts, reghdfe is mentioned.

      The related study didn't say much; however, since its data is strongly related to mine, it mentioned
      - Presence of heteroskedasticity (1)
      - Presence of serial correlation (2)
      - Presence of cross-sectional dependence (3)
      To overcome these 3 issues, the study adjusted the standard errors for clustering by both firm and year
      Moreover, to overcome collinearity problems, time fixed effects were left out of the model



      • #4
        If you're interested in estimating the coefficients of time-invariant predictors but -fe- is the right specification for your data, you may want to consider the community-contributed programme -xthybrid-.
        That said, if you wisely invoke clustered standard errors to take heteroskedasticity and/or autocorrelation into account, you cannot use -hausman- anymore (as it does not support non-default standard errors), but should switch to the community-contributed command -xtoverid- instead (which allows non-default standard errors but, in turn, being a bit old-fashioned, does not support -fvvarlist- notation; see -xi:- for a usual workaround).
        With a short T dimension (6 years), across-panel correlation is probably negligible.
        Kind regards,
        Carlo
        (Stata 19.0)
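
The -xtoverid- step described in #4 could be sketched as below, using Stata's bundled nlswork data in place of the thesis data. -xtoverid- is community-contributed and assumed to be installed (for example via -ssc install xtoverid-); it may also need -ivreg2- to be present, and it can be sensitive to the Stata version in use.

```stata
webuse nlswork, clear
xtset idcode year

* Random effects with standard errors clustered on the panel identifier
xtreg ln_wage tenure age, re vce(cluster idcode)

* Cluster-robust test of the extra orthogonality conditions imposed by RE;
* a small p-value argues against RE and in favour of FE
xtoverid
```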



        • #5
          Dear Carlo

          Thank you for your response.
          So, contrary to what Jacob Sano indicated about the Hausman test, you suggest sticking with the FE specification. However, since I want to examine the impact of different regions across Europe, you suggest applying -xthybrid- (and not plain FE) so that the European Region dummy variable is not omitted?

          I'm sorry but I don't quite understand what you mean by applying the -xtoverid-, what should I do when using this instead of Hausman test?

          Okay, so you agree with "the presentation" indicating that, since I have a short T dimension, across-panel correlation is negligible.

          Kind regards!



          • #6
            Veeckman:
            whereas it is true that the -hausman- outcome is not written in stone, the community-contributed programme -xthybrid- is one of the usual tricks to accommodate both the -fe- specification and coefficient estimates for time-invariant predictors.
            The advice to switch to the community-contributed programme -xtoverid- stems from the evidence that -hausman- does not support non-default standard errors (nor is it correct to add non-default standard errors after the -hausman- outcome).
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Dear Carlo

              Okay, I understand that with -xthybrid- you can use both time-varying and time-invariant variables in the model when FE is the recommended approach.
              However, when I execute the following (note that I have not yet told Stata that I have panel data, i.e. I did not use -xtset id Year-):
              >> xthybrid ROA (dep var) CEP (indep var) Size Inno Leverage Growth Mshare Capital CurrentRatio Cashflow i.Region, clusterid(id)

              Stata tells me factor-variable and time-series operators are not allowed, how do I overcome this issue?

              Kind regards



              • #8
                Veeckman:
                try:
                Code:
                * ROA is the dependent variable, CEP the independent variable of interest
                xi: xthybrid ROA CEP Size Inno Leverage Growth Mshare Capital CurrentRatio Cashflow i.Region, clusterid(id)
                Kind regards,
                Carlo
                (Stata 19.0)
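
An alternative to the -xi:- prefix is to create the indicator variables by hand and leave one out as the base category. The sketch below uses Stata's bundled nlswork data, so race, ln_wage, tenure, age and idcode are stand-ins for Region, ROA and the other thesis variables; -xthybrid- is community-contributed and assumed to be installed (for example via -ssc install xthybrid-).

```stata
webuse nlswork, clear

* One indicator per category of the time-invariant variable: race_1 race_2 race_3
tabulate race, generate(race_)

* race_1 is left out as the base category; no factor-variable notation is needed
xthybrid ln_wage tenure age race_2 race_3, clusterid(idcode) full
```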



                • #9
                  Dear Carlo

                  I tried to use the -xtoverid- test to check whether I should use FE or RE, taking into account the time-invariant variable Region (4 values, but I leave out one dummy to avoid the dummy variable trap).
                  xtreg ROA CEP Size Inno Leverage Growth Share Capital CurrentRatio Cashflow NorthEU SouthEU WestEU, re

                  [Attached image: Screenshot 2020-05-05 at 15.29.24.png]


                  Next I applied -xtoverid-

                  [Attached image: Screenshot 2020-05-05 at 15.29.31.png]


                  Can I conclude that after examining this test, I should stick with the FE?
                  Last edited by Veeckman Art; 05 May 2020, 07:30.



                  • #10
                    Veeckman:
                    yes, your interpretation is correct.
                    But why did you omit the clustered standard error from your code?
                    Kind regards,
                    Carlo
                    (Stata 19.0)



                    • #11
                      Dear Carlo

                      After adjusting the code to what you indicated.
                      I examined the effect for both dependent variables (ROA and TobinsQ)

                      - xthybrid ROA_w CEP Size Inno_w Leverage_w Growth_w Mshare_w Capital_w CurrentRatio_w Cashflow_w y14 y15 y16 y17 y18 N_EU S_EU W_EU, clusterid(id) full - (Model 1)
                      - xthybrid TobinsQ_w CEP Size Inno_w Leverage_w Growth_w Mshare_w Capital_w CurrentRatio_w Cashflow_w y14 y15 y16 y17 y18 N_EU S_EU W_EU, clusterid(id) full - (Model 2)

                      Note that I removed 1 dummy from Years (using only 5 year dummies) and 1 dummy from Region (using only 3 region dummies) to avoid the dummy variable trap

                      Model where the dependent variable is ROA
                      [Attached image: Screenshot 2020-05-05 at 15.48.29.png]


                      Model where the dependent variable is TobinsQ

                      [Attached image: Screenshot 2020-05-05 at 15.48.49.png]


                      According to the output above, could you confirm that I used a hybrid model, in which both time-varying (CEP, Size, Inno, Leverage, Growth, Mshare, Capital, CurrentRatio and Years) and time-invariant (Region) variables have been used?

                      Did I also adjust for heteroskedasticity here?

                      Kind regards



                      • #12
                        Veeckman:
                        yes: you used -xthybrid- and obtained coefficients for time-invariant predictors, too (regions);
                        yes: clustered SEs accommodate heteroskedasticity and/or autocorrelation of the epsilon residual distribution.
                        Kind regards,
                        Carlo
                        (Stata 19.0)



                        • #13
                          Dear Carlo,

                          Yes you are right in #10, I indeed forgot to add the cluster(id)
                          Thus, it should have been;
                          - xtreg ROA_w CEP Size Inno_w Leverage_w Growth_w Mshare_w Capital_w CurrentRatio_w Cashflow_w NorthEU SouthEU WestEU, re cluster(id) -
                          - xtoverid -
                          - xtreg TobinsQ_w CEP Size Inno_w Leverage_w Growth_w Mshare_w Capital_w CurrentRatio_w Cashflow_w NorthEU SouthEU WestEU, re cluster(id) -
                          - xtoverid -

                          For both model 1 and model 2, -xtoverid- gave a p-value of 0.000, so H0 is rejected and FE is preferred over RE



                          • #14
                            Dear Carlo,

                            With the output from #11, do you believe I am now able to conclude whether increasing environmental performance has an impact on financial performance, measured by ROA and TobinsQ? (With the data I collected, there seems to be no relationship, thus increasing environmental performance doesn't have an impact on financial performance.)

                            In other words, should I apply some more steps now or can I use these models to answer my hypothesis and discuss my results?

                            1) Hypothesis 1 stating that an increase in environmental performance should increase financial performance
                            2) Hypothesis 2 stating that there is a difference in the valuation of environmental performance across European regions

                            Thank you for all the help you've given me.
                            Kind regards



                            • #15
                              Veeckman,
                              To chime in, -xthybrid- is the more robust workaround for including time-invariant variables if your hausman/xtoverid result suggests the -fe- specification. Especially if you're using it for thesis analysis, I'd make sure to go into the help page to find the theory/scholarship behind the command.
                              With regard to your second hypothesis, unless I'm overlooking something, it seems that you're looking directly at the effects of the regions on firm performance (ROA, TobinsQ), instead of at differences in the CEP(?) coefficient. If you have reason to believe that the effect works differently depending on the region, you may consider running region-separated regressions or interacting the region dummies with your environment variable.
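
The interaction idea can be sketched with factor-variable notation. The example below again uses Stata's bundled nlswork data as a stand-in: tenure plays the role of the environment variable and race the role of Region, so none of these names come from the thesis data.

```stata
webuse nlswork, clear
xtset idcode year

* The race main effect is absorbed by the fixed effects,
* but the tenure-by-race interaction is still identified
xtreg ln_wage c.tenure##i.race age, fe vce(cluster idcode)

* Joint test: does the tenure slope differ across race groups?
testparm i.race#c.tenure
```

Because the interaction varies within units whenever the continuous variable does, this approach works inside a plain -fe- model even though the group variable itself is time-invariant.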
