  • Comparing regression models with fixed effects and multi clustering

    Hi,

    I have unbalanced panel data for 2235 companies from 32 countries over 14 years, on which I run the following six regressions (CVs: 12 additional control variables):

    1-) reg Y X CVs ,r
    2-) reg Y X CVs i.country i.year,r
    3-) reghdfe Y X CVs,absorb(country year) vce(cluster country)
    4-) reghdfe Y X CVs,absorb(country year) vce(cluster year)
    5-) reghdfe Y X CVs,absorb(country year) vce(cluster country year)
    6-) reghdfe Y X CVs,noabsorb vce(cluster country year)

    Their outputs are as follows:

    Y                                    1         2         3         4         5         6
    X                                    1.218***  0.821***  0.821***  0.821**   0.821*    1.218**
                                         (7.43)    (7.07)    (2.73)    (2.83)    (2.14)    (2.31)
    Additional control variables         YES       YES       YES       YES       YES       YES
    Time fixed effects                   NO        YES       YES       YES       YES       NO
    Country fixed effects                NO        YES       YES       YES       YES       NO
    Clustered standard errors (country)  NO        NO        YES       NO        YES       YES
    Clustered standard errors (year)     NO        NO        NO        YES       YES       YES
    N. of Obs.                           22296     22296     22296     22296     22296     22296
    F-stat                               24.75     10.60     198.69    759.65    -         64.01
    R2                                   0.1032    0.1459    0.1459    0.1459    0.1459    0.1032

    Based on the outputs, could we conclude that

    1-) Employing time and country fixed effects, and
    2-) Clustering for country or year or both

    are necessary?

    Last but not least, does the F-stat help to pick the best model among the six above? Why can't I get an F-stat for the fifth model?

    Best,

    Lutfi
    Last edited by Lutfi Ozturker; 16 Jul 2022, 18:39.

  • #2
    I do not think that you can answer your question 2-) by looking at your output post factum. The choice of the level at which you cluster needs to be made before you fit the regression. Post factum you just discover that when you cluster you get higher standard errors, as expected, because you have fewer effective observations.

    You do not say what Y and X are, or what the previous literature says about the relationship between the two. Based on the information you are providing, you should include the fixed effects because they make a big difference.

    • #3
      Thank you, Joro.

      Y is the observed riskiness of the companies. X is a measure of similarity to all other companies from the same country. Please remember there are 2235 companies from 32 countries over 14 years.

      The literature is ambiguous about the relationship between Y and X, yet the consistently significant, positive relationship between the two in each of the six models above makes sense.

      Would you then update your insightful comments, please?

      • #4
        By the way, the coefficients and R2 are the same, but the t-stats and F-stats differ, for the following two models, which I expected to produce identical results:

        reg Y X CVs i.country i.year,r ............................model-2 above
        reghdfe Y X CVs,absorb(country year) .............alternative code for model-2


        This made me worried about comparing my six models above, of which the first two use "reg" and the following four "reghdfe". If the outputs of "reg" and "reghdfe" for the same model are not identical, then I suppose I cannot compare my six models either. Should I then rewrite the code for models 3, 4, 5 and 6 without "reghdfe"? That is easier said than done, though, since I have not managed to do it.

        • #5
          Model 2 specifies robust standard errors. The "alternative code for model-2" does not. That is why the t-stats and F-stats are different. (Had you looked at them, you would have observed the standard errors are also different, as are the p-values and confidence intervals.)
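
          A minimal sketch of the like-for-like comparison, assuming the same placeholder names used in the thread (Y, X, CVs, country, year). Requesting robust standard errors in -reghdfe- as well should remove the discrepancy described in #4; this has not been run on the poster's data.

          ```stata
          * Model 2 via regress, heteroskedasticity-robust standard errors
          regress Y X CVs i.country i.year, vce(robust)
          * Same fixed effects absorbed by reghdfe, now also with robust standard errors
          reghdfe Y X CVs, absorb(country year) vce(robust)
          ```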

          • #6
            Clyde explained why you are getting the different variances reported in #4: the -,r- at the end of your first command calls for robust variance, whereas your -reghdfe- variances assume homoskedasticity.

            I think you should include country and year fixed effects. From the way you described your Y and X, I see no reason why they, and their relationship, should not depend on the country and year.

            • #7
              Lutfi:
              as an aside to the previous excellent advice, I would not recommend
              1-) reg Y X CVs ,r
              2-) reg Y X CVs i.country i.year,r
              with panel data.
              With -robust- standard errors you are taking care of heteroskedasticity only.
              Instead, you should use -vce(cluster panelid)-, since in a panel your observations are by definition not independent.
              This issue crops up from time to time on this forum because, unlike under -regress-, the -robust- and -vce(cluster panelid)- options do the very same job under -xtreg- (both invoke cluster-robust standard errors).
              Kind regards,
              Carlo
              (Stata 19.0)
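
              The point above can be sketched as follows, assuming the panel identifier is company (as in the -xtset company year- declaration given later in the thread) and using the thread's placeholder names Y, X and CVs. This is an illustration only, not output from the poster's data.

              ```stata
              * Pooled OLS with country and year dummies, SEs clustered on the panel identifier
              regress Y X CVs i.country i.year, vce(cluster company)
              * Under -xtreg, fe- these two calls should report the same standard errors
              xtset company year
              xtreg Y X CVs i.year, fe vce(robust)
              xtreg Y X CVs i.year, fe vce(cluster company)
              ```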

              • #8
                Thank you all.

                Based on the responses and the definition of Y and X, which of the six models (or an alternate one) would you suggest?

                Please also give the code for your model suggestion for the sake of clarity. For instance, I guess I cannot employ the following model code:
                xtreg Y X CVs i.country i.year, fe vce(cluster company)
                because the "country" is not a time-varying regressor in the model. I'm not sure though if this is what Carlo advised. Nevertheless, with or without one or more fixed effects, is multi-clustering necessary, too? This second question goes to Joro specifically who advised fixed effects for both country and year. Yet, everybody is more than welcome to suggest a model code for sure.

                PS-1: This is the setting of the panel:

                xtset company year

                PS-2: I have unbalanced panel data for 2235 companies from 32 countries over 14 years.
                Last edited by Lutfi Ozturker; 18 Jul 2022, 03:36.

                • #9
                  Lutfi:
                  as far as the following code is concerned:
                  Code:
                  xtreg Y X CVs i.country i.year, fe vce(cluster company)
                  while the -fe- estimator will wipe out time-invariant variables (as in all likelihood country is), my previous comment rested on the fact that -regress ..., r- does not reflect the panel structure of the data.
                  Kind regards,
                  Carlo
                  (Stata 19.0)

                  • #10
                    Thank you Carlo.

                    Since

                    Code:
                    xtreg Y X CVs i.country i.year, fe vce(cluster company)

                    produces a "country omitted because of collinearity" message, which model/code do you recommend? What about multi-clustering? Do you think it is needed for my set-up, or is clustering at the company level alone satisfactory?

                    PS: Y is the observed riskiness of the companies. X is a measure of similarity to all other companies from the same country. There are 2235 companies from 32 countries over 14 years.

                    • #11
                      Lutfi:
                      the code produces exactly what is expected.
                      Under the -fe- specification, -i.country- is basically redundant (as the estimator will give you back no coefficient at all).
                      I would go with:
                      Code:
                      xtreg Y X CVs i.year, fe vce(cluster company)
                      I do not see multi-clustering as that useful here.
                      Kind regards,
                      Carlo
                      (Stata 19.0)

                      • #12
                        Thank you Carlo.

                        Taking into account that X is a measure of similarity to all other companies from the same country, should we sacrifice bank fixed effects (in your code above) and prefer country fixed effects instead as follows:

                        Code:
                        reg Y X CVs i.country i.year, vce(cluster company)

                        I'm not trying to favour "reg" over "xtreg" but just trying to implement country and time fixed effects, should they be preferable to company and time fixed effects in this setup.

                        Below, the output for your model (numbered 7) appears to the left of my alternative above (numbered 8), alongside the six previously mentioned models:

                        Y                                    1         2         3         4         5         6         7         8
                        X                                    1.218***  0.821***  0.821***  0.821**   0.821*    1.218**   2.198**   0.821***
                                                             (7.43)    (7.07)    (2.73)    (2.83)    (2.14)    (2.31)    (2.42)    (3.63)
                        Additional control variables         YES       YES       YES       YES       YES       YES       YES       YES
                        Time fixed effects                   NO        YES       YES       YES       YES       NO        YES       YES
                        Country fixed effects                NO        YES       YES       YES       YES       NO        NO        YES
                        Company fixed effects                NO        NO        NO        NO        NO        NO        YES       NO
                        Clustered standard errors (country)  NO        NO        YES       NO        YES       YES       NO        NO
                        Clustered standard errors (year)     NO        NO        NO        YES       YES       YES       NO        NO
                        Clustered standard errors (company)  NO        NO        NO        NO        NO        NO        YES       YES
                        N. of Obs.                           22296     22296     22296     22296     22296     22296     22296     22296
                        F-stat                               24.75     10.60     198.69    759.65    -         64.01     2.85      4.15
                        R2                                   0.1032    0.1459    0.1459    0.1459    0.1459    0.1032    0.1063    0.1459
                        Last edited by Lutfi Ozturker; 18 Jul 2022, 10:27.

                        • #13
                          Lutfi:
                          under the -fe- specification, each and every time-invariant variable will be wiped out: this holds for -i.country-, -i.bank-, -i.whatever- if there is no within-panel variation.
                          Kind regards,
                          Carlo
                          (Stata 19.0)
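
                          One way to check the "no within-panel variation" condition is sketched below. It assumes country is stored as a numeric variable (it is used as -i.country- in the thread) and that the panel is declared with company and year as above.

                          ```stata
                          * Within-panel variation of country: a within SD of 0 means it is time-invariant
                          xtset company year
                          xtsum country
                          * Stops with an error if any company changes country over the sample
                          bysort company (year): assert country == country[1]
                          ```

                          If the -assert- passes, the company fixed effects already absorb everything the country dummies would capture, which is why -i.country- is dropped under -fe-.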

                          • #14
                            I assume model 7 (your code)

                            Code:
                            xtreg Y X CVs i.year, fe vce(cluster company)

                            employs company and time fixed effects whereas model 8

                            Code:
                            reg Y X CVs i.country i.year, vce(cluster company)

                            employs country and time fixed effects. Is this correct? Therefore, we have to choose at most two fixed effects at a time, since all three (company, country, time) cannot be employed simultaneously. Is this also correct?



                            Taking into account that X is a measure of similarity to all other companies from the same country, should we sacrifice company fixed effects (in model 7) and prefer country fixed effects instead (model 8)?


                            PS-1:
                            Y is the observed riskiness of the companies.


                            PS-2: This is the setting of the panel:

                            xtset company year


                            PS-3: I have unbalanced panel data for 2235 companies from 32 countries over 14 years.
                            Last edited by Lutfi Ozturker; 18 Jul 2022, 11:32.

                            • #15
                              Lutfi:
                              I would go with -model 7-, as your -panelid- is company.
                              In addition, with -model 7- you account for both company and time fixed effects.
                              As an aside, time is actually time-varying; therefore, except for the reference year (and possibly another year omitted due to collinearity), you will get coefficients for the remaining years.
                              Finally, with more than 2000 panels, -vce(cluster panelid)- is mandatory.
                              Kind regards,
                              Carlo
                              (Stata 19.0)
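
                              Put together, -model 7- could look like the sketch below. Y, X and CVs are the thread's placeholders, and the final -testparm- line is an added suggestion (not part of the advice above) for checking whether the year effects are jointly significant.

                              ```stata
                              * Declare the panel structure used throughout the thread
                              xtset company year
                              * Company fixed effects via -fe-, year effects via i.year, SEs clustered on company
                              xtreg Y X CVs i.year, fe vce(cluster company)
                              * Joint test of the year dummies
                              testparm i.year
                              ```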
