
  • Panel Regression: Why do Stata results change when the order of interaction terms changes?

    Hi All,

    I'm running a three-way panel regression. The three panel dimensions are country, year, and technology.

    My two independent variables are both dummies. I'm trying to investigate how the interaction of these two affects the dependent variable.

    I have run the following code:

    reg Y i.X1##i.X2 i.techID#i.year i.countryID#i.year,r cl(countryID)

    However, the estimation results are completely different when I include the country*year FE first. That is, when I run the code,

    reg Y i.X1##i.X2 i.countryID#i.year i.techID#i.year ,r cl(countryID)

    I would appreciate it if someone could help with this issue.

    Thanks.

    DN Jay

  • #2
    Showing the results might help. Use code tags (see pt. 12 in the FAQ).

    Are you sure they are completely different? Or is it maybe just getting parameterized differently because different categories are being dropped?
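
    One quick way to check (a minimal sketch, assuming the variable names from #1) is to compare the fitted values from the two orderings; if only the parameterization differs, the fits are identical even though individual coefficients are not:

    Code:
    reg Y i.X1##i.X2 i.techID#i.year i.countryID#i.year, r cl(countryID)
    predict double yhat1, xb                // fitted values, ordering 1

    reg Y i.X1##i.X2 i.countryID#i.year i.techID#i.year, r cl(countryID)
    predict double yhat2, xb                // fitted values, ordering 2

    assert reldif(yhat1, yhat2) < 1e-8      // same fit, just a different parameterization

    If that assertion holds, the two commands estimate the same model and only the dropped base categories differ.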
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      Hi Richard,

      Thank you for your quick response. Here are the outputs for the two commands; the coefficients change completely.

      Appreciate your feedback.



      1. The results for the code reg Y i.X1##i.X2 i.techID#i.year i.countryID#i.year, r cl(countryID) are as follows (I have excluded the estimates for the interactions because the output is very lengthy):

      ======================================================
      . reg Y i.X1##i.X2 i.techID#i.year i.countryID#i.year,r cl(countryID)
      note: 66.countryID#1990.year omitted because of collinearity
      note: 79.countryID#2000.year omitted because of collinearity
      note: 82.countryID#2000.year omitted because of collinearity
      note: 83.countryID#1985.year omitted because of collinearity
      note: 83.countryID#1990.year omitted because of collinearity
      note: 83.countryID#1995.year omitted because of collinearity
      note: 83.countryID#2000.year omitted because of collinearity

      Linear regression Number of obs = 4,909
      F(81, 82) = .
      Prob > F = .
      R-squared = 0.9450
      Root MSE = .59202

      (Std. Err. adjusted for 83 clusters in countryID)
      --------------------------------------------------------------------------------
      | Robust
      Y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
      ---------------+----------------------------------------------------------------
      1.X1 | .6550157 .0418388 15.66 0.000 .571785 .7382463
      1.X2 | .4550005 .0104275 43.63 0.000 .4342568 .4757442
      |
      X1#X2 |
      1 1 | .6862578 .0462608 14.83 0.000 .5942303 .7782853
      |
      techID#year |
      1 1985 | .0068082 .0384721 0.18 0.860 -.0697251 .0833415
      1 1990 | .0939994 .0323552 2.91 0.005 .0296345 .1583643
      1 1995 | -.3783144 .0567576 -6.67 0.000 -.4912234 -.2654055
      1 2000 | -.478142 .0503815 -9.49 0.000 -.5783668 -.3779172
      2 1980 | 5.630762 .0631476 89.17 0.000 5.505141 5.756383
      2 1985 | 5.661342 .066616 84.98 0.000 5.528822 5.793863
      2 1990 | 5.793312 .0617169 93.87 0.000 5.670537 5.916086
      2 1995 | 5.373535 .0848712 63.31 0.000 5.2047 5.542371
      2 2000 | 5.328558 .0824974 64.59 0.000 5.164444 5.492672
      3 1980 | .8816503 .0594966 14.82 0.000 .7632925 1.000008
      3 1985 | .8750643 .0770489 11.36 0.000 .7217894 1.028339
      3 1990 | .9349516 .0718517 13.01 0.000 .7920157 1.077888
      --more--


      2. The results for the code reg Y i.X1##i.X2 i.countryID#i.year i.techID#i.year ,r cl(countryID) are as follows:

      . reg Y i.X1##i.X2 i.countryID#i.year i.techID#i.year ,r cl(countryID)
      note: 66.countryID#1990.year omitted because of collinearity
      note: 79.countryID#2000.year omitted because of collinearity
      note: 82.countryID#2000.year omitted because of collinearity
      note: 24.techID#1985.year omitted because of collinearity
      note: 24.techID#1990.year omitted because of collinearity
      note: 24.techID#1995.year omitted because of collinearity
      note: 24.techID#2000.year omitted because of collinearity

      Linear regression Number of obs = 4,909
      F(81, 82) = .
      Prob > F = .
      R-squared = 0.9450
      Root MSE = .59202

      (Std. Err. adjusted for 83 clusters in countryID)
      --------------------------------------------------------------------------------
      | Robust
      Y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
      ---------------+----------------------------------------------------------------
      1.X1 | -2.83128 .1431933 -19.77 0.000 -3.116137 -2.546423
      1.X2 | -3.380107 .1325716 -25.50 0.000 -3.643834 -3.11638
      |
      X1#X2 |
      1 1 | 4.172553 .1513936 27.56 0.000 3.871383 4.473723
      |
      countryID#year |
      1 1985 | -4.181966 .1283696 -32.58 0.000 -4.437334 -3.926598
      1 1990 | -3.944997 .1496756 -26.36 0.000 -4.24275 -3.647245
      1 1995 | -4.262322 .1279723 -33.31 0.000 -4.5169 -4.007745
      1 2000 | -4.396018 .1325921 -33.15 0.000 -4.659786 -4.13225
      2 1980 | 3.311612 .1456424 22.74 0.000 3.021883 3.601341
      2 1985 | -.845488 .1650376 -5.12 0.000 -1.1738 -.5171756
      2 1990 | -.4227891 .0351569 -12.03 0.000 -.4927274 -.3528508
      2 1995 | -.0975678 .1073452 -0.91 0.366 -.3111116 .1159761
      2 2000 | -.6180799 .0263032 -23.50 0.000 -.6704053 -.5657544
      3 1980 | 3.242609 .1202133 26.97 0.000 3.003467 3.481752
      3 1985 | -.8652221 .1406615 -6.15 0.000 -1.145043 -.5854016
      3 1990 | -.1923833 .1281443 -1.50 0.137 -.4473032 .0625366
      --more--



      • #4
        DN (full given name, please):
        as Richard suspected, the omissions due to collinearity are different in the two models. That translates into different coefficients.
        A few asides:
        - if you're dealing with panel data, what's the gain in using -regress- instead of -xtreg-?
        - omitting the main conditional effects of interactions is rarely advisable;
        - in my experience, explaining the meaning of a three-level interaction is very difficult. Can't you stop at a two-level interaction?
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Hi Carlo,

          Thanks for your response. Please see my responses to each of your points below.

          1. Omissions due to collinearity are different in the two models. That translates into different coefficients. How can I address this? Should I remove those observations from the dataset and rerun?

          2. If you're dealing with panel data, what's the gain in using -regress- instead of -xtreg-? My impression was that both produce approximately the same results. I'm not really familiar with panel regression, so I referred to this link (https://www.princeton.edu/~otorres/Panel101.pdf), which shows that xtreg, areg, and regress give the same results for panel data. I didn't use xtreg because I'm not sure how to specify it correctly, since this is a three-way panel.

          3. Omitting the main conditional effect of interactions is rarely advised. The country*year fixed effects (FE) are there to control for time-varying differences across countries, and the technology*year FE are there to remove technology diffusion paths across countries. When I include the main conditional effects alongside the interaction FE, the results change further (see the syntax note after this list).

          4. In my experience, explaining the meaning of a three-level interaction is very difficult. Can't you stop at a two-level? I'm not sure what you mean by a three-level interaction, but country*year and technology*year are two separate interactions.
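
          On point 3, a small syntax note (a sketch using the variable names above): in Stata's factor-variable notation, the double operator ## adds the main effects automatically, while the single # includes only the interaction cells, so the two commands below differ exactly by the main conditional effects.

          Code:
          * interaction cells only, no separate techID or year main effects
          reg Y i.X1##i.X2 i.techID#i.year, r cl(countryID)

          * full factorial: i.techID, i.year, and i.techID#i.year
          reg Y i.X1##i.X2 i.techID##i.year, r cl(countryID)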

          Regards,
          Dini Jay (full name)



          • #6
            Dini:
            Q1) The best advice is to follow the specification that the literature in your research field suggests and accept collinearity as unavoidable. Please note that, with lots of categorical variables, collinearity is a frequent downside.
            That said, I think you can simplify your panel data dimensions by considering -countryID- as the panel id, -year- as the time variable, and -techID- as a control variable.
            In panel data jargon, this translates into:
            Code:
            xtset countryID year
            xtreg Y i.X1##i.X2 i.techID##i.year, fe // here I do not consider the random-effects specification, which might be an option, though

            Q2) see Q1);

            Q3) see Q1);

            Q4) you're right and I should change my glasses. However, I would consider only one two-level interaction (with the main conditional effects, though; see Q1)).
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Hi Carlo,

              I tried to run
              xtset countryID year

              but I get an error saying repeated time values within panel.

              As I mentioned before, my dataset is a three-way panel. Each country has five years, and within each country-year there are 10 techIDs, so when I use xtset countryID year I definitely have repeated country-year observations. Below is a sample of my data for just one country (see the quick check after the listing). Altogether there are 80 countries, so countryID ranges from 1 to 80.
              countryID year techID Y X1 X2
              1 1980 1 0.3 1 0
              1 1980 2 0.2 1 1
              1 1980 3 1 1 1
              1 1980 4 1.5 0 0
              1 1980 5 0.8 1 0
              1 1980 6 4.5 0 0
              1 1980 7 3.5 0 0
              1 1980 8 4.9 0 1
              1 1980 9 8.9 1 1
              1 1980 10 2.7 1 1
              1 1985 1 2.3 0 0
              1 1985 2 4.7 0 1
              1 1985 3 6.4 0 0
              1 1985 4 1.8 0 1
              1 1985 5 9.1 0 0
              1 1985 6 8.2 0 0
              1 1985 7 0.5 1 0
              1 1985 8 1.5 1 1
              1 1985 9 2.8 1 1
              1 1985 10 7.3 1 0
              1 1990 1 4.6 0 1
              1 1990 2 0.3 0 0
              1 1990 3 0.2 0 1
              1 1990 4 3.2 1 1
              1 1990 5 1.5 1 0
              1 1990 6 5.6 1 0
              1 1990 7 2.3 1 1
              1 1990 8 3.5 1 1
              1 1990 9 4.9 1 1
              1 1990 10 8.9 1 1
              1 1995 1 2.7 0 0
              1 1995 2 2.3 0 0
              1 1995 3 4.7 0 0
              1 1995 4 6.4 1 1
              1 1995 5 1.8 0 0
              1 1995 6 9.1 1 1
              1 1995 7 8.2 0 0
              1 1995 8 7.8 1 1
              1 1995 9 1.5 0 1
              1 1995 10 2.8 0 1
              1 2000 1 7.3 0 1
              1 2000 2 4.6 0 0
              1 2000 3 8.2 0 0
              1 2000 4 0.5 1 0
              1 2000 5 1.5 1 1
              1 2000 6 2.8 1 0
              1 2000 7 7.3 1 1
              1 2000 8 4.6 1 0
              1 2000 9 5.2 0 1
              1 2000 10 4.6 0 0
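
              To confirm that the "repeated time values within panel" error comes from the technology dimension rather than from genuine duplicate rows, a quick check might be (a sketch, assuming the variable names above):

              Code:
              isid countryID year techID          // passes if each row is a unique country-year-technology cell
              duplicates report countryID year    // shows each country-year appearing 10 times (once per techID), which is what breaks xtset countryID year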
              Regards, Dini



              • #8
                Dini:
                try:
                Code:
                . egen New_Time=group( year techID )
                
                . xtset countryID New_Time
                       panel variable:  countryID (strongly balanced)
                        time variable:  New_Time, 1 to 50
                                delta:  1 unit
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Hi Carlo,

                  thanks for your feedback.

                  I did try the method you suggested. Under fixed effects, my X1 gets omitted.

                  E.g., code: xtreg Y X1##X2 i.techID##i.year, fe cl(countryID)

                  Fixed-effects (within) regression Number of obs = 4,909
                  Group variable: countryID Number of groups = 83

                  R-sq: Obs per group:
                  within = 0.9388 min = 21
                  between = 0.1564 avg = 59.1
                  overall = 0.8768 max = 90

                  F(110,4716) = 657.83
                  corr(u_i, Xb) = -0.1210 Prob > F = 0.0000

                  -----------------------------------------------------------------------------------
                  Y| Coef. Std. Err. t P>|t| [95% Conf. Interval]
                  ------------------+----------------------------------------------------------------
                  1.X1 0 (omitted)
                  1.X2 .0373042 .0398166 0.94 0.349 -.040755 .1153634
                  X1#X2 .0144591 .0616359 0.23 0.815 -.1063759 .1352942

                  techID |
                  2 | 5.628524 .0932407 60.37 0.000 5.445728 5.811319
                  3 | .8816503 .0946493 9.31 0.000 .6960935 1.067207
                  --more--

                  When I use random effects, I get the following results. Code: xtreg Y X1##X2 i.techID##i.year, r cl(countryID)


                  Random-effects GLS regression Number of obs = 4,909
                  Group variable: countryID Number of groups = 83

                  R-sq: Obs per group:
                  within = 0.9367 min = 21
                  between = 0.5289 avg = 59.1
                  overall = 0.9138 max = 90

                  Wald chi2(82) = .
                  corr(u_i, X) = 0 (assumed) Prob > chi2 = .

                  (Std. Err. adjusted for 83 clusters in countryID)
                  -----------------------------------------------------------------------------------
                  Robust
                  Y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                  ------------------+----------------------------------------------------------------
                  X1 .4781735 .1523178 3.14 0.002 .1796361 .7767109
                  X2 .2813291 .1429062 1.97 0.049 .001238 .5614201
                  X1#X2 .3156088 .1890246 1.67 0.095 -.0548726 .6860903

                  techID |
                  2 | 5.658976 .0612835 92.34 0.000 5.538862 5.779089
                  3 | .8816503 .0569232 15.49 0.000 .7700829 .9932176

                  These random-effects results are similar to those from my previous regression when I run the code: regress Y X1##X2 i.techID#i.year, r cl(countryID).
                  Can you kindly explain why that is?

                  My goal is to control for country fixed effects and technology*year fixed effects. I also need to cluster the standard errors by countryID. Different approaches give different results, and I'm confused about the correct method.

                  So far I have tried:

                  1. regress Y X1##X2 countryID i.techID#i.year, r cl(countryID)

                  2. regress Y X1##X2 i.techID#i.year, r cl(countryID)

                  3. egen New_Time=group( year techID )
                  xtset countryID New_Time
                  xtreg Y X1##X2 i.techID##i.year,fe cl(countryID)


                  4. xtreg Y X1##X2 i.techID##i.year, r cl( countryID)

                  Under both 3 and 4, I can't include countryID as an independent variable to control for country fixed effects. So does that mean that by setting xtset countryID New_Time, the model is already controlling for country fixed effects? If that's the case, why are the results of 4 similar to those of 2, when model 2 does not control for country fixed effects?
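
                  One way I thought of checking whether the fixed-effects (within) estimator really absorbs the country effects is to compare it with -areg-, which absorbs countryID explicitly; the point estimates should essentially coincide, while standard errors may differ slightly because of degrees-of-freedom adjustments (a sketch, assuming my variable names above):

                  Code:
                  xtset countryID New_Time                                          // New_Time from approach 3 above
                  xtreg Y i.X1##i.X2 i.techID##i.year, fe vce(cluster countryID)    // country intercepts swept out by the within transformation

                  areg Y i.X1##i.X2 i.techID##i.year, absorb(countryID) vce(cluster countryID)   // country FE absorbed explicitly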

                  I would appreciate it if you could recommend the best approach and explain why the results of 2 and 4 are the same.

                  Thanks,
                  Dini Jay



                  • #10
                    Dini:
                    there's no such thing as a best approach here (or in other instances).
                    My advice is to avoid hunting for the model with the most statistically significant predictors and instead mimic what others in your research field did in the past when presented with the same research topic. Obviously, different approaches give back different results.
                    Your problems boil down to collinearity issues: if one or more of your predictors cannot be disentangled (from the fixed effects, say), they are omitted because of collinearity. The usual fix is to do nothing; others may prefer to change their model specification.
                    Kind regards,
                    Carlo
                    (Stata 19.0)



                    • #11
                      Hi Carlo,

                      Thanks for the feedback. I understand that the results differ because of the collinearity issue. Past studies in my field ignore collinearity and take the results as given, but in my case I'm confused about which result I should consider correct. Just selecting the result that is in my favor is not really justifiable, right? I'm not really sure how other journal authors cope with this issue, as no paper mentions such technical problems.

                      Best,
                      DNJay



                      • #12
                        Dini:
                        I agree with you that:
                        - hunting for the "best" results is by no means justifiable (especially if you want to disseminate your results afterwards);
                        - most regression models reported in technical journals are a source of concern because of the lack of postestimation analysis (which, in turn, affects the robustness of the published findings).
                        If the literature in your research field does not cover the issue you're interested in, try to end up with the regression model that, according to the theoretical building blocks of your discipline, gives a true and fair view of the data-generating process.
                        Kind regards,
                        Carlo
                        (Stata 19.0)

