
  • Categorical variables interaction joint vs conditional effects

    Dear Statalist,

    I have two categorical variables: Region, which has three categories (low, moderate, and high), and Industry, which also has low, moderate, and high. I want to measure the joint effect of these categories on firm production, for instance low(Region)#low(Industry) or low(Region)#moderate(Industry), and so on.

    I would normally go for reg production i.Region##i.Industry (plus controls), which gives me the main effects plus the differences in conditional effects. But none of those coefficients represents a joint effect and its statistical significance. So I intended to go with reg production i.Region#i.Industry (plus controls) to get the joint effects. Would that give biased estimates of the joint effects, and if so, why? And how can I get each joint effect with its statistical significance?

    Thank you in advance.
    Last edited by Islam Ibrahim; 19 Aug 2022, 14:17.

  • #2
    I don't know what you mean by the "joint effect." If you are using an interaction model, you are asserting that the effect of each variable depends on the level of the other, so there is no unique number that can be called the joint effect of the two variables.

    If you mean that you want to know if these variables are jointly significant as explanatory variables, you can get that with
    Code:
    regress production i.Region##i.Industry // AND OTHER COVARIATES
    testparm i.Region##i.Industry
    Another thing that you might have in mind is the proportion of variance explained by Region & Industry over and above that explained by the other covariates. You can get that with:
    Code:
    nestreg: regress production (other_covariates) (i.Region##i.Industry)
    The output will give you the change in R2, among other things.
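    For instance, a minimal sketch with two hypothetical controls, firm_size and firm_age, standing in for your actual covariates:
    Code:
    * firm_size and firm_age are hypothetical placeholders for the real controls
    nestreg: regress production (firm_size firm_age) (i.Region##i.Industry)
    nestreg fits the blocks cumulatively, so the change in R2 reported for the second block is the share of variance attributable to Region, Industry, and their interaction over and above the controls.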



    • #3
      Thank you, Clyde Schechter, for your answer, and sorry for not being clear. What I'm after is to see the effect of being both high in Region and high in Industry on production.

      When I run
      Code:
      regress production i.Region##i.Industry
      I will get:
      Code:
            Source |       SS           df       MS      Number of obs   =       500
      -------------+----------------------------------   F(8, 491)       =      5.33
             Model |  336508.599         8  42063.5748   Prob > F        =    0.0000
          Residual |  3872086.37       491  7886.12296   R-squared       =    0.0800
      -------------+----------------------------------   Adj R-squared   =    0.0650
             Total |  4208594.97       499  8434.05806   Root MSE        =    88.804
      
      ----------------------------------------------------------------------------------
            production | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -----------------+----------------------------------------------------------------
               regions |
                    2  |  -7.389467   20.06134    -0.37   0.713    -46.80612    32.02719
                    3  |   56.53216    17.9673     3.15   0.002     21.22989    91.83444
                       |
              indusrty |
                    2  |   31.03725   17.56912     1.77   0.078    -3.482683    65.55719
                    3  |    11.6123   17.69015     0.66   0.512    -23.14543    46.37004
                       |
      regions#indusrty |
                  2 2  |  -5.542226   24.52757    -0.23   0.821    -53.73417    42.64972
                  2 3  |   15.26226   25.63749     0.60   0.552    -35.11047    65.63498
                  3 2  |  -51.00603   27.76507    -1.84   0.067     -105.559    3.546974
                  3 3  |   32.21831    26.0687     1.24   0.217    -19.00165    83.43828
                       |
                 _cons |   136.4639   14.80064     9.22   0.000     107.3835    165.5443
      ----------------------------------------------------------------------------------

      I'm interested in knowing the effect of being in groups 2#2, 2#3, 3#2, and 3#3 with respect to the reference group 1#1. I would not interpret the current coefficient on 2#2 as the net effect of being moderate in Region and moderate in Industry, but rather as the effect of being moderate in Industry over being moderate in Region. Is that correct? And how can I get the net effect of being 2#2, for example?

      If I run instead:

      Code:
      regress production i.Region#i.Industry
      I get:
      Code:
            Source |       SS           df       MS      Number of obs   =       500
      -------------+----------------------------------   F(8, 491)       =      5.33
             Model |  336508.599         8  42063.5748   Prob > F        =    0.0000
          Residual |  3872086.37       491  7886.12296   R-squared       =    0.0800
      -------------+----------------------------------   Adj R-squared   =    0.0650
             Total |  4208594.97       499  8434.05806   Root MSE        =    88.804
      
      ----------------------------------------------------------------------------------
            production | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -----------------+----------------------------------------------------------------
      regions#indusrty |
                  1 2  |   31.03725   17.56912     1.77   0.078    -3.482683    65.55719
                  1 3  |    11.6123   17.69015     0.66   0.512    -23.14543    46.37004
                  2 1  |  -7.389467   20.06134    -0.37   0.713    -46.80612    32.02719
                  2 2  |   18.10556   18.12701     1.00   0.318    -17.51052    53.72164
                  2 3  |   19.48509    19.4936     1.00   0.318    -18.81606    57.78625
                  3 1  |   56.53216    17.9673     3.15   0.002     21.22989    91.83444
                  3 2  |   36.56339   24.03163     1.52   0.129    -10.65413     83.7809
                  3 3  |   100.3628    21.9529     4.57   0.000     57.22957     143.496
                       |
                 _cons |   136.4639   14.80064     9.22   0.000     107.3835    165.5443
      ----------------------------------------------------------------------------------
      Can I interpret the coefficients on 1#2, 1#3, 2#2, and so on as the difference between these groups and the reference group, using the significance of these coefficients as statistical evidence on their net effect (if any)?

      And lastly, can I present the last regression as a 3-by-3 table instead of a list, in which each group (e.g., 2#3) has its own coefficient with its p-value or significance star?

      Sorry to prolong the thread.
      Last edited by Islam Ibrahim; 19 Aug 2022, 16:16.



      • #4
        Yes, you can do that. Although the two models are, in fact, algebraic transforms of each other (different parameterizations of the same model), and the coefficients of either can be obtained from the other by linear transformations, for the purpose you state the i.Region#i.Industry model is easier to work with.
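        To make that equivalence concrete, here is a minimal sketch (assuming the Region and Industry variable names from the thread, and leaving out any controls): in the ## parameterization, the effect of cell 2#2 relative to the 1#1 reference cell is the sum of the two main effects and the interaction term, which lincom will combine and test for you.
        Code:
        * a sketch only: recover the 2#2 cell effect from the ## parameterization
        regress production i.Region##i.Industry
        lincom 2.Region + 2.Industry + 2.Region#2.Industry
        In the output shown in #3, that works out to -7.389467 + 31.03725 - 5.542226 = 18.10556, exactly the 2 2 coefficient reported by the i.Region#i.Industry model, with the same standard error and test against zero.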



        • #5
          As always, thank you so much. One last thing, if I may: any recommendation for quick code that would get me that 3-by-3 table of the coefficients? Thanks again.



          • #6
            Here's an illustration of the approach you can use, demonstrated with a silly regression on auto.dta:
            Code:
            clear*
            sysuse auto, clear
            // collapse rep78 into three categories: 1 (<= 3 or missing), 2 (== 4), 3 (== 5)
            replace rep78 = max(3, rep78) - 2
            
            // fully interacted model with cell indicators only (no separate main effects)
            regress price i.foreign#i.rep78
            
            // collect one row per cell: its coefficient and standard error
            frame create coef_table byte(foreign rep78) float(coef se)
            forvalues f = 0/1 {
                forvalues r = 1/3 {
                    frame post coef_table (`f') (`r') (_b[`f'.foreign#`r'.rep78]) (_se[`f'.foreign#`r'.rep78])
                }
            }
            
            frame change coef_table
            
            // lay the results out as a foreign-by-rep78 table
            table (foreign) (rep78), statistic(mean coef se) nototals
            Note: The above code requires Stata version 17. It gives a table of the coefficients and their standard errors. If you really want p-values, you can calculate them yourself from each coefficient and its standard error and put those in the table. I wouldn't: there aren't many things less useful than a coefficient shown with a p-value.
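            If you do want them anyway, one way (a sketch along the same lines as the code above, not an endorsement) is to store a two-sided p-value computed from each coefficient, its standard error, and the regression's residual degrees of freedom e(df_r):
            Code:
            clear*
            sysuse auto, clear
            replace rep78 = max(3, rep78) - 2
            
            regress price i.foreign#i.rep78
            
            frame create coef_table byte(foreign rep78) float(coef se p)
            forvalues f = 0/1 {
                forvalues r = 1/3 {
                    local b = _b[`f'.foreign#`r'.rep78]
                    local s = _se[`f'.foreign#`r'.rep78]
                    // the base cell has a zero standard error; leave its p-value missing
                    local p = cond(`s' > 0, 2*ttail(e(df_r), abs(`b'/`s')), .)
                    frame post coef_table (`f') (`r') (`b') (`s') (`p')
                }
            }
            
            frame change coef_table
            
            table (foreign) (rep78), statistic(mean coef se p) nototals
            If adjusted cell means are of more interest than coefficients, margins foreign#rep78 after the regression is an alternative summary of the cells.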
            Last edited by Clyde Schechter; 19 Aug 2022, 16:48.

