  • Interaction ## vs. # yielding different results

    Dear all,

    I should preface this by saying that I am relatively new to Stata but very eager to learn, so please bear with me if my question seems trivial to the most advanced users. I am using reg to run a super simple regression analysis of log prices during and after the existence of a collusive agreement, identified by a regressor cartel=1 (and equal to 0 in the after period). In this model, I am also including interactions between the variable cartel and encoded values of suppliers, stored in the variable manufacturer_code. I immediately ran

    Code:
    * model 1 (##)
    reg ln_price_eur_g cartel##i.manufacturer_code, coeflegend
    Which outputs:
    Code:
    . reg ln_price_eur_g cartel##i.manufacturer_code, coeflegend
    note: 1.cartel#4.manufacturer_code identifies no observations in the sample.
    
          Source |       SS           df       MS      Number of obs   =    15,220
    -------------+----------------------------------   F(24, 15195)    =    523.86
           Model |   3515.5094        24  146.479558   Prob > F        =    0.0000
        Residual |  4248.76445    15,195  .279615956   R-squared       =    0.4528
    -------------+----------------------------------   Adj R-squared   =    0.4519
           Total |  7764.27385    15,219  .510169778   Root MSE        =    .52879
    
    ------------------------------------------------------------------------------------------
              ln_price_eur_g | Coefficient  Legend
    -------------------------+----------------------------------------------------------------
                    1.cartel |   .0366436  _b[1.cartel]
                             |
           manufacturer_code |
                          2  |   .2270697  _b[2.manufacturer_code]
                          3  |  -.9271932  _b[3.manufacturer_code]
                          4  |   3.761208  _b[4.manufacturer_code]
                          5  |   .3122013  _b[5.manufacturer_code]
                          6  |   -.870299  _b[6.manufacturer_code]
                          7  |  -1.058295  _b[7.manufacturer_code]
                          8  |  -.6148155  _b[8.manufacturer_code]
                          9  |  -.9390054  _b[9.manufacturer_code]
                         10  |  -.4494987  _b[10.manufacturer_code]
                         11  |  -.9216386  _b[11.manufacturer_code]
                         12  |   .1657904  _b[12.manufacturer_code]
                         13  |   1.827299  _b[13.manufacturer_code]
                             |
    cartel#manufacturer_code |
                       1  2  |  -.3981583  _b[1.cartel#2.manufacturer_code]
                       1  3  |  -.0290233  _b[1.cartel#3.manufacturer_code]
                       1  4  |          0  _b[1o.cartel#4o.manufacturer_code]
                       1  5  |  -.4705767  _b[1.cartel#5.manufacturer_code]
                       1  6  |  -.1939642  _b[1.cartel#6.manufacturer_code]
                       1  7  |   -.144564  _b[1.cartel#7.manufacturer_code]
                       1  8  |  -.3279723  _b[1.cartel#8.manufacturer_code]
                       1  9  |   .2398269  _b[1.cartel#9.manufacturer_code]
                       1 10  |  -.1203988  _b[1.cartel#10.manufacturer_code]
                       1 11  |   .0148721  _b[1.cartel#11.manufacturer_code]
                       1 12  |  -.0953999  _b[1.cartel#12.manufacturer_code]
                       1 13  |   .3254326  _b[1.cartel#13.manufacturer_code]
                             |
                       _cons |  -5.785321  _b[_cons]
    ------------------------------------------------------------------------------------------
    Out of curiosity, I also went ahead and ran a second specification:
    Code:
    * model 2 (#)
    reg ln_price_eur_g cartel ib1.manufacturer_code cartel#ib1.manufacturer_code, coeflegend
    where I specified the base level of manufacturer_code to be the same as the reference category in model 1. This yields:
    Code:
    . reg ln_price_eur_g cartel ib1.manufacturer_code cartel#ib1.manufacturer_code, coeflegend
    note: 1.cartel#4.manufacturer_code identifies no observations in the sample.
    note: 1.cartel#13.manufacturer_code omitted because of collinearity.
    
          Source |       SS           df       MS      Number of obs   =    15,220
    -------------+----------------------------------   F(24, 15195)    =    523.86
           Model |   3515.5094        24  146.479558   Prob > F        =    0.0000
        Residual |  4248.76445    15,195  .279615956   R-squared       =    0.4528
    -------------+----------------------------------   Adj R-squared   =    0.4519
           Total |  7764.27385    15,219  .510169778   Root MSE        =    .52879
    
    ------------------------------------------------------------------------------------------
              ln_price_eur_g | Coefficient  Legend
    -------------------------+----------------------------------------------------------------
                      cartel |   .3620762  _b[cartel]
                             |
           manufacturer_code |
                          2  |   .2270697  _b[2.manufacturer_code]
                          3  |  -.9271932  _b[3.manufacturer_code]
                          4  |   3.761208  _b[4.manufacturer_code]
                          5  |   .3122013  _b[5.manufacturer_code]
                          6  |   -.870299  _b[6.manufacturer_code]
                          7  |  -1.058295  _b[7.manufacturer_code]
                          8  |  -.6148155  _b[8.manufacturer_code]
                          9  |  -.9390054  _b[9.manufacturer_code]
                         10  |  -.4494987  _b[10.manufacturer_code]
                         11  |  -.9216386  _b[11.manufacturer_code]
                         12  |   .1657904  _b[12.manufacturer_code]
                         13  |   1.827299  _b[13.manufacturer_code]
                             |
    cartel#manufacturer_code |
                       1  1  |  -.3254326  _b[1.cartel#1b.manufacturer_code]
                       1  2  |  -.7235909  _b[1.cartel#2.manufacturer_code]
                       1  3  |  -.3544559  _b[1.cartel#3.manufacturer_code]
                       1  4  |          0  _b[1o.cartel#4o.manufacturer_code]
                       1  5  |  -.7960093  _b[1.cartel#5.manufacturer_code]
                       1  6  |  -.5193968  _b[1.cartel#6.manufacturer_code]
                       1  7  |  -.4699966  _b[1.cartel#7.manufacturer_code]
                       1  8  |  -.6534049  _b[1.cartel#8.manufacturer_code]
                       1  9  |  -.0856057  _b[1.cartel#9.manufacturer_code]
                       1 10  |  -.4458314  _b[1.cartel#10.manufacturer_code]
                       1 11  |  -.3105605  _b[1.cartel#11.manufacturer_code]
                       1 12  |  -.4208325  _b[1.cartel#12.manufacturer_code]
                       1 13  |          0  _b[1o.cartel#13o.manufacturer_code]
                             |
                       _cons |  -5.785321  _b[_cons]
    ------------------------------------------------------------------------------------------
    What puzzles me is that, in principle, the two models should be equivalent as long as the reference category is consistent across the two (which I personally verified with a toy example using the auto dataset), and yet the results are very different. Even though my gut feeling tells me I should prefer model 1, I can't wrap my head around why they produce different estimates. I am using Stata 17 for Windows; any suggestion would be much appreciated!

    PS: Sorry for the lengthy post, I tried to be as detailed as possible.

  • #2
    The models are equivalent: they have the same SS, MS, MSE, and R2. What's different is that the cartel reference group changes across models, which influences the interaction-term estimates. Without knowing how the cartel variable is coded, I can't be sure, but that could be part of it. When you use ib1, keep the following in mind:
    Code:
    help fvarlist
    
    * Relevant info from the helpfile
    Base
    operator(*)    Description
    ----------------------------------------------------------------------------------
    ib#.           use # as base, #=value of variable
    If you go with i.cartel, Stata picks the lowest numbered category as the base level. So you are changing the behavior if, for example, cartel is coded 0/1.

    • #3
      Actually, the results of the two models are not different. They appear to be different because you are treating coefficients with the same names as if they mean the same thing in the two models; but they don't.

      Let's take an example, manufacturer 3. To get the first model's predicted values of ln_price_eur_g (leaving out _b[_cons], which is the same in both models) in the era where cartel = 0, you would calculate _b[0.cartel] + _b[3.manufacturer_code] + _b[0.cartel#3.manufacturer_code] = 0 (ref. cat.) + (-.9271932) + 0 (also one of many ref. cat.s in this model) = -.9271932. For the same manufacturer in the era where cartel = 1, you would calculate _b[1.cartel] + _b[3.manufacturer_code] + _b[1.cartel#3.manufacturer_code] = .0366436 + (-.9271932) + (-.0290233) = -.9195729.

      Now let's do the same thing in the second model. Here the interpretation is different. For the era where cartel = 0, you still calculate _b[0.cartel] + _b[3.manufacturer_code] + _b[0.cartel#3.manufacturer_code] = 0 (ref. cat.) + (-.9271932) + 0 (one of many ref. cat.s) = -.9271932. And for cartel = 1 you still calculate _b[1.cartel] + _b[3.manufacturer_code] + _b[1.cartel#3.manufacturer_code] = .3620762 + (-.9271932) + (-.3544559) = -.9195729. So the results are exactly the same. You can repeat this for any of the manufacturers and will get the same finding: the two models are producing the same results. In all cases, the difference between the models in _b[1.cartel] is exactly cancelled by the difference between the models in _b[1.cartel#n.manufacturer_code] for all n.
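To repeat that arithmetic for every manufacturer at once, here is a small sketch in Python (chosen only so the check runs without Stata or the data). It assumes nothing beyond the coefficients shown in the two tables, copied as displayed (so rounded), and it includes _b[_cons], which is identical in both models. Manufacturer 4 is left out of the cartel = 1 comparisons because 1.cartel#4.manufacturer_code identifies no observations. The names `predict`, `MANUF`, `M1_INTER`, and so on are made up for this illustration.

```python
# Coefficients copied from the two coefficient tables above (as displayed, so rounded).
CONS = -5.785321  # _b[_cons], identical in both models

# _b[n.manufacturer_code], identical in both models; manufacturer 1 is the base.
MANUF = {1: 0.0, 2: .2270697, 3: -.9271932, 4: 3.761208, 5: .3122013,
         6: -.870299, 7: -1.058295, 8: -.6148155, 9: -.9390054,
         10: -.4494987, 11: -.9216386, 12: .1657904, 13: 1.827299}

# Model 1: cartel##i.manufacturer_code (manufacturer 4 has no cartel = 1 data)
M1_CARTEL = .0366436
M1_INTER = {1: 0.0, 2: -.3981583, 3: -.0290233, 5: -.4705767, 6: -.1939642,
            7: -.144564, 8: -.3279723, 9: .2398269, 10: -.1203988,
            11: .0148721, 12: -.0953999, 13: .3254326}

# Model 2: cartel ib1.manufacturer_code cartel#ib1.manufacturer_code
M2_CARTEL = .3620762
M2_INTER = {1: -.3254326, 2: -.7235909, 3: -.3544559, 5: -.7960093,
            6: -.5193968, 7: -.4699966, 8: -.6534049, 9: -.0856057,
            10: -.4458314, 11: -.3105605, 12: -.4208325, 13: 0.0}


def predict(cartel_coef, inter, m, cartel):
    """Linear prediction (xb) for manufacturer m in the cartel = 0 or cartel = 1 era."""
    xb = CONS + MANUF[m]
    if cartel == 1:
        xb += cartel_coef + inter[m]
    return xb


if __name__ == "__main__":
    shift = M2_CARTEL - M1_CARTEL  # difference between the models in _b[1.cartel]
    for m in sorted(M1_INTER):
        # The interaction coefficients differ by the same amount in the opposite direction.
        offset = M1_INTER[m] - M2_INTER[m]
        p1 = predict(M1_CARTEL, M1_INTER, m, 1)
        p2 = predict(M2_CARTEL, M2_INTER, m, 1)
        print(f"manufacturer {m:2d}: shift {shift:.7f}, offset {offset:.7f}, "
              f"xb model 1 {p1:.7f}, xb model 2 {p2:.7f}")
```

Each row should show the shift equal to the offset (.3254326) and the same xb in both models, which is the cancellation described above.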

      The meanings of the coefficients in the two models are different. The first model is the conventional way to code this, and in that model _b[1.cartel] is the marginal effect of the cartel = 1 era in the reference manufacturer. And _b[n.manufacturer_code] is the marginal effect of the n'th manufacturer, relative to the reference manufacturer, in the cartel = 0 era. And the interaction coefficient _b[1.cartel#n.manufacturer_code] is the difference in differences.
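A minimal Python sketch of these three interpretations for manufacturer 3 against the reference manufacturer 1, using only the rounded model 1 coefficients shown above. The helper `xb()` is hypothetical, written just for this illustration: it builds the four cell predictions and then recovers each coefficient as a difference, or a difference in differences, of those cells. The last lines show that model 2 yields the same difference in differences, but there it is the gap between two interaction coefficients rather than a single one.

```python
# Model 1 (cartel##i.manufacturer_code) coefficients for manufacturers 1 (base) and 3.
CONS = -5.785321
B_CARTEL = .0366436        # _b[1.cartel]
B_M3 = -.9271932           # _b[3.manufacturer_code]
B_CARTEL_M3 = -.0290233    # _b[1.cartel#3.manufacturer_code]


def xb(cartel, m3):
    """Hypothetical helper: model 1 prediction; m3 = 1 for manufacturer 3, 0 for manufacturer 1."""
    return CONS + cartel * B_CARTEL + m3 * B_M3 + cartel * m3 * B_CARTEL_M3


cartel_effect_m1 = xb(1, 0) - xb(0, 0)    # cartel-era effect, reference manufacturer
cartel_effect_m3 = xb(1, 1) - xb(0, 1)    # cartel-era effect, manufacturer 3
m3_vs_m1_no_cartel = xb(0, 1) - xb(0, 0)  # manufacturer 3 vs 1 in the cartel = 0 era
did_model1 = cartel_effect_m3 - cartel_effect_m1  # difference in differences

# Model 2: the same difference in differences is
# _b[1.cartel#3.manufacturer_code] - _b[1.cartel#1b.manufacturer_code].
did_model2 = -.3544559 - (-.3254326)

if __name__ == "__main__":
    print(f"{cartel_effect_m1:.7f} {m3_vs_m1_no_cartel:.7f} {did_model1:.7f} {did_model2:.7f}")
```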

      The other conventional way would be -reg ln_price_eur_g ib1.manufacturer_code cartel#ib1.manufacturer_code-; notice the absence of an uninteracted cartel variable compared to your second model. In this case, the coefficient of i.cartel#j.manufacturer_code would be interpreted as the expected value of ln_price_eur_g for manufacturer j in the cartel = i era.

      Your second model is an unconventional way to code this, but it is also legitimate. The problem is that it is difficult to describe what the coefficients represent in words, and I'm not going to try here.

      The main takeaway point is that there are multiple ways to parameterize an interaction model, and although Stata gives the coefficients the same names with these different approaches, the coefficients refer to different things according to which parameterization you choose. So you cannot expect the coefficients in one model to have the same values as the coefficients with the same names in another parameterization. What you can expect is that the predicted values will be the same. And the marginal effects, as calculated by -margins-, will also be the same. A quick check that the two models are equivalent is to notice that all of the statistics in the output that precede the coefficient table itself are identical.
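The statistics above the coefficient table are all functions of the sums of squares and degrees of freedom, which depend only on the fitted values and the number of estimated parameters, not on how the model is parameterized. As a sketch, the Python below recomputes them from the Model and Residual rows shared by all three runs in this thread, using the standard OLS formulas for a model with a constant; `header_stats()` is a hypothetical helper written for this illustration.

```python
import math


def header_stats(ss_model, df_model, ss_resid, df_resid):
    """Recompute the regress header statistics from the ANOVA rows (model with a constant)."""
    ms_model = ss_model / df_model
    ms_resid = ss_resid / df_resid
    ss_total = ss_model + ss_resid
    df_total = df_model + df_resid
    return {
        "N": df_total + 1,                              # + 1 for the constant
        "F": ms_model / ms_resid,
        "R2": ss_model / ss_total,
        "adjR2": 1 - ms_resid / (ss_total / df_total),
        "root_mse": math.sqrt(ms_resid),
    }


# Model and Residual rows, identical in model 1, model 2, and the ib13 run.
STATS = header_stats(3515.5094, 24, 4248.76445, 15195)

if __name__ == "__main__":
    for name, value in STATS.items():
        print(f"{name:>8}: {value:.5f}")
```

Rounded to the displayed precision, these match the header: 15,220 observations, F = 523.86, R-squared = .4528, adjusted R-squared = .4519, and Root MSE = .52879.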

      Until you are experienced with these models, it is probably safer not to experiment with them in this way. It is easy to mistakenly write something that is not an equivalent parameterization. It is best to pick one conventional method of coding these models and stick with it. Once you have that one truly mastered, you can experiment with the other conventional coding. Once you have both of those under your belt, if you want to then experiment with unconventional approaches, feel free, although, honestly, I can't think of a situation where using an unconventional approach to this has any advantage over the two conventional approaches.

      As between the two conventional approaches, the x1##x2 approach works best when you are interested primarily in estimating the difference(s) in differences (because that is directly expressed in the interaction coefficient(s)) and are not so concerned with the outcome levels in the combinations of x1 and x2; you then have to use -margins- or -lincom- to calculate the levels for specific x1#x2 combinations. The x1#x2 approach works best when you are primarily interested in estimating the outcome levels in the combinations of x1 and x2: these are then given as the coefficient(s) of the x1#x2 term(s); you then have to use -lincom- or -margins- to calculate difference(s) in differences.

      Added: Crossed with #2.

      • #4
        Dear Erik Ruzek and Clyde Schechter, thank you both for your insightful comments.

        Out of curiosity: could anyone explain why
        Code:
        1.cartel#13.manufacturer_code omitted because of collinearity
        when setting manufacturer 1 as the base category (I am assuming that's where the collinearity might stem from)? I have also tried setting 13 as the base, but the same note pops up even though manufacturer 13 is indeed treated as the base:
        Code:
        . reg ln_price_eur_g cartel ib13.manufacturer_code cartel#ib13.manufacturer_code
        note: 1.cartel#4.manufacturer_code identifies no observations in the sample.
        note: 1.cartel#13b.manufacturer_code omitted because of collinearity.
        
              Source |       SS           df       MS      Number of obs   =    15,220
        -------------+----------------------------------   F(24, 15195)    =    523.86
               Model |   3515.5094        24  146.479558   Prob > F        =    0.0000
            Residual |  4248.76445    15,195  .279615956   R-squared       =    0.4528
        -------------+----------------------------------   Adj R-squared   =    0.4519
               Total |  7764.27385    15,219  .510169778   Root MSE        =    .52879
        
        ------------------------------------------------------------------------------------------
                  ln_price_eur_g | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------------------+----------------------------------------------------------------
                          cartel |   .3620762   .1114781     3.25   0.001     .1435657    .5805868
                                 |
               manufacturer_code |
                              1  |  -1.827299   .1014136   -18.02   0.000    -2.026082   -1.628517
                              2  |   -1.60023   .1072615   -14.92   0.000    -1.810475   -1.389984
                              3  |  -2.754493   .0988292   -27.87   0.000     -2.94821   -2.560776
                              4  |   1.933908   .5375281     3.60   0.000     .8802888    2.987528
                              5  |  -1.515098   .0999878   -15.15   0.000    -1.711086    -1.31911
                              6  |  -2.697598   .2554285   -10.56   0.000    -3.198269   -2.196928
                              7  |  -2.885594   .1034309   -27.90   0.000    -3.088331   -2.682857
                              8  |  -2.442115   .1088099   -22.44   0.000    -2.655395   -2.228834
                              9  |  -2.766305   .0982047   -28.17   0.000    -2.958798   -2.573812
                             10  |  -2.276798   .1032707   -22.05   0.000    -2.479221   -2.074375
                             11  |  -2.748938   .1161633   -23.66   0.000    -2.976632   -2.521244
                             12  |  -1.661509   .1119576   -14.84   0.000    -1.880959   -1.442059
                                 |
        cartel#manufacturer_code |
                           1  1  |  -.3254326   .1163557    -2.80   0.005    -.5535038   -.0973614
                           1  2  |  -.7235909   .1215005    -5.96   0.000    -.9617465   -.4854354
                           1  3  |  -.3544559   .1147415    -3.09   0.002    -.5793631   -.1295487
                           1  4  |          0  (empty)
                           1  5  |  -.7960093    .114906    -6.93   0.000    -1.021239   -.5707797
                           1  6  |  -.5193968   .2659327    -1.95   0.051    -1.040657    .0018632
                           1  7  |  -.4699966   .1202936    -3.91   0.000    -.7057865   -.2342067
                           1  8  |  -.6534049   .1262181    -5.18   0.000    -.9008075   -.4060023
                           1  9  |  -.0856057   .1145379    -0.75   0.455    -.3101137    .1389023
                           1 10  |  -.4458314     .11873    -3.76   0.000    -.6785565   -.2131063
                           1 11  |  -.3105605   .1294006    -2.40   0.016    -.5642012   -.0569198
                           1 12  |  -.4208325   .1315891    -3.20   0.001     -.678763    -.162902
                           1 13  |          0  (omitted)
                                 |
                           _cons |  -3.958022   .0965429   -41.00   0.000    -4.147258   -3.768786
        ------------------------------------------------------------------------------------------
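The posted numbers can be compared directly with model 2 (whose manufacturer coefficients are the same as model 1's). Moving the manufacturer base from 1 to 13 subtracts _b[13.manufacturer_code] = 1.827299 from every manufacturer coefficient and adds it to _b[_cons], while the cartel and cartel#manufacturer_code rows are exactly those of model 2. One way to see why some 1.cartel#n term must still be dropped: cartel is coded 0/1 and every observation has exactly one manufacturer, so the 1.cartel#n indicators summed over all n reproduce cartel itself. With cartel also in the model, one of them is redundant; in both runs Stata drops 1.cartel#13, and _b[cartel] = .3620762 equals the cartel-era effect for manufacturer 13 in model 1 (.0366436 + .3254326). The Python sketch below checks the rebasing on the displayed (rounded) coefficients and the indicator sum on a few hypothetical (cartel, manufacturer) rows; `rebase()` and `cartel_interactions()` are illustrative helpers, not Stata functions.

```python
# Manufacturer coefficients and constant from model 2 (base = manufacturer 1) ...
M2_CONS = -5.785321
M2_MANUF = {1: 0.0, 2: .2270697, 3: -.9271932, 4: 3.761208, 5: .3122013,
            6: -.870299, 7: -1.058295, 8: -.6148155, 9: -.9390054,
            10: -.4494987, 11: -.9216386, 12: .1657904, 13: 1.827299}
# ... and from the ib13 run (base = manufacturer 13), as displayed.
IB13_CONS = -3.958022
IB13_MANUF = {1: -1.827299, 2: -1.60023, 3: -2.754493, 4: 1.933908, 5: -1.515098,
              6: -2.697598, 7: -2.885594, 8: -2.442115, 9: -2.766305,
              10: -2.276798, 11: -2.748938, 12: -1.661509, 13: 0.0}


def rebase(coefs, cons, new_base):
    """Re-express base-level contrasts and the constant relative to another base level."""
    shift = coefs[new_base]
    return {m: b - shift for m, b in coefs.items()}, cons + shift


def cartel_interactions(cartel, manufacturer, levels=range(1, 14)):
    """1.cartel#n indicators for one observation (1 only where cartel = 1 and n matches)."""
    return [int(cartel == 1 and manufacturer == n) for n in levels]


REBASED, REBASED_CONS = rebase(M2_MANUF, M2_CONS, 13)

# Hypothetical (cartel, manufacturer) observations, for illustration only.
ROWS = [(1, 1), (0, 1), (1, 3), (0, 9), (1, 13), (0, 13)]

if __name__ == "__main__":
    for m in sorted(REBASED):
        print(f"manufacturer {m:2d}: rebased {REBASED[m]:.6f}, ib13 output {IB13_MANUF[m]:.6f}")
    print(f"_cons: rebased {REBASED_CONS:.6f}, ib13 output {IB13_CONS:.6f}")
    for cartel, m in ROWS:
        # Summing the interaction indicators gives back the cartel variable.
        print(cartel, m, sum(cartel_interactions(cartel, m)))
```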
