  • Different results from manual interaction term and # command

    Hello,

    I'm getting different coefficients when I manually create an interaction term versus when I use the # operator in Stata, and I am not sure why.
    I replicated a simple version of my code using auto.dta for ease of replicability. Thanks for any help in advance!

    Code:
    sysuse auto.dta, clear
    gen dummy = headroom<3
    gen dummyinteract = dummy*headroom
    reg price dummy dummy#c.headroom
    reg price dummy dummyinteract

    Edit: Should add, I'm using Stata 16.1 on a firewalled server
    Last edited by Wendy Zeng; 05 Jun 2020, 17:10.

  • #2
    Wendy:
    welcome to this forum.
    Your code interacts a variable with a dummy derived from that same variable.
    More helpful replies are conditional on posting what Stata gave you back, too (as per FAQ).
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Because the two specifications are not the same. The first, which uses factor-variable notation, includes both levels of the dummy variable. In the manual specification you have included only one of the levels of the factor variable. In the example below I label the values of the dummy variable as A and B. In the full model specification you need to include the interaction both when A is true and when B is true. I think part of the difficulty in understanding this comes from not including the main effect ("headroom" as a variable in the model).
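To make the difference explicit, here is a sketch of the two regressions from #1 written out as equations. The notation is my own and not Stata output: d stands for dummy, h for headroom, and tildes mark the coefficients of the manual version, which need not equal those of the factor-variable version.

```latex
\begin{align*}
\text{price} &= \beta_0 + \beta_1 d + \gamma_0 (1-d)\,h + \gamma_1 d\,h + \varepsilon
  && \text{(reg price dummy dummy\#c.headroom)} \\
\text{price} &= \tilde\beta_0 + \tilde\beta_1 d + \tilde\gamma_1 d\,h + \tilde\varepsilon
  && \text{(reg price dummy dummyinteract)}
\end{align*}
```

The second equation is the first with the restriction that the headroom slope is zero when d = 0 (that is, \gamma_0 = 0), so the remaining coefficients will generally differ.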

      Code:
      sysuse auto.dta, clear
      gen dummy = (headroom<3) 
      label define lab 0 "A" 1 "B"
      label values dummy lab
      reg price i.dummy  i.dummy#c.headroom, noheader
      //Interaction coefficients are just the slopes at 
      // different levels of the dummy variable
      margins dummy, dydx(headroom)
      //This is made clearer when the main effects are included
      reg price i.dummy##c.headroom, noheader
      
      //Manual specification
      gen dummyinteract = dummy*headroom
      
      tab dummy, gen(D)
      gen D1interact = D1*headroom 
      gen D2interact = D2*headroom
      //Full model
      reg price D2 D1interact D2interact
      //Original Model
      reg price D2  D2interact, noheader
      reg price dummy  dummyinteract, noheader
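As a quick numerical check of the point above, the following sketch rebuilds both slopes by hand and asserts that they match the factor-variable coefficients. The variable names h_A and h_B and the scalar names are illustrative choices, not part of the original example.

```stata
* Sketch: rebuild both headroom slopes by hand and compare with
* the factor-variable specification. h_A, h_B, slope_A and slope_B
* are illustrative names.
sysuse auto.dta, clear
gen byte dummy = headroom < 3
gen double h_A = (dummy == 0) * headroom   // headroom, switched on only when dummy == 0
gen double h_B = (dummy == 1) * headroom   // headroom, switched on only when dummy == 1

* Factor-variable version: one headroom slope per level of dummy
regress price i.dummy i.dummy#c.headroom
scalar slope_A = _b[0.dummy#c.headroom]
scalar slope_B = _b[1.dummy#c.headroom]

* Manual version with both levels included
regress price dummy h_A h_B
assert reldif(_b[h_A], slope_A) < 1e-6
assert reldif(_b[h_B], slope_B) < 1e-6
```

Dropping h_A from the last regression gives the original manual model, which is why its coefficients differ.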


      • #4
        Hello, I am dealing with a similar issue and have come across this thread - but I think the response given above does not apply to me.

        I am running a regression using ppmlhdfe with two dummy variables and the interaction between them. This is constructed as follows:

        Code:
        gen interaction = D1*D2
        ppmlhdfe y D1#D2 control i.year, vce(robust)
        ppmlhdfe y D1 interaction D2 control i.year, vce(robust)
        I ran this comparison mostly to see if the results are the same, as the way esttab outputs and labels the first version is kind of ugly and confusing. However, while the coefficients on D1 and D2 in the second version match those of D1 = 1, D2 = 0 and D1 = 0, D2 = 1 in the first version, the interaction term is completely different - wrong sign, wrong magnitude, significant in the first version but insignificant in the second. The coefficient on D1 = 0, D2 = 0 which is explicitly outputted in the first version is omitted due to collinearity, so I feel the results really should be identical.

        I have re-run this using the reg command to make sure it's not a ppml issue, but the same thing happened. I have also tried adding the dummies and interaction as explicit factor variables:

        Code:
        ppmlhdfe y i.D1 i.interaction i.D2 control i.year, vce(robust)
        but the outcome did not change.

        Although my case is a bit different (as it uses two dummy variables, and I already included both dummies individually) I tried to apply Scott's response anyway, generating both levels of the first dummy variable and interacting both with the other dummy, as follows:

        Code:
        tab D1, gen(d)
        gen d1D2 = d1*D2
        gen d2D2 = d2*D2
        ppmlhdfe y D1 D2 d1D2 d2D2 control i.year, vce(robust)
        but what happens is that d2D2 is omitted because of collinearity - not surprisingly - and the results are the same. Does anyone have any clues as to why this is?


        • #5
          Pia:
          welcome to this forum.
          What if:
          Code:
          gen interaction = D1*D2
          ppmlhdfe y D1##D2 control i.year, vce(robust)
          ppmlhdfe y D1 interaction D2 control i.year, vce(robust)
          ?
          Kind regards,
          Carlo
          (Stata 19.0)


          • #6
            #4 is cross-posted at https://stackoverflow.com/questions/...cted-variables

            You are asked to tell us about cross-posting. FAQ Advice #8 https://www.statalist.org/forums/help#crossposting


            • #7
              Pia: As is noted in the FAQ, you're likely to get better answers if you show us what Stata actually produces when you type your commands.

              Frankly, when I use i.D1#i.D2 I find the output confusing because it tries to show the three combinations of zero and one relative to the base group where D1 = D2 = 0. And it usually shows D1 = 0, D2 = 1 -- not what you want. The command where you manually compute the interaction gives the correct answer. Also, using c.D1#c.D2 as the interaction will also provide the correct estimates.

              Carlo's suggestion of D1##D2 also produces the correct estimates.
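A short sketch of why the two sets of output disagree only on the "interaction" row may help. The symbols below are my own notation: the gammas are the cell coefficients reported by D1#D2 relative to the base cell D1 = D2 = 0, and the betas are the coefficients from the version with D1, D2 and their product. Both describe the same linear index (other controls omitted).

```latex
\begin{align*}
\text{D1\#D2:}\quad
  & \alpha + \gamma_{10}\,\mathbf{1}[D_1{=}1, D_2{=}0]
           + \gamma_{01}\,\mathbf{1}[D_1{=}0, D_2{=}1]
           + \gamma_{11}\,\mathbf{1}[D_1{=}1, D_2{=}1] \\
\text{D1, D2, D1}\times\text{D2:}\quad
  & \alpha + \beta_1 D_1 + \beta_2 D_2 + \beta_{12} D_1 D_2 \\[4pt]
\text{so}\quad
  & \gamma_{10} = \beta_1, \qquad
    \gamma_{01} = \beta_2, \qquad
    \gamma_{11} = \beta_1 + \beta_2 + \beta_{12}
    \;\Longrightarrow\;
    \beta_{12} = \gamma_{11} - \gamma_{10} - \gamma_{01}
\end{align*}
```

This is consistent with what is described in #4: the D1 and D2 rows agree across the two versions, while the row for D1 = 1, D2 = 1 under # is the total effect for that cell rather than the interaction coefficient.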


              • #8
                When I feel lost with interactions (and as Jeff reported, this hits me especially when interactions include categorical variables with >2 levels each), I usually add the -allbaselevels- option to get a better awareness of what their coefficients are trying to tell me.
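A minimal sketch of what that looks like, using auto.dta. The high_price variable and its cut-off are illustrative, and regress is used only to keep the example dependent on built-in commands.

```stata
* Sketch: -allbaselevels- lists the base cell explicitly, which makes
* it easier to see that each row is a contrast with high_price = 0 and
* foreign = 0. high_price is an illustrative variable.
sysuse auto.dta, clear
gen byte high_price = price > 6165
regress trunk i.high_price#i.foreign headroom, allbaselevels
```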
                Kind regards,
                Carlo
                (Stata 19.0)


                • #9
                  Apologies Nick for not mentioning the Stack overflow post in my comment - I have edited my post on Stack overflow accordingly.

                  Thank you very much Carlo and Jeff for your advice! I have replicated the issue as follows:

                  Code:
                  sysuse auto.dta, clear
                  gen high_price = 0
                  replace high_price = 1 if price>6165
                  gen interaction = high_price*foreign 
                  ppmlhdfe trunk high_price interaction foreign headroom, vce(robust)
                  ppmlhdfe trunk high_price#foreign headroom, vce(robust)
                  ppmlhdfe trunk high_price##foreign headroom, vce(robust)
                  ppmlhdfe trunk high_price c.high_price#c.foreign foreign headroom, vce(robust)
                  As is the case with my original data, the version using # yields a different result, but the manual interaction and the version with ## or c.high_price#c.foreign result in the same output. This is good to know - I will stay clear of the # in the future!
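For what it is worth, the # version contains the same information, and the interaction can be recovered from it with lincom. The sketch below assumes regress as a built-in stand-in for ppmlhdfe (as noted in #4, the same pattern shows up with reg); b_manual is an illustrative scalar name.

```stata
* Sketch: recover the manual interaction coefficient from the # output.
* regress stands in for ppmlhdfe so that only built-in commands are needed.
sysuse auto.dta, clear
gen byte high_price = price > 6165
gen byte interaction = high_price * foreign

* Manual specification: main effects plus product term
regress trunk high_price interaction foreign headroom, vce(robust)
scalar b_manual = _b[interaction]

* Cell specification: three cells relative to high_price = 0, foreign = 0
regress trunk i.high_price#i.foreign headroom, vce(robust)

* Interaction = (1,1) cell minus (1,0) cell minus (0,1) cell
lincom 1.high_price#1.foreign - 1.high_price#0.foreign - 0.high_price#1.foreign
assert reldif(r(estimate), b_manual) < 1e-6
```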


                  • #10
                    Pia:
                    it is expected that:
                    Code:
                    ppmlhdfe trunk high_price#foreign headroom, vce(robust)
                    gives back different results when compared to:
                    Code:
                    ppmlhdfe trunk high_price##foreign headroom, vce(robust)
                    because in the first command the conditional main effects of the two interacted terms cannot be estimated separately, as you omitted both -high_price- and -foreign- outside the interaction.

                    That said:
                    Code:
                    ppmlhdfe trunk high_price##foreign headroom, vce(robust)
                    gives back the same results as:
                    Code:
                    ppmlhdfe trunk high_price interaction foreign headroom, vce(robust)
                    because in the third and fourth commands above you included -high_price- and -foreign- plus their interaction (please note that, unlike the third command, the fourth does not let you exploit the virtuous relationship of -fvvarlist- with -margins- and -marginsplot-).
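A minimal sketch of that last point, assuming the built-in poisson command as a stand-in for ppmlhdfe (with no absorbed fixed effects the two should give the same point estimates). Because the interaction is written in factor-variable notation, margins knows that the product term moves together with its components.

```stata
* Sketch: -margins- after a factor-variable interaction.
* poisson is used here instead of ppmlhdfe to rely on built-in commands only.
sysuse auto.dta, clear
gen byte high_price = price > 6165
poisson trunk i.high_price##i.foreign headroom, vce(robust)

* Predicted trunk space in each of the four cells
margins high_price#foreign

* Effect of high_price (discrete change) separately for domestic and foreign cars
margins foreign, dydx(high_price)
```

With a hand-made product variable, margins would treat it as an unrelated regressor and these contrasts would not be computed correctly.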
                    Kind regards,
                    Carlo
                    (Stata 19.0)


                    • #11
                      Dear All,

                      I am estimating a structural gravity model. I am estimating the average impact of an FTA indicator, but I run into problems once I interact my FTA dummy with dummies for specific provisions mentioned in the agreements (such as an IPR provision).

                      The way I interact the two is :

                      ppmlhdfe total_trade_flow agree_fta#IPR, a(imp_time exp_time first#second) cluster(pair_id)

                      Note that once I separately include the agree_fta and IPR indicators in the above equation in addition to the interaction term, my interaction terms are dropped because of collinearity. I have pasted the results using the agree_fta#IPR interaction term. Note that the existence of a provision depends on the existence of an FTA. Therefore, the first row is empty.

                      HDFE PPML regression No. of obs = 69,942
                      Absorbing 3 HDFE groups Residual df = 11,135
                      Statistics robust to heteroskedasticity Wald chi2(2) = 9.70
                      Deviance = 127489763.8 Prob > chi2 = 0.0078
                      Log pseudolikelihood = -63880220.51 Pseudo R2 = 0.9372

                      Number of clusters (pair_id)= 11,136
                      (Std. err. adjusted for 11,136 clusters in pair_id)

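                      -----------------------------------------------------------------------------------
                      | Robust
                      total_trade~w | Coefficient std. err. z P>|z| [95% conf. interval]
                      ------------------+----------------------------------------------------------------
                      agree_fta#IPR |
                      0 1 | 0 (empty)
                      1 0 | .4870974 .2696324 1.81 0.071 -.0413725 1.015567
                      1 1 | .204006 .0757856 2.69 0.007 .0554689 .352543
                      |
                      _cons | 12.217 .0321301 380.24 0.000 12.15403 12.27997
                      -----------------------------------------------------------------------------------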


                      I finally came across your comments here and wanted to make sure that I am doing it correctly, so I checked the equation below:

                      ppmlhdfe total_trade_flow agree_fta c.agree_fta#c.IPR IPR, a(imp_time exp_time first#second) cluster(pair_id)


                      Now, the IPR provision has been dropped due to collinearity and I received the below estimation:

                      HDFE PPML regression No. of obs = 69,942
                      Absorbing 3 HDFE groups Residual df = 11,135
                      Statistics robust to heteroskedasticity Wald chi2(2) = 9.70
                      Deviance = 127489763.8 Prob > chi2 = 0.0078
                      Log pseudolikelihood = -63880220.51 Pseudo R2 = 0.9372

                      Number of clusters (pair_id)= 11,136
                      (Std. err. adjusted for 11,136 clusters in pair_id)
                      -----------------------------------------------------------------------------------
                      | Robust
                      total_trade_flow | Coefficient std. err. z P>|z| [95% conf. interval]
                      ------------------+----------------------------------------------------------------
                      agree_fta | .4870974 .2696324 1.81 0.071 -.0413725 1.015567
                      |
                      c.agree_fta#c.IPR | -.2830914 .2733026 -1.04 0.300 -.8187546 .2525718
                      |
                      IPR | 0 (omitted)
                      _cons | 12.217 .0321301 380.24 0.000 12.15403 12.27997
                      -----------------------------------------------------------------------------------

                      The interaction term becomes negative but statistically insignificant. However, if I include only the interaction term and remove the agree_fta and IPR variables, then the results give me a significant and positive estimate.

                      ppmlhdfe total_trade_flow c.agree_fta#c.IPR, a(imp_time exp_time first#second) cluster(pair_id)

                      HDFE PPML regression No. of obs = 69,942
                      Absorbing 3 HDFE groups Residual df = 11,135
                      Statistics robust to heteroskedasticity Wald chi2(1) = 5.18
                      Deviance = 127601348.1 Prob > chi2 = 0.0229
                      Log pseudolikelihood = -63936012.66 Pseudo R2 = 0.9371

                      Number of clusters (pair_id)= 11,136
                      (Std. err. adjusted for 11,136 clusters in pair_id)
                      -----------------------------------------------------------------------------------
                      | Robust
                      total_trade_flow | Coefficient std. err. z P>|z| [95% conf. interval]
                      ------------------+----------------------------------------------------------------
                      c.agree_fta#c.IPR | .1688964 .0742208 2.28 0.023 .0234264 .3143665
                      |
                      _cons | 12.27553 .0064697 1897.39 0.000 12.26285 12.28821
                      -----------------------------------------------------------------------------------


                      I am trying to see whether FTAs that include an IPR provision have stronger effects on trade or not. As I said, if I include only c.agree_fta#c.IPR instead of agree_fta c.agree_fta#c.IPR IPR in the equation, I receive very different results, and I would like to know whether I can use any of these three.

                      If you can help me, I would really appreciate it.

                      Best regards
                      Yusuf


                      • #12
                        Yusuf:
                        Joao Santos Silva is the guru for this (and many more) topic(s).
                        Take a look at his previous posts.
                        As an aside, please use CODE delimiters when posting what you typed and what Stata gave you back. Thanks.
                        Kind regards,
                        Carlo
                        (Stata 19.0)


                        • #13
                          Dear Yusuf Ceylan,

                          Since the IPR variable can only be 1 when there is an FTA, it is already an interaction and therefore you should not add the additional interaction between IPR and FTA.
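The output in #11 is consistent with this: the coefficient on agree_fta (.4870974) plus the coefficient on c.agree_fta#c.IPR (-.2830914) equals .204006, which is the coefficient reported for the agree_fta = 1, IPR = 1 cell in the first table. The sketch below illustrates the nesting on simulated data. The variable names fta, ipr, fta_x_ipr and y, the seed, and the coefficient values are all hypothetical, and the built-in poisson command stands in for ppmlhdfe without fixed effects.

```stata
* Sketch on simulated data: when ipr can only be 1 inside an FTA,
* the product fta*ipr is the same variable as ipr.
clear
set seed 12345
set obs 2000
gen byte fta = runiform() < 0.5
gen byte ipr = fta * (runiform() < 0.5)   // provision exists only within an FTA
gen byte fta_x_ipr = fta * ipr
assert fta_x_ipr == ipr                   // product is identical to ipr

* Hypothetical data-generating process for a count outcome
gen y = rpoisson(exp(1 + 0.5*fta - 0.3*ipr))

* fta: effect of an FTA without the provision
* ipr: additional effect when the FTA includes the provision
poisson y i.fta i.ipr, vce(robust)

* Total effect of an FTA that includes the provision
lincom 1.fta + 1.ipr
```

Under this reading, the coefficient on ipr alone answers whether FTAs with the provision have a stronger effect than FTAs without it.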

                          Best wishes,

                          Joao
