  • Xtivreg with interaction

    Dear all,

    I am running into some trouble using xtivreg in combination with an interaction term.

    I am researching the effect of bilateral tax treaties on FDI inflow, with several control variables and fixed effects for the source and resident country and year. My main independent variable is a dummy denoting whether a country pair has concluded a bilateral tax treaty. As instruments, I am using two dummies on common language and colonial relationship, a continuous variable on the number of treaties closed prior to the year of interest and the GDP of the source country. Because my independent variable is a dummy, I have to use a probit model to estimate the first stage regression before I can estimate the second stage regression.

    My regression is formulated as follows:

    Code:
    probit tt_d comlang_off colony treaties_s ln_Openness_1 ln_GDP_pc_1 ln_Telephone_1 RoL_1 ln_Gov_exp_1 ln_GDP_Growth_1 ln_av_CPI_5 ln_M2_growth_1 i.year i.Country_e i.cou_e, robust
    predict tt_d_iv, xb
    xtivreg ln_FDI (tt_d=tt_d_iv) ln_Openness_1 ln_GDP_pc_1 ln_Telephone_1 RoL_1 ln_Gov_exp_1 ln_GDP_Growth_1 ln_av_CPI_5 ln_M2_growth_1 i.year i.Country_e i.cou_e, re

    Next, I would like to make an interaction between the instrumented variable tt_d and the categorical variable income level, denoting four country income level groups.
    I have already tried to use the standard interaction method:
    Code:
     xtivreg ln_FDI c.(tt_d=tt_d_iv)##i.Income_level_S_e ln_Openness_1 ln_GDP_pc_1 ln_Telephone_1 RoL_1 ln_Gov_exp_1 ln_GDP_Growth_1 ln_av_CPI_5 ln_M2_growth_1 i.year i.Country_e i.cou_e, re
    However, Stata then states that the parentheses are imbalanced.

    Does anyone know how to solve this or have any other suggestions on how to make an interaction between the two variables?

    Kind regards,
    David

  • #2
    Go to

    Code:
    help xtivreg

    and examine the syntax of the command. The parentheses enclose the endogenous variable and the set of instruments. If you want to interact the endogenous variable with some variables, then that means that you will have several endogenous variables. You will probably need to interact the instruments as well. Do all this within the parentheses.

    • #3
      Dear Andrew,

      Thank you for the response. If I understand you correctly, it should then be coded in the following way:
      Code:
      probit tt_d i.comlang_off##i.Income_level_S_e i.colony##i.Income_level_S_e c.treaties_s##i.Income_level_S_e ln_Openness_1 ln_GDP_pc_1 ln_Telephone_1 RoL_1 ln_Gov_exp_1 ln_GDP_Growth_1 ln_av_CPI_5 ln_M2_growth_1 i.year i.Country_e i.cou_e, robust
      predict tt_d_iv, xb
      xtivreg ln_FDI (c.tt_d##i.Income_level_S_e=c.tt_d_iv##i.Income_level_S_e) ln_Openness_1 ln_GDP_pc_1 ln_Telephone_1 RoL_1 ln_Gov_exp_1 ln_GDP_Growth_1 ln_av_CPI_5 ln_M2_growth_1 i.year i.Country_e i.cou_e, re
      However, this gives an error message for the last line of code:
      Code:
      depvars may not be factor variables
      r(198);
      Could you elaborate on what I'm supposed to do instead?

      Many thanks,
      David

      • #4
        depvars may not be factor variables
        r(198);
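        You need to create the interaction terms for the endogenous variable manually, because factor-variable notation is not allowed for the variables being instrumented.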

        Code:
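        * Lowest income level is the base category
        qui sum Income_level_S_e
        local min = r(min)
        levelsof Income_level_S_e if Income_level_S_e > `min', local(levels)
        foreach i of local levels {
            * generate does not accept factor-variable notation, so form the product directly
            g tt_d_inc`i' = tt_d * (Income_level_S_e == `i') if !missing(Income_level_S_e)
        }
        xtivreg ln_FDI (tt_d tt_d_inc* = c.tt_d_iv##ib`min'.Income_level_S_e) ...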
        I much doubt that you can justify the random effects assumption in a cross-country macro panel.
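
        Following the advice in #2 to interact the instruments as well, a minimal sketch that skips the probit index and interacts each excluded instrument with income level by hand might look like the block below. The generated names (tt_d_x*, comlang_x*, colony_x*, treaties_x*) and the local macro controls are illustrative assumptions, the variable names are taken from #1, and the data are assumed to be xtset already. The /// continuations require running this from a do-file.

        ```stata
        * ASSUMPTION: variable names follow #1; all generated names are illustrative
        local controls ln_Openness_1 ln_GDP_pc_1 ln_Telephone_1 RoL_1 ln_Gov_exp_1 ///
            ln_GDP_Growth_1 ln_av_CPI_5 ln_M2_growth_1

        * Lowest income level is the base category
        summarize Income_level_S_e, meanonly
        local base = r(min)
        levelsof Income_level_S_e if Income_level_S_e > `base', local(levels)

        foreach i of local levels {
            * Endogenous dummy times income-level indicator
            generate tt_d_x`i'     = tt_d        * (Income_level_S_e == `i') if !missing(Income_level_S_e)
            * Each excluded instrument times the same indicator
            generate comlang_x`i'  = comlang_off * (Income_level_S_e == `i') if !missing(Income_level_S_e)
            generate colony_x`i'   = colony      * (Income_level_S_e == `i') if !missing(Income_level_S_e)
            generate treaties_x`i' = treaties_s  * (Income_level_S_e == `i') if !missing(Income_level_S_e)
        }

        * All endogenous terms on the left of "=", all excluded instruments on the right
        xtivreg ln_FDI (tt_d tt_d_x* = comlang_off colony treaties_s comlang_x* colony_x* treaties_x*) ///
            ib`base'.Income_level_S_e `controls' i.year i.Country_e i.cou_e, re
        ```

        With four income levels this gives four endogenous terms and twelve excluded instruments, so the order condition holds; whether the interacted instruments are strong enough is a separate question to check.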

        • #5
          Also I missed this in #1

          Because my independent variable is a dummy, I have to use a probit model to estimate the first stage regression before I can estimate the second stage regression.
          This makes absolutely no sense. Do you not have instruments for the endogenous variable? I think you need to back up and analyze what you are doing.

          • #6
            Dear Andrew,

            Thanks for the suggestions.

            With regard to #4, the main independent variable Tax Treaty is time-invariant from the moment a double taxation treaty (DTT, on the prevention of double taxation by two countries on the same income) is concluded. DTTs are generally signed only once between two countries, although the text and the interpretation of specific provisions are sometimes updated. In theory, DTTs can be cancelled by the contracting parties, but this hardly ever happens. Only two DTTs out of 3,000 have been terminated in the past twenty years, and two have been renegotiated. So there is little within variation. As a result, it is not possible to use a fixed effects model. I know this would be better from a theoretical perspective, but I am constrained by the data and thus choose to use a random effects model.

            With regard to #5, I'll try to clarify my line of reasoning. My research is on the FDI inflow of developing countries. The independent variable Tax Treaty denotes whether a DTT is signed by two countries. MNEs that perform FDI from country A to country B often lobby their government in A to start negotiations on a DTT, as this gives them more certainty on their tax liability (this is a bit of a simplification). As a result, whether a DTT is signed is influenced by earlier FDI flows between the two countries, and Tax Treaty should be instrumented. My instruments are dummies for common language and colonial relationships and continuous variables on the number of treaties closed prior to the year of interest and the GDP of country B. These instruments are valid and strong based on the Kleibergen Paap (2006) statistics for under-identification and weak instruments and the Hansen J (2008) statistic for over-identification.

            For the first stage regression, I use a probit regression because the instrumented variable Tax Treaty is a dummy and a first-stage OLS regression would calculate a continuous estimate whereas I need a dummy denoting 0(=no treaty) or 1(=treaty). I have based this line of reasoning on Wooldridge, J. (2010). Econometric Analysis of Cross Section and Panel Data (pp. 621–629). MIT Press. https://ipcig.org/evaluation/apoio/W...nel%20Data.pdf

            As using a probit as first stage is not an option within xtivreg, I split the code into a separate command for the first stage and saved the prediction as tt_d_iv. This estimated tt_d_iv is then used as the instrument in the xtivreg regression.
            Code:
            probit tt_d comlang_off colony treaties_s ln_Openness_1 ln_GDP_pc_1 ln_Telephone_1 RoL_1 ln_Gov_exp_1 ln_GDP_Growth_1 ln_av_CPI_5 ln_M2_growth_1 i.year i.Country_e i.cou_e, robust
            predict tt_d_iv, xb
            xtivreg ln_FDI (tt_d=tt_d_iv) ln_Openness_1 ln_GDP_pc_1 ln_Telephone_1 RoL_1 ln_Gov_exp_1 ln_GDP_Growth_1 ln_av_CPI_5 ln_M2_growth_1 i.year i.Country_e i.cou_e, re
            A different option would be to run the following:
            Code:
            probit tt_d comlang_off colony treaties_s ln_Openness_1 ln_GDP_pc_1 ln_Telephone_1 RoL_1 ln_Gov_exp_1 ln_GDP_Growth_1 ln_av_CPI_5 ln_M2_growth_1 i.year i.Country_e i.cou_e, robust
            predict tt_d_iv, xb
            xtreg ln_FDI tt_d_iv ln_Openness_1 ln_GDP_pc_1 ln_Telephone_1 RoL_1 ln_Gov_exp_1 ln_GDP_Growth_1 ln_av_CPI_5 ln_M2_growth_1 i.year i.Country_e i.cou_e, re
            where the endogenous tt_d is replaced by tt_d_iv in the xtreg regression.

            Does this clarify it enough, and do you agree with using a first-stage probit model? In addition, which of the two options is the correct one?


            Best,
            David


            Last edited by David Jacobsz; 25 Sep 2022, 04:07. Reason: Further elaboration on first stage probit clarification

            • #7
              Thanks for the reference, I was not aware of this procedure. The implementation following the discussion in Wooldridge (2010) is as follows:

              Example 18.3 (Estimating the Effects of Education on Fertility): We use the data in FERTIL2.RAW to estimate the effect of attaining at least seven years of education on fertility. The data are for women of childbearing age in Botswana. Seven years of education is, by far, the modal amount of positive education. (About 21 percent of women report zero years of education. For the subsample with positive education, about 33 percent report seven years of education.) Let y = children, the number of living children, and let w = educ7 be a binary indicator for at least seven years of education. The elements of x are age, age^2, evermarr (ever married), urban (lives in an urban area), electric (has electricity), and tv (has a television). The OLS estimate of ATE is -.394 (se = .050). We also use the variable frsthalf, a binary variable equal to one if the woman was born in the first half of the year, as an IV for educ7. It is easily shown that educ7 and frsthalf are significantly negatively related. The usual IV estimate is much larger in magnitude than the OLS estimate, but only marginally significant: -1.131 (se = .619). The estimate from Procedure 18.1 is even bigger in magnitude, and very significant: -1.975 (se = .332). The standard error that is robust to arbitrary heteroskedasticity is even smaller. Therefore, using the probit fitted values as an IV, rather than the usual linear projection, produces a more precise estimate (and one notably larger in magnitude).
              Code:
              use http://www.stata.com/data/jwooldridge/eacsap/fertil2.dta, clear
              *USUAL IV REGRESSION
              ivregress 2sls children (educ7= frsthalf) c.age##c.age evermarr urban electric tv
              *PROCEDURE 18.1
              probit educ7 frsthalf c.age##c.age evermarr urban electric tv
              predict pr, pr
              ivregress 2sls children (educ7= pr) c.age##c.age evermarr urban electric tv
              Results:

              Code:
              . *USUAL IV REGRESSION
              
              . ivregress 2sls children (educ7= frsthalf) c.age##c.age evermarr urban electric tv
              
              Instrumental variables (2SLS) regression          Number of obs   =      4,358
                                                                Wald chi2(7)    =    5816.02
                                                                Prob > chi2     =     0.0000
                                                                R-squared       =     0.5651
                                                                Root MSE        =     1.4652
              
              ------------------------------------------------------------------------------
                  children |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                     educ7 |   -1.13068   .6186665    -1.83   0.068    -2.343244    .0818844
                       age |   .2627018   .0191424    13.72   0.000     .2251834    .3002202
                           |
               c.age#c.age |  -.0019787   .0002903    -6.82   0.000    -.0025476   -.0014098
                           |
                  evermarr |   .6167576   .0844692     7.30   0.000      .451201    .7823141
                     urban |  -.1672413   .0794551    -2.10   0.035    -.3229704   -.0115122
                  electric |  -.2343255   .1153132    -2.03   0.042    -.4603353   -.0083157
                        tv |  -.1371643   .1827466    -0.75   0.453    -.4953411    .2210126
                     _cons |   -2.83005   .6344204    -4.46   0.000    -4.073491   -1.586609
              ------------------------------------------------------------------------------
              Instrumented:  educ7
              Instruments:   age c.age#c.age evermarr urban electric tv frsthalf
              
              .
              . *PROCEDURE 18.1
              
              . ivregress 2sls children (educ7= pr) c.age##c.age evermarr urban electric tv
              
              Instrumental variables (2SLS) regression          Number of obs   =      4,358
                                                                Wald chi2(7)    =    4985.56
                                                                Prob > chi2     =     0.0000
                                                                R-squared       =     0.4893
                                                                Root MSE        =     1.5877
              
              ------------------------------------------------------------------------------
                  children |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                     educ7 |  -1.974509   .3314743    -5.96   0.000    -2.624186   -1.324831
                       age |    .252137   .0194179    12.98   0.000     .2140785    .2901954
                           |
               c.age#c.age |  -.0020734   .0003077    -6.74   0.000    -.0026764   -.0014704
                           |
                  evermarr |    .527485    .067659     7.80   0.000     .3948757    .6600943
                     urban |  -.0797056    .061311    -1.30   0.194    -.1998729    .0404617
                  electric |  -.1171961   .0952452    -1.23   0.219    -.3038733    .0694811
                        tv |   .0789773   .1301417     0.61   0.544    -.1760958    .3340503
                     _cons |  -2.032667   .4115925    -4.94   0.000    -2.839373    -1.22596
              ------------------------------------------------------------------------------
              Instrumented:  educ7
              Instruments:   age c.age#c.age evermarr urban electric tv pr

              I am not quite sure how you extend the procedure to panel data as you probably need to have the country effects in the probit model. However, note that as stated in the book, the usual IV procedure is still valid with a binary endogenous variable, so you may want to consider this if you cannot resolve the issue of how to include the country effects in the probit model. For the second stage, the panel data equivalent of ivregress is xtivreg. You still risk ending up fitting noise with random effects.
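
              One detail worth checking against the code in #1 and #6: the example above instruments with the fitted probability (predict ..., pr), whereas #1 and #6 save the linear index (predict ..., xb). A minimal sketch on the same FERTIL2 data that puts the two side by side, and requests the heteroskedasticity-robust standard errors mentioned in the quoted example, could look like this. The names pr_hat and xb_hat are illustrative only.

              ```stata
              use http://www.stata.com/data/jwooldridge/eacsap/fertil2.dta, clear

              * First-stage probit, as in the example above
              probit educ7 frsthalf c.age##c.age evermarr urban electric tv
              predict pr_hat, pr      // fitted probability, the instrument used above
              predict xb_hat, xb      // linear index, as saved in #1 and #6

              * Fitted probability as the instrument, with robust standard errors
              ivregress 2sls children (educ7 = pr_hat) c.age##c.age evermarr urban electric tv, vce(robust)

              * Linear index as the instrument, for comparison only
              ivregress 2sls children (educ7 = xb_hat) c.age##c.age evermarr urban electric tv, vce(robust)
              ```

              In both runs the probit prediction enters only as an instrument for educ7, not as a replacement regressor.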
              Last edited by Andrew Musau; 25 Sep 2022, 13:29.

              • #8
                Dear Andrew,

                Thank you for the response. I'll try and see if I get it working within my thesis deadline.

                Best,
                David

                • #9
                  I don’t have my book in front of me but hopefully I didn’t say that you have to use something like probit in the first stage. It might help with efficiency. As Andrew said, you can use interactions of exogenous variables as IVs for any interaction that includes an endogenous variable. In Chapter 9 of my book I don’t think I say the EEV can’t be binary.

                  And now I see Andrew, as usual, has a clear discussion.

                  Definitely do not put in country FEs in the probit. Use correlated RE probit if you want to use that approach.
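
                  One common way to set up a correlated RE probit is the Mundlak device: add the within-panel means of the time-varying covariates to a probit without country dummies. A rough sketch under stated assumptions follows; pair_id is a hypothetical country-pair identifier, the m_ prefix and tt_d_pr are illustrative names, the variable names come from #1, and whether this matches the exact specification intended above is an assumption. The data are assumed to be xtset already, and the /// continuations require a do-file.

                  ```stata
                  * ASSUMPTION: pair_id is a hypothetical country-pair identifier
                  local controls ln_Openness_1 ln_GDP_pc_1 ln_Telephone_1 RoL_1 ln_Gov_exp_1 ///
                      ln_GDP_Growth_1 ln_av_CPI_5 ln_M2_growth_1

                  * Within-pair means of the time-varying covariates (Mundlak terms)
                  foreach v of varlist treaties_s `controls' {
                      bysort pair_id: egen m_`v' = mean(`v')
                  }

                  * Probit with Mundlak terms instead of country dummies
                  probit tt_d comlang_off colony treaties_s `controls' m_* i.year, vce(cluster pair_id)
                  predict tt_d_pr, pr

                  * Fitted probability used as the instrument; second stage as in #6
                  xtivreg ln_FDI (tt_d = tt_d_pr) `controls' i.year i.Country_e i.cou_e, re
                  ```

                  In an unbalanced panel the means should be computed over the estimation sample only, which this sketch does not enforce.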
