
  • IV Regression With Interaction Terms and 2+ Instruments

    Dear all,

    Greetings to all contributors from someone new to the forum and a user of Stata 15.1/SE.

    I have a data set of roughly 900 obs and am trying to perform a regression on 17 variables (no panel data) plus their respective interaction terms with one variable. For simplicity, assume I only have 3 main vars X1 (dummy), X2 (dummy) and X3 (continuous) in a regression model that can be expressed in pseudo code as:
    Code:
    Y = b0 + b1 * X1 + b2 * X2 + b3 * X3 + b4 * (X1*X2) + b5 * (X1*X3) + e
    As I have an informed suspicion that my key interaction variable X1 might be endogenous, I have attempted to find an instrument for it. Let's say I have identified Z1 (dummy) and Z2 (continuous) as relevant and valid instruments for X1, so:
    Code:
    X1 = a0 + a1 * Z1 + a2 * Z2 + u
    My understanding is that, if I want the first model to be specified correctly, I have to run an IV regression because of this endogeneity. The options that come to mind are Stata's built-in -ivregress- and the equally intuitive -ivreg2- from SSC by Baum, Schaffer and Stillman.

    Essentially my problem has two facets:
    1. Notwithstanding any background information about my research design: is this a statistically sound approach, i.e. are statistical inferences plausible with this model? I have come across the forbidden regression as described in Wooldridge (2002), Econometric Analysis of Cross Section and Panel Data, section 9.5, esp. pp. 236-7. However, my impression is that my problem differs from the forbidden regression because I suspect endogeneity only in X1 and not in X2 or X3. But maybe I am misinterpreting things here.

    2. Given that the answer to Question 1 is 'yes': how can I implement this in Stata? I have come across, among others, this informative post: https://www.statalist.org/forums/for...eraction-terms, and especially answer #8. However, I cannot see how to execute the code proposed there when I have two instruments. Using -ivreg2-, I imagine this would in its simplest form result in something like:
    Code:
    /* IV Regression With Interaction Terms */
    * This should be the model without endogeneity:
    reg Y X1 X2 X3 X1##X2 X1##c.X3
    
    * Now with IV approach:
    ssc install ivreg2, replace
    ivreg2 Y X2 X3 (X1 X1##X2 X1##c.X3 = Z1 Z2 Z1##X2 c.Z2##X2 Z1##c.X3 c.Z2##c.X3)
    I especially struggle with the part in parentheses after the -ivreg2- command. While this IV approach at least produces output, I am not certain that the results are valid.


    Any insights are greatly appreciated.


    Best,
    Fabio

  • #2
    You'll increase your chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex.

    The normal practice is to calculate the interactions first and then for any interactions that include the endogenous variables, include them in the list of endogenous variables for ivreg or ivreg2.

    There is some question about this: Maurice J. G. Bun & Teresa D. Harrison (2019), OLS and IV estimation of regression models including endogenous interaction terms, Econometric Reviews, 38:7, 814-827, DOI: 10.1080/07474938.2018.1427486



    • #3
      Fabio: I'm not sure what the issue is. I think the command you want is

      Code:
       
       ivregress 2sls Y X2 X3 (X1 c.X1#c.X2 c.X1#c.X3 = Z1 Z2 c.Z1#c.X2 c.Z2#c.X2 c.Z1#c.X3 c.Z2#c.X3)
      I'm actually not sure that ivregress allows factor notation. If not, you need to construct the interactions yourself, preferably centered so that the level effects have meaning.
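
      A minimal sketch of the manual construction mentioned above, in case factor notation is not accepted. The names X2c, X3c, X1X2, Z1X2, etc. are hypothetical, centering X2 and X3 at their sample means is only one possible choice, and vce(robust) is an added assumption rather than part of the original suggestion:

      Code:
      * Center the exogenous variables at their sample means (illustrative choice)
      summarize X2, meanonly
      generate double X2c = X2 - r(mean)
      summarize X3, meanonly
      generate double X3c = X3 - r(mean)

      * Endogenous interactions: X1 times each centered exogenous variable
      generate double X1X2 = X1 * X2c
      generate double X1X3 = X1 * X3c

      * Matching instruments: each Z times each centered exogenous variable
      generate double Z1X2 = Z1 * X2c
      generate double Z2X2 = Z2 * X2c
      generate double Z1X3 = Z1 * X3c
      generate double Z2X3 = Z2 * X3c

      * 2SLS with three endogenous regressors and six excluded instruments
      ivregress 2sls Y X2c X3c (X1 X1X2 X1X3 = Z1 Z2 Z1X2 Z2X2 Z1X3 Z2X3), vce(robust)

      With X2 and X3 centered, the coefficient on X1 can be read as the effect of X1 at the sample means of X2 and X3.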

      The variables X1, X1#X2, X1#X3 are all endogenous. So you need IVs for all of them. And the interactions are the very natural IVs to use.

      JW



      • #4
        Many thanks to both of you, Phil and Jeff!


        Originally posted by Jeff Wooldridge
        I think the command you want is

        Code:
        ivregress 2sls Y X2 X3 (X1 c.X1#c.X2 c.X1#c.X3 = Z1 Z2 c.Z1#c.X2 c.Z2#c.X2 c.Z1#c.X3 c.Z2#c.X3)
        As per my first post, I have tried virtually the same approach you proposed, but with the -ivreg2- command instead of -ivregress-. It worked, in the sense that it produced output that looks reasonable after double-checking.

        I was just wondering whether the code in that form takes care of every aspect of the problem (i.e. correct standard errors, centered interaction terms). The impression I get from your answer is that this is a reasonable way of coding my model (i.e. putting everything in one -ivreg2- command vs. constructing the interaction terms manually first).
        Plus, given the absence of any theoretical objections in your two responses, I assume this is generally a valid way to go about this from a methodological point of view.


        Originally posted by Jeff Wooldridge
        The variables X1, X1#X2, X1#X3 are all endogenous. So you need IVs for all of them. And the interactions are the very natural IVs to use.

        I understand that every interaction term that contains X1 is endogenous. I believe what you mean is that given Z1 and Z2 are relevant and valid instruments for X1, I should also use them as instruments for the interaction terms with X1 (which is also reflected in your code example). Am I understanding this part correctly?


        Best



        • #5
          Yes, that's correct. In fact, using the result on optimal instruments, if it happens that E(X1|X2,X3,Z1,Z2) is linear and the structural error is homoskedastic, you can show that using the interactions is the optimal choice of IVs. It is also legitimate to interact first-stage fitted values for X1 with X2 and X3 and use those as IVs (not as regressors; that would be the forbidden regression!).
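
          A minimal sketch of the fitted-value approach described above, under stated assumptions: the first stage is taken to be linear and to include the exogenous X2 and X3, the names X1hat, X1hatX2, X1hatX3, X1X2 and X1X3 are hypothetical, and keeping Z1 and Z2 in the instrument list alongside the fitted-value interactions is one possible choice:

          Code:
          * First stage: linear projection of X1 on the instruments and exogenous variables
          regress X1 Z1 Z2 X2 X3
          predict double X1hat, xb

          * Interact the fitted values with the exogenous variables
          generate double X1hatX2 = X1hat * X2
          generate double X1hatX3 = X1hat * X3

          * Endogenous interactions are built from the actual X1
          generate double X1X2 = X1 * X2
          generate double X1X3 = X1 * X3

          * Fitted-value interactions enter only as instruments, never as regressors
          ivregress 2sls Y X2 X3 (X1 X1X2 X1X3 = Z1 Z2 X1hatX2 X1hatX3)

          Replacing X1, X1X2 and X1X3 with the fitted-value versions in an OLS regression of Y would instead be the forbidden regression.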



          • #6
            Excellent. Thanks to your very helpful input, I am now comfortable enough with my methodology to proceed with my work.

            Thank you very much for your time, Jeff!



            • #7
              Originally posted by Jeff Wooldridge
              Yes, that's correct. In fact, using the result on optimal instruments, if it happens that E(X1|X2,X3,Z1,Z2) is linear and the structural error is homoskedastic, you can show that using the interactions is the optimal choice of IVs. It is also legitimate to interact first-stage fitted values for X1 with X2 and X3 and use those as IVs (not as regressors; that would be the forbidden regression!).
              Hi Jeff:

              Are you aware of any paper where they interacted first-stage fitted values with some other variable like X2 above to use as instruments?
