
  • Instrumental Variables regression with multiple endogenous variables

    Dear all,

    I want to run an IV regression with two endogenous variables, but I am not sure whether I am doing it correctly and, if so, how to interpret the coefficients. This is my code:

    Code:
    foreach var of varlist lcocaha1 { 
        xi: xtivreg2 `var' lpop ylxlrer        ///
            yearxMuncode* YearInd*Wkmean    /// 
            (ylxlp_cof ylxd_oaxlp_cof = ylxltop3cof ylxd_oaxltop3cof)     ///
            i.year, fe cluster(dept) partial(i.year) first 
            outreg2 using CI_Analysis.xls, se bdec(3) tdec(3) nocons excel 
            }
    Here lcocaha1 is my dependent variable, and ylxlp_cof and ylxd_oaxlp_cof are my two endogenous variables. ylxltop3cof and ylxd_oaxltop3cof are the instruments; all other variables in the code are controls.

    I am using Stata 14.2

    This is my first post on Statalist, so I hope I have stated my question clearly enough. I hope you can help me!

    Best,

    Sophie


  • #2
    You'll increase your chances of a very helpful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex.

    When you are starting out, it is often better to run the regressions separately rather than in a loop. It is a little questionable to switch variables in and out of the regression: if you show that a variable matters in one estimate and then drop it in another, you are likely to introduce omitted-variable bias. You should switch from hand-calculated interactions to factor-variable notation, which lets you use margins more easily (except for the endogenous variables, which you must calculate before running xtivreg2). Also, you no longer need the xi: prefix, at least in xtivreg. As for multiple endogenous variables, this is how you do it: include them all as endogenous in the estimation command.

    I normally save the results using estimates store and then use outreg2 on all the estimates for a given table at once. In your loop, you appear to overwrite the results file repeatedly.
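
The workflow described above might look roughly like the following. This is a sketch only: it assumes the panel is already xtset and that the user-written commands xtivreg2 and outreg2 are installed, and the stored-estimate names (base, inter) are hypothetical. The variable names are those used in this thread.

```stata
* Sketch only. Assumes the data are already xtset and that xtivreg2 and
* outreg2 (user-written commands) are installed.
* The estimate names "base" and "inter" are hypothetical.

* Model 1: one endogenous regressor, run on its own rather than in a loop
xi: xtivreg2 lcocaha1 lpop ylxlrer                ///
    yearxMuncode* YearInd*Wkmean                  ///
    (ylxlp_cof = ylxltop3cof)                     ///
    i.year, fe cluster(dept) partial(i.year) first
estimates store base

* Model 2: two endogenous regressors, two excluded instruments
xi: xtivreg2 lcocaha1 lpop ylxlrer                ///
    yearxMuncode* YearInd*Wkmean                  ///
    (ylxlp_cof ylxd_oaxlp_cof = ylxltop3cof ylxd_oaxltop3cof) ///
    i.year, fe cluster(dept) partial(i.year) first
estimates store inter

* Write both stored models to one table in a single call
outreg2 [base inter] using CI_Analysis.xls, replace ///
    se bdec(3) tdec(3) nocons excel
```

Storing each model first and exporting once keeps every column of the table in a single outreg2 call, so nothing depends on the order in which a loop writes to the file.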



    • #3
      Dear Phil,

      Thank you for your answer. I realize I could have included more detail to make myself clearer. However, I gather from your answer that my code is correct (apart from some coding details that could have made it simpler)?

      Because if so, I would like some advice on the interpretation of the coefficients of my endogenous variables.

      To clarify why I constructed my code this way: the second endogenous variable (ylxd_oaxlp_cof) was constructed by interacting a dummy for a specific region (oa) in my dataset with my main endogenous variable (ylxlp_cof). I did this because I want to know whether my results are driven by this region (oa): the coefficients of my main regression remain negative but turn insignificant when I exclude this region from my baseline regression (which I did as part of my robustness checks).
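
The interaction specification described above can be written out as a sketch. The symbols are notation introduced here for illustration and do not appear in the post: y stands for lcocaha1, x for the main endogenous variable, D for the region-oa dummy (assumed time-invariant, so that D times x corresponds to the interacted endogenous variable), W for the controls, and alpha_i for the fixed effects.

```latex
\begin{aligned}
y_{it} &= \beta_1 x_{it} + \beta_2 \,(D_i \times x_{it}) + \delta' W_{it} + \alpha_i + \varepsilon_{it} \\[4pt]
\frac{\partial y_{it}}{\partial x_{it}} &=
\begin{cases}
\beta_1 & \text{if } D_i = 0 \quad (\text{outside region oa}) \\
\beta_1 + \beta_2 & \text{if } D_i = 1 \quad (\text{inside region oa})
\end{cases}
\end{aligned}
```

Read this way, the coefficient on the main endogenous variable is the effect outside region oa, and the coefficient on the interaction is the difference between the effect inside and outside oa. Unlike a split-sample regression, this form restricts the coefficients on the controls to be the same in both groups.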

      Code:
      * Baseline regression:
      
      foreach var of varlist lcocaha1 {
          xi: xtivreg2 `var' lpop ylxlrer                   ///
              yearxMuncode* YearInd*Wkmean    ///
              (ylxlp_cof = ylxltop3cof)                    ///
              i.year, fe cluster(dept) partial(i.year) first
              }
      
      * Excluding the oa region:
      
      foreach var of varlist lcocaha1 {
          xi: xtivreg2 `var' lpop ylxlrer                   ///
              yearxMuncode* YearInd*Wkmean    ///
              (ylxlp_cof = ylxltop3cof)                    ///
              i.year if r_oa==0, fe cluster(dept) partial(i.year) first   // r_oa is the dummy for region oa
              }
      
      * Test whether the result is driven by region oa or by the lack of variation when region oa is excluded:
      
      foreach var of varlist lcocaha1 {
          xi: xtivreg2 `var' lpop ylxlrer                   ///
              yearxMuncode* YearInd*Wkmean    ///
              (ylxlp_cof ylxd_oaxlp_cof = ylxltop3cof ylxd_oaxltop3cof)     ///
              i.year, fe cluster(dept) partial(i.year) first
              }
      So what I actually want to test is whether my results are driven by region oa, which would explain why the coefficients turn insignificant when it is excluded, or whether excluding this region removes so much variation from my dataset that the coefficients turn insignificant.

      Therefore, my main question is: how should I interpret the coefficients on ylxlp_cof and ylxd_oaxlp_cof?

      I hope this clarifies a bit what I want to achieve. I really hope you can help me.

      Best,

      Sophie
      Last edited by Sophie Robbe; 03 Jan 2018, 10:19.



      • #4
        Hi, I have a theoretical question regarding multiple endogenous variables and IVs. I am looking at the effect of early maternal employment on childhood test scores. I have data on the hours worked by the mother at 9 months and 3 years. The regression looks something like: testscore_t = a + b'X_t + c(hours_{t-4}) + d(hours_{t-6}) + u, where X is a vector of child and maternal characteristics at time t. This is a standard specification in the literature on this topic, but both hours variables are endogenous because of selection into work.
        My question regarding IVs is whether I can use two highly correlated instruments to instrument for hours_{t-4} and hours_{t-6}. My instinct is that instruments for different endogenous variables cannot be correlated, because then instrument 1 does not really provide an exogenous shift to its own variable, as it affects both endogenous variables.

        Some help on this would be much appreciated!

        Kind regards,

        Susana
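
One way to frame the question above is through the first-stage equations, sketched below. The symbols z_1, z_2, the pi and gamma coefficients, and the errors v_1, v_2 are notation introduced here for illustration and are not taken from the post. In 2SLS every excluded instrument enters the first stage of every endogenous variable, so instruments are not assigned one-to-one; what identification requires is a rank condition on the first-stage coefficients.

```latex
\begin{aligned}
\text{hours}_{t-4} &= \pi_{11} z_1 + \pi_{12} z_2 + \gamma_1' X_t + v_1 \\
\text{hours}_{t-6} &= \pi_{21} z_1 + \pi_{22} z_2 + \gamma_2' X_t + v_2 \\[4pt]
\Pi &= \begin{pmatrix} \pi_{11} & \pi_{12} \\ \pi_{21} & \pi_{22} \end{pmatrix},
\qquad
\operatorname{rank}(\Pi) = 2 \iff \pi_{11}\pi_{22} - \pi_{12}\pi_{21} \neq 0
\end{aligned}
```

Under this framing, correlation between z_1 and z_2 does not by itself break identification. The concern is rather that if the two instruments shift both hours variables in nearly the same proportions, the matrix of first-stage coefficients is close to singular and the instruments are jointly weak, even when each looks strong in its own first stage.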

