
  • Interaction with instrumented variable

    I am measuring the impact of distance to the nearest city on employment outcomes.

    Y = b0 + b1*X1 + b2*X2 + b3*X1*X2 + controls + e
    • Y is a binary variable.
    • X1 is a categorical variable (income categories)
    • X2 is a continuous endogenous variable (distance), instrumented by Z.
    I would like to interact the categorical variable with the instrumented variable to see the impact of distance on employment outcomes across income categories. When I run an ivprobit regression with this interaction, the coefficients increase a lot. Am I doing this correctly, or is there a specific command I should use?

    Thank you very much!

  • #2
    You'll increase your chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output and sample data using dataex.

    You don't even tell us exactly what you ran which makes it impossible to tell you if you're doing it correctly. Generally, for interactions with endogenous variables, you need to create the interaction before the estimation and include the interaction among the endogenous variables. If there is endogeneity, then controlling for it should change the parameters.



    • #3
      Dear Phil,

      Thanks for answering. Sorry, I did not report a sample dataset because it's too big. Hope this code will help to understand my question:

      This is the simple ivprobit I am running:

      Code:
       
      
       ivprobit empl age agesq edu gender i.wealth (log_dist = log_dist2), vce(robust)
      
      Fitting exogenous probit model
      
      Iteration 0:   log likelihood = -5964.9268  
      Iteration 1:   log likelihood = -4845.5347  
      Iteration 2:   log likelihood = -4825.4807  
      Iteration 3:   log likelihood = -4825.4677  
      Iteration 4:   log likelihood = -4825.4677  
      
      Fitting full model
      
      Iteration 0:   log pseudolikelihood = -20854.293  
      Iteration 1:   log pseudolikelihood =  -20854.29  
      Iteration 2:   log pseudolikelihood =  -20854.29  
      
      Probit model with endogenous regressors         Number of obs     =      8,868
                                                      Wald chi2(9)      =    2301.30
      Log pseudolikelihood =  -20854.29               Prob > chi2       =     0.0000
      
      -----------------------------------------------------------------------------------------
                              |               Robust
                              |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      ------------------------+----------------------------------------------------------------
                     log_dist |   .2800374    .047623     5.88   0.000     .1866981    .3733768
                          age |   .2348998    .013861    16.95   0.000     .2077328    .2620667
                        agesq |  -.0028701   .0001952   -14.70   0.000    -.0032528   -.0024874
                          edu |   .1254677   .0071261    17.61   0.000     .1115009    .1394346
                       gender |  -.7290411   .0406152   -17.95   0.000    -.8086454   -.6494369
                              |
                   wealth_cat |
                      poorer  |   .0908315   .0452128     2.01   0.045     .0022162    .1794469
                      middle  |   .1695209   .0508201     3.34   0.001     .0699153    .2691264
                      richer  |   .3361315   .0557483     6.03   0.000     .2268668    .4453963
                     richest  |   .3973981   .0584359     6.80   0.000     .2828659    .5119302
                              |
                        _cons |  -5.596588   .2586441   -21.64   0.000    -6.103521   -5.089654
      ------------------------+----------------------------------------------------------------
       corr(e.log_dist,e.empl)|  -.4346023   .0708039                     -.5626107   -.2862428
                sd(e.log_dist)|   1.474843   .0160344                      1.443749    1.506607
      -----------------------------------------------------------------------------------------
      Instrumented:  log_dist
      Instruments:   age agesq edu gender 2.wealth_cat 3.wealth_cat 4.wealth_cat 5.wealth_cat
                     log_dist2
      -----------------------------------------------------------------------------------------
      Wald test of exogeneity (corr = 0): chi2(1) = 28.44       Prob > chi2 = 0.0000

      Then I would like to interact the binary variable (easier) and the instrumented variable to see the impact of distance on employment outcomes by gender:

      Code:
      ivprobit empl age agesq edu i.gender#c.log_dist  i.wealth (log_dist = log_dist2), vce(robust)
      
      Fitting exogenous probit model
      
      Iteration 0:   log likelihood = -5964.9268  
      Iteration 1:   log likelihood = -5123.7107  
      Iteration 2:   log likelihood = -5107.2074  
      Iteration 3:   log likelihood = -5107.1729  
      Iteration 4:   log likelihood = -5107.1729  
      
      Fitting full model
      
      Iteration 0:   log pseudolikelihood = -16489.935  
      Iteration 1:   log pseudolikelihood = -16489.932  
      Iteration 2:   log pseudolikelihood = -16489.932  
      
      Probit model with endogenous regressors         Number of obs     =      8,868
                                                      Wald chi2(9)      =    2202.72
      Log pseudolikelihood = -16489.932               Prob > chi2       =     0.0000
      
      -----------------------------------------------------------------------------------------
                              |               Robust
                              |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      ------------------------+----------------------------------------------------------------
                     log_dist |   .5569142   .1021977     5.45   0.000     .3566104     .757218
                          age |   .1996109   .0172447    11.58   0.000      .165812    .2334099
                        agesq |  -.0024388    .000229   -10.65   0.000    -.0028876   -.0019901
                          edu |   .0855257   .0083696    10.22   0.000     .0691216    .1019298
                              |
            gender#c.log_dist |
                           1  |  -.4911309   .1000764    -4.91   0.000     -.687277   -.2949849
                              |
                   wealth_cat |
                      poorer  |   .0859501   .0418564     2.05   0.040      .003913    .1679871
                      middle  |   .1418404   .0447074     3.17   0.002     .0542155    .2294653
                      richer  |   .2961563   .0478771     6.19   0.000     .2023189    .3899936
                     richest  |   .3370489   .0497717     6.77   0.000     .2394981    .4345998
                              |
                        _cons |  -4.960362   .3857274   -12.86   0.000    -5.716374    -4.20435
      ------------------------+----------------------------------------------------------------
       corr(e.log_dist,e.empl)|  -.5738943   .0843105                     -.7161557   -.3858481
                sd(e.log_dist)|   .8733999   .0176681                      .8394486    .9087244
      -----------------------------------------------------------------------------------------
      Instrumented:  log_dist
      Instruments:   age agesq edu 1.gender#c.log_dist 2.wealth_cat 3.wealth_cat 4.wealth_cat
                     5.wealth_cat log_dist2
      -----------------------------------------------------------------------------------------
      Wald test of exogeneity (corr = 0): chi2(1) = 27.01       Prob > chi2 = 0.0000
      Here we see that the log_dist coefficient increases a lot. Also since it's an instrumented variable, I was not sure I was doing it right.

      As you suggested, I also created the interaction before the estimation and included it among the endogenous variables. However, since I have only one instrument, Stata will not estimate the model.
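
      A possible way around the single-instrument problem is to interact the instrument with gender as well, so that there are two excluded instruments for two endogenous regressors. The sketch below uses linear 2SLS (a linear probability model) only to show the mechanics; whether the same construction is appropriate in -ivprobit- is a separate question. It assumes gender is coded 0/1 and that the wealth variable is wealth_cat, as the output labels suggest; gen_dist and gen_z are hypothetical names.

      ```stata
      * Sketch only: assumes gender is coded 0/1; gen_dist and gen_z are hypothetical names
      generate double gen_dist = gender * log_dist    // hand-built interaction (endogenous)
      generate double gen_z    = gender * log_dist2   // instrument interacted with gender

      * Linear probability model by 2SLS: two endogenous regressors, two excluded instruments
      ivregress 2sls empl age agesq edu i.gender i.wealth_cat ///
          (log_dist gen_dist = log_dist2 gen_z), vce(robust)

      * First-stage diagnostics for instrument strength
      estat firststage
      ```

      The coefficient on gen_dist is then the difference in the distance effect between the two genders, and the model is exactly identified.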

      What would you suggest? Is either of these approaches correct?

      Thanks a lot!



      • #4
        Hi Javier,
        I don't think your setup is correct. I don't think I have seen polynomials of an endogenous variable used as instruments for that same variable. This basically violates the assumption that the instrument is uncorrelated with the error in the outcome equation.
        Assuming your instrument is valid, I have seen previous research, and threads here in the forum for linear regressions, that use something like:
        Code:
        ivprobit y x1 x2 (y2 y2#x2=z1 z2 z3)
        In other words, you need to control for the endogeneity of both the original endogenous variable and the interaction.
        HTH
        Fernando



        • #5
          Hi Fernando,

          Thanks for your answer. The instrument is not a polynomial, it's an exogenous variable and a quite strong instrument. I kept the name like that just for simplicity.

          As I understand, you suggest something like this?
          Code:
           ivprobit empl age agesq edu gender i.wealth (gender#c.log_dist log_dist = log_dist2), vce(robust)
          When I run it, Stata returns an error:

          depvars may not be interactions
          The endogenous variables are incorrectly specified
          Have I understood it correctly?



          • #6
            I see. Sorry for the confusion. Since you named your IV log_dist2, my first impression was that you were using log_dist^2 as the instrument.
            As for your specification: yes, that is what I have seen done before, but only for linear IV.
            This is only a hunch, but what about estimating your baseline model separately for each gender?
            HTH



            • #7
              Sure, I could run separate regressions for each gender but then I can't answer the question I am asking, whether there is any gender-related difference in the -empl-, other things being equal.



              • #8
                Hello Javier,
                Have you or anyone else found a solution to this problem since? I have the same problem (wanting to run ivprobit with interactions of the endogenous regressor, as specified in your code) and I get the same error message from Stata: "depvars may not be interactions
                The endogenous variables are incorrectly specified"

                Thank you very much!



                • #9
                  I do not think that the error message implies any econometric problem with the model. It is just a convention in Stata not to allow dependent variables to be explicit interactions. In 2SLS (two-stage least squares), the endogenous variables are dependent variables in the first-stage regressions. Therefore, just create the interacted variables yourself.

                  Code:
                  sysuse auto, clear
                  regress c.mpg#c.weight turn disp
                  gen mpg_weight= mpg*weight
                  regress mpg_weight turn disp
                  Res.:

                  Code:
                  . regress c.mpg#c.weight turn disp
                  depvar may not be an interaction
                  r(198);
                  
                  . 
                  . gen mpg_weight= mpg*weight
                  
                  . 
                  . regress mpg_weight turn disp
                  
                        Source |       SS           df       MS      Number of obs   =        74
                  -------------+----------------------------------   F(2, 71)        =      1.80
                         Model |   301554849         2   150777425   Prob > F        =    0.1729
                      Residual |  5.9503e+09        71  83806483.6   R-squared       =    0.0482
                  -------------+----------------------------------   Adj R-squared   =    0.0214
                         Total |  6.2518e+09        73  85641303.9   Root MSE        =    9154.6
                  
                  ------------------------------------------------------------------------------
                    mpg_weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                          turn |   37.75763    386.716     0.10   0.922    -733.3322    808.8474
                  displacement |    20.6968   18.52517     1.12   0.268    -16.24134    57.63495
                         _cons |   55145.48   12748.49     4.33   0.000     29725.71    80565.26
                  ------------------------------------------------------------------------------
                  
                  .



                  • #10
                    This approach does not work with -ivprobit-

                    Code:
                    . webuse laborsup
                    
                    . ivprobit fem_work fem_educ kids (other_inc c.other_inc#c.kids = male_educ c.male_educ#c.kids), nolog
                    depvars may not be interactions
                        The endogenous variables are incorrectly specified
                    r(198);
                    However, it works perfectly fine in linear IV:

                    Code:
                    . ivregress 2sls fem_work fem_educ kids (other_inc c.other_inc#c.kids = male_educ c.male_educ#c.kids)
                    
                    Instrumental variables (2SLS) regression          Number of obs   =        500
                                                                      Wald chi2(4)    =     135.82
                                                                      Prob > chi2     =     0.0000
                                                                      R-squared       =     0.2579
                                                                      Root MSE        =     .42905
                    
                    ------------------------------------------------------------------------------------
                              fem_work |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------------+----------------------------------------------------------------
                             other_inc |  -.0215009   .0043956    -4.89   0.000    -.0301161   -.0128858
                                       |
                    c.other_inc#c.kids |   .0023376   .0017589     1.33   0.184    -.0011098    .0057851
                                       |
                              fem_educ |   .0633497   .0073805     8.58   0.000     .0488843    .0778152
                                  kids |  -.1749977   .0892116    -1.96   0.050    -.3498493   -.0001462
                                 _cons |   .8728716   .2404446     3.63   0.000     .4016089    1.344134
                    ------------------------------------------------------------------------------------
                    Instrumented:  other_inc c.other_inc#c.kids
                    Instruments:   fem_educ kids male_educ c.male_educ#c.kids
                    I think that Professor Jeff Wooldridge explained that it does not make sense to mimic the strategy that works with linear -ivregress- and apply it in -ivprobit- (which is not IV; it is a control function approach). Here is the thread:
                    https://www.statalist.org/forums/for...quadratic-term

                    You can mechanically get around the Stata -ivprobit- limitation by generating the variables manually and avoiding factor-variable notation. But my understanding is that Professor Wooldridge explained that the assumptions necessary for this to work are internally inconsistent, i.e., the conditions this approach needs cannot all hold at the same time.

                    Code:
                    . gen otherinckids =  c.other_inc#c.kids
                    
                    . ivprobit fem_work fem_educ kids (other_inc otherinckids = male_educ c.male_educ#c.kids), nolog
                    
                    Probit model with endogenous regressors         Number of obs     =        500
                                                                    Wald chi2(4)      =     174.04
                    Log likelihood = -4607.7924                     Prob > chi2       =     0.0000
                    
                    --------------------------------------------------------------------------------------------------
                                                     |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    ---------------------------------+----------------------------------------------------------------
                                           other_inc |  -.0764133   .0129582    -5.90   0.000    -.1018109   -.0510158
                                        otherinckids |   .0100711   .0058323     1.73   0.084    -.0013599    .0215021
                                            fem_educ |   .2081473   .0277055     7.51   0.000     .1538455    .2624491
                                                kids |  -.6832745   .2934867    -2.33   0.020    -1.258498   -.1080511
                                               _cons |   1.506261   .7630166     1.97   0.048     .0107759    3.001746
                    ---------------------------------+----------------------------------------------------------------
                         corr(e.other_inc,e.fem_work)|   .3919935   .1283349                      .1164238    .6115242
                      corr(e.otherinckids,e.fem_work)|   .2879554    .128871                      .0209125    .5166471
                     corr(e.otherinckids,e.other_inc)|   .8314174   .0138075                      .8023083    .8565813
                                      sd(e.other_inc)|   16.66556   .5270111                      15.66399    17.73116
                                   sd(e.otherinckids)|   38.69173    1.22354                      36.36644     41.1657
                    --------------------------------------------------------------------------------------------------
                    Instrumented:  other_inc otherinckids
                    Instruments:   fem_educ kids male_educ c.male_educ#c.kids
                    --------------------------------------------------------------------------------------------------
                    Wald test of exogeneity: chi2(2) = 7.48                   Prob > chi2 = 0.0237
                    So I managed to "make it work" mechanically, but whether this makes sense as a model is a whole different story.
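
                    For comparison, here is a minimal sketch of a two-step control function estimate on the same dataset. It is not taken from the linked thread and should be read as an assumption about how the control function idea might be applied here: the endogenous regressor gets a single first-stage residual, and the interaction enters the probit directly in factor-variable notation. The name v2hat is hypothetical.

                    ```stata
                    * Sketch only: v2hat is a hypothetical name; the specification is an assumption
                    webuse laborsup, clear

                    * Step 1: reduced form for the endogenous regressor; keep the residuals
                    regress other_inc male_educ fem_educ kids
                    predict double v2hat, residuals

                    * Step 2: probit with the interaction in factor notation, plus the residual
                    probit fem_work c.other_inc##c.kids fem_educ v2hat

                    * The test on v2hat is a simple check of exogeneity of other_inc
                    test v2hat
                    ```

                    The second-step standard errors ignore that v2hat is estimated, so in practice both steps would probably need to be bootstrapped together.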

