  • Interval Regression with Instrumental Variable

    Hello!

    I am trying to estimate the causal effect of certain individual characteristics on the outcome "net wage". My outcome variable is not measured continuously, but in 9 intervals/bins (which are censored at the lower and upper bound):

    Code:
    tab Outcome
    
        Outcome |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |        357       13.02       13.02
              2 |        330       12.04       25.05
              3 |        455       16.59       41.65
              4 |        562       20.50       62.14
              5 |        472       17.21       79.36
              6 |        310       11.31       90.66
              7 |        192        7.00       97.67
              8 |         52        1.90       99.56
              9 |         12        0.44      100.00
    ------------+-----------------------------------
          Total |      2,742      100.00
    I first estimated it with OLS as a baseline, then tried the command "intreg":

    Code:
    recode Outcome (1=.)(2=1)(3=2)(4=3)(5=4)(6=5)(7=6)(8=7)(9=8), gen(Outcome1)
    (2742 differences between Outcome and Outcome1)
    
    . recode Outcome (9=.), gen(Outcome2)
    (12 differences between Outcome and Outcome2)
    
    . intreg Outcome1 Outcome2 Treatment, vce(robust)
    
    Fitting constant-only model:
    
    Iteration 0:   log pseudolikelihood = -5373.7754  
    Iteration 1:   log pseudolikelihood = -5336.1702  
    Iteration 2:   log pseudolikelihood = -5336.0182  
    Iteration 3:   log pseudolikelihood = -5336.0182  
    
    Fitting full model:
    
    Iteration 0:   log pseudolikelihood = -5373.2751  
    Iteration 1:   log pseudolikelihood = -5335.7403  
    Iteration 2:   log pseudolikelihood = -5335.5894  
    Iteration 3:   log pseudolikelihood = -5335.5894  
    
    Interval regression                             Number of obs     =      2,691
                                                       Uncensored     =          0
                                                       Left-censored  =        352
                                                       Right-censored =         12
                                                       Interval-cens. =      2,327
    
                                                    Wald chi2(1)      =       0.82
    Log pseudolikelihood = -5335.5894               Prob > chi2       =     0.3639
    
    ------------------------------------------------------------------------------
                 |               Robust
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       Treatment |  -.0980528   .1080018    -0.91   0.364    -.3097325    .1136268
           _cons |   3.418425   .0991997    34.46   0.000     3.223997    3.612853
    -------------+----------------------------------------------------------------
        /lnsigma |   .6866218   .0147144    46.66   0.000     .6577822    .7154614
    -------------+----------------------------------------------------------------
           sigma |   1.986992   .0292373                      1.930506     2.04513
    ------------------------------------------------------------------------------

    Now I'd like to estimate the regression, instrumenting for my "Treatment" variable with another variable. Is it possible to use IV with interval regression in Stata?

    Thank you for any help!

  • #2
    Check out the user written command -cmp- by David Roodman.
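    cmp is not part of official Stata, so it has to be installed before use. A minimal sketch, assuming an internet connection and that the versions on SSC are the ones wanted:

    ```stata
    * Install cmp and the ghk2 package it depends on from SSC.
    ssc install ghk2, replace
    ssc install cmp, replace

    * Define the global macros ($cmp_probit, $cmp_int, ...) used in indicators().
    cmp setup

    * Read the documentation for the list of supported model types.
    help cmp
    ```

    The entries in indicators() are matched to the equations in the order the equations are written, one entry per equation.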


    • #3
      Thank you for the advice! I checked it out and gave it a try - does the following use of the command look reasonable to you?

      "Treatment" is a dummy variable, which is why I specified $cmp_probit for its equation in the indicators() option.

      Code:
      cmp (Treatment = Instrument) (Outcome1 Outcome2 = Treatment), indicators($cmp_probit $cmp_int) 
      
      Fitting individual models as starting point for full model fit.
      Note: For programming reasons, these initial estimates may deviate from your specification.
            For exact fits of each equation alone, run cmp separately on each.
      
      Iteration 0:   log likelihood = -1508.6496  
      Iteration 1:   log likelihood = -720.53165  
      Iteration 2:   log likelihood = -694.97762  
      Iteration 3:   log likelihood = -694.68111  
      Iteration 4:   log likelihood = -694.68095  
      Iteration 5:   log likelihood = -694.68095  
      
      Probit regression                               Number of obs     =      3,480
                                                      LR chi2(1)        =    1627.94
                                                      Prob > chi2       =     0.0000
      Log likelihood = -694.68095                     Pseudo R2         =     0.5395
      
      ------------------------------------------------------------------------------
         Treatment |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
        Instrument |   2.589436   .0759961    34.07   0.000     2.440486    2.738386
             _cons |  -.4634362   .0481121    -9.63   0.000     -.557734   -.3691383
      ------------------------------------------------------------------------------
      
      Interval regression                             Number of obs     =      2,691
                                                         Uncensored     =          0
                                                         Left-censored  =        352
                                                         Right-censored =         12
                                                         Interval-cens. =      2,327
      
                                                      LR chi2(1)        =       0.86
      Log likelihood = -5335.5894                     Prob > chi2       =     0.3544
      
      ------------------------------------------------------------------------------
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
         Treatment |  -.0980528    .105876    -0.93   0.354     -.305566    .1094603
             _cons |   3.418425   .0968686    35.29   0.000     3.228566    3.608284
      -------------+----------------------------------------------------------------
          /lnsigma |   .6866218   .0154286    44.50   0.000     .6563823    .7168613
      -------------+----------------------------------------------------------------
             sigma |   1.986992   .0306565                      1.927805    2.047995
      ------------------------------------------------------------------------------
      
      Fitting constant-only model for LR test of overall model fit.
      
      Fitting full model.
      
      Iteration 0:   log likelihood = -6030.7938  
      Iteration 1:   log likelihood = -6029.3513  
      Iteration 2:   log likelihood = -6029.3469  
      Iteration 3:   log likelihood = -6029.3469  
      
      Mixed-process regression                        Number of obs     =      3,480
                                                      LR chi2(2)        =    1629.79
      Log likelihood = -6029.3469                     Prob > chi2       =     0.0000
      
      ------------------------------------------------------------------------------
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
      Treatment    |
        Instrument |   2.589232   .0759633    34.09   0.000     2.440346    2.738117
             _cons |  -.4642361   .0481076    -9.65   0.000    -.5585252   -.3699469
      -------------+----------------------------------------------------------------
      Outcome1     |
         Treatment |  -.2273688   .1421295    -1.60   0.110    -.5059374    .0511999
             _cons |   3.528635    .126102    27.98   0.000     3.281479     3.77579
      -------------+----------------------------------------------------------------
          /lnsig_2 |   .6869851   .0154452    44.48   0.000     .6567129    .7172572
      /atanhrho_12 |   .0811387   .0595483     1.36   0.173    -.0355737    .1978511
      -------------+----------------------------------------------------------------
             sig_2 |   1.987714   .0307007                      1.928443    2.048806
            rho_12 |   .0809611   .0591579                     -.0355587    .1953093
      ------------------------------------------------------------------------------
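
      As a reading aid for the last block of the table: sig_2 and rho_12 are transformations of the ancillary parameters, exp(lnsig_2) and tanh(atanhrho_12). The sketch below only redoes that arithmetic with numbers copied from the output above; it does not refit anything.

      ```stata
      * Values copied from the cmp output above.
      * exp() of /lnsig_2 should be close to the reported sig_2 (1.987714).
      display exp(.6869851)

      * tanh() of /atanhrho_12 should be close to the reported rho_12 (.0809611).
      display tanh(.0811387)
      ```

      rho_12 is the estimated correlation between the errors of the two equations. With a confidence interval that includes zero, this fit gives little evidence that Treatment is endogenous, although that is not proof that it is exogenous.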


      • #4
        You have done well. Yes, you should specify the equation for Treatment as a probit or logit, whichever option cmp offers, so that the model respects the binary nature of Treatment.

        Everything looks fine: the instrument has very strong explanatory power for Treatment. The only disappointment is that Treatment is statistically insignificant in your structural equation of interest.


        • #5
          The insignificance of Treatment was expected! The results are also relatively similar to those from a "naive" 2SLS estimation.
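
          The "naive" 2SLS command is not shown in this thread. A hedged guess at what such a comparison might look like, assuming the 1-9 category codes in Outcome are simply treated as a continuous dependent variable:

          ```stata
          * Assumption: the binned outcome is treated as if it were continuous,
          * and the binary Treatment gets a linear first stage.
          ivregress 2sls Outcome (Treatment = Instrument), vce(robust)

          * First-stage diagnostics for the instrument.
          estat firststage
          ```

          This ignores both the interval censoring of the outcome and the binary nature of Treatment, which is what the cmp specification is meant to handle.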

          Thanks a lot for the friendly and helpful advice!


          • #6
            I need to revive this thread with a question about the approach shown above. This is how I generated the lower and upper bounds of my outcome variable for the interval regression:

            Code:
             
             . recode Outcome (1=.)(2=1)(3=2)(4=3)(5=4)(6=5)(7=6)(8=7)(9=8), gen(Outcome1) (2742 differences between Outcome and Outcome1)  . recode Outcome (9=.), gen(Outcome2) (12 differences between Outcome and Outcome2)
            I basically used the ordered category description, ranging from 1 to 9. The data is on wages and is measured in intervals, with left- and right-censoring (1 = < 500$, 2 = 500-600$, ..., 9 = > 1500$). Now when trying to interpret my results, I realized what I did seems to make no sense. As far as I understand, I rather need to create variables for the lower and upper bound of the interval for each observation with the concrete $-values of those bounds (the open ends indicated by missing values), which I tried now:

            Code:
            . recode Outcome (1=.)(2=500)(3=600)(4=700)(5=800)(6=900)(7=1000)(8=1200)(9=15
            > 00), gen(Outcome1)
            (2742 differences between Outcome and Outcome1)
            
            . 
            . recode Outcome (1=500)(2=600)(3=700)(4=800)(5=900)(6=1000)(7=1200)(8=1500)(9
            > =.), gen(Outcome2)
            (2742 differences between Outcome and Outcome2)
            
            . 
            . list Outcome Outcome1 Outcome2 in 1/10 
            
                 +-------------------------------+
                 | Outcome   Outcome1   Outcome2 |
                 |-------------------------------|
              1. |       1          .        500 |
              2. |       2        500        600 |
              3. |       3        600        700 |
              4. |       5        800        900 |
              5. |       3        600        700 |
                 |-------------------------------|
              6. |       5        800        900 |
              7. |       5        800        900 |
              8. |       5        800        900 |
              9. |       4        700        800 |
             10. |       4        700        800 |
                 +-------------------------------+
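
            A few sanity checks on the recoded bounds can catch typos in the recode rules before fitting. This is a minimal sketch, assuming Outcome1 and Outcome2 were created exactly as above:

            ```stata
            * Every closed interval should have a lower bound strictly below its upper bound.
            assert Outcome1 < Outcome2 if !missing(Outcome1, Outcome2)

            * Only category 1 should lack a lower bound, and only category 9 an upper bound.
            assert missing(Outcome1) == (Outcome == 1) if !missing(Outcome)
            assert missing(Outcome2) == (Outcome == 9) if !missing(Outcome)

            * Show the bounds assigned to each category.
            tabstat Outcome1 Outcome2, by(Outcome) statistics(min max)
            ```

            If any assert fails, Stata stops with an error and the recode rules need another look.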

            Then I ran the cmp command with those outcome variables:

            Code:
            . cmp (Treatment = Instrument) (Outcome1 Outcome2 = Treatment), indicators($cm
            > p_probit $cmp_int) 
            
            Fitting individual models as starting point for full model fit.
            Note: For programming reasons, these initial estimates may deviate from your s
            > pecification.
                  For exact fits of each equation alone, run cmp separately on each.
            
            Iteration 0:   log likelihood = -1508.6496  
            Iteration 1:   log likelihood = -720.53165  
            Iteration 2:   log likelihood = -694.97762  
            Iteration 3:   log likelihood = -694.68111  
            Iteration 4:   log likelihood = -694.68095  
            Iteration 5:   log likelihood = -694.68095  
            
            Probit regression                               Number of obs     =      3,480
                                                            LR chi2(1)        =    1627.94
                                                            Prob > chi2       =     0.0000
            Log likelihood = -694.68095                     Pseudo R2         =     0.5395
            
            ------------------------------------------------------------------------------
               Treatment |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
              Instrument |   2.589436   .0759961    34.07   0.000     2.440486    2.738386
                   _cons |  -.4634362   .0481121    -9.63   0.000     -.557734   -.3691383
            ------------------------------------------------------------------------------
            
            Interval regression                             Number of obs     =      2,691
                                                               Uncensored     =          0
                                                               Left-censored  =        352
                                                               Right-censored =         12
                                                               Interval-cens. =      2,327
            
                                                            LR chi2(1)        =       1.39
            Log likelihood = -5365.2625                     Prob > chi2       =     0.2384
            
            ------------------------------------------------------------------------------
                         |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
               Treatment |  -13.57253    11.5116    -1.18   0.238    -36.13485    8.989787
                   _cons |   749.6888    10.5316    71.18   0.000     729.0472    770.3303
            -------------+----------------------------------------------------------------
                /lnsigma |   5.373939   .0157219   341.81   0.000     5.343125    5.404753
            -------------+----------------------------------------------------------------
                   sigma |   215.7109   3.391395                      209.1652    222.4614
            ------------------------------------------------------------------------------
            
            Fitting constant-only model for LR test of overall model fit.
            
            Fitting full model.
            
            Iteration 0:   log likelihood = -6060.7041  
            Iteration 1:   log likelihood = -6059.3455  
            Iteration 2:   log likelihood = -6059.3428  
            Iteration 3:   log likelihood = -6059.3428  
            
            Mixed-process regression                        Number of obs     =      3,480
                                                            LR chi2(2)        =    1629.16
            Log likelihood = -6059.3428                     Prob > chi2       =     0.0000
            
            ------------------------------------------------------------------------------
                         |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
            Treatment    |
              Instrument |   2.589299   .0759653    34.09   0.000     2.440409    2.738188
                   _cons |  -.4642754   .0481119    -9.65   0.000     -.558573   -.3699778
            -------------+----------------------------------------------------------------
            Outcome1     |
               Treatment |  -24.61224   15.28673    -1.61   0.107    -54.57368    5.349192
                   _cons |   759.0972   13.57553    55.92   0.000     732.4897    785.7048
            -------------+----------------------------------------------------------------
                /lnsig_2 |   5.374208   .0157336   341.58   0.000      5.34337    5.405045
            /atanhrho_12 |   .0637478   .0581125     1.10   0.273    -.0501507    .1776463
            -------------+----------------------------------------------------------------
                   sig_2 |   215.7688   3.394816                      209.2167    222.5262
                  rho_12 |   .0636616    .057877                     -.0501087    .1758009
            ------------------------------------------------------------------------------

            Is this approach and execution correct now?

            If so, am I right to interpret the result as follows: a one-unit increase in Treatment (that is, Treatment switching from 0 to 1) is associated with a $24.6 decrease in wage?
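
            For reference when reading that coefficient: with the bounds coded in dollars, the Treatment coefficient in the Outcome1 equation is on the dollar scale of the underlying (latent) wage. The sketch below only redoes the arithmetic behind the reported z, p-value, and confidence interval, using numbers copied from the output above:

            ```stata
            * Coefficient and standard error of Treatment, copied from the cmp output.
            scalar b  = -24.61224
            scalar se = 15.28673

            * z statistic and two-sided p-value (reported as -1.61 and 0.107).
            display b / se
            display 2 * normal(-abs(b / se))

            * 95% confidence interval (reported as -54.57368 to 5.349192).
            display b - invnormal(.975) * se
            display b + invnormal(.975) * se
            ```

            Since the interval runs from roughly -$55 to +$5, the point estimate of -$24.6 is not statistically distinguishable from zero at the 5% level.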
