  • Heckman selection with IV for panel data: Use of two separate inverse mills ratios in level (output) equation

    Dear Statalisters,

    I am trying to estimate how learning experience (the variable "HWB") affects task performance (the continuous variable "performance"). HWB is endogenous, so I use IV regression to address the endogeneity. I first estimate the inverse Mills ratio (IMR) from a probit regression whose dependent variable is "worked", which indicates whether the worker worked in that hourly slot. I then include the IMR in the main equation to estimate the effect of HWB on performance, as follows:

    xtset, clear

    capture program drop heckman

    program heckman, eclass
    sum worked
    probit worked avgcomp_last HWB controls1
    matrix b1=e(b)
    capture drop IMR
    predict IMR, score

    xtset courier_id
    xi: xtivreg2 performance controls1 controls2 IMR (HWB = HWB_lagday), fe
    matrix b2=e(b)
    matrix coleq b1 = choice
    matrix coleq b2 = level
    matrix b=b2,b1
    ereturn post b
    end

    bootstrap, reps(50) seed(12345) cluster(courier_id) idcluster(newid):heckman
    est sto m1
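
    A note on the IMR step in the program above: after -probit-, -predict, score- returns the generalized residual, which equals the inverse Mills ratio phi(xb)/Phi(xb) only for observations with worked == 1 (for worked == 0 it is -phi(xb)/(1 - Phi(xb))). The following is a minimal sketch of how that could be checked by hand. The variable names xb_worked, score_worked, and imr_worked are hypothetical, and the remaining controls are left out for brevity.

    ```stata
    * Sketch only: compare -predict, score- with a hand-computed inverse Mills ratio.
    * Other first-stage controls are omitted here; add them as in the real probit.
    probit worked avgcomp_last HWB

    * Linear index from the selection equation
    predict double xb_worked, xb

    * Generalized residual (what the program above stores as IMR)
    predict double score_worked, score

    * Textbook inverse Mills ratio for the selected (worked == 1) sample
    generate double imr_worked = normalden(xb_worked) / normal(xb_worked)

    * The two should coincide wherever worked == 1
    assert reldif(score_worked, imr_worked) < 1e-6 if worked == 1
    summarize score_worked imr_worked if worked == 1
    ```

    Because "performance" is only observed when worked == 1, the two definitions give the same regressor in the outcome equation; the difference would only matter if rows with worked == 0 entered that equation.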

    However, one of my DVs, "performance1", can only be observed when the variable "stockout_reqsub" == 1, so there is a second selection issue. I cannot find a proper way to deal with this. My question is: should I add a second probit regression,
    sum stockout_reqsub
    probit stockout_reqsub controls3
    matrix b3=e(b)
    capture drop IMR2
    predict IMR2, score

    and then use both IMR (from the "worked" equation) and IMR2 (from the "stockout_reqsub" equation) in the final equation?
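
    For concreteness, the proposal can be written out roughly as follows. The notation (z_1, z_2 for the probit regressors, x for the controls, alpha_i for the courier fixed effect, lambda_1 and lambda_2 for IMR and IMR2) is introduced here for illustration only and does not come from the code above.

    ```latex
    \begin{aligned}
    \text{worked}_{it} &= \mathbf{1}\!\left[z_{1,it}'\gamma_1 + u_{1,it} > 0\right], &
    \lambda_{1,it} &= \frac{\phi\!\left(z_{1,it}'\hat\gamma_1\right)}{\Phi\!\left(z_{1,it}'\hat\gamma_1\right)}, \\
    \text{stockout\_reqsub}_{it} &= \mathbf{1}\!\left[z_{2,it}'\gamma_2 + u_{2,it} > 0\right], &
    \lambda_{2,it} &= \frac{\phi\!\left(z_{2,it}'\hat\gamma_2\right)}{\Phi\!\left(z_{2,it}'\hat\gamma_2\right)}, \\
    \text{performance1}_{it} &= \beta\,\text{HWB}_{it} + x_{it}'\theta
      + \delta_1\lambda_{1,it} + \delta_2\lambda_{2,it} + \alpha_i + \varepsilon_{it}.
    \end{aligned}
    ```

    The last equation is estimated only on observations with worked == 1 and stockout_reqsub == 1, with HWB instrumented by HWB_lagday. Entering the two correction terms additively like this amounts to assuming that the two selection errors u_1 and u_2 are uncorrelated with each other; that is an assumption of this sketch, not something established above.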

    A sample of my dataset is below:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long order_id float worked double performance float stockout_reqsub long num_item double num_stockouts float(avg_numstockout_reqsub time_dum) byte day_of_week float(avgcomp_last HWB CSF_day precip_hourly)
          . 0 . 0  . .         0 . 2        6 0     9    0
          . 0 . 0  . .  6.333333 . 4      9.3 0     0    0
          . 0 . 0  . .       1.5 . 6     6.64 7 53.55    0
    6847080 1 3 1 16 4 2.3333333 2 5 8.592857 0   5.5    0
          . 0 . 0  . .  7.621622 . 5     7.88 5 45.55 .019
          . 0 . 0  . . 3.0714285 . 6   9.8125 6  50.3    0
          . 0 . 0  . .         0 . 6 8.700001 0  12.9    0
          . 0 . 0  . . 4.2222223 . 4    13.95 0   5.5    0
    5962762 1 0 1 74 1 2.7011495 4 5 8.525001 4 52.25    0
          . 0 . 0  . .  3.029412 . 0     6.64 0     0    0
    5775032 1 . 0 48 0 2.2173913 1 6 7.306667 0 11.64    0
    4603736 1 0 1 37 2 3.3809524 4 3 8.394285 2 29.55    0
          . 0 . 0  . .  1.728395 . 6        7 0     0 .002
          . 0 . 0  . .         0 . 1   6.8375 6 53.45    0
          . 0 . 0  . .         0 . 4     7.73 0  6.14    0
          . 0 . 0  . .  5.457627 . 5 6.897143 5 43.95    0
          . 0 . 0  . .         1 . 6     9.75 0     0    0
          . 0 . 0  . .       4.3 . 0      9.9 1 18.85    0
    6053104 1 2 1 15 2      2.75 2 6        8 0     6    0
          . 0 . 0  . . 2.2857144 . 5      5.5 0     0    0
    5102814 1 . 0  6 0     3.375 4 2      9.3 3 25.04    0
    end
    Last edited by Reeju Guha; 27 Mar 2023, 05:51.

  • #2
    Hi Statalisters,

    I am returning to this question to see if anyone can suggest an approach. As described above, my goal is to estimate how learning experience (the variable "HWB") affects the service quality of the task, measured by the number of items substituted when there is a stockout and the customer requests a substitution (substituted_when_reqd).

    The two sources of selection are: 1) a substitution occurs only when an order has a stockout (has_stockout = 1/0), and 2) an order is delivered only if the worker chooses to work in a given shift/hourly slot (worked = 1/0).

    My approach uses two separate IMRs: IMR comes from a probit for whether an order has a stockout (1/0), and IMR2 comes from a probit for whether the worker chose to work in that shift/slot (1/0).
    My code is below:

    Code:
    xtset, clear
    
    capture program drop heckman1a
      
      program heckman1a, eclass
      preserve
         probit has_stockout avgstockout_other num_item i.time_dum i.day_of_week
         matrix b1=e(b)
         capture drop IMR
         predict IMR, score
    
         probit worked avgcomp_last HWB CSF_day CSF_week precip_hourly precip_day demand_cityslot supply_cityslot work_lag_day
         matrix b2=e(b)
         capture drop IMR2
         predict IMR2, score
         
         xtset courier_id
         xtreg HWB HWB_lagday ln_experience num_item ln_storefamiliarity i.day_of_week i.time_dum CSF_day CSF_week precip_hourly precip_day demand_cityslot supply_cityslot work_lag_day IMR IMR2, fe
         matrix b3=e(b)
         predict double resid1, e
         xtpoisson substituted_when_reqd HWB resid1 ln_experience num_item ln_storefamiliarity i.day_of_week i.time_dum CSF_day CSF_week precip_hourly precip_day demand_cityslot supply_cityslot work_lag_day IMR IMR2, fe 
         matrix b4=e(b)
         matrix coleq b1 = choice1
         matrix coleq b2 = choice2
         matrix coleq b3 = level-first
         matrix coleq b4 = level
         matrix b=b3,b4
         ereturn post b
     restore
     end
    
    bootstrap, reps(2) seed(12345) cluster(courier_id) idcluster(newid1):heckman1a
    est sto m1
    Please let me know whether this approach is correct.
