Heckman selection with IV for panel data: Use of two separate inverse mills ratios in level (output) equation

Reeju Guha

Join Date: May 2021

Posts: 14
#1

27 Mar 2023, 05:35

Dear Statalisters,

I am trying to estimate how learning experience (the variable "HWB") affects task performance (the continuous variable "performance"). HWB is endogenous, so I use IV regression to address the endogeneity. I estimate the inverse Mills ratio (IMR) from a probit regression where the DV is "worked", which indicates whether the worker worked in that hourly slot. I then include the IMR in the main equation to estimate the effect of HWB on performance, as follows:

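xtset, clear

capture program drop heckman

* Two-step estimator wrapped in an eclass program so that -bootstrap- resamples both steps
program heckman, eclass
* Step 1: selection (choice) equation for whether the courier worked in the slot
sum worked
probit worked avgcomp_last HWB controls1
matrix b1=e(b)
capture drop IMR
* -predict, score- returns the probit score, used here as the selection-correction term
predict IMR, score

* Step 2: level equation with courier fixed effects; HWB is instrumented by HWB_lagday
xtset courier_id
xi: xtivreg2 performance controls1 controls2 IMR (HWB = HWB_lagday), fe
matrix b2=e(b)
* Label the two coefficient vectors and post them together
matrix coleq b1 = choice
matrix coleq b2 = level
matrix b=b2,b1
ereturn post b
end

* Cluster bootstrap over couriers
bootstrap, reps(50) seed(12345) cluster(courier_id) idcluster(newid): heckman
est sto m1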
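
As a cross-check on the selection term, the variable produced by `predict IMR, score` could also be built by hand. After a probit, the score with respect to the linear index equals the inverse Mills ratio φ(xb)/Φ(xb) for observations with worked == 1 and −φ(xb)/(1−Φ(xb)) for observations with worked == 0. The following is only a minimal sketch: the names xb_w, IMR_manual, gres_manual and IMR_score are hypothetical and not part of the code above, and controls1 is the same placeholder as above, so it has to be replaced by the actual variable list before running.

```stata
* Sketch only: xb_w, IMR_manual, gres_manual and IMR_score are hypothetical names
probit worked avgcomp_last HWB controls1
predict double xb_w if e(sample), xb

* Inverse Mills ratio phi(xb)/Phi(xb), the term that applies when worked == 1
generate double IMR_manual = normalden(xb_w) / normal(xb_w)

* Generalized residual: IMR for worked == 1, -phi(xb)/(1 - Phi(xb)) for worked == 0
generate double gres_manual = cond(worked == 1, IMR_manual, -normalden(xb_w) / normal(-xb_w)) if !missing(xb_w)

* Score-based version for comparison with the two hand-built variables
predict double IMR_score if e(sample), score
summarize IMR_manual gres_manual IMR_score
```

If the reasoning above holds, gres_manual and IMR_score should agree, while IMR_manual should match them only on the worked == 1 observations that enter the level equation.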

However, one of my DVs, "performance1", is observed only when the variable "stockout_reqsub"==1, so there is a second selection issue here. I cannot find a proper way to deal with this. My question is: should I add another probit regression,
sum stockout_reqsub
probit stockout_reqsub controls3
matrix b3=e(b)
capture drop IMR2
predict IMR2, score

and then include both IMR (from the "worked" equation) and IMR2 (from the "stockout_reqsub" equation) in the final equation?
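
Written out, the specification being asked about would look roughly like the following. This is a sketch under an assumption that is not stated above, namely that the errors of the two probit equations are independent of each other. The symbols $w_{it}$ (regressors of the "worked" probit), $s_{it}$ (controls3), $x_{it}$ (controls1 and controls2), $\alpha_i$ (courier fixed effect) and the coefficients $\gamma_w$, $\gamma_s$, $\delta_1$, $\delta_2$ are notation introduced here for illustration only. If the two selection errors are correlated, the correction terms come from a bivariate normal distribution and are no longer simply the two univariate ratios.

```latex
\begin{aligned}
\lambda_{1,it} &= \frac{\phi(w_{it}'\hat{\gamma}_w)}{\Phi(w_{it}'\hat{\gamma}_w)},
\qquad
\lambda_{2,it} = \frac{\phi(s_{it}'\hat{\gamma}_s)}{\Phi(s_{it}'\hat{\gamma}_s)}, \\
\text{performance1}_{it} &= \beta\,\text{HWB}_{it} + x_{it}'\theta
  + \delta_1 \lambda_{1,it} + \delta_2 \lambda_{2,it} + \alpha_i + \varepsilon_{it},
\end{aligned}
```

with $\text{HWB}_{it}$ instrumented by HWB_lagday and the level equation estimated on observations with worked == 1 and stockout_reqsub == 1.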

My dataset is as below:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long order_id float worked double performance float stockout_reqsub long num_item double num_stockouts float(avg_numstockout_reqsub time_dum) byte day_of_week float(avgcomp_last HWB CSF_day precip_hourly)
. 0 . 0 . . 0 . 2 6 0 9 0
. 0 . 0 . . 6.333333 . 4 9.3 0 0 0
. 0 . 0 . . 1.5 . 6 6.64 7 53.55 0
6847080 1 3 1 16 4 2.3333333 2 5 8.592857 0 5.5 0
. 0 . 0 . . 7.621622 . 5 7.88 5 45.55 .019
. 0 . 0 . . 3.0714285 . 6 9.8125 6 50.3 0
. 0 . 0 . . 0 . 6 8.700001 0 12.9 0
. 0 . 0 . . 4.2222223 . 4 13.95 0 5.5 0
5962762 1 0 1 74 1 2.7011495 4 5 8.525001 4 52.25 0
. 0 . 0 . . 3.029412 . 0 6.64 0 0 0
5775032 1 . 0 48 0 2.2173913 1 6 7.306667 0 11.64 0
4603736 1 0 1 37 2 3.3809524 4 3 8.394285 2 29.55 0
. 0 . 0 . . 1.728395 . 6 7 0 0 .002
. 0 . 0 . . 0 . 1 6.8375 6 53.45 0
. 0 . 0 . . 0 . 4 7.73 0 6.14 0
. 0 . 0 . . 5.457627 . 5 6.897143 5 43.95 0
. 0 . 0 . . 1 . 6 9.75 0 0 0
. 0 . 0 . . 4.3 . 0 9.9 1 18.85 0
6053104 1 2 1 15 2 2.75 2 6 8 0 6 0
. 0 . 0 . . 2.2857144 . 5 5.5 0 0 0
5102814 1 . 0 6 0 3.375 4 2 9.3 3 25.04 0
end

Last edited by Reeju Guha; 27 Mar 2023, 05:51.

Reeju Guha

25 Apr 2023, 03:07

Hi Statalisters,

I am returning to this question to see whether anyone can suggest an approach. As described above, my goal is to estimate how learning experience (the variable "HWB") affects the service quality of the task, measured by the number of items substituted when there is a stockout and the customer requests a substitution (substituted_when_reqd).

The two sources of selection are: 1) a substitution can occur only when there is a stockout in an order (has_stockout = 1/0), and 2) an order is delivered only if the worker chooses to work in a given shift/hourly slot (worked = 1/0).

The approach I took is described below:
I compute two separate IMRs: IMR, from a probit for whether an order has a stockout (1/0), and IMR2, from a probit for whether the worker chose to work in that shift/slot (1/0).
My code is below:

Code:

xtset, clear

capture program drop heckman1a

program heckman1a, eclass
    preserve
    * Selection equation 1: does the order have a stockout?
    probit has_stockout avgstockout_other num_item i.time_dum i.day_of_week
    matrix b1=e(b)
    capture drop IMR
    predict IMR, score

    * Selection equation 2: did the courier work in the slot?
    probit worked avgcomp_last HWB CSF_day CSF_week precip_hourly precip_day demand_cityslot supply_cityslot work_lag_day
    matrix b2=e(b)
    capture drop IMR2
    predict IMR2, score

    * First stage: fixed-effects regression of HWB on the instrument HWB_lagday, the controls and both IMRs
    xtset courier_id
    xtreg HWB HWB_lagday ln_experience num_item ln_storefamiliarity i.day_of_week i.time_dum CSF_day CSF_week precip_hourly precip_day demand_cityslot supply_cityslot work_lag_day IMR IMR2, fe
    matrix b3=e(b)
    * Idiosyncratic first-stage residual, used as the control function
    predict double resid1, e
    * Second stage: fixed-effects Poisson with HWB, the first-stage residual and both IMRs
    xtpoisson substituted_when_reqd HWB resid1 ln_experience num_item ln_storefamiliarity i.day_of_week i.time_dum CSF_day CSF_week precip_hourly precip_day demand_cityslot supply_cityslot work_lag_day IMR IMR2, fe
    matrix b4=e(b)
    matrix coleq b1 = choice1
    matrix coleq b2 = choice2
    matrix coleq b3 = level-first
    matrix coleq b4 = level
    * Only the first-stage and level coefficients are posted for the bootstrap
    matrix b=b3,b4
    ereturn post b
    restore
end

* Cluster bootstrap over couriers
bootstrap, reps(2) seed(12345) cluster(courier_id) idcluster(newid1): heckman1a
est sto m1
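
The two estimation steps inside heckman1a amount to a control-function setup, which can be sketched as follows. The notation is introduced here for illustration only: $z_{it}$ stands for HWB_lagday, $x_{it}$ for the controls shared by both steps, $\hat{v}_{it}$ for resid1, $y_{it}$ for substituted_when_reqd, and $c_i$ and $\alpha_i$ for the courier fixed effects in each step.

```latex
\begin{aligned}
\text{HWB}_{it} &= \pi z_{it} + x_{it}'\eta + \kappa_1\,\text{IMR}_{it} + \kappa_2\,\text{IMR2}_{it} + c_i + v_{it}, \\
\mathrm{E}\!\left[y_{it} \mid \cdot\,\right] &= \exp\!\left(\beta\,\text{HWB}_{it} + \rho\,\hat{v}_{it} + x_{it}'\theta
  + \delta_1\,\text{IMR}_{it} + \delta_2\,\text{IMR2}_{it} + \alpha_i\right).
\end{aligned}
```

In a setup of this kind, the coefficient $\rho$ on resid1 is commonly read as a check on the endogeneity of HWB, and because $\hat{v}_{it}$, IMR and IMR2 are all estimated, the standard errors rely on bootstrapping every step together, as the program above does.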

Please let me know whether this approach is correct.
