Dear Statalisters,
I'm looking to correct for selection bias using a Heckman selection model. However, the application is not straightforward, because the first stage is a panel and the second stage is a cross-section.
Given that my instrument is time invariant, I'm using the Wooldridge (1995) approach, estimating a separate probit for each year of the panel. I then include the resulting inverse Mills ratios in the second stage and bootstrap over both stages, following Semykina and Wooldridge (2010), to account for the generated regressor.
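For readers less familiar with the year-by-year first stage, here is a minimal sketch. It assumes a single covariate `x` plus an intercept and a hypothetical row layout `(firm, year, x, selected)`. Each year gets its own probit, fitted here by Newton-Raphson in plain Python rather than with Stata's `probit`. The inverse Mills ratio phi(z'g_t)/Phi(z'g_t) is then computed for the selected firm-years only. The bootstrap over both stages is not shown.

```python
import math
from collections import defaultdict


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills(index):
    # lambda(z'g) = phi(z'g) / Phi(z'g): correction term for selected observations
    return norm_pdf(index) / norm_cdf(index)


def fit_probit_1d(xs, ys, max_iter=100, tol=1e-10):
    """Probit of y on (1, x) by Newton-Raphson; returns (intercept, slope)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            q = 2.0 * y - 1.0                  # +1 if selected, -1 otherwise
            xb = b0 + b1 * x
            lam = q * norm_pdf(q * xb) / norm_cdf(q * xb)
            w = lam * (lam + xb)               # weight in the negative Hessian
            g0 += lam                          # score w.r.t. intercept
            g1 += lam * x                      # score w.r.t. slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step (-H)^{-1} g, written out for the 2x2 case
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def yearly_imr(rows):
    """rows: (firm, year, x, selected). One probit per year; IMR per selected firm-year."""
    by_year = defaultdict(list)
    for row in rows:
        by_year[row[1]].append(row)
    imr = {}
    for year, obs in by_year.items():
        b0, b1 = fit_probit_1d([r[2] for r in obs], [r[3] for r in obs])
        for firm, _, x, selected in obs:
            if selected == 1:
                imr[(firm, year)] = inverse_mills(b0 + b1 * x)
    return imr
```

The resulting dictionary would be merged onto the second-stage cross-section by firm and year. A year in which selection is perfectly predicted (for example, no firm selects) has no finite probit estimate, and this sketch does not guard against that.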
My data are a firm-year panel in which a firm may select into treatment and subsequently drops out of the panel. I would like to include industry and country fixed effects (individual fixed effects are not feasible, given that I only have a cross-section in the second stage). Given the millions of observations in the panel, estimating the probit with these fixed effects as dummies is rather slow. My approach would therefore be to use the Mundlak device in the first stage, demeaning each variable by industry and by country. In the second stage, I would simply use industry, country and year dummies. Is this approach viable?
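To make the first-stage terms concrete, here is a sketch assuming a single covariate `x` and parallel lists of industry and country codes (all names hypothetical). In its textbook form the Mundlak device keeps the original covariate and adds its group means as extra regressors; the demeaned (within) version is included only for comparison. Whether one set of means per grouping adequately proxies for non-nested industry and country effects in a probit is an assumption, not something this code establishes.

```python
from collections import defaultdict


def group_means(values, groups):
    """Mean of `values` within each level of `groups`, mapped back to every row."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [sums[g] / counts[g] for g in groups]


def mundlak_terms(x, industry, country):
    """Original covariate plus its industry and country means (extra probit regressors)."""
    return {
        "x": list(x),
        "x_bar_industry": group_means(x, industry),
        "x_bar_country": group_means(x, country),
    }


def within_transform(values, groups):
    """For comparison only: value minus its group mean (the demeaned variable)."""
    return [v - m for v, m in zip(values, group_means(values, groups))]
```

For example, `mundlak_terms([1.0, 2.0, 3.0, 5.0], ["A", "A", "B", "B"], ["US", "DE", "US", "DE"])` gives industry means `[1.5, 1.5, 4.0, 4.0]` and country means `[2.0, 3.5, 2.0, 3.5]`. In Stata the same columns can be built with, e.g., `bysort industry: egen x_bar_ind = mean(x)` and then entered in `probit` alongside `x`.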
Grateful for any input!
Kind regards