Hello!
I am modeling predictive factors that affect aid distribution patterns. I have a cross-sectional time series dataset. Explanatory variables, with the exception of two, are largely time variant. The dependent variable is heavily zero-inflated, which means that traditional OLS is not appropriate. I am thus left with several options, including Tobit, Heckman, two-part, and Poisson Pseudo Maximum Likelihood (PPML). I largely lean towards the PPML model in this context.
I also follow the theoretical justification that aid is a two-part process: selection (where), then allocation (how much). The Heckman model is largely unsuitable as there is no independence between the two equations, and I estimate the two models with the same set of covariates, which means that identification rests solely on the nonlinearity of the inverse Mills ratio (IMR).
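To make the comparison concrete, this is how I understand the two specifications. It is only a sketch in my own notation: $y_{it}$ (aid to recipient $i$ in year $t$), $x_{it}$ (the shared covariate vector), and $\beta$ are symbols I am introducing for illustration. A two-part model factors the conditional mean of a non-negative outcome as

```latex
\[
E[y_{it} \mid x_{it}]
  = \Pr(y_{it} > 0 \mid x_{it}) \cdot E[y_{it} \mid y_{it} > 0,\, x_{it}],
\]
% PPML estimated on the full sample (zeros retained) instead models
\[
E[y_{it} \mid x_{it}] = \exp(x_{it}'\beta).
\]
```

If that is right, keeping the zeros in the PPML step means its coefficients describe the overall mean of aid, not the mean among recipients only, which is part of what I am unsure about.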
Thus, can someone explain to me, or recommend some papers on, why a two-part Probit-PPML structure (retaining all zero values in the second, allocation step) would be preferred to a traditional Logit-OLS structure (modeling only the positive outcomes in the second, allocation step)? It would also be really helpful if you could suggest other statistical diagnostics I could run to test model robustness (e.g. choosing between fixed and random effects, heteroscedasticity, a null test in this regard, etc.). I understand how to do this with traditional OLS, but am unsure in the context of a Poisson distribution.
Thank you so much for your help!
Kind regards,
Andy
My code:
* PPML with year fixed effects and heteroskedasticity-robust standard errors
foreach dv in odaLike oofLike total {
    ppmlhdfe `dv' ln_ungaVoting taiwan ln_oresMetalsReal ln_mineralProduction lag_democracy lag_corruptionControl lag_polStability lag_ln_debtGDP lag_ln_gdpCapita lag_ln_population, absorb(year) vce(robust)
    * Exponentiate each coefficient to report it as an incidence rate ratio
    di "Incidence Rate Ratios for model with dependent variable `dv':"
    foreach var in ln_ungaVoting taiwan ln_oresMetalsReal ln_mineralProduction lag_democracy lag_corruptionControl lag_polStability lag_ln_debtGDP lag_ln_gdpCapita lag_ln_population {
        * _b[] reads the coefficient from the most recent estimation
        scalar irr_`var' = exp(_b[`var'])
        di "`var': " irr_`var'
    }
}
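* Probit selection step: indicator for receiving any aid
foreach dv in odaLike oofLike total {
    * Stata treats missing as larger than any number, so exclude missings explicitly
    gen byte `dv'_binary = `dv' > 0 if !missing(`dv')
}
* xtprobit fits a random-effects probit; the panel must be declared with xtset beforehand
foreach dv in odaLike_binary oofLike_binary total_binary {
    xtprobit `dv' ln_ungaVoting taiwan ln_oresMetalsReal ln_mineralProduction lag_democracy lag_corruptionControl lag_polStability lag_ln_debtGDP lag_ln_gdpCapita lag_ln_population i.year
}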
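For reference, the Logit-OLS alternative I am comparing against would look roughly like the following. This is only a sketch of what I have in mind, not something I have settled on: the local macro covars and the ln_`dv' variables are hypothetical helpers introduced here, and the pooled logit/regress with year dummies and robust standard errors is an assumption chosen to mirror the PPML specification.

```stata
* Sketch only: covars and ln_`dv' are hypothetical helpers, not part of my code above
local covars ln_ungaVoting taiwan ln_oresMetalsReal ln_mineralProduction lag_democracy lag_corruptionControl lag_polStability lag_ln_debtGDP lag_ln_gdpCapita lag_ln_population

foreach dv in odaLike oofLike total {
    * Part 1 (selection): any aid versus none, using the indicator created in the probit step
    logit `dv'_binary `covars' i.year, vce(robust)

    * Part 2 (allocation): log of aid, defined for positive outcomes only
    gen double ln_`dv' = ln(`dv') if `dv' > 0 & !missing(`dv')
    regress ln_`dv' `covars' i.year, vce(robust)
}
```

Observations with zero aid have a missing ln_`dv' and so drop out of the second step automatically, which is the key difference from the PPML step that keeps them.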