Hi all,
More of a conceptual, econometric question than a data-driven question.
I have a panel dataset and want to run regressions with the total number of new job positions for the next recruitment period (the next month) as the dependent variable. The dependent variable is a count and varies by survey respondent, region, and month.
The dependent variable itself is a rowtotal of vacant and filled positions.
The issue is that, in our survey, the question producing the two abovementioned variables was optional and was only filled out by respondents who wished to recruit additional workers. We therefore have many missing values in this variable, which were recoded as zeros.
This configuration looks like a classic Heckman situation, with the observability of a strictly positive value for the dependent variable being endogenous and a function of selection.
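For concreteness, here is the textbook version of the setup I have in mind. The notation is my own, and it assumes a linear outcome equation with jointly normal errors, so it ignores both the count nature of my variable and the fixed effects:

```latex
% Selection equation: respondent i answers the question in month t
s_{it} = \mathbf{1}\left[ z_{it}'\gamma + u_{it} > 0 \right], \qquad u_{it} \sim N(0, 1)

% Outcome equation: y_{it} is observed only when s_{it} = 1
y_{it} = x_{it}'\beta + \varepsilon_{it}, \qquad
\operatorname{Var}(\varepsilon_{it}) = \sigma_\varepsilon^2, \quad
\operatorname{Corr}(u_{it}, \varepsilon_{it}) = \rho

% Conditional mean in the selected (strictly positive) subsample
E\left[ y_{it} \mid x_{it}, z_{it}, s_{it} = 1 \right]
  = x_{it}'\beta + \rho\,\sigma_\varepsilon\,\lambda(z_{it}'\gamma),
\qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}
```

Under these assumptions, a regression on the selected subsample omits the inverse Mills ratio term $\lambda(z_{it}'\gamma)$, which is harmless only if $\rho = 0$ or if that term is uncorrelated with $x_{it}$.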
My question is:
- What would be the consequence of removing the zeros and running either nonlinear or linear models with multiple fixed-effect vectors on the trimmed sample?
I know that ignoring the problem of selection and running OLS on the entire sample yields bias (e.g. Johnston and DiNardo, 1997). However, what happens if we only consider the strictly positive subset of the data? I have been unable to find literature on this topic...
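To illustrate what I mean by the trimmed sample, here is a minimal simulated sketch. It is not my actual data: the outcome is continuous rather than a count, there are no fixed effects, and all parameter values (`beta`, `rho`, the 0.5 coefficient in the selection rule) are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta = 1.0   # true slope in the outcome equation (hypothetical value)
rho = 0.7    # correlation between selection and outcome errors (hypothetical value)

x = rng.normal(size=n)
u = rng.normal(size=n)                                    # selection error
e = rho * u + np.sqrt(1 - rho**2) * rng.normal(size=n)    # outcome error, corr(u, e) = rho

selected = (0.5 * x + u) > 0    # respondent answers the question (hypothetical rule)
y = 1.0 + beta * x + e          # latent outcome, observed only when selected


def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


slope_full = ols_slope(x, y)                          # infeasible benchmark: latent outcome for everyone
slope_selected = ols_slope(x[selected], y[selected])  # regression on the trimmed sample only

print("slope, full latent sample:", slope_full)
print("slope, selected subsample:", slope_selected)
```

In this toy setup the slope estimated on the selected subsample falls below the true value, because the mean of the outcome error among selected observations varies with `x`. Whether the same logic carries over to count models with several sets of fixed effects is exactly what I am unsure about.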
