Hi,
I am estimating a model of transitions from low to higher pay. I have a five-year unbalanced panel of working-age individuals, who can at any point be low-paid, higher-paid, unemployed, self-employed or economically inactive. As I am concerned that my results may be affected by non-random attrition between t-1 and t, I would like to correct for possible attrition bias using a Heckman selection model, for which I need a variable that can serve as an instrument for attrition in the selection equation. Several papers I've looked at that deal with this issue have used information about the interviewer (e.g. the interviewer ID, or whether the interviewer changed between t-1 and t)*. Since this information is not available in the dataset I'm using, I have been trying to think of alternative instruments. One option I thought of is the total number of waves over which an individual is observed in my panel. This ranges from 2 to 5 waves (individuals observed only once cannot contribute a pay transition, so they are not part of the regression sample), and I figured that the length of time individuals spend in the panel is likely to be correlated with attrition, but not with the likelihood of making a transition from low to higher pay.
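For reference, the set-up described above can be written as a probit outcome equation combined with a probit selection (retention) equation whose errors are correlated, often called a probit with sample selection. The notation is an illustrative sketch, not taken from the cited papers: r_it = 1 if a person low-paid at t-1 is still observed at t, y_it = 1 if they have moved to higher pay, z_it holds the selection-equation variables and x_it the outcome-equation variables.

```latex
\begin{aligned}
r_{it}^{*} &= \mathbf{z}_{it}'\boldsymbol{\gamma} + u_{it}, & r_{it} &= \mathbf{1}\left[r_{it}^{*} > 0\right] \\
y_{it}^{*} &= \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it}, & y_{it} &= \mathbf{1}\left[y_{it}^{*} > 0\right] \text{ observed only if } r_{it} = 1 \\
\begin{pmatrix} u_{it} \\ \varepsilon_{it} \end{pmatrix} &\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right)
\end{aligned}
```

Attrition is ignorable for the outcome coefficients when rho = 0. The instrument (exclusion restriction) is a variable that enters z_it but not x_it; in this post, that role would be played by the number of waves observed.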
However, as I am a novice at this type of econometric analysis and have not come across any other examples in the literature of researchers using such a variable as an instrument for attrition, I would like to ask a more knowledgeable and experienced audience: would this make a good instrument, or are there potential issues with using it as an instrument that I have overlooked?
I have done some preliminary analysis to check (a) that the instrument is significantly correlated with panel retention, and (b) that, conditional on the other variables in the model, it is not correlated with my dependent variable of interest. Below I show, in turn, how I defined the instrument, the results of a probit regression of panel retention (fretain) on the instrument plus the other variables in the main model, and the results of a probit regression of my dependent variable (fescape) on the instrument plus all the other variables.
To me, these results suggest that the instrument meets the necessary requirements: it is highly significantly correlated with panel retention (z = 45.27) but not with the probability of transitioning from low to higher pay (z = -0.41, p = 0.678). Am I justified in concluding on the basis of these results that this instrument is suitable, or are there additional tests I need to perform?
Code:
* bysort sorts on pidp first; _N is then the number of rows (waves) per individual
bysort pidp: gen nwaves = _N
tab nwaves if escape < .
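     nwaves |      Freq.     Percent        Cum.
------------+-----------------------------------
          2 |        676        5.81        5.81
          3 |      1,037        8.91       14.72
          4 |      2,153       18.50       33.21
          5 |      7,774       66.79      100.00
------------+-----------------------------------
      Total |     11,640      100.00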
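A small self-contained sketch of what the definition above does, on a hypothetical three-person toy panel (the variable `wave` and all values are invented for illustration). It shows that `bysort` sorts before counting, so `_N` is the number of rows per `pidp`, and that `tab nwaves` counts person-wave rows rather than people; tagging one row per person with `egen ... tag()` gives a count of individuals instead.

```stata
* Hypothetical toy panel (run in a fresh session: -clear- drops data in memory).
* pidp = person identifier, wave = survey wave; all values are invented.
clear
input pidp wave
1 1
1 2
1 3
2 1
2 2
3 2
3 3
3 4
3 5
end

* Confirm one row per person-wave, so that _N counts waves, not duplicates.
isid pidp wave

* bysort sorts on pidp (and wave) first; within each pidp group,
* _N is the number of rows, i.e. the number of waves observed.
bysort pidp (wave): gen nwaves = _N

* -tab nwaves- counts person-wave rows (2, 3 and 4 rows here).
tab nwaves

* To count individuals instead, tag one row per pidp and tabulate those.
egen byte onerow = tag(pidp)
tab nwaves if onerow
```

In the toy data the first tabulation reports 9 rows, while the tagged version reports 3 individuals, one for each value of `nwaves`.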
Code:
. probit fretain nwaves explanvar1 explanvar2 $controls2 if lpay1 == 1, nolog
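Probit regression                                 Number of obs   =      10012
                                                  LR chi2(46)     =    6630.25
                                                  Prob > chi2     =     0.0000
Log likelihood = -2951.3742                       Pseudo R2       =     0.5290
-------------------------------------------------------------------------------
     fretain |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
      nwaves |   1.183465    .0261423   45.27    0.000     1.132227    1.234703
[long list of other variables which I'm not showing here in the interest of brevity]
       _cons |  -2.725012    .2327342  -11.71    0.000    -3.181163   -2.268862
-------------------------------------------------------------------------------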
Code:
. probit fescape nwaves explanvar1 explanvar2 $controls2 if lpay1 == 1, nolog
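Probit regression                                 Number of obs   =       6632
                                                  LR chi2(46)     =     525.35
                                                  Prob > chi2     =     0.0000
Log likelihood = -3505.7619                       Pseudo R2       =     0.0697
-------------------------------------------------------------------------------
     fescape |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
      nwaves |  -.0128535    .0310014   -0.41    0.678    -.0736151    .0479082
[long list of other variables which I'm not showing here in the interest of brevity]
       _cons |  -.6540679    .2493312   -2.62    0.009    -1.142748   -.1653878
-------------------------------------------------------------------------------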
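Once an excluded variable has been chosen, a Heckman-type correction for a binary outcome with a binary selection equation can be estimated in Stata with `heckprobit`. The sketch below is a simulated illustration of the mechanics only: the variables `z`, `x`, `retain` and `escape`, the seed and all parameter values are hypothetical, and `z` is valid by construction because it enters the selection equation only.

```stata
* Simulated illustration only (run in a fresh session: -clear- drops data).
* All variable names and parameter values are hypothetical.
clear
set seed 12345
set obs 5000

* z: excluded variable that shifts retention but not the outcome.
* x: regressor that appears in both equations.
generate z = rnormal()
generate x = rnormal()

* Correlated errors (rho = 0.5) make attrition non-random.
matrix C = (1, 0.5 \ 0.5, 1)
drawnorm u e, corr(C)

* Selection: retained between t-1 and t.
generate byte retain = (0.5 + 1.0*z + 0.5*x + u > 0)

* Outcome: transition to higher pay, observed only for those retained.
generate byte escape = (-0.3 + 0.8*x + e > 0) if retain

* Probit with sample selection; z appears only in select() (exclusion restriction).
heckprobit escape x, select(retain = z x) nolog
```

Applied to the variables in this post, an analogous (untested) call might be `heckprobit fescape explanvar1 explanvar2 $controls2 if lpay1 == 1, select(fretain = nwaves explanvar1 explanvar2 $controls2) vce(cluster pidp)`, with `nwaves` appearing only in `select()`. Clustering on `pidp` is one option because individuals can contribute several transitions.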
Many thanks,
Sanne
*See:
Cappellari, L., & Jenkins, S. P. (2004). Modelling Low Pay Transition Probabilities, Accounting for Panel Attrition, Non-Response, and Initial Conditions. CESifo Working Paper Series No. 1232.
Cheng, T., & Trivedi, P. (2014). Attrition Bias in Panel Data: A Sheep in Wolf's Clothing? A Case Study Based on the MABEL Survey. HEDG Working Paper 14/04.
