  • Instrument for attrition to use in Heckman selection model

    Hi,

    I am estimating a model of transitions from low to higher pay. I have a five-year unbalanced panel of working-age individuals who, at any point, can be low-paid, higher-paid, unemployed, self-employed or economically inactive. Because I am concerned that my results may be affected by non-random attrition between t-1 and t, I would like to correct for possible attrition bias using a Heckman selection model. For this I need a variable that can serve as an instrument for attrition in the selection equation.

    Several papers that deal with this issue have used information about the interviewer (e.g. the interviewer ID, or whether the interviewer changed between t-1 and t)*. Since this information is not available in the dataset I'm using, I have been trying to think of alternative instruments. One option is to use the total number of waves over which an individual is observed within my panel. This ranges from 2 to 5 waves, because individuals who are observed only once cannot show a pay transition and so do not form part of the regression sample. My reasoning is that the length of time individuals spend in the panel is likely to be correlated with attrition, but not with their likelihood of making a transition from low to higher pay.

    However, I am a novice at this type of econometric analysis and have not come across any examples in the literature of researchers using such a variable as an instrument for attrition. I would therefore like to check with a more knowledgeable and experienced audience: would this indeed make a good instrument, or are there potential issues with it that I have overlooked?

    I have done some preliminary analysis to check that (a) the instrument is significantly correlated with panel retention, and (b) it is not correlated with my dependent variable of interest once the other variables in the model are controlled for. First of all, I defined my instrument as follows:

    Code:
    bysort pidp: gen nwaves = _N    // _N = number of rows (waves) per individual; bysort sorts by pidp first
    tab nwaves if escape < .
    
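         nwaves |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              2 |        676        5.81        5.81
              3 |      1,037        8.91       14.72
              4 |      2,153       18.50       33.21
              5 |      7,774       66.79      100.00
    ------------+-----------------------------------
          Total |     11,640      100.00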
    Below are the results from a regression of panel retention on the instrument plus the other variables in the main model:

    Code:
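    . probit fretain nwaves explanvar1 explanvar2 $controls2 if lpay1 == 1, nolog

    Probit regression                                 Number of obs   =      10012
                                                      LR chi2(46)     =    6630.25
                                                      Prob > chi2     =     0.0000
    Log likelihood = -2951.3742                       Pseudo R2       =     0.5290

    ------------------------------------------------------------------------------
         fretain |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          nwaves |   1.183465   .0261423    45.27   0.000     1.132227    1.234703
                 |
    [long list of other variables which I'm not showing here in the interest of brevity]
                 |
           _cons |  -2.725012   .2327342   -11.71   0.000    -3.181163   -2.268862
    ------------------------------------------------------------------------------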
    And here are the results from a regression of my dependent variable on the instrument plus all the other variables:

    Code:
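    . probit fescape nwaves explanvar1 explanvar2 $controls2 if lpay1 == 1, nolog

    Probit regression                                 Number of obs   =       6632
                                                      LR chi2(46)     =     525.35
                                                      Prob > chi2     =     0.0000
    Log likelihood = -3505.7619                       Pseudo R2       =     0.0697

    ------------------------------------------------------------------------------
         fescape |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          nwaves |  -.0128535   .0310014    -0.41   0.678    -.0736151    .0479082
                 |
    [long list of other variables which I'm not showing here in the interest of brevity]
                 |
           _cons |  -.6540679   .2493312    -2.62   0.009    -1.142748   -.1653878
    ------------------------------------------------------------------------------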
    To me this suggests that the instrument meets the necessary requirements: it is highly significantly correlated with panel retention, but not with the probability of transitioning from low to higher pay. Am I justified in concluding on the basis of these results that this instrument is suitable? Or are there any additional tests I need to perform?
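
    For reference, the selection model I have in mind would look roughly like the untested sketch below. It assumes that heckprobit (probit with sample selection) is the appropriate command given that my outcome is binary, that fescape is missing whenever fretain is 0, and that clustering on pidp is sensible because individuals can contribute more than one wave pair. nwaves enters only the selection equation, which is what makes it the exclusion restriction.

    Code:
    * Sketch only (do-file syntax, hence the /// line continuations).
    * Outcome equation: fescape; selection equation: fretain, with nwaves excluded from the outcome.
    heckprobit fescape explanvar1 explanvar2 $controls2 if lpay1 == 1, ///
        select(fretain = nwaves explanvar1 explanvar2 $controls2)      ///
        vce(cluster pidp) nolog

    As I understand it, the test of rho = 0 reported beneath the coefficient table would then indicate whether retention and the pay transition are correlated through unobservables.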

    Many thanks,

    Sanne

    *See Cappellari, L., & Jenkins, S. P. (2004). Modelling Low Pay Transition Probabilities, Accounting for Panel Attrition, Non-Response, and Initial Conditions. CESifo Working Paper Series No. 1232; and Cheng, T., & Trivedi, P. (2014). Attrition Bias in Panel Data: A Sheep in Wolf's Clothing? A Case Study Based on the MABEL Survey. HEDG Working Paper 14/04.