
  • Panel probit (pooled) with cross-sectional selection equation

    I'm estimating a panel probit in a pooled way (just stacking the observations - no need to correct for the panel structure right now). I have n observations for every subject. This works fine of course.

    The thing is: these subjects are not randomly selected, there's a selection process. I found many papers discussing (panel) probit with selection, but none of these discuss a cross-sectional selection equation... In each and every paper there are as many observations for the selection equation as for the equation of interest.

    I'm thinking of estimating a multivariate probit with the selection equation as one equation and an equation for every time period, but I can't figure out what the variance-covariance matrix would look like in such a case. Can someone give me a hint? Or do you know other solutions for this particular issue? I prefer full information maximum likelihood to a two-step approach (which is simple of course).
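
For reference, one way to write down the matrix asked about here, as a sketch rather than a worked-out answer. All symbols are assumptions introduced for illustration: the selection error is ordered first, followed by the n period-specific outcome errors; \rho_t is the correlation between the selection error and the period-t outcome error; \tau_{tt'} is the correlation between the outcome errors of periods t and t'. The diagonal is fixed at 1 because probit error variances are not identified.

```latex
R =
\begin{pmatrix}
1      & \rho_1    & \rho_2    & \cdots & \rho_n    \\
\rho_1 & 1         & \tau_{12} & \cdots & \tau_{1n} \\
\rho_2 & \tau_{12} & 1         & \cdots & \tau_{2n} \\
\vdots & \vdots    & \vdots    & \ddots & \vdots    \\
\rho_n & \tau_{1n} & \tau_{2n} & \cdots & 1
\end{pmatrix}
```

If the outcome errors are treated as uncorrelated across periods (all \tau_{tt'} = 0, in the spirit of the pooled setup), R is positive definite only when \rho_1^2 + \dots + \rho_n^2 < 1, so the \rho_t cannot be chosen freely in that restricted case.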

  • #2
    This is an interesting model to estimate; I've never estimated it but have given it considerable thought. Since you say that you want to run a selection equation for each year: given that an observation has been selected into the sample in the first year, is that observation always observed in the sample (i.e. are the panels balanced)? If they are, the selection happens in the first year, and that is the year for which you have to estimate the selection equation. You say that you have n observations for every subject, so that means they're balanced. Notice then that once the observation has been selected into the sample (in the first year), the probability of it being selected in future years is 1. If you use other years in the selection equation, you may get biased estimators for the coefficients. For example, assume that age is one of the explanatory variables for the selection process, and that the coefficient on age is positive. A given observation would then have a higher predicted probability of being selected in the second year than in the first. This is not possible, however, since selection happened only once, so the probability must be the same in every year.
    Alfonso Sanchez-Penalver



    • #3
      Thank you very much for your elaborate answer, Alfonso. The panel is balanced indeed. You're right that since selection happens in the first year, the probability of someone being selected in the years thereafter is 1. If I understand it correctly, it means the probability of the first observation is a bivariate one, and the probability of the others is univariate. Programming this might be doable.
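
The reading described in this post can be written out as a per-subject log-likelihood contribution. This is only a sketch of that reading, and every symbol is an assumption introduced here: s_i indicates selection, z_i and \gamma are the selection covariates and coefficients, x_{it} and \beta are the outcome covariates and coefficients, \rho is the correlation between the selection error and the first-period outcome error, q_{it} = 2y_{it} - 1, and \Phi_2 is the bivariate standard normal CDF.

```latex
\ell_i = (1 - s_i)\,\ln \Phi(-z_i'\gamma)
       + s_i \Big[ \ln \Phi_2\big(z_i'\gamma,\; q_{i1}\,x_{i1}'\beta;\; q_{i1}\rho\big)
       + \sum_{t=2}^{n} \ln \Phi\big(q_{it}\,x_{it}'\beta\big) \Big]
```

The first-period term is the usual probit-with-selection contribution. Treating periods 2 to n as plain univariate probits assumes their errors are uncorrelated with the selection error, which is exactly the point the next reply calls into question.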



      • #4
        How are you going to calculate it? Probit first and then OLS? What it means is that you estimate the selection probit on the first-year observations. Then, for all the years, the probability of being selected is the one from the first year. So programming full maximum likelihood may be tricky, because for every year you still have to account for that first-year selection probability and for the correlation with the residuals of the first-year selection equation. If you go the two-step way it's easier. You still have to use the inverse Mills ratio from the first-year selection model in the rest of the years.
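
A minimal Python sketch of the bookkeeping in the two-step route described above: compute the inverse Mills ratio once per subject from the first-year selection index, then carry that same value to every year of the subject's panel. The function names, the row layout, and the coefficient vector `gamma` are hypothetical; in practice `gamma` would come from a first-stage selection probit estimated elsewhere. Note also that with a binary outcome, adding this term as a regressor is generally regarded as an approximation rather than an exact correction.

```python
import math


def inverse_mills(z: float) -> float:
    """Inverse Mills ratio phi(z) / Phi(z) for the standard normal."""
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    # erfc keeps the CDF strictly positive further into the left tail than 1 + erf.
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return pdf / cdf


def first_year_imr(first_year_z, gamma):
    """Map subject id -> IMR evaluated at the first-year selection index z'gamma."""
    return {
        sid: inverse_mills(sum(g * v for g, v in zip(gamma, z)))
        for sid, z in first_year_z.items()
    }


def attach_imr(panel_rows, imr_by_subject):
    """Copy each subject's first-year IMR onto every one of its panel rows."""
    return [dict(row, imr=imr_by_subject[row["id"]]) for row in panel_rows]


# Hypothetical inputs: a constant plus one covariate per subject, made-up coefficients.
gamma = [0.0, 0.5]
first_year_z = {1: [1.0, 0.0], 2: [1.0, 2.0]}
panel_rows = [
    {"id": 1, "year": 1, "y": 0},
    {"id": 1, "year": 2, "y": 1},
    {"id": 2, "year": 1, "y": 1},
    {"id": 2, "year": 2, "y": 1},
]

imr = first_year_imr(first_year_z, gamma)
augmented = attach_imr(panel_rows, imr)
```

Each row in `augmented` now has an `imr` field that is constant within a subject, ready to be included as an extra regressor in the pooled second-stage equation.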
        Last edited by Alfonso Sánchez-Peñalver; 11 Sep 2016, 07:04.
        Alfonso Sanchez-Penalver
