
  • Incidental Parameters Problem in Repeated Cross-Sectional Data

    Hi all,

    This is more of a methodological / econometric question than a Stata technical question.

    I have repeated cross-sectional data: thousands of different apprentices enrolling in apprenticeships at different start dates. So there is a time component and a cross-sectional component. However, each apprentice is observed only once.

    My dependent variable is binary, and my regressor of interest is as well. My regressor of interest is CB.

    I can include three vectors of fixed effects in my estimation: month, industry and state.

    Code:
    input float(y CB Industry) str2 progState float month
    0 0 . "AK"  1
    0 0 . "AL"  1
    0 0 . "FL" 11
    1 0 . "FL"  5
    0 0 . "FL" 11
    0 0 . "FL"  2
    0 0 . "FL"  8
    1 0 . "FL" 11
    0 0 . "FL" 11
    1 0 . "FL"  5
    0 0 . "FL"  8
    0 0 . "FL"  1
    0 0 . "FL" 11
    1 0 . "FL"  3
    0 0 . "FL"  6
    0 0 . "FL"  9
    1 0 . "FL"  8
    0 0 . "FL"  9
    0 0 . "FL"  9
    1 0 . "FL"  8
    1 0 . "FL"  9
    0 0 . "FL"  6
    1 0 . "FL"  2
    1 0 . "FL"  2
    1 0 . "FL"  2
    0 0 . "FL"  2
    0 0 . "FL"  2
    0 0 . "FL"  2
    0 0 . "FL"  2
    0 0 . "FL"  9
    end
    I am well aware of all the literature on the debate between linear probability model versus logit and probit (Wooldridge, 2010, Lewbel et al., 2012, Angrist and Pischke, 2009, Maddala, 1985, Long, 1997, etc.). However, my question concerns the incidental parameters problem, and whether it would apply here, given that I do not have panel data.

    I understand that the incidental parameters problem arises from the fact that the dimension of certain parameters (e.g. fixed effects) increases with the sample size, and we only have a fixed number of time periods T to estimate each unit FE, and conversely for time FE.

    So I guess the question is: does the incidental parameters problem also occur in repeated cross-sectional data?

    I am also given to understand that the incidental parameters problem arises solely in nonlinear models, correct? That is, in models where the fixed effects do not get averaged out, so the log-likelihood is maximised over every fixed effect, each fixed-effect estimate is biased, and this bias then propagates to the estimates of the other parameters.

    Bottom line: is it possible for LPM in this case (with repeated cross-sections) to suffer from the incidental parameters problem? Will a logit model suffer from it here?
    Last edited by Maxence Morlet; 29 Jun 2022, 12:53.

  • #2
    Sounds like a job for LASSO



    • #3
      Looks like a great paper, I'll give it a read!



      • #4
        Dear Maxence Morlet,

        I do not think you have an IPP in this case because you can increase your sample without increasing the number of parameters.
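
        One way to check this in the data, as a minimal sketch: count the levels of each fixed-effect dimension and compare them with the sample size. The variable names follow the data example in #1; state is a hypothetical numeric id created here from progState. If these counts stay the same as more apprentices are added, the number of nuisance parameters is fixed.

        ```stata
        * Sketch only: assumes the variables from the data example in #1 are in memory.
        * "state" is a hypothetical numeric id created from the string progState.
        capture confirm variable state
        if _rc encode progState, generate(state)

        * Number of levels in each fixed-effect dimension versus the sample size
        foreach v of varlist month Industry state {
            quietly tabulate `v'
            display "`v': " r(r) " levels, " r(N) " nonmissing observations"
        }
        ```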

        Best wishes,

        Joao



        • #5
          Hi Maxence
          I believe the incidental parameters problem exists for two reasons.
          On the one hand, as you describe, the problem arises because the number of estimated parameters (the FE) grows at almost the same rate as the number of observations in the sample. Thus, you have only a few observations to identify each specific FE.
          Because of this, the incidental parameters problem may occur whenever you have too many FE to estimate relative to the sample size.
          Now, this is not a problem for linear regressions, nor for Poisson regressions (that is why you have reghdfe and ppmlhdfe). However, when you try to estimate other nonlinear models, this problem arises and affects the identification of all the other coefficients. This includes quantile regression models, logit/probit models, etc.
          So, the LPM should not suffer from this problem, because it relies on the properties of linear regression; logit will, however, if you have too many FE.
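
          For concreteness, the two estimators could be set up roughly as follows. This is a minimal sketch using the variable names from the data example in #1; state is a hypothetical numeric id built from progState, and the robust standard errors are an assumption rather than a recommendation.

          ```stata
          * Sketch only: assumes the variables from the data example in #1 are in memory.
          * "state" is a hypothetical numeric id created from the string progState.
          capture confirm variable state
          if _rc encode progState, generate(state)

          * LPM: the three sets of FE are absorbed rather than estimated as dummies.
          * reghdfe is community-contributed (ssc install ftools, ssc install reghdfe).
          reghdfe y CB, absorb(month Industry state) vce(robust)

          * Logit: the FE enter as explicit dummies and are estimated jointly with CB.
          logit y i.CB i.month i.Industry i.state, vce(robust)

          * Average marginal effect of CB, on the same scale as the LPM coefficient
          margins, dydx(CB)
          ```

          Comparing the LPM coefficient with the logit average marginal effect is informative only if each FE level has enough observations behind it.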
          HTH



          • #6
            Yep, LPM will work (whether it's advisable is another matter); that's why I suggested the LASSO logit, to do some shrinkage of the nuisance coefficients.
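
            One possible way to set this up with Stata's built-in lasso commands (Stata 16 or later) is sketched below. The variable names come from the data example in #1, state is a hypothetical numeric id built from progState, and the seed and selection method are arbitrary illustrative choices.

            ```stata
            * Sketch only: requires Stata 16 or later; variables as in the data example in #1.
            capture confirm variable state
            if _rc encode progState, generate(state)

            * Lasso logit: CB is always kept (parentheses); the FE dummies are penalised.
            lasso logit y (CB) i.month i.Industry i.state, selection(cv) rseed(12345)
            lassocoef

            * Double-selection lasso logit for inference on CB; reports an odds ratio by default.
            dslogit y CB, controls(i.month i.Industry i.state)
            ```

            The plain lasso fit is meant for prediction and selection; the dslogit line is the variant intended for inference on CB.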



            • #7
              Thanks a lot for all your insights!



              • #8
                One needs to know how many observations are available per fixed effect. If you put in industry FEs and have only a few firms per industry, then nonlinear models generally suffer.
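
                A quick way to look at this, as a minimal sketch with the variable names from the data example in #1 (n_ind, tag_ind and ybar_ind are hypothetical helper variables); the same steps apply to month and state:

                ```stata
                * Sketch only: variables as in the data example in #1.
                * Observations per industry, summarised over industries
                bysort Industry: generate long n_ind = _N if !missing(Industry)
                egen byte tag_ind = tag(Industry)
                summarize n_ind if tag_ind, detail

                * Number of industries in which y never varies; a logit with industry
                * dummies cannot use these observations (perfect prediction)
                bysort Industry: egen double ybar_ind = mean(y)
                count if tag_ind & inlist(ybar_ind, 0, 1)
                ```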



                • #9
                  Hi all,

                  Apologies for reviving this thread.

                  Thank you very much for your helpful insights, Jeff Wooldridge, FernandoRios, Joao Santos Silva and Jared Greathouse!

                  I have two questions on this topic:

                  - Suppose I run a "pooled" logit for robustness (as pooled models do not restrict serial correlation in the error term) but I misspecify the model. Is it the case that the only way to misspecify a logit model is to misspecify the mean equation, meaning that any misspecification implies inconsistency? Or could one appeal to Gourieroux, Monfort and Trognon (1984) and argue that, since the logit belongs to the linear exponential family, it is robust to misspecification as long as the marginal probability itself is not misspecified?


                  - Is it possible to circumvent the IPP by including the means of the regressors in addition to the regressors themselves, to capture time-invariant heterogeneity? I am aware of this method in panel data; however, would it be feasible / desirable at all in this case (a repeated cross-section varying over state, month and industry)?
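
                  To make the second question concrete, this is roughly what I have in mind, as a minimal sketch. The CB_bar_* names are hypothetical, state is a hypothetical numeric id built from progState, and whether this device is valid with repeated cross-sections is exactly what I am asking.

                  ```stata
                  * Sketch only: variables as in the data example in #1.
                  capture confirm variable state
                  if _rc encode progState, generate(state)

                  * Cell means of CB within each dimension (hypothetical names CB_bar_*)
                  foreach g of varlist month Industry state {
                      bysort `g': egen double CB_bar_`g' = mean(CB)
                  }

                  * Logit with the cell means in place of the three sets of dummies
                  logit y i.CB CB_bar_month CB_bar_Industry CB_bar_state, vce(robust)
                  margins, dydx(CB)
                  ```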


                  Many thanks in advance for your response!

