  • Logit with panel data

    Hello Stata users,

    I am running a logit model with panel data (T=2, N=2256). Since coefficient estimates from a logit model are hard to interpret, I am reporting marginal effects instead. I want to take advantage of the panel dimension of my data by using fixed effects to control for time-invariant individual characteristics. I understand that a conditional FE logit model with individual fixed effects cannot provide marginal effects, because the estimation procedure does not yield estimates of the individual effects ci (ci is wiped out by the estimation procedure).
    One potential solution might be a random effects model, but the strict exogeneity and zero correlation assumptions are, in my opinion, too strong for my study.

    However, Professor Santos Silva has created a Stata command (aextlogit) that estimates the average semi-elasticity with respect to a specific covariate. When I implement this approach I lose 3/4 of my observations because their outcomes are all positive or all negative. (By the way, I would be grateful if anyone could explain what an average semi-elasticity is.)

    My first question is thus: does losing so many observations bias my estimates? My second question is: what other methods can I use to obtain marginal effects from a fixed effects logit estimation? There are many discussions on this topic, but in my opinion many of us misunderstand what the beta coefficients of a logit or conditional fixed effects logit model represent, and interpret them incorrectly.

    Thank you for stopping by.

    Marcel Campion.

  • #2
    Marcel: I have two suggestions. First, use a linear model estimated by fixed effects. This often gives a good approximation to the average marginal effect from a nonlinear model. I have a paper that covers a special case here.

    Second, I would use a probit correlated random effects approach. The Mundlak version usually works well, but Chamberlain can also be used. You just need to generate the time averages of all time-varying variables (except the time period dummy). Once you've done that, the average marginal effects follow easily. Pooled estimation works well, and seems to lose little in terms of efficiency. I discuss this approach in my book, "Econometric Analysis of Cross Section and Panel Data," 2e, 2010, MIT Press.

    Generic code, where x1, ... xK change over time, d2 is the second period dummy, z1, ..., zM don't change:

    Code:
    xtset id year
    egen x1bar = mean(x1), by(id)
    ...
    egen xKbar = mean(xK), by(id)
    probit y x1 x2 ... xK z1 z2 ... zM x1bar ... xKbar d2, cluster(id)
    margins, dydx(x1 ... xK)
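
    As a concrete illustration of the two suggestions above, here is a minimal sketch that can be run as is. It assumes the union example data shipped with Stata (webuse union) as a stand-in for the generic y, x1, ..., xK; the choice of dataset and of covariates (grade, not_smsa, south as time-varying, black as time-constant) is only for illustration and is not part of the original advice.

    ```stata
    * Illustration only: the union example data stand in for y, x1, ..., xK
    webuse union, clear
    xtset idcode year

    * First suggestion: linear probability model with individual fixed effects
    xtreg union grade not_smsa south i.year, fe vce(cluster idcode)

    * Mundlak device: individual time averages of the time-varying covariates
    foreach v of varlist grade not_smsa south {
        egen `v'bar = mean(`v'), by(idcode)
    }

    * Pooled CRE probit; black is time-constant, i.year supplies the period dummies
    probit union grade not_smsa south gradebar not_smsabar southbar black i.year, vce(cluster idcode)

    * Average marginal effects of the time-varying covariates
    margins, dydx(grade not_smsa south)
    ```

    The coefficients from the xtreg step can then be compared directly with the average marginal effects reported by margins.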


    • #3
      Jeff: should the means computed in x*bar be restricted to the observations used in the model, i.e. those not excluded because of missing values on some other variable?

      One way to do that would be:

      Code:
      // we are not interested in this model
      // we just want to find out which observations will be used
      qui : probit y x1 x2 ... xK z1 z2 ... zM d2
      
      // store which observations will be used in variable touse (read: to use)
      gen touse = e(sample)
      
      // do the computations on only those observations that will be used in the model
      egen x1bar = mean(x1) if touse, by(id)
      ...
      egen xKbar = mean(xK) if touse, by(id)
      
      // continue as in #2
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------


      • #4
        Maarten: Definitely! Thanks for that. I assumed from the description of the data that the panel is balanced, but I see it is not explicitly stated that it is.

        I have a paper where I consider cases where one might want the mean and the variance of the heterogeneity to depend on the number of time periods observed for each unit, in which case one can add dummy variables for the number of time periods, and perhaps even use -hetprobit-.
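
        A rough sketch of the "dummies for the number of time periods" idea, again assuming the union example data shipped with Stata purely for illustration. The variable names touse and Ti are hypothetical, and this shows only the simplest variant, in which the mean of the heterogeneity shifts with Ti; letting the variance depend on Ti as well would mean moving to -hetprobit- and putting the same dummies in its het() option, which is not attempted here.

        ```stata
        * Illustration only: unbalanced panel from the union example data
        webuse union, clear

        * Flag complete cases, then count the periods each person contributes
        gen byte touse = !missing(union, grade, not_smsa, south, black, year)
        egen Ti = total(touse), by(idcode)

        * Time averages computed over the estimation sample only
        foreach v of varlist grade not_smsa south {
            egen `v'bar = mean(`v') if touse, by(idcode)
        }

        * i.Ti lets the mean of the heterogeneity differ by number of periods observed
        probit union grade not_smsa south gradebar not_smsabar southbar black i.Ti i.year if touse, vce(cluster idcode)
        margins, dydx(grade not_smsa south)
        ```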


        • #5
          Dear Marcel,

          To answer your questions:

          1 - The average semi-elasticity is exactly that: the sample average of the individual semi-elasticities (if you do not know what a semi-elasticity is, please check a textbook).

          2 - Dropping those observations does not cause bias; they are dropped because they contain no information about the parameters of interest.

          3 - I am not aware of any other method.
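
          For readers who want point 1 spelled out, the following is a sketch of the definition for a logit model, in notation that is assumed here rather than taken from the thread: P_it = Λ(x_it β + c_i) is the probability that y_it = 1 and Λ is the logistic cdf. The semi-elasticity with respect to covariate x_j is the derivative of ln P_it, and the average semi-elasticity is its average over all observations.

          ```latex
          \frac{\partial \ln P_{it}}{\partial x_{itj}} = \beta_j \,(1 - P_{it}),
          \qquad
          \mathrm{ASE}_j = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \beta_j \,(1 - P_{it})
          ```

          Multiplied by 100, this is approximately the percentage change in Pr(y = 1) associated with a one-unit change in x_j.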

          Best wishes,

          Joao


          • #6
            Dear Joao, Jeff and Maarten,

            Thank you for your contributions. It is actually working very well (I will soon post a recap of what I find with the two strategies so that people can form their own opinion).
            However, I have a further question: is it possible to instrument one endogenous variable?


            • #7
              So far I have thought of a strategy based on the prediction from an OLS regression.
              My endogenous variable is x2 in the model described by Jeff above.
              First, predict x2:
              reg x2 h1 h2 h3
              predict X2, xb

              Then introduce X2 into the model:
              probit y x1 X2 x3 z1 z2 z3 x1bar X2bar x3bar... d2, cluster(hid)

              I think it should yield unbiased estimates, unless I misunderstand something.

              kind regards
              Marcel


              • #8
                Dear Marcel,

                Your probit equation is an example of a "forbidden regression"; so that won't work.

                Best wishes,

                Joao


                • #9
                  Jeff: you have used meap94_98 to explain how to deal with unbalanced panel data. Could you tell me where I can get this data file, or explain how to handle unbalanced datasets with Stata commands?
