
  • Unconditional logit fixed effects using dummies

    I'm estimating a logistic regression model with individuals (n = 800) clustered within countries (k = 6) over several time periods (p = 3). I want to control for time-invariant country-level effects (country fixed effects). I'd also like to report predicted unconditional probabilities, and possibly marginal effects, to my audience. Because of this, I'm fitting an unconditional-likelihood model using 5 country dummies. I know the usual advice in this situation is to use a conditional-likelihood model (clogit) because of incidental parameter bias. But I've also read that, provided the number of observations in each group is large and the number of clusters (k) is small, the unconditional estimates will be unbiased.

    For each country, I have a fairly large number of observations (n = 75 to 200). So I compared a model fitted by unconditional likelihood using country dummies to a model fitted with clogit. The estimates and standard errors agree to three decimal places, which leads me to think I'm on safe ground using the unconditional model.

    However, I'd really like to be able to cite a paper that discusses how large the number of observations within clusters needs to be for the bias to be negligible. Does anyone have any suggestions?



    So far, I've found:

    The Stata docs for clogit state (pp. 14–15): "Let i = 1,2,...,n denote the groups and let t = 1,2,...,Ti denote the observations for the ith group... If Ti is large for all groups, the bias of the unconditional fixed-effects estimator is not a concern, and we can confidently use logit with an indicator variable for each group"

    This seems to indicate that if the number of observations within clusters (Ti) is large, then the estimates are unbiased. But the docs don't cite a source for this claim.


    However, Katz (2001), "Bias in Conditional and Unconditional Fixed Effects Logit Estimation," Political Analysis 9(4):379–384, states on pp. 379–380: "We observe N units for T time periods and that at each observation we record whether an event occurs... The unconditional maximum-likelihood estimator of the incidental parameters is consistent as T → ∞ for fixed N but inconsistent as N → ∞ for fixed T. The inconsistency arises because the number of incidental parameters increases without bound, while the amount of information about each incidental parameter remains fixed"

    The author states that the unconditional model is inconsistent as N goes to infinity, which seems to contradict the Stata doc and my empirical finding.
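
    One way to see Katz's claim concretely: for T = 2 there is a classical exact result (Andersen's; see also Abrevaya 1997, Economics Letters) that the unconditional dummy-variable MLE equals exactly twice the conditional-logit MLE, so as the number of units N grows it converges to 2β rather than β. Here is a quick Python/numpy sketch (simulated data under an assumed DGP with standard-normal fixed effects, not my dataset) illustrating this:

```python
import numpy as np

rng = np.random.default_rng(7)

def expit(z):
    """Numerically safe inverse logit (clip avoids overflow in exp)."""
    return 1.0 / (1.0 + np.exp(-np.clip(z, -35.0, 35.0)))

def fit_logit(X, y, iters=30):
    """Logistic MLE via Newton-Raphson; the unit dummies act as intercepts."""
    beta = np.zeros(X.shape[1])
    ridge = 1e-9 * np.eye(X.shape[1])  # tiny ridge for numerical safety
    for _ in range(iters):
        p = expit(X @ beta)
        W = p * (1.0 - p)
        beta += np.linalg.solve((X * W[:, None]).T @ X + ridge, X.T @ (y - p))
    return beta

def t2_estimate(N=400, beta_true=1.0, reps=10):
    """Mean dummy-variable estimate of beta_true with T = 2 obs per unit."""
    est = []
    for _ in range(reps):
        g = np.repeat(np.arange(N), 2)
        alpha = rng.normal(0.0, 1.0, N)        # assumed unit fixed effects
        x = rng.normal(size=2 * N)
        y = (rng.random(2 * N) < expit(alpha[g] + beta_true * x)).astype(float)
        sums = np.bincount(g, weights=y)
        keep = np.isin(g, np.flatnonzero(sums == 1.0))  # discordant pairs only
        gk = np.unique(g[keep], return_inverse=True)[1]  # re-index kept units
        X = np.column_stack([x[keep], np.eye(gk.max() + 1)[gk]])
        est.append(fit_logit(X, y[keep])[0])
    return float(np.mean(est))

print(t2_estimate())  # close to 2 * beta_true, not beta_true
```

    The concordant (all-0 or all-1) units are dropped because their dummy coefficients are not identified; clogit discards them for the same reason.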

  • #2
    I think you have to consider this in the context of what kinds of inferences you wish to make from your analysis. The key point is that as you increase the number of indicator ("dummy") variables used to represent fixed effects (other than the main fixed effect handled by -xt-) in your model, the unconditional estimator is not consistent.

    If you do not wish to draw conclusions that generalize beyond the 6 particular countries and 3 particular time periods in your model, then you have a fixed number of such variables, and there is simply no concern about what might happen with a larger number of them. So the asymptotic behavior as the number of countries or time periods goes to infinity is irrelevant to your inferences. You are concerned only about making generalizations about individuals in these 6 countries at these 3 time periods.

    If, however, you do wish to draw conclusions that generalize to some large universe of countries and time periods, of which 6 and 3 (respectively) are sampled in your data, then the asymptotic behavior of the estimator is important--and it is not consistent. So you have a problem from this perspective. But, do you really want to generalize about countries based on a sample of 6, or time periods based on a sample of 3? Such generalizations would be weak at best, even with an ideal estimator. So if this is your goal, I would say that your data are not terribly well suited to the purpose, and the concerns about conditional vs unconditional likelihood are a minor problem compared to that.



    • #3
      Thank you Clyde, this is an enormously helpful and clear explanation. I certainly do not want to generalize beyond the 6 countries and 3 time points in the sample - I wouldn't have much confidence in doing so with such a small sample. It's true, I have a fixed number of countries (k) and time periods (p), so that seems fine. One thing I'm still unsure about though is the number of individuals (n) needed within each country and time period to get unbiased estimates.

      Again, from what I've read, it seems that if there are only a few observations within a cluster (country or time, since I'm using dummies for both), then there is bias. However, I have a minimum of 75 observations in each country (about 110 on average) and a minimum of about 20 in each time point for each country (about 35 on average), so I assume this is large enough to get unbiased estimates, since I get the same estimates using clogit. If I'm correct on this point, do you know of any literature that discusses the number of subjects needed within groups to get unbiased estimates? Perhaps someone has done a simulation, but I don't seem to be able to find much on this topic.
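
      For what it's worth, here is a minimal Python/numpy simulation sketch of my own (an assumed logit DGP with standard-normal group fixed effects, not my actual data) that fits the unconditional dummy-variable logit while varying the within-group sample size:

```python
import numpy as np

rng = np.random.default_rng(12345)

def expit(z):
    """Numerically safe inverse logit (clip avoids overflow in exp)."""
    return 1.0 / (1.0 + np.exp(-np.clip(z, -35.0, 35.0)))

def fit_logit(X, y, iters=30):
    """Logistic MLE via Newton-Raphson; the group dummies act as intercepts."""
    beta = np.zeros(X.shape[1])
    ridge = 1e-9 * np.eye(X.shape[1])  # tiny ridge for numerical safety
    for _ in range(iters):
        p = expit(X @ beta)
        W = p * (1.0 - p)
        beta += np.linalg.solve((X * W[:, None]).T @ X + ridge, X.T @ (y - p))
    return beta

def mean_estimate(m, G, beta_true=1.0, reps=60):
    """Average dummy-variable estimate of beta_true over simulated
    datasets with G groups of m observations each."""
    est = []
    for _ in range(reps):
        g = np.repeat(np.arange(G), m)
        alpha = rng.normal(0.0, 1.0, G)        # assumed group fixed effects
        x = rng.normal(size=G * m)             # individual-level covariate
        y = (rng.random(G * m) < expit(alpha[g] + beta_true * x)).astype(float)
        # drop all-0 / all-1 groups: their dummy coefficients are not
        # identified (clogit drops them too)
        sums = np.bincount(g, weights=y)
        keep = np.isin(g, np.flatnonzero((sums > 0) & (sums < m)))
        gk = np.unique(g[keep], return_inverse=True)[1]  # re-index kept groups
        X = np.column_stack([x[keep], np.eye(gk.max() + 1)[gk]])
        est.append(fit_logit(X, y[keep])[0])
    return float(np.mean(est))

# the bias is driven by within-group size m, not by the total sample size
for m, G in [(3, 200), (10, 60), (100, 6)]:
    print(f"group size {m:3d}: mean estimate {mean_estimate(m, G):.3f}")
```

      With only a handful of observations per group the estimate of β = 1 is noticeably inflated, while with group sizes around 100 (roughly my per-country n) it sits essentially on top of the true value, which matches what I see comparing logit-with-dummies to clogit.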



      • #4
        You might find the following article useful: "The behaviour of the maximum likelihood estimator of limited dependent variable models in the presence of fixed effects", by William Greene,
        Econometrics Journal (2004), volume 7, pp. 98–119.



        • #5
          Thank you Stephen! A very interesting article. Greene states that the bias of the MLE is large when T is small. In his notation, T is "the length of the panel" (p. 99). I assume this means that T is the number of time points in the panel? That would mean T = 3 for my dataset, in which case the MLE estimates should be biased to almost twice the size of the conditional-likelihood estimates, when in fact they are identical for my data. Greene doesn't say much about the number of observations within each group, but runs simulations with total samples of 100, 500, and 1000 (N in his notation). This leaves me a little confused, unless I am misunderstanding Greene's notation.
