  • Controlling for differences in panel size

    I have this panel data set where the number of records per panel differs greatly, e.g. a case might have anywhere from 2 to 50 records. For each record, I have a variable that is coded 1 if this is the highest-scoring record for that panel member, 0 otherwise. I want to run an xtlogit or melogit analysis with this as the dependent variable. That is, I want to examine what determines a respondent's most successful record.

    The problem is that, with larger panels, the biggest reason a record isn't ranked #1 is simply that there are more records that could be #1. For example, if you only have 2 records, each record has a 50% chance of being ranked #1, but with 50 records the chance is only 2%.
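The baseline described above can be checked with a small simulation. This is a minimal Python sketch, assuming every record in a panel is exchangeable (no covariate effects, no ties); the function names are hypothetical and nothing here comes from Stata.

```python
import random


def baseline_top_probability(n_records: int) -> float:
    """Chance that a given record is its panel's highest scorer when all
    records in the panel are exchangeable (no covariate effects, no ties)."""
    return 1.0 / n_records


def simulated_top_share(n_records: int, n_panels: int = 20000, seed: int = 1) -> float:
    """Share of simulated panels in which record 0 is the highest scorer,
    with every record's score drawn independently from Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_panels):
        scores = [rng.random() for _ in range(n_records)]
        # Record 0 gets the "top" flag if no other record outscores it.
        if scores[0] == max(scores):
            hits += 1
    return hits / n_panels


if __name__ == "__main__":
    for n in (2, 10, 50):
        print(n, baseline_top_probability(n), round(simulated_top_share(n), 3))
```

With 20,000 simulated panels the simulated shares should land close to 1/2, 1/10 and 1/50: the per-record baseline shrinks as panels grow, before any covariate has had a chance to matter.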

    So the question is, how best to control for panel size? If I were doing Poisson, I think I would use the exposure option, e.g. exposure(nrecs), or the offset option, e.g. offset(ln_nrecs). Should I do something similar in xtlogit, e.g. include the log of nrecs as an explanatory variable? Or is there a more appropriate way to control for the fact that panels do not have the same number of records?
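One possible logit analogue of Poisson's exposure(), offered here only as an assumption rather than something proposed in this thread: if each record's baseline chance of being #1 is 1/n, the matching baseline log-odds are logit(1/n) = ln(1/(n - 1)) = -ln(n - 1). Entering -ln(n - 1) as an offset fixes its coefficient at 1, whereas entering ln(nrecs) as a covariate lets the model estimate that coefficient. The Python sketch below, with hypothetical helper names, only checks the arithmetic.

```python
import math


def logit(p: float) -> float:
    """Log-odds of probability p."""
    return math.log(p / (1.0 - p))


def inv_logit(z: float) -> float:
    """Probability corresponding to log-odds z."""
    return 1.0 / (1.0 + math.exp(-z))


def baseline_offset(n_records: int) -> float:
    """Hypothetical offset for a panel of n_records records: the log-odds
    at which a record with linear predictor 0 has chance 1/n of being #1,
    i.e. logit(1/n) = ln(1 / (n - 1)) = -ln(n - 1)."""
    if n_records < 2:
        raise ValueError("a panel needs at least 2 records for this offset")
    return math.log(1.0 / (n_records - 1))


if __name__ == "__main__":
    for n in (2, 10, 50):
        off = baseline_offset(n)
        # inv_logit(offset) recovers the 1/n baseline chance.
        print(n, round(off, 4), round(inv_logit(off), 4))
```

The offset only encodes the exchangeable baseline. It does not address the fact that exactly one record per panel is flagged, so records within a panel are not independent outcomes.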

    Unfortunately, I cannot share any of the data at this time, but hopefully my explanation of the problem is clear enough. I would imagine that issues caused by differences in panel size have come up in other situations.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    Stata Version: 17.0 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam

  • #2
    Hello Richard,

    Just a tentative approach; sorry if it is not helpful. I fear -meologit- wouldn't allow for weights. But assuming -xtlogit- does offer -iweights- for either fixed or random effects (PA models accept fweights or pweights), I wonder whether one could weight by some formula that reflects the (inverse?) probability of being ranked #1 given the number of records. As I said at the beginning, this is a tentative approach.
    Best regards,

    Marcos



    • #3
      I have a similar question (in particular, when cases are nested within groups as well).
      Last edited by Amin Sofla; 28 May 2018, 14:30.



      • #4
        Paul Allison says to consider conditional logit (i.e., xtlogit, fe) models. That intrinsically controls for the number of records. I think that would be ok for many of the analyses I have in mind, but I am not sure if it will work for all of them.
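Why conditional logit absorbs panel size: when exactly one record per panel has y = 1, the conditional likelihood contribution of a panel is exp(x_j*b) / sum_k exp(x_k*b), summing over that panel's records. With b = 0 this equals 1/n, and any panel-level intercept cancels. A minimal Python sketch of that single term, with a hypothetical function name; it illustrates the formula only and is not Stata's implementation.

```python
import math
from typing import Sequence


def clogit_prob_top(xb: Sequence[float], top: int) -> float:
    """Conditional-logit probability that record `top` is the single y = 1
    record in its panel, given that exactly one record has y = 1.

    `xb` holds each record's linear predictor x_j * b. A constant added to
    every element (a panel-level intercept) cancels out of the ratio.
    """
    m = max(xb)  # subtract the largest value to avoid overflow in exp()
    weights = [math.exp(v - m) for v in xb]
    return weights[top] / sum(weights)


if __name__ == "__main__":
    # With all coefficients at zero, every record gets 1/n.
    print(clogit_prob_top([0.0] * 50, top=3))
    # Shifting every record by the same amount leaves the probability unchanged.
    print(clogit_prob_top([0.2, 1.0, -0.5], top=1), clogit_prob_top([5.2, 6.0, 4.5], top=1))
```

The same cancellation means anything constant within a researcher, such as gender, drops out of a conditional logit on its own; only record-level variables and their interactions with researcher-level ones (e.g., journal prestige interacted with gender) remain identified.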
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology



        • #5
          I might be wrong here, but if the assumptions of the random-effects model are met, shouldn't the random effect, i.e., the unit-specific intercept, absorb the differences in number of records? Remember, you do not need to include all "reasons" for being the highest-scoring record in the regression model when these reasons are not correlated with the predictors of interest and when interest is in the "effects" of those predictors rather than in prediction.* If you believe that the number of records, which I understand in terms of differences in participation probability (e.g., panel attrition), is correlated with the predictors of interest, then you may have a problem either way; not even the conditional logit would solve it if the determinants of participation vary over time.

          I find the problem a little unusual for another reason: the dependent variable, i.e., the highest-scoring record, does not seem to be an "absolute" measure, in the sense that it will always be the highest-scoring record observed so far. An unobserved record might have been the highest, or a record not yet observed might become the highest, which makes this an (individual-specific) relative measure. This does not seem like the type of event you would typically study. Whether that is a problem is not clear to me, and the answer might be more conceptual than statistical. To discuss this further, we would probably need to hear more about the content of the study.

          Best
          Daniel


          * I am aware that in non-linear models any predictor that is correlated with the outcome will affect the coefficients of the other predictors even when uncorrelated; whether this is a problem depends on the exact research question.
          Last edited by daniel klein; 28 May 2018, 22:34.



          • #6
            It is kind of an unusual data set and problem, at least for me. The panel members are researchers, and the records are information on their publications. And yes, the DV is not an absolute measure -- the "reigning champion" might get knocked off in some future year. But of course that is true of many things, e.g. the current "hottest year ever" could be bumped off in a few years. We will also have DVs that are absolute measures, like whether the publication is highly cited within its field.

            Basically, we are trying to determine correlates of academic publishing success: what characteristics of a publication make it the researcher's most successful? How does that vary by characteristics of the researcher, e.g. is journal prestige more important, or less important, for women? Or do the correlates of success differ by academic field?

            With an re model, panel size definitely matters. For example, if I just toss in gender, gender has a big positive effect. But that is because women tend to have fewer publications, so each of the ones they do have is more likely to be their highest ranked. Allison says that if we do an re model, he would probably just include the number of publications (or its log) as a predictor. But with other DVs panel size might not matter, e.g. nobody has to have a highly cited paper, and nobody is limited to having only one.
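The gender pattern described above can appear with no real gender effect at all. Because each researcher has exactly one top record, the share of a group's records flagged as top is simply (number of researchers) / (total records). The panel sizes below are made up purely for illustration, and the function name is hypothetical.

```python
from typing import Sequence


def top_flag_share(panel_sizes: Sequence[int]) -> float:
    """Share of all records flagged as a researcher's top record.

    Each researcher contributes exactly one flagged record, so the share is
    the number of researchers divided by the total number of records.
    """
    return len(panel_sizes) / sum(panel_sizes)


# Hypothetical publication counts, chosen only to illustrate the point.
women = [2, 4, 6, 8]
men = [5, 10, 20, 45]

if __name__ == "__main__":
    print("women:", top_flag_share(women))
    print("men:  ", top_flag_share(men))
```

In this made-up example women's records are four times as likely to carry the top flag (0.20 vs. 0.05) purely because the women have fewer publications, which is the confounding a panel-size control or a fixed-effects model is meant to remove.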

            Thanks to everyone for their input. I think we can do some interesting things, if we can just figure out how to do them correctly! But I think we can get quite a bit done by using fe models.
            -------------------------------------------
            Richard Williams, Notre Dame Dept of Sociology
