
  • Fixed effects with repeated cross-sections (pseudo-panel)

    Hi,

    I would like to run a fixed-effects regression in Stata for repeated cross-sections (pseudo-panel). I am trying to use the following syntax, but it's giving me an error:



    xtset id_lug_birt_2 period
    *repeated time values within panel
    r(451);


    xtreg ln_inglab_hora ethnici1 edumo2 edumo3 edufa2 edufa3 female birthreg2 birthreg3 [iw=FEX_C], fe vce (cluster id_lug_birt_2)
    *must specify panelvar; use xtset
    r(459);

    Could anyone tell me what the mistake is?

    Thank you

  • #2
    The message is self-explanatory: there is at least one (and there could be more) combination of values of id_lug_birt_2 and period that appears more than once in the data. You can't -xtset- with a time variable when that happens. The question now is: what are these observations, and how did they get into your data set? To find them:
    Code:
    duplicates tag id_lug_birt_2 period, gen(flag)
    browse if flag
    There are several possibilities. You may find that all of these surplus observations are exact duplicates: they agree on all the other variables. In that case, you can -duplicates drop- and they will go away. (But before you do that, you should ask yourself why they are there; usually fully exact duplicate observations are the result of errors in data management. So you should review the data management that created that data set, because where there is one error, others often lurk.)
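
    If the flagged observations do turn out to be exact duplicates, the check and the clean-up might look like the sketch below. The variable name exact_dup is made up for illustration; the rest uses only the built-in -duplicates- command.
    Code:
    * count observations that are identical on every variable
    duplicates report
    * mark them for inspection before removing anything (exact_dup is a hypothetical name)
    duplicates tag, gen(exact_dup)
    browse if exact_dup
    * keep one copy of each fully identical observation
    duplicates drop
    * check whether any id_lug_birt_2-period pair still appears more than once
    duplicates report id_lug_birt_2 period
    If the last report still shows surplus observations, the remaining repeats differ on at least one other variable and -duplicates drop- alone will not resolve them.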

    Or you may find that these surplus observations disagree on some other variables. Then you may have a real problem: they are contradicting each other. So you need to find some way to reconcile the contradictions and reduce the data to one observation per id_lug_birt_2 period pair. Or you may have an apparent, but not real, problem: the observations disagree on some other variable(s) but they are actually all correct and at least one of the other variables is an additional identifier variable. That is, id_lug_birt_2 isn't really an identifier of an analytic unit in your data--it might be a firm name but some other variable spells out which division of the firm, or something like that. In that case, you need to create a new identifier variable that takes all of those separate variables into account. See -help egen- and scroll down to the -group()- function. Then -xtset- with that new variable.
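
    For the additional-identifier case, a minimal sketch follows. It assumes the second identifier is stored in a variable called division; that name and unit_id are both hypothetical, so substitute whatever variable(s) actually distinguish the units in your data.
    Code:
    * build one identifier from the combination of both variables
    * (division and unit_id are hypothetical names)
    egen long unit_id = group(id_lug_birt_2 division)
    * stops with an error if unit_id and period do not uniquely identify observations
    isid unit_id period
    xtset unit_id period
    Note that -isid- also complains about missing values in unit_id or period unless you add the missok option, and -group()- leaves unit_id missing wherever one of its arguments is missing.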

    Finally, this may all be much ado about nothing. Why are you bothering to specify a time variable in your -xtset- command? You only need that if you are going to use lag and lead operators or estimate autoregressive correlation structures. But you specifically said that this is not panel data; it is multiple cross-sections. So none of those things are applicable. You could just -xtset id_lug_birt_2-, with no time variable, and go on your way. (But don't do this until you have checked the duplicates to make sure they are not data errors that you need to fix.)
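
    Put together, that route might look like the sketch below, using the variable names from the original post. The weight expression is left out to keep the focus on -xtset-; see -help xtreg- for which weight types the fe estimator accepts before adding it back.
    Code:
    * declare only the panel variable; no time variable is needed here
    xtset id_lug_birt_2
    * fixed-effects regression with standard errors clustered on the panel variable
    xtreg ln_inglab_hora ethnici1 edumo2 edumo3 edufa2 edufa3 female birthreg2 birthreg3, fe vce(cluster id_lug_birt_2)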
