  • Help with mi import command

    Hello! I'm having some trouble with the mi import command. I open the dataset containing the imputed values from Amelia and run the command below, where imp identifies the imputed datasets and idno is the case identifier:

    mi import flong, m(imp) id(idno) imp(variables......)

    But I consistently get the error message:
    "variables (imp idno) do not uniquely identify the observations at least one (imp idno) value is duplicated"

    I'm not sure how to resolve this. Any ideas?

    Thanks!

  • #2
    Well, a requirement of -mi import flong- is that the m() and id() variables jointly identify unique observations in the data. Evidently that is not the case in your data. At the top level, there are two possibilities:

    1. You have, and expect to have, and should have, multiple observations per idno in each version of the data set. Perhaps it is data with replicate measurements, or longitudinal data or something like that. In that case, the problem is that your specification of the id() option is incomplete. It needs to be id(idno rep_no), or id(idno date) or something like that.

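    1.5 You expect to have and should have multiple observations per idno in each version of the data set, but there is no other variable that, along with idno, identifies them. In that case you need to create such a variable, numbered within each version of the data so that the count restarts in every imputation: -by imp idno, sort: gen seq = _n- Then -mi import flong, m(imp) id(idno seq) imp(....)-. Bear in mind that the order of observations within each imp-idno group is then arbitrary, so seq only matches the same record across versions if something in the data fixes that order.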

    2. You should have only one observation per idno in each version of the data set. In that case you need to find the inappropriate duplications and eliminate them in some appropriate way. I would start with:

    Code:
    sort imp idno
    * flag = 0 where (imp, idno) is unique; otherwise the number of surplus copies
    duplicates tag imp idno, gen(flag)
    Then you can -list if flag-, -browse if flag-, or -keep if flag- and then save or export. Either way, you have to look at all of these duplicates and figure out why they are there. Maybe they are duplicates on all variables, not just imp and idno, and you can eliminate them all with -duplicates drop- and lose no information. But if they differ on other variables, then you will have to decide which one is correct. Or perhaps none of them is correct and you need to average values across the duplicates, or something like that. The possibilities at this point are more or less endless, and the best approach will depend on the details of your data and your research goals.
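
    For what it's worth, here is a minimal sketch of the case 2 workflow on a toy data set. The data and the variable x are invented purely for illustration, and the sketch assumes the duplicates turn out to be exact copies on every variable, which may well not be true of your data. It also assumes imp == 0 holds the original, unimputed data, as -mi import flong- expects.

```stata
* Toy data (hypothetical): idno 2 appears twice within imp == 1
clear
input imp idno x
0 1 .
0 2 5
1 1 4
1 2 5
1 2 5
end

* Confirm the problem: -isid- errors out if (imp, idno) is not unique;
* -capture noisily- shows the message but lets the do-file continue
capture noisily isid imp idno

* Summarize how many surplus copies there are
duplicates report imp idno

* Tag and inspect the offending observations
sort imp idno
duplicates tag imp idno, gen(flag)
list if flag, sepby(imp idno)
drop flag

* Exact copies only: drop observations identical on ALL variables
duplicates drop

* Re-check: -isid- is silent when the key is unique
isid imp idno

* Only now attempt the import (x is the imputed variable in this toy example)
mi import flong, m(imp) id(idno) imputed(x) clear
mi describe
```

    If -isid imp idno- still fails after -duplicates drop-, the remaining duplicates differ on at least one other variable, and you are back to deciding by inspection which record to keep.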
