  • Drop duplicates in panel data

    Hi,

    I am working with panel data organized by patient ID (id) and date of assessment (date). Unfortunately, some discharge assessments (type) took place on the same day as the admission assessment, hence I have repeated time values in my panel data. I've been able to visually confirm that the admission assessments are generally more complete for my two main variables of interest using the command below.

    browse id date type var1 var2 if (id == id[_n-1] | id == id[_n+1]) & (date == date[_n-1] | date == date[_n+1])

    I now want to force Stata to drop the discharge assessments when id and date are duplicates, but I cannot get this to work. Help?
    I know that I am throwing away information, but I see no better option and I have 2.5 million IDs.

    Thank you,
    Emma

  • #2
    Well, if the discharge and admission observations have the same values for id and date, how can you tell which is which?

    Added: Also, before you go ahead and throw away data, why do you need to have id and date uniquely identify the observations? You can use -xtset id- without specifying a time variable. You will then still be able to make use of all of the basic -xt- type commands in Stata.

    The only things you won't be able to do without specifying a time variable in -xtset- are to use time series operators (lag, lead, difference, and seasonal difference) and to estimate models with an autoregressive correlation structure. But this sounds like health care data, and those things are hardly ever needed in health care work. So maybe just -xtset id- and move forward without worrying about it?
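
    As a rough sketch of what that looks like in practice, assuming the variable names id, var1 and var2 from the original post, the panel can be declared on id alone and basic -xt- commands still run:

    Code:
    * Declare the panel structure without a time variable
    xtset id

    * Between- and within-patient summary statistics for the two variables of interest
    xtsum var1 var2

    This is only an illustration; which -xt- commands are useful depends on the analysis, and anything relying on time series operators would still require a time variable.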
    Last edited by Clyde Schechter; 02 May 2017, 15:25.


    • #3
      I also have the assessment type (admission, discharge, others). I read your response to the post "deleting duplicate rows in data panel".
      Basically I want the command below, but I can't find a way to make it work with an if statement (if type == "discharge").

      duplicates drop id date, force

      Edit:
      Good point about not throwing away data; I will keep that in mind. I am just trying to describe symptoms at baseline, i.e. at first admission. You might say, then, just get rid of all discharge assessments, which would be great, but there are about 25 types of assessment codes, some of which are both first and discharge all in one, while other patients have a first and a discharge assessment on the same day.
      Last edited by Emmanuelle Belanger; 02 May 2017, 15:38.


      • #4
        Quote:
        Basically I want the command below, but I can't find a way to make it work with an if statement (if type == "discharge").

        duplicates drop id date, force

        No, you can't do that.

        The following code will eliminate all type == "discharge" records when it is not the only record for that id for that date:

        Code:
        by id date, sort: drop if (type == "discharge") & (_N > 1)
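
        Before and after running that line, it may be worth checking what is actually duplicated. The following is a minimal sketch, assuming type is a string variable (as the comparison with "discharge" implies) and using a hypothetical flag variable named dup:

        Code:
        * Count how many id-date combinations occur more than once
        duplicates report id date

        * Flag observations that share id and date with at least one other observation
        * (dup is a hypothetical variable name chosen for this sketch)
        duplicates tag id date, generate(dup)
        browse id date type var1 var2 if dup > 0

        * Drop discharge records that share an id-date with another record
        by id date, sort: drop if (type == "discharge") & (_N > 1)

        * Exits with an error if id and date still do not uniquely identify observations
        isid id date

        Two things to keep in mind: if every record for a given id and date is a discharge record, the drop removes all of them, and -isid- will still fail if other assessment types are duplicated on the same day.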


        • #5
          Thank you!
