
  • keeping one observation, when 2 or more variables repeat

    Hi,

    I am a beginner learning Stata. I am working on a retrospective cohort study, and I am trying to find out whether exposure to a medication is protective in the exposed versus the non-exposed population. The data I received have one row for each time a diagnosis was recorded. For the same person, I want to keep only the first time each diagnosis was entered. I tried duplicates drop person dg, but I am not sure whether that is correct or sufficient. The same person (persnumber) can also have multiple diagnoses, dg (0, 1, 2, ... 7).

    Let me give you an example:

    id persno dg date
    1 2345 4 1/14/2011
    2 2345 4 3/15/2017
    3 2345 7 2/3/1988
    4 2001 5 etc
    5 2001 6
    6 2017 0
    7 2001 0

    What I want, or how it should look:

    id persno dg(0) dg(4) dg(7) dg(5) date
    1 2345 0 1 1 0 1/14/2011 but also 2/3/1988 for dg(7)
    2 2001 0 0 0 1
    3 2017 1 0 1 0

    I know that I could use tab dg, gen(dg_), which will give me dummy variables, but what should I do when both the person and the dg are duplicated? I would like to keep the first time each diagnosis was made. Also, how do I connect a second dg of interest to its own date?

    Thank you!

  • #2
    Welcome to the Stata Forum / Statalist.

    Please take some time to read the FAQ, particularly on how to share data.

    Unfortunately, your query is not clear to me.

    Best regards,

    Marcos


    • #3
      Thank you,

      I see that my post came out quite different from how I intended.

      Initial Data

      [Screenshot of the initial data: Screen Shot 2017-06-28 at 11.20.54 AM.png]

      I know that I could use tab dg, gen(dg) to get dg(0) ... dg(7), but I do not know how to get rid of the duplicates in dg() and persno while keeping just the first duplicate, the one with the first date.

      For the same persno with a different dg, I also need to create another row, again with the first date.


      • #4
        It seems you wish to apply the - reshape wide - command. Unfortunately, I do not have Stata at hand at the moment.

        That said, I fail to see the reason for reshaping to wide, since the long format is preferred in most situations, all the more so with longitudinal data.
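
        If the wide layout is still wanted, the following is an untested sketch only. It assumes variables named persno and dg as in the example, no missing values in dg, and a hypothetical numeric daily date variable dgdate; first_date and dg4 are names made up here for illustration.

        ```stata
        * Assumed input: one row per recorded diagnosis, with
        *   persno = person number, dg = diagnosis code (0-7, nonmissing),
        *   dgdate = numeric Stata daily date (hypothetical name)

        * 1. Keep the earliest date for each person and diagnosis
        collapse (min) first_date = dgdate, by(persno dg)

        * 2. One row per person, one date column per diagnosis code
        *    (first_date0, first_date4, ... for the codes present in the data)
        reshape wide first_date, i(persno) j(dg)
        format first_date* %td

        * 3. Indicator for ever having diagnosis 4, taken from its date
        *    (only valid if code 4 occurs somewhere in the data)
        gen byte dg4 = !missing(first_date4)
        ```

        Each first_date column carries the first date for its own diagnosis, so a second diagnosis of interest stays connected to its date.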
        Best regards,

        Marcos



        • #5
          I would look at sorting and lags to get what your original request asked for. You can sort the data and then use a drop if condition to drop repeated observations with the same diagnosis.
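
          A minimal sketch of that idea, not tested on your data. It assumes persno and dg are numeric and that date is a string in month/day/year form, as in the example; visit_date is a hypothetical name introduced here.

          ```stata
          * Convert the string date to a numeric Stata daily date
          * (assumes strings such as "1/14/2011"; unparseable values become missing)
          gen visit_date = date(date, "MDY")
          format visit_date %td

          * Sort so that the earliest date comes first within person and diagnosis
          sort persno dg visit_date

          * Drop every row that repeats the person and diagnosis of the row above it
          drop if persno == persno[_n-1] & dg == dg[_n-1]
          ```

          Missing dates sort last, so a row without a date survives only when that person and diagnosis have no dated row. The result is still in long format, with one row per persno and dg, each carrying its first date.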
