Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Keeping the first observation from Multiple Observations in panel data

    Hi all,

    I need some help. I am working with a long format dataset. Multiple clinical visits over time; however, for the drug variable (for each prescription or non-prescription over the counter drug use) a new row has been created. The Drug variable is not of my interest at this moment. For each id and visit I just want to keep one observation, the first one. Here is an example of the data structure.

    Thanks in advance for your help.

    list MACSID VISIT VDATE LDATM LDATY DKGRP RACE POPUSE DRUG CLASS1 CLASS2A CLASS2B in 154/170

    +----------------------------------------------------------------------------------------------------+
    | MACSID VISIT VDATE LDATM LDATY DKGRP RACE POPUSE DRUG CLASS1 CLASS2A CLASS2B |
    |----------------------------------------------------------------------------------------------------|
    154. | 10004 550 18722 4 2011 1 1 1 503 99 9999 . |
    155. | 10004 560 18904 10 2011 1 1 1 532 7 701 . |
    156. | 10004 560 18904 . . . 1 . 532 7 701 . |
    157. | 10004 560 18904 10 2011 1 1 1 503 99 9999 . |
    158. | 10004 560 18904 10 2011 1 1 1 522 8 801 . |
    |----------------------------------------------------------------------------------------------------|
    159. | 10004 560 18904 . . . 1 . 517 9 901 . |
    160. | 10004 560 18904 10 2011 1 1 1 511 99 9999 . |
    161. | 10004 560 18904 10 2011 1 1 1 503 99 9999 . |
    162. | 10004 570 19086 . . . 1 . 517 9 901 . |
    163. | 10004 570 19086 . . . 1 . 801 1 101 . |
    |----------------------------------------------------------------------------------------------------|
    164. | 10004 570 19086 4 2012 1 1 1 532 7 701 . |
    165. | 10004 570 19086 4 2012 1 1 1 511 99 9999 . |
    166. | 10004 570 19086 4 2012 1 1 1 511 99 9999 . |
    167. | 10004 570 19086 4 2012 1 1 1 525 99 9999 . |
    168. | 10004 570 19086 . . . 1 . 522 8 801 . |
    |----------------------------------------------------------------------------------------------------|
    169. | 10004 570 19086 4 2012 1 1 1 522 8 801 . |
    170. | 10004 580 19275 10 2012 1 1 1 532 7 701 . |
    +----------------------------------------------------------------------------------------------------+




  • #2
    Please read the FAQ about posting data using dataex: https://www.statalist.org/forums/help#stata

    Also, please provide the info for the sorting of the data. The "first" observation might change and so it is best to be very specific about the sorting order.
    Stata/MP 14.1 (64-bit x86-64)
    Revision 19 May 2016
    Win 8.1

    Comment


    • #3
      Hi Carole,

      Thanks for your reply. I am thinking sort, by(ID VISIT) and will drop cases with missing on ID and VISIT.

      Best,

      Syed
      Last edited by Syed W Noor; 30 Jul 2018, 18:17.

      Comment


      • #4
        Sorry for missing the dataex. Here it is.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input double(MACSID VISIT VDATE LDATM LDATY RACE DRUG CLASS1 CLASS2A CLASS2B)
        10004 550 18722  4 2011 1 503 99 9999 .
        10004 560 18904 10 2011 1 532  7  701 .
        10004 560 18904  .    . 1 532  7  701 .
        10004 560 18904 10 2011 1 503 99 9999 .
        10004 560 18904 10 2011 1 522  8  801 .
        10004 560 18904  .    . 1 517  9  901 .
        10004 560 18904 10 2011 1 511 99 9999 .
        10004 560 18904 10 2011 1 503 99 9999 .
        10004 570 19086  .    . 1 517  9  901 .
        10004 570 19086  .    . 1 801  1  101 .
        10004 570 19086  4 2012 1 532  7  701 .
        10004 570 19086  4 2012 1 511 99 9999 .
        10004 570 19086  4 2012 1 511 99 9999 .
        10004 570 19086  4 2012 1 525 99 9999 .
        10004 570 19086  .    . 1 522  8  801 .
        10004 570 19086  4 2012 1 522  8  801 .
        10004 580 19275 10 2012 1 532  7  701 .
        end


        Comment


        • #5
          Since VISIT and VDATE are repeating within MACSID, that is probably not the best candidate for the second sort variable (unless there is a third that would specify the order of within VISIT/VDATE).

          But, assuming you go with VISIT, the following code will drop all observations within MACSID except the first one:

          Code:
          **dummy variable to identify the first observation
          bysort MACSID (VISIT): gen first=_n==1
          
          **drop if it is not the first observation
          drop if first != 1
          Stata/MP 14.1 (64-bit x86-64)
          Revision 19 May 2016
          Win 8.1

          Comment


          • #6
            Hi Carole,

            Thanks for your reply. I used the following code:

            sort MACSID (VISIT)
            quietly by MACSID VISIT : gen dup = cond(_N==1,0,_n)
            tab dup

            drop if dup>1

            It kept one observation for each ID and visit.

            Best,

            Syed

            Comment

            Working...
            X