  • Dropping observations within an identifier leaving only one working observation with the earliest dated hospital admission

    Hi, I've struggled to get an answer to this elsewhere. I'm sure it can be done, I just can't get my head around what I need to do.

    This is an example dataset

    ID Date of hospital admission Other variables x y z
    A1 1/1/13
    A1 1/1/14
    A1 1/2/14
    A2 2/2/13
    A2 2/2/10
    A3 2/2/18
    A3 5/5/20
    A4 1/2/13

    I want the data to look like

    ID First hospital admission Other variables x y z
    A1 1/1/13
    A2 2/2/10
    A3 2/2/18
    A4 1/2/13


    I've got thousands of observations so can't just do manually.

    Any ideas on how to code this would be much appreciated.

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str2 id str6 str_date
    "A1" "1/1/13"
    "A1" "1/1/14"
    "A1" "1/2/14"
    "A2" "2/2/13"
    "A2" "2/2/10"
    "A3" "2/2/18"
    "A3" "5/5/20"
    "A4" "1/2/13"
    end
    
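    * Convert the string to a Stata daily date. The "MD20Y" mask reads the
    * text as month/day/two-digit year, with the year taken to be 20xx.
    gen date = daily(str_date, "MD20Y")
    format date %td
    * Stop here if any nonmissing string failed to convert.
    assert missing(date) == missing(str_date)
    drop str_date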
    
    * Within each id, sort by date and keep only the earliest observation.
    by id (date), sort: keep if _n == 1

    Note: Because you posted a little hand-typed table of your data, the crucial information of whether your date variable is a real Stata date variable or just a string that reads like a date to human eyes has been effectively withheld. The code above assumes it's just a string that must first be converted to a real Stata internal format date variable. If it's already a Stata date variable, however, then only the final line of the code shown is needed.
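
    For the second case, where the date is already a numeric Stata date, a minimal sketch might look like the following. The toy data and the variable names m, d and y are made up for illustration, and the two checks (-confirm- and -isid-) are optional safeguards rather than part of the code above.

    ```stata
    * Illustrative toy data only: m, d, y are hypothetical helper variables
    * used to build a numeric daily date without going through a string.
    clear
    input str2 id m d y
    "A1" 1 1 2013
    "A1" 1 1 2014
    "A2" 2 2 2013
    "A2" 2 2 2010
    end
    gen date = mdy(m, d, y)
    format date %td
    drop m d y

    * Exits with an error if date is a string rather than a numeric variable.
    confirm numeric variable date

    * The only line needed when date is already a Stata date.
    by id (date), sort: keep if _n == 1

    * Exits with an error unless id now uniquely identifies observations.
    isid id
    list
    ```

    Two things to keep in mind. Missing numeric values sort last, so an observation with a missing date is kept only when every date for that id is missing. And if the string dates are really day/month/year rather than month/day/year, the mask in the conversion step would be "DM20Y" instead of "MD20Y".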

    You can avoid problems like that and have a better Statalist experience if you show example data by using the -dataex- command, as I have done here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.



    • #3
      I've got thousands of observations so can't just do manually.
      Even if you had only one observation and it only took a single keystroke to make the change manually, you should never even think about doing that. Unless you are just playing around for fun, all the analytic and data management work you do should have an audit trail. Manual adjustments to the data do not meet that key criterion for data integrity. Always use code, code that you save and maintain, for all your interactions with your data, no matter how trivial they seem.
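
      As a rough sketch of what such an audit trail can look like, the do-file below logs everything it does and never overwrites the raw data. All file names (admissions_raw, admissions_first, first_admission.log) are hypothetical, and the first part exists only to create an example raw file so the sketch runs on its own.

      ```stata
      * --- Setup for illustration only: create a hypothetical raw file ---
      clear
      input str2 id str6 str_date
      "A1" "1/1/13"
      "A1" "1/1/14"
      "A2" "2/2/13"
      "A2" "2/2/10"
      end
      gen date = daily(str_date, "MD20Y")
      format date %td
      drop str_date
      save admissions_raw, replace

      * --- The saved, rerunnable part: every step is recorded in the log ---
      capture log close
      log using first_admission.log, replace text

      use admissions_raw, clear
      * Keep the earliest admission within each id.
      by id (date), sort: keep if _n == 1
      * Confirm that exactly one observation per id remains.
      isid id
      * Write the result to a new file so the raw data stay untouched.
      save admissions_first, replace

      log close
      ```

      Rerunning the do-file reproduces the same result from the raw file, and the log shows how many observations were dropped.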



      • #4
        Thank you so much for your help. I didn’t know about the -dataex- command and will use it in future.

        Your advice worked perfectly. This has cracked an absolutely painful headache for me, so thank you.
