  • How do I keep the lowest numeric value for a variable on an annual basis over time?

    I have a focused question. I am working in long format. One of my variables is called “new_diff_days”, which is the absolute number of days between each participant's birthday (month and day, not date of birth) and the date of each participant's HbA1c test, from a merged baseline study plus a registry-based data set. Each participant has a numeric ID_number. For each year (e.g., 2010) in which one or more HbA1c tests (HbA1c_mmolmol) were done (“y_status_dato”) during 2008 to 2020, I want to keep only the lowest number of days between that participant's birthday and the date of their HbA1c test (“new_diff_days”). How would you write code for that?
    Last edited by Kevin Marks; 15 Oct 2021, 05:48.

  • #2
    Try this:

    Code:
    bys ID_number:  egen mindays = min(new_diff_days)
    keep if new_diff_days == mindays


    • #3
      Kevin:
      the following toy example is probably less efficient than George's helpful code:
      Code:
      . use "https://www.stata-press.com/data/r16/nlswork.dta"
      . bysort idcode (year): egen wanted=min( tenure )
      
      . bysort idcode: replace wanted=. if _n>1
      
      
      . list idcode year wanted if idcode==1
      
             +--------------------------+
             | idcode   year     wanted |
             |--------------------------|
          1. |      1     70   .0833333 |
          2. |      1     71          . |
          3. |      1     72          . |
          4. |      1     73          . |
          5. |      1     75          . |
             |--------------------------|
          6. |      1     77          . |
          7. |      1     78          . |
          8. |      1     80          . |
          9. |      1     83          . |
         10. |      1     85          . |
             |--------------------------|
         11. |      1     87          . |
         12. |      1     88          . |
             +--------------------------+
      
      .
      Kind regards,
      Carlo
      (StataNow 18.5)


      • #4
        Thank you for that code. Instead I used...
        Code:
        bys ID_number HbA1c_mmolmol y_status_dato: egen mindays = min(new_diff_days)
        keep if new_diff_days == mindays

        For each participant (ID_number), I was trying to keep the minimum value of "new_diff_days" across the many rows of HbA1c_mmolmol values in each year (y_status_dato). I noticed that fewer observations were deleted when I used "bys ID_number HbA1c_mmolmol y_status_dato" instead of "bysort ID_number". Does that make sense?
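
A note on the grouping, offered as a sketch rather than a definitive answer: putting HbA1c_mmolmol in the by-list creates a separate group for every distinct HbA1c value within a participant-year, so most rows are trivially their own group's minimum and very few are dropped. If the aim is the closest test per participant per year, grouping on ID_number and y_status_dato alone may be closer to the intent. The code below assumes that y_status_dato holds the calendar year of the test (not a full date), which the thread does not confirm.

```stata
* Sketch only: assumes y_status_dato is the calendar year of the test.
* Minimum distance to the birthday within each participant-year.
bysort ID_number y_status_dato: egen mindays = min(new_diff_days)
keep if new_diff_days == mindays

* Ties (two tests equally close to the birthday in the same year) survive
* the step above. If exactly one row per participant-year is wanted, keep
* one of them; which tied row is kept is arbitrary here.
bysort ID_number y_status_dato: keep if _n == 1
```

If y_status_dato is instead a daily date, a year variable would first have to be derived from it, for example with the year() function.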


        • #5
          It makes sense if it does what you need it to do (keep the lowest value). I think you want one observation per ID, so check that.

          This will tell you (among other approaches):
          Code:
          * gunique is part of the community-contributed gtools package (ssc install gtools)
          gunique ID_number
          Carlo's code is a nice way to "see" what you've done to make sure it's right.
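
For readers who prefer built-in commands, the same kind of check can be sketched as follows. It assumes the goal is one row per participant per year and that y_status_dato holds the calendar year; n_rows is a hypothetical variable name introduced only for this illustration.

```stata
* Sketch only: assumes one row per ID_number/y_status_dato pair is the goal.
* Reports how many copies of each participant-year combination exist.
duplicates report ID_number y_status_dato

* Count rows per participant-year and list any that still have more than one.
bysort ID_number y_status_dato: generate n_rows = _N
list ID_number y_status_dato new_diff_days if n_rows > 1, sepby(ID_number)
```

If the list comes back empty, each participant has at most one remaining row per year.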

