  • Keeping Last Observation

    Hi all,

    Complete newbie here. I've been trying to figure out the following without success. My dataset contains a variable X whose values repeat (i.e., the same person has more than one observation). I'm trying to figure out how to keep only the last observation for each person.

    It looks something like this:

    X Y Z
    6 Yes 10/31/1990
    3 No 2/10/2010
    3 No 2/28/2020
    4 No 5/30/2016
    6 No 12/10/2020


    The values 6 and 3 repeat in X. I only want to keep the last observation for each value of X (with all the other variables). My hope is that after keeping only those last observations, I will be left with something like this:


    X Y Z
    3 No 2/28/2020
    4 No 5/30/2016
    6 No 12/10/2020


    Any help would be much appreciated!

  • #2
    Code:
    bysort X (Z) : keep if _n == _N
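A minimal end-to-end sketch of the answers in this thread combined, assuming Z is read in as a string exactly as it appears in the question (the data are typed in with -input- purely for illustration, and zdate is just an illustrative name borrowed from #3):

```stata
* Example data from the question; Z is read in as a string
clear
input byte X str3 Y str10 Z
6 "Yes" "10/31/1990"
3 "No"  "2/10/2010"
3 "No"  "2/28/2020"
4 "No"  "5/30/2016"
6 "No"  "12/10/2020"
end

* Convert the string date to a Stata daily date so it sorts chronologically
generate zdate = daily(Z, "MDY")
format %td zdate

* Within each X, sort by date and keep only the latest observation
bysort X (zdate) : keep if _n == _N
list X Y Z, noobs
```

Sorting on zdate rather than Z is what makes "last" mean "latest date" here; sorting on the string Z would order the dates character by character instead.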



    • #3
      Let me add the following to Nick's answer.

      You say your data "looks something like this". In that case, Nick's answer looks something like the right answer.

      If your dates are stored in a string variable, this code will not sort them into chronological order, because strings are sorted character by character. Let's look at just the dates, ignoring X for this example.
      Code:
      . * Example generated by -dataex-. For more info, type help dataex
      . clear
      
      . input str10 z
      
                    z
        1. "10/31/1990"
        2. "2/10/2010" 
        3. "2/28/2020" 
        4. "5/30/2016" 
        5. "12/10/2020"
        6. end
      
      . sort z
      
      . list
      
           +------------+
           |          z |
           |------------|
        1. | 10/31/1990 |
        2. | 12/10/2020 |
        3. |  2/10/2010 |
        4. |  2/28/2020 |
        5. |  5/30/2016 |
           +------------+
      
      .
      You need to store your dates as Stata numeric date variables.
      Code:
      . generate zdate = daily(z,"MDY")
      
      . format %td zdate
      
      . sort zdate
      
      . list zdate
      
           +-----------+
           |     zdate |
           |-----------|
        1. | 31oct1990 |
        2. | 10feb2010 |
        3. | 30may2016 |
        4. | 28feb2020 |
        5. | 10dec2020 |
           +-----------+
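One more check that may be worth making after the conversion, sketched here on the same five dates: daily() returns missing for any string it cannot parse with the "MDY" mask, and Stata sorts missing values after all real dates. A failed conversion would therefore be picked up as the "last" observation by -bysort X (zdate)-. The check below is an illustrative suggestion, not part of the answers above.

```stata
clear
input str10 z
"10/31/1990"
"2/10/2010"
"2/28/2020"
"5/30/2016"
"12/10/2020"
end

generate zdate = daily(z, "MDY")
format %td zdate

* Count nonblank strings that daily() could not turn into a date
count if missing(zdate) & !missing(z)

* Stop if any date failed to convert: missing values sort after every
* real date, so they would otherwise be treated as the latest observation
assert !missing(zdate) if !missing(z)
```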
      Stata's "date and time" variables are complicated and there is a lot to learn. If you have not already read the very detailed Chapter 24 (Working with dates and times) of the Stata User's Guide PDF, do so now. If you have, it's time for a refresher. After that, the help datetime documentation will usually be enough to point the way. You can't remember everything; even the most experienced users end up referring to the help datetime documentation or back to the manual for details. But at least you will get a good understanding of the basics and the underlying principles. It is an investment of time that will be amply repaid.

      All Stata manuals are included as PDFs in the Stata installation and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu.



      • #4
        Hi
        I have a dataset with repeated ultrasound measurements; EFW is the estimated fetal weight.

        I want to analyze the last observed EFW, so I need to select it somehow. How do I select the last recorded EFW for each woman?

        My data are in long format, as shown below (this is only part of the dataset). Not all participants had the same number of measurements during pregnancy. Where EFW is blank, it is missing, so for any woman the last observed EFW is the value just before the missing EFW:

        PTID scan EFW
        14-10002 1 296.0504
        14-10002 2 556.1716
        14-10002 3 1012.506
        14-10002 4 2097.567
        14-10002 5 3212.682
        14-10002 6
        14-10003 1 295.7653
        14-10003 2 474.2715
        14-10003 3 962.3026
        14-10003 4 1443.261
        14-10003 5 2276.78
        14-10003 6
        14-10004 1 278.6404
        14-10004 2 475.6306
        14-10004 3 793.9675
        14-10004 4 1638.684
        14-10004 5
        14-10004 6
        14-10005 1 333.0324
        14-10005 2 527.5546
        14-10005 3 822.053
        14-10005 5
        14-10005 6
        14-10005 4
        14-10008 1 284.0349
        14-10008 2 447.2947
        14-10008 3 818.6283
        14-10008 4 1473.189
        14-10008 5 2176.725
        14-10008 6
        Your help will be appreciated!
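A possible approach, sketched under the assumption that the "last observed EFW" is the nonmissing EFW with the highest scan number for each PTID (this matches the description for the rows shown). It flags that row rather than dropping any data; has_efw and last_efw are illustrative names, and blank EFW values are typed as . in -input-.

```stata
clear
input str8 PTID byte scan double EFW
"14-10002" 1 296.0504
"14-10002" 2 556.1716
"14-10002" 3 1012.506
"14-10002" 4 2097.567
"14-10002" 5 3212.682
"14-10002" 6 .
"14-10003" 1 295.7653
"14-10003" 2 474.2715
"14-10003" 3 962.3026
"14-10003" 4 1443.261
"14-10003" 5 2276.78
"14-10003" 6 .
"14-10004" 1 278.6404
"14-10004" 2 475.6306
"14-10004" 3 793.9675
"14-10004" 4 1638.684
"14-10004" 5 .
"14-10004" 6 .
"14-10005" 1 333.0324
"14-10005" 2 527.5546
"14-10005" 3 822.053
"14-10005" 5 .
"14-10005" 6 .
"14-10005" 4 .
"14-10008" 1 284.0349
"14-10008" 2 447.2947
"14-10008" 3 818.6283
"14-10008" 4 1473.189
"14-10008" 5 2176.725
"14-10008" 6 .
end

* 1 if an EFW was recorded at this scan, 0 if it is missing
generate byte has_efw = !missing(EFW)

* Within each woman, rows with an EFW sort last (has_efw == 1), ordered by
* scan, so her final row is the last observed EFW (if she has any at all)
bysort PTID (has_efw scan) : generate byte last_efw = has_efw & _n == _N

list PTID scan EFW if last_efw, noobs
```

Sorting on scan inside -bysort- also takes care of rows entered out of order, such as scan 4 for 14-10005. If you only need one row per woman, follow this with keep if last_efw.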
