  • How to run an analysis on only certain observations in a dataset

    Hello everyone!

    I have a dataset in long format, with several rows per individual. How can I make sure that only one of these observations is used when calculating statistics such as mean age? I only want to use the value from the first observation for each individual.

    For example, I want to calculate mean age, but age varies across rows because the values were recorded over several years. How can I use only the first observation for each individual?

    Here is an example:
    Beneficiary ID   Unique ID   Age
    EEEAVH9E         1           72
    EEEAVH9E         1           72
    EEEAVH9E         1           73
    EEEAVG7F         2           80
    EEEAVG7F         2           81
    Any help would be appreciated!


  • #2
    Aditi:
    Do you mean something along the following lines?
    Code:
    . use "https://www.stata-press.com/data/r17/nlswork.dta"
    . bysort idcode (year): gen wanted=age if _n==1
    
    
    . tabstat age, stat(count mean sd p25 p50 p75 min max)
    
        Variable |         N      Mean        SD       p25       p50       p75       Min       Max
    -------------+--------------------------------------------------------------------------------
             age |     28510  29.04511  6.700584        23        28        34        14        46
    ----------------------------------------------------------------------------------------------
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      I think Carlo meant
      Code:
      . use "https://www.stata-press.com/data/r17/nlswork.dta"
      (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
      
      . bysort idcode (year): gen wanted=age if _n==1
      (23,825 missing values generated)
      
      . tabstat wanted, stat(count mean sd p25 p50 p75 min max)
      
          Variable |         N      Mean        SD       p25       p50       p75       Min       Max
      -------------+--------------------------------------------------------------------------------
            wanted |      4709  23.94479  5.940831        20        22        26        14        45
      ----------------------------------------------------------------------------------------------
      or equivalently
      Code:
      . use "https://www.stata-press.com/data/r17/nlswork.dta"
      (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
      
      . bysort idcode (year): gen wanted = _n==1
      
      . tabstat age if wanted, stat(count mean sd p25 p50 p75 min max)
      
          Variable |         N      Mean        SD       p25       p50       p75       Min       Max
      -------------+--------------------------------------------------------------------------------
               age |      4709  23.94479  5.940831        20        22        26        14        45
      ----------------------------------------------------------------------------------------------
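
      A further option, sketched here and not taken from the replies in this thread, is to keep one row per woman in a temporary copy of the data and summarize that copy; `preserve` and `restore` bring the full panel back afterwards. It reuses the nlswork example, so it should reproduce the N and mean reported for `wanted` above.

      ```stata
      * Sketch: summarize age using only each woman's first observation
      use "https://www.stata-press.com/data/r17/nlswork.dta", clear
      preserve                                  // save a copy of the full panel
      bysort idcode (year): keep if _n == 1     // keep the earliest year per idcode
      summarize age
      restore                                   // return to the full panel
      ```

      The `wanted` flag used above never drops any data, which is convenient when later commands still need every row of the panel.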



      • #4
        William is correct (thank you!) and I was sloppy in my previous reply.
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Thank you so much for your responses. In some cases an individual has multiple observations within the same year, so I want to use each individual's first observation regardless of year. Is this possible?



          • #6
            Code:
            generate long seq = _n // record the current row order so that it is kept within each UniqueID
            bysort UniqueID (seq): generate wanted = _n==1
            tabstat age if wanted
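
            A self-contained check of this approach on the example data from the question is sketched below. The variable names BeneficiaryID, UniqueID and Age are assumptions (Stata names cannot contain spaces), and "first observation" is taken to mean the first row as the data are currently ordered.

            ```stata
            * Example data from the question; variable names are assumed
            clear
            input str8 BeneficiaryID byte UniqueID byte Age
            "EEEAVH9E" 1 72
            "EEEAVH9E" 1 72
            "EEEAVH9E" 1 73
            "EEEAVG7F" 2 80
            "EEEAVG7F" 2 81
            end

            * Record the current row order, then flag the first row within each UniqueID
            generate long seq = _n
            bysort UniqueID (seq): generate byte wanted = _n == 1

            * Uses Age 72 for UniqueID 1 and Age 80 for UniqueID 2
            tabstat Age if wanted, stat(count mean)
            ```

            Because `seq` breaks ties in the original row order, the flag does not depend on how Stata happens to order observations that share a UniqueID. `sort UniqueID, stable` followed by `by UniqueID: generate byte wanted = _n == 1` would give the same result.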
