Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Dropping all but the highest value per individual, multiple records

    Hi Statalist colleagues!

    Thanks to your previous help, I was able to create an indicator variable for the number of moves per individual within my dataset (marked by the *have column below). What I'm trying to do now is drop (or convert to missing) everything but the highest value of moves per individual.

    Essentially, I wish to determine the total number of individuals who moved more than once within my single year of data - but when I tabulate the *have column, it returns an overcounted number of moves since, for example, ind_id 2 is counted three times. I would like to keep the highest number of moves, the *want column, but remove all other values such that when I tabulate *want, it will give me an accurate figure without overcounting or multi-counting a single individual.

    Perhaps I'm glossing over a simple syntax, but I'm at a loss. Thank you!
    ind_id seq_id city *have *want
    1 1 "Hope" - -
    1 2 "Hope" - -
    1 3 "Aurora" 1 1
    1 4 "Aurora" - -
    1 5 "Aurora" - -
    2 1 "Hope" - -
    2 2 "Aurora" 1 -
    2 3 "Aurora" - -
    2 4 "Jackson" 2 -
    2 5 "Jackson" - -
    2 6 "Hope" 3 3

  • #2
    I believe this is what you want. In the future, you'll want to use -dataex- to present your example data, as described in the StataList FAQ for new members.


    Code:
    clear
    // I presume your "-" codes are actually missing values in your data set.
    input byte (ind_id seq_id) str10 city byte have
    1 1 "Hope" .
    1 2 "Hope" .
    1 3 "Aurora" 1
    1 4 "Aurora" .
    1 5 "Aurora" .
    2 1 "Hope" .
    2 2 "Aurora" 1
    2 3 "Aurora" .
    2 4 "Jackson" 2
    2 5 "Jackson" .
    2 6 "Hope" 3
    end
    //
    // See -help egen-: Often a source of easy solutions.
    egen want = max(have), by(ind_id)
    // It appears you also want the following.
    replace want = . if (have != want)

    Comment


    • #3
      Thanks so much, Mike. I will attempt to use -dataex- to present example data in the future, but the system on which I work is a remote desktop environment sequestered from the internet for privacy purposes, and I don't have STATA on my personal desktop environment.

      It is likely that I wasn't clear in describing my problem. I'll try to rearticulate it better in this reply! The ind_id "1" below is OK. For ind_id "2", however, I would want only 3 to be present, and all other values to be '.' -- this way, when I tabulate the variable, it is only returning the indicator 3 for ind_id 2, instead of returning 1, 2, and 3 for ind_id 2

      ind_id seq_id city *desired result
      1 1 "Hope" -
      1 2 "Hope" -
      1 3 "Aurora" 1
      1 4 "Aurora" -
      1 5 "Aurora" -
      2 1 "Hope" -
      2 2 "Aurora" -
      2 3 "Aurora" -
      2 4 "Jackson" -
      2 5 "Jackson" -
      2 6 "Hope" 3
      Last edited by Cody Fiduccia; 04 Aug 2020, 18:31.

      Comment


      • #4
        When I run the code above and -tabulate want-, I get
        Code:
        . tab want
        
               want |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  1 |          1       50.00       50.00
                  3 |          1       50.00      100.00
        ------------+-----------------------------------
              Total |          2      100.00

        I must be misunderstanding you, as that appears to me as being what you say you want. Perhaps you want to delete actual observations, or you have some other meaning for "tabulate" than what that term means in Stata.

        Comment


        • #5
          Hi Mike -

          My apologies. After a night's rest, I reentered the code as you provided with my variable names and it worked. I wasn't translating your example to my actual data correctly yesterday. Thank you very much, it will be quite helpful in my analyses of mobility over time. Sincerely appreciate your patience and understanding!

          All the best,
          Cody

          Comment


          • #6
            Good that it all worked out!

            Note that if dataex is not in your Stata, then it seems that you are using an out-of-date Stata (15.0, 14.1 or earlier) and hence are best advised to flag that in future posts so that advice doesn't refer to features you can't use.

            Please see https://www.statalist.org/forums/help#version for explanation and while you're passing swing by https://www.statalist.org/forums/help#spelling

            Comment

            Working...
            X