  • Create Unique Identifier for Strings and Several Variables

    Hello everyone,

    My data concern NBA basketball games. The focus is on the players in each starting lineup (stored in starter1, starter2, etc.); see the following example:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str7 season str33 team str27 date str24(starter1 starter2 starter3 starter4 starter5)
    "1990-91" "Portland Trail Blazers" "February 28, 1991" "Duckworth,Kevin"   "Drexler,Clyde"     "Williams,Buck"     "Porter,Terry"      "Kersey,Jerome"      
    "1990-91" "Sacramento Kings"       "February 25, 1991" "Carr,Antoine"      "Mays,Travis"       "Simmons,Lionel"    "Les,Jim"           "Causwell,Duane"     
    "1990-91" "Minnesota Timberwolves" "January 5, 1991"   "Richardson,Pooh"   "Campbell,Tony"     "Corbin,Tyrone"     "Mitchell,Sam"      "Spencer,Felt"
    end
    For each player, I would need a unique identifier. If my data consisted only of one "starter" variable (one player), the solution would be quite easy:
    Code:
     egen identifier=group(starter1)
    However, the player names are spread across the five starting-lineup variables, so "Duckworth,Kevin" may appear in starter1 in one game and in any of the other four variables in another. My aim is to have five variables (e.g., starter1_num, starter2_num, etc.) that mirror the string variables but hold a numeric identifier that is the same for a given player in every column. This is probably easy, but I cannot see a solution.

    Thanks for your help already in advance.

    Best regards,
    Pascal

  • #2
    This will stack the players and create a unique id, which you can then merge/joinby into the original data.
    I do wonder whether switching to long format might make the work easier.

    Code:
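    * Save each starter column as its own one-variable dataset
    forv i = 1/5 {
        preserve
        keep starter`i'
        ren starter`i' starter
        save starter`i', replace
        restore
    }
    
    * Note: -clear- drops the data in memory, so save the original first
    clear
    set obs 0
    * Stack the five files into a single starter variable
    forv i = 1/5 {
        append using starter`i'
    }
    * Keep one row per player and number the players
    duplicates drop starter, force
    egen pid = group(starter)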
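
The merge step mentioned in #2 is not shown in the code above; one way it might look is sketched here. This is a minimal sketch under assumptions: it uses a toy dataset with two starter columns (the names come from the example, but the arrangement is hypothetical) and tempfiles rather than files on disk. The names `original`, `lookup`, `part1`, and `part2` are illustrative only.

```stata
* Toy data: names from the example, arrangement hypothetical
clear
input str24(starter1 starter2)
"Duckworth,Kevin" "Drexler,Clyde"
"Drexler,Clyde"   "Porter,Terry"
end
tempfile original lookup
save `original'

* Stack the starter columns into one variable (same idea as the loop above)
forvalues i = 1/2 {
    use `original', clear
    keep starter`i'
    rename starter`i' starter
    tempfile part`i'
    save `part`i''
}
clear
forvalues i = 1/2 {
    append using `part`i''
}

* One row and one id per distinct player
duplicates drop starter, force
egen pid = group(starter)
save `lookup'

* Attach the id to each starter column in turn
use `original', clear
forvalues i = 1/2 {
    rename starter`i' starter
    merge m:1 starter using `lookup', keep(master match) nogenerate
    rename (starter pid) (starter`i' starter`i'_num)
}
list
```

With the full data, the loops would run over `1/5`. Note that `merge` may change the sort order of the observations, so anything that depends on row order should be re-sorted afterwards.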


    • #3
      I have a different interpretation of what you want than George does. I'm thinking you want a separate observation for each instance in which a particular player appears, so that (perhaps) you might analyze how the performance of a player varies in relation to the composition of the team. Assuming that's right, here's a way to create that structure:
      Code:
      compress
      expand 5 // one observation for each starter in each game
      // Each starter can be the "focus player"
      bysort season team date: gen seq = _n // number the 5 copies within each game
      gen focus = ""
      forval i = 1/5 {
        qui replace focus = starter`i' if `i' == seq
      }
      encode focus, gen(id)
      // Inspect
      order season team focus id
      list season team focus id, nolabel
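
For comparison, a one-row-per-starter layout can also be reached with `reshape long`, after which a single `egen ... group()` numbers the players. This is a minimal sketch on hypothetical toy data with two starter columns; `game`, `slot`, and `pid` are illustrative names, and `game` stands in for whatever identifies a game in the real data.

```stata
* Toy data: names from the example, arrangement hypothetical
clear
input str7 season str24(starter1 starter2)
"1990-91" "Duckworth,Kevin" "Drexler,Clyde"
"1990-91" "Drexler,Clyde"   "Porter,Terry"
end

* Hypothetical game identifier: one row per game before reshaping
gen long game = _n

* One row per game and lineup slot
reshape long starter, i(game) j(slot)

* With a single name variable, one call numbers every distinct player
egen pid = group(starter)
list game slot starter pid
```

If the wide layout is needed again, `reshape wide starter pid, i(game) j(slot)` should restore it with `pid1`, `pid2`, and so on alongside the names.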


      • #4
        Here's another variation:

        Code:
        preserve
            keep starter*
            gen `c(obs_t)' id = _n
            reshape long starter, i(id) j(n)
            duplicates drop starter, force
            drop id n
            encode starter, gen(identifier) 
            tempfile starters
            save `starters'
        restore
        
        forval i=1/5 {
            rename starter`i' starter
            merge m:1 starter using `starters', keep(3) keepusing(identifier) nogen
            rename identifier starter`i'_num
            order starter`i'_num, after(starter)
            drop starter
        }
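
A shorter alternative, which is a different technique from the merge in #4, is to `encode` every starter variable against one shared value label. When the label named in `label()` already exists, `encode` reuses its codes and adds new ones for names it has not seen, so a player gets the same number in every column. This is a minimal sketch on hypothetical toy data; the label name `player` is an assumption.

```stata
* Toy data: names from the example, arrangement hypothetical
clear
input str24(starter1 starter2)
"Duckworth,Kevin" "Drexler,Clyde"
"Drexler,Clyde"   "Porter,Terry"
end

* Encode each column against the same value label
forvalues i = 1/2 {
    encode starter`i', generate(starter`i'_num) label(player)
}

* Show the numeric codes and then the label mapping
list, nolabel
label list player
```

The codes depend on the order in which names are first encountered, so they are consistent across columns but not alphabetical overall. Unlike the merge approach, the original string variables are kept.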


        • #5
          Dear all,

          Thank you very much for your fast and helpful answers. They suited my needs perfectly and I was able to solve the problem.

          Best regards,
          Pascal
