Create Unique Identifier for Strings and Several Variables

Pascal Meier

Join Date: Oct 2019

Posts: 24
#1

04 Oct 2022, 06:31

Hello everyone,

My data are on NBA basketball. The focus is on the players in the starting lineup (stored as starter1, starter2, etc.), as in the following example:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str7 season str33 team str27 date str24(starter1 starter2 starter3 starter4 starter5)
"1990-91" "Portland Trail Blazers" "February 28, 1991" "Duckworth,Kevin" "Drexler,Clyde" "Williams,Buck" "Porter,Terry" "Kersey,Jerome"
"1990-91" "Sacramento Kings" "February 25, 1991" "Carr,Antoine" "Mays,Travis" "Simmons,Lionel" "Les,Jim" "Causwell,Duane"
"1990-91" "Minnesota Timberwolves" "January 5, 1991" "Richardson,Pooh" "Campbell,Tony" "Corbin,Tyrone" "Mitchell,Sam" "Spencer,Felt"
end

For each player, I would need a unique identifier. If my data consisted only of one "starter" variable (one player), the solution would be quite easy:

Code:

egen identifier=group(starter1)

However, the player names are spread across the five starting-lineup variables, so "Duckworth,Kevin" may appear in starter1 in one game and in any of the other four variables in another. My aim is to have five variables (e.g., starter1_num, starter2_num, etc.) that mirror the string variables but hold a unique numeric identifier for each player instead of the name. This is probably easy, but I cannot see a solution.

Thanks in advance for your help.

Best regards,
Pascal
George Ford

Join Date: Aug 2014

Posts: 3337
#2

04 Oct 2022, 07:27

This will stack the players and create a unique id, which you can then merge/joinby into the original data.
I do wonder if switching to long format might make the work easier.

Code:

forv i = 1/5 {
    preserve
    keep starter`i'
    ren starter`i' starter
    save starter`i', replace
    restore
}
clear
set obs 0
forv i = 1/5 {
    append using starter`i'
}
duplicates drop starter, force
egen pid = group(starter)
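
The long-format route mentioned above could be sketched roughly as follows. This is only an illustration under assumptions, not a tested solution for the full data: gameid and slot are hypothetical variable names that do not exist in the original data, and each row of the wide data is assumed to be one lineup.

```stata
* Sketch only: gameid and slot are hypothetical names introduced for illustration
gen long gameid = _n                      // one id per row (lineup) of the wide data
reshape long starter, i(gameid) j(slot)   // five rows per lineup, slot = 1..5
egen pid = group(starter)                 // a player gets the same id whatever the slot
* Optional: go back to the wide layout
reshape wide starter pid, i(gameid) j(slot)
```

After the optional last step the identifiers are named pid1 to pid5 rather than starter1_num to starter5_num, so they would still need renaming if the latter names are wanted.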
Mike Lacy

Join Date: Apr 2014

Posts: 2449
#3

04 Oct 2022, 08:14

I have a different interpretation of what you want than George does. I'm thinking you want a separate observation for each instance in which a particular player appears, so that (perhaps) you might analyze how the performance of a player varies in relation to the composition of the team. Assuming that's right, here's a way to create that structure:

Code:

compress
expand 5 // one observation for each starter in each game
// Each starter can be the "focus player"
bysort season team: gen seq = _n
gen focus = ""
forval i = 1/5 {
    qui replace focus = starter`i' if `i' == seq
}
encode focus, gen(id)
// Inspect
order season team focus id
list season team focus, nolabel

Hemanshu Kumar

Join Date: Mar 2015
Posts: 1548

04 Oct 2022, 08:38

Here's another variation:

Code:

preserve
    keep starter*
    gen `c(obs_t)' id = _n
    reshape long starter, i(id) j(n)
    duplicates drop starter, force
    drop id n
    encode starter, gen(identifier) 
    tempfile starters
    save `starters'
restore

forval i=1/5 {
    rename starter`i' starter
    merge m:1 starter using `starters', keep(3) keepusing(identifier) nogen
    rename identifier starter`i'_num
    order starter`i'_num, after(starter)
    drop starter
}


Pascal Meier

Join Date: Oct 2019

Posts: 24
#5

05 Oct 2022, 02:17

Dear all,

thank you very much for your fast and helpful answers - they suited my needs perfectly and I was able to solve the problem.

Best regards,
Pascal