How to reorganize my raw dataset into the following form: each row is an observation and each column is a variable? Thanks!

John Park

Join Date: Dec 2022

Posts: 11
#1

How to reorganize my raw dataset into the following form: each row is an observation and each column is a variable? Thanks!

01 Dec 2022, 05:56

Hi, guys, I'm a new stata user, and I got the following raw data, which documents the brand choice for different teams in different neighborhoods as well as the expenditure on environmental education in each neighborhood. In the following illustrative table, the brand column documents brand choice, team column documents team name, loc_id column documents neighborhood id, and loc_edu documents education expenditure:

brand team loc_id loc_edu

A team1 n1 13.1

B team1 n1 13.1

A team1 n2 9.7

B team1 n2 9.7

X team2 n1 13.1

Y team2 n1 13.1

X team2 n3 5

B team1 n4 6.2

Y team2 n4 6.2

team1 could choose brand A and B, only brand A, only brand B or none. team 2 could choose brand X and Y, only brand X, only brand Y or none.
I want to reorganize my data in the following way:

loc_id choice_team1 choice_team2 loc_edu

n1 1 1 13.1

n2 1 0 9.7

n3 0 2 5

n4 3 3 6.2

In the new table each row corresponds to a loc_id (an observation) and each column corresponds to a variable. team1's choice equals 1 if both A and B are purchased, and equals 2 if only A is purchased, and equals 3 if only B is purchased and equals 0 if none is purchased. team2' choice equals 1 if both X and Y are purchased, equals 2 if only X is purchased, equals 3 if only Y is purchased and equals 0 if none is purchased.

What's the stata codes that could achieve this? Thanks!
Tags: generate variable, group

Nick Cox

Join Date: Mar 2014
Posts: 35661

01 Dec 2022, 06:41

Some people charge hundreds of USD, or the equivalent, for this kind of thing!

Code:

 
* Example generated by -dataex-. For more info, type help dataex
clear
input str1 brand str5 team str2 loc_id float loc_edu
"A" "team1" "n1" 13.1
"B" "team1" "n1" 13.1
"A" "team1" "n2"  9.7
"B" "team1" "n2"  9.7
"X" "team2" "n1" 13.1
"Y" "team2" "n1" 13.1
"X" "team2" "n3"    5
"B" "team1" "n4"  6.2
"Y" "team2" "n4"  6.2
end

fillin loc_id team 
bysort loc_id (loc_edu) : replace loc_edu = loc_edu[1]

gen choice = 0 
bysort loc_id team (brand): replace choice = 1 if inlist(brand[1], "A", "X") & inlist(brand[2], "B", "Y")
by loc_id team: replace choice = 2 if _N == 1 & inlist(brand, "A", "X")
by loc_id team: replace choice = 3 if _N == 1 & inlist(brand, "B", "Y")

duplicates drop loc_id team, force 
drop brand _fillin 

reshape wide choice, i(loc_id) j(team) string  

list 

    +----------------------------------------+
     | loc_id   choice~1   choice~2   loc_edu |
     |----------------------------------------|
  1. |     n1          1          1      13.1 |
  2. |     n2          1          0       9.7 |
  3. |     n3          0          2         5 |
  4. |     n4          3          3       6.2 |
     +----------------------------------------+

Comment

John Park

Join Date: Dec 2022
Posts: 11

01 Dec 2022, 07:05

Originally posted by Nick Cox View Post

Some people charge hundreds of USD, or the equivalent, for this kind of thing!

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str1 brand str5 team str2 loc_id float loc_edu
"A" "team1" "n1" 13.1
"B" "team1" "n1" 13.1
"A" "team1" "n2" 9.7
"B" "team1" "n2" 9.7
"X" "team2" "n1" 13.1
"Y" "team2" "n1" 13.1
"X" "team2" "n3" 5
"B" "team1" "n4" 6.2
"Y" "team2" "n4" 6.2
end

fillin loc_id team
bysort loc_id (loc_edu) : replace loc_edu = loc_edu[1]

gen choice = 0
bysort loc_id team (brand): replace choice = 1 if inlist(brand[1], "A", "X") & inlist(brand[2], "B", "Y")
by loc_id team: replace choice = 2 if _N == 1 & inlist(brand, "A", "X")
by loc_id team: replace choice = 3 if _N == 1 & inlist(brand, "B", "Y")

duplicates drop loc_id team, force
drop brand _fillin

reshape wide choice, i(loc_id) j(team) string

list

+----------------------------------------+
| loc_id choice~1 choice~2 loc_edu |
|----------------------------------------|
1. | n1 1 1 13.1 |
2. | n2 1 0 9.7 |
3. | n3 0 2 5 |
4. | n4 3 3 6.2 |
+----------------------------------------+

Thank you so much Nick! This works!

Comment

John Park

Join Date: Dec 2022
Posts: 11

01 Dec 2022, 20:39

Originally posted by Nick Cox View Post

Some people charge hundreds of USD, or the equivalent, for this kind of thing!

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str1 brand str5 team str2 loc_id float loc_edu
"A" "team1" "n1" 13.1
"B" "team1" "n1" 13.1
"A" "team1" "n2" 9.7
"B" "team1" "n2" 9.7
"X" "team2" "n1" 13.1
"Y" "team2" "n1" 13.1
"X" "team2" "n3" 5
"B" "team1" "n4" 6.2
"Y" "team2" "n4" 6.2
end

fillin loc_id team
bysort loc_id (loc_edu) : replace loc_edu = loc_edu[1]

gen choice = 0
bysort loc_id team (brand): replace choice = 1 if inlist(brand[1], "A", "X") & inlist(brand[2], "B", "Y")
by loc_id team: replace choice = 2 if _N == 1 & inlist(brand, "A", "X")
by loc_id team: replace choice = 3 if _N == 1 & inlist(brand, "B", "Y")

duplicates drop loc_id team, force
drop brand _fillin

reshape wide choice, i(loc_id) j(team) string

list

+----------------------------------------+
| loc_id choice~1 choice~2 loc_edu |
|----------------------------------------|
1. | n1 1 1 13.1 |
2. | n2 1 0 9.7 |
3. | n3 0 2 5 |
4. | n4 3 3 6.2 |
+----------------------------------------+

Hi Nick, one last question. I just noticed that I have some neighborhoods in which teams purchase A or B multiple times (see n5 in the example below). For such observations, I did not get what I wanted.

brand	team	loc_id	loc_edu
A	team1	n1	13.1
B	team1	n1	13.1
A	team1	n2	9.7
B	team1	n2	9.7
X	team2	n1	13.1
Y	team2	n1	13.1
X	team2	n3	5
B	team1	n4	6.2
Y	team2	n4	6.2
A	team1	n5	24
A	team1	n5	24
A	team1	n5	24
B	team1	n5	24
B	team1	n5	24

In this example, for loc_id=n5, the choice is set to 0(see the output below):

loc_id choiceteam1 choiceteam2 loc_edu
n1 1 1 13.1
n2 1 0 9.7
n3 0 2 5
n4 3 3 6.2
n5 0 0 24

But I still want choiceteam1=1 for n5 as both A and B are purchased. How could I fix this? Thanks!

Comment

Nick Cox

Join Date: Mar 2014

Posts: 35661
#5

02 Dec 2022, 06:33

As in #2, the duplicates command can be used to drop unwanted duplicates.
1 like
Comment
John Park

Join Date: Dec 2022

Posts: 11
#6

20 Dec 2022, 21:13

Originally posted by Nick Cox View Post

As in #2, the duplicates command can be used to drop unwanted duplicates.

Thanks Nick! This is very helpful!
Comment

Announcement