Survey reshape with multiple treatments over multiple questions

Eva Warren

Join Date: Feb 2022
Posts: 6


29 Sep 2022, 15:01

Hello friends,

I have a dataset, example version included below, with information about the treatment (Version A - Version E; breakfasttreat_1, lunchtreat_2, etc) and participant assessment (0-100; breakfast_1, lunch_2, etc) for six questions provided to participants. I would like to reshape to long so that there are six rows for each participant with a column for meal (breakfast or lunch), a column for treatment (Version A - Version E), and a column for assessment (0-100).

My brain is stuck on the idea that I need to run reshape twice, once to generate the information about question (1-3) and then again to generate the information about meal (breakfast or lunch), but that doesn't work because after the first reshape pid no longer uniquely identifies the observations.

Do you have any ideas about how else I can approach this?

Thank you!
NB: I'm running Stata 16

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str3 pid int(breakfasttreat_1 breakfasttreat_2 breakfasttreat_3 lunchtreat_1 lunchtreat_2 lunchtreat_3 Female) byte(breakfast_1 breakfast_2 breakfast_3 lunch_1 lunch_2 lunch_3)
"1"  1 5 1 5 3 2 0   1  99  99  99 99   0
"2"  2 4 1 5 2 1 0   0  21 100 100 39  99
"3"  3 3 1 5 1 3 0  50  22  53  48 50  52
"4"  4 2 1 5 5 4 1  36  35  35   4  1  45
"5"  5 1 2 4 4 5 1  56   9  36   8 61  19
"6"  1 1 2 4 3 3 0  15  80  50  80 80   0
"7"  2 2 2 4 2 2 1  80  31  18   2  2   5
"8"  3 3 2 4 1 1 0 100 100  50  80  5 100
"9"  4 4 3 3 5 2 1  41  42  33  45 49  57
"10" 5 5 3 3 4 5 0  70  71  52   0 20  30
"11" 1 5 3 3 3 4 1  31  12  73  57 39  80
"12" 2 4 3 3 2 3 0  53  52  59  22 71  49
"13" 3 3 4 2 1 5 1 100 100 100  50 60   0
"14" 4 2 4 2 5 3 1   1  51  57  99  2  99
"15" 5 1 4 2 4 2 1  55  95  99   2 39   6
"16" 1 1 4 2 3 1 0  98  41  92   4 76 100
"17" 2 2 5 1 2 4 1  23   0   2 100  7  99
"18" 3 3 5 1 1 2 0  24  18  63  18 19  11
"19" 4 4 5 1 5 1 1  11  87  82  93 14  85
"20" 5 5 5 1 4 5 1  91  99   4  91 60  18
end
label values breakfast_1_treat Treatment
label values breakfast_2_treat Treatment
label values breakfast_3_treat Treatment
label values lunch_1_treat Treatment
label values lunch_2_treat Treatment
label values lunch_3_treat Treatment
label def Treatment 1 "Treatment A", modify
label def Treatment 2 "Treatment B", modify
label def Treatment 3 "Treatment C", modify
label def Treatment 4 "Treatment D", modify
label def Treatment 5 "Treatment E", modify

Tags: reshape, reshape wide long, vignette data

Clyde Schechter

Join Date: Apr 2014

Posts: 30354
#2

29 Sep 2022, 15:30

There is actually a way to do this in a single -reshape-, but it involves a lot of complicated renaming of the variables first. So I think, since it's likely your data set does not contain millions of observations, it's better to do it in two stages. True, pid is no longer a unique identifier after the first reshape, but the combination of that plus the variable created from the -j()- option is.
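The point about identifiers can be checked directly. The following is a minimal sketch on a made-up two-participant dataset (the values and the reduced set of variables are assumptions for illustration only): after the first -reshape-, -isid pid- fails, while -isid pid seq- succeeds.

```stata
* Toy data, not the original example: two participants, two questions
clear
input str3 pid byte(breakfast_1 breakfast_2 lunch_1 lunch_2)
"1" 1 99  99  0
"2" 0 21 100 39
end

* First stage: one row per participant and question number
reshape long breakfast_ lunch_, i(pid) j(seq)

* pid alone no longer identifies the observations, so -isid- errors out
capture isid pid
display "return code from -isid pid-: " _rc

* pid together with the j() variable does identify them
isid pid seq
```

This is why the second -reshape- needs i(pid seq) rather than i(pid).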

Code:

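* Stage 1: one row per participant and question number (seq = 1, 2, 3)
reshape long breakfasttreat_ lunchtreat_ breakfast_ lunch_, i(pid) j(seq)
* Move the meal name to the end so it can act as the j() value in stage 2
rename breakfast_ evaluationbreakfast
rename lunch_ evaluationlunch
* Stage 2: one row per participant, question number, and meal
reshape long @treat_ evaluation, i(pid seq) j(meal) string
replace meal = meal + "_" + string(seq, "%1.0f")
drop seq
rename treat_ treatment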
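For comparison, the single -reshape- route mentioned above might look roughly like the sketch below. It is not code from this thread: the stub names treatment and evaluation and the two-row toy dataset are assumptions chosen for illustration. The idea is to rename every variable so that the stub comes first and the meal plus question number form the suffix that j() picks up.

```stata
* Toy data laid out like the example in #1 (first two participants only)
clear
input str3 pid int(breakfasttreat_1 breakfasttreat_2 breakfasttreat_3 lunchtreat_1 lunchtreat_2 lunchtreat_3 Female) byte(breakfast_1 breakfast_2 breakfast_3 lunch_1 lunch_2 lunch_3)
"1" 1 5 1 5 3 2 0  1 99  99  99 99  0
"2" 2 4 1 5 2 1 0  0 21 100 100 39 99
end

* Put the stub first; the rest of each name becomes the j() value
rename breakfasttreat_* treatmentbreakfast_*
rename lunchtreat_*     treatmentlunch_*
rename breakfast_*      evaluationbreakfast_*
rename lunch_*          evaluationlunch_*

* One reshape: meal takes values such as "breakfast_1" and "lunch_3"
reshape long treatment evaluation, i(pid) j(meal) string

* Six rows per participant, identified by pid and meal
isid pid meal
list pid meal treatment evaluation, sepby(pid)
```

Both routes should end with the same three variables (meal, treatment, evaluation); the two-stage version avoids the wildcard renames at the cost of a second -reshape-.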

By the way, something is wrong with your -dataex- output. The variables in the -label values- commands do not exist, although their names are rearrangements of the "morphemes" in the names of the actual variables. It is hard for me to believe that -dataex- actually created this output from a real data set. Did you do some editing of the -dataex- output or piece together -dataex- outputs from different datasets?
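If the intent was to attach the Treatment value label to the six treatment variables that do exist in the -input- line, the labelling could be written along the lines below. This is only a guess at what was meant, shown on made-up data, not a reconstruction of the original -dataex- output.

```stata
* Toy data: only the six treatment variables, two participants
clear
input int(breakfasttreat_1 breakfasttreat_2 breakfasttreat_3 lunchtreat_1 lunchtreat_2 lunchtreat_3)
1 5 1 5 3 2
2 4 1 5 2 1
end

* Define the value label once, then attach it to the existing variables
label define Treatment 1 "Treatment A" 2 "Treatment B" 3 "Treatment C" 4 "Treatment D" 5 "Treatment E"
label values breakfasttreat_* lunchtreat_* Treatment

* Confirm which variables now carry the label
describe breakfasttreat_* lunchtreat_*
```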