  • Survey reshape with multiple treatments over multiple questions

    Hello friends,

    I have a dataset, example version included below, with information about the treatment (Version A - Version E; breakfasttreat_1, lunchtreat_2, etc) and participant assessment (0-100; breakfast_1, lunch_2, etc) for six questions provided to participants. I would like to reshape to long so that there are six rows for each participant with a column for meal (breakfast or lunch), a column for treatment (Version A - Version E), and a column for assessment (0-100).

    My brain is stuck on the idea that I need to run reshape twice: once to generate the question number (1-3), and then again to generate the meal (breakfast or lunch). But that doesn't work, because after the first reshape pid no longer uniquely identifies the observations.

    Do you have any ideas about how else I can approach this?

    Thank you!
    NB: I'm running Stata 16

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str3 pid int(breakfasttreat_1 breakfasttreat_2 breakfasttreat_3 lunchtreat_1 lunchtreat_2 lunchtreat_3 Female) byte(breakfast_1 breakfast_2 breakfast_3 lunch_1 lunch_2 lunch_3)
    "1"  1 5 1 5 3 2 0   1  99  99  99 99   0
    "2"  2 4 1 5 2 1 0   0  21 100 100 39  99
    "3"  3 3 1 5 1 3 0  50  22  53  48 50  52
    "4"  4 2 1 5 5 4 1  36  35  35   4  1  45
    "5"  5 1 2 4 4 5 1  56   9  36   8 61  19
    "6"  1 1 2 4 3 3 0  15  80  50  80 80   0
    "7"  2 2 2 4 2 2 1  80  31  18   2  2   5
    "8"  3 3 2 4 1 1 0 100 100  50  80  5 100
    "9"  4 4 3 3 5 2 1  41  42  33  45 49  57
    "10" 5 5 3 3 4 5 0  70  71  52   0 20  30
    "11" 1 5 3 3 3 4 1  31  12  73  57 39  80
    "12" 2 4 3 3 2 3 0  53  52  59  22 71  49
    "13" 3 3 4 2 1 5 1 100 100 100  50 60   0
    "14" 4 2 4 2 5 3 1   1  51  57  99  2  99
    "15" 5 1 4 2 4 2 1  55  95  99   2 39   6
    "16" 1 1 4 2 3 1 0  98  41  92   4 76 100
    "17" 2 2 5 1 2 4 1  23   0   2 100  7  99
    "18" 3 3 5 1 1 2 0  24  18  63  18 19  11
    "19" 4 4 5 1 5 1 1  11  87  82  93 14  85
    "20" 5 5 5 1 4 5 1  91  99   4  91 60  18
    end
    label values breakfast_1_treat Treatment
    label values breakfast_2_treat Treatment
    label values breakfast_3_treat Treatment
    label values lunch_1_treat Treatment
    label values lunch_2_treat Treatment
    label values lunch_3_treat Treatment
    label def Treatment 1 "Treatment A", modify
    label def Treatment 2 "Treatment B", modify
    label def Treatment 3 "Treatment C", modify
    label def Treatment 4 "Treatment D", modify
    label def Treatment 5 "Treatment E", modify

  • #2
    There is actually a way to do this in a single -reshape-, but it involves a lot of complicated renaming of the variables first. Since your data set probably does not contain millions of observations, I think it's better to do it in two stages. True, pid is no longer a unique identifier after the first reshape, but the combination of pid and the variable created by the -j()- option (seq in the code below) is.

    Code:
    reshape long breakfasttreat_ lunchtreat_ breakfast_ lunch_, i(pid) j(seq)
    rename breakfast_ evaluationbreakfast
    rename lunch_ evaluationlunch
    reshape long @treat_ evaluation, i(pid seq) j(meal) string
    
    replace meal = meal + "_" + string(seq, "%1.0f")
    drop seq
    rename treat_ treatment
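
    As a quick sanity check on the result (a sketch that assumes the example data above, where each participant has six items), -isid- confirms that pid and meal together identify the observations, and the -assert- confirms there are six rows per participant. At this point meal holds values such as "breakfast_1".

    Code:
    * stops with an error if pid and meal do not jointly identify observations
    isid pid meal
    * stops with an error if any participant does not have exactly six rows
    bysort pid: assert _N == 6
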
    By the way, something is wrong with your -dataex- output. The variables in the -label values- commands do not exist, although their names are rearrangements of the "morphemes" in the names of the actual variables. It is hard for me to believe that -dataex- actually created this output from a real data set. Did you do some editing of the -dataex- output or piece together -dataex- outputs from different datasets?
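
    For reference, the single -reshape- route mentioned at the top of this post might look roughly like the sketch below. It starts from the original wide data, not from the output of the code above. The stub names treatment and evaluation, and the names item, meal and question, are illustrative choices of mine rather than anything in your data, and I have not run this against your real data set.

    Code:
    * move the meal name and question number to the end of each variable name
    rename breakfasttreat_# treatmentbreakfast_#
    rename lunchtreat_#     treatmentlunch_#
    rename breakfast_#      evaluationbreakfast_#
    rename lunch_#          evaluationlunch_#
    * one reshape: j() takes string values such as "breakfast_1"
    reshape long treatment evaluation, i(pid) j(item) string
    * split the string into the meal and the question number
    gen meal = substr(item, 1, strpos(item, "_") - 1)
    gen question = real(substr(item, strpos(item, "_") + 1, .))

    This leaves separate meal and question variables, which may be closer to the layout you described than the combined meal string.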
