Structure data for repeated measures analysis

Heather Dolphin

Join Date: Nov 2020

Posts: 18
#1


07 Nov 2021, 12:25

Hello, can someone please help me organize this data for a repeated-measures analysis? I have merged 7 datasets containing survey data on 2000 participants across 7 waves of data collection. I want the data in long format, and while I have the "i" variable (beneficiary ID), I don't have a "j" or "time" variable. In wide format, each beneficiary has separate columns labeled by the survey the data came from: "BLdaily wage" is the baseline daily wage, then "exit daily wage", month 6 daily wage, month 12, month 18, month 24, etc. Since I don't have a single variable denoting the survey wave, I went back to each individual dataset and added a variable denoting which survey it was (1=baseline, 2=exit survey, 3=6-month follow-up, etc.). But when I merged them, since everyone participated in the baseline, that variable became 1 (baseline) for every participant. How would you suggest I get Stata to organize the rows for each beneficiary by survey (1-7)?
Thank you
Heather
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#2

07 Nov 2021, 12:34

You need to use the -reshape- command. It would have been more helpful if you had posted example data with the -dataex- command instead of describing your data, so we could see exactly how the variables are named. After all, the names you describe, like "exit daily wage," are not possible variable names in Stata, because Stata variable names cannot contain spaces. Let me assume that the variable names are actually BL_daily_wage, exit_daily_wage, month6_daily_wage, month12_daily_wage, month18_daily_wage, and month24_daily_wage, and that the ID variable is called beneficiary_ID. The key is to identify the part of the name that is common to every wave (known as the "stub"), in this case _daily_wage. Then you can get the long layout with:

Code:

reshape long @_daily_wage, i(beneficiary_ID) j(time) string
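A minimal sketch of what that command does, run on three made-up beneficiaries and three waves. It assumes the variable names guessed above (BL_daily_wage, exit_daily_wage, month6_daily_wage); the IDs and wage values are invented purely for illustration.

```stata
* Hypothetical wide data: one row per beneficiary, one wage column per wave
clear
input str4 beneficiary_ID BL_daily_wage exit_daily_wage month6_daily_wage
"A001" 10 12 15
"A002"  8  .  9
"A003" 11 11 13
end

* @ marks where the wave prefix (BL, exit, month6) sits in each name;
* the string option is needed because those prefixes are not numbers
reshape long @_daily_wage, i(beneficiary_ID) j(time) string

* Now one row per beneficiary per wave; the wage is in _daily_wage
* and time holds "BL", "exit" or "month6"
list, sepby(beneficiary_ID)
```

Missing values (A002 at exit) are kept as rows, so every beneficiary ends up with one row per wave.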

Note that you don't do this separately for each series of variables. Rather, examine your data set first, identify all of the different stubs, and list all of them in the part of the command where I have shown @_daily_wage above, each one with @ in the spot where BL, exit, month6, etc. appear in the variable name.

You will probably then want to convert your time variable into something numeric. The -encode- command will do that for you. (See -help encode- for details if you are not familiar with it.) By default, -encode- numbers the values in alphabetical order, so before you -encode-, you may want to define your own value label so that the numerical sequence of the encoded variable is in chronological order. (See -help label define- if you are not familiar with it.)
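A short sketch of why the predefined label matters, on a hypothetical four-row dataset whose wave codes (BL, exit, month6, month12) are assumptions chosen for illustration. Alphabetically, "month12" sorts before "month6", so plain -encode- would code the waves out of order.

```stata
* Hypothetical long data with a string wave identifier
clear
input str7 time wage
"BL"      10
"exit"    12
"month6"  15
"month12" 18
end

* Plain -encode- would give BL=1, exit=2, month12=3, month6=4;
* define the chronological order explicitly instead
label define wave 1 "BL" 2 "exit" 3 "month6" 4 "month12"

* Use the predefined label; noextend makes -encode- stop with an error
* if any string value is missing from the label
encode time, generate(wave) label(wave) noextend
drop time
rename wave time
list
```

The -noextend- option is a safety check: without it, any wave code you forgot to label would silently be added with a code above the last one you defined.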

In the future, when asking for help with code, show data examples, and please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.

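By the way, your mistake was in -merge-ing these files. Because your wave variable had the same name in every file, -merge- kept the value from the master (baseline) data, which is why it became 1 for everyone. You should have added a wave or date variable to each file and then combined them with -append-. That is the better way to combine separate files into a longitudinal data set in long layout.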
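A minimal sketch of the -append- approach with two waves. It assumes that the wage variable has the same name (daily_wage) in every wave file; the tempfiles stand in for the separate survey files, and all names and values are hypothetical.

```stata
* Hypothetical baseline file
clear
input str4 beneficiary_ID daily_wage
"A001" 10
"A002"  8
end
generate byte wave = 1          // 1 = baseline
tempfile baseline
save `baseline'

* Hypothetical exit-survey file with the SAME variable names
clear
input str4 beneficiary_ID daily_wage
"A001" 12
"A002"  9
end
generate byte wave = 2          // 2 = exit survey
tempfile exitwave
save `exitwave'

* Stack the files: each wave adds new rows instead of overwriting columns,
* so the result is already in long layout with a numeric wave variable
use `baseline', clear
append using `exitwave'
sort beneficiary_ID wave
list, sepby(beneficiary_ID)
```

If the wave files use wave-specific names (like exit_daily_wage), they would need to be renamed to a common name in each file before appending.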
Heather Dolphin

Join Date: Nov 2020

Posts: 18
#3

07 Nov 2021, 13:39

Oh, thank you, it's now in long form!
Yes, I saw the instructions about -dataex-, but HOW to use it was not clear to me. I've since found a YouTube video in Spanish that is more helpful than the English -dataex- instructions! I will use it next time.
Thank you again for your guidance even though you didn't have my example data. It's resolved!
Heather Dolphin

Join Date: Nov 2020

Posts: 18
#4

08 Nov 2021, 13:40

Hi Clyde,

I am trying to follow your instructions to define the value label to get the numerical sequence in chronological order but I am getting an invalid syntax error.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str15 id_beneficiario float(original exit_school exit_work exit_employed mo6_school mo6_work mo6_employed mo12_school mo12_work mo12_employed)
"Hhr-005e9e8-13b" 1 . . . 0 0 . . . .
"Hhr-00b69bf-667" 2 . . . . . . . . .
"Hhr-00d90fa-b13" 2 . . . . . . . . .
"Hhr-00d90fa-e7a" 2 . . . . . . . . .
"Hhr-00da329-1ae" 3 . . . 0 0 . 0 0 .
"Hhr-0115e72-290" 3 0 0 . . . . . . .
"Hhr-013440f-718" 3 0 0 . . . . . . .
"Hhr-0147fdd-fc0" 2 . . . . . . . . .
"Hhr-0149bcf-f70" 3 0 0 . . . . 0 0 .
"Hhr-015c441-dd8" 3 1 0 . . . . 1 0 .
end
label values exit_school nosi
label values exit_work nosi
label values exit_employed nosi
label values mo6_school nosi
label values mo6_work nosi
label values mo6_employed nosi
label values mo12_school nosi
label values mo12_work nosi
label values mo12_employed nosi
label def nosi 0 "No", modify
label def nosi 1 "Yes", modify

Here is the rest of the code that gives me the syntax error:

Code:

reshape long @_school _work _employed, i(id_beneficiario) j(time) string
label variable time "survey wave"
describe
label define time1 exit "1" mo6 "2" mo12 "3"
label values time time1

Secondly, when I run the reshape long command, I don't understand why Stata is telling me about all these missing variables, which are not variables in my data. They look like variable names that have been mixed up. Have I used the wrong code? Please see below. (Note that the value label "nosi" refers to no/yes in Spanish.)

reshape long @_school _work _employed, i(id_beneficiario) j(time) string
(note: j = exit final lb mo12 mo18 mo24 mo6)
(note: _workexit not found)
(note: _employedexit not found)
(note: _workfinal not found)
(note: _employedfinal not found)
(note: _worklb not found)
(note: _employedlb not found)
(note: _workmo12 not found)
(note: _employedmo12 not found)
(note: _workmo18 not found)
(note: _employedmo18 not found)
(note: _workmo24 not found)
(note: _employedmo24 not found)
(note: _workmo6 not found)
(note: _employedmo6 not found)

Data wide -> long
-----------------------------------------------------------------------------
Number of obs. 2315 -> 16205
Number of variables 155 -> 152
j variable (7 values) -> time
xij variables:
exit_school final_school ... mo6_school -> _school
_workexit _workfinal ... _workmo6 -> _work
_employedexit _employedfinal ... _employedmo6 -> _employed
-----------------------------------------------------------------------------

Thank you!
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#5

08 Nov 2021, 13:54

Your syntax in the -label define- command is backwards: the numbers need to precede the text labels, and the text labels should be in quotes, but the numbers should not. Beyond that, you cannot apply value labels to a string variable. What you want is, in effect, the reverse: create a numeric variable from the string, with codes taken from your label. That is what -encode- does:

Code:

label define time 1 "exit" 2 "mo6" 3 "mo12"
encode time, gen(time1) label(time)
drop time
rename time1 time
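The -reshape- output above lists seven values of time (exit final lb mo12 mo18 mo24 mo6), while the three-value label only covers the waves in the -dataex- example. On the full data, -encode- would add the four unlabeled waves with codes above 3, so the baseline would not come first. A sketch covering all seven, assuming that lb is the baseline and that final comes after mo24 (neither ordering is confirmed in the thread):

```stata
* Toy data holding the seven wave codes reported by -reshape- above
clear
input str5 time
"exit"
"final"
"lb"
"mo12"
"mo18"
"mo24"
"mo6"
end

* ASSUMED chronological order: lb (baseline) first, final last
label define time 1 "lb" 2 "exit" 3 "mo6" 4 "mo12" 5 "mo18" 6 "mo24" 7 "final"

* noextend: stop with an error if any wave code is not in the label
encode time, gen(time1) label(time) noextend
drop time
rename time1 time
sort time
list
```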

You wrote: "I don't understand why stata is telling me about all these missing variables which are not variables."

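Because you forgot the @ on those stubs. Without an @, -reshape- assumes the stub comes first and the j value follows it, so it looked for variables such as _workexit and _employedexit, which don't exist, and created _work and _employed filled with missing values. It should be: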

Code:

reshape long @_school @_work @_employed, i(id_beneficiario) j(time) string
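As a check, here is a sketch that runs the corrected command on the -dataex- example from post #4 (value labels omitted; only the exit, mo6 and mo12 waves appear there) and then encodes time with the three-wave label:

```stata
* Heather's -dataex- example from post #4, without the nosi value labels
clear
input str15 id_beneficiario float(original exit_school exit_work exit_employed mo6_school mo6_work mo6_employed mo12_school mo12_work mo12_employed)
"Hhr-005e9e8-13b" 1 . . . 0 0 . . . .
"Hhr-00b69bf-667" 2 . . . . . . . . .
"Hhr-00d90fa-b13" 2 . . . . . . . . .
"Hhr-00d90fa-e7a" 2 . . . . . . . . .
"Hhr-00da329-1ae" 3 . . . 0 0 . 0 0 .
"Hhr-0115e72-290" 3 0 0 . . . . . . .
"Hhr-013440f-718" 3 0 0 . . . . . . .
"Hhr-0147fdd-fc0" 2 . . . . . . . . .
"Hhr-0149bcf-f70" 3 0 0 . . . . 0 0 .
"Hhr-015c441-dd8" 3 1 0 . . . . 1 0 .
end

* Every stub carries its own @, so no "not found" notes appear
reshape long @_school @_work @_employed, i(id_beneficiario) j(time) string

* Chronological codes for the three waves present in this example
label define time 1 "exit" 2 "mo6" 3 "mo12"
encode time, gen(time1) label(time) noextend
drop time
rename time1 time
sort id_beneficiario time
```

The variable original is constant within each beneficiary, so -reshape- simply repeats it on every row.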