Generating a variable to account for response patterns by ID and wave

Jonas Jakobi

Join Date: Sep 2018

Posts: 19
#1

04 Sep 2018, 10:49

Dear Statalist members,

I could really use your help with a problem that has been driving me mad for a while. It may well have a simple solution, but I can't find anything on the forum, and to me it seems rather complex. I hope someone can help me find a solution, or point me to a link where I can find the answer myself.

Anyway! Below is a data example taken from my actual dataset. In case it matters for the solution, I'm using Stata 14.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long id byte(wave q1 q2 q3 q4 q5 q6)
4010908 1 .m .m .m .m .m .m
4010908 2 .m .m .m .m .m .m
4010908 3 5 1 1 1 5 4
4010908 4 . . . . . .
4010908 5 .m .m .m .m .m .m
4010908 6 . . . . . .
4010908 7 5 1 1 1 5 5
4010908 8 4 2 1 1 3 4
4010908 9 . . . . . .
4010909 1 .m .m .m .m .m .m
4010909 2 .m .m .m .m .m .m
4010909 3 . . . . . .
4010909 4 . . . . . .
4010909 5 . . . . . .
4010909 6 . . . . . .
4010909 7 . . . . . .
4010909 8 . . . . . .
4010909 9 . . . . . .
4010910 1 .m .m .m .m .m .m
4010910 2 .m .m .m .m .m .m
4010910 3 3 3 1 1 4 3
4010910 4 . . . . . .
4010910 5 .m .m .m .m .m .m
4010910 6 . . . . . .
4010910 7 4 3 2 2 4 3
4010910 8 5 3 2 2 4 4
4010910 9 . . . . . .
end
label values wave en2574
label def en2574 1 "Fall 2010", modify
label def en2574 2 "Summer 2011", modify
label def en2574 3 "2011/2012", modify
label def en2574 4 "Spring 2012", modify
label def en2574 5 "2012/2013", modify
label def en2574 6 "Spring 2013", modify
label def en2574 7 "2013/2014", modify
label def en2574 8 "2014/2015", modify
label def en2574 9 "2015/2016", modify
label values q1 en511
label values q2 en511
label values q3 en511
label values q4 en511
label values q5 en511
label values q6 en511
label def en511 3 "half and half", modify
label def en511 4 "rather agree", modify
label def en511 5 "completely agree", modify
label def en511 .m "Missing by design", modify
label def en511 1 "completely disagree", modify
label def en511 2 "rather disagree", modify

What I'm trying to achieve is the following:
I'd like to create a variable x that counts in how many waves each individual has answered all six questions (q1-q6). For example, some individuals have answered the questions only in wave 3 but not in wave 7, 8 and 9. Others, however, might have answered the questions at all four time points. The variable x I'd like to create should tell me that person 1 has answered all 6 questions 3 times, meaning in 3 waves. The value for the second person would be 0 because he or she did not answer the six questions in any wave. For the third person the value of x should be 3 again because he, too, has given full responses in 3 waves.

Furthermore, I'd like to know at which time point a person has given his or her responses. It makes a difference in my analysis if a person answered questions q1-q6 in wave 3 and 7 or in wave 8 and 9.

Any help would be highly appreciated, as would a link to a thread I did not see while searching the forum history. Since I have trouble naming this thread myself, I have probably missed other threads with a similar question.

Many thanks in advance,
Jonas
Tags: Suggestion, syntax
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#2

04 Sep 2018, 11:11

I think this will resolve your first question:

Code:

egen all_questions_this_wave = rowmiss(q1-q6)
replace all_questions_this_wave = !all_questions_this_wave
by id, sort: egen num_waves_all_questions = total(all_questions_this_wave)
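To check the result against the example data, a minimal sketch along these lines lists one row per id. It assumes the three commands above have been run. `tag_id` is a hypothetical helper name, not part of the original code.

```stata
* Sketch: show one row per id with the number of fully answered waves.
* tag_id is a hypothetical helper; -egen, tag()- marks one observation per id.
egen byte tag_id = tag(id)
list id num_waves_all_questions if tag_id, noobs
drop tag_id
```

For the example data this should show 3 for id 4010908, 0 for id 4010909, and 3 for id 4010910, because `rowmiss()` counts extended missing values such as `.m` as missing too.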

As for your second one, I don't understand what you are looking for here. Perhaps you could illustrate it with an example of what the results would look like for your example data.
Jonas Jakobi

Join Date: Sep 2018

Posts: 19
#3

05 Sep 2018, 00:31

Dear Clyde,

First of all, thank you for your help. The Stata commands you provided worked wonderfully. Now I know for each individual in how many waves he or she answered the full set of questions.

The reason I was trying to find out at which time point a person answered the full set of questions is the following: in my study, pupils were asked about their attitudes towards education at four different time points: waves 3, 7, 8 and 9. Consequently, students who stayed in school longer are more likely to have answered the questions in all four waves. Pupils who dropped out of school earlier, on the other hand, might have answered only in waves 3 and 7. In addition, panel attrition is a huge issue in my dataset. While roughly 11,000 pupils answered the questions in wave 3, only around 5,000 respondents are left in wave 7, and the number continues to decline to about 300 cases in wave 9.

Now to the precise reason why I thought it useful to have a variable that indicates in which wave a person started to answer the questions: for SEM I intend to reshape the dataset from long to wide and then use the new variable to identify the time at which a person started to answer the questions. I thought this might be a starting point for implementing lagged variables in SEM. Anyway, I might be totally wrong with my ideas. If you have some time, what are your thoughts on my approach?

In the end I came up with the following solution to create dummy variables that indicate the starting point for each individual.

Code:

local num 3 7 8 9
foreach x in `num' {
    bysort id : gen h`x' = 1 if wave == `x' & q1 != . & q2 != . & ///
        q3 != . & q4 != . & q5 != . & q6 != .
}

Later on in my analysis I can then use something like the command below to define my analysis sample.

Code:

do something if num_waves_all_questions == 2 & h3 == 1
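Two caveats on the snippets above may be worth keeping in mind. First, `q1 != .` is also true for extended missing values such as `.m`, so a row coded "Missing by design" would count as answered if `.m` ever occurred in waves 3, 7, 8 or 9. Second, `h`x'` is set only on the row for wave `x', so a condition such as `h3 == 1` selects wave-3 rows only. A hedged sketch of an alternative, with the hypothetical names `complete_w3`, `complete_w7`, `complete_w8` and `complete_w9`:

```stata
* Sketch: on every row of an id, flag whether that id answered all six
* items in wave w. missing() is true for . and for .a-.z alike.
* complete_w`w' is a hypothetical variable name.
foreach w of numlist 3 7 8 9 {
    by id, sort: egen byte complete_w`w' = ///
        max(wave == `w' & !missing(q1, q2, q3, q4, q5, q6))
}
```

Because the flag is constant within id, a condition such as `complete_w3 == 1` would keep all waves of the ids that responded fully in wave 3.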

Again, thank you for your help Clyde. I think I have to reconsider my approach again.

Jonas
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#4

05 Sep 2018, 10:35

So, to generate a variable that shows the wave in which a given id first responded to the survey, and then generate indicators ("dummies") for that wave being 3, 7, 8, and 9, respectively, you can do this:

Code:

egen any_question_this_wave = rownonmiss(q1-q6)
replace any_question_this_wave = any_question_this_wave != 0
by id, sort: egen first_wave_responded = min(cond(any_question_this_wave, wave, .))
foreach n of numlist 3 7 8 9 {
    gen byte first_responded_wave_`n' = `n'.first_wave_responded
}

Note: I have defined response to the wave as having a non-missing response for any of the items q1-q6. If that is not what you intend, then the first two lines will have to be modified to accommodate your definition of what constitutes a response to the wave.
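As one possible modification, here is a minimal sketch for the stricter definition used earlier in the thread, where a wave counts as a response only if all six items are answered. `complete_this_wave` and `first_wave_complete` are hypothetical names chosen so as not to clash with the variables above.

```stata
* Sketch: a wave counts as a response only if none of q1-q6 is missing.
* complete_this_wave and first_wave_complete are hypothetical names.
egen byte complete_this_wave = rowmiss(q1-q6)
replace complete_this_wave = !complete_this_wave   // 1 = no item missing
by id, sort: egen first_wave_complete = min(cond(complete_this_wave, wave, .))
```

An id with no complete wave ends up with a missing value for `first_wave_complete`.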

This will leave your data in a hybrid layout that is partly long and partly wide. On the other hand, if your intent is to use these indicator variables for when they first began to respond as predictors in a model, I suppose this layout would make sense.
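If a fully wide layout is wanted instead, for instance for the SEM plan mentioned in #3, a rough sketch might look like the following. It assumes the code in this post has been run. The `_w` suffix is an arbitrary, hypothetical naming choice. Any other variable that varies within id (for example, a per-wave indicator created earlier) would need to be dropped or added to the `reshape` list first.

```stata
* Sketch: drop the row-level helper, rename the items, then reshape to wide.
* The _w suffix is a hypothetical naming choice; it yields q1_w3, q1_w7, ...
drop any_question_this_wave
foreach v of varlist q1-q6 {
    rename `v' `v'_w
}
reshape wide q1_w q2_w q3_w q4_w q5_w q6_w, i(id) j(wave)
```

Variables that are constant within id, such as `first_wave_responded` and its indicators, are carried along unchanged.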