  • Generating a variable to account for response patterns by ID and wave

    Dear Statalist members,

    I could really use your help with a problem that has been driving me mad for a while. The solution may well be simple, but I can't seem to find anything on the forum, and to me the issue seems rather complex. I hope someone can help me find a solution, or point me to a link where I can find the answer myself.

    Anyway! Below is a data example taken from my actual dataset. In case it matters for the solution, I'm using Stata 14.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id byte(wave q1 q2 q3 q4 q5 q6)
    4010908 1 .m .m .m .m .m .m
    4010908 2 .m .m .m .m .m .m
    4010908 3  5  1  1  1  5  4
    4010908 4  .  .  .  .  .  .
    4010908 5 .m .m .m .m .m .m
    4010908 6  .  .  .  .  .  .
    4010908 7  5  1  1  1  5  5
    4010908 8  4  2  1  1  3  4
    4010908 9  .  .  .  .  .  .
    4010909 1 .m .m .m .m .m .m
    4010909 2 .m .m .m .m .m .m
    4010909 3  .  .  .  .  .  .
    4010909 4  .  .  .  .  .  .
    4010909 5  .  .  .  .  .  .
    4010909 6  .  .  .  .  .  .
    4010909 7  .  .  .  .  .  .
    4010909 8  .  .  .  .  .  .
    4010909 9  .  .  .  .  .  .
    4010910 1 .m .m .m .m .m .m
    4010910 2 .m .m .m .m .m .m
    4010910 3  3  3  1  1  4  3
    4010910 4  .  .  .  .  .  .
    4010910 5 .m .m .m .m .m .m
    4010910 6  .  .  .  .  .  .
    4010910 7  4  3  2  2  4  3
    4010910 8  5  3  2  2  4  4
    4010910 9  .  .  .  .  .  .
    end
    label values wave en2574
    label def en2574 1 "Fall 2010", modify
    label def en2574 2 "Summer 2011", modify
    label def en2574 3 "2011/2012", modify
    label def en2574 4 "Spring 2012", modify
    label def en2574 5 "2012/2013", modify
    label def en2574 6 "Spring 2013", modify
    label def en2574 7 "2013/2014", modify
    label def en2574 8 "2014/2015", modify
    label def en2574 9 "2015/2016", modify
    label values q1 en511
    label values q2 en511
    label values q3 en511
    label values q4 en511
    label values q5 en511
    label values q6 en511
    label def en511 3 "half and half", modify
    label def en511 4 "rather agree", modify
    label def en511 5 "completely agree", modify
    label def en511 .m "Missing by design", modify
    label def en511 1 "completely disagree", modify
    label def en511 2 "rather disagree", modify
    What I'm trying to achieve is the following:
    1. I'd like to create a variable x that counts the number of waves in which each individual answered all six questions (q1-q6). For example, some individuals answered the questions only in wave 3 but not in waves 7, 8 and 9, whereas others might have answered them at all four time points. For the example data, x should tell me that person 1 answered all six questions in 3 waves. The value for the second person would be 0, because he or she did not answer the six questions in any wave. For the third person, x should again be 3, because he or she also gave full responses in 3 waves.
    2. Furthermore, I'd like to know in which waves a person gave his or her responses. It matters for my analysis whether a person answered questions q1-q6 in waves 3 and 7 or in waves 8 and 9.
    Any help would be highly appreciated, as would a link to a thread I missed while searching the forum history. Since I had trouble naming this thread myself, I may simply have overlooked a thread with a similar question.

    Many thanks in advance,
    Jonas

  • #2
    I think this will resolve your first question:

    Code:
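    * Number of missing values among q1-q6 in this row (.m counts as missing)
    egen all_questions_this_wave = rowmiss(q1-q6)
    * Convert to an indicator: 1 if no item is missing, 0 otherwise
    replace all_questions_this_wave = !all_questions_this_wave
    * Count, for each id, the waves with complete answers
    by id, sort: egen num_waves_all_questions = total(all_questions_this_wave)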
    As for your second one, I don't understand what you are looking for here. Perhaps you could illustrate it with an example of what the results would look like for your example data.
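
    A quick way to check the three lines above against the expected values (3, 0 and 3) is to run them on a subset of the -dataex- example: waves 1, 3, 7, 8 and 9, including the "missing by design" rows. This is a sketch for illustration only; first_row is a hypothetical helper variable used for display. It also shows that rowmiss() counts extended missing values such as .m as missing, so the .m waves are not mistaken for complete responses.

```stata
* Subset of the -dataex- example above (waves 1, 3, 7, 8 and 9)
clear
input long id byte(wave q1 q2 q3 q4 q5 q6)
4010908 1 .m .m .m .m .m .m
4010908 3  5  1  1  1  5  4
4010908 7  5  1  1  1  5  5
4010908 8  4  2  1  1  3  4
4010908 9  .  .  .  .  .  .
4010909 1 .m .m .m .m .m .m
4010909 3  .  .  .  .  .  .
4010909 7  .  .  .  .  .  .
4010909 8  .  .  .  .  .  .
4010909 9  .  .  .  .  .  .
4010910 1 .m .m .m .m .m .m
4010910 3  3  3  1  1  4  3
4010910 7  4  3  2  2  4  3
4010910 8  5  3  2  2  4  4
4010910 9  .  .  .  .  .  .
end

* Code from #2: count missing items, flip to a 0/1 "complete" flag, total per id
egen all_questions_this_wave = rowmiss(q1-q6)
replace all_questions_this_wave = !all_questions_this_wave
by id, sort: egen num_waves_all_questions = total(all_questions_this_wave)

* Show one row per id (first_row is a hypothetical display helper)
egen byte first_row = tag(id)
list id num_waves_all_questions if first_row, noobs
```

    If .m were not treated as missing, the wave 1 rows would count as complete and the totals would be 4, 1 and 4 instead.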



    • #3
      Dear Clyde,

      First of all, thank you for your help. The Stata code you provided worked wonderfully: I now know, for each individual, in how many waves he or she answered the full set of questions.

      The reason I was trying to find out at which time point a person answered the full set of questions is the following: in my study, pupils were asked about their attitudes towards education at four time points, namely waves 3, 7, 8 and 9. Consequently, students who stayed in school longer are more likely to answer the questions in all four waves, whereas pupils who dropped out of school earlier might answer only in waves 3 and 7. In addition, panel attrition is a huge issue in my dataset: while roughly 11,000 pupils answered the questions in wave 3, only around 5,000 respondents are left in wave 7, and the number declines further to about 300 cases in wave 9.
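
      One possible way to record these response patterns (question 2 of the original post) is sketched below: for each pupil, build a string with one character per survey wave (3, 7, 8, 9), "1" for a complete set of answers to q1-q6 and "0" otherwise, so that "1110" means complete in waves 3, 7 and 8 but not in wave 9. The sketch uses the wave 3, 7, 8 and 9 rows of the example data, and the names complete, flag, pattern and first_row are hypothetical.

```stata
* Wave 3, 7, 8 and 9 rows of the -dataex- example above
clear
input long id byte(wave q1 q2 q3 q4 q5 q6)
4010908 3 5 1 1 1 5 4
4010908 7 5 1 1 1 5 5
4010908 8 4 2 1 1 3 4
4010908 9 . . . . . .
4010909 3 . . . . . .
4010909 7 . . . . . .
4010909 8 . . . . . .
4010909 9 . . . . . .
4010910 3 3 3 1 1 4 3
4010910 7 4 3 2 2 4 3
4010910 8 5 3 2 2 4 4
4010910 9 . . . . . .
end

* On the full data, keep only the waves in which q1-q6 were asked
keep if inlist(wave, 3, 7, 8, 9)

* 1 if all six items are answered in this wave (.m counts as missing), 0 otherwise
gen byte complete = !missing(q1, q2, q3, q4, q5, q6)
gen str1 flag = cond(complete, "1", "0")

* Concatenate the flags in wave order within each pupil
by id (wave), sort: gen str4 pattern = flag if _n == 1
by id: replace pattern = pattern[_n-1] + flag if _n > 1
* The last row now holds the full pattern; copy it to every row of the pupil
by id: replace pattern = pattern[_N]

* Distribution of patterns, counting each pupil once
egen byte first_row = tag(id)
tabulate pattern if first_row
```

    Tabulating pattern over one row per pupil shows how many pupils follow each attrition pattern, which separates, for example, "1100" (waves 3 and 7 only) from "0011" (waves 8 and 9 only).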

      Now to the precise reason why I thought it useful to have a variable indicating the wave in which a person started to answer the questions: for SEM, I intend to reshape the dataset from long to wide and then use the new variable to identify the time at which each person started answering. I thought this might be a starting point for implementing lagged variables in SEM. I may be totally wrong with my ideas, though. If you have some time, what do you think of my approach?
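
      A minimal sketch of the long-to-wide step follows. It assumes that only the four survey waves are kept and that no other wave-varying variables are present. Any such variable must either be dropped or listed as a stub, otherwise reshape wide stops with an error. With stubs q1-q6 and j(wave), the wide variables are named by item and then wave, so q13 is q1 in wave 3 and q69 is q6 in wave 9.

```stata
* Wave 3, 7, 8 and 9 rows of the -dataex- example above
clear
input long id byte(wave q1 q2 q3 q4 q5 q6)
4010908 3 5 1 1 1 5 4
4010908 7 5 1 1 1 5 5
4010908 8 4 2 1 1 3 4
4010908 9 . . . . . .
4010909 3 . . . . . .
4010909 7 . . . . . .
4010909 8 . . . . . .
4010909 9 . . . . . .
4010910 3 3 3 1 1 4 3
4010910 7 4 3 2 2 4 3
4010910 8 5 3 2 2 4 4
4010910 9 . . . . . .
end

* Keep only the four waves in which the items were asked
keep if inlist(wave, 3, 7, 8, 9)
* One row per pupil; wide names are item + wave, e.g. q13 = q1 in wave 3
reshape wide q1 q2 q3 q4 q5 q6, i(id) j(wave)
list id q13 q17 q18 q19, noobs
```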

      In the end, I came up with the following code to create dummy variables that indicate the starting point for each individual.

      Code:
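      local num 3 7 8 9
      foreach x of local num {
          * h3, h7, h8, h9: 1 on the row of that wave if all six items are answered, missing otherwise;
          * !missing() also treats extended missing values such as .m as missing, unlike q1 != .
          gen byte h`x' = 1 if wave == `x' & !missing(q1, q2, q3, q4, q5, q6)
      }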
      Later in my analysis, I can then use something like the command below to define my analysis sample.

      Code:
      * "do something" = placeholder for the analysis; num_waves_all_questions is the per-id count from #2
      do something if num_waves_all_questions == 2 & h3 == 1
      Again, thank you for your help, Clyde. I think I need to reconsider my approach.

      Jonas
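
      Note that h3 (like h7, h8 and h9) is set only on the row for that wave, so a condition such as h3 == 1 selects only the wave-3 rows. If the goal is to select every row of pupils who gave complete answers in wave 3, one option is to spread the flag across all rows of a pupil first. The sketch below does this on the wave 3, 7, 8 and 9 rows of the example data. The ever_h* names are hypothetical.

```stata
* Wave 3, 7, 8 and 9 rows of the -dataex- example above
clear
input long id byte(wave q1 q2 q3 q4 q5 q6)
4010908 3 5 1 1 1 5 4
4010908 7 5 1 1 1 5 5
4010908 8 4 2 1 1 3 4
4010908 9 . . . . . .
4010909 3 . . . . . .
4010909 7 . . . . . .
4010909 8 . . . . . .
4010909 9 . . . . . .
4010910 3 3 3 1 1 4 3
4010910 7 4 3 2 2 4 3
4010910 8 5 3 2 2 4 4
4010910 9 . . . . . .
end

foreach x of numlist 3 7 8 9 {
    * As in #3: 1 on the row of that wave if all six items are answered, missing otherwise
    gen byte h`x' = 1 if wave == `x' & !missing(q1, q2, q3, q4, q5, q6)
    * Person-level 0/1 version: 1 on every row of a pupil who was complete in that wave
    by id, sort: egen byte ever_h`x' = max(h`x' == 1)
}
list id wave h3 ever_h3, sepby(id) noobs
```

    A condition such as ever_h3 == 1 & num_waves_all_questions == 2 would then keep all rows of the selected pupils rather than only their wave-3 row.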





      • #4
        To generate a variable showing the first wave in which a given id responded to the survey, and then generate indicators ("dummies") for that wave being 3, 7, 8, and 9, respectively, you can do this:

        Code:
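        egen any_question_this_wave = rownonmiss(q1-q6)
        replace any_question_this_wave = any_question_this_wave != 0   // 1 if any of q1-q6 is answered
        * Earliest wave with any response (missing if the id never responded)
        by id, sort: egen first_wave_responded = min(cond(any_question_this_wave, wave, .))
        foreach n of numlist 3 7 8 9 {
            * 1 if the first response wave is `n', 0 otherwise (including ids that never responded)
            gen byte first_responded_wave_`n' = (first_wave_responded == `n')
        }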
        Note: I have defined response to the wave as having a non-missing response for any of the items q1-q6. If that is not what you intend, then the first two lines will have to be modified to accommodate your definition of what constitutes a response to the wave.
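
        Since #3 is about answering the full set of questions, one possible modification is sketched below. The first two lines are replaced by a complete-response flag, and the rest works as before. The sketch assumes that "complete" means no missing value (including .m) in q1-q6; the names complete_this_wave, first_complete_wave and first_complete_wave_* are hypothetical.

```stata
* Wave 3, 7, 8 and 9 rows of the -dataex- example above
clear
input long id byte(wave q1 q2 q3 q4 q5 q6)
4010908 3 5 1 1 1 5 4
4010908 7 5 1 1 1 5 5
4010908 8 4 2 1 1 3 4
4010908 9 . . . . . .
4010909 3 . . . . . .
4010909 7 . . . . . .
4010909 8 . . . . . .
4010909 9 . . . . . .
4010910 3 3 3 1 1 4 3
4010910 7 4 3 2 2 4 3
4010910 8 5 3 2 2 4 4
4010910 9 . . . . . .
end

* In place of the first two lines: 1 if none of q1-q6 is missing (.m counts as missing)
gen byte complete_this_wave = !missing(q1, q2, q3, q4, q5, q6)
* Earliest wave with a complete set of answers (missing if there is none)
by id, sort: egen first_complete_wave = min(cond(complete_this_wave, wave, .))
* Indicators for that wave being 3, 7, 8 or 9 (0 if the pupil never answered fully)
foreach n of numlist 3 7 8 9 {
    gen byte first_complete_wave_`n' = (first_complete_wave == `n')
}
```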

        This will leave your data in a hybrid layout that is partly long and partly wide. That said, if you intend to use these indicators of when each person first began to respond as predictors in a model, I suppose this layout would make sense.
