  • Creating a wave identifier - observations wrongly mapped

    Hello,

    I am quite new to Stata and this is my first post, so I would really appreciate any help. I am working on my master's thesis and doing some empirical analysis on 5 waves of the SHARE panel data. To build the dataset, I first merged the different modules within each wave, using the unique identifier of each observation (mergeid), and then appended the 5 wave datasets. After appending, I also created a wave identifier so that I can tell which wave each observation in the combined dataset belongs to.

    After creating the wave identifier (the code ran without syntax errors), I realized that some variables that do not exist in the original first- and second-wave datasets have non-missing values for observations labelled as wave 1 or wave 2 in the combined dataset. Please see my code below.


    sort mergeid , stable
    by mergeid: gen wave = 1 if _n==1

    bysort mergeid: replace wave = 1 if _n==1 & firstwave==1 & hhid4~="" & hhid5~=""
    bysort mergeid: replace wave = 2 if _n==2 & firstwave==1 & hhid4~="" & hhid5~=""
    bysort mergeid: replace wave = 4 if _n==3 & firstwave==1 & hhid4~="" & hhid5~=""
    bysort mergeid: replace wave = 5 if _n==4 & firstwave==1 & hhid4~="" & hhid5~=""
    bysort mergeid: replace wave = 6 if _n==5 & firstwave==1 & hhid4~="" & hhid5~=""

    bysort mergeid: replace wave = 2 if _n==1 & firstwave==2 & hhid4~="" & hhid5~=""
    bysort mergeid: replace wave = 4 if _n==2 & firstwave==2 & hhid4~="" & hhid5~=""
    bysort mergeid: replace wave = 5 if _n==3 & firstwave==2 & hhid4~="" & hhid5~=""
    bysort mergeid: replace wave = 6 if _n==4 & firstwave==2 & hhid4~="" & hhid5~=""

    bysort mergeid: replace wave = 4 if _n==1 & firstwave==3 & hhid4~="" & hhid5~=""
    bysort mergeid: replace wave = 5 if _n==2 & firstwave==3 & hhid4~="" & hhid5~=""
    bysort mergeid: replace wave = 6 if _n==3 & firstwave==3 & hhid4~="" & hhid5~=""

    bysort mergeid: replace wave = 4 if _n==1 & firstwave==4 & hhid4~="" & hhid5~=""
    bysort mergeid: replace wave = 5 if _n==2 & firstwave==4 & hhid4~="" & hhid5~=""
    bysort mergeid: replace wave = 6 if _n==3 & firstwave==4 & hhid4~="" & hhid5~=""

    bysort mergeid: replace wave = 1 if _n==1 & firstwave==1 & hhid4=="" & hhid5~=""
    bysort mergeid: replace wave = 2 if _n==2 & firstwave==1 & hhid4=="" & hhid5~=""
    bysort mergeid: replace wave = 5 if _n==3 & firstwave==1 & hhid4=="" & hhid5~=""
    bysort mergeid: replace wave = 6 if _n==4 & firstwave==1 & hhid4=="" & hhid5~=""

    bysort mergeid: replace wave = 2 if _n==1 & firstwave==2 & hhid4=="" & hhid5~=""
    bysort mergeid: replace wave = 5 if _n==2 & firstwave==2 & hhid4=="" & hhid5~=""
    bysort mergeid: replace wave = 6 if _n==3 & firstwave==2 & hhid4=="" & hhid5~=""

    bysort mergeid: replace wave = 5 if _n==1 & firstwave==3 & hhid4=="" & hhid5~=""
    bysort mergeid: replace wave = 6 if _n==2 & firstwave==3 & hhid4=="" & hhid5~=""

    bysort mergeid: replace wave = 5 if _n==1 & firstwave==4 & hhid4=="" & hhid5~=""
    bysort mergeid: replace wave = 6 if _n==2 & firstwave==4 & hhid4=="" & hhid5~=""

    bysort mergeid: replace wave = 5 if _n==1 & firstwave==5 & hhid4=="" & hhid5~=""
    bysort mergeid: replace wave = 6 if _n==2 & firstwave==5 & hhid4=="" & hhid5~=""

    bysort mergeid: replace wave = 1 if _n==1 & firstwave==1 & hhid4~="" & hhid5==""
    bysort mergeid: replace wave = 2 if _n==2 & firstwave==1 & hhid4~="" & hhid5==""
    bysort mergeid: replace wave = 4 if _n==3 & firstwave==1 & hhid4~="" & hhid5==""
    bysort mergeid: replace wave = 6 if _n==4 & firstwave==1 & hhid4~="" & hhid5==""

    bysort mergeid: replace wave = 2 if _n==1 & firstwave==2 & hhid4~="" & hhid5==""
    bysort mergeid: replace wave = 4 if _n==2 & firstwave==2 & hhid4~="" & hhid5==""
    bysort mergeid: replace wave = 6 if _n==3 & firstwave==2 & hhid4~="" & hhid5==""

    bysort mergeid: replace wave = 4 if _n==1 & firstwave==3 & hhid4~="" & hhid5==""
    bysort mergeid: replace wave = 6 if _n==2 & firstwave==3 & hhid4~="" & hhid5==""

    bysort mergeid: replace wave = 4 if _n==1 & firstwave==4 & hhid4~="" & hhid5==""
    bysort mergeid: replace wave = 6 if _n==2 & firstwave==4 & hhid4~="" & hhid5==""

    bysort mergeid: replace wave = 1 if _n==1 & firstwave==1 & hhid4=="" & hhid5==""
    bysort mergeid: replace wave = 2 if _n==2 & firstwave==1 & hhid4=="" & hhid5==""
    bysort mergeid: replace wave = 6 if _n==3 & firstwave==1 & hhid4=="" & hhid5==""

    bysort mergeid: replace wave = 2 if _n==1 & firstwave==2 & hhid4=="" & hhid5==""
    bysort mergeid: replace wave = 6 if _n==2 & firstwave==2 & hhid4=="" & hhid5==""

    bysort mergeid: replace wave = 6 if _n==1 & firstwave==3 & hhid4=="" & hhid5==""

    bysort mergeid: replace wave = 6 if _n==1 & firstwave==6 & hhid4=="" & hhid5==""

    Here mergeid is the unique identifier for each observation, hhid4 and hhid5 are the household identifiers for the waves in which the household is present, and firstwave is the first wave in which the respondent appeared.

    What could be wrong with my logic or code? Thank you very much.


  • #2
    I cannot quite figure out what the code you shared is attempting to do, but since you write

    "After appending, I also created a wave identifier ..."

    I will say that you should have created the wave identifier in each of the merged datasets before appending. In the data for the first wave
    Code:
    generate wave = 1
    and so on. So go back to your 5 merged datasets and do this, and then append the datasets again.
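
    The steps above could be scripted roughly as follows. This is only a minimal sketch: the file names wave1_merged.dta, wave2_merged.dta, and so on are hypothetical placeholders for the five merged datasets, and the wave numbers 1, 2, 4, 5, 6 are taken from the values used in the question.

    ```stata
    * Sketch only: "wave`w'_merged.dta" is a hypothetical file name for the
    * merged dataset of wave `w'; substitute the real file names.
    * Wave numbers 1 2 4 5 6 are assumed from the code in the question.
    tempfile combined
    local first = 1
    foreach w in 1 2 4 5 6 {
        use "wave`w'_merged.dta", clear
        * Tag every observation with its wave while the dataset is still separate
        generate byte wave = `w'
        * From the second file onward, stack the waves accumulated so far
        if !`first' {
            append using `combined'
        }
        save `combined', replace
        local first = 0
    }
    use `combined', clear
    * Stops with an error if mergeid and wave do not jointly identify observations
    isid mergeid wave
    sort mergeid wave
    ```

    Because wave is set before appending, it does not depend on the sort order or on how many waves a respondent took part in.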



    • #3
      Thank you very much William, this was spot on. My mistake was that I tried to come up with "complicated" logic to create the wave identifier after appending, when I should have simply generated it in each wave's dataset.

      I have corrected the code and it works perfectly now.
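
      For anyone checking a similar fix, a few quick checks along these lines may help. This is a sketch, not the code actually used: newvar is a hypothetical placeholder for any variable that does not exist in the original wave 1 and wave 2 datasets.

      ```stata
      * Each mergeid should appear at most once per wave
      duplicates report mergeid wave
      * Number of observations per wave, including any missing wave values
      tabulate wave, missing
      * newvar is a hypothetical name for a variable absent from waves 1 and 2;
      * this count should be 0 if observations are mapped to the right wave
      count if inlist(wave, 1, 2) & !missing(newvar)
      ```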
