  • capturing responses from the same individuals across waves

    Hi all,

    I would like to understand how 'dfruit_main' and 'dvege_main' (fruit and vegetable consumption) changed over time. These variables were only collected in particular waves of the survey, and I would like to analyse responses in Waves 9, 15 and 18.

    How do I ensure that I am following the same individuals in these waves?

    This is what I tried but I'm losing a lot of observations and the sample sizes are not constant across waves. It may not be surprising to lose some observations as Waves 15 & 18 were fielded during the pandemic.

    gen surveyp = .
    replace surveyp = 1 if wave == 9
    replace surveyp = 2 if wave == 15
    replace surveyp = 3 if wave == 18
    tab surveyp

    followed by

    egen present = total(surveyp), by(pidp)
    keep if present==3

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long pidp float(surveyp dfruit_main dvege_main) byte wave
     22445 . . .  6
     22445 . 3 4  5
     22445 . . .  4
     22445 . 3 4  7
     22445 . . .  8
     22445 . 4 4 11
     22445 1 4 4  9
     22445 . . . 10
     29925 . . .  6
     29925 . . . 10
     29925 . . .  8
     29925 . 2 3 11
     29925 1 2 3  9
     29925 . . .  4
     29925 . 2 3  7
     76165 2 4 1 15
     76165 . . . 12
     76165 . . . 16
     76165 . . .  8
     76165 . . . 20
     76165 . 3 2 11
     76165 3 3 2 18
     76165 1 4 3  9
     76165 . 4 3  7
     76165 . . . 19
     76165 . . . 17
     76165 . . . 14
     76165 . . . 10
     76165 . . . 13
    223725 . . .  7
    223725 . . .  8
    280165 . . .  8
    280165 . 4 4  2
    280165 . . . 14
    280165 . . . 16
    280165 . 4 4  7
    280165 . . .  3
    280165 . 4 4  5
    280165 . 4 4 11
    280165 . . . 12
    280165 . . .  4
    280165 . . .  6
    280165 2 4 4 15
    280165 . . . 10
    280165 . . . 13
    280165 1 3 4  9
    333205 . 2 4 11
    333205 . . . 10
    333205 . . .  8
    333205 . 4 4  7
    333205 . . .  6
    333205 1 3 4  9
    387605 . 4 4  5
    387605 . 3 2  7
    387605 . . .  6
    387605 . . .  4
    469205 . . . 10
    469205 1 2 2  9
    469205 . . . 20
    469205 . 2 2 11
    469205 . . . 12
    469205 . . . 16
    541285 . . .  6
    541285 . . .  3
    541285 . . .  4
    541285 . . .  5
    541965 . . .  3
    599765 . . . 20
    599765 . . . 13
    599765 . 4 4  5
    599765 1 4 4  9
    599765 . . . 14
    599765 . . . 12
    599765 . . .  4
    599765 . 4 4 11
    599765 . . . 10
    665045 . . .  8
    665045 . . .  4
    665045 . 2 1 11
    665045 . . .  6
    665045 . . .  3
    665045 . 2 2  5
    665045 . . . 10
    732365 . . . 16
    732365 3 1 2 18
    732365 . 2 1 11
    732365 . . . 20
    732365 1 2 3  9
    732365 . . . 14
    732365 . . . 10
    732365 . . . 12
    732365 . . . 19
    732365 . . . 17
    732365 . . . 13
    760925 . . . 11
    760925 . . . 10
    813285 . . .  7
    813285 . . .  4
    813285 . . .  6
    813285 . . .  8
    end
    Thanks in advance for your help with this.

    Many thanks
    Karen

  • #2
    Karen, were you attempting to create a balanced panel using only waves 9, 15, and 18? If so:

    Code:
    keep if inlist(wave,9,15,18)
    bys pidp: keep if _N == 3

    In your data example, only pidp 76165 is kept.
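
    The `_N == 3` test relies on each person having at most one row per wave. A minimal sketch of how that assumption could be checked before dropping anyone, using only the variables from #1 (the final `tab` is just a sanity check, not part of the original suggestion):

    ```stata
    * Keep only the three waves of interest
    keep if inlist(wave, 9, 15, 18)

    * Assumption check: stops with an error if any pidp-wave pair is repeated
    isid pidp wave

    * Keep people observed in all three waves
    bysort pidp: keep if _N == 3

    * Each wave should now show the same number of rows
    tab wave
    ```

    If `isid` fails, duplicates would need resolving first, because a person with two rows in one wave and none in another would also satisfy `_N == 3`.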


    • #3
      That's great, thanks so much for your help, Fei Wang. In the main dataset I'm using, I have about 10,000 observations per wave after keeping individuals who appear in all 3 waves.


      • #4
        Can I ask a follow-up question? How can I ensure that the sample size is constant across waves for the outcome variables after using

        keep if inlist(wave,9,15,18)
        bys pidp: keep if _N == 3

        In the example below, the smallest sample is in Wave 18, and I would like to restrict all waves to that sample when producing the tab results.

        bysort wave: tab dfruit_main

        ------------------------------------------------
        -> wave = 9

        dfruit_main |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  1 |        567        5.47        5.47
                  2 |      2,592       24.99       30.45
                  3 |      2,499       24.09       54.54
                  4 |      4,716       45.46      100.00
        ------------+-----------------------------------
              Total |     10,374      100.00

        ------------------------------------------------
        -> wave = 15

        dfruit_main |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  1 |        562        5.45        5.45
                  2 |      2,726       26.44       31.89
                  3 |      2,133       20.68       52.57
                  4 |      4,891       47.43      100.00
        ------------+-----------------------------------
              Total |     10,312      100.00

        ------------------------------------------------
        -> wave = 18

        dfruit_main |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  1 |        592        5.83        5.83
                  2 |      2,743       27.01       32.84
                  3 |      2,206       21.72       54.56
                  4 |      4,614       45.44      100.00
        ------------+-----------------------------------
              Total |     10,155      100.00


        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long pidp byte wave float(dfruit_main dvege_main)
           76165 15 4 1
           76165  9 4 3
           76165 18 3 2
         1587125 15 2 2
         1587125  9 3 3
         1587125 18 3 3
         4849085 18 4 4
         4849085 15 3 4
         4849085  9 2 2
        68002725 18 2 2
        68002725  9 2 3
        68002725 15 2 2
        68008847  9 3 2
        68008847 15 3 3
        68008847 18 2 2
        68010887 18 2 3
        68010887 15 4 4
        68010887  9 3 4
        68029931 15 2 2
        68029931 18 2 2
        68029931  9 2 2
        68031967  9 2 3
        68031967 15 2 2
        68031967 18 3 2
        68035365 15 4 2
        68035365 18 2 2
        68035365  9 4 3
        68035367 18 3 3
        68035367 15 3 3
        68035367  9 4 3
        68041487 18 3 3
        68041487  9 4 3
        68041487 15 3 3
        68045567 15 4 4
        68045567 18 4 4
        68045567  9 4 4
        68051007 18 2 2
        68051007  9 2 3
        68051007 15 2 2
        68051011 18 4 4
        68051011  9 4 4
        68051011 15 4 4
        68058487  9 4 4
        68058487 18 4 4
        68058487 15 4 4
        68058491  9 4 4
        68058491 15 4 4
        68058491 18 4 3
        68060531 15 2 3
        68060531  9 2 3
        end
        Many thanks
        Karen


        • #5
          I don't understand #4. But harking back to #1: the total for any identifier if all 3 waves are present would be 6 = 1 + 2 + 3 not 3. That's why it didn't work as wanted.
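
          To spell out the arithmetic: with surveyp coded 1, 2, 3, a total of 3 is reached by someone present in waves 9 and 15 only (1 + 2), or in wave 18 only, which is why `keep if present==3` kept the wrong people. One way to avoid depending on the coding is to count waves rather than sum codes. This is only a sketch; `inwave` and `nwaves` are hypothetical names, and it assumes at most one row per pidp per wave:

          ```stata
          * 1 for rows in the waves of interest, 0 otherwise
          gen byte inwave = inlist(wave, 9, 15, 18)

          * Number of the three target waves in which each person appears
          egen nwaves = total(inwave), by(pidp)

          * Keep target-wave rows for people present in all three
          keep if nwaves == 3 & inwave
          ```

          Keeping the original surveyp coding and testing `present == 6` would give the same result under the same one-row-per-wave assumption.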


          • #6
            Yikes, can't believe I missed that, thanks for pointing that out Nick.

            With regard to #4: after keeping individuals present in these three waves, the sample sizes still differ by wave, so there may be item non-response. In regressions, I'd save the sample after estimation and use that sample throughout. I'd like to do the same here so that I can see whether differences reflect changes over time or differences in the sample. It might be obvious, but I just can't seem to figure it out.
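
            One possible way to build such a common sample is to require a non-missing response in every one of the three waves, not just presence in the wave. The sketch below is an illustration under stated assumptions: item non-response is stored as Stata missing (.), there is one row per pidp per wave, and `ncomplete` is a hypothetical variable name.

            ```stata
            * Keep only the three waves of interest
            keep if inlist(wave, 9, 15, 18)

            * Per person, count rows where BOTH outcomes are non-missing
            * (missing() returns 1 if any of its arguments is missing)
            egen byte ncomplete = total(!missing(dfruit_main, dvege_main)), by(pidp)

            * Keep people with complete responses in all three waves
            keep if ncomplete == 3

            * Totals should now be identical across waves
            bysort wave: tab dfruit_main
            bysort wave: tab dvege_main
            ```

            The resulting common sample may be smaller than the Wave 18 total shown in #4, since anyone with a missing response in any one wave is dropped from all three. If the two outcomes are analysed separately, the same count could be built on one variable at a time.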
