  • Best option to check for follow-ups in survey

    Hi,

    So I'm analysing data from two rounds of a survey (the first completed in 2009, the second in 2013). Participants who completed round 1 were followed up for round 2, but of course there are the usual problems with attrition, as well as deaths in the cohort. We want to see how many were lost to follow-up, how many died, and how many completed round 2.

    I can tell who was followed up in round 2 because the ID numbers are the same in both rounds, but I am unsure how to get the ID variables from both data sets into one so that I can check. I was thinking of a simple merge, but I'm not sure whether there is another way to do this (e.g., treating it as a time series?), since basically I just want to bring the round 1 IDs into the round 2 data.

    Hope this makes sense; this is new territory for me and I'm flying a bit blind!

    Cheers,

    Maddie

  • #2
    I think your idea of a simple merge is the way to go. If you don't want to merge all the data from each round, you can make subsets of the data containing only the IDs, then merge on ID. That will tell you who is only in round 1, whether anyone is only in round 2 (and if so, who), and how many are in both. Based on the information you present, however, I don't see how to distinguish among the reasons for being only in round 1: for example, how do you distinguish between someone who died and someone who just didn't respond?
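A minimal sketch of the ID-only merge described above. The two `input` blocks are hypothetical toy data standing in for the real round 1 and round 2 files (the variable `age` is invented for illustration); with real data you would `use` each file and then `keep ID` in the same way.

```stata
* Hypothetical toy data standing in for the round 2 file
clear
input ID age
2 65
4 59
5 40
end
keep ID                        // keep only the identifier
tempfile r2_ids
save `r2_ids'

* Hypothetical toy data standing in for the round 1 file
clear
input ID age
1 50
2 61
3 47
4 55
end
keep ID
isid ID                        // stop with an error if any ID is duplicated

* One-to-one merge on ID: round 1 is the master, round 2 the using data
merge 1:1 ID using `r2_ids'

* _merge: 1 = round 1 only, 2 = round 2 only, 3 = in both rounds
tab _merge
```

In this toy run, IDs 1 and 3 appear only in round 1, ID 5 only in round 2, and IDs 2 and 4 in both.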



    • #3
      A simple merge (merge 1:1 idvar using filename) will solve the basic problem. Look closely at the results of the merge command, and the _merge variable that gets created.

      Before merging, you can rename the variables in each round's data set so that their names end in 2009 or 2013. The merge will then keep the variables from both rounds, even if the names were initially identical (which ideally they were). You can then use Stata's reshape command to put the data into a panel (long) format.
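A sketch of the rename, merge and reshape route, again on hypothetical toy data; `score` is an invented variable standing in for any question asked in both rounds.

```stata
* Hypothetical round 2 (2013) data
clear
input ID score
2 7
4 5
5 9
end
rename score score2013         // suffix marks the survey year
tempfile r2
save `r2'

* Hypothetical round 1 (2009) data
clear
input ID score
1 4
2 6
3 8
4 3
end
rename score score2009
merge 1:1 ID using `r2'        // wide layout: one row per ID
drop _merge

* Wide to long: one row per ID per survey year (j values taken from the suffixes)
reshape long score, i(ID) j(year)
list, sepby(ID)
```

After the reshape, a person missing from a round still has a row for that year, with `score` missing.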

      This might help to get used to some of the commands:
      http://homepages.rpi.edu/~simonk/pdf...taCommands.pdf



      • #4
        Cheers guys, really appreciate the advice and glad to know that I am on the right track!



        • #5
          Using -merge- will solve the immediate problem. But down the road, I suspect you will want to do some analyses that make use of both rounds of data. You will need a data set that contains that information. If the variables are named the same way in both rounds' data sets, -merge- will not import the round 2 data for those who already have a record in round 1. You can get around that by renaming the variables with suffixes indicating which round they are from and then -merging-. But then you are still left with the data in -wide- format, which is going to be unwieldy for most analyses. Since you probably need a data set in long layout in the end, why not do it from the start?

          Code:
          use round_1_data_set, clear
          isid ID // VERIFY IDs ALL DISTINCT
          append using round_2_data_set, gen(round) // round = 0 FOR ROUND 1 OBS, 1 FOR ROUND 2 OBS
          replace round = round + 1 // RECODE TO 1 AND 2
          isid ID round, sort // EACH PERSON AT MOST ONCE PER ROUND
          save data_from_both_rounds, replace
          
          // NOW COUNT HOW MANY PEOPLE AVAILABLE IN EACH ROUND
          tab round
          // AND FIND OUT HOW MANY ARE IN BOTH ROUNDS
          duplicates report ID

          Note: This code assumes that the corresponding variables have the same names and the same general storage type in both data sets. In real life, it is not uncommon to find discrepancies in variable names (e.g., var1 vs. var01) or storage types (string in one data set, numeric in the other). But you will have to fix those problems at some point anyway.
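To get from the appended file to counts like those in the original question, one option is to count records per person. This is a sketch on hypothetical toy data laid out like the appended file (one row per ID per round attended, `round` coded 1 or 2); the flag names are invented for illustration, and, as #2 points out, separating deaths from other losses would need information not present in these files.

```stata
* Hypothetical long data: one row per ID per round attended
clear
input ID round
1 1
2 1
3 1
4 1
2 2
4 2
5 2
end

* Number of rounds each person appears in
bysort ID (round): generate byte n_rounds = _N

* Lost between rounds: a single record, and it is from round 1
generate byte r1_only = n_rounds == 1 & round == 1

* Seen in both rounds (two records)
generate byte both_rounds = n_rounds == 2

* Flag one record per person so each ID is counted once
egen byte first_rec = tag(ID)

count if r1_only                    // people seen in round 1 only
count if both_rounds & first_rec    // people seen in both rounds
```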


