  • Adding a Year Suffix to DHS data

    Hello,

    I would like to add _16 to the end of every variable name in my dataset so that, when I merge it with a DHS dataset from a prior year, I can tell which variables correspond to which year (the variable names are the same across years). I believe I should use a varlist to do so, but I'm unsure of the exact code. After reading different posts, I believe it should look something like this:

    foreach x of varlist `myvars' {
    local y_`x' = regexr("`x'")
    rename `x' v_16`x'
    }

    I successfully created the varlist "myvars", which includes all the variables in my dataset; I just need to add the suffix now.
    Thanks!

    Annalivia

  • #2
    Welcome to Statalist, Annalivia.

    If you look at the output of the command help rename, it tells you

    Also see [D] rename group for renaming groups of variables.
    Clicking on rename group in that output - or typing the command help rename group - will describe how to accomplish what you need. Example 11 suggests that
    Code:
    rename (`myvars') (=_16)
    will serve your purposes without a loop, adding "_16" to the name of each variable in the `myvars' variable list.

    In the spirit of answering the question you actually asked, a looping solution would be something like the following.
    Code:
    foreach x of varlist `myvars' {
    rename `x' `x'_16
    }


    • #3
      But let me take a step further back. Perhaps you are analyzing longitudinal data, where for each individual you have observations of the same measures over several years.

      If this is the case, then you will almost certainly need to use the techniques described in the Stata Longitudinal-Data/Panel-Data Reference Manual PDF. The reference manual PDFs are included in the Stata installation (since version 11) and are accessible from within Stata - for example, through Stata's Help menu. If you have not done so, you should review them before continuing.

      To do so, you will not want to have, for example, income_14, income_15, and income_16 holding the income measure for 2014, 2015, and 2016. Instead, you will want three observations for the individual, with a "year" variable identifying the year each observation corresponds to, and a single income variable holding the value for that year.

      You will build your analysis dataset not with the merge command but rather with the append command. You will have data in what is called a "long" layout rather than a "wide" layout. The experienced users here generally agree that, with few exceptions, Stata makes it much more straightforward to accomplish complex analyses using a long layout of your data rather than a wide layout of the same data.
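
      A minimal sketch of building a long layout with append, assuming made-up toy data: the variable names hhid and income, the values, and the two waves (2014 and 2016) are all hypothetical and stand in for the real DHS files. Run it as a single do-file so the temporary file remains available.
      Code:
      * Toy stand-in for the earlier wave (hypothetical values)
      clear
      input hhid income
      1 100
      2 150
      end
      generate int year = 2014
      tempfile wave14
      save `wave14'

      * Toy stand-in for the later wave (hypothetical values)
      clear
      input hhid income
      1 120
      2 180
      end
      generate int year = 2016

      * Stack the two waves; variable names stay the same, year tells them apart
      append using `wave14'
      sort year hhid
      list, sepby(year)
      With real files, the two input blocks would be replaced by use commands for each wave, and no renaming would be needed because the year variable distinguishes the observations.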


      • #4
        Thank you for your tips, William! Unfortunately, the data are not longitudinal, as the surveys do not interview the same families in each wave. I believe I will have to merge at the cluster or village level. If you have any tips on how to do this, I would greatly appreciate them.


        • #5
          Then you certainly want to use append rather than merge, as I described above, and treat the result as pooled cross-sectional data. Going the route you are going with variable renaming is a mistake.


          • #6
            OK. Thanks for the advice!
