  • Identifying variables not present in multiple datasets and then creating them

    Hi all,

    I have 100 datasets, each with a similar set of variables var1 to var60. I would like to write a generic recode do file using a loop to recode var1 to var60 in each dataset to create new datasets with variables newvar1 to newvar5. I would then be able to append these 100 datasets together for my analysis. The loop is fine, but the problem is that some datasets are missing one of the original variables, so my generic recode do file does not run. e.g.

    egen newvar1 = rowtotal(var1 var2 var3)

    But if var2 is missing from one of the datasets, the code stops with the r(111) variable var2 not found error. I wanted to include in the loop a first step to identify variables that are not present, then create the variables with missing values in the original dataset, so that the code can run through. e.g.

    gen var2 = .
    egen newvar1 = rowtotal(var1 var2 var3)

    I got as far as using lookfor to identify the variables present in each dataset, but I'm not sure how to return a varlist containing the variables that are not present, and then to use this varlist returned to create them.

    Can anyone help with this? Or suggest another way of doing it?



  • #2
    Some technique in this recent thread
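
    One commonly used technique for this kind of problem (not necessarily the one in the linked thread) is to test each variable with capture confirm variable and create it only when the test fails. A minimal sketch, assuming the variables really are named var1 to var60 as in the question:

    ```stata
    * Sketch: create any of var1-var60 that is absent, filled with missing values.
    * Variable names are taken from the question; adjust to your data.
    forvalues i = 1/60 {
        * exact stops var1 from being read as an abbreviation of a longer name
        capture confirm variable var`i', exact
        if _rc {
            generate var`i' = .
        }
    }
    egen newvar1 = rowtotal(var1 var2 var3)
    ```

    Note that rowtotal() treats missing values as zero by default, so a variable created this way adds nothing to the total.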


    • #3
      Hi Nick,

      Thanks for the link! I solved the problem like this in the end, and it seems to do what I need:

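      * list the variables present in the current dataset
      quietly describe, varlist
      local vars `r(varlist)'
      * variables the recode step needs
      local need var1 var2 var3
      * needed variables that are not present
      local create : list need - vars
      foreach var of local create {
          gen `var' = .
      }
      keep `need'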
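
      For anyone wanting to run this over all the files, here is a rough sketch of how the same steps might sit inside the loop. The file names data1.dta to data100.dta and recoded1.dta to recoded100.dta are made up for illustration, and only newvar1 is shown:

      ```stata
      * Hypothetical file names: data1.dta ... data100.dta in the working directory
      local need var1 var2 var3
      forvalues f = 1/100 {
          use "data`f'.dta", clear
          quietly describe, varlist
          local vars `r(varlist)'
          * needed variables that this dataset lacks
          local create : list need - vars
          foreach var of local create {
              generate `var' = .
          }
          egen newvar1 = rowtotal(var1 var2 var3)
          keep newvar1
          save "recoded`f'.dta", replace
      }

      * Append the recoded files into one dataset
      use "recoded1.dta", clear
      forvalues f = 2/100 {
          append using "recoded`f'.dta"
      }
      ```

      If create is empty for a given dataset, the foreach loop simply runs zero times, so no special case is needed for complete files.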