  • How to split a data set

    Hello,

    I have a data set that contains observations from two different cities. I would like to split the dataset into two, with observations from each city in a separate dataset. I would prefer to do this at the end of a do file that will define the variables in the dataset, as each city has the same variables.

    I have tried using foreach, where 111 and 112 are the individual city codes:

    foreach x in varlist 111 112 {;
    saveold "`data'_`x'_rds.dta", version(12) replace;
    };

    However, that just saves the same dataset twice under a different name.

    I have also tried using an "if" option with saveold (saveold "`data'_111_rds.dta" if city==111, version(12) replace), but that is not allowed.

    I can find plenty of information on how to merge datasets but not so much on splitting them, so any advice is very welcome!

    Thank you in advance,
    Sarah

  • #2
    One way to do this is to preserve the data, keep only the subset you want, save it, and then restore the full dataset before moving on to the next subset.
    Code:
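    sysuse auto, clear
    
    * Save one file per value of foreign (0 = domestic, 1 = foreign)
    forval i=0/1 {
        preserve                  // snapshot the full dataset
        keep if foreign==`i'      // keep only the current group
        save foreign_`i', replace
        restore                   // bring back the full dataset
    }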



    • #3
      Your code should actually save the same dataset three times, with the variable part of the filename being in turn

      Code:
      varlist
      111
      112
      as foreach with the keyword in takes what follows as literal text.
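A quick way to check this is to display each token the loop receives. The local macro seen is a hypothetical name used only to collect the tokens; nothing here saves any data.

```stata
* Illustration: with "in", foreach treats every token as literal text,
* so "varlist" is simply the first of three items
local seen
foreach x in varlist 111 112 {
    display "`x'"
    local seen `seen' `x'
}
```

The loop displays varlist, 111 and 112 in turn, which is why the original code wrote three files rather than two.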

      Code:
      foreach x of varlist 111 112
      would be wrong too as the arguments are not variable names.

      http://www.stata.com/help.cgi?saveold shows that saveold does not allow an if qualifier (not option), as it seems you have found out.

      This should work if city is a numeric variable. You don't show us how your local macro data is defined, so you must change that, as well as the name of the file holding the full dataset (alldata below).

      Code:
      local data "whatever it is"
      foreach code in 111 112 {
            use alldata, clear          // reload the full dataset each time
            keep if city == `code'      // keep one city only
            saveold "`data'_`code'_rds.dta", version(12) replace
      }
      If city is a string variable, you need

      Code:
      keep if city == "`code'" 
      One good reason this doesn't appear to be well documented may just be that experienced Stata users work backwards from seeing that if is not allowed with saveold to seeing that they need to use keep or drop beforehand. That is naturally not so clear to the beginner. Another good reason could be that Stata documentation tends to encourage you to keep different parts of a related dataset together and to use if to subset, unless problems of memory or speed dictate otherwise.
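If you would rather not type the codes by hand, a minimal sketch is to let levelsof collect the distinct values and loop over them with preserve and restore. It uses the auto data shipped with Stata, with foreign standing in for city, and assumes a numeric grouping variable with integer codes; the file names are illustrative only.

```stata
* Sketch: one file per distinct value of a grouping variable
sysuse auto, clear

* levelsof stores the distinct values of foreign in local macro levels
levelsof foreign, local(levels)

foreach lev of local levels {
    preserve                            // snapshot the full dataset
    keep if foreign == `lev'            // keep one group only
    save "auto_foreign_`lev'.dta", replace
    restore                             // bring back the full dataset
}
```

For the data in the question, replacing foreign with city and the save line with saveold "`data'_`lev'_rds.dta", version(12) replace should give one file per city code, however many codes there are.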

      See also savesome (SSC). I haven't checked to see if changes in version 14 have broken it, but I am optimistic.

      Last edited by Nick Cox; 27 Oct 2015, 03:10.
