Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Subset dataset into separate files by groups within a single variable

    hello all,
    I'm just trying to break a large dataset into subdata sets, and save each of those data sets as separate files. Each of the subdata sets should include only the observations for each of a finite groups of one variable.

    I'll reference the auto.dta dataset to keep things simple.

    I'd like to create and save a dataset for each of the types of observations of the variable "foreign." One of the created files would have all of the observations for foreign vehicles, and another file would have all of the observations for domestic vehicles.

    I'm trying this code (based on what I read here: https://www.stata.com/support/faqs/d...-with-foreach/)[INDENT=2]levelsof foreign, local(ForeignLevels)
    foreach 1 of local ForeignLevels {
    keep if foreign == `1'

    save "D:\MyDocs\foreign_category_data_`ForeignLevel s'", replace
    }[/INDENT]
    Many of you already know that only one file is created and it only has the variable headers; no observations.

    Two things before you answer:
    -- yes, know that the value for "domestic" in the auto.dta is actually zero. I think that's beside the immediate point because in my actual data set, all of the values are 1 through 13.
    -- I imagine that the problem is in the save command. Perhaps something to do with including the the local name.

    I also imagine that there is simply an easier way to achieve my goal.


    Thanks for any help you can provide!

  • #2
    keep obliterates part of the dataset in memory. You can use preserve and restore to overcome this.

    Code:
    levelsof foreign, local(ForeignLevels)
    foreach 1 of local ForeignLevels {
        preserve
        keep if foreign == `1'
        save dataset`1', replace
        restore
    }
    Last edited by Andrew Musau; 03 Apr 2023, 11:08.

    Comment


    • #3
      I think your loop should work just once -- in this example saving observations with foreign == 0 to a dataset. I cannot replicate the report of a dataset with no observations. I think you did something else to get that result.

      The problem is that second time around the loop, you have lost all the other subsets, because you did keep only the subset you just saved, which meant dropping everything else.

      StataCorp didn't support if with save --- and why not is a good question, to which I do not have a good answer -- but you need to read in the dataset again, or use a work-around like savesome from SSC that does that for you. Or use frames. Or use some other export command. If you want, as you do, separate .dta files, I fear that you need to read the data in again.

      --- OR as Andrew Musau flags do precisely that using preserve and restore.
      Last edited by Nick Cox; 03 Apr 2023, 11:13.

      Comment


      • #4
        Originally posted by Andrew Musau View Post
        keep obliterates part of the dataset in memory. You can use preserve and restore to overcome this.

        Code:
        levelsof foreign, local(ForeignLevels)
        foreach 1 of local ForeignLevels {
        preserve
        keep if foreign == `1'
        save dataset`1', replace
        restore
        }
        This worked a charm. Thanks Andrew Musau and Nick Cox

        And this is really useful; I've already got two other projects which I 'm going to go back and recode for this. One step closer to a semi-efficient and semi-clean code. :-)
        Thanks again.

        Comment


        • #5
          Hopefully a quick follow-up question, and, I suspect a very short answer of "no."

          The code, as written by Andrew Musau , has each of the saved files being named using the encoded number of local ForeignLevels.

          Is there a way to save the files via the foreach loop so that the original text version of the variable is in the name of the saved file?

          In the case of the my original example, using the file auto.dta, the ideal file names would have been:
          -- dataset_foreign
          -- dataset_domestic

          So, is there a way to save the files via the foreach loop so that the original text version of the variable is in the name of the saved file?

          Thank you!

          Comment


          • #6
            Assuming the variable is labeled, as is the case of foreign in the auto dataset,

            Code:
            levelsof foreign, local(ForeignLevels)
            foreach 1 of local ForeignLevels {
                preserve
                keep if foreign == `1'
                save dataset_`:lab (foreign) `1'', replace
                restore
            }
            [.] An issue is that variable labels may contain spaces and other characters not allowed in Stata names. Therefore, you may feed the label through -strtoname()- to generate a legal Stata name.

            Code:
            help strtoname()
            Code:
            levelsof foreign, local(ForeignLevels)
            foreach 1 of local ForeignLevels {
                preserve
                keep if foreign == `1'
                save dataset_`=strtoname("`:lab (foreign) `1''")', replace
                restore
            }
            Last edited by Andrew Musau; 04 Apr 2023, 15:42.

            Comment

            Working...
            X