
  • Creating multiple files from a single dataset

    Hi. From the data below, I want to create individual datasets for each treated state, i.e. where treat=1. The code below gives me empty datasets for all states where statenum>1, and I haven't been able to correct this. Could someone please help me?

    Also, is there any way to do this without creating a numeric variable for state, so that the state name can appear in the name of each .dta file created?


    Code:
    
    preserve
    
    keep if treat==1
    
    *creating a numeric variable for states
    encode state, gen(statenum)
    
    forvalues X=1/15 { 
         keep if statenum==`X'
         save "${outdir}\treat`X'.dta", replace 
    }
    
    restore
    
    clear
    input str50 state byte treat
    
    "Arunachal Pradesh" 0
    "Arunachal Pradesh" 0
    "Arunachal Pradesh" 0
    "Arunachal Pradesh" 0
    "Arunachal Pradesh" 0
    "Arunachal Pradesh" 0
    "Arunachal Pradesh" 0
    "Arunachal Pradesh" 1
    "Arunachal Pradesh" 1
    "Arunachal Pradesh" 1
    "Arunachal Pradesh" 1
    "Arunachal Pradesh" 1
    "Arunachal Pradesh" 1
    "Arunachal Pradesh" 1
    "Arunachal Pradesh" 1
    "Arunachal Pradesh" 1
    "Arunachal Pradesh" 1
    "Arunachal Pradesh" 1
    "Arunachal Pradesh" 1
    "Arunachal Pradesh" 1
    "Arunachal Pradesh" 1
    "Arunachal Pradesh" 1
    "Arunachal Pradesh" 1
    "Arunachal Pradesh" 1
    "Arunachal Pradesh" 1
    "Arunachal Pradesh" 1
    "Arunachal Pradesh" 1
    "Arunachal Pradesh" 1
    "Arunachal Pradesh" 1
    "Arunachal Pradesh" 1
    "Arunachal Pradesh" 1
    "Assam"             0
    "Assam"             0
    "Assam"             0
    "Assam"             0
    "Assam"             0
    "Assam"             0
    "Assam"             0
    end

  • #2
    I can't see much operational advantage to splitting a dataset into many smaller datasets, unless each is so big that you don't have enough memory to be comfortable with calculations.

    But that's not the question. The bug here lies within your loop: once you have gone

    Code:
    keep if statenum == 1
    and kept only observations for state number 1, then necessarily observations for all other states have disappeared. That is what keep does; you can't keep a subset without dropping the complement. It's not illegal to work with a dataset with no observations, but that doesn't help here.
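    To make this concrete, here is a minimal sketch that replays the first two passes of the loop by hand. It uses only three rows copied from the data example in #1 and, to have two states to work with, skips the keep if treat==1 step (in that excerpt only Arunachal Pradesh has treated observations).

    ```stata
    * Three rows copied from the data example in #1 (treated or not)
    clear
    input str50 state byte treat
    "Arunachal Pradesh" 0
    "Arunachal Pradesh" 1
    "Assam"             0
    end

    * encode assigns codes in alphabetical order: Arunachal Pradesh = 1, Assam = 2
    encode state, gen(statenum)

    * Pass 1 of the loop: everything outside state 1 is dropped (2 observations remain)
    keep if statenum == 1
    count

    * Pass 2: nothing with statenum == 2 survived pass 1, so 0 observations remain
    keep if statenum == 2
    count
    ```

    The second count is 0, which is why every file after the first one comes out empty.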

    Stata doesn't allow save with an if qualifier, so you need to read the data in again for each state. But use does accept an if qualifier, provided the file is named after using (use if ... using filename).

    Note that restore occurs after the loop and so is irrelevant to this problem.

    I think this should be closer to what you want.

    Code:
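    keep if treat == 1
    encode state, gen(statenum)
    
    * save has no clear option; replace overwrites any earlier copy of work.dta
    save work, replace
    
    forvalues X=1/15 {
         * with an if qualifier, use needs the "use if ... using filename" form
         use if statenum == `X' using work, clear
        
         save "${outdir}\treat`X'.dta", replace
    }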
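    For comparison, here is a sketch of another common way to avoid the empty files, not the approach suggested above: move preserve and restore inside the loop, so that each pass starts again from all treated observations. The three data rows are copied from #1; the fallback value for the global outdir is an assumption, since the poster sets it elsewhere. levelsof supplies the codes actually present, so the count of 15 need not be typed in.

    ```stata
    * Three rows copied from the data example in #1
    clear
    input str50 state byte treat
    "Arunachal Pradesh" 0
    "Arunachal Pradesh" 1
    "Assam"             0
    end

    * Hypothetical fallback: write to the current folder if $outdir is not already set
    if "$outdir" == "" global outdir "."

    keep if treat == 1
    encode state, gen(statenum)

    * Loop over the state codes present in the data instead of a hard-wired 1/15
    levelsof statenum, local(codes)

    foreach X of local codes {
         * Snapshot all treated observations
         preserve
         * Other states are dropped, but only until restore below
         keep if statenum == `X'
         save "${outdir}/treat`X'.dta", replace
         * Back to all treated observations for the next pass
         restore
    }
    ```

    Both versions write one file per treated state; this one works on the data in memory rather than re-reading a saved copy.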
    Last edited by Nick Cox; 27 Aug 2023, 03:20.



    • #3
      You can have state names in filenames, subject to various qualifications. In principle, you just loop over the names. levelsof can help here.

      Some Indian states clearly have multiple-word names, so three questions arise:

      1. Does your operating system allow spaces in filenames?

      2. Do you collaborate with people using other operating systems, which may differ on that?

      3. Are you happy with Stata's insistence that filenames including spaces must be specified within " "?

      Depending on your answers, you might want to replace spaces with underscores, or use two-letter abbreviations for states instead.
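      A minimal sketch of looping over names with levelsof, assuming underscores are acceptable in place of spaces. The three data rows are copied from #1; the file name work.dta and the fallback for the global outdir are assumptions; only treated observations are kept, as in #1.

      ```stata
      * Three rows copied from the data example in #1
      clear
      input str50 state byte treat
      "Arunachal Pradesh" 0
      "Arunachal Pradesh" 1
      "Assam"             0
      end

      * Hypothetical fallback: write to the current folder if $outdir is not already set
      if "$outdir" == "" global outdir "."

      keep if treat == 1
      save work, replace

      * levelsof puts each distinct state name, quoted, into a local macro
      levelsof state, local(states)

      foreach s of local states {
           use if state == "`s'" using work, clear
           * Replace spaces, so "Arunachal Pradesh" becomes "Arunachal_Pradesh"
           local fname = subinstr("`s'", " ", "_", .)
           * Forward slash: a backslash just before ` would stop the macro expanding
           save "${outdir}/`fname'.dta", replace
      }
      ```

      Each file is then named after its state, for example Arunachal_Pradesh.dta. To keep the spaces instead, drop the subinstr() line and use `s' directly in save; the double quotes around the filename are then essential.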
