Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Using value of a string variable in the file name of a temporary file

    Hi all, I'm using Stata 13.1 on Windows 7.

    I am importing data from a number of worksheets in a single Excel file and formatting the data for analysis. I generate a new variable called gender which will have 'male', 'female' or persons depending upon the data sheet the data have been imported from. My aim is to loop through the worksheets, save the results from each worksheet as a temporary file in stata , and then append all of the temporary files - so that I will have males, females and persons all in one final dataset for analysis. I'm trying to find a way of using the value 'male' or 'female' or 'person' from the gender variable and using that with 'tempfile whatever_gender save whatever_gender, but I cant seem to be able to call gender to the tempfile section.

    I've tried the following:

    Code:
    import excel using SAPE18DT5-mid-2015-ccg-syoa-estimates.xls, describe
    forvalues sheet=3/`=r(N_worksheet)' {
      local sheetname=r(worksheet_`sheet')
      local cellrange = r(range_`sheet')
      import excel using SAPE18DT5-mid-2015-ccg-syoa-estimates.xls, sheet("`sheetname'") cellrange("`cellrange'") clear
      di regexr(r(worksheet_`sheet'),".+ ","")
      g gender = regexr(r(worksheet_`sheet'),".+ ","")
      tempfile "`regexr(r(worksheet_`sheet'),".+ ","")'"
      save "file_`sheetname'", replace
    }
    append using `Male'
    append using `Female'
    append using `Person'

  • #2
    Well, I'm not sure this is the only problem, as your code is fairly complex, and the use of -regexr()- makes it even harder for me to follow. But there is certainly one thing wrong.

    Code:
    local sheetname=r(worksheet_`sheet')
    local cellrange = r(range_`sheet')
    will only work the first time through your loop. Once3 you hit the -import excel- command inside the loop, r() will be overwritten, and the previous information about how many worksheets there are and their names and cell ranges will be lost.

    So after the -import excel, describe- command, you need to store r(N_worksheet) in a local macro, and you need to then run a separate -forvalues- loop to create new local macros that store the names and cell ranges of the worksheets.

    Again, I have always found -regexr- difficult to work with and I offer no opinion on whether your use of it here is correct or not. So it may be that after you fix what I have pointed out above you will still not get what you are looking for. but I am confident that you must fix this first to get anywhere at all.

    Comment


    • #3
      Hi Clyde,

      Thanks for this. I use -regexr- because the sheets in Excel are named 'Mid-2015 Males', 'Mid-2015 Females', and 'Mid-2015 Persons'. I am only interested in the last word to be used in a filename and also to show what gender the group is.

      The following code works ok in the fact that I get each of the worksheets out and can save as a .dta file. The thing is I didnt want to save physical working files to the network and preferred to keep in temporary files. Any ideas on how I can translate this to being saved in memory rather than physical files?

      Code:
      cd c:/working_files
      import excel using SAPE18DT5-mid-2015-ccg-syoa-estimates.xls, describe
      forvalues sheet=2/`=r(N_worksheet)' {
          local sheetname=r(worksheet_`sheet')
          local cellrange = r(range_`sheet')
             import excel using SAPE18DT5-mid-2015-ccg-syoa-estimates.xls, sheet("`sheetname'") cellrange("`cellrange'") clear
             di regexr(r(worksheet_`sheet'),".+ ","")
             g gender = regexr(r(worksheet_`sheet'),".+ ","")
           save `=regexr(r(worksheet_`sheet'),".+ ","")', replace
         }
        
        append using Males
        append using Persons
      I also didnt want to have to specify the filenames in the append just in case I run this on a batch of files where there was an inconsistency with naming conventions i.e. males rather than Males

      Happy to paste in further details of my files if necessary.
      Last edited by Tim Evans; 24 Nov 2016, 15:32.

      Comment


      • #4
        So something like this:

        Code:
        cd c:/working_files
        tempfile Males
        tempfile Females
        tempfile Persons
        import excel using SAPE18DT5-mid-2015-ccg-syoa-estimates.xls, describe
        forvalues sheet=2/`=r(N_worksheet)' {
            local sheetname=r(worksheet_`sheet')
            local cellrange = r(range_`sheet')
               import excel using SAPE18DT5-mid-2015-ccg-syoa-estimates.xls, sheet("`sheetname'") cellrange("`cellrange'") clear
              local savein = proper(regexr(r(worksheet_`sheet'),".+ ",""))
               g gender = `"`savein'"'
             save ``savein''
           }
          use `Females', clear
          append using `Males'
          append using `Persons'
        The use of the -proper()- function will deal with issues like male vs Male (though it won't deal with other kinds of misnamings).

        I think that -save- is the wrong command at the end of the loop, as it will overwrite whatever was previously in the tempfile. It looks to me as if you were building up these files, appending as we go through the loop. If so, you want:
        Code:
        append using ``savein''
        save `"``savein''"', replace // BE SURE TO REPLICATE THIS QUOTE PATTERN EXACTLY
        Also, I don't quite get the rationale for building up separate files for males, females, and persons if you are just going to append them all together in the end. Why not just build up one file from the start? Am I missing something?
        Last edited by Clyde Schechter; 24 Nov 2016, 15:46. Reason: Correct error in code

        Comment


        • #5
          I suppose the reason to not just build up one file is that each male, female and person is a tab in Excel. The formatting of the data means that the top row is not the heading and I have to manipulate this somewhat to get the headers in the right place. I'd be happy to append them all in one go and try it from that angle.

          One other query - where would I place the second piece of code in your previous reply - within or outside the main code you suggest?
          Last edited by Tim Evans; 24 Nov 2016, 16:22. Reason: Further question added

          Comment


          • #6
            Hi Clyde,


            I was having a little trouble with trying to get certain commands to run within my much larger loop. Ultimately falling over as you rightly suggested before when the local macro loses its contents/gets overwritten. I've now split up and combined what you suggested - including building a base file with males, females and persons following which I do the transformation just the once - many thanks for your help - I think I was trying to be too clever/complicated.

            This is what I'm doing now and it seems to work nicely:

            Code:
             
             
            cd c:/working_files
            tempfile Males
            tempfile Females
            tempfile Persons
            import excel using SAPE18DT5-mid-2015-ccg-syoa-estimates.xls, describe
            forvalues sheet=2/`=r(N_worksheet)' {
                local sheetname=r(worksheet_`sheet')
                local cellrange = r(range_`sheet')
                   import excel using SAPE18DT5-mid-2015-ccg-syoa-estimates.xls, sheet("`sheetname'") cellrange("`cellrange'") clear
                  local savein = proper(regexr(r(worksheet_`sheet'),".+ ",""))
                   g gender = `"`savein'"'
                   
                 save ``savein''
               }
              use `Females', clear
              append using `Males'
              append using `Persons'
            
              g row = _n
              drop if row<5
              drop row B
              g row = _n
                foreach var of varlist F - CR {
                tostring `var', replace
                replace `var' = "_"+`var' if row==1
                }
                foreach var of varlist E - CR {
                rename `var' `=strtoname(`var'[1])'
                }
            
            rename A ons_ccg_code
            rename D CCGname
            drop if ons_ccg_code ==""
            drop if CCGname ==""
            foreach var of varlist _* {
            destring `var', replace
                }
            destring All_Ages, replace
            
            drop C
            
            
            forvalues i = 0(5)85 {
                local ilast = `i' + 4
                gen _`i'_`ilast' = 0
                forvalues j = `i'/`ilast' {
                    replace _`i'_`ilast' = _`i'_`ilast' + _`j'
                    }
                }
            
            drop _0 - _89
            order ons_ccg_code - All_Ages _0_4- _85_89 _90
            drop row

            Comment

            Working...
            X