Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Problem importing multiple excel sheets in one Stata file

    Dear Stata user,

    I have an Excel spreadsheet with 13 sheets. I want to import the last 12 sheets into 12 different Stata files and then merge them into a single file. The 12 Excel sheets have exactly the same structure. The name of the Stata variables are made by pasting the name of the original Excel sheet and the original Excel column title (this latter is turned into a label when importing into Stata because it is a numeric value in Excel).

    I tried to replicate the code given in the following post http://www.statalist.org/forums/foru...ne-stata-file:

    ************************************************** **
    import excel using "myxlsfile.xls", describe

    forvalues sheet=2/`=r(N_worksheet)' {

    local sheetname=r(worksheet_`sheet')

    import excel using "myxlsfile.xls", sheet("`sheetname'") cellrange(A8:AF154) firstrow case(lower) clear

    destring _all,replace

    local filename=subinstr("`sheetname'"," ","",.) // remove any space from the Excel sheet names

    // This loop is for creating new variable names in Stata combining the original Excel sheet name and Excel column title (imported as label in Stata)
    foreach v of varlist b-af {
    local newname: var label `v'
    rename `v' `filename'`newname'
    }

    save "`filename'", replace

    }
    ************************************************** **

    Initially I had not the option clear after the command "import excel using "myxlsfile.xls"". This works well for the first sheet, but then stops after the first file is saved and return the error message "no; data in memory would be lost".

    Alternatively, I have added the option clear after the import excel using "myxlsfile.xls" command as suggested in the post. Again it works for the first sheet but then it stops with the error message "worksheet not found".

    Many thanks in advance for any help.

    Bertrand

  • #2
    I use version 13

    Comment


    • #3
      I'd take the "worksheet not found" error message to be meaningful, so that a first diagnostic step this would be to display what the sheetname macro actually contains right before the import. Try:
      Code:
      .... preceding code
      local sheetname=r(worksheet_`sheet')
      display "sheetname = `sheetname'"
      import excel using "myxlsfile.xls", sheet("`sheetname'") cellrange(A8:AF154) firstrow case(lower) clear
      Then, looking at the Excel file, is what you see here the same as the name of the sheet in that file?

      For diagnosing problems like this, you would likely find the -set trace on- command useful. See -help set trace-.

      Comment


      • #4
        Mike's advice for troubleshooting this problem, and other problems with macros in general, is quite on point. But in this case, I think I see the problem.

        -import excel using "myxlsfile.xls", describe- stores the names of all of the different worksheets in r(). It is important to remember that results in r() are only there temporarily: they go away as soon as the next rclass command is invoked. So, you are fine going through the -foreach- command, as it does not modify r(). Similarly with -local sheetname=r(worksheet_`sheet')-. But then you hit -import excel using "myxlsfile.xls", sheet("`sheetname'") cellrange(A8:AF154) firstrow case(lower) clear-, it eliminates everything that was in r() up to that point. So then, when you get to the next iteration of your loop, r(worksheet_3) no longer exists, and that accounts for your error message.

        What you need to do is write another loop that runs through the worksheets and stores their names in local macros immediately after you run that -import excel..., describe- command.

        Code:
        import excel using "myxlsfile.xls", describe
        local n_sheets `r(N_worksheet)'
        forvalues i = 2/`n_sheets' {
            local sheet_`i' `r(worksheet_`i')'
        }
        
        forvalues i = 2/`n_sheets' {
            import excel using "myxlsfile.xls", sheet(`"`sheet_`i''"') cellrange(A8:AF154) firstrow case(lower) clear
            // ETC.
        }
        Note: Not tested. Beware of typos, other errors. But this is the gist of it.

        Comment


        • #5
          Thanks Mike for your msg.

          Actually the code works nicely if I don't change anything to the intial post I refered to: http://www.statalist.org/forums/foru...one-stata-file. Excel sheets are imported and saved in Stata after the initial Excel sheet name has been reformated to suppress any space and be used as filename.

          The problem comes when I change the data before saving them. Indeed I need to destring some variables and want to modify their name. When I do so, there is nothing anymore stored into the macro local sheetname=r(worksheet_`sheet').

          It seems that when I modify the dataset, the content of the local macros are deleted and this is why the error message "worksheet . not found" .

          Comment


          • #6
            Thank you very much Clyde. This works perfectly well now.

            I was confused because in the intial code i was refering to http://www.statalist.org/forums/foru...one-stata-file, it seems that the content of r() stays in memory even after the command -import-, provided no change was made to the dataset.

            Thanks again to both of you.

            Comment


            • #7
              The post cited in #6 is not different. The r-class result is carefully copied to a local macro before it is overwritten.

              This is tricky stuff because it seems that you have to know which commands are r-class. Conversely copying r-class stuff to safe places immediately it is produced is a fairly easy principle to grasp.

              Comment


              • #8
                Thanks Nick. I'll follow the advice.
                One thing is not yet fully clear, it is why my code works and does not suppress the content of r() after -import- provided I don't modify the dataset, not running -destring- and the /* ... */ commands in the code below ?

                Code:
                clear all
                
                import excel using "myxlsfile.xls", describe
                
                forvalues sheet=2/`=r(N_worksheet)' {
                    
                    local sheetname=r(worksheet_`sheet')
                
                    import excel using "myxlsfile.xls", sheet("`sheetname'") cellrange(A8:AF154) firstrow case(lower) clear
                        
                    **destring _all, replace
                    
                    local filename=subinstr("`sheetname'"," ","",.)
                
                    /*
                    foreach v of varlist b-af {
                        local newname: var label `v'
                        rename `v' `filename'`newname'
                    }
                    */
                
                    save "`filename'", replace
                }

                Comment


                • #9
                  destring internally runs commands like summarize and count, which is sufficient to zap previous r-class results. Does that answer the question?

                  Comment


                  • #10
                    Yes it does, thanks.

                    Comment


                    • #11
                      Clyde was obviously thinking in the right direction, but import excel is not the problem. The command is clever enough not to touch results stored in r() unless it is asked to overwrite them, i.e. when the describe option is specified. I would like to point to the "programmers" solution to this problem, which is the _return command,

                      Personally, I do not like the behavior of destring here. In my view, commands (especially such for simple data-management tasks) that do not themselves return results in r(), e() or s(), should not overwrite stuff that was stored before the command was called. Many official Stata commands behave this way, but obviously not all of them.

                      Last, as usual, I would like to point to xls2dta (SSC) for tasks like the ones discussed here. Admittedly, it might be more cumbersome than writing the loop, but on the other side details, such as cleared r() results, are handled automatically.

                      The following is not tested.

                      First, set up a do-file that does the data manipulation.

                      Code:
                      /* begin mydo.do **/
                      local filename=subinstr(sheetname[1]," ","",.)
                      drop filename sheetname // <- new, see below
                      
                      destring _all , replace
                      
                      foreach v of varlist b-af {
                          local newname: var label `v'
                          rename `v' `filename'`newname'
                      }
                      /** end mydo.do */
                      Then we import the files

                      Code:
                      xls2dta , sheets(2/13) generate(filename sheetname) : ///
                      import excel using "myxlsfile.xls" , cellrange(A8:AF154) firstrow case(lower)
                      execute our data manipulation do-file for each of the imported files

                      Code:
                      xls2dta : do mydo.do
                      and merge all files, using id as an identifier

                      Code:
                      xls2dta , clear : merge 1:1 id
                      Best
                      Daniel
                      Last edited by daniel klein; 02 Jun 2017, 00:13.

                      Comment


                      • #12
                        Thanks Daniel for your suggestion.

                        Comment


                        • #13
                          Daniel Klein: Like most users who look at saved results I've been bitten many times by finding out that what I want has been overwritten. Conversely, I no doubt cause problems for others by freely firing up commands like summarize within my programs.

                          It's an interesting long-term issue, perhaps suitable for users' meetings!

                          Comment


                          • #14
                            Getting a bit off topic here, but I wish StataCorp would enhance the nclass option for the program command in a way that an nclass command should indeed be nclass, i.e. not return anything when it concludes, regardless of which commands are called inside the program.

                            Best
                            Daniel

                            Comment

                            Working...
                            X