Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    Melanie,

    It seems that the various worksheets you are trying to append together are rather heterogeneous. This is going to be a slow, painstaking task, and it is going to be difficult to coach the process from a distance without being able to see the actual data. This not unusual: spreadsheets were developed with a view to organizing data for the human eye: analysis and compatibility with statistical software were at best secondary considerations in their design.

    It also seems that these worksheets include some which look very different to human eyes than they do to Stata.

    So here's what I suggest. Rather than trying to put them all together in one fell swoop, import each one separately into Stata. Then go through the Stata imports one by one, starting with -describe- to see which variables they contain, and in which storage types. Then use Stata to transform each of these imports to a common set of variable names and types and save those separately. Finally, run a loop to -append- together the transformed Stata data sets.

    In the end, it's the same as trying to do everything in one loop, but dealing with only one data set at a time is likely to be less confusing and less frustrating.

    Welcome to the wonderful world of spreadsheet data management.

    Comment


    • #17
      One additional caution:

      If you ultimately want to have numeric variables with value labels created by -encode-, do not do the encoding separately in each file. The value labels that -encode- creates depends entirely on the particular numerical values that occur within each file, and when you then append the files together, the labelings may be inconsistent and the result will be garbage--but it won't necessarily be obvious that that is the case. So keep your numeric variables un-encode-d until the final append, and then run -encode- on the combined data.

      Comment


      • #18
        Originally posted by Clyde Schechter View Post
        Code:
        clear
        tempfile building
        save `building', emptyok
        
        import excel using "C:\Users\Originaldata.xls", describe
        return list
        local n_sheets `r(N_worksheet)'
        forvalues j = 1/`n_sheets' {
        local sheet`j' `r(worksheet_`j')'
        }
        
        forvalues j = 1/`n_sheets' {
        import excel using c":Users]Originaldata.xls", sheet(`"`sheet`j''"') firstrow case(lower) clear
        // whatever it is you do with this data, perhaps -gen source = `"`sheet`j''"'-
        append using `building'
        save `"`building'"', replace
        }
        Hi, I came across this post when trying to solve a similar problem and wanted to ask a question for the benefit of my own learning. It is not clear to me, why the code save `"`building'"', replace has extra `' at the end, wouldn't be appropriate to use "`building'" ?

        Kind regards,
        Konrad
        Version: Stata/IC 13.1

        Comment


        • #19
          Originally posted by Clyde Schechter View Post
          It is possible that the names on the tabs of the Excel worksheets are "contaminated" with leading or trailing blank spaces or, less commonly, with embedded non-printing characters. It is also possible that some are simply misspelled.

          You can get around these problems with:

          Code:
          clear
          tempfile building
          save `building', emptyok
          
          import excel using "C:\Users\Originaldata.xls", describe
          return list
          local n_sheets `r(N_worksheet)'
          forvalues j = 1/`n_sheets' {
          local sheet`j' `r(worksheet_`j')'
          }
          
          
          
          forvalues j = 1/`n_sheets' {
          import excel using c":Users]Originaldata.xls", sheet(`"`sheet`j''"') firstrow case(lower) clear
          // whatever it is you do with this data, perhaps -gen source = `"`sheet`j''"'-
          append using `building'
          save `"`building'"', replace
          }
          At the end of this you should have all of the data sheets appended into `building', which you can then load, clean up, and do whatever you please with.

          I have seen this approach sometimes fail with embedded non-printing characters in the names on the tabs. In that case I know of no solution other than going back to the Excel file and manually naming the tabs correctly, though perhaps someone else on the forum has a solution for this. (If so, I would love to see it!)
          I am using this code and its gives me error message invalid syntax. I have several sheets in the excel file which I need to combine. Any help is greatly appreciated! Thanks.

          Comment


          • #20
            Re #18 Konrad Zdeb : the `" "' were used out of an abundance of caution. Suppose when you work with this code, you mistype building. The command
            Code:
            save `mistype', replace
            now contains an undefined macro, so Stata interprets this as
            Code:
            save, replace
            which causes Stata to overwrite the last .dta file you read in! The use of `" "' prevents that, because
            Code:
            save `""', replace
            is recognized by the parser as an error, so no action is taken and you are notified that something is wrong.

            I learned this from another Forum member who posted it in response to a problem I raised about the dangers of treating undefined macros as empty strings instead of treating them as errors. I do not remember who that person was, but the credit for this belongs to him or her.


            Sorry for the very delayed response on this. I had not seen your post before today.

            Re #19 Sriparna Ghosh : The code you cite has been used both by me and others successfully in the past. There are some configurations of the spreadsheets that could cause it to fail, although it is more likely that the code you are using is in some subtle way altered from what was originally posted and that is causing your problems. In any case, in order to troubleshoot this, more information is needed. Precede the code with
            Code:
            set tracedepth 1
            set trace on
            and then run it again. Then post back showing the exact output you get from Stata so we can see which command is giving you the error message. When posting the output, be sure to enclose it between code delimiters so that it is well-aligned and readable. (If you are not familiar with code delimiters, see Forum FAQ #12.)

            Comment

            Working...
            X