Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Difficulty in importing excel sheet in Stata14

    Hi All,
    When I try to import an excel file (xlsx) into the stata- the tick box for the option ''import first row as variable names' is not getting activated. I tried ticking the box first and then import the file. But the moment I click on ímport' - the tick box gets deactivated. Can some one advise on where exactly am I going wrong?

  • #2
    Sucheta:
    welcome to the list.
    Temptative reply: is your copy of Stata fully updated?
    Kind regards,
    Carlo
    (StataNow 18.5)

    Comment


    • #3
      I am having the same problem with one of nine Excel sheets, each from different but similarly configured workbooks.
      Oddly, I can import that same problematic Excel workbook spreadsheet using both JMP and STAT/Transfer packages.

      The source of the data is from Guatemala; so, data headings are in Spanish and contain (unicode?) characters,
      specific to that language. Of course, so are the other eight Excel workbook sheets, each of which do not disable
      the "import" option, which allows use of the first row of the spreadsheet as variable names for the STATA (v.14.1) output.

      FYI, I did attempt to read the problem Excel file with an earlier version of STATA (v 12.1) and the problem recurred.
      Could this be a unicode issue? Is there an options that would allow unicode characters as variable names?
      The files are named 2005.xlsx through 2013.xlsx with file 2012.xlsx having the problem behavior. I suspect that the
      same characters were used in row 1 of all 9 Excel spreadsheets; but don't know that for fact. Any thoughts on this?

      Comment


      • #4
        This seems to be a quirky problem. Just recently someone else reported the same kind of phenomenon. It's rare enough that it will be difficult to pin down unless someone is willing to share an Excel workbook that gives rise to the problem. Does the problem still occur if you keep only the header row (i.e., delete all of the data)? If so, and if you are unable to share the entire workbook, then perhaps you could make a copy, delete the data if you cannot share them, and attach the problematic workbook to a post for others to take a look at.

        Comment


        • #5
          I saved copies of the original 2012.xlsx file with the headers in row one, only, and saw the same behavior without data. That file is attached to this reply. Again, the file headers are written in Spanish, which Excel identifies as using the Calibri11 point font. Thank you for any insights on this problem.
          Attached Files

          Comment


          • #6
            ps I used the STATA v 14.2 menu sequence [File] -> [Import] -> [Excel] and navigated to the attached file. As before, the "Import first row as variable names" turned grey, regardless if the option had been checked prior to selecting the Excel file or had not been checked. The STATA command generated: import excel "CopyZero_2012_Ed2.xlsx", sheet("2012") and adding option firstrow is ignored.

            Comment


            • #7
              Thanks. Could you do the same with one of the workbooks that does not have a problem?

              Comment


              • #8
                There are repeated variable names in the excel sheet. If you use the import excel command with firstrow option, Stata just replaces the names with a capital letter. Apparently it isnt so smart when you use the menu to import

                Comment


                • #9
                  There are repeated column names, but that isn't the problem, or at least that isn't by itself a problem with the GUI's control being greyed out. From my playing around with the file, the problem with the GUI's peculiar behavior is much more complicated than that.

                  This will almost certainly be a job for technical services, but it would be good at least to get an Excel workbook from another year that is ostensibly identical to this problematic workbook but that it doesn't present a problem.

                  Comment


                  • #10
                    Well, bit of an update: it seems it is possible to import duplicate var names with the GUi in some cases.
                    However, removing duplicate var names in attached excel file does fix the issue for this sheet. Mayeb someone can work further based on this one.
                    Attched is a sheet that has the original list of vars (checkbox is greyed out), a sheet with duplicate var names removed (not greyed out), and a sheet with a simple set of duplicate var names (not greyed out) CopyZero_2012_Ed2.xlsx

                    Comment


                    • #11
                      It's more involved than that. There is some kind of strange interaction of number of columns, number of duplicate column headers and probably something else. In my playing around, I was able to drop one column (any column) and that removed the problem even though all of the duplicate column headers remained. I was able to keep the all of the columns and rename two of the five duplicate pairs of columns (three duplicate-name pairs remaining) and that removed the problem. What I have not been able to do is to create a worksheet that replicates the problem.

                      Comment


                      • #12
                        In response to the request, "Thanks. Could you do the same with one of the workbooks that does not have a problem?"/, please see attachment of a file without the problem.
                        Attached Files

                        Comment


                        • #13
                          I think an explanation of what is happening will be helpful...

                          When import excel imports variable names and it encounters a duplicate, it simply uses the default excel column name for that duplicate. It does this by making a single pass through the first row of the data. An example of the problem import excel encountered, with the original Excel file posted, can be seen when examining columns Q and T. Both contain "ESTADIOC" in the first row, and based on the rule that duplicates get replaced with the Excel column name, column T gets named "T". So far we are still in good shape. Now if we look down to column CA we will see that the first row contains a "T". "T" was already used as mentioned, and due to only one pass of duplicate resolution logic, import excel just gives up and uses the column names for everything.

                          The dialog box uses the same logic to determine if variables names could be imported. Based on the description above, it does not allow the checkbox to be enabled.

                          I hope the explanation helps.

                          Perhaps we can improve the logic for name collision resolution in the future.

                          Comment


                          • #14
                            Thank you for your explanation.

                            Like you, I am hopeful that STATA programmers can improve the logic for name collision resolution, soon. The error persists with STATA v15 and, as mentioned in my earlier post, programmers of STAT/Transfer, JMP and probably SAS have logic for name collision resolution in place.

                            I noticed that with an earlier spreadsheet, which included two variables with with more than 8 characters, but identical first 8 characters, STATA identified the duplicates with an error alert. I was able to modify the eighth character of the second of the two variables to workaround the problem. I am guessing that the reason that I did not get that alert with this spreadsheet is because characters beyond the first 8 were NOT identical or that the length of the column label was NOT greater than 8 characters. Seems as though STATA should consistent alerts when duplicate STATA variables are generated, which could be addressed with minimal programming effort. I hope my reply is helpful and thank you, again!

                            Comment

                            Working...
                            X