Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • sheets option not working with xls2dta ?

    I am trying to append some excel files using the xls2dta command. Currently, the folders are organized in the following manner
    C:/Dropbox/Research/Data/VDSA/XX/*var*.xlsx
    where XX represents a state abbreviation. Within each XX folder are a bunch of excel files, each containing observations for a single variable.
    My goal is to create variable specific Stata datasets where each file will have all observations for that variable from every state.
    The code I have written up is reproduced below. The hurdle is that while each excel file has only one sheet, the sheet name themselves vary by file. I don't care what the sheet name is but I need to figure out a way to ignore the sheetname.
    Whenever I run the module below, I get an error saying:
    option sheets() not allowed.

    Here is my code:
    clear
    cd "C:/Dropbox/Research/Data/VDSA Data"
    local vdsafname "aesr agri_implements area_prod cia fert_consumption fert_prices fhp fodder hyv june_julyaug landuse lgp livestock mai market_road nca_gca_nia_gia normal_rainfall operational_holdings pet population sia soil_type wages"
    local statename "AP AS BH CH GJ HP HR JH KA KE MH MP OD PU RJ TN UK UP WB"
    foreach x of local vdsafname{
    foreach s of local statename{
    xls2dta : import excel "`s'/dt_`x'_a_web.xlsx", sheets(*) firstrow
    drop DISTNAME
    xls2dta , save("`x'.dta") : append
    }
    }

    I am not interested in saving the individual files as Stata files. My goal is to append them and create one large file.
    Thanks.

  • #2
    To your specific problem of the error message "option sheets() not allowed" a review of the output of help xls2dta shows us that sheets is an option to the xls2dta command, not to the import excel command. My understanding of the syntax is that you want
    Code:
    xls2dta, sheets(*) : import excel "`s'/dt_`x'_a_web.xlsx", firstrow

    Comment


    • #3
      xls2dta is from SSC (most likely).

      William is right in stating that the sheets() option is for xls2dta, not import excel. However, if there is only one sheet per Excel file, the option should not be necessary at all.

      The individual files can be erased from disk (permanently!) after appending with

      ​​​​​​
      Code:
      xls2dta , eraseok : erase
      If Branon manages to specify all files to be appended in one call, the individual files will not even be saved to disk in the first place. The syntax would be

      Code:
      xls2dta , save(filename.dta) importopts(firstrow) : append using
      ​path/allfiles
      Best
      Daniel
      Last edited by daniel klein; 02 Dec 2017, 16:58. Reason: Correct autocorrection ...

      Comment


      • #4
        Reading through the initial post, I do not believe that the suggested code will give the desired result. I am not even sure that xls2dta is the best suited tool for what is wanted. However, I also do not fully understand the structure of the files and what exactly is wanted. My best guess is that it should be something like

        Code:
        clear
        cd "C:/Dropbox/Research/Data/VDSA Data"
        local vdsafname "aesr agri_implements ..."
        foreach x of local vdsafname{
            xls2dta , recursive : import excel "./dt_`x'_a_web.xlsx", firstrow
            xls2dta , save("`x'.dta") : append
        }
        Some comments:

        The first line, i.e. clear, is not needed because xls2dta does neither require an empty dataset nor does it destroy the current dataset (if not asked to do so).

        I assume the second line defines the different variables for which we want one file containing the observations of each state. This is the only list that we need to loop over.

        The first call to xls2dta specifies the recursive option. The recursive option causes xls2dta to search the current directory recursively, meaning all subdirectories are searched. Here, I assume that the only subdirectories are the XX directories that indicate the states. This syntax replaces the inner loop in the code that Branon suggested. The syntax is also the only way to get xls2dta to store all the converted files for each of the states in s(). Note that that option recursive needs filelist (Picard, SSC) to be installed. Type

        Code:
        ssc install filelist
        help filelist
        The second call to xls2dta then appends all state-specific files for one variable in one file and saves it under variable.dta. This last step can only work correctly if all files to be appended have been stored in s() by one call to xls2dta previously, as I have done above with the recursive option.

        I cannot make sense of the

        Code:
        drop DISTNAME
        line in Branon's code. I might therefore well misunderstand the problem, in which case I suggest Branon tries to rephrase and clarify. The key point to understand about
        xls2dta is that it can only append the files that have been converted in the one(!) previous call. The original loop structure will most likely not do what Branon wants.

        Best
        Daniel
        Last edited by daniel klein; 03 Dec 2017, 03:50.

        Comment


        • #5
          I'm with daniel in that I do not fully understand the exact structure of the data to import and what is wanted here. I like to tackle this kind of problem using filelist and runby, both from SSC. So that everyone can play along, I used the following code to create Excel files according to the structure described in #1. The code creates a "VDSA Data" subdirectory within Stata's current directory (help cd), and within it, a series of subdirectories for states:
          Code:
          clear all
          set seed 3123
          
          cap mkdir "VDSA Data"
          local vdsafname "aesr agri_implements area_prod cia fert_consumption fert_prices fhp fodder hyv june_julyaug landuse lgp livestock mai market_road nca_gca_nia_gia normal_rainfall operational_holdings pet population sia soil_type wages"
          local statename "AP AS BH CH GJ HP HR JH KA KE MH MP OD PU RJ TN UK UP WB"
          foreach s of local statename {
              cap mkdir "VDSA Data/`s'"
          }
          
          foreach x of local vdsafname{
              foreach s of local statename{
                  clear
                  set obs 10
                  gen id = _n
                  gen v1 = runiform()
                  gen v2 = runiform()
                  export excel "VDSA Data/`s'/dt_`x'_a_web.xlsx",  firstrow(variables) replace
              }
          }
          If the goal is to import all Excel files and append them all into one single big file, then you can do something like:
          Code:
          clear all
          
          * make a list of the Excel files to process
          filelist, dir("VDSA Data")
          keep if strmatch(filename, "*.xlsx")
          
          * use string functions to extract state and variable names
          gen state    = subinstr(dirname, "VDSA Data/", "", 1)
          gen variable = regexs(1) if regexm(filename, "dt_(.+)_a_web.xlsx")
          
          * code to import one Excel file
          program import_xlsx
              local s = state
              local v = variable
              dis "doing `s', `v'"
              import excel "`=dirname'/`=filename'", firstrow clear
              gen obs      = _n
              gen state    = "`s'"
              gen variable = "`v'"
          end
          
          * run program for each file
          runby import_xlsx, by(dirname filename) verbose
          
          sort variable state obs
          save "data_combo.dta", replace
          As you can see, the code uses filelist to make a dataset of all Excel files in all the subdirectories. String functions are used to extract the names for the state and the variable based on patterns in the filename. With runby, the import_xlsx program is called for each by-group; each time, runby replaces the data in memory with the observations from the current by-group. In the example above, each by-group contains exactly one observation (one by-group per Excel file). When the import_xlsx program terminates, what's left in memory is considered results for the by-group and is stored. Accumulated results are combined and replace the data in memory when runby is done processing all by-groups.

          If the goal is to generate one dataset per variable, then you could always break up the dataset created by the code above using the method described in the "partitioning a file into subfiles" section of runby's help file. You can also call runby within a runby user-program and do it directly:
          Code:
          clear all
          
          * make a list of the Excel files to process
          filelist, dir("VDSA Data")
          keep if strmatch(filename, "*.xlsx")
          
          * use string functions to extract state and variable names
          gen state    = subinstr(dirname, "VDSA Data/", "", 1)
          gen variable = regexs(1) if regexm(filename, "dt_(.+)_a_web.xlsx")
          
          * code to import one Excel file
          program import_xlsx
              local fpath = "`=dirname'/`=filename'"
              local s = state[1]
              dis "doing `fpath'"
              import excel "`fpath'", firstrow clear
              gen obs      = _n
              gen state    = "`s'"
          end
          
          * code to process each variable
          program do_vars
            local v = variable[1]
            runby import_xlsx, by(state) verbose
            gen variable = "`v'"
            save "`v'.dta", replace
          end
          
          * run program to process variable by-groups
          runby do_vars, by(variable) verbose

          Comment


          • #6
            Thanks everyone for your comments. Ultimately, I went ahead with a more primitive solution. To clarify the folder structure is the following: There are 19 folders- one for each state in India. Within each folder, there are identically named excel files that store data for an individual variable. For example, landuse.xlsx is a spreadsheet located in each folder. The excel file already has the state name and code in a column. What I was trying to do was to import all the landuse.xlsx file from each folder and append them all to create a master Stata dataset for landuse.
            Here is the code that I wrote up and worked


            local vdsafname "aesr agri_implements area_prod cia fert_consumption fhp hyv june_julyaug_rainfall landuse lgp mai market_road nca_gca_nia_gia normal_rainfall pet population sia soil_type wages"
            local statename "AS BH CH GJ HP HR JH KA KE MH MP OD PU RJ TN UK UP WB" //Note I have excluded AP from this list!
            cd "`dropbox'/Data/VDSA Data/Excel"
            local savepath "`dropbox'/Data/VDSA Data/Stata"
            *Create the first set of files using AP's data
            foreach x of local vdsafname{
            qui: import excel "AP/dt_`x'_a_web.xlsx", firstrow clear
            qui: mvdecode _all, mv(-1)
            rename *, lower
            qui: save "`savepath'/`x'.dta", replace
            }
            *Now add the other remaining states
            foreach x of local vdsafname{
            foreach s of local statename{
            qui: import excel "`s'/dt_`x'_a_web.xlsx", firstrow clear
            qui: mvdecode _all, mv(-1)
            }
            rename *, lower
            qui: append using "`savepath'/`x'.dta", force
            qui: save "`savepath'/`x'.dta", replace
            }



            Comment


            • #7
              I find it hard to believe that this code really works for you.

              Code:
              foreach x of local vdsafname{
                  foreach s of local statename{
                      qui: import excel "`s'/dt_`x'_a_web.xlsx", firstrow clear
                      qui: mvdecode _all, mv(-1)
                  }
                  rename *, lower
                  qui: append using "`savepath'/`x'.dta", force
                  qui: save "`savepath'/`x'.dta", replace
              }
              The inner loop finishes with the data of the last state, i.e. WB, in memeory but without having saved or appended any of the aesr agri_implements, area_prod, ... . The append command probably fails because there will be no `savepath'/AS.dta, `savepath'/BH.dta, ... as they are not previously saved.

              Best
              Daniel
              Last edited by daniel klein; 07 Dec 2017, 23:37.

              Comment


              • #8
                Sorry here is the correct version. I was trying to be concise in my message by removing some of the extraneous commands and ended up deleting the wrong brackets.

                Code:
                local vdsafname "aesr agri_implements area_prod cia fert_consumption fhp  hyv june_julyaug_rainfall landuse lgp  mai market_road nca_gca_nia_gia normal_rainfall  pet population sia soil_type wages"
                local statename "AS BH CH GJ HP HR JH KA KE MH MP OD PU RJ TN UK UP WB" //Note I have excluded AP from this list!
                cd "`dropbox'/Data/VDSA Data/Excel"
                local savepath "`dropbox'/Data/VDSA Data/Stata"
                *Create the first set of files using AP's dataCreate the first set of files using AP's data
                 foreach x of local vdsafname{
                 qui: import excel "AP/dt_`x'_a_web.xlsx", firstrow  clear
                 qui: mvdecode _all, mv(-1)
                
                 rename *, lower
                 qui: save "`savepath'/`x'.dta", replace
                 }
                 *Now add the other remaining states
                 foreach x of local vdsafname{
                  foreach s of local statename{
                   qui: import excel "`s'/dt_`x'_a_web.xlsx", firstrow  clear
                  qui: mvdecode _all, mv(-1)
                     rename *, lower
                        qui: append using "`savepath'/`x'.dta", force
                     qui: save "`savepath'/`x'.dta", replace
                 }
                 }

                Comment

                Working...
                X