Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Compiling multiple data ranges from Excel into one Stata file

    Hello.

    I'm using Stata 15 and am working on compiling certain ranges from multiple Excel sheets into one Stata file (I can't provide a copy since it is proprietary data). My Excel file consists of over 20 sheets, each of which represent a construct such as "Dissatisfaction." Within the "Dissatisfaction" sheet, I have six dimensions of the construct for over 20 brands measured daily. I would like to write a loop that allows me to establish a longitudinal Stata data file with the following variables:
    Date, Brand, Construct1-Dimension1, Construct1-Dimension2, Construct1-Dimension3,.......Construct3-Dimension6.

    Here are some steps I took: first, I created variable Date and variable Brand. The following code does just that:

    Code:
    //import date and brand
    //pull brand name
    import excel "\\Client\C$\Users\..2007.xlsx", sheet("Dissatisfaction") cellrange(ED6:ED6) clear
    //save brand name in a local
    local brand=ED[1]
    //pull date
    import excel "\\Client\C$\Users\...2007.xlsx", sheet("Dissatisfaction") cellrange(A7:A372) firstrow clear
    //add brand name variable
    generate str12 brand = "`brand'"
    //save brand/year
    save "\\Client\C$\Users\....Brand1_2007.dta", replace
    Next, I'm saving each construct (or part of the data from each sheet) in a separate .dta file and rename each variable to the following format: ConstructxDimensionz (Note: I have to rename the variables z1-z6 since the variables are identical on each sheet; in excel terms: the column names are the same on each sheet)

    Code:
    local sheets Dissatisfaction Construct2 Construct3
    
     foreach x of local sheets {
      import excel "\\Client\C$\Users\..._2007.xlsx", sheet("`x'") cellrange(ED7:EI372) firstrow clear
      foreach var of varlist z1 z2 ... {
      rename `var' `x'`var'
       }
      save "\\Client\C$\Users\...\Brand1_`x'", replace
       }

    Next, I merge the four files:

    Code:
    use "\\Client\C$\Users\...\Brand1_Dissatisfaction.dta"
    
    merge using "\\Client\C$\Users\...\Brand1_Construct2.dta"
    
    drop _merge
    
    merge using "\\Client\C$\Users\...\Brand1_Construct3.dta"
    
    drop _merge
    
    merge using "\\Client\C$\Users\...\Brand1_2007.dta"
    
    save "\\Client\C$\Users\...\Brand1_2007_combined.dta"
    As you can see, I'm pulling from three different sheets per excel file and three different cellranges. Since I have six annual files and need to do this for about 45 different brands, I would like to see if the above steps can be integrated into one loop. Considering the complexity of the data structure in Excel, I think one loop for each brand would be suffice since I can then append each combined file into one final data set. In other words, after putting all six annual files with data for Brand1 into one folder, I'm hoping to loop over the above commands to end up with a longitudinal data file for all six years, three constructs and one brand.

    Any suggestions would be greatly appreciated.

    Thank you!

  • #2
    I don't know if this will help, but a couple of posting's that might be helpful can be seen here and here

    Although, it looks like you are able to extract the data from Excel into various Stata files, renaming the variables, etc just fine, and now would like to automate it for each construct and then each brand.
    Code:
    * This outer loop loops over the 20 brands
    forvalues i=1/20 {
    
    * This loops over the constructs
    forvalues j=1/6 {
    merge using "\\Client\C$\Users\...\Brand`i'_Construct`j'.dta" * Do anything else with the merge drop _merge save "\\Client\C$\Users\...\Brand`i'_2007_combined.dta", replace }
    }
    Note: consider the above pseudo-code. I haven't tested it, but for each brand, it combines the 6 constructs. Once you got this working the way you want it, you could put an outer loop that loops over the years.
    Also note: I don't know why the spacing is so weird in the code window.

    Hope that helps!

    Comment


    • #3
      Hi, David,

      Thank you for your reply. I might not have been clear on what I'm looking for. The code I listed above works well to produce a .dta file for Brand1 for one year (2007). I'm trying to construct a loop that allows me to run that code over seven years. In other words, I'm hoping to start the code listed in #1 with a loop that does the above year by year. I'm not trying to loop over brands since that information is in different parts of the excel sheets. I'm only trying to loop over the years.

      Does this clarify what I'm looking for?

      Thanks,
      Annette

      Comment


      • #4
        I might have found a working solution, albeit slow:

        Code:
        cd ...
        
        local annual : dir . files "*.xlsx"
        
        save ..., emptyok replace
        
        foreach file of local annual {
        
            import excel using "`file'", sheet("Dissatisfaction") cellrange(ED6:ED6) clear
            local brand = ED[1]
            import excel using "`file'", sheet("Dissatisfaction") cellrange(A7:A372) firstrow clear
            generate str12 brand = "`brand'"
            save "`= substr("`file'", 1, strpos("`file'", ".") - 1)'.dta", replace
        
        
            local sheets Dissatisfaction ...
                foreach x of local sheets {
                import excel using "`file'", sheet("`x'") cellrange(ED7:EI372) firstrow clear
                    foreach var of varlist ... {
                    rename `var' `x'`var'
                    }
                save "`= substr("`file'", 1, strpos("`file'", ".") - 1)'_`x'.dta", replace
                }
        
            use "`= substr("`file'", 1, strpos("`file'", ".") - 1)'.dta"
        
            merge using "`= substr("`file'", 1, strpos("`file'", ".") - 1)'_....dta"
        
            drop _merge
        
            merge using "`= substr("`file'", 1, strpos("`file'", ".") - 1)'_...dta"
        
            drop _merge
        
            merge using "`= substr("`file'", 1, strpos("`file'", ".") - 1)' Dissatisfaction.dta"
        
            append using "..."
            save "...", replace
           
        }
        rename A date
        sort date
        save "...", replace

        Comment

        Working...
        X