Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • import many sheets from an excel file, reshape, and merge

    Dear All, The forthcoming paper "Hanselaar, Rogier M., Stulz, René M., and van Dijk, Mathijs A. Do firms issue more equity when markets become more liquid? Volume , Issue , 2018, page" (can be found here http://jfe.rochester.edu/data.htm) offers a data set in Excel (can be downloaded from the previous link). A representative snapshot is
    Click image for larger version

Name:	excel.png
Views:	1
Size:	41.8 KB
ID:	1467415


    My question is:
    • What kind of code can be used to do the following? The complications are:
      • The firs sheet is simply a note (`Notes’), and the relevant data starts from the second sheet (variable: `Issues’) to the last one (variable: `GDP growth’).
      • For each variable, the relevant data starts from cell A6 (including the `country code’ as the name) to cell AO106. Moreover, they are in wide format, and have to be reshaped into long format.
      • Finally, the variables in long formats have to be merged into a complete dataset for analysis.
    Ho-Chuan (River) Huang
    Stata 17.0, MP(4)

  • #2
    Here is code that will loop through the worksheets starting with the second one, once you figure out what you need to do on each worksheet.
    Code:
    import excel using Hanselaar_Stulz_vanDijk_data.xlsx, describe
    forvalues ws = 2/`=r(N_worksheet)' {
        display "worksheet `ws' is named `r(worksheet_`ws')'"
        }
    Code:
    . import excel using Hanselaar_Stulz_vanDijk_data.xlsx, describe
    
                          Sheet | Range
      --------------------------+--------------------------
                          Notes | A1:B59
                         Issues | A1:AO106
                      Liquidity | A1:AO106
                 Market returns | A1:AO106
                       Turnover | A1:AO106
                     Volatility | A1:AO106
                 Liquidity risk | A1:AO106
           Market-to-book ratio | A1:AO106
           Price earnings ratio | A1:AO106
                 Dividend yield | A1:AO106
                     GDP growth | A1:AO106
                   Sales growth | A1:AO106
                     LEI growth | A1:AO106
       Idiosyncratic volatility | A1:AO106
      Stock price synchronicity | A1:AO106
                   Issues ROA>0 | A1:AO106
                   Issues ROA<0 | A1:AO106
                           IPOs | A1:AO106
                           SEOs | A1:AO106
                       Proceeds | A1:AO106
    
    . forvalues ws = 2/`=r(N_worksheet)' {
      2.     display "worksheet `ws' is named `r(worksheet_`ws')'"
      3.         }
    worksheet 2 is named Issues
    worksheet 3 is named Liquidity
    worksheet 4 is named Market returns
    worksheet 5 is named Turnover
    worksheet 6 is named Volatility
    worksheet 7 is named Liquidity risk
    worksheet 8 is named Market-to-book ratio
    worksheet 9 is named Price earnings ratio
    worksheet 10 is named Dividend yield
    worksheet 11 is named GDP growth
    worksheet 12 is named Sales growth
    worksheet 13 is named LEI growth
    worksheet 14 is named Idiosyncratic volatility
    worksheet 15 is named Stock price synchronicity
    worksheet 16 is named Issues ROA>0
    worksheet 17 is named Issues ROA<0
    worksheet 18 is named IPOs
    worksheet 19 is named SEOs
    worksheet 20 is named Proceeds
    If indeed you only want to process the worksheets through GDP Growth, change the loop to
    Code:
    forvalues ws = 2/11 {

    Comment


    • #3
      Dear William, This is a great step towards my purpose. Thanks. However, I have trouble in the next step, namely importing a data sheet, reshaping, saving the data and finally do the merging. Could you kindly give some instructions?
      Ho-Chuan (River) Huang
      Stata 17.0, MP(4)

      Comment


      • #4
        Here is the full code, but restricted for testing to just three worksheets.
        Code:
        clear
        tempfile merged
        
        import excel using Hanselaar_Stulz_vanDijk_data.xlsx, describe
        forvalues ws = 2/`=r(N_worksheet)' {
            local wsn`ws' = `"`r(worksheet_`ws')'"'
            }
        forvalues ws = 2/4 {
            import excel using Hanselaar_Stulz_vanDijk_data.xlsx, ///
                sheet(`"`wsn`ws''"') cellrange(A6:AO106) firstrow clear
            generate quarter = yq(int(YearQuarter/10),mod(YearQuarter,10))
            format quarter %tq
            drop YearQuarter
            order quarter
            rename (ARG-ZAF) (v_=)
            reshape long v_, i(quarter) j(country) string
            // at this point I forget how to convert the worksheet name into a legit variable name
            // so this is a placeholder
            rename v_ var`ws'
            if `ws' > 2 {
                merge 1:1 country quarter using `merged'
                drop _merge
                }
            save `merged', replace
            }
        
        describe
        Code:
        . describe
        
        Contains data from /var/folders/xr/lm5ccr996k7dspxs35yqzyt80000gp/T//S_14673.000001
          obs:         4,000                          
         vars:             5                          24 Oct 2018 21:14
         size:       148,000                          
        --------------------------------------------------------------------------------------------------
                      storage   display    value
        variable name   type    format     label      variable label
        --------------------------------------------------------------------------------------------------
        quarter         float   %tq                   
        country         str9    %9s                   
        var4            double  %10.0g                
        var3            double  %10.0g                
        var2            double  %10.0g                
        --------------------------------------------------------------------------------------------------
        Sorted by: country  quarter

        Comment


        • #5
          Dear William, Thank you so much. I will give it a try ASAP.
          Ho-Chuan (River) Huang
          Stata 17.0, MP(4)

          Comment


          • #6
            Dear William, I'd like to thank you again. The code works fine. In case you recall how to convert the worksheet name into a legitimate variable name, please let me know. Thanks in advance.
            Ho-Chuan (River) Huang
            Stata 17.0, MP(4)

            Comment


            • #7
              As usual, I will point to xls2dta (SSC). It is basically just the outer loop in William's code. Anyway, here are the steps

              1. Put William's code into a program to be executed in each Excel file (I have added the sheetnames)
              2. Convert all Excel files to Stata datasets
              3. Run the program in 1. for each converted file
              4. merge the resulting files, replacing the data in memory
              5. Optionally, erase the single files from disk (permanently)

              I have added lots of reporting options so you see what is happening. Here is the code

              Code:
              capture program drop reshapesheet
              program reshapesheet
                  // William's reshape
                  generate quarter = yq(int(YearQuarter/10),mod(YearQuarter,10))
                  format quarter %tq
                  drop YearQuarter
                  order quarter
                  rename (ARG-ZAF) (v_=)
                  // get worksheet name
                  local name = strtoname(sheet[1])
                  drop file sheet
                  // now reshape
                  reshape long v_, i(quarter) j(country) string
                  rename v_ `name'
              end
              
              // convert files
              xls2dta , sheets(1 , not) generate(file sheet) verbose :       ///
                        import excel using Hanselaar_Stulz_vanDijk_data.xlsx ///
                        , cellrange(A6:AO106) firstrow
              
              // reshape
              xls2dta , noisily : xeq reshapesheet
              
              // merge
              xls2dta , clear verbose : merge 1:1 country quarter
              
              // optionally erase the single files
              xls2dta , erase : erase
              Best
              Daniel

              Comment


              • #8
                Dear Daniel, I appreciate this helpful suggestion (The code works well). In fact, I have thought about using -xls2dta- command, but didn't know haw to start. Thanks again.
                Ho-Chuan (River) Huang
                Stata 17.0, MP(4)

                Comment


                • #9
                  Dear all,


                  I am analysing data from a randomized trial (var "inter"). I have several variables and am using putexcel for the first time.

                  I am using the code below (I've found online) to putexcel 2 diferent variables (tipo_casa + agua_encanada). I'm exporting to different sheet because I don't know how to order the second variable (agua_encanada) bellow the first variable (tipo_casa). I mean, if I use the same sheet, putexcel overwrite the first variable with the results from the second variable.

                  Probably these lines says where the values must be placed on the excel file:

                  local CellRange = "`UpperLeft':`UpperRight'"
                  putexcel `CellRange', border(bottom)

                  local CellRange = "`BottomLeft':`BottomRight'"
                  putexcel `CellRange', border(top)

                  local CellRange = "`UpperLeft':`BottomLeft'"
                  putexcel `CellRange', border(right)

                  local CellRange = "`UpperRight':`BottomRight'"
                  putexcel `CellRange', border(left)


                  However, I don't know how to change it automatically (or even manually). As I have said, I need to run the code for several variables (categorical and numerical variables) and would be great to do it automatically. Any suggestions ou ideas?



                  Best wishes,

                  ______________________________________________



                  local RowVar = "tipo_casa"
                  local ColVar = "inter"
                  tabulate `RowVar' if !missing(`ColVar'), matcell(rowtotals)
                  tabulate `ColVar' if !missing(`RowVar'), matcell(coltotals)
                  tabulate `RowVar' `ColVar', matcell(cellcounts)
                  local RowCount = r(r)
                  local ColCount = r(c)
                  local TotalCount = r(N)

                  levelsof `RowVar', local(RowLevels)
                  local RowValueLabel : value label `RowVar'

                  levelsof `ColVar', local(ColLevels)
                  local ColValueLabel : value label `ColVar'

                  putexcel set teste.xlsx, sheet(teste1) replace

                  forvalues row = 1/`RowCount' {

                  local RowValueLabelNum = word("`RowLevels'", `row')
                  local CellContents : label `RowValueLabel' `RowValueLabelNum'
                  local Cell = char(64 + 1) + string(`row'+1)
                  putexcel `Cell' = "`CellContents'", right

                  local CellContents = rowtotals[`row',1]
                  local Cell = char(64 + `ColCount' + 2) + string(`row' + 1)
                  putexcel `Cell' = "`CellContents'", hcenter

                  forvalues col = 1/`ColCount' {
                  local cellcount = cellcounts[`row',`col']
                  local cellpercent = string(100*`cellcount'/rowtotals[`row',1],"%9.1f")
                  local CellContents = "`cellcount' (`cellpercent'%)"
                  local Cell = char(64 + `col' + 1) + string(`row' + 1)
                  putexcel `Cell' = "`CellContents'", right

                  if `row'==1 {
                  local ColValueLabelNum = word("`ColLevels'", `col')
                  local CellContents : label `ColValueLabel' `ColValueLabelNum'
                  local Cell = char(64 + `col' + 1) + string(1)
                  putexcel `Cell' = "`CellContents'", hcenter

                  local CellContents = coltotals[`col',1]
                  local Cell = char(64 + `col' + 1) + string(`RowCount' + 2)
                  putexcel `Cell' = "`CellContents'", hcenter
                  }
                  }
                  }

                  local Cell = char(64 + `ColCount' + 2) + string(`RowCount' + 2)
                  putexcel `Cell' = "`TotalCount'", hcenter

                  local Cell = char(64 + `ColCount' + 2) + string(1)
                  putexcel `Cell' = "Total", hcenter

                  local Cell = char(64 + 1) + string(`RowCount' + 2)
                  putexcel `Cell' = "Total", right

                  local UpperLeft = char(64 + 1)+ string(1)
                  local UpperRight = char(64 + `ColCount' + 2)+ string(1)
                  local BottomLeft = char(64 + 1)+ string(`RowCount'+2)
                  local BottomRight = char(64 + `ColCount' + 2)+ string(`RowCount'+2)

                  local CellRange = "`UpperLeft':`UpperRight'"
                  putexcel `CellRange', border(bottom)

                  local CellRange = "`BottomLeft':`BottomRight'"
                  putexcel `CellRange', border(top)

                  local CellRange = "`UpperLeft':`BottomLeft'"
                  putexcel `CellRange', border(right)

                  local CellRange = "`UpperRight':`BottomRight'"
                  putexcel `CellRange', border(left)





                  local RowVar = "agua_encanada"
                  local ColVar = "inter"
                  tabulate `RowVar' if !missing(`ColVar'), matcell(rowtotals)
                  tabulate `ColVar' if !missing(`RowVar'), matcell(coltotals)
                  tabulate `RowVar' `ColVar', matcell(cellcounts)
                  local RowCount = r(r)
                  local ColCount = r(c)
                  local TotalCount = r(N)

                  levelsof `RowVar', local(RowLevels)
                  local RowValueLabel : value label `RowVar'

                  levelsof `ColVar', local(ColLevels)
                  local ColValueLabel : value label `ColVar'

                  putexcel set teste.xlsx, sheet(teste2)


                  forvalues row = 1/`RowCount' {

                  local RowValueLabelNum = word("`RowLevels'", `row')
                  local CellContents : label `RowValueLabel' `RowValueLabelNum'
                  local Cell = char(64 + 1) + string(`row'+1)
                  putexcel `Cell' = "`CellContents'", right

                  local CellContents = rowtotals[`row',1]
                  local Cell = char(64 + `ColCount' + 2) + string(`row' + 1)
                  putexcel `Cell' = "`CellContents'", hcenter

                  forvalues col = 1/`ColCount' {
                  local cellcount = cellcounts[`row',`col']
                  local cellpercent = string(100*`cellcount'/rowtotals[`row',1],"%9.1f")
                  local CellContents = "`cellcount' (`cellpercent'%)"
                  local Cell = char(64 + `col' + 1) + string(`row' + 1)
                  putexcel `Cell' = "`CellContents'", right

                  if `row'==1 {
                  local ColValueLabelNum = word("`ColLevels'", `col')
                  local CellContents : label `ColValueLabel' `ColValueLabelNum'
                  local Cell = char(64 + `col' + 1) + string(1)
                  putexcel `Cell' = "`CellContents'", hcenter

                  local CellContents = coltotals[`col',1]
                  local Cell = char(64 + `col' + 1) + string(`RowCount' + 2)
                  putexcel `Cell' = "`CellContents'", hcenter
                  }
                  }
                  }

                  local Cell = char(64 + `ColCount' + 2) + string(`RowCount' + 2)
                  putexcel `Cell' = "`TotalCount'", hcenter

                  local Cell = char(64 + `ColCount' + 2) + string(1)
                  putexcel `Cell' = "Total", hcenter

                  local Cell = char(64 + 1) + string(`RowCount' + 2)
                  putexcel `Cell' = "Total", right

                  local UpperLeft = char(64 + 1)+ string(1)
                  local UpperRight = char(64 + `ColCount' + 2)+ string(1)
                  local BottomLeft = char(64 + 1)+ string(`RowCount'+2)
                  local BottomRight = char(64 + `ColCount' + 2)+ string(`RowCount'+2)

                  local CellRange = "`UpperLeft':`UpperRight'"
                  putexcel `CellRange', border(bottom)

                  local CellRange = "`BottomLeft':`BottomRight'"
                  putexcel `CellRange', border(top)

                  local CellRange = "`UpperLeft':`BottomLeft'"
                  putexcel `CellRange', border(right)

                  local CellRange = "`UpperRight':`BottomRight'"
                  putexcel `CellRange', border(left)




                  Comment


                  • #10
                    I found myself that "string" command says which row. However, I cannot fix it properly.
                    Best wishes,

                    Comment

                    Working...
                    X