Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Converting Excel

    I need to import 3 excel files with different columns/rows into Stata.

    Excel #1 looks like this:

    Code:
    Date StockCode SharesRepurchased
    1-1-2015 US10 40
    1-1-2015 US30 10
    2-1-2015 US20 30
    2-1-2015 US40 40
    2-1-2015 US50 40
    2-1-2015 US10 10
    3-1-2015 US20 20
    4-1-2015 US20 10
    4-1-2015 US30 10
    Excel #2 looks like this:

    Code:
    Date US10_Price US10_Volume US20_Price US20_Volume US30_Price US30_Volume US40_Price US40_Volume US50_Price US50_Volume
    1-1-2015 300 6000 200 8000 100 7000 150 6000 300 8000
    2-1-2015 500 4000 400 5000 300 3000 350 6000 800 7000
    3-1-2015 200 7000 400 3000 200 2000 350 5000 700 3000
    4-1-2015 100 8000 600 3000 300 6000 250 4000 200 5000
    Excel #3 looks like this:

    Code:
    Date US10_Value US20_Value US30_Value US40_Value US50_Value
    1-1-2015 0 0 3 0 4
    2-1-2015 0 1 0 0 4
    3-1-2015 0 3 0 4 0
    4-1-2015 4 0 0 0 0
    I need to run a regression on these variables, and in order to do this I think I need a Stata file like this:

    Code:
    StockCode Date SharesRepurchased Price Volume Value
    US10 1-1-2015 40 300 6000 0
    US10 2-1-2015 10 500 4000 0
    US10 3-1-2015 0 200 7000 0 
    US10 4-1-2015 0 100 8000 4
    US20 1-1-2015 0 200 8000 0
    US20 2-1-2015 30 400 5000 1
    US20 3-1-2015 20 400 3000 3
    US20 4-1-2015 10 600 3000 0
    US30 1-1-2015 10 100 7000 3
    US30 2-1-2015 0  300 3000 0
    US30 3-1-2015 0 200 2000 0
    US30 4-1-2015 10 300 6000 0
    US40 1-1-2015 0 150 6000 0
    US40 2-1-2015 40 350 6000 0
    US40 3-1-2015 0  350 5000 4
    US40 4-1-2015 0 250 4000 0
    US50 1-1-2015 0 300 8000 4
    US50 2-1-2015 40 800 7000 4
    US50 3-1-2015 0 700 3000 0
    US50 4-1-2015 0 200 5000 0
    How can I convert the excel files such that I can get a Stata file like this? Also, in reality Excel file #2 contains too many variables, so do I need to split this excel file up?

    Thank you for your help in advance

  • #2
    An "Excel file" could be anything readable by MS Excel, including specifically the proprietary formats .xls or .xlsx. I guess you mean that specific sense.

    With students and colleagues fairly new to Stata I always recommend that they open up a dialog box with

    Code:
    db import excel
    and then look at the files in question.. But in importing it's usually essential to get Stata to read the first row as variable names, for which accept "Import first row as variable names".

    It's hard to give more precise instructions without speculating.

    But

    1. It's possible that Stata will understand your date variable as you wish, but not out of the question that it may need surgery.

    2. You can drop variables you don't want to use or -- what may be easier -- keep variables you do want to use.

    3. You need engineering to get your desired dataset layout, but you'll get better answers if you read in the data first and then show examples of what the data look like in Stata. See FAQ Advice #12 in particular.

    Comment


    • #3
      Indeed, but as yet it's too broad a question to tempt me further. You have yet to give a data example of your Stata data in the form we request.

      Comment


      • #4
        Excel 2 and Excel 3 require reshaping. From there it should just be a simple 1:1 merge. So something like this.
        Code:
        import excel Excel1.xlsx, firstrow
        save one
        clear
        
        import excel Excel2.xlsx, firstrow
        rename *_* **
        reshape long @Price @Volume, i(Date) j(StockCode) string
        save two
        clear
        
        import excel Excel3.xlsx, firstrow
        rename *_* **    
        reshape long @Value, i(Date) j(StockCode) string
        
        merge 1:1 Date StockCode using two, gen(merge1)
        merge 1:1 Date StockCode using one, gen(merge2)
        As Nick mentioned, this may or may not work depending on the storage type of your variables, so you may need to amend the code to your needs.

        Comment


        • #5
          The code in #6 for file #1 fails because no variable is called tradingdate. Perhaps that should be Date. I know enough about dataex to guess that you edited its output after the fact, which is likely to prove confusing unless you make changes consistently.

          As no variable in any dataset is called SharesRepurchased I can't relate #1 to #6.

          Sorry, I don't work myself with this kind of data and can't proceed further with this. What you want is likely to be clearer to people who do.

          Wouter Wakker posted code in #5. I guess Wouter's post crossed with yours, but please react to his suggestions.
          Last edited by Nick Cox; 31 Jul 2019, 01:07.

          Comment

          Working...
          X