Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshaping long to wide with a lot of missing quarterly data

    Dear Forum,

    I obtained a large dataset from Compustat with quarterly company figures. Right now it is in long format and I would like to transform it into wide format. If I use the following code I obtain the following error message

    reshape wide roa_quarterly, i(conm)j(quarter)
    variable quarter contains missing values
    r(498);

    As you see down below this is right because there are missing quarters in my data e.g. Alpharma did not report 2009 ongoing, whereas other companies such as American Airlines do.

    conm quarter roa_quarterly
    ALPHARMA INC -CL A 2006Q1 .0284465
    ALPHARMA INC -CL A 2006Q2 .0136541
    ALPHARMA INC -CL A 2006Q3 .0139487
    ALPHARMA INC -CL A 2006Q4 .0170441
    ALPHARMA INC -CL A 2007Q1 .0097151
    ALPHARMA INC -CL A 2007Q2 .0102314
    ALPHARMA INC -CL A 2007Q3 .0115811
    ALPHARMA INC -CL A 2007Q4 -.0416313
    ALPHARMA INC -CL A 2008Q1 .120644
    ALPHARMA INC -CL A 2008Q2 -.0053788
    ALPHARMA INC -CL A 2008Q3 .0030435
    AMERICAN AIRLINES GROUP INC 2006Q1 -.0030751
    AMERICAN AIRLINES GROUP INC 2006Q2 .0094628
    AMERICAN AIRLINES GROUP INC 2006Q3 .0004979
    AMERICAN AIRLINES GROUP INC 2006Q4 .0005833
    AMERICAN AIRLINES GROUP INC 2007Q1 .0027105
    AMERICAN AIRLINES GROUP INC 2007Q2 .0104242
    AMERICAN AIRLINES GROUP INC 2007Q3 .0059124
    AMERICAN AIRLINES GROUP INC 2007Q4 -.002415
    AMERICAN AIRLINES GROUP INC 2008Q1 -.0118543
    AMERICAN AIRLINES GROUP INC 2008Q2 -.0505099
    AMERICAN AIRLINES GROUP INC 2008Q3 .0011503
    AMERICAN AIRLINES GROUP INC 2008Q4 -.0137835
    AMERICAN AIRLINES GROUP INC 2009Q1 -.0152949
    AMERICAN AIRLINES GROUP INC 2009Q2 -.0161571
    AMERICAN AIRLINES GROUP INC 2009Q3 -.0139396
    AMERICAN AIRLINES GROUP INC 2009Q4 -.0135231
    AMERICAN AIRLINES GROUP INC 2010Q1 -.0197845
    AMERICAN AIRLINES GROUP INC 2010Q2 -.000425
    AMERICAN AIRLINES GROUP INC 2010Q3 .0056395
    AMERICAN AIRLINES GROUP INC 2010Q4 -.0039063
    AMERICAN AIRLINES GROUP INC 2011Q1 -.0160808
    AMERICAN AIRLINES GROUP INC 2011Q2 -.0110909
    AMERICAN AIRLINES GROUP INC 2011Q3 -.0065537
    AMERICAN AIRLINES GROUP INC 2011Q4 -.0459158
    AMERICAN AIRLINES GROUP INC 2012Q1 -.0677247
    AMERICAN AIRLINES GROUP INC 2012Q2 -.0097064
    AMERICAN AIRLINES GROUP INC 2012Q3 -.0099887
    AMERICAN AIRLINES GROUP INC 2012Q4 .0111867
    AMERICAN AIRLINES GROUP INC 2013Q1 -.0142965
    AMERICAN AIRLINES GROUP INC 2013Q2 .0083537
    AMERICAN AIRLINES GROUP INC 2013Q3 .0107916
    AMERICAN AIRLINES GROUP INC 2013Q4 -.0473296
    AMERICAN AIRLINES GROUP INC 2014Q1 .0109747
    AMERICAN AIRLINES GROUP INC 2014Q2 .0192587
    AMERICAN AIRLINES GROUP INC 2014Q3 .0213252

    How can I transform this table that it would look like this? Or at least generate/highlight/exclude the companies already in advance of which I know they did not report every quarter for the period of 2006- 2014? Searching manually is not an option, given the amount of 500,000 observations...

    2006Q1. 2006Q2. ..... 2014Q2
    Alpharma. .0284465 .0136541 .(/nn)
    American Airline -.0030751 .0094628 .0192587

    I think I would now how to transform this in Excel, but in Stata would be much more preferred.

    Many thanks in advance for your help!

    Best
    Lara


  • #2
    Welcome to Statalist.

    Please take a moment to read the FAQ (http://www.statalist.org/forums/help) on how to post your sample data using -dataex- instead of just pasting them as text. If anyone would like to help by contributing their code, posted dataex examples are much easier to use. Your current table would require some amount of work to even test it.

    Second, I think you might have mis-interpreted the error. For long to wide to work, each id do not have to be equal in length:

    Code:
    clear
    input id time y
    1 1 5
    1 2 10
    1 3 15
    2 1 20
    2 2 25
    2 3 30
    2 4 35
    2 5 40
    end
    
    reshape wide y, i(id) j(time)
    list
    The non-data periods would be replaced by missing:

    Code:
         +-----------------------------+
         | id   y1   y2   y3   y4   y5 |
         |-----------------------------|
      1. |  1    5   10   15    .    . |
      2. |  2   20   25   30   35   40 |
         +-----------------------------+
    I think Stata is trying to tell you that you have missing values (not miss time) in the quarter variable. Try to use:

    Code:
    count if roa_qualterly == .
    and see if you have any of them. Importing from Excel often brings many, many empty rows at the bottom of the data. This is the usual culprit.

    Comment


    • #3
      I would like to back up a step and consider whether it is necessary to reshape your data to wide.

      The experienced users here generally agree that, with few exceptions, Stata's design makes it much more straightforward to accomplish complex analyses using a long layout of your data rather than a wide layout of the same data. You should try to achieve what you need with the data organized as it currently is, and seek the help of Statalist in doing so. The sort of problems you will encounter trying to use your reshaped data will almost certainly be solved by reshaping the data back to long. It is much easier, for example, to compare the second observation to the first, the third to the second, and so on, than it is to compare the second variable to the first, the third to the second, etc.

      If you are familiar with Excel Pivot Tables, the data organization of a long layout is similar to the input to a Pivot Table. You have one data item (roa_quarterly) and then you have for value two descriptors - the firm (conm) and the date (quarter).

      Comment


      • #4
        Many Thanks Ken! I redid my example again with another, easier example and it worked. I thought i would have filtered out all the missing values earlier but apparently i had not!
        Now i know that it must be possible also with the larger dataset- i just need to continue trying

        Comment

        Working...
        X