Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Trouble with reshaping datastream data

    Hi,

    I am currently working on my Master Thesis and I have obtained data from datastream. Unfortunately, a time series request in Datastream loads the relevant variables horizontally in one column and the years vertically. I am trying to reshape the data in Stata but so far I haven't been able to succeed. Company identifiers such as name (Name) and datastream code (Dscode) are already in long format, I would like to have the annual data (columns Y2002-2014) in long format as well and the variables which are currently in one column (Code) in wide format. The are 37 variables represented in the Code column which are at this point all included 248 times in the column (=N).

    Does anyone have any suggestions for me?

    Thanks in advance!
    Attached Files

  • #2
    Please posts a data example using dataex, see also the explanation on how and why in the FAQ (http://www.statalist.org/forums/help#stata).
    This makes it a lot easier for people on the forum to copy your data into Stata.
    Code:
    ssc install dataex
    dataex

    Comment


    • #3
      I would also recommend dataex. The example below shows how it is used.
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str1 Name float Dscode str3 Code float(Y2002 Y2003 Y2004)
      "A" 1 "(A)" 1 5  9
      "A" 1 "(B)" 2 6 10
      "B" 2 "(A)" 3 7 11
      "B" 2 "(B)" 4 8 12
      end
      
      * Reshape from wide to long
      reshape long Y, i(Name Dscode Code) j(year)
      ren Y data
      
      * Convert Code to numeric variable
      ren Code Codeold
      encode Codeold, gen(Code)
      drop Codeold
      
      * Copy value labels
      levelsof Code, local(codes)
      foreach c of local codes {
        local lab`c' : label Code `c'
      }
      
      * Reshape from long to wide
      reshape wide data, i(Name Dscode year) j(Code)
      
      * Assign variable labels
      foreach c of local codes {
        lab var data`c' "`lab`c''"
      }

      Comment


      • #4
        I'm thinking the variable name creation could be a bit more simple:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str1 Name float Dscode str10 Code float(Y2002 Y2003 Y2004)
        "A" 1 "(CGBS)" 1 5  9
        "A" 1 "(ECNSCORE)" 2 6 10
        "B" 2 "(CGBS)" 3 7 11
        "B" 2 "(ECNSCORE)" 4 8 12
        end
        
        reshape long Y, i(Name Dscode Code) j(year)
        ren Y data
        *generate code name without () to be used as var names
        gen codenew =  substr(Code, 2, strlen(Code)-2)
        drop Code
        reshape wide data, i(Name Dscode year) j(codenew) string
        ren data* *
        where substr and strlen are string manipulation functions: http://www.stata.com/manuals14/fnstringfunctions.pdf

        Comment


        • #5
          My data looks very similar and the code works well on the examples but I could not figure out how to do it with my data.


          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str61 Description str16 Code_Mix str7 ID_S str9 Datatypes str18 Q float qdate
          "DR HOENLE AG - NET PROCEEDS FROM SALE/ISSUE C" "13410M(WC04251A)" "D13410M" "WC04251A" ""     188
          "DR HOENLE AG - EARNINGS BEF INTEREST & TAXES"  "13410M(WC18191A)" "D13410M" "WC18191A" "1490" 212
          "DR HOENLE AG - NET PROCEEDS FROM SALE/ISSUE C" "13410M(WC04251A)" "D13410M" "WC04251A" ""     217
          "DR HOENLE AG - RESEARCH & DEVELOPMENT"         "13410M(WC01201A)" "D13410M" "WC01201A" "236"  220
          "DR HOENLE AG - PROPERTY, PLANT & EQUIP - NET"  "13410M(WC02501)"  "D13410M" "WC02501"  "6027" 201
          "DR HOENLE AG - PROPERTY, PLANT & EQUIP - NET"  "13410M(WC02501)"  "D13410M" "WC02501"  "6210" 196
          "DR HOENLE AG - CASH & SHORT TERM INVESTMENTS"  "13410M(WC02001)"  "D13410M" "WC02001"  "4378" 211
          "DR HOENLE AG - TOTAL INTANGIBLE OT ASSETS-NET" "13410M(WC02649A)" "D13410M" "WC02649A" "8673" 203
          "DR HOENLE AG - TOTAL DEBT"                     "13410M(WC03255A)" "D13410M" "WC03255A" "806"  192
          "DR HOENLE AG - COM/PFD PURCHASED, RETIRED, CO" "13410M(WC04751A)" "D13410M" "WC04751A" ""     211
          end
          format %tq qdate

          I tried this, getting an error message, I do not understand.

          . reshape wide Q, i(Description, Code_Mix, ID_S, qdate) j(Datatypes)
          invalid 'ID_S'
          r(198);

          What have I done wrong?

          Comment


          • #6
            It's quite difficult to understand what you hope to end up with actually.

            Your exact code could be fixed by dropping comma's and adding string option:
            Code:
            reshape wide Q, i(Description Code_Mix ID_S qdate) j(Datatypes) string
            But I'm really not sure if the result is what you wanted.

            If not what you wanted., explain a bit more about what we are seeing in this data example.

            Comment


            • #7
              Jorrit, you are right about the comma (probably due to years of modeling in Excel). However, it did work out finally

              . reshape wide Q, i(Description Code_Mix ID_S qdate) j(Datatypes) string

              (note: j = MV WC01001A WC01101A WC01201A WC01251A WC02001 WC02001A WC02501 WC02501A WC02649A WC02999A WC03051A WC03251A WC03255A WC03263A
              > WC03351A WC03451A WC03995A WC04251A WC04601A WC04751A WC07015 WC07230A WC08301A WC18100A WC18191A WC18198A, WC18199A WC18232A)
              , not allowed
              r(101);

              The errorcode above gives you a list of Datastream variables (like in the old post from 2017 - > MS Excel picture). These are mostly balance sheet items that I want to reshape from long to wide.

              What is meant by "not allowed" and how can I make it work?

              It the data to large with about 4 million observations?

              Comment


              • #8
                You cannot have the comma in your variable names.
                Code:
                (note: j = MV WC01001A ... WC18198A, WC18199A
                Remove it from your datatypes variable before trying to reshape.
                Last edited by Jorrit Gosens; 05 Mar 2019, 11:44.

                Comment


                • #9
                  Great, it worked, thanks very much. I had never thought of the "," causing these problems.

                  Comment

                  Working...
                  X