Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Problems while reshaping

    Hi all,

    I've tried to do it myself using examples on the internet and watching videos, but my brains are a bit fried. I currently have the closing prices of 285 companies between 2004 and 2022. The dataset is generated by Datastream in a Wide/Long format? Currently my dataset is as follows
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str10 date str5 com_haab str6(com_habn com_hbowe) str7 com_hbesi
    "1-1-2004"  "18,55" "NA" "9,39" "3,1301"
    "2-1-2004"  "18,88" "NA" "9,4"  "3,1161"
    "5-1-2004"  "18,83" "NA" "9,5"  "3,0835"
    "6-1-2004"  "18,84" "NA" "9,51" "3,0509"
    "7-1-2004"  "18,61" "NA" "9,54" "3,0369"
    "8-1-2004"  "18,71" "NA" "9,58" "3,1906"
    "9-1-2004"  "18,57" "NA" "9,71" "3,1534"
    "12-1-2004" "18,55" "NA" "9,6"  "3,1673"
    "13-1-2004" "18,52" "NA" "9,49" "3,2279"
    "14-1-2004" "18,65" "NA" "9,49" "3,2372"
    "15-1-2004" "19,24" "NA" "9,48" "3,2139"
    "16-1-2004" "19,35" "NA" "9,51" "3,2419"
    "19-1-2004" "19,48" "NA" "9,42" "3,349" 
    "20-1-2004" "19,45" "NA" "9,4"  "3,3164"
    "21-1-2004" "19,63" "NA" "9,24" "3,2605"
    "22-1-2004" "19,79" "NA" "9,34" "3,1813"
    "23-1-2004" "19,58" "NA" "9,43" "3,2372"
    "26-1-2004" "19,39" "NA" "9,4"  "3,2605"
    "27-1-2004" "19,6"  "NA" "9,5"  "3,2651"
    "28-1-2004" "19,7"  "NA" "9,5"  "3,2139"
    end
    I want the dataset to become long that the 18 years are repeated, the ID of the company and it's closing prices. So I want 3 columns, 1st date, 2nd CompanyId and 3rd matching closing prices.
    I've tried it myself but keep getting different errors.

    Kind regards,
    Stijn van den Berg

  • #2
    it looks like your data is in long format, can you show what you want the data to look like? also, do you have a company identifier?

    Comment


    • #3
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str10 date str9 companyticker str7 price
      "1-1-2004" "com_haab"  "18,55" 
      "2-1-2004" "com_haab"  "18,88" 
      "5-1-2004" "com_haab"  "18,83" 
      "6-1-2004" "com_haab"  "18,84" 
      "7-1-2004" "com_haab"  "18,61" 
      "1-1-2004" "com_habn"  "NA"    
      "2-1-2004" "com_habn"  "NA"    
      "5-1-2004" "com_habn"  "NA"    
      "6-1-2004" "com_habn"  "NA"    
      "7-1-2004" "com_habn"  "NA"    
      "1-1-2004" "com_hbowe" "9,39"  
      "2-1-2004" "com_hbowe" "9,4"   
      "5-1-2004" "com_hbowe" "9,5"   
      "6-1-2004" "com_hbowe" "9,51"  
      "7-1-2004" "com_hbowe" "9,54"  
      "1-1-2004" "com_hbesi" "3,1301"
      "2-1-2004" "com_hbesi" "3,1161"
      "5-1-2004" "com_hbesi" "3,0835"
      "6-1-2004" "com_hbesi" "3,0509"
      "7-1-2004" "com_hbesi" "3,0369"
      end
      I want my data to look something like this. The company identifiers are the variable names so com_"Ticker of Company". So i want the dates to repeat themselves with accompanying companies and their prices. So i guess the data is already in long format but I want it less wide? If that's possible?

      Comment


      • #4
        This appears to work for your data example. It goes beyond what you ask, as you would find string dates and prices to be fairly useless for analysis.

        Code:
         
         * Example generated by -dataex-. For more info, type help dataex clear input str10 date str5 com_haab str6(com_habn com_hbowe) str7 com_hbesi "1-1-2004"  "18,55" "NA" "9,39" "3,1301" "2-1-2004"  "18,88" "NA" "9,4"  "3,1161" "5-1-2004"  "18,83" "NA" "9,5"  "3,0835" "6-1-2004"  "18,84" "NA" "9,51" "3,0509" "7-1-2004"  "18,61" "NA" "9,54" "3,0369" "8-1-2004"  "18,71" "NA" "9,58" "3,1906" "9-1-2004"  "18,57" "NA" "9,71" "3,1534" "12-1-2004" "18,55" "NA" "9,6"  "3,1673" "13-1-2004" "18,52" "NA" "9,49" "3,2279" "14-1-2004" "18,65" "NA" "9,49" "3,2372" "15-1-2004" "19,24" "NA" "9,48" "3,2139" "16-1-2004" "19,35" "NA" "9,51" "3,2419" "19-1-2004" "19,48" "NA" "9,42" "3,349"  "20-1-2004" "19,45" "NA" "9,4"  "3,3164" "21-1-2004" "19,63" "NA" "9,24" "3,2605" "22-1-2004" "19,79" "NA" "9,34" "3,1813" "23-1-2004" "19,58" "NA" "9,43" "3,2372" "26-1-2004" "19,39" "NA" "9,4"  "3,2605" "27-1-2004" "19,6"  "NA" "9,5"  "3,2651" "28-1-2004" "19,7"  "NA" "9,5"  "3,2139" end ds date, not 
        
        rename (`r(varlist)') (price=)
        reshape long price, i(date) j(companyticker) string 
        
        gen betterdate = daily(date,"DMY") 
        drop date 
        rename betterdate date 
        format date %td 
        
        destring price, dpcomma ignore("NA") replace

        Comment


        • #5
          Nick Cox thanks for your reply! Unfortunately I wasn't able to reshape my data with your code or similar to your code. Secondly, I would like to transform the data in the first post to something similar in the 3rd post. The raw data looks like the following:

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str10 v1 str32 v2 str13 v3 str31 v4 str27 v5 str7 v6
          "Name"      "ABN AMRO HOLDING DEAD - 25/04/08" "ABN AMRO BANK" "WESSANEN DEAD - DELIST.01/11/19" "BE SEMICONDUCTOR INDUSTRIES" "CORBION"
          "Date"      "H:AAB"                            "H:ABN"         "H:BOWE"                          "H:BESI"                      "H:CRBN" 
          "1-1-2004"  "18,55"                            "NA"            "9,39"                            "3,1301"                      "14,2874"
          "2-1-2004"  "18,88"                            "NA"            "9,4"                             "3,1161"                      "14,6252"
          "5-1-2004"  "18,83"                            "NA"            "9,5"                             "3,0835"                      "14,7076"
          "6-1-2004"  "18,84"                            "NA"            "9,51"                            "3,0509"                      "14,4439"
          "7-1-2004"  "18,61"                            "NA"            "9,54"                            "3,0369"                      "14,5593"
          "8-1-2004"  "18,71"                            "NA"            "9,58"                            "3,1906"                      "14,6581"
          "9-1-2004"  "18,57"                            "NA"            "9,71"                            "3,1534"                      "14,5263"
          "12-1-2004" "18,55"                            "NA"            "9,6"                             "3,1673"                      "14,5346"
          "13-1-2004" "18,52"                            "NA"            "9,49"                            "3,2279"                      "14,6087"
          "14-1-2004" "18,65"                            "NA"            "9,49"                            "3,2372"                      "14,7488"
          "15-1-2004" "19,24"                            "NA"            "9,48"                            "3,2139"                      "14,7817"
          "16-1-2004" "19,35"                            "NA"            "9,51"                            "3,2419"                      "14,7241"
          "19-1-2004" "19,48"                            "NA"            "9,42"                            "3,349"                       "14,6252"
          "20-1-2004" "19,45"                            "NA"            "9,4"                             "3,3164"                      "14,6581"
          "21-1-2004" "19,63"                            "NA"            "9,24"                            "3,2605"                      "14,584" 
          "22-1-2004" "19,79"                            "NA"            "9,34"                            "3,1813"                      "14,7158"
          "23-1-2004" "19,58"                            "NA"            "9,43"                            "3,2372"                      "14,584" 
          "26-1-2004" "19,39"                            "NA"            "9,4"                             "3,2605"                      "14,5675"
          end
          So what I did was using the 3rd row as the variable names giving the follow data after some destringing:

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input float date double(haab habn hbowe hbesi hcrbn)
          16071 18.55 . 9.39 3.1301 14.2874
          16072 18.88 .  9.4 3.1161 14.6252
          16075 18.83 .  9.5 3.0835 14.7076
          16076 18.84 . 9.51 3.0509 14.4439
          16077 18.61 . 9.54 3.0369 14.5593
          16078 18.71 . 9.58 3.1906 14.6581
          16079 18.57 . 9.71 3.1534 14.5263
          16082 18.55 .  9.6 3.1673 14.5346
          16083 18.52 . 9.49 3.2279 14.6087
          16084 18.65 . 9.49 3.2372 14.7488
          16085 19.24 . 9.48 3.2139 14.7817
          16086 19.35 . 9.51 3.2419 14.7241
          16089 19.48 . 9.42  3.349 14.6252
          16090 19.45 .  9.4 3.3164 14.6581
          16091 19.63 . 9.24 3.2605  14.584
          16092 19.79 . 9.34 3.1813 14.7158
          16093 19.58 . 9.43 3.2372  14.584
          16096 19.39 .  9.4 3.2605 14.5675
          16097  19.6 .  9.5 3.2651 14.5428
          16098  19.7 .  9.5 3.2139 14.5181
          end
          format %td date
          What I would like is only 3 columns, the first with the date (Which should repeat itself after 18 years, for each company 18 years), the second with the company ticker which are now the variable names and the last column to be the prices corresponding to the company and date. I hope this information will suffice

          Kind regards,
          Stijn van den Berg

          Comment


          • #6
            If it's a different problem each time, it's a different solution each time.

            Comment


            • #7
              That looks like the same ideas recast slightly.

              Code:
              ds date, not 
              rename (`r(varlist)') (price=)
              reshape long price, i(date) j(companyticker) string

              Comment

              Working...
              X