Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshaping Firms data

    I have this data FirmName CompCode SicCode VariableCode Q12010 Q22010 Q32010 .......
    Firm 1 Price 5555 111 P 100 101 102
    Firm 1 Beta 5555
    Firm 1 MV 5555
    Firm 1 MTBV 5555
    Firm 1 Ret 5555
    Firm 2 Price 6666
    Firm 2 Beta 6666
    Firm 2 MTBV 6666
    ..........
    I want to reshape this data to somehting like that(See down) but is a bit complicated in my mind how should they be reshaped because I want to know which firms data am using but if i reshape them like that I won't. I want to make portoflios out of these firms based on their SicCode if it helps you to understand what I want to do. In addition some of my data are missing as you can see firm 2 doesnt have a variable MV at all. So I won't be using that data. Would you suggest deleting it from excel or doing something else in stata?
    Quarters Price Beta MV MTBV Ret
    Q12010
    Q22010
    Q32010
    ...........

  • #2
    Please see https://www.statalist.org/forums/help#stata for how to improve your data example using dataex. The example doesn't have to be real data, but it should be realistic. The storage types of variables should be exactly as in your real dataset.

    Comment


    • #3
      I have tried the command dataex it is giving me in response error 1000 which is too many variables. I am using community Stata/MP 18.0.
      I will try writing down my data exactly as it is
      Name CompCode SicCode VariableCode Q12010 Q22010 Q32010 .........
      10GENOMICSX-Price 5555H 111 P 100 101.90 102.90
      10GENOMICSX-Beta 5555H 111 WC098 0.9 0.9 0.9
      10GENOMICSX -Market Value 5555H 111 MV 250 250.39 258.1
      10GENOMICSX-MRKT to Book Value 5555H 111 MTBV 1.23 1.53 1.52
      10GENOMICSX-Ret 5555H 111 RI 259 267 275
      111INC-Price 6666E 111 P 25.36 21.36 20.30
      111INC-Beta 6666E 111 WC098 1.3 1.3 1.3
      111INC-MRKT to Book Value 6666E 111 MTBV 1.35 0.95 1.23
      ..........
      ..........
      I hope this is helpful

      Comment


      • #4
        The error message you should have got was


        input statement exceeds linesize limit. Try specifying fewer variables
        I am struggling with your examples still. I made some guesses. This may count as progress.

        Code:
        clear 
        input str42 (Name CompCode SicCode VariableCode) Q12010 Q22010 Q32010 
        "10GENOMICSX" "Price" "5555H" "111 P" 100 101.90 102.90
        "10GENOMICSX" "Beta" "5555H" "111 WC098" 0.9 0.9 0.9
        "10GENOMICSX" "Market Value" "5555H" "111 MV" 250 250.39 258.1
        "10GENOMICSX" "MRKT to Book Value" "5555H" "111 MTBV" 1.23 1.53 1.52
        "10GENOMICSX" "Ret" "5555H" "111 RI" 259 267 275
        "111INC"      "Price" "6666E" "111 P" 25.36 21.36 20.30
        "111INC"      "Beta" "6666E" "111 WC098" 1.3 1.3 1.3
        "111INC"      "MRKT to Book Value" "6666E" "111 MTBV" 1.35 0.95 1.23
        end 
        
        compress 
        
        * you start here 
        encode CompCode, gen(Which) 
        su Which, meanonly 
        local max = r(max)
        forval j = 1/`max' { 
          local label`j' : label (Which) `j' 
        }
         
        reshape long Q, i(Name CompCode) 
        rename _j Date 
        replace Date = yq(mod(Date, 10000), floor(Date/10000))
        format Date %tq 
        drop CompCode VariableCode 
        reshape wide Q, i(Name Date) j(Which)  
        
        list 
        
        forval j = 1/`max' { 
            label var Q`j' "`label`j''" 
        }
        
        describe
        Code:
        . list 
        
             +--------------------------------------------------------------------+
             |        Name     Date    Q1     Q2       Q3      Q4    Q5   SicCode |
             |--------------------------------------------------------------------|
          1. | 10GENOMICSX   2010q1    .9   1.23      250     100   259     5555H |
          2. | 10GENOMICSX   2010q2    .9   1.53   250.39   101.9   267     5555H |
          3. | 10GENOMICSX   2010q3    .9   1.52    258.1   102.9   275     5555H |
          4. |      111INC   2010q1   1.3   1.35        .   25.36     .     6666E |
          5. |      111INC   2010q2   1.3    .95        .   21.36     .     6666E |
             |--------------------------------------------------------------------|
          6. |      111INC   2010q3   1.3   1.23        .    20.3     .     6666E |
             +--------------------------------------------------------------------+
        
        . 
        . 
        . describe 
        
        Contains data
          obs:             6                          
         vars:             8                          
         size:           228                          
        ----------------------------------------------------------------------------------------------
                      storage   display    value
        variable name   type    format     label      variable label
        ----------------------------------------------------------------------------------------------
        Name            str11   %11s                  
        Date            int     %tq                   
        Q1              float   %9.0g                 Beta
        Q2              float   %9.0g                 MRKT to Book Value
        Q3              float   %9.0g                 Market Value
        Q4              float   %9.0g                 Price
        Q5              float   %9.0g                 Ret
        SicCode         str5    %9s                   
        ----------------------------------------------------------------------------------------------
        Sorted by: Name  Date

        Comment


        • #5
          sorry for not helping i think this will be the best description now.
          In my dataset i have to have maximum for each firm 5 variables(price, beta, market value, market value to book and tot return ind) I want to have as rows the Quarters and as collumns the Variable Code but I also want to know which firms name is that variable code. I hope this is helpful. Thanks for everything up to now Nick
          Name Company Code SIC Code Variable Code Q1 2004 Q2 2004 Q3 2004 Q4 2004 Q1 2005 Q2 2005
          SUNOPTA INC 997759 119 P 9.07 9.84 7.84 7.13 6.56 4.999
          SUNOPTA INC - BETA 997759 119 WC09802 1.3 1.3 1.3 1.3 1.3 1.3
          SUNOPTA INC - MARKET VALUE 997759 119 MV 472.8 520.47 418.89 381.58 366.79 281.23
          SUNOPTA INC - TOT RETURN IND 997759 119 RI 403.11 437.33 348.44 316.89 291.56 222.18
          PYXUS INTERNATIONAL 9822V8 132 P NA NA NA NA NA NA
          PYXUS INTERNATIONAL - BETA 9822V8 132 WC09802 -0.68 -0.68 -0.68 -0.68 -0.68 -0.68
          PYXUS INTERNATIONAL - MARKET VALUE 9822V8 132 MV NA NA NA NA NA NA
          PYXUS INTERNATIONAL - MRKT VALUE TO BOOK 9822V8 132 MTBV NA NA NA NA NA NA
          PYXUS INTERNATIONAL - TOT RETURN IND 9822V8 132 RI NA NA NA NA NA NA
          APPHARVEST INC 97484J 161 P NA NA NA NA NA NA
          APPHARVEST INC - BETA 97484J 161 WC09802 3.09 3.09 3.09 3.09 3.09 3.09
          APPHARVEST INC - MARKET VALUE 97484J 161 MV NA NA NA NA NA NA
          APPHARVEST INC - MRKT VALUE TO BOOK 97484J 161 MTBV NA NA NA NA NA NA
          APPHARVEST INC - TOT RETURN IND 97484J 161 RI NA NA NA NA NA NA

          Comment


          • #6
            I don't have different suggestions except that you will need replace NA with numeric missing. And if the variable description is part of the company name, then you'll need to split that off.
            Last edited by Nick Cox; 13 Nov 2023, 15:48.

            Comment

            Working...
            X