Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Problem Reshaping Panel Data -- Wide / Long

    Hello everyone,

    I am a new user to Statalist and have tried to adhere to the forum guidelines. This has been a tremendously helpful forum for me over the years and so I thank you for all of your help.

    I have a problem reshaping my panel data. My data looks like this:
    ID Indicator q1_2006 q2_2006 q3_2006
    1 Board meetings 4 6 7
    1 Environmental score 8 9 9
    1 Social score 1 1 4
    ... ... ... ... ...
    8 Board meetings 7 1 8
    8 Environmental score 8 5 6
    8 Social score 4 3 4
    I would like to have the ID still in the same format but the Indicators as columns and current Columns (which are quarterly dates) where the Indicator variables are. Something like:

    ID Date Board meetings Environmental score Social score
    1 q1_2006
    1 q2_2006
    1 q3_2006
    ... ... ... ... ...
    8 q1_2006
    8 q2_2006
    8 q3_2006
    The closest I got was using this command: reshape long q, i(name indicator) j(date) string

    But that is still wrong as then it puts the dates AND the indicators as two single columns (instead of putting the indicators across the top as separate columns).

    Any help would be great appreciated!

    Thank you kindly.

    Roger

  • #2
    I don't really understand your "i(name indicator)", what is name? However, would it not be possible to do a second reshape to wide after having created a variable indicating rows within each q (by q: egen row=seq())? Probably you will have to change the names of the new collumns also, to get rid of the spaces in each name.

    Kind regards,
    Hanna

    Comment


    • #3
      This may help. I re-engineered your data example.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte id str19 indicator byte(q1_2006 q2_2006 q3_2006)
      1 "Board meetings"      4 6 7
      1 "Environmental score" 8 9 9
      1 "Social score"        1 1 4
      8 "Board meetings"      7 1 8
      8 "Environmental score" 8 5 6
      8 "Social score"        4 3 4
      end
      rename (q*) (y=)
      reshape long y, i(id indicator) j(date) string
      replace indicator = subinstr(indicator, " ", "_", .)
      reshape wide y , i(id date) j(indicator) string
      rename (y*) (*)
      gen ndate = yq(real(substr(date, -4, 4)), real(substr(date,2, 1)))
      format ndate %tq
      list, sepby(id)
      Here's the output:

      Code:
      . * Example generated by -dataex-. To install: ssc install dataex
      . clear
      
      . input byte id str19 indicator byte(q1_2006 q2_2006 q3_2006)
      
                 id            indicator   q1_2006   q2_2006   q3_2006
        1. 1 "Board meetings"      4 6 7
        2. 1 "Environmental score" 8 9 9
        3. 1 "Social score"        1 1 4
        4. 8 "Board meetings"      7 1 8
        5. 8 "Environmental score" 8 5 6
        6. 8 "Social score"        4 3 4
        7. end
      
      . rename (q*) (y=)
      
      . reshape long y, i(id indicator) j(date) string
      (note: j = q1_2006 q2_2006 q3_2006)
      
      Data                               wide   ->   long
      -----------------------------------------------------------------------------
      Number of obs.                        6   ->      18
      Number of variables                   5   ->       4
      j variable (3 values)                     ->   date
      xij variables:
                   yq1_2006 yq2_2006 yq3_2006   ->   y
      -----------------------------------------------------------------------------
      
      . replace indicator = subinstr(indicator, " ", "_", .)
      (18 real changes made)
      
      . reshape wide y , i(id date) j(indicator) string
      (note: j = Board_meetings Environmental_score Social_score)
      
      Data                               long   ->   wide
      -----------------------------------------------------------------------------
      Number of obs.                       18   ->       6
      Number of variables                   4   ->       5
      j variable (3 values)         indicator   ->   (dropped)
      xij variables:
                                            y   ->   yBoard_meetings yEnvironmental_s
      > core ySocial_score
      -----------------------------------------------------------------------------
      
      . rename (y*) (*)
      
      . gen ndate = yq(real(substr(date, -4, 4)), real(substr(date,2, 1)))
      
      . format ndate %tq
      
      . list, sepby(id)
      
           +--------------------------------------------------------+
           | id      date   Board_~s   Enviro~e   Social~e    ndate |
           |--------------------------------------------------------|
        1. |  1   q1_2006          4          8          1   2006q1 |
        2. |  1   q2_2006          6          9          1   2006q2 |
        3. |  1   q3_2006          7          9          4   2006q3 |
           |--------------------------------------------------------|
        4. |  8   q1_2006          7          8          4   2006q1 |
        5. |  8   q2_2006          1          5          3   2006q2 |
        6. |  8   q3_2006          8          6          4   2006q3 |
           +--------------------------------------------------------+

      Comment


      • #4
        Hi Roger. I've inserted a few more -list- commands in Nick's code to help you (and me!) see more clearly what's going on at each step.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte id str19 indicator byte(q1_2006 q2_2006 q3_2006)
        1 "Board meetings"      4 6 7
        1 "Environmental score" 8 9 9
        1 "Social score"        1 1 4
        8 "Board meetings"      7 1 8
        8 "Environmental score" 8 5 6
        8 "Social score"        4 3 4
        end
        list, sepby(id) // Original data
        rename (q*) (y=)
        list, sepby(id) // After -rename- command
        reshape long y, i(id indicator) j(date) string
        replace indicator = subinstr(indicator, " ", "_", .)
        list, sepby(id) // After first -reshape- command
        reshape wide y , i(id date) j(indicator) string
        list, sepby(id) // After second -reshape- command
        rename (y*) (*)
        gen ndate = yq(real(substr(date, -4, 4)), real(substr(date,2, 1)))
        format ndate %tq
        list, sepby(id)
        Cheers,
        Bruce


        --
        Bruce Weaver
        Email: [email protected]
        Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
        Version: Stata/MP 18.0 (Windows)

        Comment


        • #5
          Thank you very much, all three of you. Like I said, I am a new user and I am extremely impressed with the speed and quality of responses this forum receives. It is tremendously helpful.

          I tried your code, Nick and Bruce, and I am running into errors. Namely, the first error I receive when running the second replace line of code is:

          variable indicator not constant within id date

          I think it may be because my actual data (quite long and lengthy, so not posted here) has a different number of indicators for each ID (company in this case). For example, there may be another ID, say #9, that only has two rows of data: 1 for Board meetings and 1 for Environmental score, but is missing Social score.

          Is that what the "constant" part of this error message means?

          If it's easier for me to post some of my actual data then please tell me to do so. It's the same structure as what we are discussing here (just a lot of it in terms of ID's, indicators, and dates/quarters).

          Thanks again very much!

          Roger

          Comment


          • #6
            Hello Roger. I think it would be helpful to have some of your actual data. If you include only the needed variables and not too many cases, it should not cause any great problems in the forum. Please use dataex, as Nick Cox did in post #3 of this thread. Type help dataex for more info and examples.

            Bruce
            --
            Bruce Weaver
            Email: [email protected]
            Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
            Version: Stata/MP 18.0 (Windows)

            Comment


            • #7
              Hi Bruce,

              Thanks again for the quick reply. I've downloaded dataex and copied some of my data here. A few notes that may help ease intepretation of the problem(s):
              * I have several indicators for each id (which are companies in the this case). Indicators are stable and include: cusip and isin (other codes to identify firms)
              * I have unbalanced data, where I have missing indicators for some companies. So I may have 3 indicators for 1 company and 10 for a different company. I'm not sure if this affects the reshaping.
              * The only values that should vary over time (I've included 4 quarters of data) are the indicators

              I hope this helps. Thank you again!

              Mark


              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str24 id str17 cusip str12 isin str89 indicator double(q1_2007 q2_2007 q3_2007 q4_2007)
              "SCRIPPS NETWORKS INTACT." ""          ""             "Score - Shareholder Rights/Policy"                              .     .     .     .
              "SCRIPPS NETWORKS INTACT." ""          ""             "Score - Emission Reduction/Innovative Production"               .     .     .     .
              "ACTIVISION BLIZZARD"      "00507V109" "US00507V1098" "Workforce /Employment Quality"                               73.5  73.5  73.5  73.5
              "ACTIVISION BLIZZARD"      "00507V109" "US00507V1098" "Score - Emission Reduction/Monitoring"                       39.5  39.5  39.5  39.5
              "ADOBE SYSTEMS"            "00724F101" "US00724F1012" "Score - Training and Development/Management Training"       23.79 23.79 23.79 23.79
              "ADOBE SYSTEMS"            "00724F101" "US00724F1012" "Score - Training and Development/Improvements"              26.92 26.92 26.92 26.92
              "AKAMAI TECHS."            "00971T101" "US00971T1016" "Score - Shareholder Rights/Policy"                          61.35 63.79 63.79 63.79
              "AKAMAI TECHS."            "00971T101" "US00971T1016" "Workforce /Employment Quality"                              91.91 82.33 82.33 82.33
              "AKAMAI TECHS."            "00971T101" "US00971T1016" "Score - Shareholder Rights/Improvements"                    45.45 43.26 43.26 43.26
              "AKAMAI TECHS."            "00971T101" "US00971T1016" "Score - Shareholder Rights/Voting Rights"                   68.82 69.82 69.82 69.82
              "AKAMAI TECHS."            "00971T101" "US00971T1016" "Score - Emission Reduction/Monitoring"                          .     .     .     .
              "ALPHABET 'A'"             "02079K305" "US02079K3059" "Score - Shareholder Rights/Ownership"                       61.35 63.79 63.79 63.79
              "ALPHABET 'A'"             "02079K305" "US02079K3059" "Score - Emission Reduction/NOx and SOx Emissions Reduction" 36.69 33.09 33.09 33.09
              "ALPHABET 'A'"             "02079K305" "US02079K3059" "Score - Training and Development/Internal Promotion"        33.98 26.92 26.92 26.92
              "ALPHABET 'A'"             "02079K305" "US02079K3059" "Workforce /Employment Quality"                              94.28 85.54 85.54 85.54
              "ALPHABET 'A'"             "02079K305" "US02079K3059" "Score - Emission Reduction/F-Gases Emissions"                  41  39.5  39.5  39.5
              "ALPHABET 'A'"             "02079K305" "US02079K3059" "Score - Training and Development/Improvements"                  .     .     .     .
              "AMAZON.COM"               "23135106"  "US0231351067" "Score - Shareholder Rights/Policy"                          61.35 63.79 63.79 63.79
              "AMAZON.COM"               "23135106"  "US0231351067" "Score - Shareholder Rights/Implementation"                      .     .     . 70.46
              "ANALOG DEVICES"           "32654105"  "US0326541051" "Score - Emission Reduction/Monitoring"                      99.99     .     . 99.99
              "ANALOG DEVICES"           "32654105"  "US0326541051" "Score - Training and Development/Improvements"              44.98     .     . 44.98
              "ANALOG DEVICES"           "32654105"  "US0326541051" "Score - Shareholder Rights/Majority Shareholders"           53.94 53.94 53.94 53.94
              "GOODYEAR TIRE & RUB."     "382550101" "US3825501014" "Score - Emission Reduction/Greenhouse Gas Emissions"            . 51.38 51.38 51.38
              "GOODYEAR TIRE & RUB."     "382550101" "US3825501014" "Score - Emission Reduction/Monitoring"                      35.51 32.62 32.62 32.62
              "GOODYEAR TIRE & RUB."     "382550101" "US3825501014" "Score - Emission Reduction/NOx and SOx Emissions Reduction" 36.69 33.09 33.09 33.09
              end

              Comment


              • #8
                This works for your more realistic example. I gave up on trying to map the distinct indicators to variable names. It struck me that the text "Score - " was useless. Otherwise there is some trickery mapping to a variable with value labels and then -- after the reshape -- copying the value labels to variable labels, as otherwise the informative text would disappear.

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str24 id str17 cusip str12 isin str89 indicator double(q1_2007 q2_2007 q3_2007 q4_2007)
                "SCRIPPS NETWORKS INTACT." ""          ""             "Score - Shareholder Rights/Policy"                              .     .     .     .
                "SCRIPPS NETWORKS INTACT." ""          ""             "Score - Emission Reduction/Innovative Production"               .     .     .     .
                "ACTIVISION BLIZZARD"      "00507V109" "US00507V1098" "Workforce /Employment Quality"                               73.5  73.5  73.5  73.5
                "ACTIVISION BLIZZARD"      "00507V109" "US00507V1098" "Score - Emission Reduction/Monitoring"                       39.5  39.5  39.5  39.5
                "ADOBE SYSTEMS"            "00724F101" "US00724F1012" "Score - Training and Development/Management Training"       23.79 23.79 23.79 23.79
                "ADOBE SYSTEMS"            "00724F101" "US00724F1012" "Score - Training and Development/Improvements"              26.92 26.92 26.92 26.92
                "AKAMAI TECHS."            "00971T101" "US00971T1016" "Score - Shareholder Rights/Policy"                          61.35 63.79 63.79 63.79
                "AKAMAI TECHS."            "00971T101" "US00971T1016" "Workforce /Employment Quality"                              91.91 82.33 82.33 82.33
                "AKAMAI TECHS."            "00971T101" "US00971T1016" "Score - Shareholder Rights/Improvements"                    45.45 43.26 43.26 43.26
                "AKAMAI TECHS."            "00971T101" "US00971T1016" "Score - Shareholder Rights/Voting Rights"                   68.82 69.82 69.82 69.82
                "AKAMAI TECHS."            "00971T101" "US00971T1016" "Score - Emission Reduction/Monitoring"                          .     .     .     .
                "ALPHABET 'A'"             "02079K305" "US02079K3059" "Score - Shareholder Rights/Ownership"                       61.35 63.79 63.79 63.79
                "ALPHABET 'A'"             "02079K305" "US02079K3059" "Score - Emission Reduction/NOx and SOx Emissions Reduction" 36.69 33.09 33.09 33.09
                "ALPHABET 'A'"             "02079K305" "US02079K3059" "Score - Training and Development/Internal Promotion"        33.98 26.92 26.92 26.92
                "ALPHABET 'A'"             "02079K305" "US02079K3059" "Workforce /Employment Quality"                              94.28 85.54 85.54 85.54
                "ALPHABET 'A'"             "02079K305" "US02079K3059" "Score - Emission Reduction/F-Gases Emissions"                  41  39.5  39.5  39.5
                "ALPHABET 'A'"             "02079K305" "US02079K3059" "Score - Training and Development/Improvements"                  .     .     .     .
                "AMAZON.COM"               "23135106"  "US0231351067" "Score - Shareholder Rights/Policy"                          61.35 63.79 63.79 63.79
                "AMAZON.COM"               "23135106"  "US0231351067" "Score - Shareholder Rights/Implementation"                      .     .     . 70.46
                "ANALOG DEVICES"           "32654105"  "US0326541051" "Score - Emission Reduction/Monitoring"                      99.99     .     . 99.99
                "ANALOG DEVICES"           "32654105"  "US0326541051" "Score - Training and Development/Improvements"              44.98     .     . 44.98
                "ANALOG DEVICES"           "32654105"  "US0326541051" "Score - Shareholder Rights/Majority Shareholders"           53.94 53.94 53.94 53.94
                "GOODYEAR TIRE & RUB."     "382550101" "US3825501014" "Score - Emission Reduction/Greenhouse Gas Emissions"            . 51.38 51.38 51.38
                "GOODYEAR TIRE & RUB."     "382550101" "US3825501014" "Score - Emission Reduction/Monitoring"                      35.51 32.62 32.62 32.62
                "GOODYEAR TIRE & RUB."     "382550101" "US3825501014" "Score - Emission Reduction/NOx and SOx Emissions Reduction" 36.69 33.09 33.09 33.09
                end
                
                rename (q*) (y=)
                reshape long y, i(id indicator) j(date) string
                replace indicator = subinstr(indicator, "Score - ", "", .)
                egen indic = group(indicator), label
                drop indicator 
                su indic, meanonly 
                local J = r(max)
                forval j = 1/`J' { 
                    local label`j' "`: label (indic) `j''" 
                } 
                reshape wide y , i(id date) j(indic) 
                forval j = 1/`J' { 
                    label var y`j' "`label`j''" 
                } 
                gen ndate = yq(real(substr(date, -4, 4)), real(substr(date,2, 1)))
                format ndate %tq
                list, sepby(id)

                Comment


                • #9
                  Thank you, Nick. This is some tricky coding (some that I do not understand) so I appreciate your tenacity to figure it out. This is helping me learn.

                  I ran the code and it works wonderfully up until the second reshape, when I try to go back to wide. The error I get here is a data limit one:

                  CODE:
                  . reshape wide y , i(id date) j(indic)
                  characteristic contents too long
                  The maximum value of the contents is 67,784.
                  r(1004);

                  I have thousands of companies (groups), about 40 quarters (time periods) worth of data, and about 250 indicators (environmental, social, governance measures for the companies). I wonder where I am breaking the limit? I also wonder if I can fix this. When I transform the data to long (the first reshape) in your code then I end up with 5,649,688 cases. I know it's a big dataset but I'm hoping there's a way around this, or at least where I have to cut back my data (number of companies, number of quarters, or number of indicators) if that's the solution.

                  Again, sorry for the persistent problems. I'm feeling very close at this point. Thanks again!

                  Roger


                  Comment


                  • #10
                    You could try mapping your id variable to numeric.

                    Comment


                    • #11
                      Hi Nick,

                      Thanks again. I'm not sure what you mean by mapping. If you mean converting it to a numeric variable then Stata tells me it is already numeric (float). I wonder if you mean something different though?

                      I tried again and the reshape words when I convert the indic variable in your code to a numeric group variable and REMOVE the indic variable. Before removing indic, when I try the second reshape command in your code, Stata tells me there is an error: indic is not constant within id date

                      I firgued the problem was with that new indic variable so I converted it to a numeric group variable (1, 2, 3, etc.) and re-ran the reshape. It worked, but now my variables in the newly reshaped wide date are numbers without meaning: e.g., 1, 2, 3, etc. So I am feeling close but still a little off.

                      Do you know how to rename those new variables based on indic from 1, 2, 3, etc. to the appropriate variable name that matches the original indic value (e.g., Environmental score, Social score, etc.)?

                      I hope we've solved it after this.



                      Comment


                      • #12
                        Sorry, I am lost. The only concrete data example we have from you is #7 used in #8 in which the variable named id is clearly string. So, I don't get past your first paragraph where you seem to say otherwise.

                        A verbal report here without precise data and precise code is, unfortunately, no real help. I can't tell exactly what's going on without a reproducible example. Naturally I realise that your full dataset is too big to post here.

                        Comment


                        • #13
                          Thanks for your help, Nick and Bruce, and sorry for the delay responding -- I got quite tied up teaching. Eventually, I was able to figure out the code you supplied and make it work to transform my data in the requested manner. This was all very helpful. Thank you again!

                          Comment

                          Working...
                          X