Problem Reshaping Panel Data -- Wide / Long

Roger Clements

Join Date: Jun 2017
Posts: 40

Problem Reshaping Panel Data -- Wide / Long

16 Jun 2017, 06:32

Hello everyone,

I am a new user to Statalist and have tried to adhere to the forum guidelines. This has been a tremendously helpful forum for me over the years and so I thank you for all of your help.

I have a problem reshaping my panel data. My data looks like this:

ID	Indicator	q1_2006	q2_2006	q3_2006
1	Board meetings	4	6	7
1	Environmental score	8	9	9
1	Social score	1	1	4
...	...	...	...	...
8	Board meetings	7	1	8
8	Environmental score	8	5	6
8	Social score	4	3	4

I would like to have the ID still in the same format but the Indicators as columns and current Columns (which are quarterly dates) where the Indicator variables are. Something like:

ID	Date	Board meetings	Environmental score	Social score
1	q1_2006
1	q2_2006
1	q3_2006
...	...	...	...	...
8	q1_2006
8	q2_2006
8	q3_2006

The closest I got was using this command: reshape long q, i(name indicator) j(date) string

But that is still wrong as then it puts the dates AND the indicators as two single columns (instead of putting the indicators across the top as separate columns).

Any help would be great appreciated!

Thank you kindly.

Roger

Tags: None

Hanna Gyllensten

Join Date: Sep 2016

Posts: 19
#2

16 Jun 2017, 06:54

I don't really understand your "i(name indicator)", what is name? However, would it not be possible to do a second reshape to wide after having created a variable indicating rows within each q (by q: egen row=seq())? Probably you will have to change the names of the new collumns also, to get rid of the spaces in each name.

Kind regards,
Hanna
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35709

16 Jun 2017, 06:58

This may help. I re-engineered your data example.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str19 indicator byte(q1_2006 q2_2006 q3_2006)
1 "Board meetings"      4 6 7
1 "Environmental score" 8 9 9
1 "Social score"        1 1 4
8 "Board meetings"      7 1 8
8 "Environmental score" 8 5 6
8 "Social score"        4 3 4
end
rename (q*) (y=)
reshape long y, i(id indicator) j(date) string
replace indicator = subinstr(indicator, " ", "_", .)
reshape wide y , i(id date) j(indicator) string
rename (y*) (*)
gen ndate = yq(real(substr(date, -4, 4)), real(substr(date,2, 1)))
format ndate %tq
list, sepby(id)

Here's the output:

Code:

. * Example generated by -dataex-. To install: ssc install dataex
. clear

. input byte id str19 indicator byte(q1_2006 q2_2006 q3_2006)

           id            indicator   q1_2006   q2_2006   q3_2006
  1. 1 "Board meetings"      4 6 7
  2. 1 "Environmental score" 8 9 9
  3. 1 "Social score"        1 1 4
  4. 8 "Board meetings"      7 1 8
  5. 8 "Environmental score" 8 5 6
  6. 8 "Social score"        4 3 4
  7. end

. rename (q*) (y=)

. reshape long y, i(id indicator) j(date) string
(note: j = q1_2006 q2_2006 q3_2006)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        6   ->      18
Number of variables                   5   ->       4
j variable (3 values)                     ->   date
xij variables:
             yq1_2006 yq2_2006 yq3_2006   ->   y
-----------------------------------------------------------------------------

. replace indicator = subinstr(indicator, " ", "_", .)
(18 real changes made)

. reshape wide y , i(id date) j(indicator) string
(note: j = Board_meetings Environmental_score Social_score)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                       18   ->       6
Number of variables                   4   ->       5
j variable (3 values)         indicator   ->   (dropped)
xij variables:
                                      y   ->   yBoard_meetings yEnvironmental_s
> core ySocial_score
-----------------------------------------------------------------------------

. rename (y*) (*)

. gen ndate = yq(real(substr(date, -4, 4)), real(substr(date,2, 1)))

. format ndate %tq

. list, sepby(id)

     +--------------------------------------------------------+
     | id      date   Board_~s   Enviro~e   Social~e    ndate |
     |--------------------------------------------------------|
  1. |  1   q1_2006          4          8          1   2006q1 |
  2. |  1   q2_2006          6          9          1   2006q2 |
  3. |  1   q3_2006          7          9          4   2006q3 |
     |--------------------------------------------------------|
  4. |  8   q1_2006          7          8          4   2006q1 |
  5. |  8   q2_2006          1          5          3   2006q2 |
  6. |  8   q3_2006          8          6          4   2006q3 |
     +--------------------------------------------------------+

Comment

Bruce Weaver

Join Date: May 2014
Posts: 1133

16 Jun 2017, 07:40

Hi Roger. I've inserted a few more -list- commands in Nick's code to help you (and me!) see more clearly what's going on at each step.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str19 indicator byte(q1_2006 q2_2006 q3_2006)
1 "Board meetings"      4 6 7
1 "Environmental score" 8 9 9
1 "Social score"        1 1 4
8 "Board meetings"      7 1 8
8 "Environmental score" 8 5 6
8 "Social score"        4 3 4
end
list, sepby(id) // Original data
rename (q*) (y=)
list, sepby(id) // After -rename- command
reshape long y, i(id indicator) j(date) string
replace indicator = subinstr(indicator, " ", "_", .)
list, sepby(id) // After first -reshape- command
reshape wide y , i(id date) j(indicator) string
list, sepby(id) // After second -reshape- command
rename (y*) (*)
gen ndate = yq(real(substr(date, -4, 4)), real(substr(date,2, 1)))
format ndate %tq
list, sepby(id)

Cheers,
Bruce

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)

Comment

Roger Clements

Join Date: Jun 2017

Posts: 40
#5

19 Jun 2017, 06:54

Thank you very much, all three of you. Like I said, I am a new user and I am extremely impressed with the speed and quality of responses this forum receives. It is tremendously helpful.

I tried your code, Nick and Bruce, and I am running into errors. Namely, the first error I receive when running the second replace line of code is:

variable indicator not constant within id date

I think it may be because my actual data (quite long and lengthy, so not posted here) has a different number of indicators for each ID (company in this case). For example, there may be another ID, say #9, that only has two rows of data: 1 for Board meetings and 1 for Environmental score, but is missing Social score.

Is that what the "constant" part of this error message means?

If it's easier for me to post some of my actual data then please tell me to do so. It's the same structure as what we are discussing here (just a lot of it in terms of ID's, indicators, and dates/quarters).

Thanks again very much!

Roger
Comment
Bruce Weaver

Join Date: May 2014

Posts: 1133
#6

19 Jun 2017, 16:27

Hello Roger. I think it would be helpful to have some of your actual data. If you include only the needed variables and not too many cases, it should not cause any great problems in the forum. Please use dataex, as Nick Cox did in post #3 of this thread. Type help dataex for more info and examples.

Bruce

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)
Comment

Roger Clements

Join Date: Jun 2017
Posts: 40

21 Jun 2017, 01:44

Hi Bruce,

Thanks again for the quick reply. I've downloaded dataex and copied some of my data here. A few notes that may help ease intepretation of the problem(s):
* I have several indicators for each id (which are companies in the this case). Indicators are stable and include: cusip and isin (other codes to identify firms)
* I have unbalanced data, where I have missing indicators for some companies. So I may have 3 indicators for 1 company and 10 for a different company. I'm not sure if this affects the reshaping.
* The only values that should vary over time (I've included 4 quarters of data) are the indicators

I hope this helps. Thank you again!

Mark

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str24 id str17 cusip str12 isin str89 indicator double(q1_2007 q2_2007 q3_2007 q4_2007)
"SCRIPPS NETWORKS INTACT." ""          ""             "Score - Shareholder Rights/Policy"                              .     .     .     .
"SCRIPPS NETWORKS INTACT." ""          ""             "Score - Emission Reduction/Innovative Production"               .     .     .     .
"ACTIVISION BLIZZARD"      "00507V109" "US00507V1098" "Workforce /Employment Quality"                               73.5  73.5  73.5  73.5
"ACTIVISION BLIZZARD"      "00507V109" "US00507V1098" "Score - Emission Reduction/Monitoring"                       39.5  39.5  39.5  39.5
"ADOBE SYSTEMS"            "00724F101" "US00724F1012" "Score - Training and Development/Management Training"       23.79 23.79 23.79 23.79
"ADOBE SYSTEMS"            "00724F101" "US00724F1012" "Score - Training and Development/Improvements"              26.92 26.92 26.92 26.92
"AKAMAI TECHS."            "00971T101" "US00971T1016" "Score - Shareholder Rights/Policy"                          61.35 63.79 63.79 63.79
"AKAMAI TECHS."            "00971T101" "US00971T1016" "Workforce /Employment Quality"                              91.91 82.33 82.33 82.33
"AKAMAI TECHS."            "00971T101" "US00971T1016" "Score - Shareholder Rights/Improvements"                    45.45 43.26 43.26 43.26
"AKAMAI TECHS."            "00971T101" "US00971T1016" "Score - Shareholder Rights/Voting Rights"                   68.82 69.82 69.82 69.82
"AKAMAI TECHS."            "00971T101" "US00971T1016" "Score - Emission Reduction/Monitoring"                          .     .     .     .
"ALPHABET 'A'"             "02079K305" "US02079K3059" "Score - Shareholder Rights/Ownership"                       61.35 63.79 63.79 63.79
"ALPHABET 'A'"             "02079K305" "US02079K3059" "Score - Emission Reduction/NOx and SOx Emissions Reduction" 36.69 33.09 33.09 33.09
"ALPHABET 'A'"             "02079K305" "US02079K3059" "Score - Training and Development/Internal Promotion"        33.98 26.92 26.92 26.92
"ALPHABET 'A'"             "02079K305" "US02079K3059" "Workforce /Employment Quality"                              94.28 85.54 85.54 85.54
"ALPHABET 'A'"             "02079K305" "US02079K3059" "Score - Emission Reduction/F-Gases Emissions"                  41  39.5  39.5  39.5
"ALPHABET 'A'"             "02079K305" "US02079K3059" "Score - Training and Development/Improvements"                  .     .     .     .
"AMAZON.COM"               "23135106"  "US0231351067" "Score - Shareholder Rights/Policy"                          61.35 63.79 63.79 63.79
"AMAZON.COM"               "23135106"  "US0231351067" "Score - Shareholder Rights/Implementation"                      .     .     . 70.46
"ANALOG DEVICES"           "32654105"  "US0326541051" "Score - Emission Reduction/Monitoring"                      99.99     .     . 99.99
"ANALOG DEVICES"           "32654105"  "US0326541051" "Score - Training and Development/Improvements"              44.98     .     . 44.98
"ANALOG DEVICES"           "32654105"  "US0326541051" "Score - Shareholder Rights/Majority Shareholders"           53.94 53.94 53.94 53.94
"GOODYEAR TIRE & RUB."     "382550101" "US3825501014" "Score - Emission Reduction/Greenhouse Gas Emissions"            . 51.38 51.38 51.38
"GOODYEAR TIRE & RUB."     "382550101" "US3825501014" "Score - Emission Reduction/Monitoring"                      35.51 32.62 32.62 32.62
"GOODYEAR TIRE & RUB."     "382550101" "US3825501014" "Score - Emission Reduction/NOx and SOx Emissions Reduction" 36.69 33.09 33.09 33.09
end

Comment

Nick Cox

Join Date: Mar 2014
Posts: 35709

21 Jun 2017, 04:27

This works for your more realistic example. I gave up on trying to map the distinct indicators to variable names. It struck me that the text "Score - " was useless. Otherwise there is some trickery mapping to a variable with value labels and then -- after the reshape -- copying the value labels to variable labels, as otherwise the informative text would disappear.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str24 id str17 cusip str12 isin str89 indicator double(q1_2007 q2_2007 q3_2007 q4_2007)
"SCRIPPS NETWORKS INTACT." ""          ""             "Score - Shareholder Rights/Policy"                              .     .     .     .
"SCRIPPS NETWORKS INTACT." ""          ""             "Score - Emission Reduction/Innovative Production"               .     .     .     .
"ACTIVISION BLIZZARD"      "00507V109" "US00507V1098" "Workforce /Employment Quality"                               73.5  73.5  73.5  73.5
"ACTIVISION BLIZZARD"      "00507V109" "US00507V1098" "Score - Emission Reduction/Monitoring"                       39.5  39.5  39.5  39.5
"ADOBE SYSTEMS"            "00724F101" "US00724F1012" "Score - Training and Development/Management Training"       23.79 23.79 23.79 23.79
"ADOBE SYSTEMS"            "00724F101" "US00724F1012" "Score - Training and Development/Improvements"              26.92 26.92 26.92 26.92
"AKAMAI TECHS."            "00971T101" "US00971T1016" "Score - Shareholder Rights/Policy"                          61.35 63.79 63.79 63.79
"AKAMAI TECHS."            "00971T101" "US00971T1016" "Workforce /Employment Quality"                              91.91 82.33 82.33 82.33
"AKAMAI TECHS."            "00971T101" "US00971T1016" "Score - Shareholder Rights/Improvements"                    45.45 43.26 43.26 43.26
"AKAMAI TECHS."            "00971T101" "US00971T1016" "Score - Shareholder Rights/Voting Rights"                   68.82 69.82 69.82 69.82
"AKAMAI TECHS."            "00971T101" "US00971T1016" "Score - Emission Reduction/Monitoring"                          .     .     .     .
"ALPHABET 'A'"             "02079K305" "US02079K3059" "Score - Shareholder Rights/Ownership"                       61.35 63.79 63.79 63.79
"ALPHABET 'A'"             "02079K305" "US02079K3059" "Score - Emission Reduction/NOx and SOx Emissions Reduction" 36.69 33.09 33.09 33.09
"ALPHABET 'A'"             "02079K305" "US02079K3059" "Score - Training and Development/Internal Promotion"        33.98 26.92 26.92 26.92
"ALPHABET 'A'"             "02079K305" "US02079K3059" "Workforce /Employment Quality"                              94.28 85.54 85.54 85.54
"ALPHABET 'A'"             "02079K305" "US02079K3059" "Score - Emission Reduction/F-Gases Emissions"                  41  39.5  39.5  39.5
"ALPHABET 'A'"             "02079K305" "US02079K3059" "Score - Training and Development/Improvements"                  .     .     .     .
"AMAZON.COM"               "23135106"  "US0231351067" "Score - Shareholder Rights/Policy"                          61.35 63.79 63.79 63.79
"AMAZON.COM"               "23135106"  "US0231351067" "Score - Shareholder Rights/Implementation"                      .     .     . 70.46
"ANALOG DEVICES"           "32654105"  "US0326541051" "Score - Emission Reduction/Monitoring"                      99.99     .     . 99.99
"ANALOG DEVICES"           "32654105"  "US0326541051" "Score - Training and Development/Improvements"              44.98     .     . 44.98
"ANALOG DEVICES"           "32654105"  "US0326541051" "Score - Shareholder Rights/Majority Shareholders"           53.94 53.94 53.94 53.94
"GOODYEAR TIRE & RUB."     "382550101" "US3825501014" "Score - Emission Reduction/Greenhouse Gas Emissions"            . 51.38 51.38 51.38
"GOODYEAR TIRE & RUB."     "382550101" "US3825501014" "Score - Emission Reduction/Monitoring"                      35.51 32.62 32.62 32.62
"GOODYEAR TIRE & RUB."     "382550101" "US3825501014" "Score - Emission Reduction/NOx and SOx Emissions Reduction" 36.69 33.09 33.09 33.09
end

rename (q*) (y=)
reshape long y, i(id indicator) j(date) string
replace indicator = subinstr(indicator, "Score - ", "", .)
egen indic = group(indicator), label
drop indicator 
su indic, meanonly 
local J = r(max)
forval j = 1/`J' { 
    local label`j' "`: label (indic) `j''" 
} 
reshape wide y , i(id date) j(indic) 
forval j = 1/`J' { 
    label var y`j' "`label`j''" 
} 
gen ndate = yq(real(substr(date, -4, 4)), real(substr(date,2, 1)))
format ndate %tq
list, sepby(id)

Comment

Roger Clements

Join Date: Jun 2017

Posts: 40
#9

21 Jun 2017, 06:42

Thank you, Nick. This is some tricky coding (some that I do not understand) so I appreciate your tenacity to figure it out. This is helping me learn.

I ran the code and it works wonderfully up until the second reshape, when I try to go back to wide. The error I get here is a data limit one:

CODE:
. reshape wide y , i(id date) j(indic)
characteristic contents too long
The maximum value of the contents is 67,784.
r(1004);

I have thousands of companies (groups), about 40 quarters (time periods) worth of data, and about 250 indicators (environmental, social, governance measures for the companies). I wonder where I am breaking the limit? I also wonder if I can fix this. When I transform the data to long (the first reshape) in your code then I end up with 5,649,688 cases. I know it's a big dataset but I'm hoping there's a way around this, or at least where I have to cut back my data (number of companies, number of quarters, or number of indicators) if that's the solution.

Again, sorry for the persistent problems. I'm feeling very close at this point. Thanks again!

Roger
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35709
#10

21 Jun 2017, 06:49

You could try mapping your id variable to numeric.
Comment
Roger Clements

Join Date: Jun 2017

Posts: 40
#11

22 Jun 2017, 08:05

Hi Nick,

Thanks again. I'm not sure what you mean by mapping. If you mean converting it to a numeric variable then Stata tells me it is already numeric (float). I wonder if you mean something different though?

I tried again and the reshape words when I convert the indic variable in your code to a numeric group variable and REMOVE the indic variable. Before removing indic, when I try the second reshape command in your code, Stata tells me there is an error: indic is not constant within id date

I firgued the problem was with that new indic variable so I converted it to a numeric group variable (1, 2, 3, etc.) and re-ran the reshape. It worked, but now my variables in the newly reshaped wide date are numbers without meaning: e.g., 1, 2, 3, etc. So I am feeling close but still a little off.

Do you know how to rename those new variables based on indic from 1, 2, 3, etc. to the appropriate variable name that matches the original indic value (e.g., Environmental score, Social score, etc.)?

I hope we've solved it after this.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35709
#12

22 Jun 2017, 11:34

Sorry, I am lost. The only concrete data example we have from you is #7 used in #8 in which the variable named id is clearly string. So, I don't get past your first paragraph where you seem to say otherwise.

A verbal report here without precise data and precise code is, unfortunately, no real help. I can't tell exactly what's going on without a reproducible example. Naturally I realise that your full dataset is too big to post here.
Comment
Roger Clements

Join Date: Jun 2017

Posts: 40
#13

13 Jul 2017, 01:00

Thanks for your help, Nick and Bruce, and sorry for the delay responding -- I got quite tied up teaching. Eventually, I was able to figure out the code you supplied and make it work to transform my data in the requested manner. This was all very helpful. Thank you again!
Comment

Announcement

Problem Reshaping Panel Data -- Wide / Long

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment