Reshape long format

Cali Su

Join Date: Mar 2022
Posts: 5

Reshape long format

10 Mar 2022, 15:05

Hi everyone,

The whole file is attached below
How can I calculate the average value for DM Age colume of each company, the proportion of females and reshape wide to long format?
I tried to attach id for each company by using the command: egen company_id = group (BvD ID number) but the order of companies then changed -> is it ok?
I use the command bysort companyname: egen average_age = mean(Age) after encode the variable DMAge but it doesn't work.

Thank you

	Company name	BvD ID number	Operating revenue (Turnover) th EUR 2015	Operating revenue (Turnover) th EUR 2016	Operating revenue (Turnover) th EUR 2017	Number of directors & managers	DM Age	DM Gender
1.	NESTLÉ S.A.	CHCHE105909036	82,665,557	84,259,512	77,466,693	163	77	M
	NESTLÉ S.A.	CHCHE105909036	82,665,557	84,259,512	77,466,693	163	67	M
	NESTLÉ S.A.	CHCHE105909036	82,665,557	84,259,512	77,466,693	163	67	M
	NESTLÉ S.A.	CHCHE105909036	82,665,557	84,259,512	77,466,693	163	67	M
	NESTLÉ S.A.	CHCHE105909036	82,665,557	84,259,512	77,466,693	163	67	M
	NESTLÉ S.A.	CHCHE105909036	82,665,557	84,259,512	77,466,693	163	68	M
	NESTLÉ S.A.	CHCHE105909036	82,665,557	84,259,512	77,466,693	163	68	M
	NESTLÉ S.A.	CHCHE105909036	82,665,557	84,259,512	77,466,693	163	68	M
	NESTLÉ S.A.	CHCHE105909036	82,665,557	84,259,512	77,466,693	163	68	M
	NESTLÉ S.A.	CHCHE105909036	82,665,557	84,259,512	77,466,693	163	67	M
	ROCHE HOLDING AG	CHCHE101602521	46,665,159	49,148,191	47,659,316	80	63	M
	ROCHE HOLDING AG	CHCHE101602521	46,665,159	49,148,191	47,659,316	80	63	M
	ROCHE HOLDING AG	CHCHE101602521	46,665,159	49,148,191	47,659,316	80	63	M
	ROCHE HOLDING AG	CHCHE101602521	46,665,159	49,148,191	47,659,316	80	63	M
	ROCHE HOLDING AG	CHCHE101602521	46,665,159	49,148,191	47,659,316	80	60	F
	ROCHE HOLDING AG	CHCHE101602521	46,665,159	49,148,191	47,659,316	80	63	M
	ROCHE HOLDING AG	CHCHE101602521	46,665,159	49,148,191	47,659,316	80		M
	ROCHE HOLDING AG	CHCHE101602521	46,665,159	49,148,191	47,659,316	80	61	M
3.	NOVARTIS AG	CHCHE103867266	48,163,868	48,726,896	43,445,359	140	66	M
	NOVARTIS AG	CHCHE103867266	48,163,868	48,726,896	43,445,359	140	66	M
	NOVARTIS AG	CHCHE103867266	48,163,868	48,726,896	43,445,359	140	70	M
	NOVARTIS AG	CHCHE103867266	48,163,868	48,726,896	43,445,359	140	64	F
	NOVARTIS AG	CHCHE103867266	48,163,868	48,726,896	43,445,359	140	71	F
4.	SOCAR TRADING SA	CHCHE113990112	n.a.	n.a.	n.a.	13	56	M
	SOCAR TRADING SA	CHCHE113990112	n.a.	n.a.	n.a.	13	44	M
	SOCAR TRADING SA	CHCHE113990112	n.a.	n.a.	n.a.	13	44	F
	SOCAR TRADING SA	CHCHE113990112	n.a.	n.a.	n.a.	13	62	M
	SOCAR TRADING SA	CHCHE113990112	n.a.	n.a.	n.a.	13	62	M
5.	COOP-GRUPPE GENOSSENSCHAFT	CHCHE109029938	24,934,747	26,445,305	24,970,144	71	66	M
	COOP-GRUPPE GENOSSENSCHAFT	CHCHE109029938	24,934,747	26,445,305	24,970,144	71	57	M
	COOP-GRUPPE GENOSSENSCHAFT	CHCHE109029938	24,934,747	26,445,305	24,970,144	71	57	M
	COOP-GRUPPE GENOSSENSCHAFT	CHCHE109029938	24,934,747	26,445,305	24,970,144	71	57	M

Attached Files

reda.xlsx (996.6 KB, 1 view)

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#2

10 Mar 2022, 15:45

I use the command bysort companyname: egen average_age = mean(Age) after encode the variable DMAge but it doesn't work.

-encoding- this variable is definitely the wrong thing to do. Either it will give you an error message (if DMAge is already numeric within Stata) or it will give you incorrect results (if it was for some reason a string variable in Stata). For future reference, never use -encode- on a variable that, to human eyes, looks like a number. Such a variable, if Stata has it as a string variable, should be converted to numeric using -destring-, not -encode-. The purpose of -encode- is entirely different; I won't elaborate on that here.

However, assuming that age variable is really numeric, -bysort companyname: egen average_age = mean(DMAge)- will give you the mean age. If it is actually a string variable, then run -destring DMAge, replace- first.

To get the proportion of females: -bysort companyname: egen proportion_female = mean(DM_Gender == "F")-

If you need more specific advice you need to post back. Do not attach an Excel file. Many of us won't download any attachments from people we don't know, especially not Microsoft Office documents, which can contain active malware. Do not, in fact, attach anything. First import your data into Stata. Then use the -dataex- command to show example data. It is the Stata data that is critical to providing you with code and advice that will actually work. How the data looks in Excel does not provide important information, such as whether a variable like DMAge is a string or numeric.

If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
Comment
Cali Su

Join Date: Mar 2022

Posts: 5
#3

11 Mar 2022, 05:14

Hello Clyde,

Thank you for your help. My stata version is 13 and I run -help dataex- to find instructions, but can't take an example data.
I ran this command -egen firm_id = group (BvD ID number)- but then the firm_ id started with 1866 instead of 1 -> Is it ok ?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#4

11 Mar 2022, 11:03

-dataex- is not part of Stata version 13. But you can get it by running -ssc install dataex-. Then, open your Stata data file (or import your Excel file into Stata) and then run -dataex-. Copy the output you get from the Results window into the Forum editor. That will give the first 100 observations in your data set as an example.

I don't know what to make of what you are reporting about using -egen firm_id = group(BVD ID number)-. That cannot be the exact code you ran, unless your data set has three separate variables named BVD, ID, and number. You seem to have a single variable corresponding, I think, to column B of your spreadsheet. When imported into Stata its name would be run together as BvDIDnumber.

Also, when responding, please don't say that something "didn't work." Explain exactly what went wrong as clearly as you can. Better still, show the actual exact results you got and explain how that differs from what you wanted to get.
Comment
Cali Su

Join Date: Mar 2022

Posts: 5
#5

12 Mar 2022, 01:34

Below is the example of my data. Could you show me to attach id to companies and convert wide to long format (my data has some missing values for colume Turnover 2015,2016,2017) ? The firm_id starts with 452 instead of 1.

input str5 A str97 Companyname str14 BvDIDnumber str17(Turnover2015 Turnover2016 Turnover2017) int Numberofdirectorsmanagers str3 DMAge str1 DMGender float firm_id
"1." "NESTLÉ S.A." "CHCHE105909036" "82665557.48352841" "84259511.5793985" "77466693.2520148" 163 "77" "M" 452
"" "NESTLÉ S.A." "CHCHE105909036" "82665557.48352841" "84259511.5793985" "77466693.2520148" 163 "67" "M" 452
"" "NESTLÉ S.A." "CHCHE105909036" "82665557.48352841" "84259511.5793985" "77466693.2520148" 163 "67" "M" 452
"" "NESTLÉ S.A." "CHCHE105909036" "82665557.48352841" "84259511.5793985" "77466693.2520148" 163 "67" "M" 452
"" "NESTLÉ S.A." "CHCHE105909036" "82665557.48352841" "84259511.5793985" "77466693.2520148" 163 "67" "M" 452
"" "NESTLÉ S.A." "CHCHE105909036" "82665557.48352841" "84259511.5793985" "77466693.2520148" 163 "68" "M" 452
"" "NESTLÉ S.A." "CHCHE105909036" "82665557.48352841" "84259511.5793985" "77466693.2520148" 163 "68" "M" 452
"" "NESTLÉ S.A." "CHCHE105909036" "82665557.48352841" "84259511.5793985" "77466693.2520148" 163 "68" "M" 452
"" "NESTLÉ S.A." "CHCHE105909036" "82665557.48352841" "84259511.5793985" "77466693.2520148" 163 "68" "M" 452
"" "NESTLÉ S.A." "CHCHE105909036" "82665557.48352841" "84259511.5793985" "77466693.2520148" 163 "67" "M" 452
"2." "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "61" "M" 134
"" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "61" "M" 134
"" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "63" "M" 134
"" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "63" "M" 134
"" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "63" "M" 134
"" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "60" "F" 134
"" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "60" "F" 134
"" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "67" "M" 134
"" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "67" "M" 134
"" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "65" "M" 134
"" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "65" "M" 134
"" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "64" "M" 134
"" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "64" "M" 134
"" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "38" "M" 134
"" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "55" "F" 134
"" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "55" "F" 134
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#6

12 Mar 2022, 07:40

Your data show two values of BvDIDnumber with two values of firm_id

Code:

CHCHE105909036 452 CHCHE101602521 134

You think that the group numbering should start at 1 with the first observation, but in fact Stata first sorts your data by BvDIDnumber and then assigns group numbers. The fact that firm_id 134 has a lower BvDIDnumber than firm_id 452, even though it appears later in your data, supports this explanation.

The reason Stata sorts the data is to get every observation with the same BvDIDnumber together; otherwise, for each observation it would potentially have to look at all preceeding observations to see if that BvDIDnumber had appeared earlier.
Comment
Cali Su

Join Date: Mar 2022

Posts: 5
#7

12 Mar 2022, 11:35

You think that the group numbering should start at 1 with the first observation, but in fact Stata first sorts your data by BvDIDnumber and then assigns group numbers. The fact that firm_id 134 has a lower BvDIDnumber than firm_id 452, even though it appears later in your data, supports this explanation.

The reason Stata sorts the data is to get every observation with the same BvDIDnumber together; otherwise, for each observation it would potentially have to look at all preceeding observations to see if that BvDIDnumber had appeared earlier.

It doesn'taffect my results when I convert wide to long data and linear regression,right ? thank you
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#8

12 Mar 2022, 13:29

It does not affect your results.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#9

12 Mar 2022, 15:16

I believe what you want is:

Code:

egen full_id = concat(firm_id BvDIDnumber), punct(" ") drop A gen long obs_no = _n reshape long Turnover, i(obs_no) j(year)

Added: I notice that most of your observations in the example data are exact duplicates of each other. Perhaps in the real data they disagree on other variables not shown? If not, what is the rationale for having so many replicates of the same information?
1 like
Comment

Announcement

Reshape long format

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment