Need help on reshaping data for more than one "i"

Cedric Hung

Join Date: Jul 2018
Posts: 7

Need help on reshaping data for more than one "i"

26 Jul 2018, 13:42

Hi all,

I am doing a research on measuring banking competition across EU28 countries. I was trying to reshape the dataset from wide to long but it was not successful. I have tried the following command:

Code:

reshape long roaa tr, i(id) j(year)

Code:

reshape long roaa  tr, i(bankname country) j(year)

Basically, an error "variable id does not uniquely identify the observations " came up. I have tried to look at other posts in the forum but it does not seem to apply to my cases. May anyone gives me some advice? I will be very appreciated.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int id str11 country str159 bankname double(roaa2016 roaa2015 roaa2014 roaa2013 roaa2012 roaa2011) long(tr2016 tr2015 tr2014 tr2013 tr2012 tr2011)
 1 "Austria" "3-Banken Wohnbaubank AG"                               .0009  .0012   .001      .      .     .    271     280     353      .       .       .
 2 "Austria" "A1 Bank AG"                                            .0627 -.0139 -.0684 -.0545  .0027 .0217  11703    8709   10246  10233    8994    9174
 3 "Austria" "ABS Factoring Bank AG"                                 .0054  .0319  .0134  .0154      .     .   3856    6064    6055   7242       .       .
 4 "Austria" "AirPlus Air Travel Card Vertriebsgesellschaft m.b.H."  .0696  .0633  .0611  .0524      .     .   8442    8504    9757  11023       .       .
 5 "Austria" "Allgemeine Sparkasse Ober¬îsterreich Bank AG"          .0017  .0053  .0017  .0037  .0039 .0026 262682  290356  248891 314435  277866  240277
 6 "Austria" "Alpenlaendische Garantie -Gesellschaft M.B.H."             .      .      .      .      . .0001     66       .       .    124      65      89
 7 "Austria" "Austrian Anadi Bank AG"                                .0034  .0012  .0002  .0013    .01 .0013  67522   68994   64074  82783   87550   97515
 8 "Austria" "Autobank AG"                                           .0003  .0001      0 -.0029      0 .0017  10036    9922   11136   8439    8135    9981
 9 "Austria" "Banco do Brasil AG"                                   -.0134 -.0089   .007  .0052      .     .  10126   19625   52707  31918       .       .
10 "Austria" "Bank Austria Creditanstalt AG"                             .   .002  -.016 -.0134 -.0025 -.004      . 3194147 1343306 878307 2023218 1832307
end

I want the dataset to look like the following:

Code:

 id
country
bankname
year
roaa
tr

1
Austria
3-Banken Wohnbaubank AG
2011



1
Austria
3-Banken Wohnbaubank AG
2012



1
Austria
3-Banken Wohnbaubank AG
2013



1
Austria
3-Banken Wohnbaubank AG
2014



1
Austria
3-Banken Wohnbaubank AG
2015



1
Austria
3-Banken Wohnbaubank AG
2016




 AND SO ON

Best regards and many thanks,
Cedric

Last edited by Cedric Hung; 26 Jul 2018, 13:52.

Tags: reshape, reshape long

Jesse Tielens

Join Date: Jul 2018

Posts: 46
#2

26 Jul 2018, 13:51

The commands work fine with the example dataset you gave.

Have you tried to look if there are actual duplicates in your id? You can use:

Code:

duplicates list

to check this.
1 like
Comment
Cedric Hung

Join Date: Jul 2018

Posts: 7
#3

26 Jul 2018, 13:55

Originally posted by Jesse Tielens View Post

The commands work fine with the example dataset you gave.

Have you tried to look if there are actual duplicates in your id? You can use:

Code:

duplicates list

to check this.

Hi Jesse,

Thanks for your reply. I have checked the id variable. All the id assigned to each bank is different.

Regards,
Cedric
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

26 Jul 2018, 14:03

Adding to Jesse's advice, since reshape long reported that id did not uniquely identify observations (in the first example) and presumably (since you do not tell us) bankname and country do not uniquely identify observations (in the second example), you will want to use something like

Code:

duplicates report id

or

Code:

duplicates report bankname country

as a quick check on the extent of your problems with duplicates. Then

Code:

duplicates list id

or

Code:

duplicates list bankname country

will show you complete observations.

It's import to include the variable names on the duplicates commands, since it's possible that you have two observations with the same id but different values for your roaa* and tr* variables.

To improve the presentation of your future posts, and increase the likelihood of receiving a helpful answer, you should review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

The more you help others understand your problem, the more likely others are to be able to help you solve your problem.
1 like
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

26 Jul 2018, 14:08

My post #4 crossed with post #3.

While it may be true that

All the id assigned to each bank is different

that doesn't mean that you don't have the same bank, with the same id, twice in your dataset.

Stata has told you that id is not distinct - that some id's appear on more than one observation in your dataset. I trust Stata, you should too. Try the commands recommended in post #4.
Comment
Cedric Hung

Join Date: Jul 2018

Posts: 7
#6

26 Jul 2018, 14:46

Originally posted by William Lisowski View Post

Adding to Jesse's advice, since reshape long reported that id did not uniquely identify observations (in the first example) and presumably (since you do not tell us) bankname and country do not uniquely identify observations (in the second example), you will want to use something like

Code:

duplicates report id

or

Code:

duplicates report bankname country

as a quick check on the extent of your problems with duplicates. Then

Code:

duplicates list id

or

Code:

duplicates list bankname country

will show you complete observations.

It's import to include the variable names on the duplicates commands, since it's possible that you have two observations with the same id but different values for your roaa* and tr* variables.

To improve the presentation of your future posts, and increase the likelihood of receiving a helpful answer, you should review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

Hi William,

Thanks so much for the specification. I have identified the problem now and I think the reshape function is working now.

I have one more question. Since I have 28 countries in the dataset, should I make 28 dummy variable to identify the countries so that it will be easier to conduct regression in each country separately?

Many thanks,
Cedric

Last edited by Cedric Hung; 26 Jul 2018, 14:49.
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#7

26 Jul 2018, 15:27

Originally posted by Cedric Hung View Post

Since I have 28 countries in the dataset, should I make 28 dummy variable to identify the countries so that it will be easier to conduct regression in each country separately?

No, there's a better way.

You should read the output of help encode to create a categorical variable that will take the values 1-28, and which will have value labels associated with it so that the numeric value will display as the country name wherever value labels are supported. Then you can easily loop over the countries to run the 28 separate regressions.

Code:

forvalues c=1/28 { regress y x if countrycode==`c' }

Then, should you ever want to run a single regression with all 28 countries, you can read the output of help factor variables to learn how to have Stata use the categorical variable to create "virtual" dummy variables (more correctly, indicator variables) for the individual countries.

Code:

regress y x i.countrycode
Comment

Announcement

Need help on reshaping data for more than one "i"

Comment

Comment

Comment

Comment

Comment

Comment