Reshape from wide to long in multiple steps

Julian Scholz

Join Date: May 2019
Posts: 22

Reshape from wide to long in multiple steps

20 Sep 2019, 09:11

Dear Listers,

I have some trouble reshaping a set of country specific variables that I need to merge with my master panel dataset which is in long format. The orginial using set contains country specific indices about bank regulation and supervision in 4 survey periods (1-4) and is (at least in my view) in wide format (compiled and provided by Barth et al, 2013). Here is where I started:

Code:

clear
input str8 Number str40 Name byte Survey str17(cntry_Albania cntry_Algeria cntry_Angola cntry_Anguilla cntry_AntiguaandBarbuda cntry_Argentina cntry_Armenia)
"I.I"    "secur_act"               1 "1"    "n.a." "n.a." "1"    "1"    "2" "3"
"I.I"    "secur_act"               2 "1"    "2"    "n.a." "1"    "1"    "2" "2"
"I.I"    "secur_act"               3 "1"    "1"    "n.a." "1"    "1"    "1" "1"
"I.I"    "secur_act"               4 "n.a." "n.a." "2"    "n.a." "n.a." "1" "1"
"I.II"   "insur_act"               1 "4"    "n.a." "n.a." "3"    "3"    "2" "3"
"I.II"   "insur_act"               2 "4"    "2"    "n.a." "2"    "2"    "3" "4"
"I.II"   "insur_act"               3 "4"    "4"    "4"    "4"    "4"    "3" "4"
"I.II"   "insur_act"               4 "n.a." "n.a." "2"    "n.a." "n.a." "3" "3"
"I.III"  "real_act"                1 "3"    "1"    "n.a." "4"    "4"    "2" "3"
"I.III"  "real_act"                2 "1"    "1"    "n.a." "2"    "2"    "3" "3"
"I.III"  "real_act"                3 "n.a." "1"    "4"    "4"    "4"    "4" "3"
"I.III"  "real_act"                4 "n.a." "n.a." "2"    "n.a." "n.a." "3" "4"
"I.IV"   "act_restrict[1]"         1 "8"    "n.a." "n.a." "7"    "7"    "6" "9"
"I.IV"   "act_restrict[1]"         2 "6"    "5"    "n.a." "5"    "5"    "8" "9"
"I.IV"   "act_restrict[1]"         3 "7.5"  "6"    "12"   "9"    "9"    "8" "8"
"I.IV"   "act_restrict[1]"         4 "n.a." "n.a." "6"    "n.a." "n.a." "7" "8"
"II.I"   "own_firm"                1 "3"    "n.a." "n.a." "3"    "3"    "1" "2"
"II.I"   "own_firm"                2 "3"    "3"    "n.a." "2"    "2"    "3" "2"
"II.I"   "own_firm"                3 "3"    "4"    "1"    "2"    "2"    "3" "1"
"II.I"   "own_firm"                4 "n.a." "n.a." "3"    "n.a." "n.a." "3" "4"
"II.II"  "firm_own_bank"           1 "2"    "n.a." "n.a." "3"    "3"    "1" "2"
"II.II"  "firm_own_bank"           2 "3"    "1"    "n.a." "2"    "2"    "2" "2"
"II.II"  "firm_own_bank"           3 "n.a." "4"    "n.a." "3"    "3"    "2" "2"
"II.II"  "firm_own_bank"           4 "n.a." "n.a." "n.a." "n.a." "n.a." "1" "2"
"II.III" "Nonbankfin_own_bank (*)" 1 "n.a." "n.a." "n.a." "1"    "1"    "1" "2"
"II.III" "Nonbankfin_own_bank (*)" 2 "1"    "2"    "n.a." "2"    "2"    "2" "2"
"II.III" "Nonbankfin_own_bank (*)" 3 "n.a." "4"    "n.a." "2"    "2"    "2" "2"
"II.III" "Nonbankfin_own_bank (*)" 4 "n.a." "n.a." "n.a." "n.a." "n.a." "1" "2"
"II.IV"  "Overall_restrict (*)"    1 "7.5"  "n.a." "n.a." "7"    "7"    "3" "6"
"II.IV"  "Overall_restrict (*)"    2 "7"    "6"    "n.a." "6"    "6"    "7" "6"
end

The desired format is long, where the respective indices (variable "Name") are transformed to variables that carry information of the index values. I startet with the following approach:

Code:

reshape long cntry_, i(Number Name Survey) j(country) string
sort country Name Survey
rename cntry_ index

The actual structure is ok in respect of the countries, but not in respect of the relevant indices. Because I need to merge the respective indices which are listed as observations in the variable "Name" with the index values as seperate variables, I thought of reshaping the variable "Name" to wide format:

Code:

reshape wide Name, i(Number Survey country) j(index) string

Anyhow, I get an error message saying

Name 7.5 invalid variable name

. Right now, I can't find out what I am doing wrong or if the approach I am following is generally advisable for my desire. I would therefore appreciate your help.

Best regards,
Julian

Last edited by Julian Scholz; 20 Sep 2019, 09:28.

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

20 Sep 2019, 12:28

I think what you want is:

Code:

reshape wide index, i(Number Survey country) j(Name) string

But this will not quite work either. The problem here is that some of the values of Name contain characters that are not permissible in variable names. And, although this problem does not arise in the example you show, it might be that when combined with index_, the resulting variable name would exceed the limit of 32 characters. So I think you first need to

Code:

replace Name = strtoname(Name)

And if there is a length problem after that:

Code:

rename index _

And if there is still a length problem after that:

Code:

replace Name = substr(Name, 1, 31)

Note: You should check that neither of the -replace Name =- commands proposed above results in two previously distinct values of Name being reduced to the same result, thereby fusing what should be distinct categories. If that does happen, then you need to go back to the original Name values and do some other changes so as to assure they remain distinct after transformations.
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35720

20 Sep 2019, 12:40

Barth et al. 2013. Please note https://www.statalist.org/forums/help#references

Does this help? I am pretty clear that you have Name and index the wrong way round and that the distinct values of Name won't work well in new variable names. Hence I make them value labels before the reshape and variable labels after.

Code:

clear
input str8 Number str40 Name byte Survey str17(cntry_Albania cntry_Algeria cntry_Angola cntry_Anguilla cntry_AntiguaandBarbuda cntry_Argentina cntry_Armenia)
"I.I"    "secur_act"               1 "1"    "n.a." "n.a." "1"    "1"    "2" "3"
"I.I"    "secur_act"               2 "1"    "2"    "n.a." "1"    "1"    "2" "2"
"I.I"    "secur_act"               3 "1"    "1"    "n.a." "1"    "1"    "1" "1"
"I.I"    "secur_act"               4 "n.a." "n.a." "2"    "n.a." "n.a." "1" "1"
"I.II"   "insur_act"               1 "4"    "n.a." "n.a." "3"    "3"    "2" "3"
"I.II"   "insur_act"               2 "4"    "2"    "n.a." "2"    "2"    "3" "4"
"I.II"   "insur_act"               3 "4"    "4"    "4"    "4"    "4"    "3" "4"
"I.II"   "insur_act"               4 "n.a." "n.a." "2"    "n.a." "n.a." "3" "3"
"I.III"  "real_act"                1 "3"    "1"    "n.a." "4"    "4"    "2" "3"
"I.III"  "real_act"                2 "1"    "1"    "n.a." "2"    "2"    "3" "3"
"I.III"  "real_act"                3 "n.a." "1"    "4"    "4"    "4"    "4" "3"
"I.III"  "real_act"                4 "n.a." "n.a." "2"    "n.a." "n.a." "3" "4"
"I.IV"   "act_restrict[1]"         1 "8"    "n.a." "n.a." "7"    "7"    "6" "9"
"I.IV"   "act_restrict[1]"         2 "6"    "5"    "n.a." "5"    "5"    "8" "9"
"I.IV"   "act_restrict[1]"         3 "7.5"  "6"    "12"   "9"    "9"    "8" "8"
"I.IV"   "act_restrict[1]"         4 "n.a." "n.a." "6"    "n.a." "n.a." "7" "8"
"II.I"   "own_firm"                1 "3"    "n.a." "n.a." "3"    "3"    "1" "2"
"II.I"   "own_firm"                2 "3"    "3"    "n.a." "2"    "2"    "3" "2"
"II.I"   "own_firm"                3 "3"    "4"    "1"    "2"    "2"    "3" "1"
"II.I"   "own_firm"                4 "n.a." "n.a." "3"    "n.a." "n.a." "3" "4"
"II.II"  "firm_own_bank"           1 "2"    "n.a." "n.a." "3"    "3"    "1" "2"
"II.II"  "firm_own_bank"           2 "3"    "1"    "n.a." "2"    "2"    "2" "2"
"II.II"  "firm_own_bank"           3 "n.a." "4"    "n.a." "3"    "3"    "2" "2"
"II.II"  "firm_own_bank"           4 "n.a." "n.a." "n.a." "n.a." "n.a." "1" "2"
"II.III" "Nonbankfin_own_bank (*)" 1 "n.a." "n.a." "n.a." "1"    "1"    "1" "2"
"II.III" "Nonbankfin_own_bank (*)" 2 "1"    "2"    "n.a." "2"    "2"    "2" "2"
"II.III" "Nonbankfin_own_bank (*)" 3 "n.a." "4"    "n.a." "2"    "2"    "2" "2"
"II.III" "Nonbankfin_own_bank (*)" 4 "n.a." "n.a." "n.a." "n.a." "n.a." "1" "2"
"II.IV"  "Overall_restrict (*)"    1 "7.5"  "n.a." "n.a." "7"    "7"    "3" "6"
"II.IV"  "Overall_restrict (*)"    2 "7"    "6"    "n.a." "6"    "6"    "7" "6"
end

reshape long cntry_, i(Number Name Survey) j(country) string
sort country Name Survey
rename cntry_ index
destring index, force replace

encode Name, gen(which)

forval j = 1/8 { 
    local lbl`j' "`: label which `j''"
}

reshape wide index, i(Number Survey country) j(which)

forval j = 1/8 { 
    label var index`j' "`lbl`j''"
}

list

Comment

Julian Scholz

Join Date: May 2019
Posts: 22

20 Sep 2019, 14:17

Thank you Clyde and Nick for your reply. For the record, the database I am working with and the respective paper stem from: http://faculty.haas.berkeley.edu/ros...Regulation.htm.
Sor far, both your suggested approaches in #2 and #3 seem to deliver the desired outcom but with the flaw of newly created observations. Here are the results after I tried Clyde's codes on the original using set:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str8 Number byte Survey str22 country str17(vAccounting_Practices vBankAccounting vBankConcentration__assets_ vBankConcentration__deposits_ vCert_Audit vCourt_Involve____ vDivers_Index vDomRatedBanks_pct____)
"VIII.I"   1 "Albania" "" ""  "" ""     "" ""     "" ""    
"V.III"    1 "Albania" "" ""  "" ""     "" ""     "" ""    
"VI.VI"    1 "Albania" "" ""  "" ""     "" ""     "" ""    
"V.IV"     1 "Albania" "" ""  "" ""     "" ""     "" ""    
"IV.I"     1 "Albania" "" ""  "" ""     "" ""     "" ""    
"IV.II"    1 "Albania" "" ""  "" ""     "" ""     "" ""    
"VII.VI"   1 "Albania" "" ""  "" ""     "" ""     "" ""    
"VI.V"     1 "Albania" "" ""  "" ""     "" ""     "" ""    
"II.IV"    1 "Albania" "" ""  "" ""     "" ""     "" ""    
"VII.V"    1 "Albania" "" "2" "" ""     "" ""     "" ""    
"III.IV"   1 "Albania" "" ""  "" ""     "" ""     "" ""    
"IX.IV"    1 "Albania" "" ""  "" ""     "" ""     "" ""    
"IX.I"     1 "Albania" "" ""  "" "n.a." "" ""     "" ""    
"I.I"      1 "Albania" "" ""  "" ""     "" ""     "" ""    
"V.VI"     1 "Albania" "" ""  "" ""     "" "n.a." "" ""    
"VIII.IV"  1 "Albania" "" ""  "" ""     "" ""     "" ""    
"II.III"   1 "Albania" "" ""  "" ""     "" ""     "" ""    
"VI.VII"   1 "Albania" "" ""  "" ""     "" ""     "" ""    
"III.II"   1 "Albania" "" ""  "" ""     "" ""     "" ""    
"VII.IV"   1 "Albania" "" ""  "" ""     "" ""     "" ""    
"X.IV"     1 "Albania" "" ""  "" ""     "" ""     "" ""    
"VII.II"   1 "Albania" "" ""  "" ""     "" ""     "" ""    
"II.I"     1 "Albania" "" ""  "" ""     "" ""     "" ""    
"VIII.III" 1 "Albania" "" ""  "" ""     "" ""     "" ""    
"VII.III"  1 "Albania" "" ""  "" ""     "" ""     "" "n.a."
"X.I"      1 "Albania" "" ""  "" ""     "" ""     "" ""    
"III.I"    1 "Albania" "" ""  "" ""     "" ""     "" ""    
"III.III"  1 "Albania" "" ""  "" ""     "" ""     "" ""    
"I.IV"     1 "Albania" "" ""  "" ""     "" ""     "" ""    
"X.V"      1 "Albania" "" ""  "" ""     "" ""     "" ""    
end

The problem is that for each country there should only be one observation per Survey containing the different values for all indices. Instead, the reshape command seems to generate new observations with many "blank cells". So for each value per index a new observation is created. I guess there is some way to rearrange those values into one observation. Any Suggestions?

@Nick: The results your approach delivers seem very similar to Clyde's so I can't figure out if this is an improvement.

edit: you are right about the variable names so I see what you mean now.
Regards,
Julian

Last edited by Julian Scholz; 20 Sep 2019, 14:26.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#5

20 Sep 2019, 15:47

Well, I see what you mean. I think I can get you the kind of output you desire, as follows:

Code:

reshape long cntry_, i(Number Name Survey) j(country) string sort country Name Survey rename cntry_ index replace Name = strtoname(Name) rename index _ isid Survey country Name drop Number reshape wide _, i(Survey country) j(Name) string

The problem is that it requires you to drop the variable Number (which is what is creating the "excess" observations). That's fine if Number itself is meaningless or is redundant given country and Survey. In your example data, Name, country and Survey suffice to uniquely identify observations, so dropping Number is OK. But is that true in your full data set? (The -isid Survey country Name- command will tell you whether it is or not, and it won't let you proceed if it's not.)
1 like
Comment
Julian Scholz

Join Date: May 2019

Posts: 22
#6

21 Sep 2019, 01:47

Clyde, thats for your advise! As you asume, that variable "Number" is meaningless so dropping it is fine. The problem is solved.
Comment

Announcement

Reshape from wide to long in multiple steps

Comment

Comment

Comment

Comment

Comment