Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshape from wide to long in multiple steps

    Dear Listers,

    I have some trouble reshaping a set of country specific variables that I need to merge with my master panel dataset which is in long format. The orginial using set contains country specific indices about bank regulation and supervision in 4 survey periods (1-4) and is (at least in my view) in wide format (compiled and provided by Barth et al, 2013). Here is where I started:

    Code:
    clear
    input str8 Number str40 Name byte Survey str17(cntry_Albania cntry_Algeria cntry_Angola cntry_Anguilla cntry_AntiguaandBarbuda cntry_Argentina cntry_Armenia)
    "I.I"    "secur_act"               1 "1"    "n.a." "n.a." "1"    "1"    "2" "3"
    "I.I"    "secur_act"               2 "1"    "2"    "n.a." "1"    "1"    "2" "2"
    "I.I"    "secur_act"               3 "1"    "1"    "n.a." "1"    "1"    "1" "1"
    "I.I"    "secur_act"               4 "n.a." "n.a." "2"    "n.a." "n.a." "1" "1"
    "I.II"   "insur_act"               1 "4"    "n.a." "n.a." "3"    "3"    "2" "3"
    "I.II"   "insur_act"               2 "4"    "2"    "n.a." "2"    "2"    "3" "4"
    "I.II"   "insur_act"               3 "4"    "4"    "4"    "4"    "4"    "3" "4"
    "I.II"   "insur_act"               4 "n.a." "n.a." "2"    "n.a." "n.a." "3" "3"
    "I.III"  "real_act"                1 "3"    "1"    "n.a." "4"    "4"    "2" "3"
    "I.III"  "real_act"                2 "1"    "1"    "n.a." "2"    "2"    "3" "3"
    "I.III"  "real_act"                3 "n.a." "1"    "4"    "4"    "4"    "4" "3"
    "I.III"  "real_act"                4 "n.a." "n.a." "2"    "n.a." "n.a." "3" "4"
    "I.IV"   "act_restrict[1]"         1 "8"    "n.a." "n.a." "7"    "7"    "6" "9"
    "I.IV"   "act_restrict[1]"         2 "6"    "5"    "n.a." "5"    "5"    "8" "9"
    "I.IV"   "act_restrict[1]"         3 "7.5"  "6"    "12"   "9"    "9"    "8" "8"
    "I.IV"   "act_restrict[1]"         4 "n.a." "n.a." "6"    "n.a." "n.a." "7" "8"
    "II.I"   "own_firm"                1 "3"    "n.a." "n.a." "3"    "3"    "1" "2"
    "II.I"   "own_firm"                2 "3"    "3"    "n.a." "2"    "2"    "3" "2"
    "II.I"   "own_firm"                3 "3"    "4"    "1"    "2"    "2"    "3" "1"
    "II.I"   "own_firm"                4 "n.a." "n.a." "3"    "n.a." "n.a." "3" "4"
    "II.II"  "firm_own_bank"           1 "2"    "n.a." "n.a." "3"    "3"    "1" "2"
    "II.II"  "firm_own_bank"           2 "3"    "1"    "n.a." "2"    "2"    "2" "2"
    "II.II"  "firm_own_bank"           3 "n.a." "4"    "n.a." "3"    "3"    "2" "2"
    "II.II"  "firm_own_bank"           4 "n.a." "n.a." "n.a." "n.a." "n.a." "1" "2"
    "II.III" "Nonbankfin_own_bank (*)" 1 "n.a." "n.a." "n.a." "1"    "1"    "1" "2"
    "II.III" "Nonbankfin_own_bank (*)" 2 "1"    "2"    "n.a." "2"    "2"    "2" "2"
    "II.III" "Nonbankfin_own_bank (*)" 3 "n.a." "4"    "n.a." "2"    "2"    "2" "2"
    "II.III" "Nonbankfin_own_bank (*)" 4 "n.a." "n.a." "n.a." "n.a." "n.a." "1" "2"
    "II.IV"  "Overall_restrict (*)"    1 "7.5"  "n.a." "n.a." "7"    "7"    "3" "6"
    "II.IV"  "Overall_restrict (*)"    2 "7"    "6"    "n.a." "6"    "6"    "7" "6"
    end
    The desired format is long, where the respective indices (variable "Name") are transformed to variables that carry information of the index values. I startet with the following approach:

    Code:
    reshape long cntry_, i(Number Name Survey) j(country) string
    sort country Name Survey
    rename cntry_ index
    The actual structure is ok in respect of the countries, but not in respect of the relevant indices. Because I need to merge the respective indices which are listed as observations in the variable "Name" with the index values as seperate variables, I thought of reshaping the variable "Name" to wide format:

    Code:
    reshape wide Name, i(Number Survey country) j(index) string
    Anyhow, I get an error message saying
    Name 7.5 invalid variable name
    . Right now, I can't find out what I am doing wrong or if the approach I am following is generally advisable for my desire. I would therefore appreciate your help.

    Best regards,
    Julian
    Last edited by Julian Scholz; 20 Sep 2019, 09:28.

  • #2
    I think what you want is:

    Code:
    reshape wide index, i(Number Survey country) j(Name) string
    But this will not quite work either. The problem here is that some of the values of Name contain characters that are not permissible in variable names. And, although this problem does not arise in the example you show, it might be that when combined with index_, the resulting variable name would exceed the limit of 32 characters. So I think you first need to

    Code:
    replace Name = strtoname(Name)
    And if there is a length problem after that:
    Code:
    rename index _
    And if there is still a length problem after that:
    Code:
    replace Name = substr(Name, 1, 31)
    Note: You should check that neither of the -replace Name =- commands proposed above results in two previously distinct values of Name being reduced to the same result, thereby fusing what should be distinct categories. If that does happen, then you need to go back to the original Name values and do some other changes so as to assure they remain distinct after transformations.

    Comment


    • #3
      Barth et al. 2013. Please note https://www.statalist.org/forums/help#references

      Does this help? I am pretty clear that you have Name and index the wrong way round and that the distinct values of Name won't work well in new variable names. Hence I make them value labels before the reshape and variable labels after.


      Code:
      clear
      input str8 Number str40 Name byte Survey str17(cntry_Albania cntry_Algeria cntry_Angola cntry_Anguilla cntry_AntiguaandBarbuda cntry_Argentina cntry_Armenia)
      "I.I"    "secur_act"               1 "1"    "n.a." "n.a." "1"    "1"    "2" "3"
      "I.I"    "secur_act"               2 "1"    "2"    "n.a." "1"    "1"    "2" "2"
      "I.I"    "secur_act"               3 "1"    "1"    "n.a." "1"    "1"    "1" "1"
      "I.I"    "secur_act"               4 "n.a." "n.a." "2"    "n.a." "n.a." "1" "1"
      "I.II"   "insur_act"               1 "4"    "n.a." "n.a." "3"    "3"    "2" "3"
      "I.II"   "insur_act"               2 "4"    "2"    "n.a." "2"    "2"    "3" "4"
      "I.II"   "insur_act"               3 "4"    "4"    "4"    "4"    "4"    "3" "4"
      "I.II"   "insur_act"               4 "n.a." "n.a." "2"    "n.a." "n.a." "3" "3"
      "I.III"  "real_act"                1 "3"    "1"    "n.a." "4"    "4"    "2" "3"
      "I.III"  "real_act"                2 "1"    "1"    "n.a." "2"    "2"    "3" "3"
      "I.III"  "real_act"                3 "n.a." "1"    "4"    "4"    "4"    "4" "3"
      "I.III"  "real_act"                4 "n.a." "n.a." "2"    "n.a." "n.a." "3" "4"
      "I.IV"   "act_restrict[1]"         1 "8"    "n.a." "n.a." "7"    "7"    "6" "9"
      "I.IV"   "act_restrict[1]"         2 "6"    "5"    "n.a." "5"    "5"    "8" "9"
      "I.IV"   "act_restrict[1]"         3 "7.5"  "6"    "12"   "9"    "9"    "8" "8"
      "I.IV"   "act_restrict[1]"         4 "n.a." "n.a." "6"    "n.a." "n.a." "7" "8"
      "II.I"   "own_firm"                1 "3"    "n.a." "n.a." "3"    "3"    "1" "2"
      "II.I"   "own_firm"                2 "3"    "3"    "n.a." "2"    "2"    "3" "2"
      "II.I"   "own_firm"                3 "3"    "4"    "1"    "2"    "2"    "3" "1"
      "II.I"   "own_firm"                4 "n.a." "n.a." "3"    "n.a." "n.a." "3" "4"
      "II.II"  "firm_own_bank"           1 "2"    "n.a." "n.a." "3"    "3"    "1" "2"
      "II.II"  "firm_own_bank"           2 "3"    "1"    "n.a." "2"    "2"    "2" "2"
      "II.II"  "firm_own_bank"           3 "n.a." "4"    "n.a." "3"    "3"    "2" "2"
      "II.II"  "firm_own_bank"           4 "n.a." "n.a." "n.a." "n.a." "n.a." "1" "2"
      "II.III" "Nonbankfin_own_bank (*)" 1 "n.a." "n.a." "n.a." "1"    "1"    "1" "2"
      "II.III" "Nonbankfin_own_bank (*)" 2 "1"    "2"    "n.a." "2"    "2"    "2" "2"
      "II.III" "Nonbankfin_own_bank (*)" 3 "n.a." "4"    "n.a." "2"    "2"    "2" "2"
      "II.III" "Nonbankfin_own_bank (*)" 4 "n.a." "n.a." "n.a." "n.a." "n.a." "1" "2"
      "II.IV"  "Overall_restrict (*)"    1 "7.5"  "n.a." "n.a." "7"    "7"    "3" "6"
      "II.IV"  "Overall_restrict (*)"    2 "7"    "6"    "n.a." "6"    "6"    "7" "6"
      end
      
      reshape long cntry_, i(Number Name Survey) j(country) string
      sort country Name Survey
      rename cntry_ index
      destring index, force replace
      
      encode Name, gen(which)
      
      forval j = 1/8 { 
          local lbl`j' "`: label which `j''"
      }
      
      reshape wide index, i(Number Survey country) j(which)
      
      forval j = 1/8 { 
          label var index`j' "`lbl`j''"
      }
      
      list

      Comment


      • #4
        Thank you Clyde and Nick for your reply. For the record, the database I am working with and the respective paper stem from: http://faculty.haas.berkeley.edu/ros...Regulation.htm.
        Sor far, both your suggested approaches in #2 and #3 seem to deliver the desired outcom but with the flaw of newly created observations. Here are the results after I tried Clyde's codes on the original using set:
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str8 Number byte Survey str22 country str17(vAccounting_Practices vBankAccounting vBankConcentration__assets_ vBankConcentration__deposits_ vCert_Audit vCourt_Involve____ vDivers_Index vDomRatedBanks_pct____)
        "VIII.I"   1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "V.III"    1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "VI.VI"    1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "V.IV"     1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "IV.I"     1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "IV.II"    1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "VII.VI"   1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "VI.V"     1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "II.IV"    1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "VII.V"    1 "Albania" "" "2" "" ""     "" ""     "" ""    
        "III.IV"   1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "IX.IV"    1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "IX.I"     1 "Albania" "" ""  "" "n.a." "" ""     "" ""    
        "I.I"      1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "V.VI"     1 "Albania" "" ""  "" ""     "" "n.a." "" ""    
        "VIII.IV"  1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "II.III"   1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "VI.VII"   1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "III.II"   1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "VII.IV"   1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "X.IV"     1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "VII.II"   1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "II.I"     1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "VIII.III" 1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "VII.III"  1 "Albania" "" ""  "" ""     "" ""     "" "n.a."
        "X.I"      1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "III.I"    1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "III.III"  1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "I.IV"     1 "Albania" "" ""  "" ""     "" ""     "" ""    
        "X.V"      1 "Albania" "" ""  "" ""     "" ""     "" ""    
        end
        The problem is that for each country there should only be one observation per Survey containing the different values for all indices. Instead, the reshape command seems to generate new observations with many "blank cells". So for each value per index a new observation is created. I guess there is some way to rearrange those values into one observation. Any Suggestions?

        @Nick: The results your approach delivers seem very similar to Clyde's so I can't figure out if this is an improvement.

        edit: you are right about the variable names so I see what you mean now.
        Regards,
        Julian
        Last edited by Julian Scholz; 20 Sep 2019, 14:26.

        Comment


        • #5
          Well, I see what you mean. I think I can get you the kind of output you desire, as follows:

          Code:
          reshape long cntry_, i(Number Name Survey) j(country) string
          sort country Name Survey
          rename cntry_ index
          
          replace Name = strtoname(Name)
          rename index _
          
          isid Survey country Name
          drop Number
          reshape wide _, i(Survey country) j(Name) string
          The problem is that it requires you to drop the variable Number (which is what is creating the "excess" observations). That's fine if Number itself is meaningless or is redundant given country and Survey. In your example data, Name, country and Survey suffice to uniquely identify observations, so dropping Number is OK. But is that true in your full data set? (The -isid Survey country Name- command will tell you whether it is or not, and it won't let you proceed if it's not.)

          Comment


          • #6
            Clyde, thats for your advise! As you asume, that variable "Number" is meaningless so dropping it is fine. The problem is solved.

            Comment

            Working...
            X