Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshape with two columns to rows

    Hi, I am trying to reshape a dataset so that the rows in variable "title" become columns, while the two columns "country1" and "country2" become rows.

    My data looks as follows:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str90 title str6 region str20(country1 country2)
    "Total population"                                                             "Europe" "717438" "674920"
    "Male"                                                                         "Europe" "345865" "325647"
    "Female"                                                                       "Europe" "371573" "349273"
    "Under 5 years"                                                                "Europe" "43090"  "41267" 
    "5 to 9 years"                                                                 "Europe" "43281"  "44109" 
    "10 to 14 years"                                                               "Europe" "46701"  "41423" 
    "15 to 19 years"                                                               "Europe" "46000"  "42859" 
    "20 to 24 years"                                                               "Europe" "40348"  "43193" 
    "25 to 34 years"                                                               "Europe" "92674"  "91857" 
    "35 to 44 years"                                                               "Europe" "88657"  "82199" 
    "45 to 54 years"                                                               "Europe" "87619"  "81556" 
    "55 to 59 years"                                                               "Europe" "48179"  "42805" 
    "60 to 64 years"                                                               "Europe" "49738"  "43319" 
    "65 to 74 years"                                                               "Europe" "78932"  "70407" 
    "75 to 84 years"                                                               "Europe" "39375"  "37904" 
    "85 years and over"                                                            "Europe" "12844"  "12022" 
    "Median age (years)"                                                           "Europe" "40.4"   "39.0"  
    "18 years and over"                                                            "Europe" "556011" "523012"
    "65 years and over"                                                            "Europe" "131151" "120333"
    "Total population"                                                             "Europe" "717438" "674920"
    "One race"                                                                     "Europe" "702458" "661882"
    "White"                                                                        "Europe" "479955" "424115"
    end
    I tried to do:
    gen dummy=1
    encode title, gen(title_num)
    reshape wide country1 country2, i(dummy) j(title_num)
    But I get an error that "values of variable title_num not unique within dummy". Does anyone know how to solve this issue?

    Thank you very much in advance!!

  • #2
    This seems consistent with your data example.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str90 title str6 region str20(country1 country2)
    "Total population"                                                             "Europe" "717438" "674920"
    "Male"                                                                         "Europe" "345865" "325647"
    "Female"                                                                       "Europe" "371573" "349273"
    "Under 5 years"                                                                "Europe" "43090"  "41267"
    "5 to 9 years"                                                                 "Europe" "43281"  "44109"
    "10 to 14 years"                                                               "Europe" "46701"  "41423"
    "15 to 19 years"                                                               "Europe" "46000"  "42859"
    "20 to 24 years"                                                               "Europe" "40348"  "43193"
    "25 to 34 years"                                                               "Europe" "92674"  "91857"
    "35 to 44 years"                                                               "Europe" "88657"  "82199"
    "45 to 54 years"                                                               "Europe" "87619"  "81556"
    "55 to 59 years"                                                               "Europe" "48179"  "42805"
    "60 to 64 years"                                                               "Europe" "49738"  "43319"
    "65 to 74 years"                                                               "Europe" "78932"  "70407"
    "75 to 84 years"                                                               "Europe" "39375"  "37904"
    "85 years and over"                                                            "Europe" "12844"  "12022"
    "Median age (years)"                                                           "Europe" "40.4"   "39.0"  
    "18 years and over"                                                            "Europe" "556011" "523012"
    "65 years and over"                                                            "Europe" "131151" "120333"
    "Total population"                                                             "Europe" "717438" "674920"
    "One race"                                                                     "Europe" "702458" "661882"
    "White"                                                                        "Europe" "479955" "424115"
    end
    
    gen varorder = _n
    su varorder, meanonly
    local max = r(max)
    forval j = 1/`max' {
        local label`j' = title[`j']
    }
    
    reshape long country, i(varorder) j(which)
    drop title
    rename country whatever
    reshape wide whatever, i(which) j(varorder)
    forval j = 1/`max'  {
        label var whatever`j' "`label`j''"
    }
    
    d
    
    l
    You may need to vary the recipe for a fuller dataset.
    Last edited by Nick Cox; 15 Jul 2022, 05:44.

    Comment


    • #3
      This seems to work perfectly, thanks so much! My only problem is that there are 25 country variables and they are actually named "province01final" through to "province25final" (I changed the names above for simplicity, apologies if it is an inconvenience). I tried adapting your code for these names but I do not think it is possible using the code you suggest?

      Thank you very much again!


      Comment


      • #4
        see syntax 6 in
        Code:
        help rename group

        Comment


        • #5
          We need a better data example then.

          Comment


          • #6
            Apologies, here is a better data example more representative of my full dataset. Also, I would happily change the variable names myself but there are 50 files like this (for different regions)

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            
            input str90 title str6 region str20(province01final province02final province03final province04final province05final province07final province06final)
            
            "Agriculture, forestry, fishing and hunting"                               "Europe" "43425"    "104552"  "32387"   "27249"   "8673"     "28631"    "62193"   
            "Arts, entertainment, and recreation"                                      "Europe" "53852"    "35050"   "35333"   "20378"   "52738"    "104488"   "44626"   
            "Construction"                                                             "Europe" "914276"   "515718"  "424812"  "422028"  "708774"   "1143967"  "1056156" 
            "Educational services"                                                     "Europe" "158622"   "105565"  "90936"   "15508"   "142670"   "129000"   "115334"  
            "Finance and insurance"                                                    "Europe" "630779"   "486204"  "280346"  "327459"  "481835"   "2226237"  "1328485" 
            "Health care and social assistance"                                        "Europe" "1582770"  "1807147" "1175840" "1328449" "2045838"  "1746750"  "3394792" 
            "Industries not classified"                                                "Europe" "482"      "541"     "487"     "549"     "369"      "747"      "766"     
            "Information"                                                              "Europe" "232561"   "137708"  "87830"   "119330"  "331460"   "734721"   "297207"  
            "Management of companies and enterprises"                                  "Europe" "284185"   "145677"  "30506"   "234217"  "208931"   "716680"   "292229"  
            "Manufacturing"                                                            "Europe" "1887934"  "1260205" "2092910" "2335113" "2616026"  "994695"   "3076457" 
            "Mining, quarrying, and oil and gas extraction"                            "Europe" "46075"    "12281"   "37785"   "167647"  "21315"    "109728"   "37480"   
            "Other services (except public administration)"                            "Europe" "292194"   "302520"  "183013"  "193902"  "279796"   "456598"   "590038"  
            end

            Thank you very much again!!


            Comment


            • #7
              OK, so what should the dataset look like?

              Comment


              • #8
                It should look as follows (I used just a small amount of variables for simplicity):
                variable "Agriculture, forestry, fishing and hunting" "Arts, entertainment, and recreation" region
                province01final "43425" "53852" "Europe"
                province02final "104552" "35050" "Europe"
                province03final "32387" "35333" "Europe"
                province04final "27249" "20378" "Europe"
                ... ... ... ...
                Thank you very much again!

                Comment


                • #9
                  Code:
                  * Example generated by -dataex-. For more info, type help dataex
                  clear
                  
                  input str90 title str6 region str20(province01final province02final province03final province04final province05final province07final province06final)
                  
                  "Agriculture, forestry, fishing and hunting"                               "Europe" "43425"    "104552"  "32387"   "27249"   "8673"     "28631"    "62193"   
                  "Arts, entertainment, and recreation"                                      "Europe" "53852"    "35050"   "35333"   "20378"   "52738"    "104488"   "44626"   
                  "Construction"                                                             "Europe" "914276"   "515718"  "424812"  "422028"  "708774"   "1143967"  "1056156" 
                  "Educational services"                                                     "Europe" "158622"   "105565"  "90936"   "15508"   "142670"   "129000"   "115334"  
                  "Finance and insurance"                                                    "Europe" "630779"   "486204"  "280346"  "327459"  "481835"   "2226237"  "1328485" 
                  "Health care and social assistance"                                        "Europe" "1582770"  "1807147" "1175840" "1328449" "2045838"  "1746750"  "3394792" 
                  "Industries not classified"                                                "Europe" "482"      "541"     "487"     "549"     "369"      "747"      "766"     
                  "Information"                                                              "Europe" "232561"   "137708"  "87830"   "119330"  "331460"   "734721"   "297207"  
                  "Management of companies and enterprises"                                  "Europe" "284185"   "145677"  "30506"   "234217"  "208931"   "716680"   "292229"  
                  "Manufacturing"                                                            "Europe" "1887934"  "1260205" "2092910" "2335113" "2616026"  "994695"   "3076457" 
                  "Mining, quarrying, and oil and gas extraction"                            "Europe" "46075"    "12281"   "37785"   "167647"  "21315"    "109728"   "37480"   
                  "Other services (except public administration)"                            "Europe" "292194"   "302520"  "183013"  "193902"  "279796"   "456598"   "590038"  
                  end
                  
                  gen varorder = _n
                  su varorder, meanonly
                  local max = r(max)
                  forval j = 1/`max' {
                      local label`j' = title[`j']
                  }
                  drop title 
                  reshape long province@final, i(varorder) j(which) string 
                  rename provincefinal whatever
                  destring whatever, replace 
                  reshape wide whatever, i(which) j(varorder)
                  forval j = 1/`max'  {
                      label var whatever`j' "`label`j''"
                  }
                  
                  d
                  
                  l

                  Comment


                  • #10
                    Thank you very much Nick!

                    Comment

                    Working...
                    X