Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshape wide to long when there is no ID variable

    Hello Statalisters,

    I am working with several hundred excel sheets that I want to import into Stata. They have a modicum of uniformity that allows me to import them but at the final step of reshaping them from wide to long I am stumbling. I have given you a sample of the data below.

    I want to reshape the data where each district has an observation of the variables starting with total. However, each state for which I have a single excel file have differing numbers of district. Which could be titled from G to CG. How do I go about reshaping this to reflect variable name district with all the values given under the name G to CG instead of existing varnames in long?

    It should also be noted that I initially import these from excel and keep only a few variables that I need using the following code :

    Code:
    bysort B (C): keep if C[1] == "district" | C[1] == "totalpublic" | C[1] == "totalpublic_inst" | C[1] == "totalpublic_instcol" | C[1] ==    "totalpvt" |
     C[1] == "totalpvt_inst" | C[1] == "totalpvt_instcol"
    Hereunder are two excel sheets from different states I have imported. I wanted the "observations" I imported above to be reshaped to be variables names. And existing "variables" ranging from G to CG depending on the states to be under the variable district. Hope this is not too confusing and within Statalist guidelines.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str196 C str7 F str16 G str13 H
    "district"            "Nicobar" "North and Middle" "South Andaman"
    "totalpublic"         ""        ""                 "4629"         
    "totalpublic_inst"     ""        ""                 ""             
    "totalpublic_instcol" ""        ""                 ""             
    "totalpvt"         ""        ""                 ""             
    "totalpvt_inst"     ""        ""                 ""             
    "totalpvt_instcol" ""        ""                 ""             
    end

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str196 C str8 E str7(F G) str10 H str6 I str9(J K) str7 L str6 M str9 N str14 O str7 P str9 Q str7 R
    "district"            "_Bihar" "Araria" "Arwal" "Aurangabad" "Banka" "Begusarai" "Bhagalpur" "Bhojpur" "Buxar" "Darbhanga" "East Champaran" "Gaya"  "Gopalganj" "Jamui"
    "totalpublic"         "535688" "20118"  "4082"  "27539"      "25147" "3061"      "34146"     "30658"   ""      "11016"     "31357"          "22317" "10651"     "12187"
    "totalpublic_inst"     "427820" "21058"  "5752"  "9724"       "26358" "104"       "4030"      "9707"    "11042" ""          "48925"          ""      ""          "16231"
    "totalpublic_instcol" "338565" "12567"  "4582"  "3603"       "25422" "98"        "2914"      "5610"    "7143"  ""          "47935"          ""      ""          "16231"
    "totalpvt"         ""       ""       ""      ""           ""      ""          ""          ""        ""      ""          ""               ""      ""          ""     
    "totalpvt_inst"     "14853"  ""       ""      "4017"       ""      ""          ""          ""        "381"   ""          ""               ""      ""          "2969" 
    "totalpvt_instcol" "12703"  ""       ""      "3955"       ""      ""          ""          ""        "310"   ""          ""               ""      ""          "2969" 
    end
    Lori

  • #2
    perhaps,
    Code:
    ssc install sxpose
    sxpose, clear firstnames

    Comment


    • #3
      Thank you @Øyvind, unfortunately this only partially works and results in significant data loss. As you can see in the data sample below, which is the exact data posted in section 1. I have lost all district names which would've have been my geographical identifier.
      Would you know how to adapt the code to keep district variable as a column.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str9 totalpublic str13 totalpublic_inst str17 totalpublic_instcol str9 totalpvt str13 totalpvt_inst str17 totalpvt_instcol
      ""     "" "" "" "" ""
      ""     "" "" "" "" ""
      "4629" "" "" "" "" ""
      end

      Comment


      • #4
        You should show how you want the final output to look like. Hazarding a guess:

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str196 C str8 E str7(F G) str10 H str6 I str9(J K) str7 L str6 M str9 N str14 O str7 P str9 Q str7 R
        "district"            "_Bihar" "Araria" "Arwal" "Aurangabad" "Banka" "Begusarai" "Bhagalpur" "Bhojpur" "Buxar" "Darbhanga" "East Champaran" "Gaya"  "Gopalganj" "Jamui"
        "totalpublic"         "535688" "20118"  "4082"  "27539"      "25147" "3061"      "34146"     "30658"   ""      "11016"     "31357"          "22317" "10651"     "12187"
        "totalpublic_inst"     "427820" "21058"  "5752"  "9724"       "26358" "104"       "4030"      "9707"    "11042" ""          "48925"          ""      ""          "16231"
        "totalpublic_instcol" "338565" "12567"  "4582"  "3603"       "25422" "98"        "2914"      "5610"    "7143"  ""          "47935"          ""      ""          "16231"
        "totalpvt"         ""       ""       ""      ""           ""      ""          ""          ""        ""      ""          ""               ""      ""          ""    
        "totalpvt_inst"     "14853"  ""       ""      "4017"       ""      ""          ""          ""        "381"   ""          ""               ""      ""          "2969"
        "totalpvt_instcol" "12703"  ""       ""      "3955"       ""      ""          ""          ""        "310"   ""          ""               ""      ""          "2969"
        end
        
        foreach var of varlist *{
            rename `var' `=strtoname("`=`var'[1]'")'
        }
        drop in 1
        rename (_Bihar -Jamui) value=
        reshape long value, i(district) j(name) string
        reshape wide value, i(name) j(district) string
        rename value* *
        Res.:

        Code:
        . l, sep(0)
        
             +----------------------------------------------------------------------------------+
             |           name   totalp~c   t~c_inst   totalp..   totalpvt   t~t_inst   totalp.. |
             |----------------------------------------------------------------------------------|
          1. |         Araria      20118      21058      12567                                  |
          2. |          Arwal       4082       5752       4582                                  |
          3. |     Aurangabad      27539       9724       3603                  4017       3955 |
          4. |          Banka      25147      26358      25422                                  |
          5. |      Begusarai       3061        104         98                                  |
          6. |      Bhagalpur      34146       4030       2914                                  |
          7. |        Bhojpur      30658       9707       5610                                  |
          8. |          Buxar                 11042       7143                   381        310 |
          9. |      Darbhanga      11016                                                        |
         10. | East_Champaran      31357      48925      47935                                  |
         11. |           Gaya      22317                                                        |
         12. |      Gopalganj      10651                                                        |
         13. |          Jamui      12187      16231      16231                  2969       2969 |
         14. |         _Bihar     535688     427820     338565                 14853      12703 |
             +----------------------------------------------------------------------------------+
        
        .
        Last edited by Andrew Musau; 05 Oct 2022, 05:46.

        Comment


        • #5
          Andrew, the final output that you generated was what I wanted and I had used -strtoname- command initially, but since I need to loop over many excel sheets with different variables I feel like the code that you mentioned cpuld not work when I tried it
          i.e.
          Code:
           rename (_Bihar -Jamui) value=
          This would have to be file specific. I used the code @Øyvind, and generated the following which works perfectly unless there is some dataloss, which I have not found after a quick manual review.:

          Code:
          local var `" "A & N Islands" "Andhra Pradesh" "Arunachal Pradesh" "Assam" "
          foreach x in `var' {
          
          import excel "$rawdata/`x'/L - `x'_2011.xls", sheet("Sheet1") clear
          replace C = "distname" in 6
          replace B = "2.0" in 6
          bysort B (C): keep if C[1] == "district" | C[1] == "totalpublic" | C[1] == "totalpublic_inst" | C[1] == "totalpublic_instcol" | C[1] == "totalpvt" | C[1] == "totalpvt_inst" | C[1] == "totalpvt_instcol"
          
          
          sxpose, clear firstnames
          
          
          save `x'. dta, replace
          
          }
          Just the save needs to be fixed in the loop given that there are a lot of ampresands and spaces in file names.

          Final output:

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str16 distname str9 totalpublic str13 totalpublic_inst str17 totalpublic_instcol str9 totalpvt str13 totalpvt_inst str17 totalpvt_instcol  
          "nicobar"          ""     "" "" "" "" "" "A & N Islands"
          "north and middle" ""     "" "" "" "" "" "A & N Islands"
          "south andaman"    "4629" "" "" "" "" "" "A & N Islands"
          end
          Thank you both!
          Last edited by Lorien Nair; 05 Oct 2022, 06:09. Reason: Forgot to add code section

          Comment

          Working...
          X