Reshape wide to long when there is no ID variable

Lorien Nair

Join Date: May 2019
Posts: 115

Reshape wide to long when there is no ID variable

05 Oct 2022, 02:49

Hello Statalisters,

I am working with several hundred excel sheets that I want to import into Stata. They have a modicum of uniformity that allows me to import them but at the final step of reshaping them from wide to long I am stumbling. I have given you a sample of the data below.

I want to reshape the data where each district has an observation of the variables starting with total. However, each state for which I have a single excel file have differing numbers of district. Which could be titled from G to CG. How do I go about reshaping this to reflect variable name district with all the values given under the name G to CG instead of existing varnames in long?

It should also be noted that I initially import these from excel and keep only a few variables that I need using the following code :

Code:

bysort B (C): keep if C[1] == "district" | C[1] == "totalpublic" | C[1] == "totalpublic_inst" | C[1] == "totalpublic_instcol" | C[1] ==    "totalpvt" |
 C[1] == "totalpvt_inst" | C[1] == "totalpvt_instcol"

Hereunder are two excel sheets from different states I have imported. I wanted the "observations" I imported above to be reshaped to be variables names. And existing "variables" ranging from G to CG depending on the states to be under the variable district. Hope this is not too confusing and within Statalist guidelines.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str196 C str7 F str16 G str13 H
"district"            "Nicobar" "North and Middle" "South Andaman"
"totalpublic"         ""        ""                 "4629"         
"totalpublic_inst"     ""        ""                 ""             
"totalpublic_instcol" ""        ""                 ""             
"totalpvt"         ""        ""                 ""             
"totalpvt_inst"     ""        ""                 ""             
"totalpvt_instcol" ""        ""                 ""             
end

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str196 C str8 E str7(F G) str10 H str6 I str9(J K) str7 L str6 M str9 N str14 O str7 P str9 Q str7 R
"district"            "_Bihar" "Araria" "Arwal" "Aurangabad" "Banka" "Begusarai" "Bhagalpur" "Bhojpur" "Buxar" "Darbhanga" "East Champaran" "Gaya"  "Gopalganj" "Jamui"
"totalpublic"         "535688" "20118"  "4082"  "27539"      "25147" "3061"      "34146"     "30658"   ""      "11016"     "31357"          "22317" "10651"     "12187"
"totalpublic_inst"     "427820" "21058"  "5752"  "9724"       "26358" "104"       "4030"      "9707"    "11042" ""          "48925"          ""      ""          "16231"
"totalpublic_instcol" "338565" "12567"  "4582"  "3603"       "25422" "98"        "2914"      "5610"    "7143"  ""          "47935"          ""      ""          "16231"
"totalpvt"         ""       ""       ""      ""           ""      ""          ""          ""        ""      ""          ""               ""      ""          ""     
"totalpvt_inst"     "14853"  ""       ""      "4017"       ""      ""          ""          ""        "381"   ""          ""               ""      ""          "2969" 
"totalpvt_instcol" "12703"  ""       ""      "3955"       ""      ""          ""          ""        "310"   ""          ""               ""      ""          "2969" 
end

Lori

Tags: None

Øyvind Snilsberg

Join Date: Oct 2021

Posts: 591
#2

05 Oct 2022, 05:35

perhaps,

Code:

ssc install sxpose sxpose, clear firstnames
Comment
Lorien Nair

Join Date: May 2019

Posts: 115
#3

05 Oct 2022, 05:43

Thank you @Øyvind, unfortunately this only partially works and results in significant data loss. As you can see in the data sample below, which is the exact data posted in section 1. I have lost all district names which would've have been my geographical identifier.
Would you know how to adapt the code to keep district variable as a column.

Code:

* Example generated by -dataex-. For more info, type help dataex clear input str9 totalpublic str13 totalpublic_inst str17 totalpublic_instcol str9 totalpvt str13 totalpvt_inst str17 totalpvt_instcol "" "" "" "" "" "" "" "" "" "" "" "" "4629" "" "" "" "" "" end
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10219

05 Oct 2022, 05:44

You should show how you want the final output to look like. Hazarding a guess:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str196 C str8 E str7(F G) str10 H str6 I str9(J K) str7 L str6 M str9 N str14 O str7 P str9 Q str7 R
"district"            "_Bihar" "Araria" "Arwal" "Aurangabad" "Banka" "Begusarai" "Bhagalpur" "Bhojpur" "Buxar" "Darbhanga" "East Champaran" "Gaya"  "Gopalganj" "Jamui"
"totalpublic"         "535688" "20118"  "4082"  "27539"      "25147" "3061"      "34146"     "30658"   ""      "11016"     "31357"          "22317" "10651"     "12187"
"totalpublic_inst"     "427820" "21058"  "5752"  "9724"       "26358" "104"       "4030"      "9707"    "11042" ""          "48925"          ""      ""          "16231"
"totalpublic_instcol" "338565" "12567"  "4582"  "3603"       "25422" "98"        "2914"      "5610"    "7143"  ""          "47935"          ""      ""          "16231"
"totalpvt"         ""       ""       ""      ""           ""      ""          ""          ""        ""      ""          ""               ""      ""          ""    
"totalpvt_inst"     "14853"  ""       ""      "4017"       ""      ""          ""          ""        "381"   ""          ""               ""      ""          "2969"
"totalpvt_instcol" "12703"  ""       ""      "3955"       ""      ""          ""          ""        "310"   ""          ""               ""      ""          "2969"
end

foreach var of varlist *{
    rename `var' `=strtoname("`=`var'[1]'")'
}
drop in 1
rename (_Bihar -Jamui) value=
reshape long value, i(district) j(name) string
reshape wide value, i(name) j(district) string
rename value* *

Res.:

Code:

. l, sep(0)

     +----------------------------------------------------------------------------------+
     |           name   totalp~c   t~c_inst   totalp..   totalpvt   t~t_inst   totalp.. |
     |----------------------------------------------------------------------------------|
  1. |         Araria      20118      21058      12567                                  |
  2. |          Arwal       4082       5752       4582                                  |
  3. |     Aurangabad      27539       9724       3603                  4017       3955 |
  4. |          Banka      25147      26358      25422                                  |
  5. |      Begusarai       3061        104         98                                  |
  6. |      Bhagalpur      34146       4030       2914                                  |
  7. |        Bhojpur      30658       9707       5610                                  |
  8. |          Buxar                 11042       7143                   381        310 |
  9. |      Darbhanga      11016                                                        |
 10. | East_Champaran      31357      48925      47935                                  |
 11. |           Gaya      22317                                                        |
 12. |      Gopalganj      10651                                                        |
 13. |          Jamui      12187      16231      16231                  2969       2969 |
 14. |         _Bihar     535688     427820     338565                 14853      12703 |
     +----------------------------------------------------------------------------------+

.

Last edited by Andrew Musau; 05 Oct 2022, 05:46.

Comment

Lorien Nair

Join Date: May 2019
Posts: 115

05 Oct 2022, 06:08

Andrew, the final output that you generated was what I wanted and I had used -strtoname- command initially, but since I need to loop over many excel sheets with different variables I feel like the code that you mentioned cpuld not work when I tried it
i.e.

Code:

 rename (_Bihar -Jamui) value=

This would have to be file specific. I used the code @Øyvind, and generated the following which works perfectly unless there is some dataloss, which I have not found after a quick manual review.:

Code:

local var `" "A & N Islands" "Andhra Pradesh" "Arunachal Pradesh" "Assam" "
foreach x in `var' {

import excel "$rawdata/`x'/L - `x'_2011.xls", sheet("Sheet1") clear
replace C = "distname" in 6
replace B = "2.0" in 6
bysort B (C): keep if C[1] == "district" | C[1] == "totalpublic" | C[1] == "totalpublic_inst" | C[1] == "totalpublic_instcol" | C[1] == "totalpvt" | C[1] == "totalpvt_inst" | C[1] == "totalpvt_instcol"


sxpose, clear firstnames


save `x'. dta, replace

}

Just the save needs to be fixed in the loop given that there are a lot of ampresands and spaces in file names.

Final output:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str16 distname str9 totalpublic str13 totalpublic_inst str17 totalpublic_instcol str9 totalpvt str13 totalpvt_inst str17 totalpvt_instcol  
"nicobar"          ""     "" "" "" "" "" "A & N Islands"
"north and middle" ""     "" "" "" "" "" "A & N Islands"
"south andaman"    "4629" "" "" "" "" "" "A & N Islands"
end

Thank you both!

Last edited by Lorien Nair; 05 Oct 2022, 06:09. Reason: Forgot to add code section

Announcement

Reshape wide to long when there is no ID variable

Comment

Comment

Comment

Comment