Convert Data in Matrix Form to Long Format

Ahmad Mobariz

Join Date: Oct 2019
Posts: 26

Convert Data in Matrix Form to Long Format

31 Oct 2020, 18:49

Hello every one,

I have data in matrix form, where there are several rows that should ideally be independent columns. For example, I have something like this:

			id2	100	101	102	103	104	105	106	107
			name2	AA	BB	CC	DD	EE	FF	GG	HH
			location2	G1	G2	G3	G4	G5	G6	G7	G8
id1	name1	location1
1	A	l1	a1	a2	a3	a4	a5	a6	a7	a8	a9
2	B	l2	b1	b2	b3	b4	b5	b6	b7	b8	b9
3	C	l3	c1	c2	c3	c4	c5	c6	c7	c8	c9
4	D	l4	d1	d2	d3	d4	d5	d6	d7	d8	d9
5	E	l5	e1	e2	e3	e4	e5	e6	e7	e8	e9
6	F	l6	f1	f2	f3	f4	f5	f6	f7	f8	f9
7	G	l7	g1	g2	g3	g4	g5	g6	g7	g8	g9
8	H	l8	h1	h2	h3	h4	h5	h6	h7	h8	h9
9	I	l9	i1	i2	i3	i4	i5	i6	i7	i8	i9
10	J	l10	j1	j2	j3	j4	j5	j6	j7	j8	j9
11	K	l11	k1	k2	k3	k4	k5	k6	k7	k8	k9
12	L	l12	l1	l2	l3	l4	l5	l6	l7	l8	l9
13	M	l13	m1	m2	m3	m4	m5	m6	m7	m8	m9

If I had one row on top, I would ideally reshape it to long using id2 but I need all of the information on top of the matrix. I wonder if somebody could help me figure out this.

Thanks

Last edited by Ahmad Mobariz; 31 Oct 2020, 19:01.

Tags: None

Nick Cox

Join Date: Mar 2014

Posts: 35724
#2

01 Nov 2020, 05:05

Please give a concrete example using dataex as requested in the FAQ Advice. See https://www.statalist.org/forums/help#stata

Abstraction here leaves too much open to guesses about what your data are like -- variable names, storage types and actual values are all a matter for conjecture. .
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

01 Nov 2020, 09:29

Is it perhaps the case that the "data in matrix form" that you show is stored in an Excel worksheet?

If so, you should import the Excel worksheet into Stata, which is perhaps most easily done using the visual interface from the File > Import > Excel Spreadsheet menu selection. Do not use the first row of your data as variable names, let them default to the Excel column names. Then use dataex to present your example data as Nick requests.
1 like
Comment

Ahmad Mobariz

Join Date: Oct 2019
Posts: 26

01 Nov 2020, 21:35

Originally posted by William Lisowski View Post

Is it perhaps the case that the "data in matrix form" that you show is stored in an Excel worksheet?

If so, you should import the Excel worksheet into Stata, which is perhaps most easily done using the visual interface from the File > Import > Excel Spreadsheet menu selection. Do not use the first row of your data as variable names, let them default to the Excel column names. Then use dataex to present your example data as Nick requests.

Sorry, I thought the example I was better. As suggested, I am posting part of my actual data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str9 A str39 B str15(C D) str10 E str14 F str12 G str6 H str9 I str17(J K L M N O)
""          ""              ""             ""             ""           ""         ""             ""       ""        "District"       "Adraskan"          "Adraskan"          "Adraskan"          "Adraskan"          "Adraskan"         
""          ""              ""             ""             ""           ""         ""             ""       ""        "VC"             "1"                 "2"                 "3"                 "4"                 "5"                
""          ""              ""             ""             ""           ""         ""             ""       ""        "Geocode"        "24110038"          "24110114"          "24110031"          "24110069"          "24110065"         
""          ""              ""             ""             ""           ""         ""             ""       ""        "English Name "  "ALI ABAD"          "BANO SHAK"         "BORJ"              "CHAH JAJAT"        "CHAH KAMAR"       
""          ""              ""             ""             ""           ""         ""             ""       ""        "Alternate Name" "0"                 "0"                 "0"                 "0"                 "0"                
"school_id" "village"       "district"     "district_eng" "districtid" "province" "province_eng" "provid" ""        "Dari Name"      "علي آباد"   "بنوشک"        "برج"            "چاه جاجت"   "چاه کمر"    
"201100018" "بافتک"    "ادرسکن" "Adreskan"     "2011"       "هرات" "Hirat"        "6"      "33.5807" "62.80201"       "24.03766811893237" "46.76896936544783" "18.37862965223474" "96.46007945751509" "90.98416061509954"
"201100005" "بر پایه" "ادرسکن" "Adreskan"     "2011"       "هرات" "Hirat"        "6"      "33.5578" "62.691"         "17.17376361987761" "36.49227334902648" "14.48950185609007" "86.76001978755521" "81.65948859552604"
"201100029" "بند لکه" "ادرسکن" "Adreskan"     "2011"       "هرات" "Hirat"        "6"      "33.4684" "62.6295"        "23.44274753571549" "32.75601038432977" "23.85479616122695" "83.7930123881325"  "79.72228159868898"
"201100035" "تخاب"      "ادرسکن" "Adreskan"     "2011"       "هرات" "Hirat"        "6"      "33.6627" "62.89821"       "30.66423568886282" "56.60472394806315" "23.47695300888527" "104.4216158168641" "98.23946207155936"
end

As you can see, variables "District" to "Dari Name" are stacked on each other. I want to reshape so that I maintain all of the stacked information. Among them, VC and Geocode are the most important ones that I need to maintain.

Comment

Nick Cox

Join Date: Mar 2014

Posts: 35724
#5

02 Nov 2020, 02:05

Thanks for posting an example. Sorry: I spent some time trying to understand what is going on here, and failed.
Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

02 Nov 2020, 14:02

Like Nick, I am unable to understand your ultimate goal, and how your spreadsheet information in the first 6 rows of columns K onward relate to the information in columns A-J of rows 7 onward. But in the hope that it can at least start you on your way, here is how I unstacked the information in the first 6 rows.

Code:

keep in 1/6
generate varname = strtoname(trim(J)) in 1/6
forvalues i=1/6 {
    local newvars `newvars' `=varname[`i']'
}
drop A-J
order varname

quietly ds varname, not
rename (`r(varlist)') (value_=)
reshape long value_, i(varname) j(column) string
reshape wide value_, i(column) j(varname) string
rename (value_*) (*)
order column `newvars'

list, clean noobs abbreviate(20)

Code:

. list, clean noobs abbreviate(20)

    column   District   VC    Geocode   English_Name   Alternate_Name   Dari_Name  
         K   Adraskan    1   24110038       ALI ABAD                0    علي آباد  
         L   Adraskan    2   24110114      BANO SHAK                0       بنوشک  
         M   Adraskan    3   24110031           BORJ                0         برج  
         N   Adraskan    4   24110069     CHAH JAJAT                0    چاه جاجت  
         O   Adraskan    5   24110065     CHAH KAMAR                0     چاه کمر

Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

02 Nov 2020, 14:31

Building on the previous example, perhaps this is closer to your desired final result.

Code:

tempfile all
save `all'
// unstack the top 6 rows
keep in 1/6
generate varname = strtoname(trim(J)) in 1/6
local newvars
forvalues i=1/6 {
    local newvars `newvars' `=varname[`i']'
}
drop A-J
order varname

quietly ds varname, not
rename (`r(varlist)') (value_=)
reshape long value_, i(varname) j(column) string
reshape wide value_, i(column) j(varname) string
rename (value_*) (*)
order column `newvars'

tempfile top
save `top'

// get the school_id-specific data
use `all', clear
drop in 1/5
quietly ds A-J, not
rename (`r(varlist)') (value_=)
local newvars
foreach var of varlist A-H {
    local newvars `newvars' `=`var'[1]'
}
rename (A-H) (`newvars')
drop in 1
reshape long value_, i(`newvars') j(column) string
rename value_ Z

// merge with the other data
merge m:1 column using `top', assert(match)
drop _merge
sort school_id column

list in 1/2, abbreviate(20)

Code:

. list in 1/2, abbreviate(20)

     +--------------------------------------------------------------------------------------+
  1. | school_id | village | district | district_eng | districtid | province | province_eng |
     | 201100005 | بر پایه |   ادرسکن |     Adreskan |       2011 |     هرات |        Hirat |
     |--------------------------------------------------------------------------------------|
     | provid | column |       I |      J |                 Z | District  | VC  |  Geocode  |
     |      6 |      K | 33.5578 | 62.691 | 17.17376361987761 | Adraskan  |  1  | 24110038  |
     |--------------------------------------------------------------------------------------|
     |        English_Name        |        Alternate_Name        |        Dari_Name         |
     |            ALI ABAD        |                     0        |         علي آباد         |
     +--------------------------------------------------------------------------------------+

     +--------------------------------------------------------------------------------------+
  2. | school_id | village | district | district_eng | districtid | province | province_eng |
     | 201100005 | بر پایه |   ادرسکن |     Adreskan |       2011 |     هرات |        Hirat |
     |--------------------------------------------------------------------------------------|
     | provid | column |       I |      J |                 Z | District  | VC  |  Geocode  |
     |      6 |      L | 33.5578 | 62.691 | 36.49227334902648 | Adraskan  |  2  | 24110114  |
     |--------------------------------------------------------------------------------------|
     |        English_Name        |        Alternate_Name        |        Dari_Name         |
     |           BANO SHAK        |                     0        |            بنوشک         |
     +--------------------------------------------------------------------------------------+

Comment

Ahmad Mobariz

Join Date: Oct 2019

Posts: 26
#8

03 Nov 2020, 09:30

William Lisowski thanks a lot. This is helpful. The problem is actually a reshape wide to long problem in nature. It is the distance between schools and villages. Information about the schools is given in the on the rows in the left (schoolid - longitude), and the information about the village is provided on the top (stacked information in the rows). I want to transform it into a long format while maintaining all of the information for both schools and villages. Now, the problem is resolved. I did some manual work in the excel file and renamed the variables so that I could reshape the data with respect to each of the variables on top and merge them at the end.

Thank you. Sorry if this problem confused you!
Comment

Announcement

Convert Data in Matrix Form to Long Format

Comment

Comment

Comment

Comment

Comment

Comment

Comment