Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Convert Data in Matrix Form to Long Format

    Hello every one,

    I have data in matrix form, where there are several rows that should ideally be independent columns. For example, I have something like this:
    id2 100 101 102 103 104 105 106 107
    name2 AA BB CC DD EE FF GG HH
    location2 G1 G2 G3 G4 G5 G6 G7 G8
    id1 name1 location1
    1 A l1 a1 a2 a3 a4 a5 a6 a7 a8 a9
    2 B l2 b1 b2 b3 b4 b5 b6 b7 b8 b9
    3 C l3 c1 c2 c3 c4 c5 c6 c7 c8 c9
    4 D l4 d1 d2 d3 d4 d5 d6 d7 d8 d9
    5 E l5 e1 e2 e3 e4 e5 e6 e7 e8 e9
    6 F l6 f1 f2 f3 f4 f5 f6 f7 f8 f9
    7 G l7 g1 g2 g3 g4 g5 g6 g7 g8 g9
    8 H l8 h1 h2 h3 h4 h5 h6 h7 h8 h9
    9 I l9 i1 i2 i3 i4 i5 i6 i7 i8 i9
    10 J l10 j1 j2 j3 j4 j5 j6 j7 j8 j9
    11 K l11 k1 k2 k3 k4 k5 k6 k7 k8 k9
    12 L l12 l1 l2 l3 l4 l5 l6 l7 l8 l9
    13 M l13 m1 m2 m3 m4 m5 m6 m7 m8 m9
    If I had one row on top, I would ideally reshape it to long using id2 but I need all of the information on top of the matrix. I wonder if somebody could help me figure out this.

    Thanks
    Last edited by Ahmad Mobariz; 31 Oct 2020, 19:01.

  • #2
    Please give a concrete example using dataex as requested in the FAQ Advice. See https://www.statalist.org/forums/help#stata

    Abstraction here leaves too much open to guesses about what your data are like -- variable names, storage types and actual values are all a matter for conjecture. .

    Comment


    • #3
      Is it perhaps the case that the "data in matrix form" that you show is stored in an Excel worksheet?

      If so, you should import the Excel worksheet into Stata, which is perhaps most easily done using the visual interface from the File > Import > Excel Spreadsheet menu selection. Do not use the first row of your data as variable names, let them default to the Excel column names. Then use dataex to present your example data as Nick requests.

      Comment


      • #4
        Originally posted by William Lisowski View Post
        Is it perhaps the case that the "data in matrix form" that you show is stored in an Excel worksheet?

        If so, you should import the Excel worksheet into Stata, which is perhaps most easily done using the visual interface from the File > Import > Excel Spreadsheet menu selection. Do not use the first row of your data as variable names, let them default to the Excel column names. Then use dataex to present your example data as Nick requests.
        Sorry, I thought the example I was better. As suggested, I am posting part of my actual data:


        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str9 A str39 B str15(C D) str10 E str14 F str12 G str6 H str9 I str17(J K L M N O)
        ""          ""              ""             ""             ""           ""         ""             ""       ""        "District"       "Adraskan"          "Adraskan"          "Adraskan"          "Adraskan"          "Adraskan"         
        ""          ""              ""             ""             ""           ""         ""             ""       ""        "VC"             "1"                 "2"                 "3"                 "4"                 "5"                
        ""          ""              ""             ""             ""           ""         ""             ""       ""        "Geocode"        "24110038"          "24110114"          "24110031"          "24110069"          "24110065"         
        ""          ""              ""             ""             ""           ""         ""             ""       ""        "English Name "  "ALI ABAD"          "BANO SHAK"         "BORJ"              "CHAH JAJAT"        "CHAH KAMAR"       
        ""          ""              ""             ""             ""           ""         ""             ""       ""        "Alternate Name" "0"                 "0"                 "0"                 "0"                 "0"                
        "school_id" "village"       "district"     "district_eng" "districtid" "province" "province_eng" "provid" ""        "Dari Name"      "علي آباد"   "بنوشک"        "برج"            "چاه جاجت"   "چاه کمر"    
        "201100018" "بافتک"    "ادرسکن" "Adreskan"     "2011"       "هرات" "Hirat"        "6"      "33.5807" "62.80201"       "24.03766811893237" "46.76896936544783" "18.37862965223474" "96.46007945751509" "90.98416061509954"
        "201100005" "بر پایه" "ادرسکن" "Adreskan"     "2011"       "هرات" "Hirat"        "6"      "33.5578" "62.691"         "17.17376361987761" "36.49227334902648" "14.48950185609007" "86.76001978755521" "81.65948859552604"
        "201100029" "بند لکه" "ادرسکن" "Adreskan"     "2011"       "هرات" "Hirat"        "6"      "33.4684" "62.6295"        "23.44274753571549" "32.75601038432977" "23.85479616122695" "83.7930123881325"  "79.72228159868898"
        "201100035" "تخاب"      "ادرسکن" "Adreskan"     "2011"       "هرات" "Hirat"        "6"      "33.6627" "62.89821"       "30.66423568886282" "56.60472394806315" "23.47695300888527" "104.4216158168641" "98.23946207155936"
        end
        As you can see, variables "District" to "Dari Name" are stacked on each other. I want to reshape so that I maintain all of the stacked information. Among them, VC and Geocode are the most important ones that I need to maintain.

        Comment


        • #5
          Thanks for posting an example. Sorry: I spent some time trying to understand what is going on here, and failed.

          Comment


          • #6
            Like Nick, I am unable to understand your ultimate goal, and how your spreadsheet information in the first 6 rows of columns K onward relate to the information in columns A-J of rows 7 onward. But in the hope that it can at least start you on your way, here is how I unstacked the information in the first 6 rows.
            Code:
            keep in 1/6
            generate varname = strtoname(trim(J)) in 1/6
            forvalues i=1/6 {
                local newvars `newvars' `=varname[`i']'
            }
            drop A-J
            order varname
            
            quietly ds varname, not
            rename (`r(varlist)') (value_=)
            reshape long value_, i(varname) j(column) string
            reshape wide value_, i(column) j(varname) string
            rename (value_*) (*)
            order column `newvars'
            
            list, clean noobs abbreviate(20)
            Code:
            . list, clean noobs abbreviate(20)
            
                column   District   VC    Geocode   English_Name   Alternate_Name   Dari_Name  
                     K   Adraskan    1   24110038       ALI ABAD                0    علي آباد  
                     L   Adraskan    2   24110114      BANO SHAK                0       بنوشک  
                     M   Adraskan    3   24110031           BORJ                0         برج  
                     N   Adraskan    4   24110069     CHAH JAJAT                0    چاه جاجت  
                     O   Adraskan    5   24110065     CHAH KAMAR                0     چاه کمر

            Comment


            • #7
              Building on the previous example, perhaps this is closer to your desired final result.
              Code:
              tempfile all
              save `all'
              // unstack the top 6 rows
              keep in 1/6
              generate varname = strtoname(trim(J)) in 1/6
              local newvars
              forvalues i=1/6 {
                  local newvars `newvars' `=varname[`i']'
              }
              drop A-J
              order varname
              
              quietly ds varname, not
              rename (`r(varlist)') (value_=)
              reshape long value_, i(varname) j(column) string
              reshape wide value_, i(column) j(varname) string
              rename (value_*) (*)
              order column `newvars'
              
              tempfile top
              save `top'
              
              // get the school_id-specific data
              use `all', clear
              drop in 1/5
              quietly ds A-J, not
              rename (`r(varlist)') (value_=)
              local newvars
              foreach var of varlist A-H {
                  local newvars `newvars' `=`var'[1]'
              }
              rename (A-H) (`newvars')
              drop in 1
              reshape long value_, i(`newvars') j(column) string
              rename value_ Z
              
              // merge with the other data
              merge m:1 column using `top', assert(match)
              drop _merge
              sort school_id column
              
              list in 1/2, abbreviate(20)
              Code:
              . list in 1/2, abbreviate(20)
              
                   +--------------------------------------------------------------------------------------+
                1. | school_id | village | district | district_eng | districtid | province | province_eng |
                   | 201100005 | بر پایه |   ادرسکن |     Adreskan |       2011 |     هرات |        Hirat |
                   |--------------------------------------------------------------------------------------|
                   | provid | column |       I |      J |                 Z | District  | VC  |  Geocode  |
                   |      6 |      K | 33.5578 | 62.691 | 17.17376361987761 | Adraskan  |  1  | 24110038  |
                   |--------------------------------------------------------------------------------------|
                   |        English_Name        |        Alternate_Name        |        Dari_Name         |
                   |            ALI ABAD        |                     0        |         علي آباد         |
                   +--------------------------------------------------------------------------------------+
              
                   +--------------------------------------------------------------------------------------+
                2. | school_id | village | district | district_eng | districtid | province | province_eng |
                   | 201100005 | بر پایه |   ادرسکن |     Adreskan |       2011 |     هرات |        Hirat |
                   |--------------------------------------------------------------------------------------|
                   | provid | column |       I |      J |                 Z | District  | VC  |  Geocode  |
                   |      6 |      L | 33.5578 | 62.691 | 36.49227334902648 | Adraskan  |  2  | 24110114  |
                   |--------------------------------------------------------------------------------------|
                   |        English_Name        |        Alternate_Name        |        Dari_Name         |
                   |           BANO SHAK        |                     0        |            بنوشک         |
                   +--------------------------------------------------------------------------------------+

              Comment


              • #8
                William Lisowski thanks a lot. This is helpful. The problem is actually a reshape wide to long problem in nature. It is the distance between schools and villages. Information about the schools is given in the on the rows in the left (schoolid - longitude), and the information about the village is provided on the top (stacked information in the rows). I want to transform it into a long format while maintaining all of the information for both schools and villages. Now, the problem is resolved. I did some manual work in the excel file and renamed the variables so that I could reshape the data with respect to each of the variables on top and merge them at the end.

                Thank you. Sorry if this problem confused you!

                Comment

                Working...
                X