Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating an adjacency matrix from two columns

    Dear Statalists,

    I have 2 columns (id and course)

    I would like to create two types of adjacency matrices look like these:

    1) Column headings for courses and row headings for id. The values inside the matrix equal 1 if the id attended the course, zero otherwise.
    341217 540103 340769 ......... 1199653 ........
    1977 1 1 1 .......... 0 ........
    243Q 0 0 0 .......... 1 ........
    ... ..... .... ... ... ... .........
    579Y 0 0 0 ....... 0 .......
    2) Both column headings and row headings for the ids, the values inside the matrix equals 1 if two ids attended the same course, zero otherwise. The diagonal equals zero (do not take into account the id itself).
    1977 243Q 3833 ......... 1199653 ........
    1977 0 0 0 .......... 0 ........
    243Q 0 0 0 .......... 1 ........
    ... ..... .... ... ... ... .........
    579Y 0 1 0 ....... 0 .......
    Can anyone help me with this, please?

    I really appreciate your help.

    Kind regards,
    Ken


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str4 id str7 course
    "1977" "341217" 
    "1977" "540103" 
    "1977" "340769" 
    "1977" "341355" 
    "1977" "341168" 
    "1977" "341583" 
    "1977" "341582" 
    "1977" "341374" 
    "1977" "341423" 
    "1977" "341118" 
    "1977" "341132" 
    "1977" "495892" 
    "1977" "341153" 
    "1977" "505279" 
    "1977" "341386" 
    "1977" "340825" 
    "1977" "340871" 
    "1977" "340791" 
    "1977" "340842" 
    "1977" "341438" 
    "1977" "59796"  
    "1977" "1617025"
    "243Q" "1199653"
    "243Q" "341811" 
    "243Q" "485459" 
    "243Q" "446218" 
    "243Q" "341598" 
    "243Q" "141620" 
    "243Q" "36503"  
    "243Q" "441141" 
    "243Q" "1134308"
    "243Q" "508730" 
    "243Q" "341830" 
    "243Q" "341835" 
    "243Q" "340891" 
    "243Q" "341846" 
    "243Q" "341863" 
    "3833" "343997" 
    "3833" "344018" 
    "235B" "343977" 
    "235B" "343997" 
    "235B" "344018" 
    "3833" "343982" 
    "235B" "343982" 
    "235B" "344099" 
    "3833" "344099" 
    "235B" "343983" 
    "3833" "343983" 
    "3833" "343977" 
    "1314" "335777" 
    "1314" "335735" 
    "1314" "335863" 
    "1314" "335561" 
    "1314" "1307343"
    "1314" "335960" 
    "1314" "1703887"
    "1314" "1328198"
    "1314" "335579" 
    "1314" "512602" 
    "1314" "335381" 
    "1314" "1327143"
    "1314" "1250954"
    "1314" "1617193"
    "1314" "375960" 
    "1314" "1672088"
    "1314" "1401213"
    "1314" "1207439"
    "1314" "335413" 
    "1314" "342457" 
    "1314" "377905" 
    "1314" "985031" 
    "1314" "1128721"
    "1314" "1250769"
    "1314" "335716" 
    "1314" "335849" 
    "1314" "335759" 
    "1314" "336031" 
    "1314" "1602281"
    "1314" "335949" 
    "1314" "336009" 
    "1314" "205455" 
    "1314" "335877" 
    "5535" "1054442"
    "5535" "886538" 
    "5535" "346818" 
    "5535" "1325730"
    "5535" "1461733"
    "5535" "1597384"
    "5535" "532609" 
    "5535" "1321045"
    "5535" "627086" 
    "5535" "1600118"
    "5535" "1596727"
    "5535" "1049576"
    "5535" "1367512"
    "5535" "1363494"
    "5535" "499359" 
    "5535" "1809673"
    "5535" "1538194"
    "5535" "455960" 
    "5535" "62737"  
    "5535" "1763041"
    "5535" "1800512"
    "5535" "1095323"
    "5535" "347338" 
    "5535" "1711058"
    "5535" "1313929"
    "5535" "1740625"
    "5535" "320934" 
    "5535" "343699" 
    "5535" "596870" 
    "5535" "1772171"
    "5535" "347340" 
    "5535" "837095" 
    "5535" "653402" 
    "5535" "372351" 
    "5535" "497183" 
    "5535" "325850" 
    "5535" "87682"  
    "5535" "1785362"
    "5535" "499576" 
    "5535" "83621"  
    "5535" "320920" 
    "5535" "455796" 
    "5535" "1203728"
    "5535" "1078585"
    "5535" "1858272"
    "5535" "1274835"
    "5535" "1656747"
    "5535" "499571" 
    "5535" "87647"  
    "5535" "545665" 
    "5535" "803480" 
    "5535" "321217" 
    "5535" "541624" 
    "5535" "831377" 
    "5535" "1801001"
    "5535" "1785120"
    "5535" "1380041"
    "5535" "87637"  
    "5535" "555878" 
    "5535" "55417"  
    "5535" "320933" 
    "5535" "1072273"
    "5535" "1517333"
    "5535" "498031" 
    "5535" "1795554"
    "5535" "1664080"
    "5535" "340013" 
    "5535" "1716233"
    "5535" "1879911"
    "5535" "221228" 
    "5535" "453830" 
    "5535" "548662" 
    "5535" "1890026"
    "5535" "510707" 
    "5535" "1089194"
    "5535" "272463" 
    "5535" "596233" 
    "5535" "1836733"
    "5535" "553759" 
    "5535" "1591164"
    "5535" "1371168"
    "5535" "206410" 
    "5535" "1072590"
    "5535" "501271" 
    "5535" "1335306"
    "554D" "373024" 
    "554D" "345898" 
    "554D" "499721" 
    "554D" "346598" 
    "554D" "346636" 
    "554D" "346690" 
    "554D" "342417" 
    "554D" "607964" 
    "554D" "517027" 
    "554D" "346683" 
    "554D" "346629" 
    "554D" "346615" 
    "554D" "346686" 
    "554D" "607961" 
    "554D" "345170" 
    "554D" "346569" 
    "554D" "457918" 
    "554D" "509120" 
    "579Y" "1669556"
    "579Y" "889136" 
    "579Y" "1404877"
    "579Y" "1487736"
    "579Y" "534434" 
    "579Y" "1831258"
    "579Y" "1523249"
    "579Y" "35454"  
    "579Y" "33410"  
    "579Y" "1251519"
    "579Y" "35458"  
    "579Y" "1826538"
    "579Y" "1597413"
    "579Y" "1091764"
    "579Y" "33354"  
    end

  • #2
    It doesn't seem like you want to use Mata.
    And then this is not the right forum.
    A Stata solution to get both matrices could be:
    Code:
    tempfile data
    save `data'
    
    generate n = 1
    collapse (sum) n, by(id course)
    reshape wide n, i(id) j(course) string
    mvencode *, mv(0)
    mkmat n*, matrix(mat1) rownames(id)
    matrix colnames mat1 = `=subinstr(`"`:rownames mat1'"', "n", "", .)'
    
    use `data', clear
    rename id id1
    merge m:m course using `data', nogenerate
    drop course
    duplicates drop
    generate n = 1
    reshape wide n, i(id1) j(id) string
    mvencode *, mv(0)
    mkmat n*, matrix(mat2) rownames(id1)
    matrix colnames mat2 = `=subinstr(`"`:rownames mat2'"', "n", "", .)'
    Kind regards

    nhb

    Comment


    • #3
      Hi Niels Henrik Bruun ,

      Thank you so much for your help. I read some suggestions and got confused if I need to use mata, that is why I asked my questions here.

      One more issue, I have quite big data, with 2,500 ids, and each id attended many courses. Stata could not create the matrices. Stata has not enough space for the matrices.

      Can you help me again with this issue, please?

      Thank you so much.
      Ken

      Comment


      • #4
        You might be able to set matrix size higher. Or use a larger version of Stata.
        In the end I would recommend saving as datasets instead of matrices.

        Regards nhb
        Kind regards

        nhb

        Comment


        • #5
          Thank you very much Niels Henrik Bruun. I really appreciate your help.

          Comment


          • #6
            Niels Henrik Bruun, I have a similar question to the original post. I have tried to work through your code with my data, but am coming up with issues and am struggling a bit.

            I am using Stata 15 and have 432 observations. I have a column of unique actor ID's and columns of 300+ forums, 0 if actor did not attend the forum and a 1 if the actor attended a forum. What I need to do is create a weighted adjacency matrix, an actor by actor matrix with cells populated by the number of forums actors coparticipate in. If you (or anyone) could help, I would be much indebted. Here is a sample of my data.

            Code:
            * Example generated by -dataex-. To install: ssc install dataex clear input int Actorid byte(Forum2 Forum4 Forum6 Forum9 Forum17 Forum18 Forum19 Forum21) 1 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 8 0 0 0 0 0 0 0 0 18 0 0 0 0 0 0 0 0 23 0 1 0 0 0 0 0 0 26 0 0 0 0 0 0 0 0 29 0 0 0 0 0 0 0 0 36 0 0 0 0 0 0 0 0 38 0 0 0 0 0 0 0 0 42 0 0 0 0 0 0 0 0 46 0 0 0 0 0 0 0 0 60 0 0 0 1 0 0 0 0 65 0 0 0 0 0 0 0 0 71 0 1 0 0 0 0 0 0 79 0 0 0 0 0 0 0 0 end
            ------------------ copy up to and including the previous line ------------------

            Listed 15 out of 442 observations

            Thanks!

            Comment


            • #7
              Not sure why the dataex displayed like it did above, please see below:

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input int Actorid byte(Forum2 Forum4 Forum6 Forum9 Forum17 Forum18 Forum19 Forum21)
               1 0 0 0 0 0 0 0 0
               3 0 0 0 0 0 0 0 0
               8 0 0 0 0 0 0 0 0
              18 0 0 0 0 0 0 0 0
              23 0 1 0 0 0 0 0 0
              26 0 0 0 0 0 0 0 0
              29 0 0 0 0 0 0 0 0
              36 0 0 0 0 0 0 0 0
              38 0 0 0 0 0 0 0 0
              42 0 0 0 0 0 0 0 0
              46 0 0 0 0 0 0 0 0
              60 0 0 0 1 0 0 0 0
              65 0 0 0 0 0 0 0 0
              71 0 1 0 0 0 0 0 0
              79 0 0 0 0 0 0 0 0
              end
              ------------------ copy up to and including the previous line ------------------

              Listed 15 out of 442 observations

              Comment


              • #8
                Dear Aria
                You should start a new post to get attention from others than me.
                However, try this before you use the the code in #2:

                Code:
                reshape long Forum, i(Actorid) j(forum)
                keep if Forum
                drop Forum
                Then I think you data have the proper format
                Kind regards

                nhb

                Comment


                • #9
                  Hi Niels,

                  Thank you. I am new to this forum and will start a new post. My data was originally long, I transformed to wide to make an adjacency matrix (thought I needed to) and am still coming across errors. I'll try to mess around a little more. Thank you.

                  Comment


                  • #10
                    This is the code I am running (note a conformability error). What I would like to create is an actor by actor adjacency matrix where the number of forums actors coparticipate in populates the cells. I'm sorry for the novice question, I have been working at this for a few weeks.

                    clear
                    set more off

                    use "/Users//Desktop/Actor-Forum.dta"

                    tempfile data
                    save `data'

                    mvencode *, mv(0)
                    mkmat Forum*, matrix(mat1) rownames(Actorid)
                    matrix colnames mat1 = `=subinstr(`"`:rownames mat1'"', "n", "", .)' ****CONFORMABILITY ERROR

                    use `data', clear
                    rename popid popid1
                    merge m:m forumid using `data', nogenerate
                    drop forumid
                    duplicates drop
                    mvencode *, mv(0)
                    mkmat n*, matrix(mat2) rownames(popid1)
                    matrix colnames mat2 = `=subinstr(`"`:rownames mat2'"', "n", "", .)'

                    Comment

                    Working...
                    X