
  • runby: requirements for group variable?

    ** This post references a previous forum discussion (https://www.statalist.org/forums/node/1484556) but is posted separately because it seems to deal with a sufficiently separate sub-issue. Please advise or link if the post should be continued under the previous thread.

    I have been working with data using code that includes the very handy 'runby' module, which runs a program separately on each by-group of the data. In my case, the code (written by Clyde Schechter and available here) performs a matrix multiplication for each of a set of groups. The code works as expected for some of the data, but there are many errors. The errors appear to arise when runby processes certain values of the group variable. For the data below, for example, the matrix multiplication is performed for three of the four groups (1714081, 1714201 and 1714351) but *not* for group 1714251, as indicated by the runby report below the data.
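For readers unfamiliar with runby: as it is used in this thread, it runs a user-written program on the observations of each by-group in turn and then combines whatever each run leaves in memory. A minimal sketch, assuming runby has been installed from SSC (`ssc install runby`); the data and the program name `count_ones` are made up for illustration:

```stata
* Made-up data for illustration only: two groups, one 0/1 variable
clear
input long groupid byte v1
1 1
1 0
2 1
2 1
end

capture program drop count_ones
program count_ones
    // hypothetical program: runby hands it one group at a time,
    // and it replaces that group's data with a single summary row
    collapse (sum) v1, by(groupid)
end

runby count_ones, by(groupid)
list
```

The combined result has one row per group (v1 = 1 for group 1, v1 = 2 for group 2), which is the same mechanism the programs later in this thread rely on.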

    The problem seems to be with specific values of the group variable (i.e., the observations with groupid 1714251). However, I examined the data type for these observations and it matches that of the other observations. I also examined the other variables for these 'errant' observations, and their structure seems no different from that of the other observations. Nor is the problem solved by simply changing the group id values to 001, 002, 003, 004. Whatever the problem, it is systematic: it occurs for 1,096 of the 1,136 groups in my data. I'd appreciate any help.

    Thanks.

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long groupid byte(v1 v2 v3 v4 v5 v6 v7 v8 v9 v10)
    1714081 0 0 0 1 0 0 1 0 0 0
    1714081 1 1 1 1 1 1 1 0 1 0
    1714081 0 0 0 0 0 0 0 0 0 0
    1714201 1 1 1 1 0 0 1 0 1 1
    1714201 0 0 0 0 0 0 0 0 0 0
    1714201 0 0 0 0 0 0 0 0 0 0
    1714201 1 1 1 1 1 1 1 0 0 1
    1714201 1 1 1 1 1 1 1 0 1 1
    1714201 1 1 1 1 1 1 1 0 1 1
    1714251 0 0 1 1 1 0 1 0 1 0
    1714251 1 1 1 1 0 1 1 0 0 1
    1714251 1 1 1 1 1 0 1 0 1 1
    1714251 1 1 1 1 0 0 1 0 1 0
    1714251 1 1 1 1 1 0 1 0 0 0
    1714251 1 1 1 1 1 1 1 0 0 1
    1714251 0 0 0 1 0 0 0 0 0 0
    1714251 0 0 0 0 0 0 1 0 0 0
    1714251 1 0 1 1 1 1 1 0 0 1
    1714251 1 1 1 1 1 0 1 0 1 0
    1714251 1 1 1 1 1 1 1 0 1 0
    1714351 0 1 1 0 0 0 0 0 0 0
    1714351 0 0 0 0 0 0 0 0 0 0
    1714351 1 1 1 1 1 0 1 0 0 1
    1714351 1 1 1 1 0 0 0 0 0 0
    1714351 1 0 0 1 0 0 0 0 0 1
    1714351 1 1 1 1 1 0 1 0 0 1
    1714351 0 1 0 1 0 0 0 0 1 0
    1714351 1 1 1 1 1 0 1 1 0 1
    1714351 0 1 1 0 0 0 1 0 0 0
    1714351 1 1 1 1 0 0 1 0 1 1
    end

    Code:

    . runby one_group, by(groupid)

    --------------------------------------
    Number of by-groups = 4
    by-groups with errors = 1
    by-groups with no data = 0
    Observations processed = 30
    Observations saved = 19
    --------------------------------------

  • #2
    runby is from SSC and is authored by Robert Picard and Clyde Schechter. Clyde may take a look at why runby is outputting an error message here, but in the meantime, I would pursue the following workaround.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long groupid byte(v1 v2 v3 v4 v5 v6 v7 v8 v9 v10)
    1714081 0 0 0 1 0 0 1 0 0 0
    1714081 1 1 1 1 1 1 1 0 1 0
    1714081 0 0 0 0 0 0 0 0 0 0
    1714201 1 1 1 1 0 0 1 0 1 1
    1714201 0 0 0 0 0 0 0 0 0 0
    1714201 0 0 0 0 0 0 0 0 0 0
    1714201 1 1 1 1 1 1 1 0 0 1
    1714201 1 1 1 1 1 1 1 0 1 1
    1714201 1 1 1 1 1 1 1 0 1 1
    1714251 0 0 1 1 1 0 1 0 1 0
    1714251 1 1 1 1 0 1 1 0 0 1
    1714251 1 1 1 1 1 0 1 0 1 1
    1714251 1 1 1 1 0 0 1 0 1 0
    1714251 1 1 1 1 1 0 1 0 0 0
    1714251 1 1 1 1 1 1 1 0 0 1
    1714251 0 0 0 1 0 0 0 0 0 0
    1714251 0 0 0 0 0 0 1 0 0 0
    1714251 1 0 1 1 1 1 1 0 0 1
    1714251 1 1 1 1 1 0 1 0 1 0
    1714251 1 1 1 1 1 1 1 0 1 0
    1714351 0 1 1 0 0 0 0 0 0 0
    1714351 0 0 0 0 0 0 0 0 0 0
    1714351 1 1 1 1 1 0 1 0 0 1
    1714351 1 1 1 1 0 0 0 0 0 0
    1714351 1 0 0 1 0 0 0 0 0 1
    1714351 1 1 1 1 1 0 1 0 0 1
    1714351 0 1 0 1 0 0 0 0 1 0
    1714351 1 1 1 1 1 0 1 1 0 1
    1714351 0 1 1 0 0 0 1 0 0 0
    1714351 1 1 1 1 0 0 1 0 1 1
    end
    
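    tempfile data
    save `data'
    capture matrix drop Stack              // start with an empty stack
    levelsof groupid, local(gid)
    foreach g of local gid {
        keep if groupid == `g'
        mkmat v*, matrix(M`g')             // one row per observation in the group
        matrix MM`g' = M`g''*M`g'          // cross-product: one row per v variable
        matrix rownames MM`g' = "MM`g'"    // label the rows with the group id
        matrix Stack = nullmat(Stack) \ MM`g'
        use `data', clear                  // reload the full data for the next group
    }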

    Result:

    Code:
    . mat list Stack
    
    Stack[40,10]
                v1   v2   v3   v4   v5   v6   v7   v8   v9  v10
    MM1714081    1    1    1    1    1    1    1    0    1    0
    MM1714081    1    1    1    1    1    1    1    0    1    0
    MM1714081    1    1    1    1    1    1    1    0    1    0
    MM1714081    1    1    1    2    1    1    2    0    1    0
    MM1714081    1    1    1    1    1    1    1    0    1    0
    MM1714081    1    1    1    1    1    1    1    0    1    0
    MM1714081    1    1    1    2    1    1    2    0    1    0
    MM1714081    0    0    0    0    0    0    0    0    0    0
    MM1714081    1    1    1    1    1    1    1    0    1    0
    MM1714081    0    0    0    0    0    0    0    0    0    0
    MM1714201    4    4    4    4    3    3    4    0    3    4
    MM1714201    4    4    4    4    3    3    4    0    3    4
    MM1714201    4    4    4    4    3    3    4    0    3    4
    MM1714201    4    4    4    4    3    3    4    0    3    4
    MM1714201    3    3    3    3    3    3    3    0    2    3
    MM1714201    3    3    3    3    3    3    3    0    2    3
    MM1714201    4    4    4    4    3    3    4    0    3    4
    MM1714201    0    0    0    0    0    0    0    0    0    0
    MM1714201    3    3    3    3    2    2    3    0    3    3
    MM1714201    4    4    4    4    3    3    4    0    3    4
    MM1714251    8    7    8    8    6    4    8    0    4    4
    MM1714251    7    7    7    7    5    3    7    0    4    3
    MM1714251    8    7    9    9    7    4    9    0    5    4
    MM1714251    8    7    9   10    7    4    9    0    5    4
    MM1714251    6    5    7    7    7    3    7    0    4    3
    MM1714251    4    3    4    4    3    4    4    0    1    3
    MM1714251    8    7    9    9    7    4   10    0    5    4
    MM1714251    0    0    0    0    0    0    0    0    0    0
    MM1714251    4    4    5    5    4    1    5    0    5    1
    MM1714251    4    3    4    4    3    3    4    0    1    4
    MM1714351    6    5    5    6    3    0    4    1    1    5
    MM1714351    5    8    7    6    3    0    5    1    2    4
    MM1714351    5    7    7    5    3    0    5    1    1    4
    MM1714351    6    6    5    7    3    0    4    1    2    5
    MM1714351    3    3    3    3    3    0    3    1    0    3
    MM1714351    0    0    0    0    0    0    0    0    0    0
    MM1714351    4    5    5    4    3    0    5    1    1    4
    MM1714351    1    1    1    1    1    0    1    1    0    1
    MM1714351    1    2    1    2    0    0    1    0    2    1
    MM1714351    5    4    4    5    3    0    4    1    1    5



    • #3
      You can use the verbose option with runby to help diagnose where the error comes from. I think that there's an error in Clyde's code (from this post). The following line:
      Code:
      local nrows: rowsof M`group'
      should be
      Code:
      local nrows: rowsof MM`group'
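Why the one-letter change matters: M`group' has one row per observation in the group, whereas MM`group' = M`group''*M`group' has one row per v variable (10 here), so the two row counts agree only when a group happens to have exactly 10 observations. In the posted data the groups have 3, 6, 11 and 10 observations; the failing group, 1714251, is the only one with more than 10, which fits runby reporting 19 of 30 observations saved (3 + 6 + 10). Clyde's original program is not reproduced in this thread, so exactly how `nrows` is used downstream is an assumption here. A toy illustration with a made-up 3 x 2 matrix:

```stata
* Made-up matrix standing in for one group's data: 3 observations on 2 variables
matrix M = (1, 0 \ 1, 1 \ 0, 1)
matrix MM = M' * M     // cross-product: one row (and column) per variable
display rowsof(M)      // 3: one row per observation
display rowsof(MM)     // 2: one row per variable
matrix list MM
```

Any code that takes its row count from M but then works on the rows of MM will only behave correctly when the two counts coincide.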
      I'm not sure why there was so much emphasis on row labels, or why the results had to be accumulated in a matrix in the first place. You can do the same thing, replacing the data in memory with the results, using:

      Code:
      clear all
      
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long groupid byte(v1 v2 v3 v4 v5 v6 v7 v8 v9 v10)
      1714081 0 0 0 1 0 0 1 0 0 0
      1714081 1 1 1 1 1 1 1 0 1 0
      1714081 0 0 0 0 0 0 0 0 0 0
      1714201 1 1 1 1 0 0 1 0 1 1
      1714201 0 0 0 0 0 0 0 0 0 0
      1714201 0 0 0 0 0 0 0 0 0 0
      1714201 1 1 1 1 1 1 1 0 0 1
      1714201 1 1 1 1 1 1 1 0 1 1
      1714201 1 1 1 1 1 1 1 0 1 1
      1714251 0 0 1 1 1 0 1 0 1 0
      1714251 1 1 1 1 0 1 1 0 0 1
      1714251 1 1 1 1 1 0 1 0 1 1
      1714251 1 1 1 1 0 0 1 0 1 0
      1714251 1 1 1 1 1 0 1 0 0 0
      1714251 1 1 1 1 1 1 1 0 0 1
      1714251 0 0 0 1 0 0 0 0 0 0
      1714251 0 0 0 0 0 0 1 0 0 0
      1714251 1 0 1 1 1 1 1 0 0 1
      1714251 1 1 1 1 1 0 1 0 1 0
      1714251 1 1 1 1 1 1 1 0 1 0
      1714351 0 1 1 0 0 0 0 0 0 0
      1714351 0 0 0 0 0 0 0 0 0 0
      1714351 1 1 1 1 1 0 1 0 0 1
      1714351 1 1 1 1 0 0 0 0 0 0
      1714351 1 0 0 1 0 0 0 0 0 1
      1714351 1 1 1 1 1 0 1 0 0 1
      1714351 0 1 0 1 0 0 0 0 1 0
      1714351 1 1 1 1 1 0 1 1 0 1
      1714351 0 1 1 0 0 0 1 0 0 0
      1714351 1 1 1 1 0 0 1 0 1 1
      end
      
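      program one_group
          local this_group = groupid[1]    // id of the current by-group
          mkmat v*, matrix(X)              // one row per observation
          matrix XpX = X' * X              // 10 x 10 cross-product
          drop _all
          svmat XpX                        // variables XpX1-XpX10, one obs per row of XpX
          gen gname = "g`this_group'"      // string label identifying the group
      end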
      
      runby one_group, by(groupid) verbose
      and the results:
      Code:
      . list, sepby(gname)
      
           +---------------------------------------------------------------------------------+
           | XpX1   XpX2   XpX3   XpX4   XpX5   XpX6   XpX7   XpX8   XpX9   XpX10      gname |
           |---------------------------------------------------------------------------------|
        1. |    1      1      1      1      1      1      1      0      1       0   g1714081 |
        2. |    1      1      1      1      1      1      1      0      1       0   g1714081 |
        3. |    1      1      1      1      1      1      1      0      1       0   g1714081 |
        4. |    1      1      1      2      1      1      2      0      1       0   g1714081 |
        5. |    1      1      1      1      1      1      1      0      1       0   g1714081 |
        6. |    1      1      1      1      1      1      1      0      1       0   g1714081 |
        7. |    1      1      1      2      1      1      2      0      1       0   g1714081 |
        8. |    0      0      0      0      0      0      0      0      0       0   g1714081 |
        9. |    1      1      1      1      1      1      1      0      1       0   g1714081 |
       10. |    0      0      0      0      0      0      0      0      0       0   g1714081 |
           |---------------------------------------------------------------------------------|
       11. |    4      4      4      4      3      3      4      0      3       4   g1714201 |
       12. |    4      4      4      4      3      3      4      0      3       4   g1714201 |
       13. |    4      4      4      4      3      3      4      0      3       4   g1714201 |
       14. |    4      4      4      4      3      3      4      0      3       4   g1714201 |
       15. |    3      3      3      3      3      3      3      0      2       3   g1714201 |
       16. |    3      3      3      3      3      3      3      0      2       3   g1714201 |
       17. |    4      4      4      4      3      3      4      0      3       4   g1714201 |
       18. |    0      0      0      0      0      0      0      0      0       0   g1714201 |
       19. |    3      3      3      3      2      2      3      0      3       3   g1714201 |
       20. |    4      4      4      4      3      3      4      0      3       4   g1714201 |
           |---------------------------------------------------------------------------------|
       21. |    8      7      8      8      6      4      8      0      4       4   g1714251 |
       22. |    7      7      7      7      5      3      7      0      4       3   g1714251 |
       23. |    8      7      9      9      7      4      9      0      5       4   g1714251 |
       24. |    8      7      9     10      7      4      9      0      5       4   g1714251 |
       25. |    6      5      7      7      7      3      7      0      4       3   g1714251 |
       26. |    4      3      4      4      3      4      4      0      1       3   g1714251 |
       27. |    8      7      9      9      7      4     10      0      5       4   g1714251 |
       28. |    0      0      0      0      0      0      0      0      0       0   g1714251 |
       29. |    4      4      5      5      4      1      5      0      5       1   g1714251 |
       30. |    4      3      4      4      3      3      4      0      1       4   g1714251 |
           |---------------------------------------------------------------------------------|
       31. |    6      5      5      6      3      0      4      1      1       5   g1714351 |
       32. |    5      8      7      6      3      0      5      1      2       4   g1714351 |
       33. |    5      7      7      5      3      0      5      1      1       4   g1714351 |
       34. |    6      6      5      7      3      0      4      1      2       5   g1714351 |
       35. |    3      3      3      3      3      0      3      1      0       3   g1714351 |
       36. |    0      0      0      0      0      0      0      0      0       0   g1714351 |
       37. |    4      5      5      4      3      0      5      1      1       4   g1714351 |
       38. |    1      1      1      1      1      0      1      1      0       1   g1714351 |
       39. |    1      2      1      2      0      0      1      0      2       1   g1714351 |
       40. |    5      4      4      5      3      0      4      1      1       5   g1714351 |
           +---------------------------------------------------------------------------------+
      
      .
      From there, you can move the results to a matrix using:
      Code:
      . mkmat XpX*, matrix(wanted) rownames(gname)
      
      . matrix list wanted
      
      wanted[40,10]
                 XpX1   XpX2   XpX3   XpX4   XpX5   XpX6   XpX7   XpX8   XpX9  XpX10
      g1714081      1      1      1      1      1      1      1      0      1      0
      g1714081      1      1      1      1      1      1      1      0      1      0
      g1714081      1      1      1      1      1      1      1      0      1      0
      g1714081      1      1      1      2      1      1      2      0      1      0
      g1714081      1      1      1      1      1      1      1      0      1      0
      g1714081      1      1      1      1      1      1      1      0      1      0
      g1714081      1      1      1      2      1      1      2      0      1      0
      g1714081      0      0      0      0      0      0      0      0      0      0
      g1714081      1      1      1      1      1      1      1      0      1      0
      g1714081      0      0      0      0      0      0      0      0      0      0
      g1714201      4      4      4      4      3      3      4      0      3      4
      g1714201      4      4      4      4      3      3      4      0      3      4
      g1714201      4      4      4      4      3      3      4      0      3      4
      g1714201      4      4      4      4      3      3      4      0      3      4
      g1714201      3      3      3      3      3      3      3      0      2      3
      g1714201      3      3      3      3      3      3      3      0      2      3
      g1714201      4      4      4      4      3      3      4      0      3      4
      g1714201      0      0      0      0      0      0      0      0      0      0
      g1714201      3      3      3      3      2      2      3      0      3      3
      g1714201      4      4      4      4      3      3      4      0      3      4
      g1714251      8      7      8      8      6      4      8      0      4      4
      g1714251      7      7      7      7      5      3      7      0      4      3
      g1714251      8      7      9      9      7      4      9      0      5      4
      g1714251      8      7      9     10      7      4      9      0      5      4
      g1714251      6      5      7      7      7      3      7      0      4      3
      g1714251      4      3      4      4      3      4      4      0      1      3
      g1714251      8      7      9      9      7      4     10      0      5      4
      g1714251      0      0      0      0      0      0      0      0      0      0
      g1714251      4      4      5      5      4      1      5      0      5      1
      g1714251      4      3      4      4      3      3      4      0      1      4
      g1714351      6      5      5      6      3      0      4      1      1      5
      g1714351      5      8      7      6      3      0      5      1      2      4
      g1714351      5      7      7      5      3      0      5      1      1      4
      g1714351      6      6      5      7      3      0      4      1      2      5
      g1714351      3      3      3      3      3      0      3      1      0      3
      g1714351      0      0      0      0      0      0      0      0      0      0
      g1714351      4      5      5      4      3      0      5      1      1      4
      g1714351      1      1      1      1      1      0      1      1      0      1
      g1714351      1      2      1      2      0      0      1      0      2      1
      g1714351      5      4      4      5      3      0      4      1      1      5
      
      .



      • #4
        Thanks, Andrew, the code provided worked well and efficiently. Thanks also, Robert. Because Andrew's code worked, I did not have the opportunity to try yours, and I cannot say anything about possible errors in the original code. I can say that the labels were important because I next needed to create a data file from the returned matrix for use in other calculations (see post #8 in the previous thread). Again, thanks.



        • #5
          Suit yourself, but my solution creates the final dataset you seem to want without having to mess with matrix row labels. I added the last part, which converts the data back to a matrix, just in case that was the goal.
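Since #4 mentions that the results were needed as a data file for later calculations, here is a possible variant of the program in #3 that keeps the group id as a numeric variable and records which row of X'X each observation holds, so the results can be saved or merged without parsing row labels. It is a sketch, assuming runby is installed from SSC; the program name `one_group_long`, the toy data and the file name in the commented-out `save` line are all hypothetical.

```stata
* Made-up data for illustration only: 2 groups, 3 indicator variables
clear
input long groupid byte(v1 v2 v3)
1 1 0 1
1 1 1 1
2 0 1 1
2 1 1 0
2 0 1 1
end

capture program drop one_group_long
program one_group_long
    local this_group = groupid[1]      // id of the current by-group
    mkmat v*, matrix(X)                // one row per observation
    matrix XpX = X' * X                // k x k cross-product, k = number of v variables
    drop _all
    svmat XpX                          // creates XpX1, XpX2, ... with one obs per row of XpX
    gen long groupid = `this_group'    // keep the numeric group id
    gen byte row = _n                  // which row of X'X this observation holds
    order groupid row
end

runby one_group_long, by(groupid)
list, sepby(groupid)
* save xpx_by_group, replace           // hypothetical file name
```

With groupid and row as ordinary variables, the result can be merged onto other data by groupid (and row) directly, and the `mkmat ..., rownames()` step from #3 remains available if a matrix is still needed.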
