Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • collect layout sorting in a funny way

    Hi,
    I do not understand why -collect layout- seems to sort numeric levels based on the text conversion of the numeric level. This is a problem when there is a level 10 or higher.

    Comparing the two tables, in the second one the order of month has been sorted as though it is text.
    This happens whether or not the levels are labelled.

    How can I avoid this?
    I have not found a way to stop it.

    Code:
    . clear
    
    . set obs 1000
    Number of observations (_N) was 0, now 1,000.
    
    . gen month = runiformint(1,12)
    
    . gen group = runiformint(0,1)
    
    
    . compress
      variable month was float now byte
      variable group was float now byte
      (6,000 bytes saved)
    
    . 
    . table (var), stat(fvfrequency month) 
    
    -------------------------------------
             |  Factor-variable frequency
    ---------+---------------------------
    month=1  |                         70
    month=2  |                         80
    month=3  |                         90
    month=4  |                         71
    month=5  |                         91
    month=6  |                         91
    month=7  |                         82
    month=8  |                         99
    month=9  |                         77
    month=10 |                         89
    month=11 |                         74
    month=12 |                         86
    -------------------------------------
    
    . collect layout (var[month])(result[fvfrequency])
    
    Collection: Table
          Rows: var[month]
       Columns: result[fvfrequency]
       Table 1: 12 x 1
    
    -------------------------------------
             |  Factor-variable frequency
    ---------+---------------------------
    month=1  |                         70
    month=10 |                         89
    month=11 |                         74
    month=12 |                         86
    month=2  |                         80
    month=3  |                         90
    month=4  |                         71
    month=5  |                         91
    month=6  |                         91
    month=7  |                         82
    month=8  |                         99
    month=9  |                         77
    -------------------------------------
    
    . 
    . label define month   ///
    >          1 "Jan" 2 "Feb" 3 "Mar" 4 "Apr" 5 "May" 6 "Jun" ///
    >          7 "Jul" 8 "Aug" 9 "Sep" 10 "Oct" 11 "Nov" 12 "Dec"
    
    . label values month month
    
    . 
    . table (var)(group), stat(fvfrequency month) 
    
    ----------------------------
              |       group     
              |   0    1   Total
    ----------+-----------------
    month=Jan |  30   40      70
    month=Feb |  43   37      80
    month=Mar |  47   43      90
    month=Apr |  33   38      71
    month=May |  48   43      91
    month=Jun |  45   46      91
    month=Jul |  45   37      82
    month=Aug |  51   48      99
    month=Sep |  46   31      77
    month=Oct |  41   48      89
    month=Nov |  37   37      74
    month=Dec |  52   34      86
    ----------------------------
    
    . collect layout (var[month])(group#result[fvfrequency])
    
    Collection: Table
          Rows: var[month]
       Columns: group#result[fvfrequency]
       Table 1: 12 x 3
    
    ----------------------------------------------------------------------------------------------
              |                                        group                                      
              |                          0                           1                       Total
              |  Factor-variable frequency   Factor-variable frequency   Factor-variable frequency
    ----------+-----------------------------------------------------------------------------------
    month=Jan |                         30                          40                          70
    month=Oct |                         41                          48                          89
    month=Nov |                         37                          37                          74
    month=Dec |                         52                          34                          86
    month=Feb |                         43                          37                          80
    month=Mar |                         47                          43                          90
    month=Apr |                         33                          38                          71
    month=May |                         48                          43                          91
    month=Jun |                         45                          46                          91
    month=Jul |                         45                          37                          82
    month=Aug |                         51                          48                          99
    month=Sep |                         46                          31                          77
    ----------------------------------------------------------------------------------------------
    Maybe I am just not thinking straight on Friday afternoon before a long weekend...
    Thanks,
    Scott

  • #2
    Ha ha actually it was Thursday.
    Anyway still can't see why this is happening.

    Comment


    • #3
      Hi Scott,

      You can explicitly define the order like so:

      Code:
      clear
      set obs 1000
      set seed 483726
      gen month = runiformint(1,12)
      gen group = runiformint(0,1)
      compress
      
      table (var), stat(fvfrequency month)
        
      collect layout (var[1.month 2.month 3.month 4.month 5.month 6.month 7.month 8.month 9.month 10.month 11.month 12.month])(result[fvfrequency])
      Code:
      . table (var), stat(fvfrequency month)
      
      -------------------------------------
               |  Factor-variable frequency
      ---------+---------------------------
      month=1  |                         88
      month=2  |                         72
      month=3  |                        100
      month=4  |                         70
      month=5  |                         85
      month=6  |                         82
      month=7  |                         85
      month=8  |                         71
      month=9  |                         90
      month=10 |                         80
      month=11 |                         87
      month=12 |                         90
      -------------------------------------
      Code:
      . collect layout (var[1.month 2.month 3.month 4.month 5.month 6.month 7.month 8.
      > month 9.month 10.month 11.month 12.month])(result[fvfrequency])
      
      Collection: Table
            Rows: var[1.month 2.month 3.month 4.month 5.month 6.month 7.month
                  8.month 9.month 10.month 11.month 12.month]
         Columns: result[fvfrequency]
         Table 1: 12 x 1
      
      -------------------------------------
               |  Factor-variable frequency
      ---------+---------------------------
      month=1  |                         88
      month=2  |                         72
      month=3  |                        100
      month=4  |                         70
      month=5  |                         85
      month=6  |                         82
      month=7  |                         85
      month=8  |                         71
      month=9  |                         90
      month=10 |                         80
      month=11 |                         87
      month=12 |                         90
      -------------------------------------
      Code:
      label define month 1 "Jan" 2 "Feb" 3 "Mar" 4 "Apr" 5 "May" 6 "Jun" ///
                         7 "Jul" 8 "Aug" 9 "Sep" 10 "Oct" 11 "Nov" 12 "Dec"
      label values month month
      table (var)(group), stat(fvfrequency month)
      collect layout (var[1.month 2.month 3.month 4.month 5.month 6.month 7.month 8.month 9.month 10.month 11.month 12.month])(group#result[fvfrequency])
      Code:
      . collect layout (var[1.month 2.month 3.month 4.month 5.month 6.month 7.month 8.
      > month 9.month 10.month 11.month 12.month])(group#result[fvfrequency])
      
      Collection: Table
            Rows: var[1.month 2.month 3.month 4.month 5.month 6.month 7.month
                  8.month 9.month 10.month 11.month 12.month]
         Columns: group#result[fvfrequency]
         Table 1: 12 x 3
      
      ----------------------------------------------------------------------------------------------
                |                                        group                                      
                |                          0                           1                       Total
                |  Factor-variable frequency   Factor-variable frequency   Factor-variable frequency
      ----------+-----------------------------------------------------------------------------------
      month=Jan |                         40                          48                          88
      month=Feb |                         38                          34                          72
      month=Mar |                         59                          41                         100
      month=Apr |                         41                          29                          70
      month=May |                         34                          51                          85
      month=Jun |                         39                          43                          82
      month=Jul |                         49                          36                          85
      month=Aug |                         33                          38                          71
      month=Sep |                         42                          48                          90
      month=Oct |                         37                          43                          80
      month=Nov |                         48                          39                          87
      month=Dec |                         46                          44                          90
      ----------------------------------------------------------------------------------------------

      Comment


      • #4
        Note that you can also do this through the graphical interface with the table builder. Select the rows dropdown at the top of the screen and click edit. Notice in the popup window you can drag to reorder the levels. Also notice this tool allows you to build a rowspec from the dimensions in your collection, and the correct rowspec code is given at the top of the window. Seems like this might be useful for anyone who wants to learn a bit more about how rowspecs work.

        Comment


        • #5
          Thanks Daniel for the reminder, that does work of course. I did know about specifying rows (very useful for example, to leave out one level of a binary variable), but it seems a little tedious in this application.
          In my opinion the most logical order to present rows would be sorted by the underlying numeric code, rather than a text translation of the code. Even sorting by the text of the label would seem more logical than what it is doing.
          Or perhaps it could become an option to allow for sorting by label value (code), or label value (code) as text, or the label text itself.

          Comment


          • #6
            I agree, this behavior does not make much sense. Hopefully, this is patched soon. I'm sorry I couldn't be more help to you.

            Comment


            • #7
              Factor variable elements (levels) of dimensions like var (coleq, colname, roweq, rowname, across) keep track of the order in which the elements are consumed into the collection. Usually they are consumed in the factor variable's numerical order (as they typically are built by Stata's parsing code). This is verified by requesting the layout directly after table via

              Code:
              collect layout
              The use of var[month] in your layout is overriding this ordering. This specification is a shortcut that expands the levels of factor variable month (i.e. i.month) using the levels present in the collection. The string sorting happens automatically; tags are stored as strings in the collection. However, I see how this is annoying and believe that we can fix the factor variable expansion logic in collect layout to use the above tracking order.

              I can make no promises on the timing, but hope to have this fixed in a future Stata 18 update.

              Comment


              • #8
                Stata 18 update on 04oct2023, item 7 (and 8) in help whatsnew.
                7. collect layout now uses the numerical sort order of factor-variable levels in
                dimensions across, coleq, colname, roweq, rowname, and var. This affects layout
                specifications like var[fvar] and var[i.fvar] that rely on the layout logic to
                expand and order the levels of factor-variable fvar within dimension var. The
                old behavior of using the string sort order is not preserved under version
                control.

                8. collect levelsof after table and dtable now uses the numerical sort order of
                dimension levels constructed from numerical variables. The old behavior of using
                the string sort order is not preserved under version control.

                Comment

                Working...
                X