  • Combining categorical variables

    I am trying to summarise a categorical variable in Stata that was recorded at three monthly visits in a cohort study (tables below). I would like to pool the results for posseting for these three months, i.e. the never category at each of the time points would be added, as would each of the other categories. I’m after a new variable that represents the distribution of posseting in these categories for the three months combined. None of the various egen functions does what I require, and whilst I suspect that there may be a straightforward solution, I have not been able to work out what it is so far.

    Many thanks,

    Michael

    . tab tcposset_q4m

      posseted since |
            3m visit |      Freq.     Percent        Cum.
    -----------------+-----------------------------------
               never |         94        7.83        7.83
     monthly or less |        103        8.58       16.42
              weekly |        107        8.92       25.33
    2-4 times a week |        185       15.42       40.75
    5-6 times a week |        109        9.08       49.83
               daily |        230       19.17       69.00
     more than daily |        372       31.00      100.00
    -----------------+-----------------------------------
               Total |      1,200      100.00

    . tab tcposset_q5m

      posseted since |
            3m visit |      Freq.     Percent        Cum.
    -----------------+-----------------------------------
               never |        117       10.07       10.07
     monthly or less |         96        8.26       18.33
              weekly |        134       11.53       29.86
    2-4 times a week |        209       17.99       47.85
    5-6 times a week |        102        8.78       56.63
               daily |        214       18.42       75.04
     more than daily |        290       24.96      100.00
    -----------------+-----------------------------------
               Total |      1,162      100.00

    . tab tcposset_q6m

      posseted since |
            3m visit |      Freq.     Percent        Cum.
    -----------------+-----------------------------------
               never |        174       15.30       15.30
     monthly or less |        160       14.07       29.38
              weekly |        171       15.04       44.42
    2-4 times a week |        194       17.06       61.48
    5-6 times a week |         96        8.44       69.92
               daily |        178       15.66       85.58
     more than daily |        164       14.42      100.00
    -----------------+-----------------------------------
               Total |      1,137      100.00


    I would like to pool the results for posseting for these three months, i.e. the never category at each of the time points would be added, as would each of the other categories. I’m after a new variable that represents the distribution of posseting in these categories for the three months combined. Does this make sense?!

  • #2
    Not to me. Underneath the value labels are presumably numeric values, say 1 to 6. What are your rules for combining 1 to 6? There are in principle 6 cubed = 216 possible joint values, so how are they to be reduced to a composite? If it is just straight addition, which you seem to be implying, then that is

    Code:
     
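    * alternatives, not both: the second would fail because tcposset already exists
    * gen returns missing if any of the three variables is missing
    gen tcposset = tcposset_q4m + tcposset_q5m + tcposset_q6m

    * egen's rowtotal() treats missing values as 0
    egen tcposset = rowtotal(tcposset_q?m)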
    


    But it can't be that, as you have explained that egen does not help.



    • #3
      Thanks for the prompt response! To clarify, the new variable I am trying to create, representing posseting during the three months combined, would have the following values:

      Never (which, as you say, has the underlying numeric value 1) = 385 (94+117+174)
      Monthly or less (numeric value 2) = 359 (103+96+160)
      etc etc

      The percentage distribution of this new variable would represent the relative frequency of posseting over the three-month period combined.

      I hope that helps.




      • #4
        I see: you want a combined contingency table. That isn't anything to do with a new variable that could be consistent with your present data structure. Think of it this way: in which observations would those values go?

        tabm (from tab_chi (SSC)) is one way to get that table. Here's a sandbox and a demonstration.

        Code:
        clear 
        set obs 1200 
        set seed 2803 
        
        forval j = 1/3 { 
             gen y`j' = ceil(6 * runiform()) 
        } 
        
        * next line done just once
        ssc inst tab_chi 
        
        
        . tabm y?, transpose  
        
                   |             variable
            values |        y1         y2         y3 |     Total
        -----------+---------------------------------+----------
                 1 |       191        196        202 |       589 
                 2 |       193        193        218 |       604 
                 3 |       202        200        204 |       606 
                 4 |       202        221        176 |       599 
                 5 |       206        201        199 |       606 
                 6 |       206        189        201 |       596 
        -----------+---------------------------------+----------
             Total |     1,200      1,200      1,200 |     3,600

        If you want to do anything else with the results, tabm has an option to save that table as a new dataset.
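
        For anyone who would rather not install tab_chi, official Stata's stack command can give a similar pooled table. What follows is only a sketch on the sandbox variables y1-y3, not part of the original reply. The names y and _stack are what stack creates here, and whether value labels carry over to the stacked variable is an assumption to check on real data.

        ```stata
        * keep the original (wide) data safe; stack replaces the data in memory
        preserve

        * stack y1, y2, y3 on top of each other into one variable y;
        * stack also creates _stack (1, 2, 3) recording the source variable
        stack y1 y2 y3, into(y) clear

        * pooled one-way distribution, with percentages
        tabulate y

        * the same counts broken down by source variable (like tabm, transpose)
        tabulate y _stack

        * bring back the original data
        restore
        ```

        On the real data the stack line would presumably read stack tcposset_q4m tcposset_q5m tcposset_q6m, into(tcposset) clear.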



        • #5
          Thanks Nick,

          tabm did exactly what I needed. I then used the tabi command to compare the two study groups. Much appreciated.

          Michael
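
          As an illustration of tabi syntax only, the pooled table can also be rebuilt from the frequencies posted in #1. In this sketch the rows are the seven categories (never to more than daily) and the columns are the 4m, 5m and 6m visits. It is not the two-group comparison mentioned above, whose counts are not given in the thread.

          ```stata
          * run from a do-file: /// continues a command onto the next line
          * rows = categories (never ... more than daily), columns = 4m, 5m, 6m
          * column adds column percentages, including for the Total column
          tabi  94 117 174 \ 103  96 160 \ 107 134 171 \ 185 209 194 \ ///
               109 102  96 \ 230 214 178 \ 372 290 164, column
          ```

          The row totals are the pooled counts (385, 359, 412, 588, 307, 622, 826, summing to 3,499), and the percentages in the Total column are the pooled distribution asked for in #3.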
