Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Problem creating a frequency distribution table with just percentages for two variables

    Hi,

    So I have a dataset containing two variables that take integer values, and want to create a table that for each variable splits the frequencies in to <30, 31-35, 36-40 and >40, but rather than just a count I want these frequencies as a percentage of the total number of observations for each variable.
    so the table should look like this:
    var1 var2
    % <30
    % 31-35
    % 36-40
    %>40
    I also want to be able to export this table, ideally in to excel.
    Any help would be very much appreciated! I have created a new variable that takes the value 1 to 4 for each var1 and var2 for each category, but can only seem to get a table like this without %s (and no way of exporting) or with%s but in separate tables.

    Thank you!

  • #2
    There is probably a canned solution for this which I am not aware of. However, you can use collapse + creating a string variable to achieve this.


    Code:
    clear
    input float(var category)
    34 2
    47 4
    42 4
    38 3
    41 4
    32 2
    32 2
    33 2
    29 1
    34 2
    31 2
    44 4
    50 4
    43 4
    32 2
    37 3
    31 2
    47 4
    40 3
    37 3
    end
    
    collapse (percent) var, by(category)
    label define cat 1 "% <30" 2 "% 31-35"3 "% 36-40" 4 "% >40"
    label values category cat
    gen percentage = string(var)+ "%"
    drop var
    export excel using "test", firstrow(variables)

    Click image for larger version

Name:	excel.png
Views:	1
Size:	17.1 KB
ID:	1430827
    Last edited by Andrew Musau; 20 Feb 2018, 14:16.

    Comment


    • #3
      Hi Andrew, thanks for your help!
      However as I have almost 10,000 observations for each of the two variables, I think this would be very time consuming- please correct me if I'm wrong though!

      Comment


      • #4
        However as I have almost 10,000 observations for each of the two variables, I think this would be very time consuming
        10,000 is too small a number to affect the time it takes to create the table. However, you need to classify the observations into your defined conditions whether you choose to create the table manually or automate the process. Here is an illustration using 20,000 observations

        Code:
        set seed 1234
        set obs 20000
        gen var1= runiformint(20,50 )
        gen var2= runiformint(25,45 )
        
        *DEFINE CONDITIONS
        forvalues i=1/2{
        gen cond`i'= cond(var`i'<=30, 1, .)
        }
        
        forvalues i=1/2{
        replace cond`i'= cond(inrange(var`i', 31, 35), 2, cond`i')
        }
        
        forvalues i=1/2{
        replace cond`i'= cond(inrange(var`i', 36, 40), 3, cond`i')
        }
        
        forvalues i=1/2{
        replace cond`i'= cond(var`i'>40, 4, cond`i')
        }
        
        *GENERATE TABLE
        preserve
        collapse (percent) var2, by(cond2)
        rename cond2 condition
        tempfile v2
        save `v2'
        
        restore
        collapse (percent) var1, by(cond1)
        rename cond1 condition
        merge 1:1 condition using `v2'
        drop _merge
        label define cat 1 "% ≤30" 2 "% 31-35"3 "% 36-40" 4 "% >40"
        label values condition cat
        
        forvalues i=1/2{
        gen pcvar`i' = string(var`i')+ "%"
        }
        drop var*
        export excel using "test", firstrow(variables) replace



        Code:
             +------------------------------+
             | condit~n    pcvar1    pcvar2 |
             |------------------------------|
          1. |    % ≤30    35.06%   28.345% |
          2. |  % 31-35    16.64%   23.785% |
          3. |  % 36-40   16.205%   23.695% |
          4. |    % >40   32.095%   24.175% |
             +------------------------------+



        Comment


        • #5
          Hi Andrew,

          Great, thank you for that!

          Comment

          Working...
          X