Line graph of percent of frequencies within categories instead of bar graph

Abdullah Algarni

Join Date: Jul 2022

Posts: 66
#1


22 Jan 2023, 22:23

Dear Statalist,

I have a dataset of skin cancer cases covering ten years. I want to plot a line graph (instead of a bar graph) of the percent of frequencies within categories. Unfortunately, I cannot find how to do this in Stata. Any help would be appreciated.

Thank you,
Abdullah

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int year
2020
2017
2019
2016
2019
2020
2019
2018
2020
2015
end
label values year labels5
label def labels5 2015 "2015", modify
label def labels5 2016 "2016", modify
label def labels5 2017 "2017", modify
label def labels5 2018 "2018", modify
label def labels5 2019 "2019", modify
label def labels5 2020 "2020", modify

Sincerely regards,
Abdullah Algarni
Nick Cox

Join Date: Mar 2014

Posts: 36058
#2

23 Jan 2023, 02:15

It seems that you want something a bit like

Code:

bysort year : gen count = _N
egen tag = tag(year)
line count year if tag, sort

The percent in your example looks like the % of all cases in each year. If that is what you want it is

Code:

gen percent = 100 * count / _N
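A minimal sketch that puts the two snippets above together, assuming the -dataex- example from #1 as the data. The variable names count, tag and percent come from the thread; the ytitle() text is only an illustrative choice. Here _N is used outside of by:, so it is the total number of observations.

```stata
clear
input int year
2020
2017
2019
2016
2019
2020
2019
2018
2020
2015
end

* number of cases in each year
bysort year : gen count = _N

* percent of all cases that fall in each year (_N is the overall total here)
gen percent = 100 * count / _N

* tag one observation per year so each year is plotted once
egen tag = tag(year)

line percent year if tag, sort ytitle("Percent of all cases")
```

With these ten observations, 2019 and 2020 each account for 3 of 10 cases and the other years for 1 of 10.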
Abdullah Algarni

Join Date: Jul 2022

Posts: 66
#3

23 Jan 2023, 08:55

Thank you Nick,

This is how the graph is plotted by the command you provided (red line). Is there a way to make it similar to the blue graph attached here?

Thank you

Sincerely regards,
Abdullah Algarni
Nick Cox

Join Date: Mar 2014

Posts: 36058
#4

23 Jan 2023, 10:09

The graph you have shown depends on some scheme you have set; I don't know which one. The graph you show on received radiation suggests that for your own graph you want

set scheme s2color

twoway connected count year, xla(2011/2021)
Abdullah Algarni

Join Date: Jul 2022

Posts: 66
#5

23 Jan 2023, 15:09

Thank you so much,
It works!

Sincerely regards,
Abdullah Algarni
Nick Cox

Join Date: Mar 2014

Posts: 36058
#6

28 Jan 2023, 04:25

In #4 the plot type must be spelled connected (not conneted).

Abdullah Algarni

Join Date: Jul 2022
Posts: 66
#7

28 Jan 2023, 14:57

One more question:

First, my data structure is:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte year float cefepime_sr
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 1
1 1
1 1
1 1
1 1
end
label values year Year
label def Year 1 "2013", modify
label values cefepime_sr SR
label def SR 0 "Sensitive", modify
label def SR 1 "Resistant", modify

I want to create a percent variable reflecting the valid percentage (i.e., counting only non-missing observations) of bacteria resistant to an antibiotic in each year (so there are two grouping variables: year and a specific antibiotic, e.g., cefepime).

To do that, I first generate a count variable as follows:

Code:

. bysort year cefepime_sr: gen count_cefepime = _N if cefepime_sr ==1
(550 missing values generated)

. label variable count_cefepime "Count of cefepime-resistant isolates/year"

. fre count_cefepime, format(1)

count_cefepime -- Count of cefepime-resistant isolates/year
-----------------------------------------------------------
              |      Freq.    Percent      Valid       Cum.
--------------+--------------------------------------------
Valid   162   |        162        5.7        7.1        7.1
        184   |        184        6.5        8.1       15.2
        199   |        199        7.0        8.7       23.9
        208   |        208        7.4        9.1       33.1
        229   |        229        8.1       10.1       43.1
        234   |        234        8.3       10.3       53.4
        251   |        251        8.9       11.0       64.4
        265   |        265        9.4       11.6       76.0
        268   |        268        9.5       11.8       87.8
        278   |        278        9.8       12.2      100.0
        Total |       2278       80.6      100.0           
Missing .     |        550       19.4                      
Total         |       2828      100.0                      
-----------------------------------------------------------

After that, I created the percent_cefepime variable as follows:

Code:

. bysort year: gen percent_cefepime = 100 * count_cefepime / _N if cefepime_sr ==1
(550 missing values generated)

. label variable percent_cefepime "Percent of cefepime isolates/year"

. fre percent_cefepime, format(1)

percent_cefepime -- Percent of cefepime isolates/year
--------------------------------------------------------------
                 |      Freq.    Percent      Valid       Cum.
-----------------+--------------------------------------------
Valid   48.35821 |        162        5.7        7.1        7.1
        69.96198 |        184        6.5        8.1       15.2
        73.20442 |        265        9.4       11.6       26.8
        85.66553 |        251        8.9       11.0       37.8
        87.2807  |        199        7.0        8.7       46.6
        87.69716 |        278        9.8       12.2       58.8
        88.30189 |        234        8.3       10.3       69.1
        89.03654 |        268        9.5       11.8       80.8
        93.46939 |        229        8.1       10.1       90.9
        94.97717 |        208        7.4        9.1      100.0
        Total    |       2278       80.6      100.0           
Missing .        |        550       19.4                      
Total            |       2828      100.0                      
--------------------------------------------------------------

However, this calculation counts missing observations in the denominator. The first value in the percentage table above should be 91.53%, not 48.36%, as shown here:

Code:

. fre cefepime_sr if year==1

cefepime_sr -- S or R to cefepime
-----------------------------------------------------------------
                    |      Freq.    Percent      Valid       Cum.
--------------------+--------------------------------------------
Valid   0 Sensitive |         15       4.48       8.47       8.47
        1 Resistant |        162      48.36      91.53     100.00
        Total       |        177      52.84     100.00           
Missing .           |        158      47.16                      
Total               |        335     100.00                      
-----------------------------------------------------------------

How can I fix this problem?

Thank you
Abdullah

Sincerely regards,
Abdullah Algarni

Nick Cox

Join Date: Mar 2014
Posts: 36058
#8

28 Jan 2023, 17:57

_N is the number of observations in each group, which in your second calculation will always include missing values regardless of any if qualifier.
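The point about _N can be seen in a small sketch on made-up data (one year with four observations, one of them missing). The names n_all and n_valid are hypothetical, chosen only for this illustration; egen's count() function is used here to get the number of non-missing values.

```stata
clear
input byte year float cefepime_sr
1 0
1 1
1 1
1 .
end

* the if qualifier only controls where the result is stored;
* _N is still the full group size, missing values included
bysort year : gen n_all = _N if cefepime_sr == 1

* count() counts only non-missing values of its argument
egen n_valid = count(cefepime_sr), by(year)

list
```

Here n_all is 4 wherever it is defined, while n_valid is 3, which is the denominator a valid percentage needs.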

In this case egen's mean() function does exactly what you want: it automatically ignores missing values, and the non-missing values you are averaging over are just 0 or 1. You also need a factor of 100. Note that

mean(100 * some_expression)

is legal as an egen function call, but

100 * mean(some_expression)

is not legal. Another way to get your mean is through separate counts of numerator and denominator. If any variable is 0 or 1, the total is necessarily also the count of occurrences of 1.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte year float cefepime_sr
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 1
1 1
1 1
1 1
1 1
1 . 
2 0 
2 1
2 . 
end
label values year Year
label def Year 1 "2013" 2 "2014", modify
label values cefepime_sr SR
label def SR 0 "Sensitive", modify
label def SR 1 "Resistant", modify

egen wanted1 = mean(100 * cefepime_sr), by(year)

egen numer = total(cefepime_sr == 1), by(year)
egen denom = total(inlist(cefepime_sr, 0, 1)), by(year)
gen wanted2 = 100 * numer / denom 

tabdisp year, c(wanted1 numer denom wanted2)

----------------------------------------------------------
     year |    wanted1       numer       denom     wanted2
----------+-----------------------------------------------
     2013 |         25           5          20          25
     2014 |         50           1           2          50
----------------------------------------------------------
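To connect this back to the line graph asked about at the start of the thread, one possible route is sketched below. It is only a sketch on a shortened, made-up version of the example data, and it swaps in collapse instead of egen: collapse (mean) also ignores missing values, and it leaves one observation per year ready for plotting. The name pct_resistant and the axis title are assumptions for illustration.

```stata
clear
input byte year float cefepime_sr
1 0
1 0
1 0
1 1
1 .
2 0
2 1
2 .
end
label define Year 1 "2013" 2 "2014"
label values year Year

* mean of a 0/1 variable over non-missing values = valid proportion resistant
collapse (mean) pct_resistant = cefepime_sr, by(year)
replace pct_resistant = 100 * pct_resistant

twoway connected pct_resistant year, xlabel(1 2, valuelabel) ytitle("Percent resistant (valid)")
```

Because collapse replaces the data in memory, wrap it in preserve and restore if the original observations are still needed afterwards.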


Abdullah Algarni

Join Date: Jul 2022

Posts: 66
#9

30 Jan 2023, 01:10

Thank you Nick, it works very well

Abdullah

Sincerely regards,
Abdullah Algarni