  • Line graph of percent of frequencies within categories instead of bar graph

    Dear statalist,

    I have a dataset of skin cancer cases over ten years. I want to plot a line graph (instead of a bar graph) of the percent of frequencies within categories. Unfortunately, I cannot find how to do this in Stata. Any help would be appreciated.

    Thank you,
    Abdullah

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int year
    2020
    2017
    2019
    2016
    2019
    2020
    2019
    2018
    2020
    2015
    end
    label values year labels5
    label def labels5 2015 "2015", modify
    label def labels5 2016 "2016", modify
    label def labels5 2017 "2017", modify
    label def labels5 2018 "2018", modify
    label def labels5 2019 "2019", modify
    label def labels5 2020 "2020", modify
    [Attached image: Screenshot 2023-01-23 at 08.21.05.png]
    Sincerely regards,
    Abdullah Algarni

  • #2
    It seems that you want something like this:


    Code:
    bysort year : gen count = _N 
    egen tag = tag(year) 
    line count year if tag, sort 
    The percent in your example looks like the percent of all cases that fall in each year. If that is what you want, it is

    Code:
    gen percent = 100 * count / _N
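
As a minimal sketch, the steps above can be combined and run on the example data from #1. The choice of twoway connected, the y-axis title and the xlabel() range are assumptions made for illustration, not part of the original suggestion.

```stata
* Example data from #1
clear
input int year
2020
2017
2019
2016
2019
2020
2019
2018
2020
2015
end

* Number of cases in each year, repeated on every observation of that year
bysort year : gen count = _N

* Flag one observation per year so each year is plotted once
egen tag = tag(year)

* Percent of all cases that fall in each year (_N here is the whole dataset)
gen percent = 100 * count / _N

* Line graph with markers, one point per year
twoway connected percent year if tag, sort ytitle("Percent of all cases") xlabel(2015/2020)
```

Note that _N means different things in the two gen statements: under bysort it is the size of each year's group, and without a by prefix it is the total number of observations.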


    • #3
      Thank you Nick,

      This is how the graph looks when plotted with the commands you provided (red line). Is there a way to make it look like the blue graph attached here?
      [Attached image: Screenshot 2023-01-23 at 18.53.51.png]
      [Attached image: image_19806.jpg]


      Thank you
      Sincerely regards,
      Abdullah Algarni


      • #4
        The look of the graph you have shown depends on some scheme you have set; I don't know which one. The graph you show on received radiation suggests the following for your own graph:

        set scheme s2color

        twoway connected count year, xla(2011/2021)


        • #5
          Thank you so much,
          It works!
          Sincerely regards,
          Abdullah Algarni


          • #6
            In #4, the command should be twoway connected; it was originally posted with the typo conneted.


            • #7
              One more question:

              First, my data structure is:
              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input byte year float cefepime_sr
              1 0
              1 0
              1 0
              1 0
              1 0
              1 0
              1 0
              1 0
              1 0
              1 0
              1 0
              1 0
              1 0
              1 0
              1 0
              1 1
              1 1
              1 1
              1 1
              1 1
              end
              label values year Year
              label def Year 1 "2013", modify
              label values cefepime_sr SR
              label def SR 0 "Sensitive", modify
              label def SR 1 "Resistant", modify
              I want to create a percent variable giving the valid percentage (i.e., based only on non-missing observations) of isolates resistant to an antibiotic in each year (so there are two grouping variables: year and the specific antibiotic, e.g., cefepime).

              To do that, I first generate a count variable as follows:
              Code:
              . bysort year cefepime_sr: gen count_cefepime = _N if cefepime_sr ==1
              (550 missing values generated)
              
              . label variable count_cefepime "Count of cefepime-resistant isolates/year"
              
              . fre count_cefepime, format(1)
              
              count_cefepime -- Count of cefepime-resistant isolates/year
              -----------------------------------------------------------
                            |      Freq.    Percent      Valid       Cum.
              --------------+--------------------------------------------
              Valid   162   |        162        5.7        7.1        7.1
                      184   |        184        6.5        8.1       15.2
                      199   |        199        7.0        8.7       23.9
                      208   |        208        7.4        9.1       33.1
                      229   |        229        8.1       10.1       43.1
                      234   |        234        8.3       10.3       53.4
                      251   |        251        8.9       11.0       64.4
                      265   |        265        9.4       11.6       76.0
                      268   |        268        9.5       11.8       87.8
                      278   |        278        9.8       12.2      100.0
                      Total |       2278       80.6      100.0           
              Missing .     |        550       19.4                      
              Total         |       2828      100.0                      
              -----------------------------------------------------------
              After that, I created the percent_cefepime variable as follows:
              Code:
              . bysort year: gen percent_cefepime = 100 * count_cefepime / _N if cefepime_sr ==1
              (550 missing values generated)
              
              . label variable percent_cefepime "Percent of cefepime isolates/year"
              
              . fre percent_cefepime, format(1)
              
              percent_cefepime -- Percent of cefepime isolates/year
              --------------------------------------------------------------
                               |      Freq.    Percent      Valid       Cum.
              -----------------+--------------------------------------------
              Valid   48.35821 |        162        5.7        7.1        7.1
                      69.96198 |        184        6.5        8.1       15.2
                      73.20442 |        265        9.4       11.6       26.8
                      85.66553 |        251        8.9       11.0       37.8
                      87.2807  |        199        7.0        8.7       46.6
                      87.69716 |        278        9.8       12.2       58.8
                      88.30189 |        234        8.3       10.3       69.1
                      89.03654 |        268        9.5       11.8       80.8
                      93.46939 |        229        8.1       10.1       90.9
                      94.97717 |        208        7.4        9.1      100.0
                      Total    |       2278       80.6      100.0           
              Missing .        |        550       19.4                      
              Total            |       2828      100.0                      
              --------------------------------------------------------------
              However, the denominator includes missing observations. The first percentage in the table above should be 91.53%, not 48.36%, as this tabulation for year 1 shows:
              Code:
              . fre cefepime_sr if year==1
              
              cefepime_sr -- S or R to cefepime
              -----------------------------------------------------------------
                                  |      Freq.    Percent      Valid       Cum.
              --------------------+--------------------------------------------
              Valid   0 Sensitive |         15       4.48       8.47       8.47
                      1 Resistant |        162      48.36      91.53     100.00
                      Total       |        177      52.84     100.00           
              Missing .           |        158      47.16                      
              Total               |        335     100.00                      
              -----------------------------------------------------------------
              How can I fix that problem?

              Thank you
              Abdullah


              • #8
                _N is the number of observations in each by-group (here, each year), so in your second calculation it always counts observations with missing values too, regardless of any if qualifier.
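
To see the difference, here is a minimal sketch on a small made-up dataset (not the data from #7). The variable names n_all, n_valid, n_resistant, pct_of_all and pct_valid are hypothetical and chosen only for illustration.

```stata
* Toy data: each year has 4 observations, some with missing results
clear
input byte year byte cefepime_sr
1 0
1 1
1 1
1 .
2 0
2 1
2 .
2 .
end

* _N under bysort counts every observation in the year, missing or not
bysort year : gen n_all = _N

* egen's count() counts only non-missing values
egen n_valid = count(cefepime_sr), by(year)

* Number of resistant isolates in each year
egen n_resistant = total(cefepime_sr == 1), by(year)

* Dividing by _N gives a percent of all observations;
* dividing by the non-missing count gives the valid percent
gen pct_of_all = 100 * n_resistant / n_all
gen pct_valid  = 100 * n_resistant / n_valid

tabdisp year, c(n_all n_valid pct_of_all pct_valid)
```

In year 2 of this toy dataset, one resistant isolate out of four observations gives 25 percent of all observations, but 50 percent of the two non-missing results.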

                In this case egen's mean() function does exactly what you want, because it automatically ignores missing values and the non-missing values you are averaging over are just 0 or 1. You also need a factor of 100. Note that

                mean(100 * some_expression)

                is legal as an egen function call, but

                100 * mean(some_expression)

                is not legal. Another way to get your mean is through separate counts of numerator and denominator. If a variable takes only the values 0 and 1, its total is necessarily the count of occurrences of 1.


                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input byte year float cefepime_sr
                1 0
                1 0
                1 0
                1 0
                1 0
                1 0
                1 0
                1 0
                1 0
                1 0
                1 0
                1 0
                1 0
                1 0
                1 0
                1 1
                1 1
                1 1
                1 1
                1 1
                1 . 
                2 0 
                2 1
                2 . 
                end
                label values year Year
                label def Year 1 "2013" 2 "2014", modify
                label values cefepime_sr SR
                label def SR 0 "Sensitive", modify
                label def SR 1 "Resistant", modify
                
                egen wanted1 = mean(100 * cefepime_sr), by(year)
                
                egen numer = total(cefepime_sr == 1), by(year)
                egen denom = total(inlist(cefepime_sr, 0, 1)), by(year)
                gen wanted2 = 100 * numer / denom 
                
                tabdisp year, c(wanted1 numer denom wanted2)
                
                ----------------------------------------------------------
                     year |    wanted1       numer       denom     wanted2
                ----------+-----------------------------------------------
                     2013 |         25           5          20          25
                     2014 |         50           1           2          50
                ----------------------------------------------------------
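
To connect this back to the line graph asked for at the start of the thread, one possible sketch reduces the data to one observation per year and plots the valid percent. It uses collapse rather than egen, which is a different route from the one shown above. The toy data, the variable name pct_resistant and the graph options are assumptions for illustration.

```stata
* Toy data with missing results (not the poster's data)
clear
input int year byte cefepime_sr
2013 0
2013 1
2013 1
2013 .
2014 0
2014 1
2014 .
2014 .
end

* collapse (mean) ignores missing values, so this is the valid proportion per year.
* Note that collapse replaces the data in memory with one observation per year.
collapse (mean) pct_resistant = cefepime_sr, by(year)

* Convert the proportion to a percent
replace pct_resistant = 100 * pct_resistant

* Line graph with markers, one point per year
twoway connected pct_resistant year, xlabel(2013 2014) ytitle("Percent resistant (valid)")
```

Because collapse discards the original observations, use preserve and restore around it, or stay with the egen approach, if the full dataset is still needed afterwards.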


                • #9
                  Thank you Nick, it works very well


                  Abdullah
