  • Using a string variable as xlabel for plots

    Dear all,

    I would like to plot the weekly values of three variables: (1) number of tests performed, (2) number of positive tests, and (3) positivity rate. So far, my x-axis has shown weeks. However, as this project has now been running for 6 months, the x-axis is getting a bit cluttered. Note that I want to keep plotting weekly values, but I would like the xlabels to show string values such as "01-2021", "02-2021", etc., or "Jan-2021", "Feb-2021" (I have not made up my mind yet). So my question to you is how I can use strings in combination with xlabels.

    See below some fake data that resembles my dataset. I use Stata 16.1.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(total_tests total_testspos positivity_rate week_end) str7 my_date
    120 30        .25 22416 "05-2021"
    110 25  .22727273 22423 "05-2021"
     90 13  .14444445 22430 "05-2021"
    155 15   .0967742 22437 "06-2021"
    124 14  .11290322 22444 "06-2021"
    101  0          0 22451 "06-2021"
    113  0          0 22458 "06-2021"
     80  0          0 22465 "07-2021"
    140  1 .007142857 22472 "07-2021"
     98  5  .05102041 22479 "07-2021"
    195  2  .01025641 22486 "07-2021"
    109  5  .04587156 22493 "08-2021"
    111  8 .072072074 22500 "08-2021"
    121 11   .0909091 22507 "08-2021"
    126 15  .11904762 22514 "08-2021"
    end
    format %td week_end

    I already did some digging on Statalist and, based on other posts about xlabels, I created the encoded variable my_date_new and referred to its value label in xlabel() for the plot. As you can see, the data are not plotted as expected, but I am not quite sure how to solve this.

    Code:
    encode my_date, gen(my_date_new) label(my_date_new)
    
    levelsof my_date_new, local(graph_dates) clean
    display "`graph_dates'"
    
    twoway     (scatter total_testspos week_end, c(l)  m(i) ytitle("Frequency",color(ebblue) axis(1)) lcolor(ebblue) mcolor(ebblue) msize(1.5) lwidth(thick) lpattern(longdash)) || ///
            (scatter total_tests week_end, c(l) m(i) ytitle("Frequency", color(ebblue) axis(1)) lcolor(ebblue) mcolor(ebblue) msize(1.5) lwidth(thick) lpattern(solid)) || ///
            (scatter positivity_rate week_end, c(l) m(i) ytitle("Positivity rate", color(orange) axis(2)) lcolor(orange) mcolor(orange) msize(1.5) lwidth(thick) lpattern(solid) yaxis(2)), ///
            graphregion(color(white)) ///
            xlabel(`graph_dates', valuelabel angle(45))  ///
            ttitle("TEST xx frequencies and positivity rates") ///
            legend(row(3) lab(1 "no. of positive tests XX") lab(2 "no. of XX test done") lab(3 "Test XX positivity rate (%)")size(small))

    Would you be able to point me in the right direction?

    Thank you and best regards,

    Moniek


  • #2
    You do not need value labels here. Just specify the relevant format. For what you ask, consider:

    Code:
     
    di %tdm-CY td(23aug2021)
    di %tdN-CY td(23aug2021)
    Result:

    Code:
    . di %tdm-CY td(23aug2021)
    Aug-2021
    
    . di %tdN-CY td(23aug2021)
    08-2021



    • #3
      Dear Andrew,

      Thank you for your reply!

      Can I still ask you though how this can be combined with my wish to keep plotting weekly values? (this is important as we need to act swiftly based on the positivity rate)

      I would like to keep plotting the variable week_end, but depict month-year as x-labels.

      When I simply change the format of my week_end variable in the xlabel (see snippet below), the months June and August are each labelled twice on the axis for some reason.

      Code:
      twoway     (scatter total_testspos week_end, c(l)  m(i) ytitle("Frequency",color(ebblue) axis(1)) lcolor(ebblue) mcolor(ebblue) msize(1.5) lwidth(thick) lpattern(longdash)) || ///
              (scatter total_tests week_end, c(l) m(i) ytitle("Frequency", color(ebblue) axis(1)) lcolor(ebblue) mcolor(ebblue) msize(1.5) lwidth(thick) lpattern(solid)) || ///
              (scatter positivity_rate week_end, c(l) m(i) ytitle("Positivity rate", color(orange) axis(2)) lcolor(orange) mcolor(orange) msize(1.5) lwidth(thick) lpattern(solid) yaxis(2)), ///
              graphregion(color(white)) ///
              xlabel(, format("%tdm-Cy") angle(45)) ///
              ttitle("TEST xx frequencies and positivity rates") ///
              legend(row(3) lab(1 "no. of positive tests XX") lab(2 "no. of XX test done") lab(3 "Test XX positivity rate (%)")size(small))

      I therefore created a month-year variable, saved its unique values in a local `graph_dates', and called this local in xlabel instead. However, this adds the xlabel Jan1962, which causes all data points to appear at the extreme right of the plot, even though Jan1962 is not specified within my `graph_dates' local. This occurs regardless of whether I add your suggestion about the relevant format.

      Code:
      gen mofd_date = mofd(week_end)
      format %tdm-Cy mofd_date
      
      levelsof mofd_date, local(graph_dates) clean
      display "`graph_dates'"
      
      twoway     (scatter total_testspos week_end, c(l)  m(i) ytitle("Frequency",color(ebblue) axis(1)) lcolor(ebblue) mcolor(ebblue) msize(1.5) lwidth(thick) lpattern(longdash)) || ///
              (scatter total_tests week_end, c(l) m(i) ytitle("Frequency", color(ebblue) axis(1)) lcolor(ebblue) mcolor(ebblue) msize(1.5) lwidth(thick) lpattern(solid)) || ///
              (scatter positivity_rate week_end, c(l) m(i) ytitle("Positivity rate", color(orange) axis(2)) lcolor(orange) mcolor(orange) msize(1.5) lwidth(thick) lpattern(solid) yaxis(2)), ///
              graphregion(color(white)) ///
              xlabel(`graph_dates', format("%tdm-Cy") angle(45)) ///
              ttitle("TEST xx frequencies and positivity rates") ///
              legend(row(3) lab(1 "no. of positive tests XX") lab(2 "no. of XX test done") lab(3 "Test XX positivity rate (%)")size(small))

      Thank you for further brainstorming with me!

      Best regards,

      Moniek





      • #4
        If the problem is the duplicated month labels, you can override Stata's default tick positions and specify your own values, e.g., mid-month dates.

        Code:
        twoway     (scatter total_testspos week_end, c(l)  m(i) ytitle("Frequency",color(ebblue) axis(1)) lcolor(ebblue) mcolor(ebblue) msize(1.5) lwidth(thick) lpattern(longdash)) || ///
                (scatter total_tests week_end, c(l) m(i) ytitle("Frequency", color(ebblue) axis(1)) lcolor(ebblue) mcolor(ebblue) msize(1.5) lwidth(thick) lpattern(solid)) || ///
                (scatter positivity_rate week_end, c(l) m(i) ytitle("Positivity rate", color(orange) axis(2)) lcolor(orange) mcolor(orange) msize(1.5) lwidth(thick) lpattern(solid) yaxis(2)), ///
                graphregion(color(white)) ///
                xlabel(`=td(15may2021)'(30)`=td(15aug2021)', format("%tdm-Cy") angle(45)) ///
                ttitle("TEST xx frequencies and positivity rates") ///
                legend(row(3) lab(1 "no. of positive tests XX") lab(2 "no. of XX test done") lab(3 "Test XX positivity rate (%)")size(small))

        Does this solve your problem?
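
        On the Jan1962 label in #3: a likely explanation, assuming mofd_date was created with mofd() as shown there, is that mofd() returns a monthly date (months since January 1960), while week_end on the x-axis is a daily date (days since 1 January 1960). Passing monthly values to xlabel() and formatting them with a %td format makes Stata read those small numbers as days, which lands in early 1962. A minimal sketch that only uses built-in date functions:

        ```stata
        * The same number read as a monthly date and as a daily date.
        display mofd(td(23aug2021))        // months elapsed since Jan 1960
        display %tm mofd(td(23aug2021))    // shown as a monthly date
        display %td mofd(td(23aug2021))    // misread as a daily date: a day in January 1962
        ```

        So tick positions for this plot need to be daily dates, on the same scale as week_end.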



        • #5
          Dear Andrew,

          I considered this method; however, I need to run this code regularly and I don't want to have to add a new `=td(15xxx2021)' every month. I was hoping for something automated that I will not have to tweak in the future.

          Any suggestions?

          Thank you and best regards,

          Moniek



          • #6
            Something like this should work:

            Code:
            qui sum week_end
            local start= td(15-`=month(r(min))'-`=year(r(min))')
            local end= td(15-`=month(r(max))'-`=year(r(max))')
            
            twoway     (scatter total_testspos week_end, c(l)  m(i) ytitle("Frequency",color(ebblue) axis(1)) lcolor(ebblue) mcolor(ebblue) msize(1.5) lwidth(thick) lpattern(longdash)) || ///
                    (scatter total_tests week_end, c(l) m(i) ytitle("Frequency", color(ebblue) axis(1)) lcolor(ebblue) mcolor(ebblue) msize(1.5) lwidth(thick) lpattern(solid)) || ///
                    (scatter positivity_rate week_end, c(l) m(i) ytitle("Positivity rate", color(orange) axis(2)) lcolor(orange) mcolor(orange) msize(1.5) lwidth(thick) lpattern(solid) yaxis(2)), ///
                    graphregion(color(white)) ///
                    xlabel(`start'(30) `end', format("%tdm-Cy") angle(45)) ///
                    ttitle("TEST xx frequencies and positivity rates") ///
                    legend(row(3) lab(1 "no. of positive tests XX") lab(2 "no. of XX test done") lab(3 "Test XX positivity rate (%)")size(small))

            This finds the earliest date in week_end and takes the 15th of its month as the first tick, does the same for the latest date to get the last tick, and places a tick every 30 days in between. No manual input is needed.
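
            One thing to watch with a fixed 30-day step is that the ticks slowly drift away from the 15th (15 May plus 30 days is 14 June, and so on), so over a long enough series a month could eventually be skipped. Below is a minimal sketch of an alternative that builds one tick per calendar month instead. It assumes the same daily date variable week_end; the local names first, last and ticks are only illustrative, and the few data rows are copied from #1 so that the block runs on its own.

            ```stata
            * Sketch: one tick on the 15th of every month spanned by week_end.
            clear
            input float(positivity_rate week_end)
            .25       22416
            .0967742  22437
            0         22465
            .04587156 22493
            .11904762 22514
            end
            format %td week_end

            quietly summarize week_end
            local first = mofd(r(min))    // monthly date of the earliest week
            local last  = mofd(r(max))    // monthly date of the latest week

            * dofm() returns the first day of a month; adding 14 gives the 15th.
            local ticks
            forvalues m = `first'/`last' {
                local ticks `ticks' `=dofm(`m') + 14'
            }
            display "`ticks'"

            twoway line positivity_rate week_end, ///
                xlabel(`ticks', format(%tdm-CY) angle(45))
            ```

            The same `ticks' local can be dropped into the xlabel() option of the full three-series plot above in place of `start'(30) `end'.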



            • #7
              Dear Andrew,

              Thanks a million! This was exactly what I was trying (and failing) to achieve.

              Many thanks and best wishes,

              Moniek
