  • heatplot assistance

    Hello dear statalist members,

    I'm working with the heatplot command and I have an issue with the lowest and highest values of the data. For my dataset, these values are 0 and 1; I use 0.05 as the step in the cuts option. When the value is 0, the plot does not show the symbol/colour that corresponds to the lowest category. Instead, it shows nothing (as if the value were missing), and symbols/colours appear only for values > 0.00. Do you have any idea why this happens and how I could fix it? Thank you in advance for taking the time to consider my issue.

    Best wishes,
    L.

  • #2
    Please take a few moments to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ.

    The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

    It is not the case that every problem is solved by someone who knows the answer immediately on reading the problem. Often experimentation is needed to confirm their hypothesis. Members are largely willing to invest the effort in that experimentation. To experiment with a problem, the member would first have to be certain they completely understand the problem, and then create an example that exhibits the same behavior. Members are less willing to do that and many prefer to spend their effort on problems that follow more closely the recommendations in the FAQ.

    At a minimum, it would help were you to show the command you ran. Better still, you should create a toy dataset that has the same problem, and share it and the command.

    But do take the time to read the FAQ so you can make better use of Statalist. I'm afraid some of your earlier posts likely went unanswered for the reasons outlined here, and while I will skip over the question as it's now presented, I can at least help by pointing out how you can improve your posts.

    Perhaps however another member will be familiar with this problem and will be able to post an answer without needing to experiment.


    • #3
      Hello dear statalist members,

      I'm working with the heatplot command and I have an issue with the lowest and highest values of the data. For my dataset, these values are 0 and 1 and represent the prevalence of health conditions. I use 0.05 as the step in the cuts option. So my command is this:
      Code:
      heatplot Prevalence i.Condition Group, xdiscrete(0.9) ///
      yscale(noline rev) ///
      ylabel(, nogrid labsize(*0.5)) ///
      xlabel(`x1'(1)`x2',labsize(*0.6) angle(vertical) nogrid) ///
      color(plasma) ///
      cuts(0(0.05)1) ///
      keylabels(, range(0.01)) ///
      size ///
      legend(size(vsmall) subtitle("Prevalence"))
      The command runs and I can see the heat plot, but when the prevalence is 0 the plot does not show the symbol/colour that corresponds to it. Instead, it shows nothing (as if the value were missing), and symbols/colours appear only for prevalences > 0.00. Do you have any idea why this happens and how I could fix it? Thank you in advance for taking the time to consider my issue.

      Best wishes,
      L.


      • #4
        @William, thank you for your reply. I thought that it was something simple but you're right. Thank you again.


        • #5
          heatplot is from SSC, as you are asked to explain (FAQ Advice #12). I would put more emphasis on William's point #2, but it's your call.

          Better still, you should create a toy dataset that has the same problem, and share it and the command.


          • #6
            Dear statalist members,

            I updated my question including a toy dataset.
            I'm working with the dataset shown here:

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input byte Group double Prevalence long Condition
            1         0 1
            2  .0115074 1
            1 .01571368 2
            2 .01269292 2
            1         0 3
            2 .00898399 3
            1 .00627889 4
            2 .01236564 4
            1 .00007035 6
            2 .00140824 5
            1 .00471033 5
            2  .0104079 6
            1 .00438781 7
            2 .01618576 7
            end
            label values Condition C
            label def C 1 "bli", modify
            label def C 2 "can", modify
            label def C 3 "dem", modify
            label def C 4 "hrf", modify
            label def C 5 "psm", modify
            label def C 6 "pso", modify
            label def C 7 "scz", modify
            and I run the heatplot command in order to create a heat plot.

            I have an issue with the lowest and highest values of the data. For my dataset, these values are 0 and 1 and represent the prevalence of health conditions. I use 0.05 as the step in the cuts option. So my command is this:
            Code:
            heatplot Prevalence i.Condition Group, xdiscrete(0.9) ///
            yscale(noline rev) ///
            ylabel(, nogrid labsize(*0.5)) ///
            xlabel(`x1'(1)`x2',labsize(*0.6) angle(vertical) nogrid) ///
            color(plasma) ///
            cuts(0(0.05)1) ///
            keylabels(, range(0.01)) ///
            size ///
            legend(size(vsmall) subtitle("Prevalence"))
            The command runs and I can see the heat plot, but when the prevalence is 0 the plot does not show the symbol/colour that corresponds to it. Instead, it shows nothing (as if the value were missing), and symbols/colours appear only for prevalences > 0.00. Do you have any idea why this happens and how I could fix it? Thank you in advance for taking the time to consider my issue.

            Best wishes,
            L.


            • #7
              Thank you. When I run that I get

              [Image: Graph.png]

              which makes it apparent what the problem is.

              The output of help heatplot tells us that the size option causes the size of the color fields (the boxes in your graph) to be scaled by the value of the z variable (Prevalence in your heatplot). I've never seen that option in use before.

              When Prevalence is 0 the box is scaled to a size of 0 and thus nothing is shown.


              • #8
                Thanks for the data example. You want to specify how the sizes are to be determined, using the option -srange()-, e.g.,

                Code:
                srange(0.01 1.01)
                The default renders 0 invisible.

                srange(lo [up]) sets the range of relative sizes of the color fields. srange() is only relevant if size() or sizeprop has been specified. Let v, v>=0, be the
                variable to which the field sizes should be proportional (e.g. relative frequencies). The field sizes are then computed as lo + v/max(v) * (up - lo). The
                default is lo=0 and up=1, that is, the smallest possible field has size 0 (invisible) and the largest field has size 1 (full size). Specify, for example,
                srange(0.5) to set the size of the smallest possible field to 0.5 (half of full size).
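
                To make the quoted formula concrete, here is a minimal sketch in base Stata that evaluates lo + v/max(v) * (up - lo) on the toy data from #6, once with the defaults (lo=0, up=1) and once with the srange(0.01 1.01) values suggested above. The names vmax, size_default and size_adj are made up for illustration, and the sketch assumes heatplot applies the formula exactly as the help text states it; it does not call heatplot itself.

                ```stata
                clear
                input byte Group double Prevalence long Condition
                1         0 1
                2  .0115074 1
                1 .01571368 2
                2 .01269292 2
                1         0 3
                2 .00898399 3
                1 .00627889 4
                2 .01236564 4
                1 .00007035 6
                2 .00140824 5
                1 .00471033 5
                2  .0104079 6
                1 .00438781 7
                2 .01618576 7
                end

                * max(v) over all cells, as used in the help text's formula
                summarize Prevalence, meanonly
                scalar vmax = r(max)

                * default srange: lo = 0, up = 1, so v = 0 gives size 0 (invisible)
                generate double size_default = 0 + Prevalence/scalar(vmax) * (1 - 0)

                * srange(0.01 1.01): lo = 0.01, up = 1.01, so v = 0 gives size 0.01
                generate double size_adj = 0.01 + Prevalence/scalar(vmax) * (1.01 - 0.01)

                list Group Condition Prevalence size_default size_adj, noobs
                ```

                In the command from #6 this corresponds to adding srange(0.01 1.01) alongside the size option, so that the two cells with Prevalence 0 get a small but non-zero size.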


                • #9
                  Heat plots can be very helpful and heatplot is a great command. Neither means that such a plot is a good idea for these data.

                  I have no idea what these toy data are or even if they are fictitious or facetious. But they are what we have as example, and I am going to guess that they are something medical.

                  I guess further that up to 7 significant figures is more precision or detail than anyone can justify or wants to interpret and that the fact that these numbers are all nearer 0 than 1 is standard and that it is the difference between them that is the big deal.

                  I used tabplot from the Stata Journal:

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input byte Group double Prevalence long Condition
                  1         0 1
                  2  .0115074 1
                  1 .01571368 2
                  2 .01269292 2
                  1         0 3
                  2 .00898399 3
                  1 .00627889 4
                  2 .01236564 4
                  1 .00007035 6
                  2 .00140824 5
                  1 .00471033 5
                  2  .0104079 6
                  1 .00438781 7
                  2 .01618576 7
                  end
                  label values Condition C
                  label def C 1 "bli", modify
                  label def C 2 "can", modify
                  label def C 3 "dem", modify
                  label def C 4 "hrf", modify
                  label def C 5 "psm", modify
                  label def C 6 "pso", modify
                  label def C 7 "scz", modify
                  
                  set scheme s1color
                  gen toshow = round(Prevalence * 1e4)
                  tabplot Group  Condition [iw=toshow] , showval separate(Group) subtitle(Prevalence per 10000)

                  [Image: prevalance.png]



                  The problem with a heat plot is the one underlying any graphic: whatever has been encoded by graphic elements has to be decoded by readers. Specifically, with a heat plot the encoding is a colour scheme that bins values and can only be understood with mental back and forth between legend and graph.

                  Why encode by colour what can be encoded easily by length or height and also easily decoded? (Heat plots are, conversely, much more useful with far too many categories for bars to be a good idea.)
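
                  As a rough illustration of the binning point, the sketch below assigns each toy value to a bin of width 0.05, mirroring cuts(0(0.05)1) from the original command. The variable name bin is hypothetical, and the sketch assumes half-open intervals [c, c + 0.05); how heatplot treats values lying exactly on a cut point may differ.

                  ```stata
                  clear
                  input byte Group double Prevalence long Condition
                  1         0 1
                  2  .0115074 1
                  1 .01571368 2
                  2 .01269292 2
                  1         0 3
                  2 .00898399 3
                  1 .00627889 4
                  2 .01236564 4
                  1 .00007035 6
                  2 .00140824 5
                  1 .00471033 5
                  2  .0104079 6
                  1 .00438781 7
                  2 .01618576 7
                  end

                  * index of the 0.05-wide bin each value falls into (0 = [0, 0.05))
                  generate byte bin = floor(Prevalence / 0.05)

                  * how many cells share each bin, and hence each colour
                  tabulate bin
                  ```

                  For the toy data every value is below 0.05, so under this assumption all 14 cells land in the first bin and colour alone would not separate them with these cuts.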

                  More can be done:

                  The order of columns here is just alphabetical. There is probably another order that makes much more sense.

                  The colouring by group is just aesthetic. Colouring by condition could be more attractive -- or more distracting. Neither is essential.
                  Last edited by Nick Cox; 26 Aug 2021, 08:12.
