  • Issues with twoway contour, heatmap

    While producing heatmaps in Stata 14.2, I ran into two problems with twoway contour. Consider the following MWE:

    Code:
    clear
    set obs 16
    gen x=mod(_n-1,4)+1
    gen y=floor((_n-1)/4)+1
    gen z=mod(x+y,2)
    local ccuts ccuts(.1 .2 .3 .4 .5 .6 .7 .8 .9)
    local ccolor ccolor(black blue green dknavy eggshell ltblue pink emerald cyan white )
    twoway contour z y x , `ccuts' `ccolor' heatmap
    Problem 1: This seems to be just a programming bug. The code produces the following error message:
    Code:
    cstyles[10].color.setstyle, style(white): class type not found
    There is a simple workaround: run the do-file once with the definition of the local macro ccolor commented out (* local ccolor ... on line 7). After that, one can restore the original version and the do-file works as it should.


    Problem 2: The figure below shows that the heatmap produces unequally sized areas for the data points. I do not see any reason why the data point (1,1) should be represented with only a quarter of the area of data point (2,2). I suggest that the default behavior should be to produce equally sized areas (with the graph extending from 0.5 to 4.5).

    [Attached image: Graph.png]

  • #2
    Problem 2: The figure below shows that the heatmap produces unequally sized areas for the data points. I do not see any reason why the data point (1,1) should be represented with only a quarter of the area of data point (2,2). I suggest that the default behavior should be to produce equally sized areas (with the graph extending from 0.5 to 4.5).
    Your example data is not well suited for a contour plot. With 16 data points and the z variable varying over 2 levels, you might just as well present the data in a table or plot a side by side scatter plot.

    Code:
    scatter y x, by(z)
    Your concerns should disappear if you have more data points that are spread across the x-y plane (preferably continuous data). Only then does a contour plot become an efficient way to visualize the data.



    • #3
      The example does not represent my statistical problem; it was constructed as an MWE to highlight the issue. The real problem is an 11x11 coordinate system, where x and y take only integer values (while z is continuous). For this kind of application it is not very elegant that the points at the border have smaller areas than the others.



      • #4
        An alternative graph for the kind of data you describe would be a scatterplot with weighted markers, where the size of the markers represents the level of your z variable.

        http://www.stata.com/support/faqs/gr...ghted-markers/
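
A minimal sketch of the weighted-marker idea, using a made-up 11x11 integer grid. The variable z here is a hypothetical, strictly positive stand-in for the real data, which is not shown in this thread.

```stata
* Hypothetical 11x11 integer grid with a continuous, strictly positive z
clear
set obs 121
gen x = mod(_n-1, 11) + 1
gen y = floor((_n-1)/11) + 1
gen z = (x + y)/22

* Marker size is scaled by the analytic weight z; hollow circles limit overplotting
twoway scatter y x [aweight = z], msymbol(Oh)
```

Because every observation gets its own marker, the border points are drawn the same way as the interior points.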


        You may also want to take a look at the user-written program -sepscatter- from SSC:

        http://www.statalist.org/forums/foru...lable-from-ssc


        The problem you raise is valid but highlights a problem of too few observations.



        • #5
          I faced the same problem (no. 2) a few days ago. My temporary solution was to let both x and y range from 0 to 5 (instead of 1 to 4) by adding some fake data points. However, this results in wider margins because of the fake points at 0 and 5. I would therefore appreciate it if someone could suggest a more elegant solution.
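
A minimal sketch of this padding workaround, applied to the MWE from the first post. How to choose z for the fake points is not stated above; copying the value of the nearest real point is an assumption made here, and the names fake, xn and yn are purely illustrative.

```stata
* Build a 6x6 grid (0..5) that surrounds the original 4x4 grid (1..4)
clear
set obs 36
gen x = mod(_n-1, 6)
gen y = floor((_n-1)/6)

* Flag the padding ring (hypothetical helper variable)
gen byte fake = inlist(x, 0, 5) | inlist(y, 0, 5)

* Assumption: fake points copy z from the nearest real grid point
gen xn = min(max(x, 1), 4)
gen yn = min(max(y, 1), 4)
gen z  = mod(xn + yn, 2)

twoway contour z y x, heatmap
```

Every real point should now have neighbours on all sides, but the ring at 0 and 5 is still drawn, which is the wider margin mentioned above.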

          Moreover, I would also like to ask how I can customise the axis labels of clegend. For instance, when I represent the levels of the variable z with 50 different colours, clegend shows all 50 levels on its axis, so the 50 labels are squeezed together. Is there any way to control the axis labels of clegend? More concretely, in Christian's example, how can I label only the levels 0.1, 0.5, and 0.9 in clegend?

          Thanks.



          • #6
            Moreover, I would also like to ask how I can customise the axis labels of clegend. For instance, when I represent the levels of the variable z with 50 different colours, clegend shows all 50 levels on its axis, so the 50 labels are squeezed together. Is there any way to control the axis labels of clegend? More concretely, in Christian's example, how can I label only the levels 0.1, 0.5, and 0.9 in clegend?
            Code:
            zlabel(0.1 0.5 0.9)
            Regarding your first question, you are responsible for any manipulation of your data.
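
To show the zlabel() option inside a complete command, here is a minimal sketch on a made-up 11x11 grid. The data and the choice of levels(50) are assumptions standing in for the 50-colour case described in the question.

```stata
* Hypothetical 11x11 grid with a continuous z between 0 and 1
clear
set obs 121
gen x = mod(_n-1, 11) + 1
gen y = floor((_n-1)/11) + 1
gen z = (x + y)/22

* 50 contour levels, but only three labelled ticks on the clegend axis
twoway contour z y x, heatmap levels(50) zlabel(0.1 0.5 0.9)
```

The zlabel() option takes the same kind of rule or list of values as xlabel() and ylabel(), so something like zlabel(0(0.25)1) should also be accepted.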
            Last edited by Andrew Musau; 10 May 2017, 05:51.



            • #7
              Christian and Chi-lin: Regarding Problem 2, and for what it's worth, I've also needed to do clunky workarounds when creating heatmap graphs because of the way the top and bottom values of x and y are treated. Maybe there is an elegant solution, but I've not found it. Perhaps post this as a Stata 15 Wishlist topic?



              • #8
                I've encountered the same issue creating heatmaps from correlation matrices where the x and y values were integers that defined an x,y coordinate pair for the variables and the z axis contained the correlation coefficients.



                • #9
                  I'm going to take the liberty of placing a link to this thread in the "Stata15 Wishlist". I'm hardly competent to know if this would be something easy to remedy with some sort of option like
                  Code:
                  twoway contour …, discrete
                  , but my gut instinct would be that it couldn't be that hard.



                  • #10
                    I wonder if anyone found a solution for problem 2?



                    • #11
                      Would heatplot offer an alternative solution? Its package description reads:
                      Code:
                      package heatplot from http://fmwww.bc.edu/repec/bocode/h
                      -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                      
                      TITLE
                            'HEATPLOT': module to create heat plots and hexagon plots
                      
                      DESCRIPTION/AUTHOR(S)
                            
                             heatplot creates heat plots from variables or matrices. One
                            example of a heat plot is a two-dimensional histogram in which
                            the frequencies of combinations of binned Y and X are displayed
                            as rectangular (or hexagonal) fields using a color gradient.
                            Another example is a plot of a trivariate distribution where the
                            color gradient is used to visualize the (average) value of Z
                            within bins of Y and X. Yet another example is a plot that
                            displays the contents of a matrix, say, a correlation matrix or a
                            spacial weights matrix, using a color gradient.
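
A minimal sketch of how heatplot might be applied to the MWE from the first post. It assumes that heatplot and its dependencies (palettes and colrspace) have been installed from SSC. The discrete option is used here on the understanding that it treats x and y as discrete with step size 1 and skips binning; please check help heatplot before relying on it.

```stata
* One-time installation (assumed to be available from SSC)
* ssc install heatplot, replace
* ssc install palettes, replace
* ssc install colrspace, replace

* Same 4x4 checkerboard data as in the first post
clear
set obs 16
gen x = mod(_n-1, 4) + 1
gen y = floor((_n-1)/4) + 1
gen z = mod(x + y, 2)

* discrete: assumed to draw one field per integer (x, y) pair
heatplot z y x, discrete
```

With one observation per (x, y) pair, the default statistic (the mean of z within each field) is simply the observed value.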



                      • #12
                        Thank you Martyn! Problem solved
                        Now I am figuring out how to play with colour schemes.
                        BW
