Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • A little help with Heatmap and commands related

    Dear statalists, I always read your advice and now I need a specific help, if possible.

    I have a dataset with some variables, but in this case I would like to focus on: the Group variable, that contains 3 group (HC, BF-, BF+) and the gene expression variable, that contains zscore data from my genes. I think Heatmap can show the difference in expression of my genes among groups. In order to do so
    After creating a zscore of my genes data I proceeded with this code

    reshape long z_, I(CODE) j(Gene) string
    encode Gruppo, gen(Gruppo_num)
    encode Gene, gen(Gene_num)
    hmap z_ Gruppo_num Gene_num, color(reds)

    but I have no results because stata can't generate the graph.
    I wonder if there is another way in order to generate a hat map.
    I tried also
    twoway contour long z_ Gruppo_num Gene_num, heatmap

    but the graph is so bad.
    Can you help me in order to try to generate a good heat graph and see if this is for real the best way to show my data if gene expression in my 3 groups?
    Thanks a lot

  • #2
    Hi Elio, welcome to Statalist!

    Please go through the Statalist FAQ, especially #12. And do consider your post from the perspective of the reader.
    • how is the reader supposed to replicate your problem without some example data from you?
    • without knowing the data or the exact error generated, what can the reader learn from "because state can't generate the graph"?
    • without seeing your graph, what can the reader make of a statement like "the graph is so bad"?
    In short, why would you expect any meaningful response to your post?

    Comment


    • #3
      Hemanshu Kumar gives frank but I think utterly fair advice.

      You have 3 groups and z-scores for each gene. That suggests to me something like a profile plot or dot plot. Encoding magnitude by colour intensity is rarely as successful as its proponents seem to hope. But much depends on

      how many genes we're talking about

      whether gene numbering is arbitrary or has scientific meaning (your encode will just yield alphanumeric sorting)

      how far you want to focus on specific genes as compared with some general pattern

      and so on.

      Comment


      • #4

        I will try to be more clear. After I use the
        twoway contour long z_ Gruppo_num Gene_num, heatmap
        Stata gives me this graph




        Click image for larger version

Name:	Schermata 2025-02-13 alle 15.11.06.png
Views:	2
Size:	871.9 KB
ID:	1772659

        However I would like something like this one
        Click image for larger version

Name:	Schermata 2025-02-13 alle 15.14.51.png
Views:	2
Size:	134.1 KB
ID:	1772661

        I have 7 gene expression, 3 groups and score of gene expression, transformed to long from wide, giving this result after transformation

        reshape long z_, i(CODE) j(Gene) string
        (note: j = IFNALPHAmRNA IFNAR1mRNA IFNAR2mRNA IFNBETAmRNA ISG15mRNA ISG56mRNA UNC93B1mRNA)

        Data wide -> long
        -----------------------------------------------------------------------------
        Number of obs. 35 -> 245
        Number of variables 221 -> 216
        j variable (7 values) -> Gene
        xij variables: Gene-expression1, Gene-expression2, Gene-expression3 etc.

        I can't describe the error with hmap command because stata try to generate the graph, but although loading in generation of the chart for half or one hour, just nothing appear and continue to load.
        I tried using just a dot plot for my data, however I don't think it is the right way to show my them, because I would like to show how expression of a gene (a continuous data) vary among groups and try to show clusters and patterns of expression among my 3 groups.


        Attached Files

        Comment


        • #5
          If I understand correctly you have z scores in a 3 x 7 array. I still recommend a profile or dot plot, showing value labels and using scientifically or clinically sensible sort order. To show more I would need a data example: see Hemanshu Kumar's link to FAQ Advice #12 in #2 of this thread.

          Comment


          • #6
            This is the dataex after transformation from wide to long. Thanks for helping


            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input double z_ long(Gruppo_num Gene_num)
               3.169179376360953 3 1
             -1.1323372304728825 3 2
             -1.5850098279853462 3 3
                1.88085575817867 3 4
              -.8995726233284671 3 5
               2.501236621992729 3 6
             -1.4895210355230895 3 7
              -.2579666685396823 3 1
            -.003676125639154604 3 2
               .1071359496299042 3 3
              -.5394920987957598 3 4
              .49930006183104886 3 5
             -.05069297294438386 3 6
             -.30294594900929483 3 7
             -.30532666647635326 3 1
              -.4840853336160408 3 2
              .06868309651254333 3 3
             -.33190702481729506 3 4
              -.8570887586490585 3 5
              -.4638801806571499 3 6
             -.24306016555676377 3 7
              -.4929844235699497 3 1
              -.6375519800023571 3 2
               .0839843239661045 3 3
              -.4308939719530998 3 4
              -.6671351854004521 3 5
              -.6713186688150485 3 6
              .06646770679456096 3 7
              .05540131953533644 3 1
              -.5781119896188455 3 2
              -.8720599106904778 3 3
              .08339300127327866 3 4
             -.10187934548964143 3 5
               .1851890273064244 3 6
             -.29759563565159264 3 7
             -.36195860146918696 3 1
              1.7350235478606284 3 2
              1.3447829153801332 3 3
              -.3970036402953043 3 4
              1.7910060540370842 3 5
               -.149453112957819 3 6
              .15451363655565772 3 7
              .09122271814609184 3 1
              2.1152145527587796 3 2
              1.9881577132538903 3 3
              .21247661904900778 3 4
              1.1185952133706716 3 5
                .941929021703666 3 6
              -.4061488391164188 3 7
              -.4132492880224612 3 1
              1.1919539369789702 3 2
               1.542975260411882 3 3
              -.3677839294106579 3 4
              .08766603345804426 3 5
             -.26909647891645266 3 6
             -.33466133865135916 3 7
             .055401319535340526 3 1
              .21705284157965485 3 2
              -.1690588925626785 3 3
            -.060594292005382105 3 4
               .8925742089849635 3 5
             -.19559454705698387 3 6
               -.878805551997333 3 7
               1.777866014318057 3 1
             -.40277271391008707 3 2
             .045929201920346746 3 3
              1.3707307263341746 3 4
              -.9549060567428004 3 5
               .5892234013583248 3 6
             -1.1801818392451102 3 7
              2.0887782563607376 3 1
             -1.4118023066355216 3 2
             -1.9883416711056943 3 3
              4.0664211043180964 3 4
             -1.2336681349596144 3 5
               2.849940108008495 3 6
             -1.7757023312547313 3 7
                 2.2633716054298 3 1
              -1.345302667831052 3 2
             -1.7819065352314225 3 3
              2.1064272493412584 3 4
               .5327016631274175 3 5
              1.8700505663654956 3 6
             -1.6598954715800376 3 7
              -.5174807026609172 1 1
              -.7731591841878861 1 2
              -.4171236288885825 1 3
              -.5023738976065352 1 4
              -.5414897783077025 1 5
              -.8353956030449888 1 6
              -.4210215081169135 1 7
              -.7816801916865697 2 1
              .14844938044566408 2 2
               .4488955669220739 2 3
              -.6866095522577673 2 4
               .5735551612605914 2 5
              -.5010093786807142 2 6
               .6335838774303268 2 7
              -.5483981795539848 2 1
                .082412069980542 2 2
            end
            label values Gruppo_num Gruppo_num
            label def Gruppo_num 1 "BF+", modify
            label def Gruppo_num 2 "BF-", modify
            label def Gruppo_num 3 "HC", modify
            label values Gene_num Gene_num
            label def Gene_num 1 "IFNALPHAmRNA", modify
            label def Gene_num 2 "IFNAR1mRNA", modify
            label def Gene_num 3 "IFNAR2mRNA", modify
            label def Gene_num 4 "IFNBETAmRNA", modify
            label def Gene_num 5 "ISG15mRNA", modify
            label def Gene_num 6 "ISG56mRNA", modify
            label def Gene_num 7 "UNC93B1mRNA", modify

            Comment


            • #7
              I can't describe the error with hmap command because stata try to generate the graph, but although loading in generation of the chart for half or one hour, just nothing appear and continue to load.
              I think you mistake grammar of -hmap-. Probably you write it as hamp z_ Gruppo_num Gene_num, however, this is wrong. The right syntax should be hamp Gruppo_num Gene_num z_. That is to say, you should put the "heat" variable of interest in the last.

              Code:
              hmap Gruppo_num Gene_num z_, noscatter
              Click image for larger version

Name:	Graph.png
Views:	1
Size:	104.2 KB
ID:	1772697

              Comment


              • #8
                PS. If you want to draw dendrogram like plot in #4, you'd better use R instead. As far as I know, there's no Stata package could draw dendrogram in heatplot. https://statisticsglobe.com/heatmap-in-r

                Comment


                • #9
                  Thanks for posting a data example. I confess readily that I know very little about this kind of data or what is scientifically or clinically interesting or important in its analysis. More statistically, I note much variability that the heat map suppresses, guess that the data example isn't complete, and wonder whether there is a more direct way to show the data and what they imply.

                  The principle of a heat map is that people can detect patterns and anomalies by decoding colour differences, but that rarely works well for me.

                  I am far from clear that this particular graph is helpful, let alone ideal, but I'd encourage more open-mindedness about how to show the data. The use of medians is a little arbitrary but more of a default summary when faced with erratic-seeming data.

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input double z_ long(Gruppo_num Gene_num)
                     3.169179376360953 3 1
                   -1.1323372304728825 3 2
                   -1.5850098279853462 3 3
                      1.88085575817867 3 4
                    -.8995726233284671 3 5
                     2.501236621992729 3 6
                   -1.4895210355230895 3 7
                    -.2579666685396823 3 1
                  -.003676125639154604 3 2
                     .1071359496299042 3 3
                    -.5394920987957598 3 4
                    .49930006183104886 3 5
                   -.05069297294438386 3 6
                   -.30294594900929483 3 7
                   -.30532666647635326 3 1
                    -.4840853336160408 3 2
                    .06868309651254333 3 3
                   -.33190702481729506 3 4
                    -.8570887586490585 3 5
                    -.4638801806571499 3 6
                   -.24306016555676377 3 7
                    -.4929844235699497 3 1
                    -.6375519800023571 3 2
                     .0839843239661045 3 3
                    -.4308939719530998 3 4
                    -.6671351854004521 3 5
                    -.6713186688150485 3 6
                    .06646770679456096 3 7
                    .05540131953533644 3 1
                    -.5781119896188455 3 2
                    -.8720599106904778 3 3
                    .08339300127327866 3 4
                   -.10187934548964143 3 5
                     .1851890273064244 3 6
                   -.29759563565159264 3 7
                   -.36195860146918696 3 1
                    1.7350235478606284 3 2
                    1.3447829153801332 3 3
                    -.3970036402953043 3 4
                    1.7910060540370842 3 5
                     -.149453112957819 3 6
                    .15451363655565772 3 7
                    .09122271814609184 3 1
                    2.1152145527587796 3 2
                    1.9881577132538903 3 3
                    .21247661904900778 3 4
                    1.1185952133706716 3 5
                      .941929021703666 3 6
                    -.4061488391164188 3 7
                    -.4132492880224612 3 1
                    1.1919539369789702 3 2
                     1.542975260411882 3 3
                    -.3677839294106579 3 4
                    .08766603345804426 3 5
                   -.26909647891645266 3 6
                   -.33466133865135916 3 7
                   .055401319535340526 3 1
                    .21705284157965485 3 2
                    -.1690588925626785 3 3
                  -.060594292005382105 3 4
                     .8925742089849635 3 5
                   -.19559454705698387 3 6
                     -.878805551997333 3 7
                     1.777866014318057 3 1
                   -.40277271391008707 3 2
                   .045929201920346746 3 3
                    1.3707307263341746 3 4
                    -.9549060567428004 3 5
                     .5892234013583248 3 6
                   -1.1801818392451102 3 7
                    2.0887782563607376 3 1
                   -1.4118023066355216 3 2
                   -1.9883416711056943 3 3
                    4.0664211043180964 3 4
                   -1.2336681349596144 3 5
                     2.849940108008495 3 6
                   -1.7757023312547313 3 7
                       2.2633716054298 3 1
                    -1.345302667831052 3 2
                   -1.7819065352314225 3 3
                    2.1064272493412584 3 4
                     .5327016631274175 3 5
                    1.8700505663654956 3 6
                   -1.6598954715800376 3 7
                    -.5174807026609172 1 1
                    -.7731591841878861 1 2
                    -.4171236288885825 1 3
                    -.5023738976065352 1 4
                    -.5414897783077025 1 5
                    -.8353956030449888 1 6
                    -.4210215081169135 1 7
                    -.7816801916865697 2 1
                    .14844938044566408 2 2
                     .4488955669220739 2 3
                    -.6866095522577673 2 4
                     .5735551612605914 2 5
                    -.5010093786807142 2 6
                     .6335838774303268 2 7
                    -.5483981795539848 2 1
                      .082412069980542 2 2
                  end
                  label values Gruppo_num Gruppo_num
                  label def Gruppo_num 1 "BF+", modify
                  label def Gruppo_num 2 "BF-", modify
                  label def Gruppo_num 3 "HC", modify
                  label values Gene_num Gene_num
                  label def Gene_num 1 "IFNALPHAmRNA", modify
                  label def Gene_num 2 "IFNAR1mRNA", modify
                  label def Gene_num 3 "IFNAR2mRNA", modify
                  label def Gene_num 4 "IFNBETAmRNA", modify
                  label def Gene_num 5 "ISG15mRNA", modify
                  label def Gene_num 6 "ISG56mRNA", modify
                  label def Gene_num 7 "UNC93B1mRNA", modify
                  
                  egen median = median(z_), by(Ge Gr)
                  
                  scatter z median Gruppo_num, ms(Oh Dh) msize(*1 *2) by(Gene_num, compact row(1) note("") legend(pos(12))) xsc(r(0.8 3.2)) xla(1/3, glp(solid) tlc(none) valuelabel) ytitle(z score) yla(-2/4) xtitle("") subtitle(, size(small)) legend(order(1 "values" 2 "median")  row(1))

                  Click image for larger version

Name:	genes.png
Views:	1
Size:	75.2 KB
ID:	1772708

                  Comment


                  • #10
                    Dear Nick Cox, how about using -zmap- in this case?
                    Code:
                    sum z_, detail
                    zmap z Gruppo_num Gene_num, breaks(`r(p5)' `r(p10)' `r(p20)' `r(p50)' `r(p70)' `r(p90)' `r(p95)') ysc(on) xsc(on) xlabel(1/7, valuelabel) ytitle("Gruppo_num") ylabel(1/3, valuelabel) msymbol(S ..) mcolor(stc1)
                    Click image for larger version

Name:	Graph.png
Views:	1
Size:	163.5 KB
ID:	1772715

                    Comment


                    • #11
                      Chen Samulsion Thanks for your interest in my zmap command (from SSC; cf. https://www.stata.com/statalist/arch.../msg01084.html)

                      -- but it can only show one point for each (row = group, column = gene) pair.

                      Much of the rationale for my previous answer is that there are repeated observations for many such pairs. (I suspect that the data example in #6 is far from complete and truncated because dataex was used with its default, without use of count().)

                      Comment


                      • #12
                        Thanks for helping,
                        After Chen Samulsion explanation I re-created the code using hmap Gene_num Gruppo_num z_, noscatter
                        Click image for larger version

Name:	Hmap BF.jpg
Views:	1
Size:	35.2 KB
ID:	1772739




                        Although without a legend is not so clear, I think I will e forced to use heatplot of hexplot.

                        Tanks also to Nick Cox because yes, a plot is necessary in order to show clear data, however I was focused on hmap because heatplots are very common in order to show gene expression in subjects group, as an example see this article in nature portfolio Queiroz, M.A.F., Brito, W.R.S., Pereira, K.A.S. et al. Severe COVID-19 and long COVID are associated with high expression of STING, cGAS and IFN-α. Sci Rep 14, 4974 (2024). https://doi.org/10.1038/s41598-024-55696-0
                        And as you can see in the chart some genes expression change in intensity from a group to another in a specific order (es. INFAR2 or UNC93B)

                        do you think that maybe a violin plot could help me showing these data in their continuous form in a better way than just dot plot?
                        However yes, I used dataex without count() but the graph is quite the same when I write your code

                        Click image for larger version

Name:	Zscores graph.jpg
Views:	1
Size:	68.6 KB
ID:	1772740



                        Last edited by Elio Gentilini; 14 Feb 2025, 15:40.

                        Comment


                        • #13
                          I can’t imagine that a violin plot would help with data like yours. As presented in your example the number of values for each gene, group pair varies but is between 1 and about 10, so trying to get a kernel density estimate out of such subsamples appears to be not needed, not practical, or both. But perhaps I misunderstand what you are considering.

                          Comment

                          Working...
                          X