  • -tabplot- display category with zero observations

    I want to create a series of graphs with -tabplot- from the Stata Journal. The categories on the x- and y-axes should be fixed so that patterns are easier to recognise. But when I select certain observations, a number of categories will have no observations. At the moment the graph displays bars at categories where there are no observations. I do not know which trick I need...

    The example includes a variable for subjects (luf_t_m), a variable for sources (fwl_i), a variable for publication types (pubtype) and an observation id variable (core_id). Overall, I would like to present subjects by sources. To see whether the contribution of sources differs by publication type, I restrict the data, e.g. to books (pubtype==2).


    Here is the code, which is quite lengthy because I manipulate the labels, but I wanted it to be as close as possible to my own code:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int luf_t_m byte fwl_i int core_id byte pubtype
    290 0 28285 5
    400 0 28647 1
    490 0 28155 1
    490 0 28785 1
    400 0 28816 1
    410 0 28033 1
    360 0 27547 5
    490 0 27724 5
    490 0 28213 1
    490 0 27942 1
    235 0 28623 1
    410 0 28665 1
    490 0 28657 2
    490 0 28443 1
    490 0 28401 1
    490 1  7265 1
    490 1 11203 1
    490 1 10855 1
    410 1  8839 1
    490 1  3950 1
    360 1  4943 1
    370 1  3223 1
    490 1  6075 1
    360 1  3080 1
    400 1  1754 1
    490 1  2647 1
    400 1  2940 1
    540 1  2615 1
    410 1  3139 1
    370 1  1878 1
    765 2  3042 4
    470 2 10726 1
    490 2  3795 1
    470 2 11427 1
    490 2  5144 1
    315 2 12418 1
    490 2  7471 1
    315 2  7865 1
     80 2 12483 1
    370 2  9428 1
    490 2  2295 1
    540 2 10839 1
    400 2  9460 1
    490 2  2707 1
    235 2  2620 1
    235 3 10572 4
    315 3  6129 1
    290 3  4978 3
    765 3  1282 3
    765 3  1944 4
    370 3  4010 1
    490 3  8096 1
    490 3  2127 1
    490 3 12588 1
    765 3  8276 4
    470 3  1364 1
    400 3  8975 1
    400 3  5257 3
    490 3  7081 4
    400 3  5471 1
    490 4 25586 3
    410 4 22196 3
    490 4 17802 3
    490 4 26623 1
    490 4 26847 1
    490 4 21351 2
    490 4 17913 1
    540 4 18126 4
    490 4 26690 2
    490 4 24218 1
    490 4 23466 2
    490 4 25590 3
    490 4 26252 3
    490 4 27232 5
    320 4 21154 4
    490 5 30654 5
    490 5 30313 5
     80 5 31104 1
    490 5 29812 5
    490 5 29674 5
    410 5 31106 1
    490 5 31088 5
    490 5 31991 1
     80 5 30397 4
    490 5 31117 5
    490 5 31744 1
    470 5 30186 5
    490 5 31692 5
    490 5 29984 1
    690 5 30510 5
    470 6 14649 1
    470 6 13276 1
    400 6 15603 1
    765 6 13483 2
    315 6 15609 1
    400 6 14861 1
    490 6 16870 1
    490 6 13492 1
    315 6 29577 1
    490 6 13454 3
    490 6 14154 1
    490 6 16530 3
    490 6 29604 1
    370 6 16879 1
    490 6 16688 3
    490 7 13995 5
    400 7 14817 1
    490 7 13180 1
    490 7 13807 5
    340 7 16231 1
    490 7 14227 1
    470 7 13458 1
    490 7 15130 1
    400 7 14140 1
    490 7 14004 5
    490 7 15097 5
    490 7 15081 5
    490 7 17367 5
    470 7 14717 5
    490 7 14698 1
    end
    label values luf_t_m l_luf_t
    label def l_luf_t 50 "G", modify
    label def l_luf_t 80 "L", modify
    label def l_luf_t 235 "So", modify
    label def l_luf_t 290 "Wi", modify
    label def l_luf_t 315 "Ps", modify
    label def l_luf_t 360 "Ph", modify
    label def l_luf_t 370 "Ch", modify
    label def l_luf_t 400 "Bi", modify
    label def l_luf_t 410 "Ge", modify
    label def l_luf_t 470 "Klt", modify
    label def l_luf_t 490 "Klp", modify
    label def l_luf_t 520 "Za", modify
    label def l_luf_t 540 "Ve", modify
    label def l_luf_t 610 "Ag", modify
    label def l_luf_t 690 "Ma", modify
    label def l_luf_t 765 "In", modify
    label values fwl_i l_fwl_i
    label def l_fwl_i 0 "not in x", modify
    label def l_fwl_i 1 "in x", modify
    label def l_fwl_i 2 "in x1", modify
    label def l_fwl_i 3 "in x2", modify
    label def l_fwl_i 4 "in x3", modify
    label def l_fwl_i 5 "in x4", modify
    label def l_fwl_i 6 "in x5", modify
    label def l_fwl_i 7 "in x6", modify
    label values pubtype l_pubtype
    label def l_pubtype 1 "Article", modify
    label def l_pubtype 2 "Book", modify
    label def l_pubtype 3 "Book Chapter", modify
    label def l_pubtype 4 "Proceedings", modify
    label def l_pubtype 5 "Other", modify
    
    
    ********************* Selection of Publication type
    keep if pubtype==2
    *********************
    
    
    *******************************
    label copy l_fwl_i l_fwl_i_cs
    label copy l_luf_t l_luf_t_cs
    la val fwl_i l_fwl_i_cs
    la val luf_t_m l_luf_t_cs
    
    
    *levelsof fwl_i, local(fwli)
    
    foreach fwl of num 0/7 {
        qui sum core_id if fwl_i == `fwl'
        local ls = `r(N)'
        local s = `"`ls'"'
        local vl: label l_fwl_i `fwl'
        local vlt = `"`vl' (N=`s')"'
        la de l_fwl_i_cs `fwl' `"`vlt'"', modify
        local fwln`fwl' = `"(N=`s')"'
        local fwls`fwl' = `"`vl'"'
    }
    
    levelsof luf_t_m, local(ltm)
    
    foreach lt of local ltm {
        qui sum core_id if luf_t_m == `lt'
        local ls = `r(N)'
        local s = `"`ls'"'
        local vl: label l_luf_t `lt'
        local vlt = `"`vl' (N=`s')"'
        la de l_luf_t_cs `lt' `"`vlt'"', modify
    }
    
    
    qui sum core_id, d
    local total = `r(N)'
    
    tabplot luf_t_m fwl_i , separate(fwl_i)  percent(luf_t_m) showval(,mlabs(vsmall) mlabg(zero) ) xtitle("") ytitle("") subtitle(`"N=`total'"') ///
                xlabel(1 `""`fwls0'" "`fwln0'""'  2 `""`fwls1'" "`fwln1'""' 3 `""`fwls2'" "`fwln2'""' 4 `""`fwls3'" "`fwln3'""' 5 `""`fwls4'" "`fwln4'""' 6 `""`fwls5'" "`fwln5'""' 7 `""`fwls6'" "`fwln6'""' 8 `""`fwls7'" "`fwln7'""', labs(vsmall)) ylabel(, labs(vsmall)) name(tabp_dataex, replace) ///
                xsc(titlegap(*1.1)) height(.6) ///
                note("", span)
    [Image: tabp_dataex.png]

    Any idea where I missed something?





  • #2
    This is hard for me to follow, and I'm the program author. My responsibility is perhaps (1) to explain tabplot clearly and (2) to ensure that it doesn't calculate incorrectly.

    With a minimal


    Code:
    tabplot luf_t_m fwl_i, showval scheme(s1color)
    I get this -- which I think is consistent with what you've shown, given that you're calculating percents. But in general -- and I think you understand this -- tabplot won't show a bar and won't show text where no observation exists. Its viewpoint is that nothing exists, so there is nothing to show. If you want something different, you may need to clone tabplot and rewrite it. Or, what may be easier is to use contract, zero to get a consistent dataset and then start from there.

    [Image: tabplot_marc.png]
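A minimal sketch of the contract, zero route suggested above, using the auto data rather than the poster's data. The nomiss option is an addition made here as an assumption, to keep missing values of rep78 out of the cross-classification.

```stata
sysuse auto, clear

* One observation per (foreign, rep78) cell; zero keeps cells that never occur
contract foreign rep78, zero nomiss

* Cells with no observations now exist explicitly with _freq == 0
list foreign rep78 _freq if _freq == 0, noobs
```

Because every combination is now present as a row, later graphs can start from the same set of categories whichever subset is selected.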


    • #3
      Excuse me if I was not clear enough. I present similar graphs (subjects by sources, 18x8) throughout a document, and they differ only in the selection of observations. I think it is easier for the reader if the categories he/she finds in graph 1 are at the same position in graph 2, even if they are empty.
      I will consider your two options. But I am also considering the option -by(pubtype)- as a possible solution, whereby I might lose the N-count in the label. I will check the trade-offs.


      • #4
        Consider also

        Code:
        sysuse auto, clear
        tabplot rep78, xasis xla(1 2 3 4 5 6 "never occurs") xsc(r(. 6.5))
        and similarly some combination of yasis yla() to insist on what is shown on any axis.
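As a hedged illustration of how this keeps positions fixed across subsets, the sketch below draws the same one-way plot for two selections of the auto data. The graph names and the axis range r(0.5 5.5) are choices made here for illustration, and tabplot must be installed (for example from SSC).

```stata
sysuse auto, clear

* Foreign cars have no rep78 values of 1 or 2 in this dataset.
* xasis plots each category at its own value, so 3, 4, 5 stay in place
* and positions 1 and 2 are simply left empty.
tabplot rep78 if foreign == 1, xasis xla(1/5) xsc(r(0.5 5.5)) ///
    name(fixed_foreign, replace)

* Same axis specification for domestic cars, where all five values occur
tabplot rep78 if foreign == 0, xasis xla(1/5) xsc(r(0.5 5.5)) ///
    name(fixed_domestic, replace)
```

Both graphs then share the same x-axis layout, which is the property wanted for a series of graphs in one document.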


        • #5
          Great, never thought -xasis- would have this effect. But it works.
          Now I have a different problem. With -separate(fwl_i)- I assign different colors to the categories. But if some categories are missing, then color p1 goes to the first category that has an observation. So categories on the x-axis get different colors from graph to graph. I would like to avoid that. After a break I will look at the barlook options to see whether -bar(1, color(p1))- or anything similar will work.


          • #6
            I fear that you will find the same problem with the separate() option. It's not geared to recognise what might be in the data but isn't.

            A fudge might be to contract first, and then nudge all zeros to a very small quantity. Then suppress plotting of bars that don't (really) exist. Here's an example. Note: any percents will be very slightly off. Yet another fudge is to calculate percents yourself.


            Code:
            sysuse auto, clear
            contract foreign rep78, zero
            gen double _freq2 = cond(_freq == 0, 1e-9, _freq) 
            tabplot rep78 if foreign [iw=_freq2],  min(1) separate(rep78) bar3(color(red)) bar4(color(blue)) bar5(color(black)) showval(_freq)


            • #7
              I have tested your solution but found another obstacle in the use of (row) percentages. When I have an empty row, every category in this row shows 12.5% (or, in the example below, 20%). So indeed, calculating the percentages myself is another workaround.
              See
              Code:
              sysuse auto, clear
              split make, parse(" ")
              encode make1, gen(brand)
              contract brand foreign rep78, zero
              gen double _freq2 = cond(_freq == 0, 1e-9, _freq)
              tabplot brand rep78 if foreign == 0 [iw=_freq2], percent(brand) min(1) separate(rep78) showval(, mlabs(vsmall) mlabg(zero)) name(f0, replace) xasis
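One way the "calculate percents yourself" workaround might look is sketched below. The variable names rowtotal, pct and pct2, the graph name f0_pct, and the use of nomiss are all assumptions made for this illustration. The format() suboption of showval() is recalled from the tabplot help rather than taken from this thread, so check it against your installed version.

```stata
sysuse auto, clear
split make, parse(" ")
encode make1, gen(brand)
contract brand foreign rep78, zero nomiss

* Row totals within each brand-by-foreign combination
bysort brand foreign: egen double rowtotal = total(_freq)

* Row percents; rows with no observations get 0 rather than equal shares
gen double pct = cond(rowtotal > 0, 100 * _freq / rowtotal, 0)

* Same nudge as in #6 so that empty cells still count as categories
gen double pct2 = cond(pct == 0, 1e-9, pct)

* Plot the precomputed percents as weights and show the exact values
tabplot brand rep78 if foreign == 0 [iw=pct2], min(1) separate(rep78) ///
    showval(pct, format(%3.1f) mlabs(vsmall)) xasis name(f0_pct, replace)
```

Since percent(brand) is no longer used, an empty row stays at zero instead of being split evenly across its categories.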


              • #8
                Nick Cox, I am not sure whether this is worth a new thread, but I have difficulty exporting -tabplot- graphs cleanly into Word or other programs. I export the graph with
                Code:
                sysuse auto, clear
                split make, parse(" ")
                encode make1, gen(brand)
                contract brand foreign rep78, zero
                gen double _freq2 = cond(_freq == 0, 1e-9, _freq)
                tabplot brand rep78 if foreign == 0 [iw=_freq2], percent(brand) min(1) separate(rep78) showval(,mlabs(vsmall) ) name(f0, replace) xasis
                graph export tabp_f0.png, as(png) name(f0) replace width(4800)
                graph export tabp_f0.emf, as(emf) name(f0) replace
                graph export tabp_f0.svg, as(svg) name(f0) replace
                The marker values partly overlap with the bars (the exception is the -png- export, which has other problems...). And in some cases it looks as if the bars do not start at the same height; see this real-world example (exported with the -png- option):
                [Image: tabp_pubt_screen.png]




                Have you seen something like this before, and do you know a workaround?
                Last edited by Marc Kaulisch; 22 Mar 2021, 08:55.


                • #9
                  So I extended the code above to get a replicable example that mimics my real-world case (graph name f1):
                  Code:
                  sysuse auto, clear
                  split make, parse(" ")
                  encode make1, gen(brand)
                  contract brand foreign rep78, zero
                  gen double _freq2 = cond(_freq == 0, 1e-9, _freq)
                  tabplot brand rep78 if foreign == 0 [iw=_freq2], percent(brand) min(1) separate(rep78) showval(,mlabs(vsmall) ) name(f0, replace) xasis
                  
                  clonevar fa = _freq
                  replace fa = 1 if brand == 2 & foreign==0 & inlist(rep78,4)
                  replace fa = 5 if brand == 2 & foreign==0 & inlist(rep78,2,3,5)
                  replace fa = 1 if brand == 2 & foreign==1 & rep78==4
                  replace fa = 6 if brand == 3 & foreign==0 & inlist(rep78,2,3,5)
                  replace fa = 1 if brand == 3 & foreign==0 & inlist(rep78,4)
                  replace fa = 8 if brand == 3 & foreign==1 & inlist(rep78,2,3,5)
                  replace fa = 2 if brand == 9 & foreign==0 & inlist(rep78,2,3)
                  replace fa = 10 if brand == 9 & foreign==1 & rep78==2
                  
                  gen double fa2 = cond(fa == 0, 1e-9, fa)
                  
                  tabplot brand rep78 [iw=fa2], by(foreign) percent(brand) min(1) separate(rep78) showval(,mlabs(vsmall) ) name(f1, replace) xasis height(0.6)
                  
                  graph export tabp_f1.png, as(png) name(f1) replace width(4800)
                  graph export tabp_f1.emf, as(emf) name(f1) replace
                  graph export tabp_f1.svg, as(svg) name(f1) replace
                  The bar height and visual clarity differ depending on the format I use:

                  SVG-Export:
                  [Image: f1_audi_svg.png]



                  EMF-Export:
                  [Image: f1_audi_emf.png]



                  PNG-Export:
                  [Image: f1_audi_png.png]




                  Striking visual findings are: the difference in bar height between -svg- and -emf-; the yellow bar in -png-, which is not on the same level as the others; and the marker labels, which interfere with the bars in -svg- and -emf- but not in -png-.

                  • #10
                    graph export is part of official Stata, and that is what is disappointing here.

                    All I can report is that

                    * I routinely post graphs here and also on Cross Validated using .png and they always look fine to me.

                    * I've worked often one-to-one with students and colleagues and they have been happy with the results of transferring into Word, usually by copy and paste under Windows. That's more than grateful tact because (at least historically) I am usually sitting at the same computer and can see for myself that it looks fine in their Word document.

                    This seems to tally with your report: To my eyes the png looks better.

                    But to back up: the issue here is showing text and the size of the text is sensitive not only to option defaults but also to other choices and implications of those choices, including how many rows and columns there are in the display. There is an offset() suboption within showval() to move text up or down (or left or right as the case may be) as well as more orthodox handles such as mlabsize().
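A small sketch of those two handles used together, on the auto data. The value 0.2 is an arbitrary illustrative choice to be tuned by eye, and the graph name offset_demo is made up here; tabplot must be installed.

```stata
sysuse auto, clear

* offset() moves the shown values relative to the bars;
* mlabsize() controls the size of the text
tabplot foreign rep78, showval(offset(0.2) mlabsize(vsmall)) ///
    name(offset_demo, replace)
```

Trying a few offsets against the exported file, not only the Graph window, seems sensible given the differences between formats reported in this thread.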


                    • #11
                      I forgot the offset-option. It works like a charm.

                      It really does look like a graph export issue. I have inspected the various export formats (svg, emf, png) and it appears that the bar height is exported differently in svg/png than in emf. In my real-world example, the bars for 1.2 and 3.1 look the same height in svg/png, whereas in emf they look proportional to their values.


                      • #12
                        I have inspected an SVG file that Stata produces with graph export. I find that the SVG contains a second rectangle for each bar, with properties different from those of the first rectangle, and it seems that this second rectangle distorts the visual representation of the graph:
                        Value 1.2:
                        <text x="2065.43" y="686.70" style="font-family:'Myriad Pro';font-size:97.45px;fill:#000000" text-anchor="middle">1.2</text>
                        Corresponding rectangles:
                        <rect x="1984.64" y="601.21" width="161.74" height="3.20" style="fill:#FF4A2F"/>
                        <rect x="1989.32" y="599.73" width="152.39" height="6.16" style="fill:none;stroke:#FF4A2F;stroke-width:9.35"/>
                        Value 3.2:
                        <text x="2065.43" y="927.22" style="font-family:'Myriad Pro';font-size:97.45px;fill:#000000" text-anchor="middle">3.2</text>
                        Corresponding rectangles:
                        <rect x="1984.64" y="836.68" width="161.74" height="8.25" style="fill:#FF4A2F"/>
                        <rect x="1989.32" y="840.25" width="152.39" height="1.11" style="fill:none;stroke:#FF4A2F;stroke-width:9.35"/>

                        Value 3.1:
                        <text x="2065.43" y="1167.74" style="font-family:'Myriad Pro';font-size:97.45px;fill:#000000" text-anchor="middle">3.1</text>
                        Corresponding rectangles:
                        <rect x="1984.64" y="1077.37" width="161.74" height="8.08" style="fill:#FF4A2F"/>
                        <rect x="1989.32" y="1080.77" width="152.39" height="1.27" style="fill:none;stroke:#FF4A2F;stroke-width:9.35"/>
                        So I am not sure what is going on here. But the second rectangle does not reflect the corresponding values, and it seems that these rectangles are used for the visual representation in png, svg and the Graph Editor, but not in emf, which uses a different rendering engine.
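A rough worked check of the numbers above, assuming standard SVG rendering in which a stroke is centred on the outline, so that a stroked rectangle paints over roughly its height plus one full stroke-width. The scalar names are made up for this sketch.

```stata
* Numbers copied from the SVG fragments quoted in this post
scalar stroke = 9.35              // stroke-width of the outline rectangles

* Painted vertical extent of each outline rectangle = height + stroke-width
scalar ext_12 = 6.16 + stroke     // bar labelled 1.2
scalar ext_32 = 1.11 + stroke     // bar labelled 3.2
scalar ext_31 = 1.27 + stroke     // bar labelled 3.1

display "1.2: fill height 3.20, outline incl. stroke " %5.2f ext_12
display "3.2: fill height 8.25, outline incl. stroke " %5.2f ext_32
display "3.1: fill height 8.08, outline incl. stroke " %5.2f ext_31
```

On these numbers the stroked outline is taller than the filled rectangle in every case (15.51, 10.46 and 10.62 against 3.20, 8.25 and 8.08), which would be consistent with bars for quite different values looking alike in renderers that draw the stroke.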


                        • #13
                          #12 Interesting. I would take that to StataCorp Technical Services.


                          • #14
                            I have contacted StataCorp Technical Services. I've looked deeper into it, and it looks like the second rectangle is the "outline width" (bar properties).
                            If I remove the outline width, I get
                            <rect x="1984.64" y="601.21" width="161.74" height="3.20" style="fill:#FF4A2F"/>
                            <rect x="1984.64" y="601.21" width="161.74" height="3.20" style="fill:none;stroke:#FF4A2F;stroke-width:0.00"/>
                            So it seems the bar outline is miscalculated when an outline width is defined.

                            I wonder whether this is a result of my graph scheme or not. It says: linewidth pbar thin
                            Edit: Further testing: Okay, after setting -linewidth pbar none- in my scheme, it works well with svg.
                            Last edited by Marc Kaulisch; 24 Mar 2021, 03:34.


                            • #15
                              With further testing I see that "stroke-width:" is set globally, and in one instance it is set to 5.00. So if the rectangle is smaller than 5.00, it shows as if it were 5.00, because the stroke is larger than the bar height itself.
                              Edit: Re-tested it. Manipulating linewidth p1bar p2bar does change stroke-width in svg...
                              Last edited by Marc Kaulisch; 24 Mar 2021, 04:48.
