-tabplot- display category with zero observations

Marc Kaulisch

Join Date: Jan 2016
Posts: 184

09 Mar 2021, 08:09

I want to create a series of graphs with -tabplot- from the Stata Journal. The categories on the x- and y-axes should be fixed so that patterns are easier to recognise across graphs. But when I select certain observations, a number of categories have no observations. At the moment the graph displays bars at categories where there are no observations. I do not know which trick I need...

The example includes a variable for subjects (luf_t_m), a variable for sources (fwl_i), a variable for publication types (pubtype) and an observation id variable (core_id). Overall, I would like to present subjects by sources. To see whether the contribution of sources differs by publication type, I restrict the data to, e.g., books (pubtype==2).

Here is the code, which is quite lengthy because I manipulate the labels, but I wanted it to be as close as possible to my own code:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int luf_t_m byte fwl_i int core_id byte pubtype
290 0 28285 5
400 0 28647 1
490 0 28155 1
490 0 28785 1
400 0 28816 1
410 0 28033 1
360 0 27547 5
490 0 27724 5
490 0 28213 1
490 0 27942 1
235 0 28623 1
410 0 28665 1
490 0 28657 2
490 0 28443 1
490 0 28401 1
490 1  7265 1
490 1 11203 1
490 1 10855 1
410 1  8839 1
490 1  3950 1
360 1  4943 1
370 1  3223 1
490 1  6075 1
360 1  3080 1
400 1  1754 1
490 1  2647 1
400 1  2940 1
540 1  2615 1
410 1  3139 1
370 1  1878 1
765 2  3042 4
470 2 10726 1
490 2  3795 1
470 2 11427 1
490 2  5144 1
315 2 12418 1
490 2  7471 1
315 2  7865 1
 80 2 12483 1
370 2  9428 1
490 2  2295 1
540 2 10839 1
400 2  9460 1
490 2  2707 1
235 2  2620 1
235 3 10572 4
315 3  6129 1
290 3  4978 3
765 3  1282 3
765 3  1944 4
370 3  4010 1
490 3  8096 1
490 3  2127 1
490 3 12588 1
765 3  8276 4
470 3  1364 1
400 3  8975 1
400 3  5257 3
490 3  7081 4
400 3  5471 1
490 4 25586 3
410 4 22196 3
490 4 17802 3
490 4 26623 1
490 4 26847 1
490 4 21351 2
490 4 17913 1
540 4 18126 4
490 4 26690 2
490 4 24218 1
490 4 23466 2
490 4 25590 3
490 4 26252 3
490 4 27232 5
320 4 21154 4
490 5 30654 5
490 5 30313 5
 80 5 31104 1
490 5 29812 5
490 5 29674 5
410 5 31106 1
490 5 31088 5
490 5 31991 1
 80 5 30397 4
490 5 31117 5
490 5 31744 1
470 5 30186 5
490 5 31692 5
490 5 29984 1
690 5 30510 5
470 6 14649 1
470 6 13276 1
400 6 15603 1
765 6 13483 2
315 6 15609 1
400 6 14861 1
490 6 16870 1
490 6 13492 1
315 6 29577 1
490 6 13454 3
490 6 14154 1
490 6 16530 3
490 6 29604 1
370 6 16879 1
490 6 16688 3
490 7 13995 5
400 7 14817 1
490 7 13180 1
490 7 13807 5
340 7 16231 1
490 7 14227 1
470 7 13458 1
490 7 15130 1
400 7 14140 1
490 7 14004 5
490 7 15097 5
490 7 15081 5
490 7 17367 5
470 7 14717 5
490 7 14698 1
end
label values luf_t_m l_luf_t
label def l_luf_t 50 "G", modify
label def l_luf_t 80 "L", modify
label def l_luf_t 235 "So", modify
label def l_luf_t 290 "Wi", modify
label def l_luf_t 315 "Ps", modify
label def l_luf_t 360 "Ph", modify
label def l_luf_t 370 "Ch", modify
label def l_luf_t 400 "Bi", modify
label def l_luf_t 410 "Ge", modify
label def l_luf_t 470 "Klt", modify
label def l_luf_t 490 "Klp", modify
label def l_luf_t 520 "Za", modify
label def l_luf_t 540 "Ve", modify
label def l_luf_t 610 "Ag", modify
label def l_luf_t 690 "Ma", modify
label def l_luf_t 765 "In", modify
label values fwl_i l_fwl_i
label def l_fwl_i 0 "not in x", modify
label def l_fwl_i 1 "in x", modify
label def l_fwl_i 2 "in x1", modify
label def l_fwl_i 3 "in x2", modify
label def l_fwl_i 4 "in x3", modify
label def l_fwl_i 5 "in x4", modify
label def l_fwl_i 6 "in x5", modify
label def l_fwl_i 7 "in x6", modify
label values pubtype l_pubtype
label def l_pubtype 1 "Article", modify
label def l_pubtype 2 "Book", modify
label def l_pubtype 3 "Book Chapter", modify
label def l_pubtype 4 "Proceedings", modify
label def l_pubtype 5 "Other", modify


********************* Selection of Publication type
keep if pubtype==2
*********************


*******************************
label copy l_fwl_i l_fwl_i_cs
label copy l_luf_t l_luf_t_cs
la val fwl_i l_fwl_i_cs
la val luf_t_m l_luf_t_cs


*levelsof fwl_i, local(fwli)

foreach fwl of num 0/7 {
    qui sum core_id if fwl_i == `fwl'
    local ls = `r(N)'
    local s = `"`ls'"'
    local vl: label l_fwl_i `fwl'
    local vlt = `"`vl' (N=`s')"'
    la de l_fwl_i_cs `fwl' `"`vlt'"', modify
    local fwln`fwl' = `"(N=`s')"'
    local fwls`fwl' = `"`vl'"'
}

levelsof luf_t_m, local(ltm)

foreach lt of local ltm {
    qui sum core_id if luf_t_m == `lt'
    local ls = `r(N)'
    local s = `"`ls'"'
    local vl: label l_luf_t `lt'
    local vlt = `"`vl' (N=`s')"'
    la de l_luf_t_cs `lt' `"`vlt'"', modify
}


qui sum core_id, d
local total = `r(N)'

tabplot luf_t_m fwl_i , separate(fwl_i)  percent(luf_t_m) showval(,mlabs(vsmall) mlabg(zero) ) xtitle("") ytitle("") subtitle(`"N=`total'"') ///
            xlabel(1 `""`fwls0'" "`fwln0'""'  2 `""`fwls1'" "`fwln1'""' 3 `""`fwls2'" "`fwln2'""' 4 `""`fwls3'" "`fwln3'""' 5 `""`fwls4'" "`fwln4'""' 6 `""`fwls5'" "`fwln5'""' 7 `""`fwls6'" "`fwln6'""' 8 `""`fwls7'" "`fwln7'""', labs(vsmall)) ylabel(, labs(vsmall)) name(tabp_dataex, replace) ///
            xsc(titlegap(*1.1)) height(.6) ///
            note("", span)

Any idea where I missed something?

Nick Cox

Join Date: Mar 2014

Posts: 35698
#2

09 Mar 2021, 09:41

This is hard for me to follow, and I'm the program author. My responsibility is perhaps (1) to explain tabplot clearly and (2) to ensure that it doesn't calculate incorrectly.

With a minimal

Code:

tabplot luf_t_m fwl_i, showval scheme(s1color)

I get this -- which I think is consistent with what you've shown, given that you're calculating percents. But in general -- and I think you understand this -- tabplot won't show a bar and won't show text where no observation exists. Its viewpoint is that nothing exists, so there is nothing to show. If you want something different, you may need to clone tabplot and rewrite it. Or, what may be easier is to use contract, zero to get a consistent dataset and then start from there.
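For illustration, a minimal sketch of the contract, zero step on the auto data (not the data from #1). It only shows how empty combinations become explicit rows with _freq equal to 0; what is done with them afterwards is a separate choice.

```stata
* Sketch on the auto data: make empty cells explicit
sysuse auto, clear

* one row per combination of foreign and rep78;
* the zero option adds rows with _freq == 0 for combinations never observed
contract foreign rep78, zero

* show the combinations that do not occur in the data
list foreign rep78 _freq if _freq == 0, noobs
```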
Marc Kaulisch

Join Date: Jan 2016

Posts: 184
#3

10 Mar 2021, 01:44

Excuse me if I was not clear enough. I present similar graphs (subjects by sources, 18x8) throughout a document, and they differ only in the selection of observations. I think it is easier for the reader if the categories he/she finds in graph 1 are at the same position in graph 2, even if they are empty.
I will consider your two options. I am also considering the by(pubtype) option as a possible solution, although I might lose the N-count in the label. I will check the trade-offs.
Nick Cox

Join Date: Mar 2014

Posts: 35698
#4

10 Mar 2021, 01:55

Consider also

Code:

sysuse auto, clear
tabplot rep78, xasis xla(1 2 3 4 5 6 "never occurs") xsc(r(. 6.5))

and similarly some combination of yasis yla() to insist on what is shown on any axis.
Marc Kaulisch

Join Date: Jan 2016

Posts: 184
#5

10 Mar 2021, 03:52

Great, never thought -xasis- would have this effect. But it works.
Now I have a different problem. With -separate(fwl_i)- I assign different colors to the categories. But if a category is missing, color p1 goes to the first category that has an observation, so categories on the x-axis get different colors from graph to graph. I would like to avoid that. After a break I will look at the barlook options to see whether -bar(1, color(p1))- or anything similar will work.
Nick Cox

Join Date: Mar 2014

Posts: 35698
#6

10 Mar 2021, 04:16

I fear that you will find the same problem with the separate() option. It's not geared to recognise what might be in the data but isn't.

A fudge might be to contract first, and then nudge all zeros to a very small quantity. Then suppress plotting of bars that don't (really) exist. Here's an example. Note: any percents will be very slightly off. Yet another fudge is to calculate percents yourself.

Code:

sysuse auto, clear
contract foreign rep78, zero
gen double _freq2 = cond(_freq == 0, 1e-9, _freq)
tabplot rep78 if foreign [iw=_freq2], min(1) separate(rep78) bar3(color(red)) bar4(color(blue)) bar5(color(black)) showval(_freq)
Marc Kaulisch

Join Date: Jan 2016

Posts: 184
#7

10 Mar 2021, 07:13

I have tested your solution but found another obstacle when using (row) percentages. When a row is empty, every category in that row shows 12.5% (or, in the example below, 20%). So indeed, calculating the percentages myself is another workaround.
See

Code:

sysuse auto, clear
split make, parse(" ")
encode make1, gen(brand)
contract brand foreign rep78, zero
gen double _freq2 = cond(_freq == 0, 1e-9, _freq)
tabplot brand rep78 if foreign == 0 [iw=_freq2], percent(brand) min(1) separate(rep78) showval(,mlabs(vsmall) mlabg(zero) ) name(f0, replace) xasis
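The workaround of calculating the percentages by hand could look roughly like the following sketch. The variable names rowtotal and rowpct are made up for illustration, and the nomiss option is an assumption added here so that missing rep78 does not form a category of its own.

```stata
* Sketch: row percentages computed by hand after -contract, zero-
sysuse auto, clear
split make, parse(" ")
encode make1, gen(brand)
contract brand foreign rep78, zero nomiss

* total frequency of each brand within each level of foreign
bysort foreign brand: egen double rowtotal = total(_freq)

* cells in an empty row get 0 percent instead of an equal share
gen double rowpct = cond(rowtotal > 0, 100 * _freq / rowtotal, 0)

* inspect the rows that are empty for domestic cars
list brand rep78 _freq rowpct if foreign == 0 & rowtotal == 0, noobs sepby(brand)
```

This only computes and lists the percentages; plotting them is a separate step.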
Marc Kaulisch

Join Date: Jan 2016

Posts: 184
#8

22 Mar 2021, 08:52

Nick Cox I am not sure if it is worth a new thread, but I have difficulties exporting -tabplot- graphs cleanly into Word or other programs. I export the graph from

Code:

sysuse auto, clear
split make, parse(" ")
encode make1, gen(brand)
contract brand foreign rep78, zero
gen double _freq2 = cond(_freq == 0, 1e-9, _freq)
tabplot brand rep78 if foreign == 0 [iw=_freq2], percent(brand) min(1) separate(rep78) showval(,mlabs(vsmall) ) name(f0, replace) xasis
graph export tabp_f0.png, as(png) name(f0) replace width(4800)
graph export tabp_f0.emf, as(emf) name(f0) replace
graph export tabp_f0.svg, as(svg) name(f0) replace

The marker values partly overlap with the bars (the exception is the -png- export, which has other problems...). And in some cases it looks as if the bars do not start at the same height - see this example from a real-world case (exported with the -png- option)

Have you seen something like this earlier and do you know a work around?

Last edited by Marc Kaulisch; 22 Mar 2021, 08:55.

Marc Kaulisch

Join Date: Jan 2016
Posts: 184

22 Mar 2021, 09:47

So I enhanced the code above to get a replicable example from my real world example (graph name f1):

Code:

sysuse auto, clear
split make, parse(" ")
encode make1, gen(brand)
contract brand foreign rep78, zero
gen double _freq2 = cond(_freq == 0, 1e-9, _freq)
tabplot brand rep78 if foreign == 0 [iw=_freq2], percent(brand) min(1) separate(rep78) showval(,mlabs(vsmall) ) name(f0, replace) xasis

clonevar fa = _freq
replace fa = 1 if brand == 2 & foreign==0 & inlist(rep78,4)
replace fa = 5 if brand == 2 & foreign==0 & inlist(rep78,2,3,5)
replace fa = 1 if brand == 2 & foreign==1 & rep78==4
replace fa = 6 if brand == 3 & foreign==0 & inlist(rep78,2,3,5)
replace fa = 1 if brand == 3 & foreign==0 & inlist(rep78,4)
replace fa = 8 if brand == 3 & foreign==1 & inlist(rep78,2,3,5)
replace fa = 2 if brand == 9 & foreign==0 & inlist(rep78,2,3)
replace fa = 10 if brand == 9 & foreign==1 & rep78==2

gen double fa2 = cond(fa == 0, 1e-9, fa)

tabplot brand rep78 [iw=fa2], by(foreign) percent(brand) min(1) separate(rep78) showval(,mlabs(vsmall) ) name(f1, replace) xasis height(0.6)

graph export tabp_f1.png, as(png) name(f1) replace width(4800)
graph export tabp_f1.emf, as(emf) name(f1) replace
graph export tabp_f1.svg, as(svg) name(f1) replace

The bar height and visual clarity differ depending on the format I use:

SVG-Export: [image: f1_audi_svg.png]

EMF-Export: [image: f1_audi_emf.png]

PNG-Export: [image: f1_audi_png.png]

Striking visual findings are: the bar height difference between -svg- and -emf-; the yellow bar in -png-, which is not on the same level as the others; and the marker labels, which interfere with the bars in -svg- and -emf- but not in -png-.


Nick Cox

Join Date: Mar 2014

Posts: 35698
#10

22 Mar 2021, 10:24

graph export is part of official Stata, and it is what's disappointing here.

All I can report is that

* I routinely post graphs here and also on Cross Validated using .png and they always look fine to me.

* I've often worked one-to-one with students and colleagues and they have been happy with the results of transferring into Word, usually by copy and paste under Windows. That's more than grateful tact because (at least historically) I am usually sitting at the same computer and can see for myself that it looks fine in their Word document.

This seems to tally with your report: To my eyes the png looks better.

But to back up: the issue here is showing text and the size of the text is sensitive not only to option defaults but also to other choices and implications of those choices, including how many rows and columns there are in the display. There is an offset() suboption within showval() to move text up or down (or left or right as the case may be) as well as more orthodox handles such as mlabsize().
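A small sketch of those two handles on the auto data. The offset value 0.15 and the graph name offset_demo are arbitrary choices for illustration, not recommendations, and the leading comma inside showval() follows the way it is written in #1.

```stata
* Sketch: shift and shrink the text shown by showval()
sysuse auto, clear

* offset() moves the text relative to the bar; mlabsize() sets the text size
tabplot foreign rep78, showval(, offset(0.15) mlabsize(vsmall)) name(offset_demo, replace)
```

Which offset looks right depends on the number of rows and columns and on the export format, so it usually needs some trial and error.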
Marc Kaulisch

Join Date: Jan 2016

Posts: 184
#11

23 Mar 2021, 01:52

I forgot the offset-option. It works like a charm.

It really does look like a graph export issue. I have inspected the various export formats (svg, emf, png) and it appears that the bar height is exported differently between svg/png and emf. In my real-world example, the bar heights for 1.2 and 3.1 are visually the same in svg/png, whereas in emf they look proportionally represented.
Marc Kaulisch

Join Date: Jan 2016

Posts: 184
#12

24 Mar 2021, 02:21

I have inspected an SVG file that Stata produces with graph export. I find that the SVG contains a second rectangle for each bar with different properties from the first, and it seems that this second rectangle disturbs the visual representation of the graph:

Value 1.2:
<text x="2065.43" y="686.70" style="font-family:'Myriad Pro';font-size:97.45px;fill:#000000" text-anchor="middle">1.2</text>
Corresponding rectangles:
<rect x="1984.64" y="601.21" width="161.74" height="3.20" style="fill:#FF4A2F"/>
<rect x="1989.32" y="599.73" width="152.39" height="6.16" style="fill:none;stroke:#FF4A2F;stroke-width:9.35"/>
Value 3.2:
<text x="2065.43" y="927.22" style="font-family:'Myriad Pro';font-size:97.45px;fill:#000000" text-anchor="middle">3.2</text>
Corresponding rectangles:
<rect x="1984.64" y="836.68" width="161.74" height="8.25" style="fill:#FF4A2F"/>
<rect x="1989.32" y="840.25" width="152.39" height="1.11" style="fill:none;stroke:#FF4A2F;stroke-width:9.35"/>

Value 3.1:
<text x="2065.43" y="1167.74" style="font-family:'Myriad Pro';font-size:97.45px;fill:#000000" text-anchor="middle">3.1</text>
Corresponding rectangles:
<rect x="1984.64" y="1077.37" width="161.74" height="8.08" style="fill:#FF4A2F"/>
<rect x="1989.32" y="1080.77" width="152.39" height="1.27" style="fill:none;stroke:#FF4A2F;stroke-width:9.35"/>

So I am not sure what is going on here. But the second rectangle does not reflect the corresponding values, and it seems that these second rectangles are used for the visual representation in png, svg and the Graph Editor, but not in emf, which uses a different rendering engine.
Nick Cox

Join Date: Mar 2014

Posts: 35698
#13

24 Mar 2021, 02:57

#12 Interesting. I would take that to StataCorp Technical Services.
Marc Kaulisch

Join Date: Jan 2016

Posts: 184
#14

24 Mar 2021, 03:28

I have contacted StataCorp Technical Services. I've looked deeper into it, and it looks like the second rectangle is the "outline width" (bar properties).
If I remove the outline width, I get

<rect x="1984.64" y="601.21" width="161.74" height="3.20" style="fill:#FF4A2F"/>
<rect x="1984.64" y="601.21" width="161.74" height="3.20" style="fill:none;stroke:#FF4A2F;stroke-width:0.00"/>

So it seems the value of the bar outline is mis-calculated when outline width is defined.

I wonder whether this is a result of my graph scheme, which says: linewidth pbar thin
Edit: Further testing: after setting -linewidth pbar none- in my scheme, it works well with svg.
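A possible alternative to editing the scheme is to switch off the outline for a single graph through the bar options, in the way #6 uses bar3() to bar5(). The following is an untested sketch on the auto data; the graph name no_outline and the file name are made up, and it assumes that bar1() and bar2() pass lwidth() through to the bars.

```stata
* Untested sketch: suppress bar outlines for this graph only,
* instead of changing linewidth entries in the scheme
sysuse auto, clear
tabplot rep78 foreign, separate(foreign) ///
    bar1(lwidth(none)) bar2(lwidth(none)) ///
    name(no_outline, replace)

* export to svg to check the stroke-width written for the bars
graph export no_outline.svg, as(svg) name(no_outline) replace
```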

Last edited by Marc Kaulisch; 24 Mar 2021, 03:34.
Marc Kaulisch

Join Date: Jan 2016

Posts: 184
#15

24 Mar 2021, 04:43

With further testing I see that "stroke-width:" is set globally. In one instance it is set to 5.00, so if the rectangle is smaller than 5.00 it shows as if it were 5.00, because the stroke is larger than the bar height itself.
Edit: Re-tested it. Manipulating linewidth p1bar p2bar does change stroke-width in svg...
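The effect follows from how SVG draws strokes: the stroke is centred on the rectangle's outline, so half of stroke-width falls outside each edge and the painted extent is height plus stroke-width. A small sketch of that arithmetic on the outline rectangles quoted in #12 (the local names are only illustrative):

```stata
* Painted vertical extent of a stroked SVG rectangle = height + stroke-width,
* because the stroke is centred on the outline (half outside each edge).
local sw 9.35                      // stroke-width quoted in #12
foreach h in 6.16 1.11 1.27 {      // outline heights quoted for values 1.2, 3.2, 3.1
    display "outline height " %5.2f `h' "   painted extent " %6.2f (`h' + `sw')
}
```

On those numbers every outline paints at least 9.35 units high, far more than the 1.11 to 6.16 of the outline rectangles themselves, so small bars can look alike regardless of their values.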

Last edited by Marc Kaulisch; 24 Mar 2021, 04:48.
