Graphing issue

Stephanie Galen

Join Date: Mar 2018
Posts: 7

02 Apr 2018, 21:59

Hello all:

I have run into an issue in preparing a graph. I am able to graph the data that I want using the code posted below; however, I am having trouble labeling my x-axis. I am graphing the b coefficient for a regression across a series of countries. I would like the country names to be on the x-axis, but I am only able to generate the graph using the country id. When I try to generate the graph with the country names, I receive the following error message: "string variables not allowed in varlist; country is a string variable". Is there a way that I can change the labels from ids to countries in the Graph Editor after the fact? I would really like to be able to have each country labeled on the x-axis if at all possible. Thank you.

Code:

gen significance=p_intinf<0.05

sepscatter b_intinf id if id!=14 & id!=5, separate(significance) msize(3 3),

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str34 country byte id float(b_intinf p_intinf)
"Australia"       1   .9716391 .000011603004
"Austria"         2   .4513713    .016493829
"Belgium"         3  .08817817      .0290304
"Canada"          4   .2308944    .012255648
"Chile"           5   20.06056      .5559472
"Denmark"         6  1.3176367    .000404277
"Finland"         7  2.0038514  5.121343e-09
"France"          8   .1371311   .0001762412
"Germany"         9  .02527034     .05414221
"Greece"         10    2.46497    .017690912
"Hungary"        11   .1819101      .8980632
"Ireland"        13   2.392673   .0001423292
"Israel"         14  31.529716     .09694891
"Italy"          15   .2460618 2.3089585e-06
"Japan"          16   .2516834   6.99161e-09
"Mexico"         19    1.24943     .27244684
"Netherlands"    20   .1206307    .020662295
"New Zealand"    21   6.356767    .005591494
"Norway"         22  1.3108352 .000012435406
"Portugal"       24   7.714956 3.1533016e-06
"Spain"          25  .23721674   .0023673815
"Sweden"         26  .50136775   .0015550116
"Turkey"         28   .7287502      .6851235
"United Kingdom" 29   .1651836    .002850089
"United States"  30 .013510913     .01402276
end
label var id "Id" 
label var b_intinf "(mean) b_intinf" 
label var p_intinf "(mean) p_intinf"

Tags: graph, label, scatter, string, syntax

Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#2

02 Apr 2018, 22:16

Well, I don't know about -sepscatter-; it's not part of official Stata. But on the assumption that it is a wrapper for -graph twoway scatter- and that it passes options through, you can do this as follows:

Code:

encode country, gen(id2)
graph twoway scatter b_intinf id2 if !inlist(id2, 14, 5), xlabel(1(1)25, angle(90) valuelabel)

My guess is that if you stick that same -xlabel(...)- option onto your -sepscatter- command, you will get the axis labeling you are looking for.

By the way, I don't think the comma at the very end of that -sepscatter- command belongs there.
Comment
Stephanie Galen

Join Date: Mar 2018

Posts: 7
#3

02 Apr 2018, 22:29

I found -sepscatter- at https://www.statalist.org/forums/for...lable-from-ssc

Sepscatter allows me to distinguish statistically significant values through marker color/shape.

The code you provided runs beautifully when I type it in as written.

However, when I attempt to insert it into the sepscatter command it returns the error invalid 'xlabel'.

Code:

gen significance=p_intinf<0.05
encode country, gen(id2)
sepscatter b_intinf id if id!=14 & id!=5, separate(significance) msize(3 3), xlabel(1(1)25, angle(90) valuelabel)

Last edited by Stephanie Galen; 02 Apr 2018, 22:32.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35724
#4

03 Apr 2018, 05:04

I have comments on various levels.

First off, Clyde was indirectly underlining a long-standing request that people explain the provenance of community-contributed (user-written) commands that they refer to. See
https://www.statalist.org/forums/help#stata

12.1 What to say about your commands and your problem

Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!

If you are using community-contributed (also known as user-written) commands, explain that and say where they came from: the Stata Journal, SSC, or other archives. This helps (often crucially) in explaining your precise problem, and it alerts readers to commands that may be interesting or useful to them.

Here are some examples:
I am using xtreg in Stata 13.1.
I am using estout from SSC in Stata 13.1.

So, sepscatter is from SSC.

Second, there is a syntax error in what you wrote. You have an illegal extra comma before the xlabel() call. (Strictly, a comma isn't illegal in that position, but it is when you follow it with extra options: options all belong on the same side of the comma.)

Looking at your command you didn't, I guess, intend what you wrote. If you don't want to include two countries they should be excluded from the encode. And the encode is useless here if you don't use its result. Note that Clyde did suggest plotting versus the new variable id2.

That all leads to a corrected command. Here is everything together for anyone who wants to experiment. My use of scheme s1color is just personal taste.

Code:

clear
input str34 country byte id float(b_intinf p_intinf)
"Australia"       1   .9716391 .000011603004
"Austria"         2   .4513713    .016493829
"Belgium"         3  .08817817      .0290304
"Canada"          4   .2308944    .012255648
"Chile"           5   20.06056      .5559472
"Denmark"         6  1.3176367    .000404277
"Finland"         7  2.0038514  5.121343e-09
"France"          8   .1371311   .0001762412
"Germany"         9  .02527034     .05414221
"Greece"         10    2.46497    .017690912
"Hungary"        11   .1819101      .8980632
"Ireland"        13   2.392673   .0001423292
"Israel"         14  31.529716     .09694891
"Italy"          15   .2460618 2.3089585e-06
"Japan"          16   .2516834   6.99161e-09
"Mexico"         19    1.24943     .27244684
"Netherlands"    20   .1206307    .020662295
"New Zealand"    21   6.356767    .005591494
"Norway"         22  1.3108352 .000012435406
"Portugal"       24   7.714956 3.1533016e-06
"Spain"          25  .23721674   .0023673815
"Sweden"         26  .50136775   .0015550116
"Turkey"         28   .7287502      .6851235
"United Kingdom" 29   .1651836    .002850089
"United States"  30 .013510913     .01402276
end
label var id "Id"
label var b_intinf "(mean) b_intinf"
label var p_intinf "(mean) p_intinf"

gen significance=p_intinf<0.05
encode country if id!= 14 & id !=5, gen(id2)
set scheme s1color
sepscatter b_intinf id2 if id!=14 & id!=5, ///
separate(significance) msize(3 3) xlabel(1(1)23, angle(90) valuelabel) name(G0)
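
The change from xlabel(1(1)25, ...) to xlabel(1(1)23, ...) follows from how -encode- numbers categories when an -if- qualifier is used. Here is a minimal sketch of that behaviour with official commands only. The auto dataset and the variable name make_id are stand-ins chosen for illustration; they are not part of the thread's data.

```stata
* Illustration only: auto.dta stands in for the country data
sysuse auto, clear

* encode only the foreign cars; domestic cars get missing make_id
encode make if foreign, gen(make_id)

* codes are consecutive integers from 1, in alphabetical order of make
summarize make_id, meanonly
local k = r(max)
display "highest code: `k'"

* label every code with its value label, as in the xlabel() call above
scatter mpg make_id, xlabel(1(1)`k', angle(90) valuelabel)
```

Storing the highest code in a local macro avoids hard-coding the upper limit of the xlabel() range when the set of excluded categories changes.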

Third, it's not personal bias against sepscatter that makes me say that this graph doesn't work well. You would need to tinker with the legend and axis titles to make it even semi-civilised, but the bigger deal is the design of countries in alphabetical order on the horizontal axis. Alphabetical order works well for dictionaries and directories, but not generally otherwise. And axis labels at a vertical angle are the last resort of the desperate (me too, but only very occasionally).

I have to speculate on your behalf that

1. The variable labels don't mean what they say. These are, I guess, just slopes and P-values from a series of regression results. I don't know where (mean) came from.

2. You don't have good substantive or scientific reasons for excluding Israel and Chile: More likely, the slopes obtained are just too awkward for your graph.

It's a standard maxim that it's better to change the graph to suit the data, not the data to suit the graph.

My main suggestions are to order the countries on whatever is of most interest and to put the countries on the vertical axis. Here are some examples:

Code:

label var b "Slope"
label var p "P-value"
sort b
gen order = _n

* labmask is from Stata Journal
labmask order, values(country)

scatter order b , yla(1/25, grid ang(h) valuelabel noticks) xsc(log alt r(0.01 .)) xla(0.01 0.1 1 10 100) ms(Oh) name(G2, replace) ytitle("")

scatter order p , yla(1/25, grid ang(h) valuelabel noticks) xsc(log alt) xla(0.001 0.01 0.1 1) xli(0.05) ms(Oh) name(G1, replace) ytitle("")

graph combine G1 G2, imargin(small)

The repetition of country names can easily be avoided by using the Graph Editor to hide one set of names.

If you wanted to keep to the idea of flagging results above and below P = 0.05, that can also be done:

Code:

separate order, by(p <= 0.05)

scatter order order? b , yla(1/25, grid ang(h) valuelabel noticks) ///
xsc(log alt r(0.01 .)) xla(0.01 0.1 1 10 100) ms(none Oh +) name(G3, replace) ytitle("") ///
legend(order(2 "{it:P} {&le} 0.05" 3 "{it:P} > 0.05"))
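
For readers who do not have labmask installed, the sort-then-label step can be approximated with official commands. This is only a rough sketch under stated assumptions: it is not labmask's implementation, and the auto dataset and the local macro name are stand-ins for the country data.

```stata
* Illustration only: auto.dta stands in for the country data
sysuse auto, clear
keep if foreign

* rank observations on the variable of interest
sort mpg
gen order = _n

* attach each name to its rank as a value label (roughly what labmask does here)
forvalues i = 1/`=_N' {
    local name = make[`i']
    label define order `i' "`name'", modify
}
label values order order

* names on the vertical axis, ordered by the plotted variable
scatter order mpg, ylabel(1/`=_N', grid angle(h) valuelabel noticks) ///
    ytitle("") msymbol(Oh)
```

Observations tied on the sort variable end up in arbitrary relative order, which only affects which of the tied names is plotted first.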
Comment

Tim Morris

Join Date: Apr 2014
Posts: 92

03 Apr 2018, 06:11

Nick,

labmask. I have often thought what a great command this would be, and how strange that it isn't just 'there' in Stata. Of course, you wrote it years ago – thanks!
For your preferred (I think) graph, rather than do it in the graph editor, I would want to do it as follows.

Code:

clear
input str14 country byte id float(intinf1 intinf2)
"Australia"       1   .9716391 .000011603004
"Austria"         2   .4513713    .016493829
"Belgium"         3  .08817817      .0290304
"Canada"          4   .2308944    .012255648
"Chile"           5   20.06056      .5559472
"Denmark"         6  1.3176367    .000404277
"Finland"         7  2.0038514  5.121343e-09
"France"          8   .1371311   .0001762412
"Germany"         9  .02527034     .05414221
"Greece"         10    2.46497    .017690912
"Hungary"        11   .1819101      .8980632
"Ireland"        13   2.392673   .0001423292
"Israel"         14  31.529716     .09694891
"Italy"          15   .2460618 2.3089585e-06
"Japan"          16   .2516834   6.99161e-09
"Mexico"         19    1.24943     .27244684
"Netherlands"    20   .1206307    .020662295
"New Zealand"    21   6.356767    .005591494
"Norway"         22  1.3108352 .000012435406
"Portugal"       24   7.714956 3.1533016e-06
"Spain"          25  .23721674   .0023673815
"Sweden"         26  .50136775   .0015550116
"Turkey"         28   .7287502      .6851235
"United Kingdom" 29   .1651836    .002850089
"United States"  30 .013510913     .01402276
end

sort intinf1
gen byte order = _n

* labmask is from Stata Journal
labmask order, values(country)

* reshape puts b and p in the same variable (intinf, named by me in line #2). bp can be used in the by() option below
reshape long intinf, i(id) j(bp)
lab def bp 1 "Slope" 2 "p-value"
lab val bp bp

scatter order intinf,  ///
    by(bp, xrescale noiyaxes note("")) ///
    yla(1/25, grid ang(h) valuelabel noticks) xsc(log alt r(0.01 .)) xla(0.01 0.1 1 10 100) ms(Oh) name(G3, replace) xtitle("") ytitle("")

I think you didn't do this because the xlabels become awkward when you use -xsc(log)-, so they need to be specified explicitly, but this can't be done separately for each panel even with the xrescale suboption (the Graph Editor calls them xaxis1[1] and xaxis1[2] rather than xaxis1 and xaxis2).

My question is: Can this be done without resorting to the graph editor?

Comment

Nick Cox

Join Date: Mar 2014

Posts: 35724
#6

03 Apr 2018, 07:52

Tim: You're right.

I had some false starts at this, including with multidot (SSC), but the combination of two variables with very different ranges and log scales always seemed to raise more small problems with axis labelling than I wanted. Also, I tried producing two graphs, one with and one without country names, and then combining them, but the gap between them remained stubbornly large.

labmask evidently goes back to 2002.

I don't know what I prefer here, as the results look too labile to be publishable, unless it's to undermine a published hypothesis.

EDIT

Stephanie's previous thread https://www.statalist.org/forums/for...ing-the-matrix seems to be the context here:

25 countries x 40 years and various plain regressions with different predictors.

Some Mickey Mouse points to ponder and rebut:

1. Mixing data for large economies with small economies (or so I guess) is often a fragile business if any of the variables is size-related. My bias is that I expect to see log scales making more scientific sense.

2. 40 is not a very large sample size for a regression with more than one or two predictors. (We know that you too would prefer more data.)

3. I see lots of variables that are some kind of *inf. It's not expected that your variable names make instant sense to outsiders, but beware inbuilt negative correlations if one of those going down means that the others go up on average.

Last edited by Nick Cox; 03 Apr 2018, 08:27.
Comment
Stephanie Galen

Join Date: Mar 2018

Posts: 7
#7

04 Apr 2018, 23:04

Thank you so much for your help Nick! You really provided so much more aid than I originally requested and my graphs are much better for it. This forum really is invaluable for Stata newcomers.

Context for my request, if you were curious: The data set examines the effect of international trade and global value chains on domestic inflation over a 40-year period.
Comment
