Looking for appropriate way to plot graph [Stata 18.0]

Anna Binger

Join Date: Jul 2024

Posts: 26
#1

06 Aug 2024, 09:37

Hey everyone,

I've just started using Stata, and I have absolutely no clue how to solve this.
I have a sketch of a graph that I need to plot (see attachment). The graph should show the errors in participants' estimates relative to the real-world value for 8 different questions.

My data structure is the following:
Participants were asked for their estimates on 8 different topics, so I have 8 variables holding percentage values for each participant (>100 observations).
Participants differ by treatment (two treatments) and region (two regions).
For each of the 8 topics I also have one real world value (in percent).

The graph is supposed to differentiate between the two regions (Region 1 on the left, region 2 on the right).
For each of the 8 variables I need one dotted line with
- real world value (as the baseline)
- belief percentage for treatment==0: mean with confidence interval
- belief percentage for treatment==1: mean with confidence interval
all on the same line.

So far, I have calculated:
- means
- standard errors
- lower and upper confidence interval level
for all 8 variables.

I have also generated new variables:

Code:

foreach v of varlist v1 v2 v3 v4 v5 v6 v7 v8 {
    gen meanS_`v' = mean_`v' if north==0
    gen meanN_`v' = mean_`v' if north==1
}

(and the same for the other statistics)
in order to differentiate between regions in the graph; I'm not sure whether this is necessary.

Based on the drawing and the data structure, does anyone have an idea, off the top of their head, how to plot this graph?
If anything is unclear, I'm happy to provide more detailed information.

Thank you!

Anna

ericmelse

Join Date: May 2014

Posts: 434
#2

06 Aug 2024, 22:11

Dear Anna,

Welcome to Statalist. You could take a look at coefplot, which you can install and look up using:

Code:

* Set up
ssc install coefplot, replace
h coefplot // Check the help file

It offers a lot of flexibility for creating plots of the kind you describe.

Do also read the Stata Journal paper by the author of this community-contributed command.
His working paper is also a 'must read' that I can recommend.

http://publicationslist.org/eric.melse

Nick Cox

Join Date: Mar 2014
Posts: 35639
#3

07 Aug 2024, 04:08

If coefplot solves this, that's great. Otherwise, here is some technique, after which the graphics is a twoway call.

Code:

clear 
set obs 200 
gen direction = cond(_n <= 100, "North", "South") 
gen treatment = mod(_n, 2)
set seed 314159 

gen var1 = rnormal(0, 1)
gen var2 = rnormal(1, 1)

save sillyexample  

forval j = 1/2 { 
    use sillyexample, clear 
    statsby, by(treatment direction) : ci mean var`j'
    gen which = `j'
    save ci`j' 
}

append using ci1 

list

Most of my code is setting up a data example, absent one in #1 here. You would need to change dataset and variable names, and the last command would be something like

Code:

append using ci1 ci2 ci3 ci4 ci5 ci6 ci7
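
For the graphics step, here is a minimal sketch of what the twoway call might look like. It is not the code from this post: it rebuilds the toy data, reshapes to long so that one statsby call replaces the loop and append, and assumes the default variable names that statsby creates from the r() results of ci mean (mean, lb, ub). The variable ypos and the 0.15 offset are illustrative choices.

```stata
* Sketch only: toy data as above, then all means and CIs in one statsby call
clear
set obs 200
set seed 314159
gen direction = cond(_n <= 100, "North", "South")
gen treatment = mod(_n, 2)
gen var1 = rnormal(0, 1)
gen var2 = rnormal(1, 1)

* stack var1 and var2 so that "which" identifies the variable
gen long id = _n
reshape long var, i(id) j(which)

* one row per treatment x direction x variable, holding mean, lb and ub
statsby, by(treatment direction which) clear: ci mean var

* small vertical offsets (an arbitrary choice) keep the two treatments apart
gen ypos = which + cond(treatment == 1, 0.15, -0.15)

twoway (rcap lb ub ypos if treatment == 0, horizontal)        ///
       (scatter ypos mean if treatment == 0, msymbol(O))      ///
       (rcap lb ub ypos if treatment == 1, horizontal)        ///
       (scatter ypos mean if treatment == 1, msymbol(T)),     ///
       by(direction, note(""))                                ///
       ylabel(1 "var1" 2 "var2", angle(0)) yscale(reverse)    ///
       ytitle("") xtitle("Mean and 95% confidence interval")  ///
       legend(order(2 "treatment = 0" 4 "treatment = 1"))
```

A vertical reference line for a real-world value could be added with xline(), provided the value is the same for every row of a panel.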


Anna Binger

Join Date: Jul 2024
Posts: 26
#4

07 Aug 2024, 08:03

Thank you, Eric (#2).
My first idea was coefplot too; however, I've only used it after regress or margins and am unsure whether it applies to my type of data.

This is an example of what my data looked like in the beginning:

	expat_percent	expat_true	expat_percent_error	treatment	north
1	.	10.83	.	0	0
2	9	10.83	-1.83	0	0
3	22	10.83	11.17	0	1
4	.	10.83	.	0	1
...	...	...	...	...	...
140	31	10.83	20.17	1	1

- expat_percent: participants were asked to estimate how many expats live in a country (in percent)
- expat_percent_error = expat_percent - 10.83
I need to visualize my participants' estimation errors, so expat_percent_error is the first of the 8 variables of interest.

Because the mean and confidence interval of expat_percent_error will be shown, I was advised to use the collapse command:

Code:

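preserve
* group means and standard errors by treatment and region
collapse (mean) mean_expat_percent_error = expat_percent_error ///
    (semean) se_y = expat_percent_error, by(treatment north)
save mean_and_se, replace
restore
* attach the group statistics to every observation
merge m:1 treatment north using mean_and_se, assert(3) nogen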

gen yu = mean_expat_percent_error + 1.96 * se_y
gen yl = mean_expat_percent_error - 1.96 * se_y

So the data I got now is

	expat_percent	expat_true	expat_percent_error	treatment	north	mean_expat_percent_error	yu	yl
1	.	10.83	.	0	0	10.02	12.34	8.73
2	9	10.83	-1.83	0	0	10.02	12.34	8.73
3	22	10.83	11.17	0	1	8.74	11.29	6.01
4	.	10.83	.	0	1	8.74	11.29	6.01
...	...	...	...	...	...	...	...	...
140	31	10.83	20.17	1	1	8.85	10.92	6.23

In order to use coefplot, how can I proceed in this case? (I did try to do regress and margins but it doesn't seem to be the right way)
Would a twoway scatterplot work instead?

Thank you!


ericmelse

Join Date: May 2014

Posts: 434
#5

07 Aug 2024, 09:29

Dear Anna,
You are right about the 'classical' functionality of coefplot that involves the processing of result data after regress or margins.
But, there is a third input source that coefplot can process: a matrix.
It is a somewhat more involved exercise but it allows for flexible coding and I use it a lot.
I think that you can use the collapse command to get the data to work with for your plot (I have not considered your data example as such).
But, from #4, should I conclude that you want to plot 140 variables?
Your paper sketch mentions 8 variables(?).
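
For illustration, a minimal sketch of the matrix input route. It assumes coefplot is installed (ssc install coefplot). The matrix name M and the row labels are made up for this example, and the numbers are the group means and interval limits for expat_percent_error reported elsewhere in this thread.

```stata
* rows: treatment/north groups; columns: mean, lower limit, upper limit
matrix M = (10.02,  8.73, 12.34 \ ///
             8.74,  6.01, 11.29 \ ///
            10.05,  9.02, 12.47 \ ///
             8.85,  6.23, 10.92)
matrix colnames M = mean ll ul
matrix rownames M = t0_n0 t0_n1 t1_n0 t1_n1

* column 1 supplies the points, columns 2 and 3 the interval limits
coefplot matrix(M[,1]), ci((M[,2] M[,3])) xtitle("Mean estimation error")
```

The row names of the matrix become the axis labels, so naming the rows after the variables of interest would give one line per variable.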

http://publicationslist.org/eric.melse

Anna Binger

Join Date: Jul 2024
Posts: 26
#6

08 Aug 2024, 04:29

Hi Eric,

thanks for your reply and sorry for the confusion.
Regarding #4: 140 is the number of observations in my original data set.
I just realized I wasn’t clear enough in my description, so let me clarify!
My original dataset:

140 observations
Various variables (incl treatment and north)
8 variables of interest (the first is expat_percent_error; I call the rest v2-v8)
expat_percent: participants were asked to estimate how many expats live in a country (in percent)
expat_percent_error = expat_percent - 10.83
same for the 7 other variables of interest.

observations	Treatment	north	expat_percent	expat_true	expat_percent_error	v2_percent	v2_true	v2_percent_error	Same for v3-v8
1	0	0	.	10.83	.
2	0	0	9	10.83	-1.83
3	0	1	22	10.83	11.17
4	0	1	.	10.83	.
...	…	…	...	...	...
140	1	1	31	10.83	20.17

Because the mean and confidence interval of expat_percent_error (and v2-v8) will be shown, I was advised to use the collapse command:

Code:

* shown for the first variable; v2-v8 are handled the same way
collapse (mean) mean_expat_percent_error = expat_percent_error (semean) se_expat_percent_error = expat_percent_error, by(treatment north)

After collapse, I had

4 observations
16 variables (mean & semean for all 8 variables of interest)

Observations	mean_expat_percent_error	se_expat_percent_error	…same for v2-v8…
1 (treatment=0, north=0)
2 (treatment=0, north=1)
3 (treatment=1, north=0)
4 (treatment=1, north=1)

I then generated confidence interval limits:

Code:

gen yu_expat_percent_error = mean_expat_percent_error + 1.96 * se_expat_percent_error
gen yl_expat_percent_error = mean_expat_percent_error - 1.96 * se_expat_percent_error

for all 8 variables.
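
The two limits can be generated for every variable in one loop. A minimal sketch, assuming the collapsed data hold variables named mean_v1, se_v1, mean_v2 and se_v2: the numbers are the two-variable example values posted later in this thread, the yu_/yl_ prefixes follow the naming used here, and 1.96 is the normal-approximation multiplier.

```stata
clear
input float(treatment north mean_v1 mean_v2 se_v1 se_v2)
0 0 12.92 4.38 .76 .25
0 1  8.05 3.07  .9 .19
1 0  13.4 5.42 .91 .24
1 1  8.11 2.99 .85 .21
end

* upper and lower 95% limits for each variable (normal approximation)
foreach v in v1 v2 {
    gen yu_`v' = mean_`v' + 1.96 * se_`v'
    gen yl_`v' = mean_`v' - 1.96 * se_`v'
}

list treatment north mean_v1 yl_v1 yu_v1, noobs
```

With small groups, a t-based multiplier such as invttail(n - 1, 0.025), where n is the group size, would give slightly wider limits than 1.96.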

Meaning I got the following data:

4 observations
24 variables (mean, yu & yl for all 8 variables of interest)

Observations	mean_expat_percent_error	yu_expat_percent_error	yl_expat_percent_error	...same for v2-v8…
1 (treatment=0, north=0)	10.02	12.34	8.73
2 (treatment=0, north=1)	8.74	11.29	6.01
3 (treatment=1, north=0)	10.05	12.47	9.02
4 (treatment=1, north=1)	8.85	10.92	6.23

I can use this data set for plotting the graph if there is an appropriate way.
I will try the coefplot matrix solution you suggested.

My last step was probably unnecessary, but I thought I needed it:
I merged the original data set (140 obs) with the data from above.

Code:

merge m:1 treatment north using "~/data_from_above.dta", assert(3) nogen

Result:

Observation	Treatment	north	expat_percent	expat_true	expat_percent_error	mean_expat_percent_error	yu	yl	…Same for 7 other variables…
1	0	0	.	10.83	.	10.02	12.34	8.73
2	0	0	9	10.83	-1.83	10.02	12.34	8.73
3	0	1	22	10.83	11.17	8.74	11.29	6.01
4	0	1	.	10.83	.	8.74	11.29	6.01
...	…	…	...	...	...	...	...	...
140	1	1	31	10.83	20.17	8.85	10.92	6.23

The graph is supposed to look like this:

[Image: updated drawing graph.png]

My ideas were

Coefplot (did not know about the matrix option, which is why I did the merge with the original data set- thought I had to somehow regress for it to work)
Twoway scatterplot (but I couldn’t manage to show all 8 variables on the y-axis, with treatment=0 and treatment=1 on the same line for all 8 variables)

I hope this description is clearer.

Thank you!

Last edited by Anna Binger; 08 Aug 2024, 04:33.


Andrew Musau

Join Date: Oct 2014
Posts: 10186
#7

08 Aug 2024, 05:56

Your data example is not well-organized. Presumably, you have all combinations of region (North, South) and treatment (Control, Treatment) per unit of observation. People are less likely to respond if you do not put in the work to create a satisfactory example. I will make an exception since you are new. Note that unless the treatment and control groups exhibit significant differences, placing the confidence intervals in the same row would make the presentation difficult to follow. You need offsets. Additionally, there is no need for a marker for the true values since they are constant; the vertical line is sufficient.

Code:

clear
input int obs byte expat_percent double(expat_true expat_percent_error) byte(treatment north) double(mean_expat_percent_error yu yl)
  1  . 10.83     . 0 0 10.02 12.34 8.73
  1  . 10.83     . 1 0 12.02 14.34 10.73
  1  . 10.83     . 0 1 13.02 13.34 9.73
  1  . 10.83     . 1 1 13.02 15.34 11.73
  3 22 10.83 11.17 1 1  10.74 13.29 8.01
  3 22 10.83 11.17 0 1  8.74 11.29 6.01
  3 22 10.83 11.17 1 0  11.74 14.29 9.01
  3 22 10.83 11.17 0 0  9.74 12.29 7.01
140 31 10.83 20.17 1 0  8.85 10.92 6.23
140 31 10.83 20.17 1 1  11.85 13.92 9.23
140 31 10.83 20.17 0 0  7.85 9.92 5.23
140 31 10.83 20.17 0 1  10.85 12.92 8.23
end

cap noisily{
    mkmat expat_true-yl if north & !treatment, mat(coefs10) rowname(obs)
    mkmat expat_true-yl if north & treatment, mat(coefs11) rowname(obs)
    mkmat expat_true-yl if !north & !treatment, mat(coefs00) rowname(obs)
    mkmat expat_true-yl if !north & treatment, mat(coefs01) rowname(obs)
}

coefplot (mat(coefs10[,5]), ci((coefs10[,7] coefs10[,6])))  (mat(coefs11[,5]), ci((coefs11[,7] coefs11[,6]))), bylabel(North) || ///
(mat(coefs00[,5]), ci((coefs00[,7] coefs00[,6]))) (mat(coefs01[,5]), ci((coefs01[,7] coefs01[,6]))), bylabel(South) ||, ///
xline(`=coefs11[1,1]') leg(order(2 "Control" 4 "Treatment")) byopts(note(Dashed line represents true values))

[Image: Graph.png]

Last edited by Andrew Musau; 08 Aug 2024, 06:15.


Nick Cox

Join Date: Mar 2014

Posts: 35639
#8

08 Aug 2024, 07:56

New thread at https://www.statalist.org/forums/for...ence-intervals

If you've abandoned this one, please say so. Otherwise please don't run two or more related threads at the same time.

Anna Binger

Join Date: Jul 2024

Posts: 26
#9

08 Aug 2024, 08:15

Hi Andrew,

Thank you so much and my apologies.
This is my first time working with data, and with Stata, so I thought it would be necessary to walk you through my entire thought and work process.
I now realize that was wrong.

There seems to be a misunderstanding in #7. I made a mistake in how I presented my data.

Here is the data I need to present, in an easier-to-use form:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(treatment north mean_v1 mean_v2 yu_v1 yu_v2 yl_v1 yl_v2)
0 0 12.92 4.38 15.19 5.21 10.61  3.5
0 1  8.05 3.07 10.59 4.15  5.47 1.99
1 0  13.4 5.42 17.96 6.76 10.82 4.08
1 1  8.11 2.99 12.54 3.68  5.62 2.32
end

(To make it easier, I reduced the amount of v* from 8 to 2.)
The graph then should look like this:

Thank you!

Anna Binger

Join Date: Jul 2024

Posts: 26
#10

08 Aug 2024, 08:18

Hi Nick #8.

My apologies. I did not mean to abandon this thread. It took me a while to figure out how to rephrase my question and present the data.
In this thread, I asked which would be the appropriate way to visualize my data.
In the other thread I asked a specific question about the code for a scatterplot, which is why I thought it would not be appropriate to post it in this thread, even though it uses the same example data.

My apologies! I will take down the other thread, if that is what I am supposed to do.

Anna

Andrew Musau

Join Date: Oct 2014
Posts: 10186

#11

08 Aug 2024, 10:19

Same technique once you restructure your data, except that the points I make in #7 apply. Hollow symbols may help if you insist on having everything in one line.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(treatment north mean_v1 mean_v2 yu_v1 yu_v2 yl_v1 yl_v2)
0 0 12.92 4.38 15.19 5.21 10.61  3.5
0 1  8.05 3.07 10.59 4.15  5.47 1.99
1 0  13.4 5.42 17.96 6.76 10.82 4.08
1 1  8.11 2.99 12.54 3.68  5.62 2.32
end

gen id=_n
reshape long mean_ yu_ yl_, i(id) j(which) string
destring which, ignore(v) replace
cap noisily{
    mkmat which-yl if north & !treatment, mat(coefs10) rowname(which)
    mkmat which-yl if north & treatment, mat(coefs11) rowname(which)
    mkmat which-yl if !north & !treatment, mat(coefs00) rowname(which)
    mkmat which-yl if !north & treatment, mat(coefs01) rowname(which)
}

coefplot (mat(coefs10[,4]), ci((coefs10[,6] coefs10[,5])) msy(Oh)) ///
(mat(coefs11[,4]), ci((coefs11[,6] coefs11[,5])) msy(Th)), offset(0) bylabel(North) || ///
(mat(coefs00[,4]), ci((coefs00[,6] coefs00[,5]))) ///
(mat(coefs01[,4]), ci((coefs01[,6] coefs01[,5]))), offset(0) bylabel(South) ||, ///
leg(order(2 "Control" 4 "Treatment")) byopts(note(Dashed line represents true values)) ///
ciopts(recast(rcap)) ylab(1 "Var 1" 2 "Var 2")

[Image: Graph.png]


Nick Cox

Join Date: Mar 2014

Posts: 35639
#12

08 Aug 2024, 13:43

#10 Thanks for your explanation. You can't delete threads or posts of yours; this is explained at https://www.statalist.org/forums/help#closure

Otherwise, speaking for myself, there are now several questions in two threads, and several lines of attack suggested (from me too), so I think I'll bail out now and trust that your threads will converge.

Anna Binger

Join Date: Jul 2024

Posts: 26
#13

09 Aug 2024, 07:27

Thank you, Andrew and Nick, this was incredibly helpful.

Anna Binger

Join Date: Jul 2024
Posts: 26

#14

12 Aug 2024, 14:36

Hello everyone,
I do have a follow-up question, and I'm not sure whether it belongs here or in a new thread.
Since it is based on the same data, I decided to try it here first. Happy to open a new thread if that is more appropriate!

Data:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(treatment north mean_v1 mean_v2 se_v1 se_v2)
0 0 12.92 4.38 .76 .25
0 1  8.05 3.07  .9 .19
1 0  13.4 5.42 .91 .24
1 1  8.11 2.99 .85 .21
end

I need to generate a table like this:

	North				South
	Treatment = 0		Treatment = 1		Treatment = 0		Treatment =1
	mean	se	mean	se	mean	se	mean	se
Var 1	8.05	0.90	8.1	0.85	12.92	0.76	13.4	0.91
Var 2	3.07	0.19	2.99	0.21	4.38	0.25	5.42	0.24

My idea was plotting a matrix (like before in #11):

Code:

gen id=_n
reshape long mean_ se_, i(id) j(which) string
destring which, ignore(v) replace
cap noisily{
    * the columns now run from which to se_ (there is no yl in this data set)
    mkmat which-se_ if north & !treatment, mat(table10) rowname(which)
    mkmat which-se_ if north & treatment, mat(table11) rowname(which)
    mkmat which-se_ if !north & !treatment, mat(table00) rowname(which)
    mkmat which-se_ if !north & treatment, mat(table01) rowname(which)
}

but I'm at a loss how to proceed. Does anyone have an idea? Thank you!


Andrew Musau

Join Date: Oct 2014

Posts: 10186
#15

15 Aug 2024, 05:07

Start a new thread as this thread is about graphing and #14 is not.
