How to stratify results in a forest plot by two variables?

Eduardo Torre

Join Date: Jan 2016

Posts: 26
#1

17 Feb 2017, 06:48

Dear Statalist,

I am trying to deal with the following problem:

I want to plot several regression coefficients, estimated for quite a few outcomes, in a single figure. For each outcome I have 5 sub-populations, and within each sub-population 3 groups (the third group is the reference, so there are 2 coefficients to plot). Therefore, I need to plot two coefficients for each sub-population. I am using the metan package because it lets me plot coefficients and CIs that have already been calculated (which is my case). To my knowledge, other programs (e.g. ipdmetan) require the actual dataset to work properly.

At the moment I have used the following command:

metan coef lci uci, wgt(weight) nooverall null(0) ///
nobox lcols(sub_population group) tests(190) force ///
by(outcome) xlabel (-1.5,0,1)

coef = regression coefficient
lci uci = CIs
weight = weight (equal to 1, as it is not a meta-analysis)

The code works just fine, but the output has two main problems:

- The outcome labels are displayed in the right position but not in bold.
- Using lcols() I have tried to work around the issue by displaying two columns: the first one on the left shows the sub-population, and the second one the group. This is clear but not aesthetically pleasing, as it shows something like this:

Blood pressure
Subpopulation 1 || Group 1 (Group 3 as ref)
Subpopulation 1 || Group 2 (Group 3 as ref)
Subpopulation 2 || Group 1 (Group 3 as ref)
Subpopulation 2 || Group 2 (Group 3 as ref)

while I would like something like this:

Blood pressure

Subpopulation 1 (Group 3 as ref)
Group 1
Group 2

Subpopulation 2 (Group 3 as ref)
Group 1
Group 2

Is there a way to overcome this issue using metan? To my understanding, the by() option supports only one variable. Alternatively, can I use a different package? It would need to let me plot coefficients that have already been calculated, though.

Thanks in advance.
Tags: forest plot, ipdmetan, metan, multiple groups
Eduardo Torre

Join Date: Jan 2016

Posts: 26
#2

18 Feb 2017, 10:30

Anybody who can help? Thanks in advance!
Tiago Pereira

Join Date: Jan 2016

Posts: 389
#3

18 Feb 2017, 22:15

I don't quite follow what you have been trying to accomplish. However, here is my opinion:

1. If you are trying to present subgroups, you only have to reorganize your data accordingly and use the option label(namevar=variable).
2. Copy and paste the code below in order to see if it addresses your problem.

*/ --------------- start--------------
clear
input coef lci uci str20 group str20 subpopulation
.1266349 -.3146492 .5679189 Group1 Sub1
-.3509529 -1.023146 .3212402 Group2 Sub1
.08614 -.2777374 .4500174 Group1 Sub2
-.0951817 -.358211 .1678475 Group2 Sub2
-.0707719 -.4005753 .2590315 Group1 Sub3
-.1216107 -.3963165 .1530952 Group2 Sub3
.1412438 -.3248311 .6073186 Group1 Sub4
-.260193 -.5850677 .0646817 Group2 Sub4
-.1227866 -.5573016 .3117283 Group1 Sub5
-.0232415 -.2844573 .2379743 Group2 Sub5
end
metan coef lci uci , label(namevar=group) by(subpopulation) nooverall classic nosubgroup nowt xlabel(-2,-1,1,2) astext(60)
*/----------------end-------------------

All the best,

Tiago
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#4

19 Feb 2017, 04:15

Your query may have remained unanswered for some time mainly because it lacked the information requested in the FAQ.

Taking the data presented in #3, let's try to help you.

By the way, I got an error message ("type mismath", r(109)) after typing Tiago's command.

Below, my suggestion:

Code:

label define refsubpop 1 "Sub3" 2 "Sub1" 3 "Sub2" 4 "Sub4" 5 "Sub5"
encode subpopulation, gen(subpop)
label values subpop refsubpop
codebook subpop
metan coef lci uci, lcols(subpop) by(group)

Hopefully that helps!

Last edited by Marcos Almeida; 19 Feb 2017, 04:20.

Best regards,

Marcos
Tiago Pereira

Join Date: Jan 2016

Posts: 389
#5

19 Feb 2017, 07:15

*/ --------------- start--------------
clear
input coef lci uci str20 group str20 subpopulation
.1266349 -.3146492 .5679189 Group1 Sub1
-.3509529 -1.023146 .3212402 Group2 Sub1
.08614 -.2777374 .4500174 Group1 Sub2
-.0951817 -.358211 .1678475 Group2 Sub2
-.0707719 -.4005753 .2590315 Group1 Sub3
-.1216107 -.3963165 .1530952 Group2 Sub3
.1412438 -.3248311 .6073186 Group1 Sub4
-.260193 -.5850677 .0646817 Group2 Sub4
-.1227866 -.5573016 .3117283 Group1 Sub5
-.0232415 -.2844573 .2379743 Group2 Sub5
end
metan coef lci uci , label(namevar=group) by(subpopulation) nooverall classic nograph notable
metan coef lci uci , label(namevar=group) by(subpopulation) nooverall classic nosubgroup nowt xlabel(-2,-1,1,2) astext(60)
*/----------------end-------------------

should work. I have no idea why.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#6

19 Feb 2017, 07:40

Two further comments:

In #4, I wanted to write "type mismatch" (instead of "mismath").

In future messages, I kindly recommend providing commands as well as data either under CODE delimiters or by using dataex (available from SSC).

Thanks.

Best regards,

Marcos
Eduardo Torre

Join Date: Jan 2016

Posts: 26
#7

20 Feb 2017, 10:30

Thanks both. The suggested example works perfectly. I was wondering whether it would be possible to have a further level of stratification (e.g. Group, Subgroup1 and Subgroup2).

Last edited by Eduardo Torre; 20 Feb 2017, 10:35.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#8

20 Feb 2017, 15:21

Hello Eduardo ,

Unfortunately, I did not get the point of having so many strata with just a few studies.

That said, yes, you may fiddle with the commands shared in #4, plus the if qualifier, for example.

The user-written metan is really excellent, and I recommend that you take a look at its help files.

Surely you will find inspiring examples there!

Best regards,

Marcos
David Fisher

Join Date: Apr 2014

Posts: 407
#9

22 Feb 2017, 02:29

Dear Eduardo,

Apologies for the late response.

The above advice is, of course, excellent. However, if you already have coefficients and confidence intervals and specifically want extra details such as a third subgroup level and bolded headings, may I suggest using forestplot (part of the ipdmetan package)? forestplot doesn't perform any analyses, but simply plots the data in memory (including, crucially, line breaks, spaces, formats, etc.) as a forest plot.

Let's start by generating a test dataset, based on the data you provide:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str14 outcome str8 subpopulation str13 group float(coef lci uci) byte weight
"Blood pressure" "Subpop 1" "Group 1" -.5 -.75 -.25 1
"Blood pressure" "Subpop 1" "Group 2" -.25 -1 .5 1
"Blood pressure" "Subpop 1" "Group 3" 0 0 0 1
"Blood pressure" "Subpop 2" "Group 1" -.5 -.75 -.25 1
"Blood pressure" "Subpop 2" "Group 2" -.25 -1 .5 1
"Blood pressure" "Subpop 2" "Group 3" 0 0 0 1
"Second outcome" "Subpop 1" "Group 1" -.5 -.75 -.25 1
"Second outcome" "Subpop 1" "Group 2" -.25 -1 .5 1
"Second outcome" "Subpop 1" "Group 3" 0 0 0 1
"Second outcome" "Subpop 2" "Group 1" -.5 -.75 -.25 1
"Second outcome" "Subpop 2" "Group 2" -.25 -1 .5 1
"Second outcome" "Subpop 2" "Group 3" 0 0 0 1
end

Now, we're basically going to manually generate the spacing and labelling usually performed by metan, but to our own specifications.

Let's start with the headings and groupings:

Code:

gen int obs = _n
gen byte expand = 2*(outcome!=outcome[_n-1])
expand expand
bysort obs : gen byte _USE = cond(expand, _n>1, 1)
drop expand
replace subpop = "" if _USE==0
replace group = "" if _USE==0
replace coef = . if _USE==0
replace lci = . if _USE==0
replace uci = . if _USE==0
replace weight = . if _USE==0

// Bold-face heading using SMCL; see -help smcl- for details
gen labels = `"{bf:"'+outcome+`"}"' if _USE==0
replace labels = subpop if subpop!=subpop[_n-1] & missing(labels)
label var labels "Outcome and subpopulation"
label var group "Group"

At this point, we can produce a basic forest plot:

Code:

forestplot coef lci uci, lcols(labels group) nowt

Now let's enhance it further:

Code:

// line breaks between subpops
gen byte expand = 2*(_USE==1 & group=="Group 3")
expand expand
bysort obs (_USE) : replace _USE=0 if expand==2 & _n>1
replace group = "" if _USE==0
drop expand

// replace "effect size" text for reference category
gen effect = string(coef, "%5.2f") + " (" + string(lci, "%5.2f") + ", " + string(uci, "%5.2f") + ")" if _USE==1 & group!="Group 3"
replace effect = "(reference)" if _USE==1 & group=="Group 3"
label var effect "Effect (95% CI)"

// left-justify left-hand columns
// ("describe" first to see the current display format; then simply negate the width value)
describe labels group subpop
format labels %-19s
format group %-13s
format subpop %-9s

Now we have a pretty good result:

Code:

forestplot coef lci uci, lcols(labels group) rcols(effect) nowt nostats

If you're not confident with Stata code, most of this manipulation could be done in Excel and copied/pasted into Stata. After running the code fragments above, try viewing the result in Stata and looking at the data structure. The crucial elements are:

- The data itself (effect size and confidence limits). forestplot will plot this, and will also automatically create a right-hand column to display the numbers as formatted text. However, in our second ("enhanced") plot, we over-rode this in order to display the reference category correctly (I plan to make this easier in a future version of forestplot). We manually generated our right-hand column and specified it in the rcols() option.

- Labels: We've taken the "outcome" (with bold formatting) and "subpopulation" to create our first left-hand column, "labels"; we then requested that "group" be an additional left-hand column using the lcols() option.

- Spaces and "_USE": the data will be plotted in the row order you provide, honouring any empty (in terms of effect-size data) rows. forestplot automatically looks for a variable named _USE which tells it what sort of data is in each row. Here, it's pretty simple: either data (_USE==1) or empty rows (_USE==0); but it can be more complicated (e.g. diamonds for pooled effects).

- Left-justification: Stata automatically right-justifies all its data, including strings. Therefore, we need to left-justify our left-hand columns before plotting. Unfortunately, there is no easy way of just saying "left-justify my data" (as far as I'm aware); you have to use Stata's format command, which is highly specific. Example code is given above. This is one step that cannot be done in Excel, as Stata will not honour the Excel justification when copying/pasting (as far as I'm aware).
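As a hedged sketch of the left-justification step described above: rather than reading the widths off describe and typing them by hand, the current display format of each string variable can be read into a local macro and a minus sign inserted after the "%". The two-row dataset below is a toy stand-in for the real labels and group columns (it is not the thread's data), and the loop assumes the variables are strings carrying an ordinary %#s format.

```stata
* Toy stand-in for the two left-hand columns (illustrative values only)
clear
input str19 labels str13 group
"Subpop 1" "Group 1"
"Subpop 1" "Group 2"
end

* Turn each %#s display format into %-#s, skipping variables
* that are already left-justified
foreach v of varlist labels group {
    local fmt : format `v'
    if substr("`fmt'", 2, 1) != "-" {
        local fmt : subinstr local fmt "%" "%-"
        format `v' `fmt'
    }
}

describe labels group
```

The same loop could be run over any list of lcols() variables before calling forestplot; it only changes display formats, not the stored values.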

I hope this is useful; please let me know if you have any questions.

Thanks,

David.
Alexander Rodriguez

Join Date: Jul 2017

Posts: 39
#10

30 May 2019, 01:07

Dear Statalisters,

I would like to revisit this topic, as I have a query similar to the original one by Eduardo Torre. For what it's worth: I understand this thread is two years old, but the ipdover command is now available for producing subgroup regressions in forest plots.

My query however relates to using metan to perform meta-analyses and plot them on a single plot across two strata or 'layers'. For example, I am comparing outcomes in trials that diagnose disease using different methodologies (first 'layer' or strata) and then within each group of studies that use the same methodology there are different comparators to the trial drug of interest (second 'layer' or strata).

Using the useful plot by Marcos Almeida, is it possible to get metan to plot what you have coded, except that each subgroup would represent a meta-analysis in itself?

Very crude, but I have attached a sketch to help trigger what I have in mind.

Many thanks,
Alexander
(Stata v14.2 IC for Mac)
David Fisher

Join Date: Apr 2014

Posts: 407
#11

03 Jun 2019, 04:12

Dear Alexander,

Leaving aside the text indentation, this looks similar to the original example except that you have four subgroups (albeit in sets of two) rather than two. Is that correct? If so, the only difficulty would seem to be your wish to have the subgroup diamonds above, rather than below, the individual trials.

At first glance, this looks do-able with forestplot.

Thanks,

David.
Alexander Rodriguez

Join Date: Jul 2017

Posts: 39
#12

10 Aug 2019, 19:41

Hi David Fisher
Apologies for the late response. Not strictly four subgroups, as you say, two sets of two subgroups. You mention that it is doable with forestplot. Are you able to provide a brief worked example to show this? I suspect my issue could be with the way I have coded the data. At the moment, I have two separate forest plots for Strata 1 and Strata 2 in my sketch above.

Code:

metan event_treat no_event_treat event_comparator no_event_comparator if strata==1, rr by(sub_strata) label(namevar=study) counts nooverall

and then

Code:

metan event_treat no_event_treat event_comparator no_event_comparator if strata==2, rr by(sub_strata) label(namevar=study) counts nooverall

I guess I wished metan had a way to incorporate two by options in the plot. In my example strata represents methodology and sub_strata represents drug comparator (placebo or active).
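One possible workaround for the single by() variable, offered only as a sketch: combine strata and sub_strata into one grouping variable with egen's group() function and pass that to by(). The dataset below is entirely made up for illustration, and the value labels ("Method 1", "Placebo", etc.) and the name strata_combo are hypothetical. This gives one pooled estimate per strata-by-sub_strata combination, but it does not by itself produce nested headings; for that, the forestplot approach in #9 would still be needed.

```stata
* Toy data: hypothetical counts, for illustration only
clear
input str8 study byte(strata sub_strata) int(event_treat no_event_treat event_comparator no_event_comparator)
"Trial A" 1 1 10 90 15 85
"Trial B" 1 1 12 88 14 86
"Trial C" 1 2  8 92 11 89
"Trial D" 1 2  9 91 13 87
"Trial E" 2 1 20 80 25 75
"Trial F" 2 1 18 82 21 79
"Trial G" 2 2  7 93 12 88
"Trial H" 2 2  6 94  9 91
end

label define strata_lbl 1 "Method 1" 2 "Method 2"
label define sub_lbl 1 "Placebo" 2 "Active"
label values strata strata_lbl
label values sub_strata sub_lbl

* One level per strata x sub_strata combination, labelled from both variables
egen byte strata_combo = group(strata sub_strata), label
tabulate strata_combo

* Run metan only if it is installed (ssc install metan)
capture which metan
if _rc == 0 {
    metan event_treat no_event_treat event_comparator no_event_comparator, ///
        rr by(strata_combo) label(namevar=study) counts nooverall
}
else {
    display as text "metan not found; install it with: ssc install metan"
}
```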

Many thanks,
Alexander
(Stata v14.2 IC for Mac)