McNemar test - adjusting for baseline

Natallia Lapitskaya

Join Date: Nov 2017

Posts: 34
#1

McNemar test - adjusting for baseline

17 May 2018, 16:26

Dear Stata Forum users, I really need your help. I am reporting the results of my randomized study on medication adjustment, and I am struggling to adjust for the baseline values.

1) Dataset:
I have 400 patients (200 control, 200 intervention). Variables recorded: id (patient study number); r_treat (intervention or control group); nmedicine1 and nmedicine3 (number of medications received at day 3 and day 90); polypharmacy1 and polypharmacy3 (generated dichotomous variables at day 3 and day 90: polypharmacy==1 if 10 or more drugs, polypharmacy==0 if 0-9 drugs).

2) Question: I compared the number of persons with polypharmacy1 in the two groups, and likewise with polypharmacy3, using this command:
*calculation risk ratios -table 90 days
csi 38 60 132 106, or

BUT I got a note that I should adjust for baseline hyperpolypharmacy1 when I compare hyperpolypharmacy3.

I was advised to calculate a 2x2 table and then compare with a z-test: either by hand, or using mcci to calculate the difference from baseline to 90 days and then doing the z-test.

I am really lost as to how to do this in practice. I am sure there is a smart way to do it in Stata. Could you please help me with the commands?

3) example of my dataset:
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input int id float(nmedicine1 nmedicine3) byte r_treat float(hyperpolyfarmacy1 hyperpolyfarmacy3)
9 11 12 1 1 1
11 9 8 2 0 0
12 4 3 1 0 0
17 10 7 1 1 0
20 10 8 2 1 0
39 7 6 2 0 0
53 10 8 2 1 0
54 3 3 2 0 0
55 9 7 1 0 0
56 9 6 2 0 0
57 6 6 1 0 0
59 6 6 1 0 0
65 7 9 2 0 0
68 11 14 2 1 1
69 6 6 2 0 0
74 6 9 1 0 0
79 8 7 1 0 0
82 5 5 1 0 0
93 7 7 2 0 0
97 10 10 1 1 1
101 11 10 1 1 1
104 6 5 2 0 0
105 4 7 2 0 0

Thanks a lot in advance for your help! It is really appreciated. Sincerely, Natallia
Bruce Weaver

Join Date: May 2014

Posts: 1128
#2

18 May 2018, 16:06

Hello Natallia. One reason people may not be responding quickly is that you omitted some of the output from -dataex- and the closing tag for the CODE section. Here's your sample data again with those issues fixed.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int id float(nmedicine1 nmedicine3) byte r_treat float(hyperpolyfarmacy1 hyperpolyfarmacy3)
9 11 12 1 1 1
11 9 8 2 0 0
12 4 3 1 0 0
17 10 7 1 1 0
20 10 8 2 1 0
39 7 6 2 0 0
53 10 8 2 1 0
54 3 3 2 0 0
55 9 7 1 0 0
56 9 6 2 0 0
57 6 6 1 0 0
59 6 6 1 0 0
65 7 9 2 0 0
68 11 14 2 1 1
69 6 6 2 0 0
74 6 9 1 0 0
79 8 7 1 0 0
82 5 5 1 0 0
93 7 7 2 0 0
97 10 10 1 1 1
101 11 10 1 1 1
104 6 5 2 0 0
105 4 7 2 0 0
end

I'm still a bit unclear on what it is you're trying to do. The cs in the -csi- command you used is short for cohort study. But you say you're doing a randomized study (i.e., an experiment). So I think what you are trying to do is estimate a regression model that has polypharmacy3 as the outcome, and with r_treat and polypharmacy1 as explanatory variables. Is that right? If it is, one option would be to use -logit- to estimate a logistic regression model. But as it is an experiment, you may want to report a relative risk (or risk ratio, RR for short) rather than the odds ratio (OR) you get from -logit-. If so, you can see this UCLA web-page for some guidance.
https://stats.idre.ucla.edu/stata/fa...ohort-studies/
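
A minimal sketch of one way to get a baseline-adjusted risk ratio without a regression model. It assumes that r_treat==1 is the intervention arm (the coding is not stated at this point in the thread) and uses a hypothetical indicator named treat. Stata's -cs- accepts raw data, and its by() option reports stratum-specific risk ratios together with a Mantel-Haenszel combined estimate:

```stata
* Assumption: r_treat==1 is the intervention arm; -treat- is a hypothetical name
generate byte treat = (r_treat == 1) if !missing(r_treat)

* Crude risk ratio at day 90, computed from the raw data rather than typed-in counts
cs hyperpolyfarmacy3 treat

* Risk ratio stratified by baseline status, with a Mantel-Haenszel combined estimate
cs hyperpolyfarmacy3 treat, by(hyperpolyfarmacy1)
```

If one baseline stratum contains no events, its stratum-specific risk ratio cannot be estimated, so the regression approach may be the more practical route.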

If I've completely misunderstood what you're wanting to do, you'll have to clarify.

Cheers,
Bruce

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 18.5 (Windows)
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#3

19 May 2018, 07:13

If I understood correctly, you could "massage" the dataset to prepare it for a -clogit- estimation.

Best regards,

Marcos
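
One possible reading of this suggestion, sketched under assumptions (the post does not say which reshaping or model was intended): put the data in long form with one row per patient per time point, then fit a conditional logistic model with the patient as the matched group. The name time is hypothetical.

```stata
* Long form: one row per patient (id) per measurement occasion (time = 1 or 3)
reshape long nmedicine hyperpolyfarmacy, i(id) j(time)

* Conditional (fixed-effects) logit, conditioning on patient.
* The r_treat main effect is constant within id, so Stata will omit it;
* the time#r_treat interaction asks whether the change over time differs by arm.
clogit hyperpolyfarmacy i.time##i.r_treat, group(id) or
```

Only patients whose status changes between the two occasions contribute to this model, so it may be unstable when there are few discordant pairs.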
Natallia Lapitskaya

Join Date: Nov 2017

Posts: 34
#4

19 May 2018, 14:10

Dear Bruce, thank you for the reply and for correcting the dataset posting. Unfortunately, my statistical knowledge is sparse, as I am a medical doctor, so I apologize in advance for the level of my statistical understanding. I will try to explain my challenge:
I am doing a randomized study. I made a "baseline table" where I calculated the number of persons with hyperpolypharmacy1 (prevalence), compared the groups with a chi2 test, and used the csi command to get the RR. I also have other variables (e.g., persons with opioids, analgesics, etc.), all handled the same way.
Step 2: I made an "outcome table", where I calculated the number of persons with hyperpolypharmacy3 (again compared with chi2). Then a statistician colleague told me that I have to adjust my "outcome results", and that the way to do it is to perform the McNemar test (on 2x2 tables: hyperpolypharmacy/non-hyperpolypharmacy at baseline/outcome) and afterwards a z-test. The problem is that my colleague does not use Stata and did one calculation for me by hand. I even have a recipe for doing it by hand, but I would like to do it in Stata.
I hope I have managed to make my challenge a bit clearer, and I do hope you can help with advice!
Thanks a lot for your time!
Natallia
Bruce Weaver

Join Date: May 2014

Posts: 1128
#5

19 May 2018, 20:20

Thank you for clarifying, Natallia. I'd like to go back to your first post, where you wrote this:

I have 400 patients (200 control, 200 intervention). Variables recorded: id (patient study number); r_treat (intervention or control group); nmedicine1 and nmedicine3 (number of medications received at day 3 and day 90); polypharmacy1 and polypharmacy3 (generated dichotomous variables at day 3 and day 90: polypharmacy==1 if 10 or more drugs, polypharmacy==0 if 0-9 drugs).

Given what you've said about the design, I should think that your main question is whether there is a treatment effect. And a McNemar chi-square test is not going to address that question: It uses paired data, whereas your treatment groups are independent groups.

Here is another observation: Your polypharmacy variables are not true dichotomies, they are counts (the nmedicine variables) that have been carved into dichotomies. Coarsening variables in that way is generally ill-advised, because it discards information needlessly, and therefore reduces power. Have you given any thought to using a count regression model (e.g., Poisson or Negative Binomial) instead? It seems to me that a count regression model something like this would be more suitable, and would address the question suggested by your design.

Code:

* https://stats.idre.ucla.edu/stata/dae/poisson-regression/
poisson nmedicine3 i.r_treat nmedicine1, vce(robust)
poisson, irr
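
If the counts turn out to be overdispersed relative to a Poisson model, a negative binomial version of the same model is a small change. This is a sketch rather than a recommendation for these data, and the -margins- call is an added illustration that is not part of the original suggestion.

```stata
* Negative binomial alternative with the same explanatory variables
nbreg nmedicine3 i.r_treat nmedicine1, vce(robust) irr

* Average predicted number of medications at day 90 in each arm
margins r_treat
```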

HTH.

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 18.5 (Windows)
Natallia Lapitskaya

Join Date: Nov 2017

Posts: 34
#6

20 May 2018, 09:43

Dear Bruce, you might be 100% right. I actually used negative binomial regression when addressing hospitalizations (where I count events). You are right that I aim to see whether my treatment had an effect. However, my aim was not to see whether I reduced the number of medications as such; my main question is whether the treatment reduced the number of persons receiving 10 or more drugs (no matter whether it is 10 or 20). Would you still recommend the negative binomial regression, or is it OK to use a chi2 test to compare the prevalence (number of persons with 10 or more drugs) in two independent groups (somehow adjusting for the baseline values of hyperpolypharmacy)?
Thank you so much for your help. The discussion really helps me to justify the statistics in my study.
Kindly, Natallia
Bruce Weaver

Join Date: May 2014

Posts: 1128
#7

20 May 2018, 10:25

Hello Natallia. By using that cut-point of 10+ medications, you are treating someone with 9 medications exactly the same as someone with 0; and you are treating someone with 10 exactly the same as someone with 20. I don't know why you would want to discard the information that is available in the actual counts. Is there some practical (e.g., administrative) reason that makes 10+ an important cut-point? That would be one reason to perhaps consider using a cut-point. But even then, I would be inclined to treat the analysis of the dichotomous outcome as secondary. I.e., my primary analysis would use nmedicine3 as the outcome in a Poisson or Negative Binomial model; and a (possible) secondary analysis would use -logit- or -glm- with the same explanatory variables, but with hyperpolyfarmacy3 as the dichotomous outcome variable (see UCLA page link in #2).

HTH.

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 18.5 (Windows)
Natallia Lapitskaya

Join Date: Nov 2017

Posts: 34
#8

20 May 2018, 16:45

Dear Bruce, thank you so much for the advice. Would you say that I could program it like this (if I choose, for clinical reasons, to use a cut-off point)?

*
logit hyperpolyfarmacy3 i.r_treat hyperpolyfarmacy1, vce(robust)
contrasts r_treat, eff or

Could you advise how I can get the RR (risk ratio) instead of the OR (odds ratio) in the output?

To satisfy my curiosity: is it possible to run a McNemar test (to find the relative differences in polypharmacy within each treatment group from baseline to outcome) and then compare the relative differences between the treatment groups with a z-test?
How would I program that in Stata?

Thank you so much for your help.
Kindly, Natallia
Bruce Weaver

Join Date: May 2014

Posts: 1128
#9

20 May 2018, 18:09

Natallia, see the UCLA page below for examples of how to estimate a model that will give you the RR rather than the OR.
https://stats.idre.ucla.edu/stata/fa...ohort-studies/
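
A minimal sketch of two models that report risk ratios, using the variable names from this thread. When the log-binomial model fails to converge, a commonly used alternative is a Poisson model with robust standard errors. The use of i. (lowest level as the base) is an assumption, not something specified above.

```stata
* Log-binomial model: exponentiated coefficients are risk ratios
glm hyperpolyfarmacy3 i.r_treat i.hyperpolyfarmacy1, family(binomial) link(log) nolog eform

* Poisson with robust (sandwich) variance as a fallback for the same risk ratios
glm hyperpolyfarmacy3 i.r_treat i.hyperpolyfarmacy1, family(poisson) link(log) vce(robust) nolog eform
```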

Re the question about comparing the two McNemar Chi-squares, you could take advantage of the relationship shown here (first bullet point):
https://en.wikipedia.org/wiki/F-dist..._distributions

Your McNemar Chi-squares will have df=1, so the ratio of the two Chi-squares will be distributed as F(1,1).
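
A sketch of how the two McNemar chi-squares could be obtained and combined in the way just described. It assumes that r_treat takes the values 1 and 2, as in the sample data, and that -mcc- leaves its chi-square in r(chi2); running -return list- after -mcc- shows what is actually stored. The scalar names are hypothetical.

```stata
* McNemar test within arm 1: day-90 status versus baseline status
mcc hyperpolyfarmacy3 hyperpolyfarmacy1 if r_treat == 1
return list
scalar chi2_arm1 = r(chi2)

* Same within arm 2
mcc hyperpolyfarmacy3 hyperpolyfarmacy1 if r_treat == 2
scalar chi2_arm2 = r(chi2)

* Ratio of the two chi-squares referred to F(1,1)
scalar Fratio = chi2_arm1 / chi2_arm2
display "F(1,1) = " Fratio "   upper-tail p = " Ftail(1, 1, Fratio)
```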

HTH.

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 18.5 (Windows)
Natallia Lapitskaya

Join Date: Nov 2017

Posts: 34
#10

21 May 2018, 05:06

Dear Bruce, thank you so much for your help!
I apologize if I bore you with silly questions; I am unfortunately a "Stata dummy".
I tried to follow your suggestion and construct a "glm" model according to the UCLA tutorial, but I am very unsure of my Stata code.
Am I doing it right?
* Without baseline (hyperpolyfarmacy1 adjustment)
glm hyperpolyfarmacy3 ib1.r_treat, fam(bin) link(log) nolog eform

*with hyperpolyfarmacy1 adjustment
glm hyperpolyfarmacy3 ib1.r_treat ib2.hyperpolyfarmacy1, fam(bin) link(log) nolog eform

Thank you so much!
Kindly,
Natallia
Bruce Weaver

Join Date: May 2014

Posts: 1128
#11

21 May 2018, 07:14

Hello Natallia. Yes, you're on the right track there. If you wanted to, you could also generate a likelihood ratio test to compare the fits of the two models. Something like this:

Code:

* https://stats.idre.ucla.edu/stata/dae/poisson-regression/
* Without baseline (hyperpolyfarmacy1 adjustment)
glm hyperpolyfarmacy3 ib1.r_treat, fam(bin) link(log) nolog
glm, eform
estimates store m1
*with hyperpolyfarmacy1 adjustment
glm hyperpolyfarmacy3 ib1.r_treat ib2.hyperpolyfarmacy1, fam(bin) link(log) nolog
glm, eform
estimates store m2
lrtest m1 m2

Alternatively, you could use the -nestreg- prefix command. But bear in mind that -nestreg- does not support factor variables. With that in mind, I compute a true indicator variable for treatment first.

Code:

generate byte treat = r_treat==1 // 1=Treat, 0=Control
nestreg: glm hyperpolyfarmacy3 (treat) (hyperpolyfarmacy1), ///
    fam(bin) link(log) nolog

HTH.

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 18.5 (Windows)
Natallia Lapitskaya

Join Date: Nov 2017

Posts: 34
#12

21 May 2018, 10:26

Dear Bruce, thanks a lot for the commands and advice. I will check today how it works for me in reality; I am always challenged by Stata :-) I can't promise that I will not spam you with more silly questions once I have run the commands :-). Kindly, Natallia
Natallia Lapitskaya

Join Date: Nov 2017

Posts: 34
#13

22 May 2018, 07:42

Dear Bruce, I am facing some challenges with the glm model when trying to apply it to other variables (e.g., I have the medication variables PPI1 and PPI3, ATC class A (gastrointestinal medication), vitamin D Cholecalciferol1 and Cholecalciferol3, etc.). The principle is the same: all persons are dichotomized into binary outcomes according to whether or not they get the medication.
Stata thinks for a very long time and then gives these messages:
1) . *with PersonWithn_PPI1 adjustment
. glm PersonWithn_PPI3 ib1.r_treat ib2.PersonWithn_PPI1, fam(bin) link(log) n
> olog
note: 1.PersonWithn_PPI1 omitted because of collinearity
note: 2.PersonWithn_PPI1 identifies no observations in the sample

2) In some of my 2x2 tables I have zero cells (e.g., if no patients are getting quinine). How should I deal with that? I can get a value with the chi2 test, but how do I adjust for baseline, or should the RR then be reported as "-"?

I might be going wrong somewhere... Could you give me some advice? Thanks a lot for your help. I hope you can advise even though I am getting too detailed...
Kindly,
Natallia
Bruce Weaver

Join Date: May 2014

Posts: 1128
#14

22 May 2018, 08:47

Hello Natallia. What do you see if you cross-tabulate your two explanatory variables? I suspect one of them can be predicted (nearly) perfectly from the other.

Code:

* -tabulate- does not accept factor-variable operators such as ib1., so use the plain names
tabulate r_treat PersonWithn_PPI1 if !missing(PersonWithn_PPI3)

Re your second point, explanatory variables have to vary. If they are constants (in your sample), they'll get kicked out of the model.
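
One possible reading of the notes in #13, offered as an assumption because the coding of PersonWithn_PPI1 is not shown: if that variable is coded 0/1 like the other indicators, the base level requested by ib2. does not exist. Both remaining levels would then enter the model and be collinear with the constant, which would match the two notes. A sketch that checks the coding and lets Stata use the lowest level as the base:

```stata
* Check the actual coding first
tabulate PersonWithn_PPI1, missing

* i. uses the lowest observed level (0 for a 0/1 indicator) as the base
glm PersonWithn_PPI3 ib1.r_treat i.PersonWithn_PPI1, fam(bin) link(log) nolog eform
```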

Perhaps other members can offer some further advice.

HTH.

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 18.5 (Windows)
Natallia Lapitskaya

Join Date: Nov 2017

Posts: 34
#15

22 May 2018, 09:10

Dear Bruce, some more thoughts on the choice of statistical method. My mathematician colleague (not a medical researcher) raised some doubts. He calculated McNemar on fewer than 10 vs. 10 or more drugs and then compared the relative outcome-baseline differences between the groups with a z-test (and got a p-value of 0.17, against 0.02 without adjustment). When I fit the glm with adjustment, the p-value moves from 0.007 without adjustment to 0.017 with adjustment. Would you expect so much difference between the models?
It was pointed out to me that GLM treats the baseline measurement as a fixed value and not as something that is just as uncertain as the follow-up measurement. Secondly, GLM assumes that the effect of the intervention is the same among patients with hyperpolypharmacy at baseline as among those without (unless an interaction between the intervention and the baseline measurement is included in the model).

What do you think? Is it possible to adjust the model to deal with this challenge? Or should one analyze the data using differences, adjust for baseline in a linear regression, or analyze it as repeated measurements, as in the case of continuous measurements?

Thank you so much for your time! Kindly, Natallia
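
Two of the ideas raised in this post can be sketched in Stata. Both are illustrations under assumptions, not a recommended analysis: the name d is hypothetical, and the -ttest- on change scores is only a large-sample stand-in for the z-test described, not necessarily the colleague's exact calculation.

```stata
* (a) Allow the treatment effect to differ by baseline status
glm hyperpolyfarmacy3 i.r_treat##i.hyperpolyfarmacy1, family(binomial) link(log) nolog eform

* (b) Change score per patient: -1 = lost, 0 = unchanged, 1 = gained hyperpolypharmacy
generate d = hyperpolyfarmacy3 - hyperpolyfarmacy1

* The mean of d in each arm is the change in prevalence from baseline to day 90;
* compare the two mean changes without assuming equal variances
ttest d, by(r_treat) unequal
```

The interaction term in (a) addresses the second concern directly. The change-score comparison in (b) treats baseline and follow-up symmetrically, which addresses the first.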