
  • Regression analysis: the explanatory variable does not vary much

    I am running a GLM regression model to calculate RRs for my medication results (randomized study, 185 persons in the intervention group, 183 in the control group; the intervention variable is r_treat). PersonWithn_Tricycl3==1 if a person receives tricyclic antidepressants at the 90-day follow-up, and PersonWithn_Tricycl1==1 if a person receives tricyclic antidepressants at baseline; the same scheme applies to PPIs and all other medications.

    * Without baseline adjustment
    glm PersonWithn_Tricycl3 ib1.r_treat, fam(bin) link(log) nolog
    glm, eform
    estimates store m1

    I need to make an adjustment for the baseline medication:

    *with PersonWithn_Tricycl1 adjustment
    glm PersonWithn_Tricycl3 ib1.r_treat ib2.PersonWithn_Tricycl1, fam(bin) link(log) nolog
    glm, eform
    estimates store m2
    lrtest m1 m2

    And I get kicked out of the model:

    note: 1.PersonWithn_Tricycl1 omitted because of collinearity
    note: 2.PersonWithn_Tricycl1 identifies no observations in the sample

    Number of persons receiving:
    Medication                  Baseline      Outcome       Baseline   Outcome
                                intervention  intervention  control    control
    Tricyclic antidepressants        7            11            4          2
    PPI                             86            64           80         73
    Could anyone help with how to handle this issue? Or is there a different model I should use to make this adjustment?
    Baseline medication use does not differ significantly between the groups (randomized trial), but outcome medication use does. I have to justify that this is the intervention effect and not a consequence of the baseline medication.

    Dataset example:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int id byte r_treat float(PersonWithn_PPI1 PersonWithn_Tricycl1 PersonWithn_PPI3 PersonWithn_Tricycl3)
      9 1 1 0 1 0
     11 2 1 0 1 0
     12 1 1 0 1 0
     17 1 0 0 0 0
     20 2 1 0 1 0
     39 2 0 0 0 0
     53 2 0 0 0 0
     54 2 0 0 0 0
     55 1 0 0 0 0
     56 2 0 0 0 0
     57 1 0 0 0 0
     59 1 0 0 0 0
     65 2 1 0 1 0
     68 2 1 0 1 0
     69 2 1 0 0 0
     74 1 0 0 0 0
     79 1 1 0 0 0
     82 1 0 0 0 0
     93 2 0 0 0 0
     97 1 0 0 0 0
    101 1 1 0 1 0
    104 2 0 0 0 0
    105 2 0 0 0 0
    117 2 0 0 0 0
    121 2 1 0 1 0
    122 1 0 0 0 0
    136 1 0 0 0 0
    146 1 0 0 0 0
    147 2 0 0 0 0
    153 1 0 0 0 0
    154 2 0 0 0 0
    165 1 0 0 0 0
    166 1 0 0 0 0
    167 1 1 0 1 0
    170 1 0 0 0 0
    172 1 0 0 0 0
    173 1 0 0 0 0
    177 2 0 0 1 0
    178 2 1 0 0 1
    192 2 1 0 1 0
    193 2 0 0 1 0
    197 1 0 0 0 0
    199 1 0 0 0 0
    203 2 0 0 0 0
    209 2 1 0 1 0
    213 1 1 0 1 0
    221 2 1 0 1 1
    224 1 0 0 0 0
    229 2 0 0 0 0
    231 1 0 0 0 0
    233 1 1 0 1 0
    234 1 0 1 1 1
    239 1 0 0 1 0
    243 1 1 0 1 0
    245 2 1 0 1 0
    247 1 1 0 1 0
    249 2 0 0 0 0
    270 2 0 0 0 0
    282 1 0 0 0 0
    287 1 0 0 0 0
    288 2 1 0 1 0
    293 1 0 0 0 0
    306 2 1 0 0 0
    310 2 0 1 0 1
    317 1 1 0 1 0
    332 2 0 0 0 0
    335 1 1 0 1 0
    340 2 0 0 0 0
    341 2 1 0 0 0
    352 2 0 0 0 0
    353 1 1 0 1 0
    354 1 1 0 1 0
    358 2 0 0 0 0
    364 2 1 0 0 0
    366 2 0 0 0 0
    367 2 0 0 0 0
    373 1 0 0 0 0
    375 1 1 0 1 0
    377 1 0 0 0 0
    380 2 0 1 0 1
    381 2 0 0 1 0
    390 1 1 0 1 0
    397 2 0 0 0 0
    400 2 1 0 1 0
    401 1 1 0 1 0
    402 2 1 1 0 1
    408 2 0 0 0 0
    411 1 1 0 1 0
    412 1 0 0 0 0
    416 2 1 0 1 0
    418 2 1 0 1 0
    421 1 1 0 0 0
    428 2 1 0 1 0
    429 1 0 0 0 0
    433 2 0 0 0 0
    435 1 1 0 1 0
    440 1 0 0 0 0
    442 2 1 0 1 0
    446 1 0 0 0 0
    449 1 0 0 0 0
    end
    Any advice is highly appreciated. Kindly, Natallia

  • #2
    So, here's what's going on. When you put the baseline tricyclic variable into the model, you have the following data configuration:

    Code:
    . table PersonWithn_Tricycl3 r_treat PersonWithn_Tricycl1
    
    ------------------------------------
              | PersonWithn_Tricycl1 and
    PersonWit |         r_treat        
    hn_Tricyc | ---- 0 ---    ---- 1 ---
    l3        |    1     2       1     2
    ----------+-------------------------
           0 |   50    44              
           1 |          2       1     3
    ------------------------------------
    Notice that everybody who has a tricyclic at baseline also has one at follow-up. That makes baseline tricyclic antidepressant use a perfect predictor of the tricyclic outcome. But generalized linear models cannot work with perfect predictors, because the maximum likelihood estimate of the coefficient for that variable would be infinite. The way Stata resolves this problem is to drop both the perfect predictor (1.PersonWithn_Tricycl1) and the observations associated with it.

    So Stata then attempts to estimate the model using only the observations in the PersonWithn_Tricycl1 = 0 columns of the table. But now, once again, we see that if r_treat = 1, the outcome is perfectly predicted (it is always 0, never 1). So the variable 1.r_treat is omitted, and its observations are dropped as well. That leaves the 46 observations in the PersonWithn_Tricycl1 = 0, r_treat = 2 cells. But notice that there is now no variation on any of the predictors: all the surviving observations have r_treat = 2 and PersonWithn_Tricycl1 = 0, so you have no actual predictor variables left at all!
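    Outside Stata, the mechanics of this can be seen in a short Python sketch (toy data, not your variables): with a perfectly predictive covariate, the log-likelihood keeps improving as its coefficient grows, so maximum likelihood never reaches a finite estimate.

```python
import math

# Toy data mirroring the separation problem: whenever x == 1 the
# outcome y is always 1, so x perfectly predicts y.
x = [0, 0, 0, 0, 1, 1]
y = [0, 1, 0, 1, 1, 1]

def loglik(beta0, beta1):
    """Bernoulli log-likelihood for logit P(y=1) = beta0 + beta1*x."""
    ll = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * xi)))
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll

# As beta1 grows, the log-likelihood keeps improving: it never
# attains a maximum, so the MLE for beta1 is infinite.
for b1 in (1, 5, 10, 20):
    print(b1, round(loglik(0.0, b1), 4))
```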

    The best solution would be to get a much larger data set which doesn't have all of these gaps in the data. But that may not be feasible. So, if you are stuck with this data set, you need a different analysis that does not use maximum likelihood estimation. Joseph Coveney's -firthlogit-, available from SSC, can handle this.

    Comment


    • #3
      Dear Clyde, thanks so much for the explanation; I have a feeling that I also understand what's going on.
      The challenge is how to handle this. Do I understand correctly that it is technically possible to do this in Stata?
      I got a suggestion to run a McNemar test on the within-group differences between baseline and outcome, and then do a z-test between the groups on these relative differences. But I still don't know how to program it in Stata. Do you have an opinion on this?
      Thanks a lot, Natallia

      Comment


      • #4
        Got a suggestion to run McNemar test on the withingroup differences in baseline-outcome and then do z-test between the group on this relative differences.
        I'm not familiar with this approach; I don't see offhand why the difference between two McNemar chi square statistics would, under the null hypothesis of no treatment effect, have a standard normal distribution (which is what would be needed for a z-test to be valid). And I don't see why, even if it did, the resulting difference would be a test of the hypothesis of zero treatment effect. Did the person who recommended that provide a reference? If so, I would like to see it so I can learn how this works.

        As I said in #2, you can do this analysis using -firthlogit-. It's not part of official Stata, but it's a Stata program written by Joseph Coveney (a frequent participant in this forum and a fine statistician). You can get it by running -ssc install firthlogit- in the Stata command window. Then read -help firthlogit- for instructions for using it. The syntax is very similar to that of -logit-.

        Comment


        • #5
          Originally posted by Clyde Schechter View Post
          I'm not familiar with this approach; I don't see offhand why the difference between two McNemar chi square statistics would, under the null hypothesis of no treatment effect, have a standard normal distribution (which is what would be needed for a z-test to be valid). And I don't see why, even if it did, the resulting difference would be a test of the hypothesis of zero treatment effect. Did the person who recommended that provide a reference? If so, I would like to see it so I can learn how this works.
          Clyde, I have no sources to cite, but I think the z-test described above is the z-test for a 2x2 table like this:

          Code:
                            Change
                      Negative  Positive
            Tx group     a         b
          Ctrl group     c         d

          The square of the z-value for this 2x2 table is Pearson's Chi-square. Within each row of the table, a Chi-square goodness of fit test (with equal expected frequencies) is equivalent to a McNemar Chi-square for that group.
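          Both identities can be checked numerically. A short Python sketch with hypothetical counts (not from the study) confirms that the squared two-proportion z equals Pearson's chi-square for the 2x2 table, and that the equal-expected-frequency goodness-of-fit within one row equals that group's McNemar chi-square:

```python
import math

# Hypothetical counts (not from the study) for the 2x2 "change by group" table.
a, b = 20, 30   # treatment group: negative / positive change
c, d = 25, 15   # control group:   negative / positive change
n = a + b + c + d

# Two-proportion z-test comparing P(positive change) between the rows.
p1, p2 = b / (a + b), d / (c + d)
pooled = (b + d) / n
z = (p1 - p2) / math.sqrt(pooled * (1 - pooled) * (1 / (a + b) + 1 / (c + d)))

# Pearson chi-square for the same 2x2 table: equals z squared.
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
print(z ** 2, chi2)

# Within one row, the equal-expected-frequency goodness-of-fit statistic
# equals the McNemar chi-square for that group's discordant counts.
e = (a + b) / 2
gof = (a - e) ** 2 / e + (b - e) ** 2 / e
mcnemar = (a - b) ** 2 / (a + b)
print(gof, mcnemar)
```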

          HTH.
          --
          Bruce Weaver
          Email: [email protected]
          Version: Stata/MP 18.5 (Windows)

          Comment


          • #6
            Bruce, thanks for the clarification. Yes, what you show is familiar to me. It is rather different from what was described in #3, where it was proposed to do a z-test based on the difference between two McNemar chi squared statistics.

            Comment


            • #7
              Clyde, I agree that what I suggested in #5 does not match the description in #3. But it was the only way I could think of to get a z-test that seemed somewhat relevant. As I suggested in the other thread about this analysis, if one really wanted to compare the two independent McNemar Chi-squares, their ratio (under H0) is distributed as F(1,1).

              Cheers,
              Bruce
              --
              Bruce Weaver
              Email: [email protected]
              Version: Stata/MP 18.5 (Windows)

              Comment


              • #8
                Dear Clyde and Bruce, I am very grateful that you are sharing your knowledge with me, and I apologize if I am annoying you with my persistence on this McNemar test. I got an example of how to calculate it, handwritten on a piece of paper by a statistician I know; I don't have a reference. He does not use Stata and only knows the solution by hand calculation, which is why I asked whether anyone could help translate this process into Stata commands.
                I attach a photo of this calculation. I might not be explaining it correctly, as I may have misunderstood it. I apologize.

                I am grateful for the discussion, and I can see that the other suggested approaches might be more relevant, such as -firthlogit-. I downloaded it and will work on it over the next few days. I hope you will help with advice if needed.

                Thanks a lot in advance! Kindly, Natallia

                Attached Files

                Comment


                • #9
                  Hello Natallia. Many people may not have looked at the Word document you attached to #8. See item 12.5 in the FAQ for more info on why not. ;-)

                  I did take a chance on opening it. Here is the image as a PNG file.

                  [Attached image: Natallia_McNemar.png]


                  Here is some (not too elegant) code to carry out those computations, and to check that the z-tests for each group match the McNemar Chi-square results from -mcci-.

                  Code:
                  * https://newonlinecourses.science.psu.edu/stat504/node/96/
                  clear *
                  input a b c d
                  30 27 8 105
                  51 17 9  89
                   .  . .   .
                  end
                  
                  generate n = a+b+c+d
                  generate RD = (b-c)/n
                  generate SE = sqrt(b+c)/n
                  generate z = RD/SE
                  generate chi2 = z^2
                  list in 1/2
                  * Check that -mcci- gives the same Chi-squared values
                  quietly mcci 30 27 8 105 // Group 1
                  display "McNemar Chi-square for group 1 = " r(chi2)
                  quietly mcci 51 17 9  89 // Group 2
                  display "McNemar Chi-square for group 2 = " r(chi2)
                  * Compute the z-test on the difference
                  replace RD = RD[1]-RD[2] if _n==3
                  replace SE = sqrt(SE[1]^2 + SE[2]^2) if _n==3
                  replace  z = RD/SE if _n==3
                  generate p = normal(abs(z)*-1)*2
                  list RD SE z p in 3
                  And here are the results from the final -list- command.

                  Code:
                  . list RD SE z p in 3
                  
                       +-------------------------------------------+
                       |       RD         SE          z          p |
                       |-------------------------------------------|
                    3. | .0635719   .0464177   1.369561   .1708238 |
                       +-------------------------------------------+
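                  As a cross-check outside Stata, the same arithmetic in Python (using the counts above) reproduces the listed RD, SE, z, and p to the displayed precision:

```python
import math

def rd_se(a, b, c, d):
    """Risk difference (b - c)/n and its standard error sqrt(b + c)/n
    for one group's paired (McNemar-style) 2x2 counts."""
    n = a + b + c + d
    return (b - c) / n, math.sqrt(b + c) / n

rd1, se1 = rd_se(30, 27, 8, 105)   # group 1
rd2, se2 = rd_se(51, 17, 9, 89)    # group 2

rd = rd1 - rd2                       # difference of risk differences
se = math.sqrt(se1 ** 2 + se2 ** 2)
z = rd / se
p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

print(round(rd, 7), round(se, 7), round(z, 6), round(p, 7))
```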
                  HTH.
                  --
                  Bruce Weaver
                  Email: [email protected]
                  Version: Stata/MP 18.5 (Windows)

                  Comment


                  • #10
                    Dear Bruce, thanks a lot for trusting my attached file. I apologize for giving up too soon while trying to upload the file the right way.

                    Thanks a lot for translating the photo into Stata commands. But does this analytic approach make sense to you? Or would -glm- or -firthlogit- still be the way you would choose to analyze data like mine?
                    Being very new to statistics (very much an amateur), I see a lot of sense in your discussion and advice.


                    May I ask you and Clyde one more question about firthlogit?

                    The model seems to handle my perfect predictors:

                    Code:
                    firthlogit PersonWithn_Tricycl3 ib1.r_treat ib2.PersonWithn_Tricycl1
                    firthlogit, or
                    I was asked to report risk ratios instead of odds ratios, but I cannot get that working with -firthlogit-.
                    I found -oddsrisk- but cannot figure out how to use it in practice. Would you give me advice on how to get RRs displayed with -firthlogit-?

                    Thanks a lot for your advice.
                    Kindly, Natallia

                    Comment


                    • #11
                      I don't think -firthlogit- has any option for reporting RRs instead of ORs: it is in the nature of logistic regression, whether the usual approach or the penalized likelihood estimation used in -firthlogit-, that the results are odds ratios. Any attempt to translate those into risk ratios requires an estimate of the base rate, followed by some calculation to get a RR; but these approaches are problematic because, at least the ones I have seen, fail to account for the uncertainty in the base rate estimate.
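                      For reference, the conversion such attempts usually use is the Zhang-Yu point correction, RR = OR / (1 - p0 + p0*OR), where p0 is the control-group (baseline) risk. A minimal Python sketch with hypothetical numbers; note that this is a point conversion only and ignores the uncertainty in estimating p0:

```python
def or_to_rr(odds_ratio, p0):
    """Zhang-Yu point conversion from an odds ratio to a risk ratio,
    given the baseline (control-group) risk p0. A point estimate only:
    it carries no allowance for the uncertainty in p0."""
    return odds_ratio / (1 - p0 + p0 * odds_ratio)

# Hypothetical example: OR = 2.5 with a 20% control-group risk.
print(round(or_to_rr(2.5, 0.20), 4))  # the RR is pulled toward 1
```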

                      Joseph Coveney is the author of -firthlogit- and is an active member of this Forum. Perhaps he may have some advice for you here.

                      Off hand, I don't know of any procedure that will handle perfect prediction and also report risk ratios.

                      Comment


                      • #12
                        Dear Clyde, thank you for the advice. I will try to post the question for Joseph Coveney.
                        Would you see any advantages/disadvantages in reporting RR in my case? Or OR?
                        Kindly, Natallia

                        Comment


                        • #13
                          I have always been amused by people who hold strong opinions on the merits of RR over OR. I do agree that OR's are harder for people to understand because they are a bit more complex computationally, but once somebody is used to them, I really don't see that either is preferable to the other in general. Each has its strengths and weaknesses. Actually, if any measure of effect is to be strongly preferred, in my opinion it would be the risk difference. So, no, I don't see any general advantages to RR over OR, nor the other way around.
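                          To make the computational point concrete, a tiny hypothetical in Python: with a common outcome, the OR sits further from 1 than the RR, and the risk difference is a separate quantity again, which is why the choice of measure changes how an effect reads.

```python
# Hypothetical risks (not from the study): a common outcome.
p_treat, p_ctrl = 0.50, 0.25

rr = p_treat / p_ctrl                                       # risk ratio
or_ = (p_treat / (1 - p_treat)) / (p_ctrl / (1 - p_ctrl))   # odds ratio
rd = p_treat - p_ctrl                                       # risk difference

print(rr, round(or_, 4), rd)  # OR exceeds RR when the outcome is common
```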

                          Comment


                          • #14
                            Dear Clyde, thanks a lot for your response. Kindly, Natallia

                            Comment
