  • Thought Experiment re: Logistic Regression

    Dear all,

    I would like to ask your advice regarding logistic regression.

    In my study, I am looking at the effect of the introduction of a clinical algorithm on an outcome of interest. The algorithm is composed of three simple rules, all binary (yes/no). If any rule is satisfied, the patient is treated with treatment X; if no rules are satisfied, with treatment Y.

    In my dataset, I have cases from the year prior to the introduction of the algorithm, and the year after the introduction of the algorithm (variable = group in this case).
    • The primary outcome is visual loss, a binary outcome (yes/no).
    • The predictors are all binary: rule1, rule2, rule3, group, treatment.
    Plugging these variables into Stata for all patients, I receive the output below. Because there are 8 possible combinations of the three rules, I coded the combination as a single variable, "conditions". I also added a group*treatment interaction term. Does my data suggest that group had a significant impact on the outcome of interest? Or is there something intrinsically wrong with combining data from both periods? My rationale for combining all patients, despite the change in clinical practice, was that the results would be conditioned on "conditions", i.e. the combination of rules satisfied by each patient on presentation. Many thanks for all your help!

    [Attached screenshot: Logit.png, regression output]

    Last edited by Lawson Ung; 28 Mar 2019, 12:39.

  • #2
    Lawson,

    First, the FAQ does request you present output in code delimiters (see my signature) rather than attachments.

    There shouldn't be any real issue with combining data from before and after the policy change.

    In general, it's best to rely on factor variable notation to denote your interactions, rather than computing them. You could have typed:

    Code:
    logistic outcome i.conditions i.group##i.treatment
    You could even have used that notation to do the three-way interaction for your rules, although that might sequence the output in a manner you don't like.
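    For example, with the rule variables named in your post (adjust the names if yours differ), the three-way version would be:

    Code:
    logistic outcome i.rule1##i.rule2##i.rule3 i.group##i.treatment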

    In any case, there's potentially another issue. You indicated that if the rules are met, patients should be treated, and if none of the rules are met, they should not. (What if only some of the rules are met - is it still no treatment?) Yet you still retain a treatment variable, which seems to indicate that some people may not have received treatment in year 2 despite the algorithm recommending it. I'm not sure exactly how I'd specify the regression to account for this, but you could check via cross-tabulation.
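    For example, using the variable names from your post, a cross-tabulation within each period would show how often the actual treatment disagreed with the algorithm's recommendation:

    Code:
    bysort group: tabulate conditions treatment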
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



    • #3
      Please read the Forum FAQ for excellent advice about effective posting. There you will learn, among other things, that screenshots are discouraged, for several good reasons. In this particular case, your screenshot is too small to be readable, at least on my computer. So I can't comment on your output and I don't really know what kind of model you fit.

      So my comments here are based only on your description of the problem.

      First, by using treatment, group, and group#treatment interactions, you are testing whether the effectiveness of the treatments differs between the pre- and post-implementation periods (presumably due to their being used more often in the conditions where they work best). That's a perfectly reasonable thing to test, but I can also think of different questions that might be posed. So first I urge you to confirm whether this is, in fact, the question you want to answer.

      It is perfectly sensible to combine the data from both the pre- and post- eras into a single data set, with a variable designating which is which, in order to research any kind of question about whether things changed between those two eras. In fact, I can't think of any way you could answer that kind of question without combining the data.

      The issues raised really relate to the limited strength of a simple pre-post design. While the difference in outcomes you calculate may be caused by the implementation of the algorithm, it might also be due to something else that happened in that setting over the same time period. A stronger design would be to gather data from the same time periods in one or more comparable settings where no algorithm was introduced, and then compare the pre-post difference in the algorithm site with the pre-post difference in the others. (This is called a difference-in-differences design.) This would at least eliminate the concern that the observed difference is attributable to something else that applied across the board to all settings during that time period.
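      As a sketch only: if such comparison data were available, and a hypothetical site variable distinguished the algorithm setting from the comparison settings, the difference-in-differences model might look like:

      Code:
      logistic outcome i.site##i.group i.conditions

      where the site#group interaction is the difference-in-differences term of interest.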

      Also, if you have a very large data set, it might make sense to also include interaction terms involving the conditions, if there is reason to think that implementation of the algorithm might affect the effectiveness of some treatments more than others.
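      One of several possible specifications along these lines, using the variable names from #1, would be:

      Code:
      logistic outcome i.conditions##i.treatment i.group##i.treatment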

      Added: Crossed with #2.
      Last edited by Clyde Schechter; 28 Mar 2019, 13:41.



      • #4
        Dear Weiwen and Clyde,

        Thank you so much, and my sincere apologies about the data output.

        Let me clarify re: the rules. If any one of the rules is met, the patient is treated with treatment X (fortified antibiotics). If none of the rules are met, the patient is treated with Y (just a fluoroquinolone).

        In the latter year, there were more patients who were treated with fortified antibiotics than fluoroquinolone because of the change in algorithm:

        Code:
        -> group = 1
        
          treatment |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  0 |        159       58.67       58.67
                  1 |        112       41.33      100.00
        ------------+-----------------------------------
              Total |        271      100.00
        
        --------------------------------------------------------------------------------
        -> group = 2
        
          treatment |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  0 |        152       49.51       49.51
                  1 |        155       50.49      100.00
        ------------+-----------------------------------
              Total |        307      100.00
        I suppose what I wanted to see was whether the change in treatment selection in the latter group had an effect on outcomes. However, Dr. Clyde, you've just pointed out that other interactions involving the conditions and treatment would be worth testing. Your point about the effectiveness of treatments is well taken, because there really is no scientific reason for the treatment to be more effective in the post-year than in the pre-year. My main question is whether outcomes differed because of the implementation of the algorithm, which may have allowed for better selection of patients to treat. I will now be doing some reading on difference-in-differences study designs...

        Code:
        . logit outcome i.conditions i.group##i.treatment
        
        Iteration 0: log likelihood = -360.83785
        Iteration 1: log likelihood = -272.30755
        Iteration 2: log likelihood = -263.2987
        Iteration 3: log likelihood = -262.47979
        Iteration 4: log likelihood = -262.4723
        Iteration 5: log likelihood = -262.4723
        
        Logistic regression                             Number of obs =        578
                                                        LR chi2(10)   =     196.73
                                                        Prob > chi2   =     0.0000
        Log likelihood = -262.4723                      Pseudo R2     =     0.2726
        
        ---------------------------------------------------------------------------------
                outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        ----------------+----------------------------------------------------------------
             conditions |
                      2 |   1.792889   .7572319     2.37   0.018      .308742    3.277037
                      3 |   2.697377   .6908734     3.90   0.000      1.34329    4.051464
                      4 |   2.413896   .7789793     3.10   0.002      .887125    3.940668
                      5 |   2.954431   .6394418     4.62   0.000     1.701148    4.207714
                      6 |   3.912926   .6494603     6.02   0.000     2.640008    5.185845
                      7 |    3.92501   .6547231     5.99   0.000     2.641776    5.208243
                      8 |   4.454918   .6376169     6.99   0.000     3.205212    5.704624
                        |
                2.group |  -.6984404   .3491415    -2.00   0.045    -1.382745   -.0141357
            1.treatment |   .2062242   .3249102     0.63   0.526    -.4305881    .8430365
                        |
        group#treatment |
                    2 1 |   .3009931   .4489968     0.67   0.503    -.5790244    1.181011
                        |
                  _cons |  -3.677937   .5997998    -6.13   0.000    -4.853523   -2.502351
        ---------------------------------------------------------------------------------
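        To help me interpret the interaction on the probability scale, I understand I can follow the model with margins, e.g.:

        Code:
        margins group#treatment
        margins group, dydx(treatment)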
        Last edited by Lawson Ung; 28 Mar 2019, 13:49.
