
  • Odds ratio is too high

    Hello,

    I am writing my research on the determinants of bribery. I am using interaction variables in my logistic regression. My data come from a survey with oversampling in six regions.

    After reading on this forum that I could apply the weights directly with [pweight = weightvar] in my command, I decided not to -svyset- the data, since I need to report my pseudo R2.

    However, I am now not sure about the result, because the odds ratio of the interaction kis##health seems too large. I am quite new to Stata and statistics, so I need your advice on this matter.
    Both variables in the interaction are dummies: kis = 1 for poor people; health = 1 for a poor perception of the quality of health services.

    My output is:

    Code:
     
    . logit brihealth kis##health urban age gender education employment business religius1 value $controls [pw = BOT_NAS_JBR_JTG], or robust nolog

    Logistic regression                             Number of obs     =      1,373
                                                    Wald chi2(11)     =      81.68
                                                    Prob > chi2       =     0.0000
    Log pseudolikelihood = -210.62002               Pseudo R2         =     0.1182

    ------------------------------------------------------------------------------
                 |               Robust
       brihealth | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           1.kis |   .6619124    .179435    -1.52   0.128     .3890916    1.126028
        1.health |   .6359187   .3877471    -0.74   0.458     .1924807    2.100951
                 |
      kis#health |
            1 1  |    11.5063   8.774819     3.20   0.001     2.581076    51.29446
                 |
           urban |   .6217023   .1611696    -1.83   0.067     .3740397    1.033349
             age |   .9574096   .0090956    -4.58   0.000     .9397476    .9754036
          gender |   .3653943   .1218295    -3.02   0.003      .190088    .7023746
       education |   .8099218   .1092265    -1.56   0.118     .6217984    1.054961
      employment |   .6651625   .2377409    -1.14   0.254     .3301363    1.340177
        business |   2.459535   .8240591     2.69   0.007     1.275442    4.742914
       religius1 |   1.008044   .1638453     0.05   0.961     .7330385    1.386219
           value |   .4081261   .0997453    -3.67   0.000     .2527913    .6589107
           _cons |   2.904753   2.118633     1.46   0.144      .695457    12.13244
    ------------------------------------------------------------------------------
    Is there anything wrong with the data? When I do not include the controls, the odds ratio is about 7, which is still high.

    I tried -margins, dydx(*)-, but it does not show a result for the interaction term.

    If there is nothing wrong: is it right to interpret it as: poor people with a poor perception of health services are about ten times more likely to bribe than non-poor people with a good perception of health services? I feel this sentence is wrong, but I don't know how to fix it.

    Here is a -dataex- excerpt of my data.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(brihealth kis health urban age gender education employment business religius1 value) double BOT_NAS_JBR_JTG
    0 1 0 0 54 0 0 1 0 4 1 .07809219214599998
    . 0 0 0 22 1 2 1 0 4 1 .07809219214599998
    . 1 0 0 64 0 1 1 0 4 0 .07809219214599998
    . . 0 0 52 1 0 1 0 4 1 .07809219214599998
    . 0 0 0 34 0 0 1 0 4 1 .07809219214599998
    0 0 0 0 32 1 0 1 0 4 1 .07809219214599998
    0 0 0 0 36 0 0 1 0 4 1 .07809219214599998
    . 1 0 0 59 1 0 1 0 4 1 .07809219214599998
    0 . 0 0 58 0 0 1 0 4 1 .07809219214599998
    . . 0 0 28 1 2 1 0 4 1 .07809219214599998
    0 0 0 0 19 0 2 0 0 . 1 .08627298325863374
    . . . 0 57 1 0 0 0 . . .08627298325863374
    0 . 0 0 50 0 0 1 0 4 1 .08627298325863374
    0 . 0 0 45 1 0 0 0 3 1 .08627298325863374
    0 0 0 0 37 0 1 1 0 4 1 .08627298325863374
    . . 0 0 35 1 0 0 0 . . .08627298325863374
    0 0 0 0 57 0 0 1 0 3 1 .08627298325863374
    . . 0 0 50 1 0 0 0 . . .08627298325863374
    0 0 0 0 53 0 0 1 0 3 1 .08627298325863374
    . . 0 0 38 1 0 0 0 4 1 .08627298325863374
    0 0 0 0 25 0 0 1 1 4 1 .09816057671983237
    . . 0 0 24 1 2 0 0 4 1 .09816057671983237
    . 0 0 0 67 0 0 0 0 4 1 .09816057671983237
    0 . 0 0 42 1 0 1 0 3 1 .09816057671983237
    0 0 0 0 45 0 0 1 0 4 1 .09816057671983237
    1 0 0 0 30 1 0 0 0 4 1 .09816057671983237
    . 1 0 0 55 0 2 1 0 4 1 .09816057671983237
    0 0 0 0 27 1 2 0 0 4 1 .09816057671983237
    0 1 0 0 44 0 1 1 0 4 1 .09816057671983237
    0 0 0 0 46 1 0 1 0 4 1 .09816057671983237
    0 0 0 0 36 0 1 1 0 3 1 .09816057671983237
    0 0 0 0 28 1 2 0 0 4 1 .09816057671983237
    0 0 0 0 29 0 1 1 0 3 1 .09816057671983237
    0 0 0 0 27 1 2 0 0 3 1 .09816057671983237
    0 . 0 0 59 0 2 1 0 4 1 .09816057671983237
    0 0 1 0 23 1 2 0 0 4 1 .09816057671983237
    . 0 0 0 58 0 2 1 1 4 1 .09816057671983237
    0 1 0 0 42 1 1 1 1 3 1 .07417418612448407
    . 0 0 0 60 0 0 1 0 4 1 .09816057671983237
    0 1 1 0 39 1 0 0 0 3 1 .09816057671983237
    . 1 0 0 48 0 1 1 0 4 1  .0868175767433184
    . 0 0 0 70 1 0 1 0 4 1  .0868175767433184
    . 1 0 0 47 0 0 1 0 4 1  .0868175767433184
    . 1 0 0 30 1 0 1 0 4 1  .0868175767433184
    . 1 0 0 35 0 1 1 0 4 1  .0868175767433184
    . 1 0 0 35 1 2 0 0 4 1  .0868175767433184
    . 0 0 0 45 0 2 1 0 4 1  .0868175767433184
    0 1 0 0 40 1 0 1 0 4 1  .0868175767433184
    0 0 0 0 48 0 2 1 0 3 1  .0868175767433184
    0 0 0 0 36 1 1 0 0 3 0 .07529966521257946
    0 1 0 0 31 0 2 1 0 4 1  .0868175767433184
    . 0 0 0 52 1 0 1 0 4 0  .0868175767433184
    . 0 0 0 23 0 2 1 0 3 1  .0868175767433184
    . 1 0 0 22 1 1 0 0 3 1  .0868175767433184
    . 0 0 0 60 0 0 1 0 4 1  .0868175767433184
    . 0 0 0 31 1 0 1 0 3 1  .0868175767433184
    . 0 0 0 34 0 1 1 0 4 1  .0868175767433184
    . 0 0 0 39 1 0 1 0 4 1  .0868175767433184
    0 0 0 0 67 0 0 0 0 4 1  .0868175767433184
    0 0 0 0 41 1 3 1 0 4 1  .0868175767433184
    0 0 0 0 40 0 1 1 0 4 1 .11662930745082345
    0 . 1 0 41 1 0 0 0 4 1 .11662930745082345
    0 . 1 0 50 0 0 1 0 4 1 .11662930745082345
    0 1 0 0 35 1 0 0 0 3 1 .11662930745082345
    0 0 0 0 19 0 1 1 0 4 1 .11662930745082345
    0 0 0 0 46 1 1 1 0 4 1 .11662930745082345
    0 0 0 0 21 0 2 1 0 4 1 .11662930745082345
    0 . 1 0 26 1 1 1 0 4 1 .11662930745082345
    0 1 0 0 50 0 0 1 0 4 1 .11662930745082345
    0 1 1 0 25 1 3 0 0 3 1 .11662930745082345
    1 . 0 0 35 0 2 0 0 4 1 .11662930745082345
    0 0 0 0 40 1 2 0 0 4 1 .10115633417167323
    . 0 0 0 48 0 0 1 0 4 1 .11662930745082345
    . . 0 0 47 1 1 0 0 4 1 .11662930745082345
    . 1 0 0 46 0 0 1 0 4 1 .11662930745082345
    . . 0 0 19 1 3 0 0 4 1 .11662930745082345
    . 0 0 0 46 0 0 1 0 4 1 .11662930745082345
    . 0 0 0 42 1 2 1 0 4 1 .11662930745082345
    . 0 0 0 29 0 3 1 0 4 1 .11662930745082345
    0 1 0 0 23 1 3 0 0 4 1 .11662930745082345
    . 0 0 1 52 0 2 1 0 4 1   .098587619521058
    . 0 0 1 44 1 1 0 0 4 1   .098587619521058
    . 0 0 1 37 0 1 1 0 4 1   .098587619521058
    0 1 0 1 35 1 0 0 0 4 1   .098587619521058
    . . 0 1 49 0 0 1 0 4 1   .098587619521058
    . 0 0 1 32 1 2 0 0 4 1   .098587619521058
    . 1 0 1 49 0 0 1 0 4 1   .098587619521058
    . . 0 1 19 1 1 0 0 4 1   .098587619521058
    0 1 0 1 19 0 2 0 0 4 1   .098587619521058
    . 1 0 1 50 1 0 0 0 4 1   .098587619521058
    . 1 0 1 46 0 2 1 0 4 0 .07209680654501288
    . 1 0 1 41 1 2 1 0 3 0 .07209680654501288
    . 0 0 1 46 0 2 1 0 4 1 .07209680654501288
    . 1 0 1 39 1 1 0 0 4 1 .07209680654501288
    . . 1 1 60 0 3 1 0 3 1 .07209680654501288
    . 0 0 1 34 1 0 1 1 3 1 .07209680654501288
    . 1 0 1 57 0 2 1 1 3 1 .05447932486087617
    . 0 0 1 51 1 3 1 0 3 1 .07209680654501288
    . 0 0 1 42 0 1 1 0 3 1 .07209680654501288
    . 0 0 1 46 1 3 1 0 4 1 .06253186969023976
    end
    I really appreciate your help on this matter. Thank you.

  • #2
    Well, there is nothing wrong with your Stata code here. But the data are probably not suitable for this kind of analysis. In your example, there are only two cases with brihealth = 1. Now, I imagine your real data is larger and perhaps you have a more reasonable number of such cases. But with such cases being only about 4% of your data, you are likely to run into several problems unless your full data set is huge.

    In the example data, both of the brihealth = 1 cases are females, both have health = 0, and both have kis = 0. In addition, neither of them has education = 1 or 3. And all of them are over age 30. Consequently, Stata has no choice but to omit the variables health and kis (and their interaction) from the model when run on this sample, and to strip education down to categories 0 and 2 only. It also has to delete the corresponding observations. So you are left with essentially nothing to analyze. Now in a larger data set, you probably won't have so many "perfect predictions," but when there are few observations and many near-perfect predictions, the maximum-likelihood estimates that -logit- uses are known to be biased upward (in magnitude). This would explain why you are getting some surprisingly high estimates for your odds ratios.

    So the first thing I would do is check whether the data is in fact correct: is brihealth really such a rare outcome?
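
    For instance, a couple of quick tabulations along these lines (a sketch, using the variable names from #1) would show how rare the outcome is and whether any kis/health cells are empty among the bribers:
    Code:
    * how rare is brihealth = 1?
    tabulate brihealth, missing
    * are there brihealth = 1 cases in every category of kis and health?
    tabulate kis brihealth, missing
    tabulate health brihealth, missing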

    Assuming the data is correct, I would probably estimate this model with penalized maximum likelihood, using Joseph Coveney's -firthlogit- command (available from SSC). This will produce less biased estimates than -logit-.
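
    Something along these lines (a sketch, assuming -firthlogit- still needs to be installed, and using the same covariates as in #1 without the weights for now):
    Code:
    * install Joseph Coveney's -firthlogit- from SSC
    ssc install firthlogit
    * penalized maximum likelihood fit, factor-variable notation as in #1
    firthlogit brihealth kis##health urban age gender education ///
        employment business religius1 value, or nolog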



    • #3
      If you need a goodness-of-fit statistic to use with svy: logit as an alternative to pseudo-R-squared, you might consider this approach:
      Archer, K.J., and Lemeshow, S. (2006). Goodness-of-Fit Test for a Logistic Regression Model Fitted Using Survey Sample Data. Stata Journal, 6(1), 97-105.
      Code:
      net describe st0099_1, from(http://www.stata-journal.com/software/sj10-2)
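      As a rough sketch of how that fits together (assuming a design with only the probability weight BOT_NAS_JBR_JTG and no strata or PSU variables to declare):
      Code:
      * declare the survey design (weight only; this is an assumption about the design)
      svyset [pweight = BOT_NAS_JBR_JTG]
      * refit the model under the survey design
      svy: logit brihealth kis##health urban age gender education ///
          employment business religius1 value, or
      * then apply the Archer-Lemeshow goodness-of-fit test from the package
      * installed above (recent Stata releases also offer -estat gof- after
      * -svy: logit-, which implements the same F-adjusted test)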
      David Radwin
      Senior Researcher, California Competes
      californiacompetes.org
      Pronouns: He/Him



      • #4
        Dear Clyde and David,

        Thanks for your kind advice

        Clyde,

        The sample is quite large, about 2,000 observations (after weighting). But I think you are right that the number of bribery cases is quite low; only 8% admitted it.

        I tried your suggestion and ran the model with -firthlogit-, and the odds ratio is still high, about 6.6.

        Code:
         
        . firthlogit brihealth kis##health urban age gender education employment business religius1 value $controls [pw = BOT_NAS_JBR_JTG], or robust nolog
        pweight not allowed
        r(101);

        . firthlogit brihealth kis##health urban age gender education employment business religius1 value, or robust nolog
        option robust not allowed
        r(198);

        . firthlogit brihealth kis##health urban age gender education employment business religius1 value, or nolog

                                                        Number of obs     =      1,373
                                                        Wald chi2(11)     =      59.23
        Penalized log likelihood = -390.58242           Prob > chi2       =     0.0000

        ------------------------------------------------------------------------------
           brihealth | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               1.kis |   .7468839   .1550095    -1.41   0.160     .4972713    1.121793
            1.health |    .901699   .3694904    -0.25   0.801     .4038905    2.013073
                     |
          kis#health |
                1 1  |   6.605186   3.693814     3.38   0.001     2.207334    19.76524
                     |
               urban |   .8494245   .1669886    -0.83   0.406     .5778124    1.248713
                 age |   .9747884   .0079569    -3.13   0.002     .9593173     .990509
              gender |   .5594425    .124721    -2.61   0.009     .3614018    .8660053
           education |   .9123759   .0899793    -0.93   0.352     .7520168     1.10693
          employment |   .9600411   .2310079    -0.17   0.865     .5990613    1.538539
            business |   1.944821   .4867584     2.66   0.008     1.190796    3.176305
           religius1 |    1.10531   .1480458     0.75   0.455      .850107    1.437125
               value |   .4514135   .0867516    -4.14   0.000     .3097367    .6578948
               _cons |   .6110942    .380623    -0.79   0.429      .180274    2.071492
        ------------------------------------------------------------------------------
        But if I use -firthlogit-, I cannot use the weights. Is there any way to use both?

        David,

        I have used the goodness-of-fit test before, but when I tried to find a rule of thumb for the F test, I could not find one. Do you have any information about this?
        I am new to statistics; I am sorry if the question is too basic.



        • #5
          So, using the -firthlogit- exponentiated coefficients and ignoring the other predictors for the moment (setting them to unity, which seems about right with the possible exception of "value"), you'd get the following fourfold table.
          KIS/health    0                                        1
          0             invlogit(ln(.6110942))                   invlogit(ln(.6110942) + ln(.7468839))
          1             invlogit(ln(.6110942) + ln(.901699))     invlogit(ln(.6110942) + ln(.7468839) + ln(.901699) + ln(6.605186))

          KIS/health    0      1
          0             0.4    0.3
          1             0.4    0.7

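          For what it is worth, those cell values can be reproduced at the command line from the -firthlogit- point estimates in #4, along these lines:
          Code:
          * baseline cell: _cons only
          display invlogit(ln(.6110942))
          * add the 1.kis main effect
          display invlogit(ln(.6110942) + ln(.7468839))
          * add the 1.health main effect
          display invlogit(ln(.6110942) + ln(.901699))
          * both main effects plus the kis#health interaction
          display invlogit(ln(.6110942) + ln(.7468839) + ln(.901699) + ln(6.605186))
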
          Is there a problem?

          If not, then add back the other predictors and use your weights by checking what -margins- gives you for a fourfold table after the fitted model shown in #1.
          Code:
          margins kis#health
          .
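          That is, something like the following after refitting the weighted model from #1 (a sketch; $controls is assumed to be the same global macro of extra covariates used there):
          Code:
          * refit the pweighted model exactly as in #1
          logit brihealth kis##health urban age gender education employment ///
              business religius1 value $controls [pw = BOT_NAS_JBR_JTG], or robust nolog
          * predicted probability of bribing in each cell of the fourfold table
          margins kis#health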



          • #6
            Joseph, thank you very much for your input. If I want to interpret the -margins- result, can I say:

            a one-unit difference in the ratio of poverty and perception of the quality of public services to total observations is associated with a 70 percentage point difference in the probability of bribing?

            I am not quite sure how to write the interpretation, since both variables in the interaction are categorical.

            Many thanks for your help.

            Maria
