
  • Odds ratio is too high

    Hello,

    I am writing my research on the determinants of bribery. I am using interaction variables in my logistic regression. My data come from a survey with oversampling in six regions.

    After reading on this forum that I could apply the weights directly with [pweight = weightvar] in my command, I decided not to -svyset- the data, since I need to report my pseudo R2.

    However, I am now not sure about the result, because the odds ratio of the interaction kis##health seems too large. I am quite new to Stata and statistics, so I need your advice on this matter.
    Both variables in the interaction are dummies: kis = 1 for poor people; health = 1 for a poor perception of the quality of health services.

    My output is:

    Code:
     
    . logit brihealth kis##health urban age gender education employment business religius1 value $controls [pw = BOT_NAS_JBR_JTG], or robust nolog

    Logistic regression                             Number of obs     =      1,373
                                                    Wald chi2(11)     =      81.68
                                                    Prob > chi2       =     0.0000
    Log pseudolikelihood = -210.62002               Pseudo R2         =     0.1182

    ------------------------------------------------------------------------------
                 |               Robust
       brihealth | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           1.kis |   .6619124    .179435    -1.52   0.128     .3890916    1.126028
        1.health |   .6359187   .3877471    -0.74   0.458     .1924807    2.100951
                 |
      kis#health |
            1 1  |    11.5063   8.774819     3.20   0.001     2.581076    51.29446
                 |
           urban |   .6217023   .1611696    -1.83   0.067     .3740397    1.033349
             age |   .9574096   .0090956    -4.58   0.000     .9397476    .9754036
          gender |   .3653943   .1218295    -3.02   0.003      .190088    .7023746
       education |   .8099218   .1092265    -1.56   0.118     .6217984    1.054961
      employment |   .6651625   .2377409    -1.14   0.254     .3301363    1.340177
        business |   2.459535   .8240591     2.69   0.007     1.275442    4.742914
       religius1 |   1.008044   .1638453     0.05   0.961     .7330385    1.386219
           value |   .4081261   .0997453    -3.67   0.000     .2527913    .6589107
           _cons |   2.904753   2.118633     1.46   0.144      .695457    12.13244
    ------------------------------------------------------------------------------
    Is there anything wrong with the data? When I do not include the controls, the odds ratio is about 7, which is still high.

    I tried -margins, dydx(*)-, but it does not show a result for the interaction term.

    If there is nothing wrong: is it right to interpret it as: poor people with a poor perception of health services are about ten times more likely to bribe than non-poor people with a good perception of health services? I feel this sentence is wrong, but I don't know how to fix it.

    Here is a -dataex- excerpt of my data.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(brihealth kis health urban age gender education employment business religius1 value) double BOT_NAS_JBR_JTG
    0 1 0 0 54 0 0 1 0 4 1 .07809219214599998
    . 0 0 0 22 1 2 1 0 4 1 .07809219214599998
    . 1 0 0 64 0 1 1 0 4 0 .07809219214599998
    . . 0 0 52 1 0 1 0 4 1 .07809219214599998
    . 0 0 0 34 0 0 1 0 4 1 .07809219214599998
    0 0 0 0 32 1 0 1 0 4 1 .07809219214599998
    0 0 0 0 36 0 0 1 0 4 1 .07809219214599998
    . 1 0 0 59 1 0 1 0 4 1 .07809219214599998
    0 . 0 0 58 0 0 1 0 4 1 .07809219214599998
    . . 0 0 28 1 2 1 0 4 1 .07809219214599998
    0 0 0 0 19 0 2 0 0 . 1 .08627298325863374
    . . . 0 57 1 0 0 0 . . .08627298325863374
    0 . 0 0 50 0 0 1 0 4 1 .08627298325863374
    0 . 0 0 45 1 0 0 0 3 1 .08627298325863374
    0 0 0 0 37 0 1 1 0 4 1 .08627298325863374
    . . 0 0 35 1 0 0 0 . . .08627298325863374
    0 0 0 0 57 0 0 1 0 3 1 .08627298325863374
    . . 0 0 50 1 0 0 0 . . .08627298325863374
    0 0 0 0 53 0 0 1 0 3 1 .08627298325863374
    . . 0 0 38 1 0 0 0 4 1 .08627298325863374
    0 0 0 0 25 0 0 1 1 4 1 .09816057671983237
    . . 0 0 24 1 2 0 0 4 1 .09816057671983237
    . 0 0 0 67 0 0 0 0 4 1 .09816057671983237
    0 . 0 0 42 1 0 1 0 3 1 .09816057671983237
    0 0 0 0 45 0 0 1 0 4 1 .09816057671983237
    1 0 0 0 30 1 0 0 0 4 1 .09816057671983237
    . 1 0 0 55 0 2 1 0 4 1 .09816057671983237
    0 0 0 0 27 1 2 0 0 4 1 .09816057671983237
    0 1 0 0 44 0 1 1 0 4 1 .09816057671983237
    0 0 0 0 46 1 0 1 0 4 1 .09816057671983237
    0 0 0 0 36 0 1 1 0 3 1 .09816057671983237
    0 0 0 0 28 1 2 0 0 4 1 .09816057671983237
    0 0 0 0 29 0 1 1 0 3 1 .09816057671983237
    0 0 0 0 27 1 2 0 0 3 1 .09816057671983237
    0 . 0 0 59 0 2 1 0 4 1 .09816057671983237
    0 0 1 0 23 1 2 0 0 4 1 .09816057671983237
    . 0 0 0 58 0 2 1 1 4 1 .09816057671983237
    0 1 0 0 42 1 1 1 1 3 1 .07417418612448407
    . 0 0 0 60 0 0 1 0 4 1 .09816057671983237
    0 1 1 0 39 1 0 0 0 3 1 .09816057671983237
    . 1 0 0 48 0 1 1 0 4 1  .0868175767433184
    . 0 0 0 70 1 0 1 0 4 1  .0868175767433184
    . 1 0 0 47 0 0 1 0 4 1  .0868175767433184
    . 1 0 0 30 1 0 1 0 4 1  .0868175767433184
    . 1 0 0 35 0 1 1 0 4 1  .0868175767433184
    . 1 0 0 35 1 2 0 0 4 1  .0868175767433184
    . 0 0 0 45 0 2 1 0 4 1  .0868175767433184
    0 1 0 0 40 1 0 1 0 4 1  .0868175767433184
    0 0 0 0 48 0 2 1 0 3 1  .0868175767433184
    0 0 0 0 36 1 1 0 0 3 0 .07529966521257946
    0 1 0 0 31 0 2 1 0 4 1  .0868175767433184
    . 0 0 0 52 1 0 1 0 4 0  .0868175767433184
    . 0 0 0 23 0 2 1 0 3 1  .0868175767433184
    . 1 0 0 22 1 1 0 0 3 1  .0868175767433184
    . 0 0 0 60 0 0 1 0 4 1  .0868175767433184
    . 0 0 0 31 1 0 1 0 3 1  .0868175767433184
    . 0 0 0 34 0 1 1 0 4 1  .0868175767433184
    . 0 0 0 39 1 0 1 0 4 1  .0868175767433184
    0 0 0 0 67 0 0 0 0 4 1  .0868175767433184
    0 0 0 0 41 1 3 1 0 4 1  .0868175767433184
    0 0 0 0 40 0 1 1 0 4 1 .11662930745082345
    0 . 1 0 41 1 0 0 0 4 1 .11662930745082345
    0 . 1 0 50 0 0 1 0 4 1 .11662930745082345
    0 1 0 0 35 1 0 0 0 3 1 .11662930745082345
    0 0 0 0 19 0 1 1 0 4 1 .11662930745082345
    0 0 0 0 46 1 1 1 0 4 1 .11662930745082345
    0 0 0 0 21 0 2 1 0 4 1 .11662930745082345
    0 . 1 0 26 1 1 1 0 4 1 .11662930745082345
    0 1 0 0 50 0 0 1 0 4 1 .11662930745082345
    0 1 1 0 25 1 3 0 0 3 1 .11662930745082345
    1 . 0 0 35 0 2 0 0 4 1 .11662930745082345
    0 0 0 0 40 1 2 0 0 4 1 .10115633417167323
    . 0 0 0 48 0 0 1 0 4 1 .11662930745082345
    . . 0 0 47 1 1 0 0 4 1 .11662930745082345
    . 1 0 0 46 0 0 1 0 4 1 .11662930745082345
    . . 0 0 19 1 3 0 0 4 1 .11662930745082345
    . 0 0 0 46 0 0 1 0 4 1 .11662930745082345
    . 0 0 0 42 1 2 1 0 4 1 .11662930745082345
    . 0 0 0 29 0 3 1 0 4 1 .11662930745082345
    0 1 0 0 23 1 3 0 0 4 1 .11662930745082345
    . 0 0 1 52 0 2 1 0 4 1   .098587619521058
    . 0 0 1 44 1 1 0 0 4 1   .098587619521058
    . 0 0 1 37 0 1 1 0 4 1   .098587619521058
    0 1 0 1 35 1 0 0 0 4 1   .098587619521058
    . . 0 1 49 0 0 1 0 4 1   .098587619521058
    . 0 0 1 32 1 2 0 0 4 1   .098587619521058
    . 1 0 1 49 0 0 1 0 4 1   .098587619521058
    . . 0 1 19 1 1 0 0 4 1   .098587619521058
    0 1 0 1 19 0 2 0 0 4 1   .098587619521058
    . 1 0 1 50 1 0 0 0 4 1   .098587619521058
    . 1 0 1 46 0 2 1 0 4 0 .07209680654501288
    . 1 0 1 41 1 2 1 0 3 0 .07209680654501288
    . 0 0 1 46 0 2 1 0 4 1 .07209680654501288
    . 1 0 1 39 1 1 0 0 4 1 .07209680654501288
    . . 1 1 60 0 3 1 0 3 1 .07209680654501288
    . 0 0 1 34 1 0 1 1 3 1 .07209680654501288
    . 1 0 1 57 0 2 1 1 3 1 .05447932486087617
    . 0 0 1 51 1 3 1 0 3 1 .07209680654501288
    . 0 0 1 42 0 1 1 0 3 1 .07209680654501288
    . 0 0 1 46 1 3 1 0 4 1 .06253186969023976
    end
    I really appreciate your help on this matter. Thank you.

  • #2
    Well, there is nothing wrong with your Stata code here. But the data are probably not suitable for this kind of analysis. In your example, there are only two cases with brihealth = 1. Now, I imagine your real data is larger and perhaps you have a more reasonable number of such cases. But with such cases being only about 4% of your data, you are likely to run into several problems unless your full data set is huge.

    In the example data, both of the brihealth = 1 cases are females, both have health = 0, and both have kis = 0. In addition, neither of them has education = 1 or 3. And all of them are over age 30. Consequently, Stata has no choice but to omit the variables health and kis (and their interaction) from the model when run on this sample, and to strip education down to categories 0 and 2 only. It also has to delete the corresponding observations. So you are left with essentially nothing to analyze. Now in a larger data set, you probably won't have so many "perfect predictions," but when there are few observations and many near-perfect predictions, the maximum-likelihood estimates that -logit- uses are known to be biased upward (in magnitude). This would explain why you are getting some surprisingly high estimates for your odds ratios.

    So the first thing I would do is check whether the data is in fact correct: is brihealth really such a rare outcome?
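
    For instance, a couple of quick tabulations along these lines (a sketch, using the variable names from #1) would show how rare the outcome is and whether any kis/health cells are empty among the bribers:
    Code:
    * how rare is brihealth = 1?
    tabulate brihealth, missing
    * are there brihealth = 1 cases in every category of kis and health?
    tabulate kis brihealth, missing
    tabulate health brihealth, missing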

    Assuming the data is correct, I would probably estimate this model with penalized maximum likelihood, using Joseph Coveney's -firthlogit- command (available from SSC). This will produce less biased estimates than -logit-.
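
    Something along these lines (a sketch, assuming -firthlogit- still needs to be installed, and using the same covariates as in #1 without the weights for now):
    Code:
    * install Joseph Coveney's -firthlogit- from SSC
    ssc install firthlogit
    * penalized maximum likelihood fit, factor-variable notation as in #1
    firthlogit brihealth kis##health urban age gender education ///
        employment business religius1 value, or nolog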



    • #3
      If you need a goodness-of-fit statistic to use with svy: logit as an alternative to pseudo-R-squared, you might consider this approach:
      Archer, K.J., and Lemeshow, S. (2006). Goodness-of-Fit Test for a Logistic Regression Model Fitted Using Survey Sample Data. Stata Journal, 6(1), 97-105.
      Code:
      net describe st0099_1, from(http://www.stata-journal.com/software/sj10-2)
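      As a rough sketch of how that fits together (assuming a design with only the probability weight BOT_NAS_JBR_JTG and no strata or PSU variables to declare):
      Code:
      * declare the survey design (weight only; this is an assumption about the design)
      svyset [pweight = BOT_NAS_JBR_JTG]
      * refit the model under the survey design
      svy: logit brihealth kis##health urban age gender education ///
          employment business religius1 value, or
      * then apply the Archer-Lemeshow goodness-of-fit test from the package
      * installed above (recent Stata releases also offer -estat gof- after
      * -svy: logit-, which implements the same F-adjusted test)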
      David Radwin
      Senior Researcher, California Competes
      californiacompetes.org
      Pronouns: He/Him



      • #4
        Dear Clyde and David,

        Thanks for your kind advice

        Clyde,

        The sample is quite large, about 2,000 observations (after weighting). But I think you are right that the number of bribery cases is quite low; only 8% admitted it.

        I tried your suggestion and ran the model with -firthlogit-, and the odds ratio is still high, about 6.6.

        Code:
         
        . firthlogit brihealth kis##health urban age gender education employment business religius1 value $controls [pw = BOT_NAS_JBR_JTG], or robust nolog
        pweight not allowed
        r(101);

        . firthlogit brihealth kis##health urban age gender education employment business religius1 value, or robust nolog
        option robust not allowed
        r(198);

        . firthlogit brihealth kis##health urban age gender education employment business religius1 value, or nolog

                                                        Number of obs     =      1,373
                                                        Wald chi2(11)     =      59.23
        Penalized log likelihood = -390.58242           Prob > chi2       =     0.0000

        ------------------------------------------------------------------------------
           brihealth | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               1.kis |   .7468839   .1550095    -1.41   0.160     .4972713    1.121793
            1.health |    .901699   .3694904    -0.25   0.801     .4038905    2.013073
                     |
          kis#health |
                1 1  |   6.605186   3.693814     3.38   0.001     2.207334    19.76524
                     |
               urban |   .8494245   .1669886    -0.83   0.406     .5778124    1.248713
                 age |   .9747884   .0079569    -3.13   0.002     .9593173     .990509
              gender |   .5594425    .124721    -2.61   0.009     .3614018    .8660053
           education |   .9123759   .0899793    -0.93   0.352     .7520168     1.10693
          employment |   .9600411   .2310079    -0.17   0.865     .5990613    1.538539
            business |   1.944821   .4867584     2.66   0.008     1.190796    3.176305
           religius1 |    1.10531   .1480458     0.75   0.455      .850107    1.437125
               value |   .4514135   .0867516    -4.14   0.000     .3097367    .6578948
               _cons |   .6110942    .380623    -0.79   0.429      .180274    2.071492
        ------------------------------------------------------------------------------
        But if I use -firthlogit-, I cannot use the weights. Is there any way to use both?

        David,

        I have used the goodness-of-fit test before, but when I tried to find a rule of thumb for the F test, I could not find one. Do you have any information about this?
        I am new to statistics; I am sorry if the question is too basic.



        • #5
          So, using the -firthlogit- exponentiated coefficients and ignoring the other predictors for the moment (setting them to unity, which seems about right with the possible exception of "value"), you'd get the following fourfold table.
          KIS/health    0                                        1
          0             invlogit(ln(.6110942))                   invlogit(ln(.6110942) + ln(.7468839))
          1             invlogit(ln(.6110942) + ln(.901699))     invlogit(ln(.6110942) + ln(.7468839) + ln(.901699) + ln(6.605186))

          KIS/health    0      1
          0             0.4    0.3
          1             0.4    0.7

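          For what it is worth, those cell values can be reproduced at the command line from the -firthlogit- point estimates in #4, along these lines:
          Code:
          * baseline cell: _cons only
          display invlogit(ln(.6110942))
          * add the 1.kis main effect
          display invlogit(ln(.6110942) + ln(.7468839))
          * add the 1.health main effect
          display invlogit(ln(.6110942) + ln(.901699))
          * both main effects plus the kis#health interaction
          display invlogit(ln(.6110942) + ln(.7468839) + ln(.901699) + ln(6.605186))
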
          Is there a problem?

          If not, then add back the other predictors and use your weights by checking what -margins- gives you for a fourfold table after the fitted model shown in #1.
          Code:
          margins kis#health
          .
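          That is, something like the following after refitting the weighted model from #1 (a sketch; $controls is assumed to be the same global macro of extra covariates used there):
          Code:
          * refit the pweighted model exactly as in #1
          logit brihealth kis##health urban age gender education employment ///
              business religius1 value $controls [pw = BOT_NAS_JBR_JTG], or robust nolog
          * predicted probability of bribing in each cell of the fourfold table
          margins kis#health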



          • #6
            Joseph, thank you very much for your input. If I want to interpret the -margins- result, can I say:

            a one-unit difference in the ratio of poverty and perception of the quality of public services to total observations is associated with a 70 percentage point difference in the probability of bribing?

            I am not quite sure how to write the interpretation, since both variables in the interaction are categorical.

            Many thanks for your help.

            Maria
