  • Multilevel logistic regression

    Hi,
    I am running a multilevel logistic regression on DHS data for Ukraine, and I am not sure about the results. I need an expert opinion from someone who can confirm that I did it right. I want to investigate the factors associated with tobacco smoking. For the random intercept I used region, which is a categorical variable (North, South, East, West). My level-1 variables are age, gender, work status, highest education level, and marital status. I ran the random-intercept model first, without introducing the level-1 variables. I used xtmelogit smoke || region:


    Random-effects Parameters Estimate Std. Err. [95% Conf. Interval]

    region: Identity
    var(_cons) .0660989 .0433564 .0182752 .23907

    LR test vs. logistic model: chibar2(01) = 122.11 Prob >= chibar2 = 0.0000


    Then I created a risk score by running a logistic regression on all the predictor variables, and later used that smokerisk variable in the fixed part of the model.

    . xi:logistic smoke age i.ms i.highedu i.wealthi i.cwork i.gender
    i.ms _Ims_0-5 (naturally coded; _Ims_0 omitted)
    i.highedu _Ihighedu_0-3 (naturally coded; _Ihighedu_0 omitted)
    i.wealthi _Iwealthi_1-5 (naturally coded; _Iwealthi_1 omitted)
    i.cwork _Icwork_0-1 (naturally coded; _Icwork_0 omitted)
    i.gender _Igender_0-1 (naturally coded; _Igender_0 omitted)

    Logistic regression Number of obs = 12,210
    LR chi2(15) = 2764.91
    Prob > chi2 = 0.0000
    Log likelihood = -4906.05 Pseudo R2 = 0.2198


    smoke Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]

    age 1.004373 .0034311 1.28 0.201 .997671 1.011121
    _Ims_1 1.314048 .1226059 2.93 0.003 1.094437 1.577727
    _Ims_2 3.14551 .4351557 8.28 0.000 2.39847 4.125227
    _Ims_3 1.641184 .289318 2.81 0.005 1.161722 2.318528
    _Ims_4 3.149873 .3790349 9.53 0.000 2.488084 3.987686
    _Ims_5 3.116891 .5685119 6.23 0.000 2.180043 4.45634
    _Ihighedu_1 3.54832 3.780326 1.19 0.235 .4397094 28.63386
    _Ihighedu_2 .4787269 .4745827 -0.74 0.457 .0685891 3.341341
    _Ihighedu_3 .3140836 .3114419 -1.17 0.243 .0449783 2.193247
    _Iwealthi_2 .8653848 .070379 -1.78 0.075 .7378766 1.014927
    _Iwealthi_3 1.242929 .1050261 2.57 0.010 1.053224 1.466804
    _Iwealthi_4 1.108127 .1003171 1.13 0.257 .9279647 1.323267
    _Iwealthi_5 1.047131 .093587 0.52 0.606 .8788707 1.247604
    _Icwork_1 1.589846 .1090885 6.76 0.000 1.38979 1.818699
    _Igender_1 12.57861 .7383958 43.13 0.000 11.21153 14.11238
    _cons .1005307 .1002092 -2.30 0.021 .0142501 .7092155

    Note: _cons estimates baseline odds.

    . predict smokerisk, xb

    . xtmelogit smoke smokerisk || region:, var

    Refining starting values:

    Iteration 0: log likelihood = -4876.2025 (not concave)
    Iteration 1: log likelihood = -4871.9337
    Iteration 2: log likelihood = -4871.5384

    Performing gradient-based optimization:

    Iteration 0: log likelihood = -4871.5384
    Iteration 1: log likelihood = -4871.2656
    Iteration 2: log likelihood = -4871.2629
    Iteration 3: log likelihood = -4871.2629

    Mixed-effects logistic regression Number of obs = 12,210
    Group variable: region Number of groups = 5

    Obs per group:
    min = 1,889
    avg = 2,442.0
    max = 3,145

    Integration points = 7 Wald chi2(1) = 2230.59
    Log likelihood = -4871.2629 Prob > chi2 = 0.0000


    smoke Coef. Std. Err. z P>|z| [95% Conf. Interval]

    smokerisk .9993888 .0211604 47.23 0.000 .9579151 1.040862
    _cons -.0028393 .1065424 -0.03 0.979 -.2116585 .2059799



    Random-effects Parameters Estimate Std. Err. [95% Conf. Interval]

    region: Identity
    var(_cons) .0509872 .0341417 .0137241 .1894251

    LR test vs. logistic model: chibar2(01) = 69.57 Prob >= chibar2 = 0.0000

    So the variance dropped from .0660989 to .0509872.
    This gives (.0660989 - .0509872)/.0660989 = .22862256, i.e. almost 23 percent of the regional variance can be explained by personal characteristics.
    When I try to run this model with the cluster number, the results get messed up: instead of decreasing, the variance increases. I am confused; kindly advise.
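    The arithmetic behind that figure, as a quick check (in Python rather than Stata, purely for illustration):

```python
# Proportional change in the region-level variance component between the
# null model and the model with the smokerisk fixed effect.
# (Values copied from the xtmelogit outputs above.)
var_null = 0.0660989  # var(_cons), xtmelogit smoke || region:
var_full = 0.0509872  # var(_cons), xtmelogit smoke smokerisk || region:

pcv = (var_null - var_full) / var_null
print(f"proportional change in variance = {pcv:.8f}")  # ~0.2286, i.e. almost 23%
```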

  • #2
    When I try to run this model with cluster number
    What does this mean? Please show the code and the output.

    I will also make the general comment that while it is not strictly speaking wrong to use a 4-level variable, region, as a random intercept, it is dicey. That is a very small sample of region-space, and the variance component estimate at that level will be pretty meaningless as a result. Moreover, from the names of the regions, north, south, east, and west, it sounds like the regions are in fact all of the regions in the country. I think it would be more appropriate to just do this as a simple logistic regression with region as one of the predictor variables, rather than doing a multi-level model in this case.

    Finally, get rid of the -xi:- prefix. It doesn't actually do anything here except make your output harder to read. Keep all the i.'s, but eliminate -xi:- itself. Then you are using factor-variable notation (see -help fvvarlist-), which is much better and also opens up the possibility of using the -margins- command later to help interpret your findings.



    • #3
      Hi,
      Thank you so much for the response and suggestions. Could you help me understand this output? I tried to run the multilevel model with the cluster number, and I get the following results:
      xtmelogit smoke || cluster_num:, var

      Refining starting values:

      Iteration 0: log likelihood = -6173.3441
      Iteration 1: log likelihood = -6139.9731
      Iteration 2: log likelihood = -6138.9532

      Performing gradient-based optimization:

      Iteration 0: log likelihood = -6138.9532
      Iteration 1: log likelihood = -6138.9519
      Iteration 2: log likelihood = -6138.9519

      Mixed-effects logistic regression Number of obs = 12,210
      Group variable: cluster_num Number of groups = 500

      Obs per group:
      min = 1
      avg = 24.4
      max = 71

      Integration points = 7 Wald chi2(0) = .
      Log likelihood = -6138.9519 Prob > chi2 = .


      smoke Coef. Std. Err. z P>|z| [95% Conf. Interval]

      _cons -1.382858 .0384116 -36.00 0.000 -1.458144 -1.307573



      Random-effects Parameters Estimate Std. Err. [95% Conf. Interval]

      cluster_num: Identity
      var(_cons) .4145091 .0477905 .3306705 .5196042

      LR test vs. logistic model: chibar2(01) = 299.10 Prob >= chibar2 = 0.0000

      . logistic smoke age i.wealthi i.gender i.cwork i.highedu

      Logistic regression Number of obs = 12,210
      LR chi2(10) = 2575.18
      Prob > chi2 = 0.0000
      Log likelihood = -5000.9126 Pseudo R2 = 0.2048


      smoke Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]

      age 1.011971 .0029652 4.06 0.000 1.006176 1.017799

      wealthi
      Poorer .8028593 .0642915 -2.74 0.006 .6862412 .9392951
      Middle 1.177615 .0979283 1.97 0.049 1.000504 1.386078
      Richer 1.079127 .0961087 0.86 0.393 .9062822 1.284938
      Richest .975636 .0858883 -0.28 0.779 .8210202 1.159369

      gender
      Male 10.80233 .5715927 44.97 0.000 9.738161 11.98278

      cwork
      Yes 1.741987 .1161309 8.33 0.000 1.528618 1.985139

      highedu
      Primary 5.171662 5.526414 1.54 0.124 .6368544 41.99718
      Secondary .6259353 .6236485 -0.47 0.638 .088805 4.411855
      Higher .4003688 .3989989 -0.92 0.358 .0567768 2.82325

      _cons .0934087 .0936148 -2.37 0.018 .0131011 .6659882

      Note: _cons estimates baseline odds.

      . predict smokerisk, xb

      . xtmelogit smoke smokerisk || cluster_num:, var

      Refining starting values:

      Iteration 0: log likelihood = -4836.0464
      Iteration 1: log likelihood = -4817.1706
      Iteration 2: log likelihood = -4813.0308

      Performing gradient-based optimization:

      Iteration 0: log likelihood = -4813.0308
      Iteration 1: log likelihood = -4813.0239
      Iteration 2: log likelihood = -4813.0239

      Mixed-effects logistic regression Number of obs = 12,210
      Group variable: cluster_num Number of groups = 500

      Obs per group:
      min = 1
      avg = 24.4
      max = 71

      Integration points = 7 Wald chi2(1) = 2087.12
      Log likelihood = -4813.0239 Prob > chi2 = 0.0000


      smoke Coef. Std. Err. z P>|z| [95% Conf. Interval]

      smokerisk 1.10148 .0241103 45.68 0.000 1.054225 1.148735
      _cons .0231843 .0526071 0.44 0.659 -.0799237 .1262924



      Random-effects Parameters Estimate Std. Err. [95% Conf. Interval]

      cluster_num: Identity
      var(_cons) .675071 .073819 .544842 .8364276

      LR test vs. logistic model: chibar2(01) = 375.78 Prob >= chibar2 = 0.0000
      The variance increased to 0.675071, and I don't understand this. I think I should take your advice, but for learning purposes I would like to know how to interpret this result. Or is it just meaningless?
      Thank you so much for the help.
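      For concreteness, the size of the unexpected change (a quick Python check, values copied from the outputs above):

```python
# Cluster-level variance component before and after adding the smokerisk
# fixed effect -- it increased rather than decreased.
var_null = 0.4145091  # xtmelogit smoke || cluster_num:, var
var_full = 0.675071   # xtmelogit smoke smokerisk || cluster_num:, var

change = (var_full - var_null) / var_null
print(f"relative change in var(_cons): {change:+.1%}")  # about a 63% increase
```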



      • #4
        In #1 you are trying to calculate proportions of variance attributable to regional and personal characteristics in some way that I don't understand. These are logistic regressions, multi-level at that, and you are applying a logic that really only works for linear regressions with R-squared. The logistic model is more complicated and there is no statistic that is truly analogous to a linear regression's R-squared, nor can the estimated variance components be compared in the way you are using them. In particular, you cannot in a logistic regression partition the total outcome variance into a sum of the variance of the predicted values (xb) and the variance of the residuals (or residuals + higher level residuals). The sum of the residual variance(s) and the variance of the xb's will change from one model to the next. Remember, in a logistic model, there is no orthogonality between residuals (at any level) and the predicted values, so we do not have a Pythagorean situation the way we do in linear regression.

        Basically, there is no reason to expect any particular magnitude or direction of change in the variance components at any level based on the addition of new predictors to the model.



        • #5
          Okay, thank you so much.



          • #6

            Mixed-effects logistic regression Number of obs = 12,210
            Group variable: cluster_num Number of groups = 500

            Obs per group:
            min = 1
            avg = 24.4
            max = 71

            Integration method: mvaghermite Integration pts. = 7

            Wald chi2(3) = 2057.40
            Log likelihood = -4875.2058 Prob > chi2 = 0.0000

            smoke Coef. Std. Err. z P>|z| [95% Conf. Interval]

            cwork
            Yes .4715606 .0706877 6.67 0.000 .3330152 .610106

            gender
            Male 2.636477 .0595999 44.24 0.000 2.519663 2.753291
            age .0138215 .003147 4.39 0.000 .0076535 .0199895
            _cons -3.276884 .1278783 -25.63 0.000 -3.527521 -3.026247

            cluster_num
            var(_cons) .6691699 .0731469 .5401217 .829051

            LR test vs. logistic model: chibar2(01) = 374.55 Prob >= chibar2 = 0.0000

            .

            Can you please give me suggestions about the above results?



            • #7
              The best suggestion I can give you is to repost it placing it between code delimiters so it is easily readable. (See Forum FAQ #12 for instructions on using code delimiters.)

              But that isn't what you had in mind, I suppose. What kind of suggestion are you looking for? Suggestion about what?



              • #8
                The model

                Code:
                  melogit smoke age i.cwork i.gender || cluster_num:
                my data
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str15 id str3 ccode long cluster_num byte(age smoke ms cwork) float gender
                "       16140  2" "UA5" 16 30 0 1 1 0
                "19142  2"        "UA5" 19 37 1 4 1 1
                "        3 75  2" "UA5"  3 31 0 1 1 0
                "       25161  2" "UA5" 25 47 0 1 1 0
                "13140  2"        "UA5" 13 20 1 0 0 1
                "       20133  2" "UA5" 20 34 0 1 0 0
                "       19121  3" "UA5" 19 40 0 1 0 0
                "14 71  4"        "UA5" 14 22 0 0 1 1
                "       24 73  2" "UA5" 24 46 0 1 1 0
                "5 37  2"         "UA5"  5 26 1 4 1 1
                "       13 51  2" "UA5" 13 27 0 2 1 0
                "       23 59  1" "UA5" 23 46 0 4 0 0
                "17257  3"        "UA5" 17 26 1 0 1 1
                "        4 33  1" "UA5"  4 48 0 1 1 0
                "20 34  2"        "UA5" 20 21 0 0 1 1
                "       12 13  2" "UA5" 12 31 1 2 1 0
                "17 54  2"        "UA5" 17 49 0 5 1 1
                "       16 59  1" "UA5" 16 49 0 1 1 0
                "14 51  2"        "UA5" 14 46 1 4 0 1
                "9 86  2"         "UA5"  9 22 0 1 1 1
                "       15100  2" "UA5" 15 42 0 1 1 0
                "        2 52  1" "UA5"  2 49 0 3 1 0
                "       14136  2" "UA5" 14 44 0 5 0 0
                "        8120  2" "UA5"  8 30 0 1 1 0
                "       17211  2" "UA5" 17 48 1 1 1 0
                "7 91  2"         "UA5"  7 22 1 0 1 1
                "        8120  2" "UA5"  8 30 0 1 1 0
                "       23128  1" "UA5" 23 48 1 2 1 0
                "       11 71  2" "UA5" 11 32 1 3 0 0
                "       16120  3" "UA5" 16 42 0 1 1 0
                "        3 40  1" "UA5"  3 44 1 1 1 0
                "       12 77  2" "UA5" 12 41 0 3 1 0
                "       20 39  1" "UA5" 20 47 0 1 0 0
                "       21139  1" "UA5" 21 30 1 2 0 0
                "       19 48  2" "UA5" 19 39 0 1 0 0
                "        5 21  2" "UA5"  5 44 0 1 1 0
                "       13121  2" "UA5" 13 30 0 1 0 0
                "        9 76  2" "UA5"  9 49 0 1 1 0
                "5 53  1"         "UA5"  5 42 1 1 1 1
                "       24183  2" "UA5" 24 48 0 1 0 0
                "        4 82  1" "UA5"  4 26 0 1 1 0
                "       19 48  2" "UA5" 19 39 0 1 0 0
                "        9 80  2" "UA5"  9 25 0 1 0 0
                "9 53  3"         "UA5"  9 21 1 1 1 1
                "       12100  2" "UA5" 12 37 0 4 1 0
                "       20133  2" "UA5" 20 34 0 1 0 0
                "       11166  6" "UA5" 11 24 0 1 1 0
                "       21139  1" "UA5" 21 30 1 2 0 0
                "       13  4  3" "UA5" 13 45 0 4 1 0
                "       21129  1" "UA5" 21 24 1 2 0 0
                "       23 42  2" "UA5" 23 27 1 1 0 0
                "       20105  2" "UA5" 20 37 0 1 0 0
                "15 73  1"        "UA5" 15 45 1 1 1 1
                "       19100  1" "UA5" 19 49 0 1 0 0
                "       10126  2" "UA5" 10 26 0 1 1 0
                "6 29  3"         "UA5"  6 17 0 0 0 1
                "        6 45  1" "UA5"  6 41 1 3 0 0
                "       21 40  2" "UA5" 21 43 0 1 1 0
                "        6 99  2" "UA5"  6 37 0 1 1 0
                "4106  2"         "UA5"  4 24 1 0 1 1
                "       25 12  2" "UA5" 25 37 0 1 1 0
                "16 79  1"        "UA5" 16 41 0 5 1 1
                "        4  8  3" "UA5"  4 40 0 1 0 0
                "       18 67  2" "UA5" 18 28 0 1 0 0
                "       21129  1" "UA5" 21 24 1 2 0 0
                "        2118  3" "UA5"  2 27 0 1 1 0
                "       10104  2" "UA5" 10 30 0 1 1 0
                "        6 88  1" "UA5"  6 34 0 1 0 0
                "       26  8  3" "UA5" 26 28 1 1 0 0
                "       12 82  1" "UA5" 12 34 0 1 1 0
                "       19 59  1" "UA5" 19 35 0 4 0 0
                "9 93  1"         "UA5"  9 38 1 1 1 1
                "15 80  3"        "UA5" 15 15 1 0 0 1
                "       17100  1" "UA5" 17 49 0 1 1 0
                "       16 13  1" "UA5" 16 42 0 1 1 0
                "       24175  3" "UA5" 24 21 0 1 0 0
                "        8120  2" "UA5"  8 30 0 1 1 0
                "13112  2"        "UA5" 13 19 1 0 1 1
                "       15  6  2" "UA5" 15 29 0 2 0 0
                "3 46  6"         "UA5"  3 17 1 0 0 1
                "       10109  3" "UA5" 10 28 1 0 1 0
                "       16110  1" "UA5" 16 40 1 2 1 0
                "        6 45  1" "UA5"  6 41 1 3 0 0
                "       23  3  1" "UA5" 23 45 0 1 1 0
                "       20 39  1" "UA5" 20 47 0 1 0 0
                "25 12  3"        "UA5" 25 38 1 1 1 1
                "25148  1"        "UA5" 25 41 0 0 0 1
                "16 69  1"        "UA5" 16 35 1 0 0 1
                "       25 31  2" "UA5" 25 19 0 1 0 0
                "12 82  2"        "UA5" 12 30 1 1 1 1
                "       19121  3" "UA5" 19 40 0 1 0 0
                "       18152  2" "UA5" 18 34 0 1 1 0
                "       26  8  3" "UA5" 26 28 1 1 0 0
                "       24132  2" "UA5" 24 28 1 1 0 0
                "       11 83  2" "UA5" 11 32 0 1 0 0
                "       26 95  1" "UA5" 26 28 0 4 1 0
                "       25 31  2" "UA5" 25 19 0 1 0 0
                "       16 24  1" "UA5" 16 45 0 1 0 0
                "19121  2"        "UA5" 19 46 1 1 0 1
                "       19  7  2" "UA5" 19 46 0 1 1 0
                end
                label values smoke MV463A
                label def MV463A 0 "No", modify
                label def MV463A 1 "Yes", modify
                label values ms MV501
                label def MV501 0 "Never married", modify
                label def MV501 1 "Married", modify
                label def MV501 2 "Living together", modify
                label def MV501 3 "Widowed", modify
                label def MV501 4 "Divorced", modify
                label def MV501 5 "Not living together", modify
                label values cwork MV714
                label def MV714 0 "No", modify
                label def MV714 1 "Yes", modify
                label values gender gender
                label def gender 0 "Female", modify
                label def gender 1 "Male", modify
                I want your opinion about the results. Is this multilevel model better than a simple logistic regression? And I want to confirm my interpretation of the results. According to the results, there is almost 67% variance between clusters. Is this how the interpretation of the cluster_num var(_cons) is done?



                • #9
                  According to the results, there is almost 67% variance between clusters. Is this how the interpretation of the cluster_num var(_cons) is done?
                  No, that is not correct. The 0.67 number represents the absolute variance of the cluster intercepts; it is not a percent of variance. The percent of variance can be gotten by running -estat icc-. Or, you can calculate the percent of variance by hand: the variance at the residual level in a logistic model is always pi^2/3, which is about 3.29. So the proportion of variance at the cluster level is 0.67/(0.67+3.29), which is approximately 0.17, so 17% of the unexplained variance is at the cluster level.
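                  That calculation can be sketched in a few lines (Python here, just to show the arithmetic):

```python
import math

# Cluster-level variance from the melogit output above.
var_cluster = 0.6691699

# On the latent scale, the level-1 residual variance of a logistic
# model is fixed at pi^2 / 3 (about 3.29).
var_residual = math.pi ** 2 / 3

# Intraclass correlation: share of latent variance at the cluster level.
icc = var_cluster / (var_cluster + var_residual)
print(f"pi^2/3 = {var_residual:.2f}")  # 3.29
print(f"ICC    = {icc:.3f}")           # about 0.17, i.e. 17%
```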

                  That said, what conclusion are you drawing from this? The proportion of variance at the cluster level is whatever it is: a high or low level doesn't say anything about the validity of your model one way or another.

                  As for whether the multilevel model is better than a simple logistic regression, the answer is yes. If you had only a very small proportion of variance at the cluster level, then arguably a flat logistic regression model would be just as good--but at 17%, the multi-level model is accounting better for the data. If you like to make these decisions using statistical tests, the output of your -melogit- command itself includes such a test at the very end where it says:
                  Code:
                  LR test vs. logistic model: chibar2(01) = 374.55 Prob >= chibar2 = 0.0000
                  So the likelihood ratio test of multilevel vs logistic model has a very low p-value. (I don't recommend using statistical tests for model selection--I'm just showing you how to do it if you want to use that approach.) You could also calculate AIC and BIC for both models and see how much change there is (-estat ic- gives these statistics following a regression.)
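                  That chibar2 value is just twice the log-likelihood gap between the mixed model and its single-level counterpart; for example, the test reported with the smokerisk model in #3 can be reproduced by hand (Python, for illustration):

```python
# LR statistic = 2 * (logL of mixed model - logL of flat logistic model).
# Log likelihoods copied from the #3 output.
ll_flat = -5000.9126   # flat -logistic- fit from #3
ll_mixed = -4813.0239  # xtmelogit smoke smokerisk || cluster_num:

chibar2 = 2 * (ll_mixed - ll_flat)
print(f"chibar2(01) = {chibar2:.2f}")  # 375.78, matching the reported test
```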







                  • #10
                    Thank you so much for the help.



                    • #11
                      Hi!
                      Can you explain how to find the odds ratios using the multilevel model? What is the code?



                      • #12
                        use the "or" option; see
                        Code:
                        help melogit
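                        Equivalently, an odds ratio is just the exponentiated log-odds coefficient, so the fixed-effect estimates reported in #6 could be converted by hand (illustrative Python, not Stata):

```python
import math

# Log-odds coefficients from the melogit output in #6.
coefs = {"cwork = Yes": 0.4715606, "gender = Male": 2.636477, "age": 0.0138215}

# Odds ratio = exp(coefficient); e.g. gender = Male gives an OR of about 13.96.
for name, b in coefs.items():
    print(f"{name}: OR = {math.exp(b):.3f}")
```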



                        • #13
                          Thank you.
                          Is this equation is allright?
                          Beta0+Beta1*(gender)ij+u1j*(gender)ij+beta2*(cwork )ij+u2j*(cwork)ij+beta3*(ms)ij+u3j*(ms)ij+uoj
                          beta0 fixed intercept
                          uoj random intercept variance
                          beta1*(gender)ij general effect of the level 1 variable (gender)ij
                          (gender)ij is the observed value of the predictor variable for respondent i in cluster j.
                          u1j residual term associated with the deviation of the specific effect of the level 1 predictor gender in a cluster from the overall affect of gender across all clusters.
                          Thank you so much for your time and effort.

