  • Modelling prevalence in stata

    Hi all! I am trying to model the prevalence of malaria using a GSEM in Stata. I have the number of malaria cases for 1532 administrative units and the total population of the same units. I looked at the mean and variance of the prevalence and did not find any overdispersion. Thus, I am not completely sure whether I should use a binomial logit or a Poisson distribution (I do not have any exposure variable). I decided to use binomial logit: in the variable name I specified my number-of-cases variable, and in the denominator I used the total population variable. Is this correct? I ran the model (described in the attached image) and it does not converge. The measurement model for climate was tested and is the right one. I guess the other variables (def16_18, longitud_vias and num_riesgo) should be standardized? Which standardization should I use?

    [Attached image: Capture.JPG]


    Thank you!

    Andrea.
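
    For reference, the setup described in this post (cases as the outcome, population as the binomial denominator, z-scored covariates) could be sketched in command syntax roughly as follows. This is only a sketch: it assumes that casos2016 and a population variable named pob2016 sit in the same dataset as the covariates, that the climate indicators are the four variables used later in the thread, and the z_* variable names are hypothetical.

    ```stata
    * Sketch only: assumes casos2016 (cases) and pob2016 (population) are in
    * the same dataset as the covariates; the z_* names are hypothetical.
    foreach v of varlist def16_18 longitud_vias num_riesgo {
        egen double z_`v' = std(`v')    // z-score: (x - mean) / sd
    }

    * Binomial logit with pob2016 as the number of trials in each unit
    gsem (casos2016 <- z_def16_18 z_longitud_vias z_num_riesgo Clime, ///
              family(binomial pob2016) link(logit))                    ///
         (pr2016 tmax2016 tmed2016 ndl_2016 <- Clime)
    ```

    Standardizing does not change the fit of the model; it only puts the covariates on comparable scales, which can help the optimizer when variables such as longitud_vias range over several orders of magnitude.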

  • #2
    As explained in FAQ 12.2, please show an example of your data using -dataex-. This will help others to understand your model. Also, show the Stata commands you have used (using code delimiters); again, please read through the FAQ before posting.

    Suggestions aside, prevalence data are supposed to be on a continuous scale bounded between 0 and 1, while a logit outcome takes the values 0 and 1. I am not sure about your model unless you show the full command and data structure.
    Last edited by Roman Mostazir; 25 Oct 2021, 15:15.
    Roman



    • #3
      Thank you Roman. Should the data be like this? I include my data in dta format as well.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int casos2016 float(pr2016 tmax2016 tmed2016) int(ndl_2016 num_riesgo) float(def16_18 longitud_vias)
      0 1960.4326 22.499063  18.65806  71   8 1.3520716  9327.633
      0 1874.3156 22.499063  18.65806  70  21 4.4355955 14859.553
      0 1637.2522 17.463572 14.365536  59  24  .8926113 25358.314
      0 1960.4326 22.499063  18.65806  71  15 2.0064905  16014.66
      0 1960.4326 22.499063  18.65806  71  24 1.9126343 4151.2617
      0 1936.3463  24.24782 20.224846  70  49 .11113178 14564.335
      0  2075.348 24.811754  20.56939  74  32 .06978014 14021.235
      0   1891.24  21.40913  17.85292  64   6 2.3530507  19822.57
      0 2035.6243  24.24782 20.224846  69  37 2.1893692  22711.95
      0 2740.6094 24.818853 20.676214  72  35  2.921561  6823.809
      0    2512.1 24.818853 20.676214  72   8  7.376283  1486.434
      0    2512.1 24.818853 20.676214  72   1  9.797176  1885.637
      0 2705.9614  24.31849  20.33786  78  27 2.8726206 1655.0975
      0 2459.3286 24.196983 20.122517  76  45  2.046238  9383.443
      0 2459.3286 24.196983  20.12252  76   7 1.1260817  5107.783
      0  2666.932 24.196983  20.12252  69  74  3.300219  5863.369
      0  2666.932 24.196983  20.12252  69  11   2.61236 255.49303
      0 2336.9136  24.24782 20.224846  69  61 4.0716286  7357.283
      0 2336.9136  24.24782 20.224846  69  82 1.6707628  6937.645
      0  2253.841  24.24782 20.224846  70  49  .7796409  8432.072
      0  2253.841  24.24782 20.224846  70  20 .25833806  5507.706
      0  2253.841  24.24782 20.224846  70  50  .8733951  2451.929
      0 2558.1025 24.196983  20.12252  69  37 2.9272294 4748.3384
      0  2666.932 24.196983  20.12252  69  76 .12433659 4677.8447
      0 2558.1025 24.196983  20.12252  69  48 1.2984296  9172.156
      0 2390.7031 24.196983  20.12252  77  39  2.346743   1726.28
      0  2602.762  22.94782  19.58547  81   6 .46921375  3310.092
      0  2623.635   23.3985 19.429316  76  30 1.0651296  6131.043
      0  2716.847   23.3985 19.429316  71  10 1.3966126  6016.741
      0 2558.1025 24.196983  20.12252  69  56 4.0072546 4256.6855
      0  2716.847   23.3985 19.429316  71  67  2.479288  7634.785
      0 2347.9954 24.811754  20.56939  69  75 1.0480261 10760.155
      0 2347.9954 24.811754  20.56939  69  23         0  8027.264
      0  2223.093 24.811754  20.56939  76  39 2.2687595 10099.996
      0  2716.847   23.3985 19.429316  71  64  .9331453  3003.628
      0  2618.183   23.3985 19.429316  73 108  .5297746 3920.5154
      2  2223.093 24.811754  20.56939  76 114  .3924594   11073.8
      0 2129.3665 25.454805  20.98287  80  69 1.4151956  5221.964
      0 1785.5553 25.454805  20.98287  78  68  .7798167  6135.278
      0 2590.3574 23.064917  19.44737  76  22  .6994985  3641.865
      0  2552.689 23.064917  19.44737  81   6 .26963028         0
      0  2824.276 22.005074 19.037636  78  28  .5204259         0
      0 3301.7314  27.03278  23.47951  96   6  .6640714         0
      0  2767.986  23.87233 20.776224  90  26  .8463443         0
      0  2859.063 25.290773  21.96145  94   9 .52042145         0
      0   3113.21  28.08098  24.31376 101   8  .4433858         0
      0  3304.973  28.08098  24.31376 101  47  1.555511         0
      0  3106.551 29.284124  25.24578 104  12 .15851097         0
      0   3092.94 28.953876  25.21889 117  32  .7013907         0
      0  3283.603  28.08098  24.31376 105  36  .8754107         0
      0  3125.836  28.08098  24.31376 101   8 1.1331778         0
      0  3125.836  28.08098  24.31376 101 102  .9656757         0
      0 2930.0085 25.290773  21.96145 102  25   .617736         0
      0  3196.293 28.163074  24.60037 102  22 .56580585         0
      0 3313.2524 28.163074  24.60037 106   2 .52726215         0
      0  3349.846 28.163074  24.60037 105   9 .03790818         0
      0 3097.1304 28.953876  25.21889 113  45 .12956668         0
      1  3145.121 29.199293  25.32052 112   4 1.4638028         0
      0  3144.041 29.199293  25.32052 110  16 2.2253387         0
      0  3285.347    28.531 24.851284 108   1  .2257172         0
      0  3227.503    28.531 24.851284 108  33   .843044         0
      0  3227.503    28.531 24.851284 108   1  .6555548         0
      0 3253.4226    28.531 24.851284 104   0  .8721642         0
      0  3126.579 27.595676 23.950693 104  12  1.357205         0
      0  3126.579 27.595676 23.950693 104   4  1.768676         0
      0 1824.0134  20.31072  17.00509  64  66  .6459512  23357.14
      0 2376.1892   23.8984  20.02043  61  60  3.147552 38768.605
      0  994.0181 13.477676  10.54261  67  15 .14533941 29058.533
      0  3505.352 26.109533  21.83565  84  10  .4300197         0
      0  3412.229 25.060654 21.244656  87  16  1.963831 2120.0881
      0 3294.0996 25.060654 21.244656  81  74  3.157337 13220.687
      0 3018.2205  24.31849  20.33786  81  22  .7343524  615.7242
      0  3056.878  24.31849  20.33786  81  35  2.693346   3469.32
      0 2705.9614  24.31849  20.33786  78  37 3.1629314 12865.222
      0 3018.2205  24.31849  20.33786  81  60  2.438619  7699.965
      0 3018.2205  24.31849  20.33786  81   4  2.512139  3727.831
      0 2947.5415  22.94782  19.58547  80  29 1.5392475  5170.585
      0 3294.0996 25.060654 21.244656  81  37  1.395279  3512.715
      0 3294.0996 25.060654 21.244656  81  20  1.294979  977.6693
      0  3177.557  24.04983 20.907675  83   7 .25709236         0
      0 1360.0756 12.816942 10.059307  72 126 .57187176 21434.816
      0  1684.717  23.96785 19.911085  84  90 1.9178658  18144.27
      0 1231.8643   19.8473  16.34034  61  17  .5586715  2842.289
      0 1231.8643   19.8473  16.34034  61  17 .28025597  5981.654
      0  1404.562 23.857054 19.629837  61  60 .17538744 18215.447
      0 1213.8164  22.95127  19.09852  56  14 2.3673851  13085.01
      0  1294.775  24.71883 20.887787  56  46  .1375092 32502.193
      0  1399.879  24.71883 20.887787  61  35  .2015793 15320.187
      0 1478.2648 23.857054 19.629837  58  58   .474673 19497.236
      0 1576.7183 23.857054 19.629837  69  23   .325948  8200.637
      0 1582.2325 24.237675 20.023193  67  68  .5164868 14343.727
      0 1582.2325 24.237675 20.023193  67  84  .3754576  31571.47
      0  1310.686  24.71883 20.887787  56  21 .19348995  10769.23
      0 1082.0302 16.019917 12.883747  60  15         0         0
      0 1082.0302 16.019917 12.883747  60  21 1.0776219         0
      0 1111.1948 20.137445 16.803696  64  12 1.4022865         0
      0 1154.4204 20.137445 16.803696  53  70  .7448177  5536.652
      0 1069.3672 16.019917 12.883747  51  15 1.0704374         0
      0  985.3566  17.74958 14.679419  55  11  .2187572         0
      0  950.6255 18.371286 14.924468  56   1  .9093801         0
      end


      • #4
        Thanks for the data example using -dataex-. Summarizing the -dataex- example shows that the outcome variable casos2016 takes the value 2, which is not permitted in a logit model. A logit model takes the value 1 for an event and 0 for no event. I suggest you check the outcome variable with the 'summarize' command and investigate what the values other than 0 and 1 mean in your data. If you have values other than 0 and 1, the logit model is misspecified and needs to be corrected and re-run. If the convergence issue still persists, try a different maximization technique. For that, go to Estimation > Maximization, tick "use a different stepping algorithm", and try the different techniques from the technique drop-down.
        Last edited by Roman Mostazir; 26 Oct 2021, 11:01.
        Roman
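
        In command syntax, the two checks suggested here (inspecting the outcome, then trying another maximization technique) might look like the sketch below. The model specification is an assumption pieced together from the variables shown in this thread, and pob2016 is assumed to be in the same dataset; technique(bfgs) is just one of the algorithms Stata's maximizer offers.

        ```stata
        * Inspect the distribution of the outcome before choosing a family
        summarize casos2016, detail
        tabulate casos2016

        * Hypothetical refit with a different maximization technique
        gsem (casos2016 <- num_riesgo def16_18 longitud_vias Clime, ///
                  family(binomial pob2016) link(logit))              ///
             (pr2016 tmax2016 tmed2016 ndl_2016 <- Clime),           ///
             technique(bfgs) iterate(200)
        ```

        The difficult option is another maximization setting that may be worth trying when the log shows "not concave" messages.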



        • #5
          Thank you Roman. My malaria cases variable (casos2016) is count data and stores the number of malaria cases found in a specific census area. The conflict I have is that when I select binomial logit for that variable, I am asked to insert the variable and the denominator. I looked up what these values mean, and I found what is described in the image below.

          [Attached image: Capture.PNG]

          Thus, when I set up this information in the gsem, I put the number of cases (casos2016) in the variable field, which would be the number of successes from the Bernoulli trials, and the total population in 2016 (pob2016) in the denominator field, in other words the total number of Bernoulli trials. The model does not work. But if I insert the prevalence (prev2016) directly in the variable field, it works. It is important to say that I calculated prev2016 = casos2016/pob2016. I know logit works with 0-1. My aim is to use prevalence, not cases, as the response variable, but then why does Stata ask for a denominator, and what is this denominator?

          [Attached image: Capture3.PNG]


          Thank you very much.


          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input int(casos2016 pob2016) float prev2016
          0 131          0
          0 102          0
          0 111          0
          0 170          0
          0 328          0
          0 333          0
          0 420          0
          0  75          0
          0 129          0
          0 139          0
          0 173          0
          0 102          0
          0 135          0
          0 483          0
          0 100          0
          0 316          0
          0  45          0
          0 244          0
          0 647          0
          0 260          0
          0 235          0
          0 245          0
          0  87          0
          0 305          0
          0 356          0
          0 168          0
          0  40          0
          0 102          0
          0 306          0
          0 240          0
          0 296          0
          0 217          0
          0 211          0
          0 178          0
          0 190          0
          0 556          0
          2 708 .002824859
          0 218          0
          0 371          0
          0  59          0
          0   9          0
          0  73          0
          0  28          0
          0  50          0
          0  68          0
          0  34          0
          0 128          0
          0  25          0
          0 144          0
          0 118          0
          0  64          0
          0 340          0
          0  79          0
          0  46          0
          0 169          0
          0  54          0
          0 130          0
          1 200       .005
          0 216          0
          0  19          0
          0 105          0
          0  39          0
          0 101          0
          0  59          0
          0  87          0
          0 157          0
          0 265          0
          0  83          0
          0  37          0
          0  62          0
          0 349          0
          0 100          0
          0 188          0
          0 355          0
          0 268          0
          0  15          0
          0 116          0
          0 151          0
          0  72          0
          0   8          0
          0 416          0
          0 563          0
          0  25          0
          0 134          0
          0 142          0
          0  37          0
          0 120          0
          0 265          0
          0 134          0
          0 108          0
          0 259          0
          0 394          0
          0  37          0
          0  46          0
          0 131          0
          0  29          0
          0 163          0
          0  45          0
          0  22          0
          0   5          0
          end
          Last edited by Andrea Araujo; 26 Oct 2021, 12:07.



          • #6
            I think the large share of zeros (95%) in your data is preventing this model from converging. One option would be to use a Poisson or negative binomial model. I was able to run a Poisson model on your data:

            Code:
            #delimit ;
            gsem (casos2016 <- num_riesgo def16_18 longitud_vias Clime@1, fam(poisson) link(log))
                 (pr2016 tmax2016 tmed2016 ndl_2016 <- Clime),
            ;
            #delimit cr
            
            ---------------------------------------------------------------------------------
                            | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
            ----------------+----------------------------------------------------------------
            casos2016       |
                 num_riesgo |    .003351   .0021926     1.53   0.126    -.0009463    .0076484
                   def16_18 |    -1.4588   .1497834    -9.74   0.000     -1.75237    -1.16523
              longitud_vias |  -2.60e-06   4.68e-06    -0.56   0.579    -.0000118    6.57e-06
                      Clime |          1  (constrained)
                      _cons |  -15.84619   .8554093   -18.52   0.000    -17.52276   -14.16962
            ----------------+----------------------------------------------------------------
            pr2016          |
                      Clime |   34.95279   1.688186    20.70   0.000     31.64401    38.26158
                      _cons |   2782.468   22.13936   125.68   0.000     2739.075     2825.86
            ----------------+----------------------------------------------------------------
            tmax2016        |
                      Clime |   .2032202   .0085041    23.90   0.000     .1865525    .2198879
                      _cons |   25.92399   .0913164   283.89   0.000     25.74501    26.10296
            ----------------+----------------------------------------------------------------
            tmed2016        |
                      Clime |   .1896273   .0079403    23.88   0.000     .1740646    .2051899
                      _cons |   21.90898   .0852511   256.99   0.000     21.74189    22.07607
            ----------------+----------------------------------------------------------------
            ndl_2016        |
                      Clime |   .7459628   .0350535    21.28   0.000     .6772591    .8146665
                      _cons |   87.03674   .4437119   196.16   0.000     86.16709     87.9064
            ----------------+----------------------------------------------------------------
                  var(Clime)|   341.4655   30.99565                      285.8125    407.9552
            ----------------+----------------------------------------------------------------
               var(e.pr2016)|   374695.2   13573.56                        349014    402266.1
             var(e.tmax2016)|   .0571676   .0042196                      .0494677     .066066
             var(e.tmed2016)|   .0609349   .0039484                      .0536673    .0691865
             var(e.ndl_2016)|   130.2608   4.721394                      121.3281    139.8512
            ---------------------------------------------------------------------------------
            Note that I removed the covariance path between pr2016 and ndl_2016, as it caused the model to go wild. Also, after estimation, the postestimation command
            Code:
             estat eform
            will report the incidence-rate ratios for a poisson or nbreg model.
            Last edited by Roman Mostazir; 26 Oct 2021, 19:31.
            Roman



            • #7
              Originally posted by Roman Mostazir
              I think the large share of zeros (95%) in your data is preventing this model from converging. One option would be to use a Poisson or negative binomial model. I was able to run a Poisson model on your data: [...]

              Just a quick note: Andrea mentioned that she has total population for each of the administrative units. It should be used as an offset/exposure variable.
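
              A minimal sketch of that suggestion, assuming pob2016 (the population variable from post #5) is in the same dataset as the covariates and is strictly positive in every unit, since the exposure enters the model as ln(pob2016):

              ```stata
              * Poisson model for case counts with population as exposure,
              * so coefficients describe the rate per person rather than the raw count
              gsem (casos2016 <- num_riesgo def16_18 longitud_vias Clime@1, ///
                        family(poisson) link(log) exposure(pob2016))         ///
                   (pr2016 tmax2016 tmed2016 ndl_2016 <- Clime)

              * Exponentiated coefficients (incidence-rate ratios)
              estat eform
              ```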

              I actually think that Andrea's original choice of a fractional logit model is not wrong. This would give you something like the fraction of the population that contracted malaria, assuming that there are no repeat cases of malaria (or too few such cases to influence the results). I'm not sure how you would manually set up a fractional logit model in gsem, but I think Andrea's original syntax was on the right track. However, Poisson is probably preferable. Also, I think that Poisson would be better known among epidemiologists than fractional logit.
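
              Outside gsem, a fractional logit on the prevalence can be fit directly. The sketch below is only an illustration of that model: it drops the latent Clime variable, which these commands cannot include, and it assumes prev2016 lies in [0, 1] for every unit.

              ```stata
              * Fractional logit on the prevalence (Stata 14 or later)
              fracreg logit prev2016 num_riesgo def16_18 longitud_vias

              * Equivalent quasi-likelihood formulation via glm with robust standard errors
              glm prev2016 num_riesgo def16_18 longitud_vias, ///
                  family(binomial) link(logit) vce(robust)
              ```

              Note that this treats every unit's prevalence as equally informative regardless of population size, whereas a binomial model with pob2016 as the denominator weights units by their number of trials.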
              Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

              When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



              • #8
                Thank you very much for your replies. Now I am wondering: all those observed variables (num_riesgo, def16_18 and longitud_vias) follow distributions other than the normal. Should I put them in the square with double lines and specify the distribution, or should I just standardize them? Thank you!



                • #9
                  Originally posted by Andrea Araujo
                  Thank you very much for your replies. Now I am wondering: all those observed variables (num_riesgo, def16_18 and longitud_vias) follow distributions other than the normal. Should I put them in the square with double lines and specify the distribution, or should I just standardize them? Thank you!

                  No. This is a common misconception. Regression does not require the independent variables to be normally distributed at all. In fact, it doesn't even require the dependent variable to be normally distributed.* In OLS, a normally distributed, random error term gives maximum statistical efficiency, but that's not critical either. I'm not sure what you mean by putting the IVs in the square with double lines, because I use the command syntax. If you meant including squared terms for these IVs, the distribution of an IV is not by itself a reason to use a squared term. A squared term makes sense only if you think that the IV has a non-linear effect on whatever scale the Poisson model is being fit in.

                  * That said, there are times when it's beneficial to use a generalized linear model with, for example, a log link and some family that isn't normal.
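
                  If a non-linear effect of one covariate is suspected, it can be checked with a squared term in factor-variable notation. The sketch below uses the plain poisson command without the latent variable, picks def16_18 purely as an illustration, and assumes pob2016 is available as the exposure.

                  ```stata
                  * Poisson regression with a quadratic term for def16_18
                  poisson casos2016 c.def16_18##c.def16_18 num_riesgo longitud_vias, ///
                      exposure(pob2016) irr

                  * Wald test of the squared term; drop it if there is no evidence of curvature
                  testparm c.def16_18#c.def16_18
                  ```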