Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help with model parameterization, mepoisson

    I’m having difficulty with a mixed effects model I’m trying to run. I have a data set looking at exposure to Lassa fever, and infectious disease endemic to west Africa. I am interested in building a model that describes correlates of exposure to Lassa fever (i.e., what factors are significantly associated with a positive ELISA antibody test for Lassa fever (LASV) exposure). Our data suggests a high degree of clustering within households – if one person shows evidence of exposure to Lassa virus, the ‘secondary attack rate’ or likelihood that a second person in the house was also exposed was high. We expected this, and anticipated needing to use a hierarchical modeling structure in our multivariable modeling.

    I had selected the Stata command mepoisson, with household (the variable hh_new in our data set) as the random effects variable. I ran that model, then out of curiosity, I ran the model without hh_new as a random effects variable, and got nearly the same outcome. I was surprised by this, and I assume I have miscoded my command, so that Stata doesn’t properly consider the hh_new variable as a random effects variable.

    I am relatively new to this type of model, can you see any obvious errors here? Did I miss some option to properly tell Stata that household should be considered as a random effect?

    Here’s my output:

    . // Set up the hierarchical model with single independent variables as a fixed effect
    . // household is random effects
    . // using a Poisson link
    . tab district lasv_pos , row

    +----------------+
    | Key |
    |----------------|
    | frequency |
    | row percentage |
    +----------------+

    2.7 | LASV ELISA status
    District | Negative Positive | Total
    -----------+----------------------+----------
    Kenema | 1,775 1,405 | 3,180
    | 55.82 44.18 | 100.00
    -----------+----------------------+----------
    Port Loko | 975 70 | 1,045
    | 93.30 6.70 | 100.00
    -----------+----------------------+----------
    Tonkolili | 3,081 837 | 3,918
    | 78.64 21.36 | 100.00
    -----------+----------------------+----------
    Total | 5,831 2,312 | 8,143
    | 71.61 28.39 | 100.00

    . mepoisson lasv_pos district_ken district_ton || hh_new:, irr

    Fitting fixed-effects model:

    Iteration 0: log likelihood = -6537.6159
    Iteration 1: log likelihood = -4975.6947
    Iteration 2: log likelihood = -4940.9195
    Iteration 3: log likelihood = -4940.8151
    Iteration 4: log likelihood = -4940.815

    Refining starting values:

    Grid node 0: log likelihood = -5133.1424

    Fitting full model:

    Iteration 0: log likelihood = -5133.1424 (not concave)
    Iteration 1: log likelihood = -5012.8385 (not concave)
    Iteration 2: log likelihood = -4940.5265
    Iteration 3: log likelihood = -4936.5665
    Iteration 4: log likelihood = -4936.5525
    Iteration 5: log likelihood = -4936.5525

    Mixed-effects Poisson regression Number of obs = 8,143
    Group variable: hh_new Number of groups = 831

    Obs per group:
    min = 1
    avg = 9.8
    max = 20

    Integration method: mvaghermite Integration pts. = 7

    Wald chi2(2) = 406.15
    Log likelihood = -4936.5525 Prob > chi2 = 0.0000
    ------------------------------------------------------------------------------
    lasv_pos | IRR Std. err. z P>|z| [95% conf. interval]
    -------------+----------------------------------------------------------------
    district_ken | 6.614448 .8310112 15.04 0.000 5.170735 8.461258
    district_ton | 3.180607 .4058815 9.07 0.000 2.476778 4.084445
    _cons | .0653874 .0080086 -22.27 0.000 .0514328 .0831282
    -------------+----------------------------------------------------------------
    hh_new |
    var(_cons)| .0502412 .0190299 .0239139 .1055528
    ------------------------------------------------------------------------------
    Note: Estimates are transformed only in the first equation to incidence-rate ratios.
    Note: _cons estimates baseline incidence rate (conditional on zero random effects).
    LR test vs. Poisson model: chibar2(01) = 8.53 Prob >= chibar2 = 0.0018

    . mepoisson lasv_pos district_ken district_ton , irr /* removing the hierarchical structure changes very little. This is not expected. */

    Iteration 0: log likelihood = -6537.6159
    Iteration 1: log likelihood = -4975.6947
    Iteration 2: log likelihood = -4940.9195
    Iteration 3: log likelihood = -4940.8151
    Iteration 4: log likelihood = -4940.815

    Poisson regression Number of obs = 8,143

    Wald chi2(2) = 454.05
    Log likelihood = -4940.815 Prob > chi2 = 0.0000
    ------------------------------------------------------------------------------
    lasv_pos | IRR Std. err. z P>|z| [95% conf. interval]
    -------------+----------------------------------------------------------------
    district_ken | 6.595794 .8077476 15.40 0.000 5.188305 8.385108
    district_ton | 3.189179 .396799 9.32 0.000 2.499033 4.069919
    _cons | .0669857 .0080063 -22.62 0.000 .0529961 .0846682
    ------------------------------------------------------------------------------
    Note: _cons estimates baseline incidence rate (conditional on zero random effects).


    Thank you,
    Matt

  • #2
    Well, I don't see anything wrong with the way you set up your -mepoisson- commands.

    Looking at the results from the model with the random intercepts, we see that the variance of the random intercepts is, to two decimal places, 0.05. That corresponds to a standard deviation sqrt(0.05), or 0.22 to 2 decimal places. So if you think about the model's extended linear predictor = _b[_cons] + _b[district_ken]*district_ken + _b[district_ton]*district_ton + random intercept, notice that _b[district_ken] and _b[district_ton] are much larger than the random intercepts. The smaller coefficient, _b[district_ton], is approximately 3.19, which is 14.5 times the standard deviation of the random intercepts. And the other coefficient is roughly twice as large as that. In other words, the random intercepts make only a tiny contribution to the prediction. It is, therefore, not surprising that when they are omitted altogether the coefficients don't change much.

    Qualitatively, one might interpret this by saying that notwithstanding the apparently high intra-household correlation in the outcome variable that, before the actual study, you had the impression you would find, if that strong correlation exists, much of it arises from the fact that any given household is entirely within a single district, and the district-level effects account for most of what you perceived as a household effect. You might even test this theory by running another model in which you retain the household random effects and omit the district variables: if your initial perception of a strong intra-household correlation is correct, you will find that the variance of the household effect is much larger in this model.

    You might also consider other possible explanations for your unexpected findings. It may be that your data are incorrect, either because they were collected inaccurately, or errors were introduced during data management. It may be that the districts sampled in this study are different in some material way from the out-of-study locations where you originally formed your impression of strong intra-household correlation. And, of course, it may be that your initial impression was simply a cognitive error.

    Side note: I have used the terminology intra-household correlation in this discussion as a shorthand for there being an increased probability of a member of a given household having Lassa fever exposure if another member has. The term intra-household correlation is normally used to refer to a specific statistic, the intra-class correlation. But the intra-class correlation is undefined for Poisson models. So please bear in mind that the use of the term here is informal and intended to be evocative of a concept that takes many words to set out correctly, but is, strictly speaking, an abuse of language.

    Comment


    • #3
      Hi Clyde,

      Thank you for the thoughtful response - your point is well taken about households within districts. I ended up doing this with age, which presumably shouldn't exhibit that relationship, and again I got almost no change in our measures of effect / IIRs. I also looked at participant sex (which was not significantly associated with LASV seropositivity) and again, see no real differences between the IIRs across models controlling for this "intra-household correlation" and not controlling for this.

      We did expect this clustering within households, and I think we used the right methods to confirm that it exists with our data (Britton, Biometrics 97, Tests to detect clustering of infected individuals within families) and I'm comfortable with the quality of our data (though your point about cognitive error is well taken - I'll be reviewing this with another project statistician soon!) which all makes me quite puzzled as to why I'm seeing these results. Your comment that I've not mis-wrote the code for the -mepoisson- command was reassuring, but I wasn't expecting the modeling to look so similar with and without the "intra-household correlation". I'll revisit the data to ensure all the covariates are coded correctly to use in modeling.


      Comment


      • #4
        I think, based on the additional information, that the clustering within households is, itself, a real phenomenon. It's just that its effect is extremely small compared to the much larger district effects, so the inclusion of it in the model results in very little change. It is, so to speak, in another abuse of language, "a rounding error" once district effects are considered.

        Comment


        • #5
          HI again, just to close this loop, I spoke with a colleague who'd relatively recently gotten his PhD doing this type of work, looking at blood pressure and modeling the assumption that it clustered on household too (presumably through effects of diet, habit, exercise, etc). He explained to me that he wasn't too surprised that I wasn't seeing much of an effect - that the cluster size was so small that despite it being a real effect, it did not influence the model much at all. He also felt it was worthwhile to keep in the model, and we will also explore village as a potential variable on which we're also seeing clustering.

          Anyway, thanks again for your advice. It was reassuring to have your thoughts above largely echoed by my colleague.

          Comment

          Working...
          X