
  • Multi-level model (xtmelogit) vs. adjusting for PSU and strata (svy: logit)?

    What is the appropriate way to specify models that incorporate two levels of clustering (if that is the right term)? I initially used xtmelogit (level 1 = child, level 2 = sibling groups, level 3 = counties). These are experimental data; the intervention was implemented separately in 9 counties and served children (many in sibling groups). A colleague recommended that, since I don’t care about estimating county-level impacts, xtmelogit might be overkill and I could run models simply adjusting for strata (county) and PSU (sibling group), which I then did using svy: logit. (If I understand correctly, this suggestion is also made by the authors of GLLAMM.) However, results from the two approaches differ, which makes me think either that I’m doing something wrong or that one approach is better than the other. Can anyone advise? Thank you in advance!

    Below I've provided some sample output and definitions of my key variables.

    EXPER: 1 = treatment, 0 = control (independent variable of interest)
    MOMCLOSE: 1 = good outcome, 0 = bad outcome
    siteid = county identifier (level 3 id, with dummy indicators called site#)
    randcid = case id/sibling group id (level 2 id)
    fpcvar = fpc, calculated per county as the number of respondents divided by the number of youth in the original sample


    . svyset randcid, strata(siteid) fpc(fpcvar)

            pweight: <none>
                VCE: linearized
        Single unit: missing
           Strata 1: siteid
               SU 1: randcid
              FPC 1: fpcvar

    MODEL 1

    .
    . foreach var in momclose {
    2. svy: logit `var' exper, or
    3. }
    (running logit on estimation sample)

    Survey: Logistic regression

    Number of strata   =         9                Number of obs     =        303
    Number of PSUs     =       263                Population size   =        303
                                                  Design df         =        254
                                                  F(   1,    254)   =       0.21
                                                  Prob > F          =     0.6451

    ------------------------------------------------------------------------------
                 |             Linearized
        momclose | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           exper |   1.053215   .1184012     0.46   0.645     .8440495    1.314214
           _cons |   .6947368   .0540147    -4.68   0.000     .5961066    .8096862
    ------------------------------------------------------------------------------

    MODEL 2
    .
    . foreach var in momclose {
    2. svy: logit `var' exper site268 site269 site271 site272 site273 site274 sit
    > e275 site276, or
    3. }
    (running logit on estimation sample)

    Survey: Logistic regression

    Number of strata   =         9                Number of obs     =        303
    Number of PSUs     =       263                Population size   =        303
                                                  Design df         =        254
                                                  F(   9,    246)   =       6.36
                                                  Prob > F          =     0.0000

    ------------------------------------------------------------------------------
                 |             Linearized
        momclose | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           exper |   1.135454   .1281456     1.13   0.261     .9091681     1.41806
         site268 |   1.169825    .282946     0.65   0.517     .7265324    1.883593
         site269 |   .5536779   .1434254    -2.28   0.023     .3324338    .9221662
         site271 |   .8257458   .1561435    -1.01   0.312     .5690086    1.198323
         site272 |   1.252144    .280899     1.00   0.317     .8049818    1.947701
         site273 |    1.53346   .2869998     2.28   0.023     1.060719    2.216892
         site274 |   .5546835   .1121758    -2.91   0.004     .3724597    .8260591
         site275 |   3.282409   .9750136     4.00   0.000     1.828688    5.891772
         site276 |   .7522824   .1431926    -1.50   0.136     .5171112    1.094405
           _cons |   .6761679   .0953349    -2.78   0.006     .5122318    .8925705
    ------------------------------------------------------------------------------

    . svyset, clear

    .
    MODEL 3

    . xtmelogit momclose exper || siteid: || randcid: , or

    Refining starting values:

    Iteration 0: log likelihood = -206.31694 (not concave)
    Iteration 1: log likelihood = -203.61326
    Iteration 2: log likelihood = -202.51347

    Performing gradient-based optimization:

    Iteration 0: log likelihood = -202.51347
    Iteration 1: log likelihood = -202.47848
    Iteration 2: log likelihood = -202.4783
    Iteration 3: log likelihood = -202.4783

    Mixed-effects logistic regression               Number of obs     =        303

    -------------------------------------------------------------------------
                    |   No. of       Observations per Group      Integration
     Group Variable |   Groups    Minimum    Average    Maximum       Points
    ----------------+--------------------------------------------------------
             siteid |        9         20       33.7         57            7
            randcid |      263          1        1.2          4            7
    -------------------------------------------------------------------------

                                                    Wald chi2(1)      =       0.11
    Log likelihood = -202.4783                      Prob > chi2       =     0.7348

    ------------------------------------------------------------------------------
        momclose | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           exper |   1.144452    .455809     0.34   0.735     .5243044     2.49811
           _cons |   .5649229   .1772804    -1.82   0.069     .3054012    1.044979
    ------------------------------------------------------------------------------

    ------------------------------------------------------------------------------
      Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
    -----------------------------+------------------------------------------------
    siteid: Identity             |
                       sd(_cons) |   .2495608   .4194843      .0092555    6.729036
    -----------------------------+------------------------------------------------
    randcid: Identity            |
                       sd(_cons) |     1.9571   .7909925      .8863118    4.321548
    ------------------------------------------------------------------------------
    LR test vs. logistic regression:     chi2(2) =     6.42   Prob > chi2 = 0.0404

    Note: LR test is conservative and provided only for reference.

    .
    .
    end of do-file

  • #2
    Welcome to Statalist, Sharon! It's difficult to read much of your post. Please read FAQ 12 and repost using CODE delimiters to display commands and results.
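    (For example, a sketch of the CODE delimiters that FAQ 12 describes; paste commands and output between the tags so the alignment is preserved:)

    Code:
    [CODE]
    . svy: logit momclose exper, or
    ... pasted output ...
    [/CODE]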
    Steve Samuels
    Statistical Consulting
    [email protected]

    Stata 14.2



    • #3
      I managed to read your post after all, but I had to copy & paste into a text editor to do it. So, please, next time use CODE delimiters.

      The "fpc()" in your svyset statement is invalid. To use the theory of the fpc, the sampled observations must have been selected by random numbers, i.e. they must come from a random sampling design. With a random sample, the value for the fpc() option at each site is the sampling fraction \(n/N\), or \(N\), where \(N\) is the size of the target population and \(n\) is the size of the random sample. The value you have entered is the response rate; responders are not a random sample.
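      To make that concrete, here is a minimal sketch, not something to run on your data as they stand, of what a valid fpc() would look like if the sibling groups had actually been a random sample within each county. It assumes two hypothetical variables: samp_frac, the per-county sampling fraction of sibling groups, and grouppop, the number of sibling groups in the county's target population.

      Code:
      * hypothetical sketch: valid only if sibling groups were randomly sampled within county
      * samp_frac = (no. of sampled sibling groups)/(no. of sibling groups in the county), i.e. n/N
      svyset randcid, strata(siteid) fpc(samp_frac)

      * equivalently, fpc() also accepts the population count N itself
      svyset randcid, strata(siteid) fpc(grouppop)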

      To comment further, I'd like to know the actual study design, including how treatments were assigned.

      Below I've abstracted the results for the three models you showed. The estimated odds ratios for the models are well within one standard error of one another, and the CIs substantially overlap.

      Models 1 & 2 are very different. Model 1 does not adjust for site differences and would, if properly weighted, reproduce the proportion positive in treatment and control for the nine sites. Model 2 is a better model: it both stratifies on site and includes site predictors. A difference between unadjusted and adjusted odds ratios is a common phenomenon in epidemiology, so it is not a surprise. Models 2 & 3 have similar predictor lists, so the closeness of their odds ratios is to be expected.

      Code:
      ------------------------------------------------------------------------------
                   |             Linearized
          momclose | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
      Model 1
             exper |   1.053215   .1184012     0.46   0.645     .8440495    1.314214
      Model 2
             exper |   1.135454   .1281456     1.13   0.261     .9091681     1.41806
      Model 3
             exper |   1.144452    .455809     0.34   0.735     .5243044     2.49811
      I would recommend the following logit model, which should give results similar to those in Model 2 but doesn't require that there was a random sampling design. It is model-based, not design-based. It does, however, require that outcomes, conditional on treatment and site, were independent. With a sampling design, such independence is initially induced by the random sampling and then by the randomization. However, non-independence can be introduced after selection, for example if the intervention at a site is applied to groups rather than individuals, or if treatment is one-to-one and there is more than one treatment provider.

      I suggest that you use Stata's factor variables. I assume that you have a variable "site" with a baseline value (270?) and other values 268, 269, 271, 272, 273, 274, 275, 276. You have enough data to look for interactions between treatment and site, so I show that model as well.

      Code:
      logit `var' exper site268 site269 site271 ///
          site272 site273 site274 site275 site276, vce(cluster randcid)
      /* or, equivalently, with factor variables */
      logit `var' exper i.site, vce(cluster randcid)

      /* treatment-by-site interaction model */
      logit `var' i.site##exper
      testparm i.site#i.exper   // joint test of the interaction terms
      To actually interpret these ORs, you'll need to apply margins to the model predictions. For the outcome "momclose" that you display, the predicted site positive rates range from about 28% to 76%. If observed proportions are in this range, predictions from linear and logit models will be very close. The advantage of the linear model is that effects are in terms of differences in proportions, which are much easier to understand than odds ratios. The disadvantage is that CIs for proportions can extend beyond the ends of the [0,1] interval; however, this doesn't usually happen to the CIs for differences. The linear models would be:
      Code:
      reg `var' exper i.site, vce(cluster randcid)
      reg `var' i.site##exper
      To decide between linear & logit models, you can compare their predictions to the observed proportions.
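      As a rough sketch of both steps (margins and the prediction check), with momclose standing in for `var' and phat as a made-up name for the predicted probabilities, the comparison might look something like this:

      Code:
      * sketch only: fit the interaction model, then look at adjusted rates
      logit momclose i.site##i.exper, vce(cluster randcid)
      margins site#exper              // model-predicted positive rates by site and arm
      margins, dydx(exper)            // average difference in predicted proportions

      * compare model-based predictions with the observed proportions
      predict phat, pr
      table site exper, contents(mean phat mean momclose)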
      Last edited by Steve Samuels; 01 Dec 2015, 08:10.
      Steve Samuels
      Statistical Consulting
      [email protected]

      Stata 14.2

