Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Margins and Clustered GEE Bootstrap Sampling Problem

    Hi Everyone,

    I am really struggling with this issue and hope someone can help.

    I have a generalized estimating equation model based on matched pairs (1:3) /clusters. The model is simply the relationship of cost to disease_status (binary variable). I would like an estimate of the change in cost by disease status. The clusters have a pair ID. My goal is to bootstrap the model sampling from clusters and use the margin results to estimate the 95% CI using the percentile method. My code is as follows:

    (I have many costs so I used a program - this is just a demonstration of it on one cost)

    Code:
    capture program drop dydx_margins
    program define dydx_margins, eclass
    xtgee cost i.disease_status fam(gamma) link(log) corr(independent) i(pairid2)
    margins, dydx(disease_status) post
    exit
    end
    xtset, clear
    bootsstrap _b, reps(1000) seed(1234) cluster(pairid) idcluster(pairid2): dydx_margins

    Someone recently asked me to make sure that different cluster IDs were given to replications of the same cluster. This was to make sure I was not overweighting the case-control differences from those clusters which may have been selected more than once. I thought I fixed this by specifying the idcluster command and making sure I told xtset to clear and refer to that new pair variable in the xtgee.

    But to check I ran this code below just removing the idcluster command:

    Code:
    capture program drop dydx_margins2
    program define dydx_margins2, eclass
    xtgee cost i.disease_status fam(gamma) link(log) corr(independent) i(pairid)
    margins, dydx(disease_status) post
    exit
    end
    xtset, clear
    bootstrap _b, reps(1000) seed(1234) cluster(pairid): dydx_margins2
    This gave me the exact same result which is making me nervous that the sampling is not being done correctly and I am not correctly resampling. Does anyone know where I went wrong? Sorry I am coding this from memory.

    Thanks!
    Last edited by andy mend; 04 Dec 2020, 00:09.

  • #2
    Hi Andy,

    Try using the -noisily- option of -bootstrap- and print whatever might help you verify that what you want is being computed. The -noisily- option displays any output issued at each bootstrap replication. Obviously, you may want to reduce replications to a small number (I usually use 3) to visualize what is going on.

    As a general rule, I would suggest that you use the -nose- option of -margins- when using -margins- inside a -bootstrap- routine. At each iteration -margins- is computing standard errors you are not going to use. The -nose- option avoids these computation which, depending on the model, might save you a significant amount of computational time.

    Comment


    • #3
      Originally posted by Enrique Pinzon (StataCorp) View Post
      Hi Andy,

      Try using the -noisily- option of -bootstrap- and print whatever might help you verify that what you want is being computed. The -noisily- option displays any output issued at each bootstrap replication. Obviously, you may want to reduce replications to a small number (I usually use 3) to visualize what is going on.

      As a general rule, I would suggest that you use the -nose- option of -margins- when using -margins- inside a -bootstrap- routine. At each iteration -margins- is computing standard errors you are not going to use. The -nose- option avoids these computation which, depending on the model, might save you a significant amount of computational time.
      Thank you for the tips Enrique!

      I tried this today and although it allows me to check the margin output of each run, I still see that the result is the exact same between the two codes (the one with idcluster specified and the one without). Any ideas why this may be?

      Thanks!

      Comment


      • #4
        Sorry I just wanted to add, is there a way to make the margins command calculate CI through the percentile method at the end of the bootstrap? (CItype does not work)

        Comment


        • #5
          Hi Andy,

          The -idcluster()- option creates a variable. In both cases the sampling is done at the -cluster()- level. This is why you are getting the same results in both cases. -estat bootstrap, percentile- should give you the percentile bootstrap.

          Comment


          • #6
            Originally posted by andy mend View Post
            I still see that the result is the exact same between the two codes (the one with idcluster specified and the one without). Any ideas why this may be?
            It's because your estimation command is ignoring the clustering variable regardless of what you specify in the bootstrapping command.

            If you change your estimation command not to ignore clustering, i.e., from
            Code:
            xtgee . . ., . . . corr(independent) . . .
            to →
            Code:
            xtgee . . ., . . . corr(exchangeable) . . .
            then you'll see a difference. See below. (Begin at the "Begin here" comment; the first part of the output just creates a dataset for illustration.)

            .ÿ
            .ÿ/*ÿReferenceÿhttps://www.stata.com/statalist/archive/2010-09/msg01472.htmlÿ*/
            .ÿ
            .ÿversionÿ16.1

            .ÿ
            .ÿclearÿ*

            .ÿ
            .ÿsetÿseedÿ`=strreverse("1584599")'

            .ÿ
            .ÿtempnameÿCorr

            .ÿmatrixÿdefineÿ`Corr'ÿ=ÿJ(4,ÿ4,ÿ0.5)ÿ+ÿI(4)ÿ*ÿ0.5

            .ÿquietlyÿdrawnormÿe1ÿe2ÿe3ÿe4,ÿdoubleÿcorr(`Corr')ÿn(250)

            .ÿ
            .ÿgenerateÿintÿpidÿ=ÿ_n

            .ÿquietlyÿreshapeÿlongÿe,ÿi(pid)ÿj(dcd)

            .ÿ
            .ÿgenerateÿbyteÿdstÿ=ÿdcdÿ==ÿ1

            .ÿgenerateÿdoubleÿxbÿ=ÿcond(dst,ÿ2ÿ*ÿe,ÿe)

            .ÿgenerateÿdoubleÿcstÿ=ÿrgamma(exp(xb),ÿ1)

            .ÿ
            .ÿ*
            .ÿ*ÿBeginÿhere
            .ÿ*
            .ÿxtgeeÿcstÿi.dst,ÿi(pid)ÿcorr(independent)ÿvce(robust)ÿ///
            >ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿfamily(gamma)ÿlink(log)ÿnolog

            GEEÿpopulation-averagedÿmodelÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿNumberÿofÿobsÿÿÿÿÿ=ÿÿÿÿÿÿ1,000
            Groupÿvariable:ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿpidÿÿÿÿÿÿNumberÿofÿgroupsÿÿ=ÿÿÿÿÿÿÿÿ250
            Link:ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿlogÿÿÿÿÿÿObsÿperÿgroup:
            Family:ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿgammaÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿminÿ=ÿÿÿÿÿÿÿÿÿÿ4
            Correlation:ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿindependentÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿavgÿ=ÿÿÿÿÿÿÿÿ4.0
            ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿmaxÿ=ÿÿÿÿÿÿÿÿÿÿ4
            ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿWaldÿchi2(1)ÿÿÿÿÿÿ=ÿÿÿÿÿÿ20.60
            Scaleÿparameter:ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ15.15985ÿÿÿÿÿÿProbÿ>ÿchi2ÿÿÿÿÿÿÿ=ÿÿÿÿÿ0.0000

            Pearsonÿchi2(1000):ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ15159.85ÿÿÿÿÿÿDevianceÿÿÿÿÿÿÿÿÿÿ=ÿÿÿÿ5668.53
            Dispersionÿ(Pearson):ÿÿÿÿÿÿÿÿÿÿÿÿÿ15.15985ÿÿÿÿÿÿDispersionÿÿÿÿÿÿÿÿ=ÿÿÿÿ5.66853

            ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ(Std.ÿErr.ÿadjustedÿforÿclusteringÿonÿpid)
            ------------------------------------------------------------------------------
            ÿÿÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿÿÿÿÿÿÿÿÿSemirobust
            ÿÿÿÿÿÿÿÿÿcstÿ|ÿÿÿÿÿÿCoef.ÿÿÿStd.ÿErr.ÿÿÿÿÿÿzÿÿÿÿP>|z|ÿÿÿÿÿ[95%ÿConf.ÿInterval]
            -------------+----------------------------------------------------------------
            ÿÿÿÿÿÿÿ1.dstÿ|ÿÿÿ2.007464ÿÿÿ.4423488ÿÿÿÿÿ4.54ÿÿÿ0.000ÿÿÿÿÿ1.140476ÿÿÿÿ2.874451
            ÿÿÿÿÿÿÿ_consÿ|ÿÿÿ.4903822ÿÿÿ.0639205ÿÿÿÿÿ7.67ÿÿÿ0.000ÿÿÿÿÿ.3651003ÿÿÿÿ.6156642
            ------------------------------------------------------------------------------

            .ÿmarginsÿdst

            AdjustedÿpredictionsÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿNumberÿofÿobsÿÿÿÿÿ=ÿÿÿÿÿÿ1,000
            ModelÿVCEÿÿÿÿ:ÿSemirobust

            Expressionÿÿÿ:ÿExponentiatedÿlinearÿprediction,ÿpredict()

            ------------------------------------------------------------------------------
            ÿÿÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿÿÿÿÿÿÿÿDelta-method
            ÿÿÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿMarginÿÿÿStd.ÿErr.ÿÿÿÿÿÿzÿÿÿÿP>|z|ÿÿÿÿÿ[95%ÿConf.ÿInterval]
            -------------+----------------------------------------------------------------
            ÿÿÿÿÿÿÿÿÿdstÿ|
            ÿÿÿÿÿÿÿÿÿÿ0ÿÿ|ÿÿÿÿ1.63294ÿÿÿ.1043784ÿÿÿÿ15.64ÿÿÿ0.000ÿÿÿÿÿ1.428362ÿÿÿÿ1.837518
            ÿÿÿÿÿÿÿÿÿÿ1ÿÿ|ÿÿÿ12.15628ÿÿÿ5.709759ÿÿÿÿÿ2.13ÿÿÿ0.033ÿÿÿÿÿ.9653574ÿÿÿÿÿ23.3472
            ------------------------------------------------------------------------------

            .ÿmarginsÿ,ÿdydx(dst)

            ConditionalÿmarginalÿeffectsÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿNumberÿofÿobsÿÿÿÿÿ=ÿÿÿÿÿÿ1,000
            ModelÿVCEÿÿÿÿ:ÿSemirobust

            Expressionÿÿÿ:ÿExponentiatedÿlinearÿprediction,ÿpredict()
            dy/dxÿw.r.t.ÿ:ÿ1.dst

            ------------------------------------------------------------------------------
            ÿÿÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿÿÿÿÿÿÿÿDelta-method
            ÿÿÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿÿdy/dxÿÿÿStd.ÿErr.ÿÿÿÿÿÿzÿÿÿÿP>|z|ÿÿÿÿÿ[95%ÿConf.ÿInterval]
            -------------+----------------------------------------------------------------
            ÿÿÿÿÿÿÿ1.dstÿ|ÿÿÿ10.52334ÿÿÿ5.660037ÿÿÿÿÿ1.86ÿÿÿ0.063ÿÿÿÿ-.5701304ÿÿÿÿ21.61681
            ------------------------------------------------------------------------------
            Note:ÿdy/dxÿforÿfactorÿlevelsÿisÿtheÿdiscreteÿchangeÿfromÿtheÿbaseÿlevel.

            .ÿ
            .ÿprogramÿdefineÿbootem,ÿrclass
            ÿÿ1.ÿÿÿÿÿÿÿÿÿversionÿ16.1
            ÿÿ2.ÿÿÿÿÿÿÿÿÿsyntaxÿvarname
            ÿÿ3.ÿ
            .ÿÿÿÿÿÿÿÿÿxtgeeÿcstÿi.dst,ÿi(`varlist')ÿcorr(exchangeable)ÿfamily(gamma)ÿlink(log)
            ÿÿ4.ÿÿÿÿÿÿÿÿÿreturnÿscalarÿdltÿ=ÿexp(_b[1.dst]ÿ+ÿ_b[_cons])ÿ-ÿexp(_b[_cons])
            ÿÿ5.ÿend

            .ÿ
            .ÿ//ÿWithout
            .ÿquietlyÿbootstrapÿdltÿ=ÿr(dlt),ÿ///
            >ÿÿÿÿÿÿÿÿÿcluster(pid)ÿ/*ÿidcluster(nid)ÿ*/ÿ///
            >ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿseed(`=strreverse("1584464")')ÿreps(40)ÿnodots:ÿbootemÿpid

            .ÿestatÿbootstrap,ÿnormalÿpercentileÿnoheaderÿnolegend
            ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ(Replicationsÿbasedÿonÿ250ÿclustersÿinÿpid)
            ------------------------------------------------------------------------------
            ÿÿÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿObservedÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿBootstrap
            ÿÿÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿÿÿCoef.ÿÿÿÿÿÿÿBiasÿÿÿÿStd.ÿErr.ÿÿ[95%ÿConf.ÿInterval]
            -------------+----------------------------------------------------------------
            ÿÿÿÿÿÿÿÿÿdltÿ|ÿÿÿ10.523338ÿÿ-.5924271ÿÿÿÿ4.712593ÿÿÿÿ1.286826ÿÿÿ19.75985ÿÿÿ(N)
            ÿÿÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ3.745854ÿÿÿ20.18997ÿÿÿ(P)
            ------------------------------------------------------------------------------
            (N)ÿÿÿÿnormalÿconfidenceÿinterval
            (P)ÿÿÿÿpercentileÿconfidenceÿinterval

            .ÿ
            .ÿ//ÿWith
            .ÿquietlyÿbootstrapÿdltÿ=ÿr(dlt),ÿ///
            >ÿÿÿÿÿÿÿÿÿcluster(pid)ÿidcluster(nid)ÿ///
            >ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿseed(`=strreverse("1584464")')ÿreps(40)ÿnodots:ÿbootemÿnid

            .ÿestatÿbootstrap,ÿnormalÿpercentileÿnoheaderÿnolegend
            ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ(Replicationsÿbasedÿonÿ250ÿclustersÿinÿpid)
            ------------------------------------------------------------------------------
            ÿÿÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿObservedÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿBootstrap
            ÿÿÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿÿÿCoef.ÿÿÿÿÿÿÿBiasÿÿÿÿStd.ÿErr.ÿÿ[95%ÿConf.ÿInterval]
            -------------+----------------------------------------------------------------
            ÿÿÿÿÿÿÿÿÿdltÿ|ÿÿÿ10.523338ÿÿ-.5309686ÿÿÿ5.0660622ÿÿÿÿ.5940389ÿÿÿ20.45264ÿÿÿ(N)
            ÿÿÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ3.798211ÿÿÿ22.21078ÿÿÿ(P)
            ------------------------------------------------------------------------------
            (N)ÿÿÿÿnormalÿconfidenceÿinterval
            (P)ÿÿÿÿpercentileÿconfidenceÿinterval

            .ÿ
            .ÿexit

            endÿofÿdo-file


            .


            I used only 40 repetitions in order to save time and exaggerate the sampling differences. Acknowledgement at the top to the late Prof. Joseph Hilbe for his guidance in simulating the distribution properly.

            Comment


            • #7
              Thank you so much Joseph! Any ideas as to why the independent correlation structure influences the bootstrap? I am reading up on a lot of materials but am a bit lost here.

              Comment


              • #8
                Originally posted by andy mend View Post
                Any ideas as to why the independent correlation structure influences the bootstrap?
                It doesn't actually influence the bootstrap; rather, as I mentioned above, it causes the estimation command, xtgee, to ignore the cluster variable.

                You still have the influence of on the bootstrapped results from sampling clusters by the bootstrap command, but there is no recognition of any cluster by the estimation command.

                So, bootstrapping with a cluster ID variable—whether the new one generated by bootstrap or the the original one in your dataset—is immaterial, and the results, which are computed by the estimation command without reference to any clustering, are identical.

                With corr(independent), xtgee behaves as if you had used
                Code:
                glm cost i.disease_status family(gamma) link(log)
                instead.

                Comment

                Working...
                X