Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Adjusting Standard Errors for Non-Nested Clusters in Marginsplot with GEE

    Hey everyone,

    I’m interested in calculating and plotting predicted probabilities for 30-day survival using marginsplot in Stata, but I need to adjust my standard errors to account for non-nested clusters.

    I’m working with generalized estimating equations (xtgee) to examine the association between ambulance case volume for a specific medical issue and patient 30-day survival. I need to account for correlations within ambulance units (1-3 persons in the same ambulance treating the same patient) and within ambulance personnel (the same person responding to 0-40 cases per year).

    I’m using a method based on the approach described by Miglioretti & Heagerty (2007),(https://academic.oup.com/aje/article...5/4/453/109343). which allows to account for two non-nested clusters:
    • C1ID: Identifies observations belonging to the same ambulance.
    • C2ID: Identifies observations belonging to the same ambulance personnel.
    • C1C2ID: Identifies observations belonging to both the same ambulance and ambulance personnel.
    This approach helps estimate the covariance matrix for the regression coefficients as a linear combination of three covariance matrices via GEE.

    Code:
    xtgee survival_30_days casevol_12month, family(bin) link(logit) corr(ind) robust i(C1ID)
    matrix V1= e(V)
     xtgee survival_30_days casevol_12month, family(bin) link(logit) corr(ind) robust i(C2ID)
    matrix V2=e(V)
    xtgee survival_30_days casevol_12month, family(bin) link(logit) corr(ind) robust i(C1C2ID)
    matrix V3=e(V)
     
    matrix V = V1 + V2 - V3

    Data example:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(patient_age patient_sex) str36 personnel_id str31 unit_id float(casevol_12month survival_30_days)
    72 1 "1"  "Ambulance 1"   13 1
    72 1 "2"  "Ambulance 1"  1 1
    78 1 "2"  "Ambulance2"  5 0
    64 1 "3"  "Ambulance12"   7 0
    64 1 "4"  "Ambulance12"   4 0
    73 1 "2"  "Ambulance21"  6 0
    73 1 "5" "Ambulance21"  0 0
    72 1 "7"  "Ambulance25"    13 1
    64 1 "8"  "Ambulance20"    10 0
    78 1 "9"  "Ambulance31"    15 0
    end
    label values patient_sex sexlab
    label def sexlab 1 "Male", modify
    label values survival_30_days yes1no0_
    label def yes1no0_ 0 "No", modify
    label def yes1no0_ 1 "Yes", modify
    label var patient_age "Patient's age"
    label var patient_sex "Patient's sex"
    label var survival_30_days "30-day survival"


    Is it possible to change the variance matrix in marginsplot or can I produce the plot manually somehow?

    Thanks in advance!


  • #2
    Your data example does not include the cluster variables. To increase your chances of obtaining a helpful reply, make sure that your problem is reproducible, i.e.,

    i) all variables are present and
    ii) you have sufficient observations.

    Comment


    • #3
      Hey Andrew
      Sorry! Updated example below

      C1ID: Ambulance personnel ID
      C2ID: Ambulance unit ID
      C1C2ID: Observations belonging to both C1ID and C2ID

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input float(patient_id patient_age patient_sex) long(C1ID C2ID) float(C1C2ID casevol_12month survival_30_days)
      22708 54 1  678 15782 20076  7 0
      22708 54 1  991 15782 29358  9 0
      22709 68 0 1028 31529 30253 24 1
      22709 68 0  201 31529  8386 15 1
      22710 68 0 1642 31528 45902  2 0
      22710 68 0 1177 31528 33657  1 0
      22716 87 1 1146 16575 32945  3 0
      22716 87 1 2886 16575 71260  2 0
      22720 78 0 1067 31532 31024 13 1
      22720 78 0  996 31532 29558  9 1
      22722 88 0  532 31530 16772  4 0
      22722 88 0 2930 31530 71759  4 0
      22729 63 1  591 31531 17872  2 1
      22729 63 1 1784 31531 49787  5 1
      22735 58 1  548 31534 17009  6 1
      22735 58 1 2611 31534 68664  2 1
      22737 61 0 2023   143 57564  2 0
      22737 61 0 2808   143 70421  7 0
      22740 81 0 1765 38642 49155  8 1
      22740 81 0 1831 38642 51436  9 1
      22741 80 1 1234 39813 34962  1 0
      22741 80 1  714 39813 21424  5 0
      22741 80 1 1099 39813 31843  2 0
      22744 70 1 1822 31533 51180  5 0
      22744 70 1 3236 31533 74887  3 0
      22746 83 0 1985 39296 56074  5 0
      22749 76 1 1424 15144 40040 12 0
      22754 85 0 2991 15327 72430  6 0
      22756 91 1 1000 15852 29632  6 0
      22762 88 1 2790 31535 70258  3 0
      end


      Comment


      • #4
        Thanks. You can use erepost from SSC to replace the variance matrix. Then margins will use the active results.

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input float(patient_id patient_age patient_sex) long(C1ID C2ID) float(C1C2ID casevol_12month survival_30_days)
        22708 54 1  678 15782 20076  7 0
        22708 54 1  991 15782 29358  9 0
        22709 68 0 1028 31529 30253 24 1
        22709 68 0  201 31529  8386 15 1
        22710 68 0 1642 31528 45902  2 0
        22710 68 0 1177 31528 33657  1 0
        22716 87 1 1146 16575 32945  3 0
        22716 87 1 2886 16575 71260  2 0
        22720 78 0 1067 31532 31024 13 1
        22720 78 0  996 31532 29558  9 1
        22722 88 0  532 31530 16772  4 0
        22722 88 0 2930 31530 71759  4 0
        22729 63 1  591 31531 17872  2 1
        22729 63 1 1784 31531 49787  5 1
        22735 58 1  548 31534 17009  6 1
        22735 58 1 2611 31534 68664  2 1
        22737 61 0 2023   143 57564  2 0
        22737 61 0 2808   143 70421  7 0
        22740 81 0 1765 38642 49155  8 1
        22740 81 0 1831 38642 51436  9 1
        22741 80 1 1234 39813 34962  1 0
        22741 80 1  714 39813 21424  5 0
        22741 80 1 1099 39813 31843  2 0
        22744 70 1 1822 31533 51180  5 0
        22744 70 1 3236 31533 74887  3 0
        22746 83 0 1985 39296 56074  5 0
        22749 76 1 1424 15144 40040 12 0
        22754 85 0 2991 15327 72430  6 0
        22756 91 1 1000 15852 29632  6 0
        22762 88 1 2790 31535 70258  3 0
        end
        
        xtgee survival_30_days casevol_12month, family(bin) link(logit) corr(ind) robust i(C1ID)
        matrix V1= e(V)
         xtgee survival_30_days casevol_12month, family(bin) link(logit) corr(ind) robust i(C2ID)
        matrix V2=e(V)
        xtgee survival_30_days casevol_12month, family(bin) link(logit) corr(ind) robust i(C1C2ID)
        matrix V3=e(V) 
        matrix V = V1 + V2 - V3
        mat list V
        erepost V= V, rename
        estimates replay
        mat list e(V)
        Res.:

        Code:
        . xtgee survival_30_days casevol_12month, family(bin) link(logit) corr(ind) robust i(C1ID)
        
        Iteration 1:  Tolerance = 5.936e-12
        
        GEE population-averaged model                      Number of obs    =       30
        Group variable: C1ID                               Number of groups =       30
        Family: Binomial                                   Obs per group:  
        Link:   Logit                                                   min =        1
        Correlation: independent                                        avg =      1.0
                                                                        max =        1
                                                           Wald chi2(1)     =     4.74
        Scale parameter = 1                                Prob > chi2      =   0.0295
        
        Pearson chi2(30)     =    29.76                    Deviance         =    31.01
        Dispersion (Pearson) = .9921479                    Dispersion       = 1.033737
        
                                              (Std. err. adjusted for clustering on C1ID)
        ---------------------------------------------------------------------------------
                        |               Robust
        survival_30_d~s | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
        ----------------+----------------------------------------------------------------
        casevol_12month |   .2665514   .1224213     2.18   0.029       .02661    .5064927
                  _cons |  -2.362354   .9104895    -2.59   0.009     -4.14688   -.5778273
        ---------------------------------------------------------------------------------
        
        . 
        . matrix V1= e(V)
        
        . 
        .  xtgee survival_30_days casevol_12month, family(bin) link(logit) corr(ind) robust i(C2ID)
        warning: existing panel variable is not C2ID
        
        Iteration 1:  Tolerance = 5.934e-12
        
        GEE population-averaged model                      Number of obs    =       30
        Group variable: C2ID                               Number of groups =       17
        Family: Binomial                                   Obs per group:  
        Link:   Logit                                                   min =        1
        Correlation: independent                                        avg =      1.8
                                                                        max =        3
                                                           Wald chi2(1)     =     3.48
        Scale parameter = 1                                Prob > chi2      =   0.0622
        
        Pearson chi2(30)     =    29.76                    Deviance         =    31.01
        Dispersion (Pearson) = .9921479                    Dispersion       = 1.033737
        
                                              (Std. err. adjusted for clustering on C2ID)
        ---------------------------------------------------------------------------------
                        |               Robust
        survival_30_d~s | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
        ----------------+----------------------------------------------------------------
        casevol_12month |   .2665514   .1429144     1.87   0.062    -.0135557    .5466584
                  _cons |  -2.362354   1.170613    -2.02   0.044    -4.656713   -.0679945
        ---------------------------------------------------------------------------------
        
        . 
        . matrix V2=e(V)
        
        . 
        . xtgee survival_30_days casevol_12month, family(bin) link(logit) corr(ind) robust i(C1C2ID)
        warning: existing panel variable is not C1C2ID
        
        Iteration 1:  Tolerance = 5.936e-12
        
        GEE population-averaged model                      Number of obs    =       30
        Group variable: C1C2ID                             Number of groups =       30
        Family: Binomial                                   Obs per group:  
        Link:   Logit                                                   min =        1
        Correlation: independent                                        avg =      1.0
                                                                        max =        1
                                                           Wald chi2(1)     =     4.74
        Scale parameter = 1                                Prob > chi2      =   0.0295
        
        Pearson chi2(30)     =    29.76                    Deviance         =    31.01
        Dispersion (Pearson) = .9921479                    Dispersion       = 1.033737
        
                                            (Std. err. adjusted for clustering on C1C2ID)
        ---------------------------------------------------------------------------------
                        |               Robust
        survival_30_d~s | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
        ----------------+----------------------------------------------------------------
        casevol_12month |   .2665514   .1224213     2.18   0.029       .02661    .5064927
                  _cons |  -2.362354   .9104895    -2.59   0.009     -4.14688   -.5778273
        ---------------------------------------------------------------------------------
        
        . 
        . matrix V3=e(V) 
        
        . 
        . matrix V = V1 + V2 - V3
        
        . 
        . mat list V
        
        symmetric V[2,2]
                      casevol_12~h         _cons
        casevol_12~h     .02042452
               _cons    -.14375348     1.3703348
        
        . 
        . erepost V= V, rename
        
        . 
        . estimates replay
        
        ------------------------------------------------------------------------------------------------
        active results
        ------------------------------------------------------------------------------------------------
        
        GEE population-averaged model                      Number of obs    =       30
        Group variable: C1C2ID                             Number of groups =       30
        Family: Binomial                                   Obs per group:  
        Link:   Logit                                                   min =        1
        Correlation: independent                                        avg =      1.0
                                                                        max =        1
                                                           Wald chi2(1)     =     4.74
        Scale parameter = 1                                Prob > chi2      =   0.0295
        
        Pearson chi2(30)     =    29.76                    Deviance         =    31.01
        Dispersion (Pearson) = .9921479                    Dispersion       = 1.033737
        
                                            (Std. err. adjusted for clustering on C1C2ID)
        ---------------------------------------------------------------------------------
                        |               Robust
        survival_30_d~s | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
        ----------------+----------------------------------------------------------------
        casevol_12month |   .2665514   .1429144     1.87   0.062    -.0135557    .5466584
                  _cons |  -2.362354   1.170613    -2.02   0.044    -4.656713   -.0679945
        ---------------------------------------------------------------------------------
        
        . 
        . mat list e(V)
        
        symmetric e(V)[2,2]
                      casevol_12~h         _cons
        casevol_12~h     .02042452
               _cons    -.14375348     1.3703348

        Comment


        • #5
          However, note that in the example in #3, V1 = V3, and therefore from the calculation, V = V2. In any case, as with any manipulation of estimation results, you are responsible for the correctness and validity of such a procedure.

          Comment


          • #6
            Originally posted by Andrew Musau View Post
            However, note that in the example in #3, V1 = V3, and therefore from the calculation, V = V2. In any case, as with any manipulation of estimation results, you are responsible for the correctness and validity of such a procedure.
            Many thanks, Andrew! This was exactly what I was looking for. You are right in the given example in #3, there is few clusters as you highlight. This is indeed different in the full dataset,

            Comment

            Working...
            X