  • Effects of treatment switch in a longitudinal retrospective study

    Dear Statalist community,

    I am performing a research project that tries to determine if there has been any difference in several clinical variables after switching from one treatment to another. I am unsure about the statistical method chosen to answer that question, and I would be most grateful if you could help me see if the way I have proceeded is correct.

    In this study, there will be around 50 patients included, but I started a pilot study with only 10 of them before getting the whole data. Patients were seen once per year, and I have yearly data from before and after the switch. The main research question is about changes in renal variables: proteinuria values, ACR and PCR, and the direct measurement of renal function, mGFR, all of which are numerical variables.

    Therefore, I organised the data longitudinally in long format in Stata (Stata 17). A subset of the database looks like the following, including some yes/no clinical variables (presence of diabetes or hypertension), gender, age and age at switch. The switch time is identified by the 0/1 variable beforeafter.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(id year sex ageatswitch) int mgfr_ float proteinuria_ int pcr_ float(acr_ renalevents stroke whitematterlesion) byte(diabetes hypertension) float beforeafter
    1  1 1 50 115   .   .     . 0 1 0 0 1 0
    1  2 1 50  96   .   .     . 0 1 0 0 1 0
    1  3 1 50  94   .   .     . 0 1 0 0 1 0
    1  4 1 50  94 .09  10  2.05 0 1 0 0 1 0
    1  5 1 50  95   .   .     . 0 1 0 0 1 0
    1  6 1 50  89   .  23  2.85 0 1 0 0 1 0
    1  7 1 50 105 .33  35  5.16 0 1 0 0 1 0
    1  8 1 50 109   .   .     . 0 1 0 0 1 0
    1  9 1 50 103   .   .     . 0 1 0 0 1 0
    1 10 1 50 109   .   .     . 0 1 0 0 1 0
    1 11 1 50  96   .   .     . 0 1 0 0 1 0
    1 12 1 50 113   .   .     . 0 1 0 0 1 0
    1 13 1 50 108 .09   9   1.2 0 1 0 0 1 0
    1 14 1 50 105 .13  13   .84 0 1 0 0 1 0
    1 15 1 50  82 .12  12   1.8 0 1 0 0 1 1
    1 16 1 50  85 .11   .  1.44 0 1 0 0 1 1
    1 17 1 50   .   .   .     . 0 1 0 0 1 1
    1 18 1 50   .   .   9     . 0 1 0 0 1 1
    1 19 1 50   . .05   .  2.31 0 1 0 0 1 1
    1 20 1 50   .   .   .     . 0 1 0 0 1 1
    2  1 1 62  94   .   .     . 0 0 1 0 1 0
    2  2 1 62  89   .   .     . 0 0 1 0 1 0
    2  3 1 62   .   .  10     . 0 0 1 0 1 0
    2  4 1 62  92   .   7   .72 0 0 1 0 1 0
    2  5 1 62  84   .   .     . 0 0 1 0 1 0
    2  6 1 62  85   .   .     . 0 0 1 0 1 0
    2  7 1 62  96   0   .     . 0 0 1 0 1 0
    2  8 1 62  88   .   .     . 0 0 1 0 1 0
    2  9 1 62  93   .   .     . 0 0 1 0 1 0
    2 10 1 62 101   .   .     . 0 0 1 0 1 0
    2 11 1 62   .   .   .     . 0 0 1 0 1 0
    2 12 1 62  93   0   .     . 0 0 1 0 1 0
    2 13 1 62   .   .   .     . 0 0 1 0 1 0
    2 14 1 62  93 .11  16  1.27 0 0 1 0 1 0
    2 15 1 62  88 .09  14   .76 0 0 1 0 1 1
    2 16 1 62  80 .09  12   .57 0 0 1 0 1 1
    2 17 1 62   .   .   .     . 0 0 1 0 1 1
    2 18 1 62   .   .   .     . 0 0 1 0 1 1
    2 19 1 62   .   .   .     . 0 0 1 0 1 1
    2 20 1 62  79 .11  15  2.24 0 0 1 0 1 1
    3  1 0 42   .   .   .     . 0 0 0 0 0 0
    3  2 0 42  98   .   .     . 0 0 0 0 0 0
    3  3 0 42  94   .   .   .49 0 0 0 0 0 0
    3  4 0 42   . .32   7     . 0 0 0 0 0 0
    3  5 0 42  87   0   .   .44 0 0 0 0 0 0
    3  6 0 42  73   0   .     . 0 0 0 0 0 0
    3  7 0 42  93   0   .     . 0 0 0 0 0 0
    3  8 0 42  86   .   .     . 0 0 0 0 0 0
    3  9 0 42  92   0   .   .38 0 0 0 0 0 0
    3 10 0 42  94   .   .     . 0 0 0 0 0 0
    3 11 0 42  90   .   .   .72 0 0 0 0 0 0
    3 12 0 42 101 .12  10     . 0 0 0 0 0 0
    3 13 0 42 101 .11   8   .49 0 0 0 0 0 0
    3 14 0 42  91   0   8   .42 0 0 0 0 0 0
    3 15 0 42  98 .14  10   .48 0 0 0 0 0 1
    3 16 0 42  94 .28  19   .53 0 0 0 0 0 1
    3 17 0 42   .   .   .     . 0 0 0 0 0 1
    3 18 0 42  85 .15  13  1.61 0 0 0 0 0 1
    3 19 0 42   .  .1   9   1.1 0 0 0 0 0 1
    3 20 0 42   .   .   .     . 0 0 0 0 0 1
    4  1 1 57   .   .   .     . 0 0 1 1 1 0
    4  2 1 57  63   .   .     . 0 0 1 1 1 0
    4  3 1 57  52   .  16  1.89 0 0 1 1 1 0
    4  4 1 57  57   .   .   .65 0 0 1 1 1 0
    4  5 1 57  54   .   .     . 0 0 1 1 1 0
    4  6 1 57  56   .  18  5.07 0 0 1 1 1 0
    4  7 1 57  72   .   .   .61 0 0 1 1 1 0
    4  8 1 57   .   .   .     . 0 0 1 1 1 0
    4  9 1 57  75   .   .     . 0 0 1 1 1 0
    4 10 1 57  57   .   .     . 0 0 1 1 1 0
    4 11 1 57  67   .   .     . 0 0 1 1 1 0
    4 12 1 57  61   .  55 18.26 0 0 1 1 1 0
    4 13 1 57   .   .   .     . 0 0 1 1 1 0
    4 14 1 57  68   . 124 69.12 0 0 1 1 1 0
    4 15 1 57  41   .  68 32.22 0 0 1 1 1 1
    4 16 1 57  64   .   .     . 0 0 1 1 1 1
    4 17 1 57   .   .   .     . 0 0 1 1 1 1
    4 18 1 57   .   .  32  7.88 0 0 1 1 1 1
    4 19 1 57  48   .  34 17.07 0 0 1 1 1 1
    4 20 1 57   .   .  13   .92 0 0 1 1 1 1
    5  1 0 62   .   .   .     . 0 0 0 0 1 0
    5  2 0 62  82   .   .     . 0 0 0 0 1 0
    5  3 0 62  80   .   7     2 0 0 0 0 1 0
    5  4 0 62  76   .   .     . 0 0 0 0 1 0
    5  5 0 62   .   .   .     . 0 0 0 0 1 0
    5  6 0 62  85   .   .     . 0 0 0 0 1 0
    5  7 0 62  80   .   .  5.77 0 0 0 0 1 0
    5  8 0 62  76   .   .     . 0 0 0 0 1 0
    5  9 0 62  78   .   .     . 0 0 0 0 1 0
    5 10 0 62  84   .   .    76 0 0 0 0 1 0
    5 11 0 62   .   .   .     . 0 0 0 0 1 0
    5 12 0 62  78   .   .  4.93 0 0 0 0 1 0
    5 13 0 62  81  .1  15   7.1 0 0 0 0 1 0
    5 14 0 62   .   0   .  4.94 0 0 0 0 1 0
    5 15 0 62  77 .06   7  1.97 0 0 0 0 1 1
    5 16 0 62  67 .11  13  4.61 0 0 0 0 1 1
    5 17 0 62   .   .   .   5.3 0 0 0 0 1 1
    5 18 0 62   .   .   .     . 0 0 0 0 1 1
    5 19 0 62   .   .   .     . 0 0 0 0 1 1
    5 20 0 62   .   .   .     . 0 0 0 0 1 1
    end
    label values sex MaleFemale
    label def MaleFemale 0 "Male", modify
    label def MaleFemale 1 "Female", modify
    label values renalevents YesNo
    label values stroke YesNo
    label values whitematterlesion YesNo
    label values diabetes YesNo
    label values hypertension YesNo
    label def YesNo 0 "No", modify
    label def YesNo 1 "Yes", modify
    label values beforeafter beforeafter
    label def beforeafter 0 "Before", modify
    label def beforeafter 1 "After", modify
    Initially, I was drawn towards the treatment-effects commands in Stata, as they seemed to describe what I intended to do. However, after reading the help file, I realised I needed a control group that had not switched treatment. All the data I have access to are from patients who switched, so the idea would be to compare the data before and after the switch, with each patient acting as their own control.

    To achieve this, I performed longitudinal regressions with the independent variable beforeafter, which identifies the switch, and the clinical variable as the dependent one. To nullify any potential confounding effect of age, I chose age as the time variable of the panel.

    As an example, it looked like this.

    Listed 100 out of 200 observations
    Use the count() option to list more
    Code:
     
    . xtset id age_
     
    Panel variable: id (weakly balanced)
     Time variable: age_, 22 to 74
             Delta: 1 unit
    Code:
    xtreg mgfr_ beforeafter, re
     
    Random-effects GLS regression                   Number of obs     =        135
    Group variable: id                              Number of groups  =         10
     
    R-squared:                                      Obs per group:
         Within  = 0.2163                                         min =          9
         Between = 0.0368                                         avg =       13.5
         Overall = 0.0339                                         max =         16
     
                                                    Wald chi2(1)      =      34.49
    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
     
    ------------------------------------------------------------------------------
           mgfr_ | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
     beforeafter |  -9.922148   1.689414    -5.87   0.000    -13.23334   -6.610959
           _cons |   89.87322   8.022397    11.20   0.000     74.14961    105.5968
    -------------+----------------------------------------------------------------
         sigma_u |  25.333941
         sigma_e |  7.6829071
             rho |  .91577617   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    Then, I tried to adjust for possible confounders such as having diabetes or using some types of medication (ARB, ACEi).
    Code:
     
    xtreg mgfr_ beforeafter  acei_ arb_ diabetes hypertension, re
     
    Random-effects GLS regression                   Number of obs     =        135
    Group variable: id                              Number of groups  =         10
     
    R-squared:                                      Obs per group:
         Within  = 0.2168                                         min =          9
         Between = 0.1903                                         avg =       13.5
         Overall = 0.1745                                         max =         16
     
                                                    Wald chi2(5)      =      35.50
    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
     
    ------------------------------------------------------------------------------
           mgfr_ | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
     beforeafter |    -10.048   1.779538    -5.65   0.000    -13.53583   -6.560166
           acei_ |  -.0579371   2.240219    -0.03   0.979    -4.448686    4.332812
            arb_ |    .933376   3.457939     0.27   0.787    -5.844061    7.710813
        diabetes |  -28.92689   32.37343    -0.89   0.372    -92.37766    34.52387
    hypertension |  -6.379531   19.84124    -0.32   0.748    -45.26765    32.50859
           _cons |   96.56906    14.7622     6.54   0.000     67.63569    125.5024
    -------------+----------------------------------------------------------------
         sigma_u |  29.672306
         sigma_e |  7.7429727
             rho |  .93624663   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    So far, from these results, I interpreted that the switch negatively affected the renal function (mGFR) regardless of age and all the other covariates in the model.

    My overarching question is: Is longitudinal regression correct to answer my research question? If so, is putting age as the time variable of the panel the correct way to adjust by age?
    And finally…am I interpreting my results correctly? For qualitative variables, logistic regression would be used, as they are mainly 0/1 variables.

    Thank you very much for all your help and apologies for the long post!

    Best regards,

    David.

  • #2
    My overarching question is: Is longitudinal regression correct to answer my research question?
    Yes, but I think the choice of random effects is not wise here. This is not a randomized study, so the assumption that the random effects are independent of everything else is likely to be false. Also, your key explanatory variable, switching treatment, is a within-person effect because you have no control group. So a fixed-effects model would be much more appropriate here.

    If so, is putting age as the time variable of the panel the correct way to adjust by age?
    No. Putting age as the time variable in the -xtset- command does nothing at all to the -xtreg- command. All that does is make it possible for you to refer to lag and lead variables or estimate models with autoregressive correlation. To actually adjust for age, you have to explicitly include it as a variable in the regression. (In this regard, the time variable of -xtset- is handled very differently from the panel variable. The latter is automatically used as a fixed or random or grouping effect in -xt- commands. But the time variable is not.)

    So, I would revise this to:
    Code:
    xtreg mgfr_ beforeafter acei_ arb_ diabetes hypertension age_, fe
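
    As a minimal sketch of that point, assuming the -dataex- example from post #1 is in memory (so id, year, mgfr_ and beforeafter are the names in that excerpt, and year stands in for the age_ variable the excerpt lacks), the following fits the same model with and without a time variable declared in -xtset-. The two estimates should be identical; only adding the variable to the regression itself changes the model.

    ```stata
    * Assumes the -dataex- excerpt from post #1 has been loaded.
    * b_with_time and b_no_time are illustrative scalar names.

    * Same model, time variable declared in -xtset-
    xtset id year
    xtreg mgfr_ beforeafter, fe
    scalar b_with_time = _b[beforeafter]

    * Same model, no time variable declared
    xtset id
    xtreg mgfr_ beforeafter, fe
    scalar b_no_time = _b[beforeafter]

    display "with time variable: " b_with_time "   without: " b_no_time

    * Adjusting for time requires an explicit regressor
    * (year here; age_ in the full dataset)
    xtreg mgfr_ beforeafter year, fe
    ```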
    


    And finally…am I interpreting my results correctly? For qualitative variables, logistic regression would be used, as they are mainly 0/1 variables.
    I don't understand your question here. Logistic regression is a commonly used option when the outcome variable is dichotomous. But your outcome, measured GFR, is continuous. So logistic regression is inapplicable here.

    Note: Your example data set does not contain any age_ variable.

    • #3
      Thank you so much Clyde for your comments. They are very constructive.

      Originally posted by Clyde Schechter
      Yes, but I think the choice of random effects is not wise here. This is not a randomized study, so the assumption that the random effects are independent of everything else is likely to be false. Also, your key explanatory variable, switching treatment, is a within-person effect because you have no control group. So a fixed-effects model would be much more appropriate here.
      This probably stems from my understanding of the fe vs re model. It was hard for me to decide between them, but I was afraid that using fe would not allow me to somehow analyse the time-varying effects of some variables.

      Originally posted by Clyde Schechter
      No. Putting age as the time variable in the -xtset- command does nothing at all to the -xtreg- command. All that does is make it possible for you to refer to lag and lead variables or estimate models with autoregressive correlation. To actually adjust for age, you have to explicitly include it as a variable in the regression. (In this regard, the time variable of -xtset- is handled very differently from the panel variable. The latter is automatically used as a fixed or random or grouping effect in -xt- commands. But the time variable is not.)

      So, I would revise this to:
      Code:
      xtreg mgfr_ beforeafter acei_ arb_ diabetes hypertension age_, fe
      
      Thank you very much, I will do so. I am not sure why I left age_ out of the dataset example, but it is in the data.

      Originally posted by Clyde Schechter
      I don't understand your question here. Logistic regression is a commonly used option when the outcome variable is dichotomous. But your outcome, measured GFR is continuous. So logistic regression is inapplicable here.
      Apologies, my question was not very clear. I plan to perform both linear and logistic regressions depending on the type of dependent variable, as you say. First, I would do the univariate regression and then an adjusted one with important clinical variables like age.

      This means there will be the following kind of regressions

      Linear regressions:
      1- Univariate: mGFR as dependent and switch variable as independent
      2- Adjusted by clinical variables: mGFR as dependent and switch variable + hypertension + age + diabetes as independent ones

      Logistic regressions (for example, stroke yes/no):
      1- Univariate: stroke yes/no as dependent and switch variable as independent
      2- Adjusted by clinical variables: stroke yes/no as dependent and switch + hypertension + age + diabetes as independent ones

      The idea is to present a table with all the results and see if the switch has affected any of those variables when adjusted by age and comorbidities.


      • #4
        This probably stems from my understanding of the fe vs re model. It was hard for me to decide between them, but I was afraid that using fe would not allow me to somehow analyse the time-varying effects of some variables.
        It is true that the -fe- model will not be able to estimate the effects of variables that do not vary over time within patients. But it is also true that the -re- model will not correctly estimate the effect of a within-person variable like the switch either. If estimating the effect of the switch is the goal, then the -fe- analysis is the appropriate one. Concerning other non-time-varying variables, what is your interest in them? Do you really want to also analyze their effects? Or do you just want to include them to reduce confounding bias? If reducing confounding is the concern, you can forget about these variables in the -fe- model, because the fixed-effects analysis automatically adjusts for the confounding effects of all non-time-varying variables--even ones that are unobserved!
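
        To see this absorption directly, here is a short sketch, again assuming the -dataex- excerpt from post #1 is in memory. -xtsum- should show no within-patient variation in diabetes and hypertension, so -xtreg, fe- omits them, and the coefficient on beforeafter should be the same with or without them in the model.

        ```stata
        * Assumes the -dataex- excerpt from post #1 has been loaded.
        * b_alone and b_adjusted are illustrative scalar names.
        xtset id

        * Within-patient standard deviation of diabetes and hypertension is zero
        xtsum beforeafter diabetes hypertension

        * Switch effect on its own
        xtreg mgfr_ beforeafter, fe
        scalar b_alone = _b[beforeafter]

        * Time-invariant covariates are omitted because of collinearity
        * with the patient fixed effects
        xtreg mgfr_ beforeafter diabetes hypertension, fe
        scalar b_adjusted = _b[beforeafter]

        display "switch only: " b_alone "   with time-invariant covariates: " b_adjusted
        ```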

        If, in order to answer your research question, you really need to estimate both the effect of the switch variable (a purely within-person effect) and the effects of other variables that do not vary over time within person, then you should use the hybrid model. The -xthybrid- command, available from SSC, will do that.
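
        To illustrate what a hybrid model does, the sketch below builds the within-between decomposition by hand and fits it with -mixed-. This is not -xthybrid- itself, only the same idea written out. It assumes the -dataex- excerpt from post #1 is in memory; m_switch, w_switch, b_within and b_fe are illustrative names. With only five patients the between-person estimates are not meaningful, so treat it as a template for the full dataset.

        ```stata
        * Assumes the -dataex- excerpt from post #1 has been loaded.
        preserve
        keep if !missing(mgfr_, beforeafter)                  // match the estimation sample

        bysort id: egen double m_switch = mean(beforeafter)   // between-person part
        generate double w_switch = beforeafter - m_switch     // within-person part

        * Random-intercept model: w_switch carries the within-person effect;
        * m_switch and hypertension are between-person terms
        mixed mgfr_ w_switch m_switch hypertension || id:
        scalar b_within = _b[mgfr_:w_switch]

        * Fixed-effects estimate of the switch effect, for comparison
        xtset id
        xtreg mgfr_ beforeafter, fe
        scalar b_fe = _b[beforeafter]

        display "within effect (mixed): " b_within "   fixed effects: " b_fe
        restore
        ```

        The coefficient on w_switch should closely match the -fe- estimate, while the time-invariant variable keeps its own coefficient.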

        • #5
          Dear Clyde,

          Thank you very much for your advice. I know it has been a while (almost half a year), but I have kept working on this study following your line of thought, and I just wanted to double-check everything was ok.

          1. As the main research question is how the treatment switch affects three renal variables, mgfr_ (renal function), acr_ (albumin to creatinine ratio) and pcr_ (protein to creatinine ratio), I performed univariate and then adjusted regression models (adjusting for age and treatments) with -fe-, as this allowed me to analyse the within-subject impact of the switch while accounting for all the time-invariant variables (most of the comorbidities that could affect those renal variables qualify as such in our database, as there has been no change over time, e.g. diabetes, hypertension).

          This means that, for mgfr_, the results looked like this.

          1.1 Univariate analysis

          Code:
           xtreg mgfr_ beforeafter_, fe
          
          Fixed-effects (within) regression               Number of obs     =        359
          Group variable: id                              Number of groups  =         41
          
          R-squared:                                      Obs per group:
               Within  = 0.2308                                         min =          3
               Between = 0.0134                                         avg =        8.8
               Overall = 0.0383                                         max =         16
          
                                                          F(1,317)          =      95.12
          corr(u_i, Xb) = -0.0024                         Prob > F          =     0.0000
          
          ------------------------------------------------------------------------------
                 mgfr_ | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
          beforeafter_ |  -9.736224    .998266    -9.75   0.000    -11.70029    -7.77216
                 _cons |   93.03396   .5369342   173.27   0.000     91.97756    94.09037
          -------------+----------------------------------------------------------------
               sigma_u |  19.458213
               sigma_e |  8.2494898
                   rho |  .84764335   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          F test that all u_i=0: F(40, 317) = 58.37                    Prob > F = 0.0000
          
          . xtreg mgfr_ age_, fe
          
          Fixed-effects (within) regression               Number of obs     =        359
          Group variable: id                              Number of groups  =         41
          
          R-squared:                                      Obs per group:
               Within  = 0.1345                                         min =          3
               Between = 0.3873                                         avg =        8.8
               Overall = 0.3679                                         max =         16
          
                                                          F(1,317)          =      49.25
          corr(u_i, Xb) = 0.2728                          Prob > F          =     0.0000
          
          ------------------------------------------------------------------------------
                 mgfr_ | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                  age_ |  -.8500573   .1211271    -7.02   0.000    -1.088372   -.6117427
                 _cons |   132.8392   6.126086    21.68   0.000     120.7863    144.8922
          -------------+----------------------------------------------------------------
               sigma_u |  15.530132
               sigma_e |  8.7508762
                   rho |  .75900946   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          F test that all u_i=0: F(40, 317) = 30.80                    Prob > F = 0.0000
          
          . xtreg mgfr_ acei_, fe
          
          Fixed-effects (within) regression               Number of obs     =        359
          Group variable: id                              Number of groups  =         41
          
          R-squared:                                      Obs per group:
               Within  = 0.0113                                         min =          3
               Between = 0.0091                                         avg =        8.8
               Overall = 0.0075                                         max =         16
          
                                                          F(1,317)          =       3.63
          corr(u_i, Xb) = -0.1771                         Prob > F          =     0.0575
          
          ------------------------------------------------------------------------------
                 mgfr_ | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                 acei_ |  -3.886873   2.039168    -1.91   0.058    -7.898886    .1251397
                 _cons |    91.1928   .8097128   112.62   0.000     89.59971    92.78589
          -------------+----------------------------------------------------------------
               sigma_u |  19.801387
               sigma_e |  9.3526911
                   rho |  .81760069   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          F test that all u_i=0: F(40, 317) = 45.30                    Prob > F = 0.0000
          
          . xtreg mgfr_ arb_, fe
          
          Fixed-effects (within) regression               Number of obs     =        359
          Group variable: id                              Number of groups  =         41
          
          R-squared:                                      Obs per group:
               Within  = 0.0032                                         min =          3
               Between = 0.1301                                         avg =        8.8
               Overall = 0.1041                                         max =         16
          
                                                          F(1,317)          =       1.03
          corr(u_i, Xb) = 0.3065                          Prob > F          =     0.3119
          
          ------------------------------------------------------------------------------
                 mgfr_ | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                  arb_ |  -3.115736   3.076006    -1.01   0.312    -9.167704    2.936231
                 _cons |   90.34255   .6175751   146.29   0.000     89.12749    91.55762
          -------------+----------------------------------------------------------------
               sigma_u |  19.280024
               sigma_e |  9.3909506
                   rho |  .80824489   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          F test that all u_i=0: F(40, 317) = 39.73                    Prob > F = 0.0000
          1.2 Multivariate analysis

          Code:
          xtreg mgfr_ beforeafter_ age_ arb_ acei_ , fe
          
          Fixed-effects (within) regression               Number of obs     =        359
          Group variable: id                              Number of groups  =         41
          
          R-squared:                                      Obs per group:
               Within  = 0.2358                                         min =          3
               Between = 0.1302                                         avg =        8.8
               Overall = 0.0944                                         max =         16
          
                                                          F(4,314)          =      24.22
          corr(u_i, Xb) = 0.1053                          Prob > F          =     0.0000
          
          ------------------------------------------------------------------------------
                 mgfr_ | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
          beforeafter_ |  -8.920066   1.413432    -6.31   0.000    -11.70106   -6.139071
                  age_ |  -.1453183   .1769413    -0.82   0.412    -.4934588    .2028222
                  arb_ |   2.146537   3.029985     0.71   0.479    -3.815103    8.108176
                 acei_ |  -1.223324   1.979288    -0.62   0.537    -5.117668     2.67102
                 _cons |   100.2337   8.325052    12.04   0.000     83.85374    116.6136
          -------------+----------------------------------------------------------------
               sigma_u |  18.794804
               sigma_e |  8.2619084
                   rho |  .83805809   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          F test that all u_i=0: F(40, 314) = 32.73                    Prob > F = 0.0000
          1.3 I repeat the above changing the dependent variable to acr_ and then to pcr_


          One question I had when completing this: it is known that acr_ and pcr_ might affect renal function (mgfr_). This is why, whenever acr_ or pcr_ values are high, treatment (arb_ or acei_) is started to avoid any decline in renal function.

          I checked the correlation among those three variables (acr_, pcr_ and mgfr_) and it was as follows:

          Code:
          pwcorr acr_ pcr_ mgfr_, sig
          
                       |     acr_     pcr_    mgfr_
          -------------+---------------------------
                  acr_ |   1.0000 
                       |
                       |
                  pcr_ |   0.8695   1.0000 
                       |   0.0000
                       |
                 mgfr_ |  -0.3406  -0.3407   1.0000 
                       |   0.0000   0.0000
                       |
          I guess this means that it makes no sense to add both acr_ and pcr_ to the multivariate -fe- regression model, as they are highly correlated with each other. Would it make sense to add just one of them or, given the correlation with mgfr_, is it better to avoid that?

          If one is added, it would look like this: (There is a drop in the total number of subjects as I have to complete the data collection of this variable in two patients)

          Code:
          xtreg mgfr_ beforeafter_ age_ arb_ acei_ acr_   , fe
          
          Fixed-effects (within) regression               Number of obs     =        196
          Group variable: id                              Number of groups  =         39
          
          R-squared:                                      Obs per group:
               Within  = 0.3410                                         min =          1
               Between = 0.2957                                         avg =        5.0
               Overall = 0.2370                                         max =         11
          
                                                          F(5,152)          =      15.73
          corr(u_i, Xb) = 0.1741                          Prob > F          =     0.0000
          
          ------------------------------------------------------------------------------
                 mgfr_ | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
          beforeafter_ |  -7.292441   1.842064    -3.96   0.000     -10.9318   -3.653087
                  age_ |  -.5720357   .2574974    -2.22   0.028    -1.080772   -.0632997
                  arb_ |   3.064846   4.548605     0.67   0.501    -5.921805     12.0515
                 acei_ |   .8018237   2.735746     0.29   0.770    -4.603173     6.20682
                  acr_ |   .0660583   .0565705     1.17   0.245    -.0457077    .1778243
                 _cons |   119.6569   12.24487     9.77   0.000     95.46477     143.849
          -------------+----------------------------------------------------------------
               sigma_u |  16.742131
               sigma_e |  7.8220429
                   rho |  .82082769   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          F test that all u_i=0: F(38, 152) = 19.69                    Prob > F = 0.0000
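
          Because acr_ is missing at many visits, this model uses 196 observations rather than the 359 in section 1.2, so the two sets of coefficients come from different samples. A minimal sketch, assuming the full dataset with the variable names used above is in memory, refits the model without acr_ on exactly the observations used with acr_ (insample is an illustrative name):

          ```stata
          * Assumes the full study dataset is in memory and -xtset id- has been run.

          * Model with acr_; flag the observations it actually used
          xtreg mgfr_ beforeafter_ age_ arb_ acei_ acr_, fe
          generate byte insample = e(sample)

          * Same covariates minus acr_, restricted to the same observations
          xtreg mgfr_ beforeafter_ age_ arb_ acei_ if insample, fe
          ```

          Comparing this second fit with the acr_ model helps separate the effect of adding acr_ from the effect of losing observations.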
          2. Conclusion: Following your advice, the study could conclude that the switch impacted mgfr_ negatively, and that no further tests are required. If we added acr_, age would also be associated with a drop in renal function.


          3. However, I can see the journal reviewers saying that we have not adjusted for several time-invariant variables such as specific genotypes, diabetes, hypertension... One way I dealt with this was to perform the above-mentioned analysis separately by that variable, using if hypertension==0 or if hypertension==1. Another way I tried was your suggestion of the xthybrid model. I will show the results adding three new variables: hypertension (yes/no), diabetes (yes/no) and the presence of a specific genotype (N215S yes/no). I will also post the results with and without the acr_ variable.
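
          As a hedged alternative to the separate if hypertension==0 and if hypertension==1 regressions, one option is a single -fe- model with an interaction, assuming the full dataset with the variable names used above is in memory. The main effect of hypertension is omitted because it does not vary within patient, but the interaction, which estimates how the switch effect differs between the two groups, remains estimable.

          ```stata
          * Assumes the full study dataset is in memory and -xtset id- has been run.
          xtreg mgfr_ i.beforeafter_##i.hypertension age_ arb_ acei_, fe

          * Switch effect among patients without hypertension
          lincom 1.beforeafter_

          * Switch effect among patients with hypertension
          lincom 1.beforeafter_ + 1.beforeafter_#1.hypertension
          ```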

          Code:
          
          . xthybrid mgfr_ beforeafter_ age_ arb_ acei_ hypertension diabetes n215s , cluster(id) full
          
          The variable 'hypertension' does not vary sufficiently within clusters
          and will not be used to create additional regressors.
          [~0% of the total variance in 'hypertension' is within clusters]
          The variable 'diabetes' does not vary sufficiently within clusters
          and will not be used to create additional regressors.
          [~0% of the total variance in 'diabetes' is within clusters]
          The variable 'n215s' does not vary sufficiently within clusters
          and will not be used to create additional regressors.
          [~0% of the total variance in 'n215s' is within clusters]
          
          ------------------------------------------------------------------------------------------------------------------------
          Model model
          ------------------------------------------------------------------------------------------------------------------------
          
          Mixed-effects GLM                               Number of obs     =        359
          Family: Gaussian
          Link:   Identity
          Group variable: id                              Number of groups  =         41
          
                                                          Obs per group:
                                                                        min =          3
                                                                        avg =        8.8
                                                                        max =         16
          
          Integration method: mvaghermite                 Integration pts.  =          7
          
                                                          Wald chi2(11)     =     141.05
          Log likelihood = -1328.4987                     Prob > chi2       =     0.0000
          ---------------------------------------------------------------------------------
                    mgfr_ | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
          ----------------+----------------------------------------------------------------
          R__hypertension |  -.2456044   5.811186    -0.04   0.966    -11.63532    11.14411
              R__diabetes |  -2.574405   9.450068    -0.27   0.785     -21.0962    15.94739
                 R__n215s |   8.204856    4.57686     1.79   0.073    -.7656253    17.17534
          W__beforeafter_ |  -8.920067   1.404061    -6.35   0.000    -11.67198   -6.168158
                  W__age_ |  -.1453183   .1757682    -0.83   0.408    -.4898175     .199181
                  W__arb_ |   2.146536   3.009895     0.71   0.476     -3.75275    8.045823
                 W__acei_ |  -1.223324   1.966165    -0.62   0.534    -5.076937    2.630288
          B__beforeafter_ |   16.65406   14.25207     1.17   0.243    -11.27949     44.5876
                  B__age_ |  -1.167644   .2313999    -5.05   0.000    -1.621179   -.7141082
                  B__arb_ |   -11.7572   10.74242    -1.09   0.274    -32.81195    9.297545
                 B__acei_ |   11.88517   6.480439     1.83   0.067     -.816255     24.5866
                    _cons |   137.4677   10.30781    13.34   0.000     117.2647    157.6706
          ----------------+----------------------------------------------------------------
          id              |
                var(_cons)|   176.0467   40.69536                      111.9083     276.945
          ----------------+----------------------------------------------------------------
              var(e.mgfr_)|   67.35699   5.338296                      57.66625    78.67624
          ---------------------------------------------------------------------------------
          LR test vs. linear model: chibar2(01) = 346.06        Prob >= chibar2 = 0.0000
          
          . xthybrid mgfr_ beforeafter_ age_ arb_ acei_ hypertension diabetes n215s acr_, cluster(id) full
          
          The variable 'hypertension' does not vary sufficiently within clusters
          and will not be used to create additional regressors.
          [~0% of the total variance in 'hypertension' is within clusters]
          The variable 'diabetes' does not vary sufficiently within clusters
          and will not be used to create additional regressors.
          [~0% of the total variance in 'diabetes' is within clusters]
          The variable 'n215s' does not vary sufficiently within clusters
          and will not be used to create additional regressors.
          [~0% of the total variance in 'n215s' is within clusters]
          
          ------------------------------------------------------------------------------------------------------------------------
          Model model
          ------------------------------------------------------------------------------------------------------------------------
          
          Mixed-effects GLM                               Number of obs     =        196
          Family: Gaussian
          Link:   Identity
          Group variable: id                              Number of groups  =         39
          
                                                          Obs per group:
                                                                        min =          1
                                                                        avg =        5.0
                                                                        max =         11
          
          Integration method: mvaghermite                 Integration pts.  =          7
          
                                                          Wald chi2(13)     =     137.96
          Log likelihood = -725.70903                     Prob > chi2       =     0.0000
          ---------------------------------------------------------------------------------
                    mgfr_ | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
          ----------------+----------------------------------------------------------------
          R__hypertension |  -.0568046   5.425692    -0.01   0.992    -10.69097    10.57736
              R__diabetes |   5.783071    9.28574     0.62   0.533    -12.41664    23.98279
                 R__n215s |   5.336911   4.315912     1.24   0.216    -3.122121    13.79594
          W__beforeafter_ |  -7.292441   1.807035    -4.04   0.000    -10.83417   -3.750718
                  W__age_ |  -.5720357   .2526008    -2.26   0.024    -1.067124   -.0769472
                  W__arb_ |   3.064846   4.462108     0.69   0.492    -5.680726    11.81042
                 W__acei_ |   .8018237   2.683723     0.30   0.765    -4.458176    6.061824
                  W__acr_ |   .0660583   .0554948     1.19   0.234    -.0427094    .1748261
          B__beforeafter_ |   21.55497   12.04096     1.79   0.073    -2.044873    45.15481
                  B__age_ |  -1.331912   .2558422    -5.21   0.000    -1.833353   -.8304706
                  B__arb_ |  -4.411262   8.764742    -0.50   0.615    -21.58984    12.76732
                 B__acei_ |   10.95928   6.168537     1.78   0.076    -1.130826     23.0494
                  B__acr_ |  -.5920332    .258794    -2.29   0.022     -1.09926   -.0848063
                    _cons |   146.3463   10.70562    13.67   0.000     125.3636    167.3289
          ----------------+----------------------------------------------------------------
          id              |
                var(_cons)|   140.4485   34.31384                      87.00717    226.7146
          ----------------+----------------------------------------------------------------
              var(e.mgfr_)|   58.87951   6.607895                      47.25373    73.36558
          ---------------------------------------------------------------------------------
          LR test vs. linear model: chibar2(01) = 163.33        Prob >= chibar2 = 0.0000
          In the end, the results do not differ from the -fe- model, and none of the new variables shows a statistically significant association with renal function. I have to say that, if I interpreted it correctly, the xthybrid model makes things much easier: there is no need to fit the -fe- model and repeat the analysis by subgroup. However, would it make sense to present the results from both the -fe- and xthybrid models to show that those time-invariant variables do not affect the outcome? Or would you skip the -fe- model, given that xthybrid estimates the coefficients for the time-varying variables as well?


          Apologies for the long post and thank you very much for your help as always, I do really appreciate it!






          • #6
            I just want to say that you have done a great job with your analysis, David Meldon! I am not a medical researcher myself, but I use these types of models all the time in my areas of research (psychology and education), and I would say that the hybrid model with acr_ included is the one I would present. As you said, the journal reviewers will probably be grumpy if they don't see those non-varying, person-level comorbidities, and the hybrid model lets you include them.

            What I think is so cool about these hybrid models is that by separating the within part from the between part of the key IV (beforeafter_), the comorbidities only impact the between-person part of mgfr_ (average mgfr_ levels among persons) and thus only covary with the between-person part of beforeafter_. This might be worth highlighting in the write-up of your analysis. You have probably read the Schunck & Perales (2017) article on xthybrid previously, but if not, it is worth a read and should be cited in your paper.
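
For readers who want to see that separation concretely, here is a minimal sketch on simulated data (not the study data). It builds the between part (the person mean) and the within part (the deviation from the person mean) of a switch indicator by hand, then fits a random-intercept model with the built-in -mixed- command rather than -xthybrid-. All variable names, the switch timing, and the simulated effect sizes are assumptions made up for illustration.

```stata
* Hypothetical simulated panel: 40 persons, 10 yearly visits each
clear
set seed 2024
set obs 40
gen id = _n
gen u = rnormal(0, 10)               // person-level random intercept (assumed)
gen switch_year = 4 + mod(id, 5)     // assumed: switch year differs across persons
expand 10
bysort id: gen year = _n
gen beforeafter = year >= switch_year
gen y = 100 + u - 8*beforeafter + rnormal(0, 5)   // assumed true within effect: -8

* Between part: person mean; within part: deviation from that mean
bysort id: egen B_beforeafter = mean(beforeafter)
gen W_beforeafter = beforeafter - B_beforeafter

* Hybrid-style random-intercept model built by hand
mixed y W_beforeafter B_beforeafter || id:
scalar b_within = _b[W_beforeafter]

* Fixed-effects model for comparison
xtset id year
xtreg y beforeafter, fe
display "within (mixed) = " b_within "   fe = " _b[beforeafter]
```

In a linear random-intercept model like this one, the coefficient on the within part should agree with the -xtreg, fe- coefficient, which is why the within rows of the hybrid output can stand in for the -fe- results.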
            Last edited by Erik Ruzek; 30 Jan 2024, 07:28. Reason: Fixed grammatical error



            • #7
              I agree with everything Erik Ruzek said.

              I just want to digress a bit in response to:
              "I guess this means that it makes no sense to add both acr_ and pcr_ to the multivariate -fe- regression model, as they are correlated among themselves to a high degree. Would it make sense to add just one of them or given the correlation with mgfr_ is better to avoid that?"
              The answer is that it depends on what you are trying to do. If your goal is to develop a model that predicts mgfr_, then the best action would be to add both acr_ and pcr_ to the model. The fact that they are highly correlated in no way contradicts the fact that you will explain more outcome variance with more predictor variables.

              If, however, your goal is to estimate the effects of acr_ and pcr_ on mgfr_, then the correct answer is: add them both and then observe whether the standard errors are small enough (equivalently, the confidence intervals narrow enough) around these effect estimates that you have useful effect estimates. If you do, then retain them both in the model and move on. If you don't, then you have a serious multicollinearity problem involving acr_ and pcr_. The implication of that is that you cannot obtain usable estimates of their effects with this data. If you try to remove one, the standard error of the coefficient of the other will look better, but it is an illusion, because you will have introduced confounding (omitted variable bias). In fact, the only solution to this problem is to get a much larger data set where, notwithstanding the strong correlation, you can get sufficiently tight bounds on the effect estimates. Alternatively, if possible, one might go to a different study design altogether where participants are selected on acr_ and pcr_ values in such a way as to assure that these are not strongly correlated in the study sample. Both of these approaches, however, involve gathering new data, typically a large amount of new data, and are usually impractical.
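
The "illusion" described above can be seen in a small simulation. This is only a sketch under made-up assumptions: x1 and x2 are hypothetical stand-ins for two highly correlated predictors, each with a true coefficient of 1, and none of the numbers relate to the study data.

```stata
* Hypothetical simulation: two predictors correlated at about 0.95
clear
set seed 4321
set obs 200
gen x1 = rnormal()
gen x2 = 0.95*x1 + sqrt(1 - 0.95^2)*rnormal()
gen y  = 1*x1 + 1*x2 + rnormal()     // assumed true effects: 1 and 1

* Both predictors in the model: wide standard errors, no omitted variable
regress y x1 x2
scalar se_both = _se[x1]

* x2 dropped: the standard error of x1 shrinks, but its coefficient
* now also absorbs most of the effect of x2
regress y x1
scalar se_alone = _se[x1]
scalar b_alone  = _b[x1]

display "se(x1), both in = " se_both "   se(x1), x2 dropped = " se_alone
display "b(x1), x2 dropped = " b_alone
```

The tighter standard error in the second regression comes at the price of a coefficient that no longer estimates the effect of x1 alone.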



              • #8
                Thank you very much to both of you for your insight, it has been vital!

                I have been reviewing the xthybrid models' output from my post above, and I could not find the answer to a question that arose from looking at the output.

                The xthybrid model shows the between and within effects separately. I understand that to answer the question "Does the treatment switch affect mGFR?" the within effect is the way to go, as it compares each subject with itself across the treatment change. However, the between effect shows a coefficient for beforeafter_ that is strongly positive, albeit not statistically significant. How is it possible that the model estimates a significant negative within coefficient (W__beforeafter_) but a positive between coefficient (B__beforeafter_)? Even if the answer is the inclusion of the -re- covariates, I could not understand how this was the case.

                Am I understanding something wrongly?

                Thank you very much as always!



                • #9
                  Run the following to visualize how the within- and between-effects of the same variable can have opposite signs. It is obviously a silly, artificial example, but it is crystal clear, and the same thing can and does happen in real-world data.

                  Code:
                  clear
                  set obs 5
                  gen panel_id = _n
                  expand 2
                  
                  set seed 1234
                  by panel_id , sort: gen y = 4*panel_id - _n + 3 + rnormal(0, 0.5)
                  by panel_id: gen x = panel_id + _n
                  
                  xtset panel_id
                  
                  xtreg y x, fe
                  regress y x
                  
                  //    GRAPH THE DATA TO SHOW WHAT'S HAPPENING
                  separate y, by(panel_id)
                  
                  graph twoway connect y? x || lfit y x
                  
                  browse



                  • #10
                    In addition to Clyde's demonstration, consider other examples of how the within-person and between-person associations might differ. The problem of inferring that a relation between two variables observed at the group level is the same as the relation between the same variables at the individual level is known as the ecological fallacy. One epidemiological example concerns the association between exercise and cardiac arrest. While doing intense exercise, an individual is at an elevated risk of cardiac arrest. But, on average, people who exercise more have a lower risk of cardiac arrest.

