  • Analyzing two time points with regression

    I am analyzing data with the mental health component score (measured 0-100) for 4500 individuals. This score was measured at baseline and then 30 days following an intervention.
    I would like to use regression to see the average change in score over time.

    I used the following code to reshape my data:

    rename preop_mcs test1
    rename x30d_mcs test2
    reshape long test, i(sampleid) j(time)
    xtset sampleid
    xtreg test

    I then conducted a mixed effects regression with the following code:

    meglm test time || sampleid:

    Then I added covariates:

    meglm test time age sex smoking race bmi || sampleid:

    I've noticed that no matter how many or which covariates I add, the coefficient for time does not change. Because I know from my previous analysis that age is significantly associated with the score, I'd expect to see at least a little change. This makes me think something is wrong with my coding and/or approach.

    I appreciate any guidance with the above coding and how to proceed with a regression.


  • #2
    Those are all characteristics of the patient, time-invariant over the course of the month-long pre-post interval. They'll affect the intercept (constant) regression coefficient, but not the regression coefficient for the time-varying variables.
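
    If the question is whether the change itself differs by a patient characteristic, one option is to interact that characteristic with time. A minimal sketch, assuming the long-format variables from #1 (test, time, sampleid) and treating age as continuous; the ages 40 and 60 are purely illustrative values, not taken from your data:
    Code:
    * let the pre-post change vary with age; the other covariates still shift only the level
    mixed test i.time##c.age i.sex i.smoking i.race c.bmi || sampleid:
    * estimated pre-post change (time 2 vs. time 1) at two illustrative ages
    margins, dydx(time) at(age=(40 60))
    In that model the coefficient on 2.time is the change at age 0, which is why -margins- is used to read off the change at meaningful ages.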



    • #3
      Katie:
      why not simply consider -xtreg- with squared age to investigate possible turning points?
      Code:
      xtset sampleid time
      xtreg test i.time c.age##c.age i.sex i.smoking i.race bmi, fe
      estimates store fe
      xtreg test i.time c.age##c.age i.sex i.smoking i.race bmi, re
      estimates store re
      hausman fe re
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        Thank you both.
        Carlo, I want to share my results based on the code you provided (thank you!), in case you can tell whether something is wrong with my time or test variables or if it just doesn't make sense what I'm trying to look at.

        . xtset sampleid time
        panel variable: sampleid (strongly balanced)
        time variable: time, 1 to 2
        delta: 1 unit

        . xtreg test i.time c.age##c.age i.sex i.smoking i.race bmi, fe
        note: age omitted because of collinearity
        note: c.age#c.age omitted because of collinearity
        note: 2.sex omitted because of collinearity
        note: 1.smoking omitted because of collinearity
        note: 2.smoking omitted because of collinearity
        note: 13.race omitted because of collinearity
        note: 14.race omitted because of collinearity
        note: bmi omitted because of collinearity

        Fixed-effects (within) regression               Number of obs     =      9,000
        Group variable: sampleid                        Number of groups  =      4,500

        R-sq:                                           Obs per group:
             within  = 0.0019                                         min =          2
             between =      .                                         avg =        2.0
             overall = 0.0004                                         max =          2

                                                        F(1,4499)         =       8.54
        corr(u_i, Xb)  = 0.0000                         Prob > F          =     0.0035

        ------------------------------------------------------------------------------
                test |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              2.time |   .4408889   .1509001     2.92   0.003     .1450505    .7367273
                 age |          0  (omitted)
                     |
         c.age#c.age |          0  (omitted)
                     |
               2.sex |          0  (omitted)
                     |
             smoking |
                  1  |          0  (omitted)
                  2  |          0  (omitted)
                     |
                race |
                 13  |          0  (omitted)
                 14  |          0  (omitted)
                     |
                 bmi |          0  (omitted)
               _cons |   54.11133   .1067025   507.12   0.000     53.90214    54.32052
        -------------+----------------------------------------------------------------
             sigma_u |  9.2393184
             sigma_e |   7.157822
                 rho |  .62492948   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        F test that all u_i=0: F(4499, 4499) = 3.33                  Prob > F = 0.0000

        . estimates store fe

        . xtreg test i.time c.age##c.age i.sex i.smoking i.race bmi, re

        Random-effects GLS regression                   Number of obs     =      9,000
        Group variable: sampleid                        Number of groups  =      4,500

        R-sq:                                           Obs per group:
             within  = 0.0019                                         min =          2
             between = 0.0361                                         avg =        2.0
             overall = 0.0282                                         max =          2

                                                        Wald chi2(9)      =     176.58
        corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

        ------------------------------------------------------------------------------
                test |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              2.time |   .4408889   .1509001     2.92   0.003     .1451301    .7366477
                 age |   .0065327   .0645744     0.10   0.919    -.1200308    .1330962
                     |
         c.age#c.age |   .0010588   .0005528     1.92   0.055    -.0000247    .0021422
                     |
               2.sex |   .0679355   .2799326     0.24   0.808    -.4807224    .6165933
                     |
             smoking |
                  1  |  -1.459797   .5098573    -2.86   0.004    -2.459099   -.4604951
                  2  |    -.18731   .3050072    -0.61   0.539    -.7851131     .410493
                     |
                race |
                 13  |  -.3810851   .4787068    -0.80   0.426    -1.319333    .5571629
                 14  |   .2149739   1.363787     0.16   0.875    -2.457999    2.887947
                     |
                 bmi |   .0186033   .0187558     0.99   0.321    -.0181575     .055364
               _cons |   49.03913    1.91745    25.58   0.000       45.281    52.79726
        -------------+----------------------------------------------------------------
             sigma_u |  7.5376052
             sigma_e |   7.157822
                 rho |  .52582638   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------

        . estimates store re

        . hausman fe re

                         ---- Coefficients ----
                     |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
                     |       fe           re         Difference          S.E.
        -------------+----------------------------------------------------------------
              2.time |    .4408889     .4408889        4.70e-12               .
        ------------------------------------------------------------------------------
                                   b = consistent under Ho and Ha; obtained from xtreg
                    B = inconsistent under Ha, efficient under Ho; obtained from xtreg

            Test:  Ho:  difference in coefficients not systematic

                          chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                                  =    -0.00    chi2<0 ==> model fitted on these
                                                data fails to meet the asymptotic
                                                assumptions of the Hausman test;
                                                see suest for a generalized test



        • #5
          Katie:
          as expected, the -fe- estimator wipes out all the time-invariant predictors (e.g., -sex-, -smoking- and -race-). While it is clear that -race- and, in general, -sex- do not change within the same panel as time goes by, -smoking- is wiped out because no one gave up smoking between the 1st and the 2nd wave.
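          A quick way to check which predictors never change within a panel is -xtsum-: a within standard deviation of zero means the variable is time-invariant in your data and will be dropped by -fe-. A minimal sketch, assuming the data are still -xtset- as in #4 and that all the listed variables are numeric:
          Code:
          * compare the "within" Std. Dev. rows: 0 means no change between the two waves
          xtsum age bmi sex smoking race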
          In addition, -re- shows no evidence of a quadratic effect of -age- on the regressand; hence, I think you can stay with the linear term only.
          That said, I would test whether -re- is actually the way to go via the community-contributed programme -xtoverid- (just type -search xtoverid- from within Stata to spot and install it).
          Being glorious but a bit old-fashioned, -xtoverid- does not support -fvvarlist- notation.
          The usual trick is to prefix your code by -xi:-, as you can see from the following toy-example:
          Code:
          . use "http://www.stata-press.com/data/r15/nlswork.dta"
          (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
          
          . xi: xtreg ln_wage age i.race
          i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)
          
          Random-effects GLS regression                   Number of obs     =     28,510
          Group variable: idcode                          Number of groups  =      4,710
          
          R-sq:                                           Obs per group:
               within  = 0.1026                                         min =          1
               between = 0.1032                                         avg =        6.1
               overall = 0.0945                                         max =         15
          
                                                          Wald chi2(3)      =    3242.34
          corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
          
          ------------------------------------------------------------------------------
               ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   age |    .018534    .000331    55.99   0.000     .0178852    .0191828
              _Irace_2 |  -.1209428   .0129079    -9.37   0.000    -.1462418   -.0956439
              _Irace_3 |   .0981941   .0538424     1.82   0.068    -.0073351    .2037233
                 _cons |    1.15423   .0118069    97.76   0.000     1.131089    1.177371
          -------------+----------------------------------------------------------------
               sigma_u |  .36581626
               sigma_e |  .30349389
                   rho |  .59231394   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          
          . xtoverid
          
          Test of overidentifying restrictions: fixed vs random effects
          Cross-section time-series model: xtreg re  
          Sargan-Hansen statistic  14.662  Chi-sq(1)    P-value = 0.0001
          
          .
          The -xtoverid- outcome points to the -fe- specification (the null being that -re- is the way to go).

          Hence, in your case:
          Code:
          xi: xtreg test i.time c.age##c.age i.sex i.smoking i.race bmi, re
          xtoverid
          As you might have already seen, with -xtoverid- there's no need to run -xtreg,fe- too to test whether -re- is the way to go.
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Thank you very much for your explanation of my results, Carlo! When I attempted xtoverid, the error code read: cage#c: operator invalid



            • #7
              Katie:
              -xtoverid-'s age bites back!
              Create the squared term by hand:
              Code:
              g sq_age=age*age
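              and use it, alongside the linear term -age-, in place of the chunk of code from the previous post shown below (factor-variable notation is, by the way, always the preferred way of creating categorical variables and interactions whenever the command supports it).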

              Code:
              c.age##c.age
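              Putting the pieces together, the full call would look something like the sketch below. It assumes the variable names from #4, that sq_age has been generated as above, and that -xtoverid- is already installed; I have not run it on your data:
              Code:
              * -xi:- expands i.time, i.sex, i.smoking and i.race; age and sq_age replace c.age##c.age
              xi: xtreg test i.time age sq_age i.sex i.smoking i.race bmi, re
              * test of -re- (the null) versus -fe-
              xtoverid
              Given that the -fe- and -re- coefficients for 2.time in #4 are identical, the test may turn out to be uninformative here.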
              Kind regards,
              Carlo
              (Stata 19.0)
