Multiple linear regression for paired samples, possible?

Tom Hsiung

#1

03 Jan 2019, 08:46

Hello, guys

The response variable is the average daily dose, which I measured at two time points: during week 1 and after week 1 (week 1+). At first glance, a paired t-test seems appropriate. However, besides the difference between the two time points (week 1 vs. week 1+), there are other differences that must be adjusted for. So I wondered whether linear regression could solve my problem. But linear regression requires independent observations. What should I do?

Tom
Tom Hsiung

#2

03 Jan 2019, 08:49

Example data:

observation #, patient name, week 1 dose (mg), week 1+ dose (mg)
1, Jim, 3.35, 4.58
2, John, 2.74, 3.21
...
...
...
Carlo Lazzaro

#3

03 Jan 2019, 09:22

Tom:
as the same patients are measured twice on the same outcome, the observations are not independent, as you stated.
If you want to go with pooled -regress-, the usual fix is to -cluster- the standard errors on -panelid- (that is, on the patient: John, Jim, and so on).
Given the panel structure of your data, I would also consider -xtreg-.

Kind regards,
Carlo
(Stata 19.0)
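A minimal sketch of what Carlo's pooled -regress- with clustered standard errors computes, written in Python (standard library only) so the arithmetic is visible; it is an illustration, not Stata's implementation. It assumes Tom's layout: the wide rows are reshaped to one row per patient per period, the model has a constant and a week indicator (0 = week 1, 1 = week 1+), and the patient is the cluster. The third patient, Ann, and the function names `invert` and `ols_cluster` are hypothetical. The small-sample factor G/(G-1) * (N-1)/(N-K) mirrors the adjustment Stata applies after -regress, vce(cluster)-.

```python
import math
import statistics
from collections import defaultdict


def invert(a):
    """Invert a small square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols_cluster(X, y, groups):
    """Pooled OLS with cluster-robust (CR1) standard errors (hypothetical helper).

    X: rows of regressors, each starting with 1.0 for the constant;
    y: outcomes; groups: cluster id of each row (here, the patient).
    """
    n, k = len(X), len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    bread = invert(xtx)
    beta = [sum(bread[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(ra * ba for ra, ba in zip(r, beta)) for r, yi in zip(X, y)]
    # Sum the scores x_i * e_i within each patient, so residuals of the
    # same patient enter the "meat" together and may be correlated.
    scores = defaultdict(lambda: [0.0] * k)
    for r, e, g in zip(X, resid, groups):
        s = scores[g]
        for a in range(k):
            s[a] += r[a] * e
    meat = [[sum(s[a] * s[b] for s in scores.values()) for b in range(k)] for a in range(k)]
    G = len(scores)
    c = G / (G - 1) * (n - 1) / (n - k)  # finite-sample factor
    se = []
    for a in range(k):
        v = sum(bread[a][p] * meat[p][q] * bread[q][a] for p in range(k) for q in range(k))
        se.append(math.sqrt(c * v))
    return beta, se


# Jim and John from Tom's example; Ann is a made-up third patient.
wide = [("Jim", 3.35, 4.58), ("John", 2.74, 3.21), ("Ann", 3.10, 3.40)]

# Reshape wide -> long: one row per patient per period.
X, y, ids = [], [], []
for name, w1, w1plus in wide:
    for week, dose in ((0, w1), (1, w1plus)):
        X.append([1.0, float(week)])
        y.append(dose)
        ids.append(name)

beta, se = ols_cluster(X, y, ids)

# Paired t-test standard error, for comparison.
d = [w1plus - w1 for _, w1, w1plus in wide]
paired_se = statistics.stdev(d) / math.sqrt(len(d))
print(f"week effect {beta[1]:.4f}, clustered SE {se[1]:.4f}, paired-t SE {paired_se:.4f}")
```

With only the week indicator in the model, the clustered standard error works out to the paired-t standard error times sqrt((2n-1)/(2n-2)), n being the number of patients, so the two analyses agree closely for realistic n; Stata also uses G-1 residual degrees of freedom for clustered -regress-, matching the paired t-test's n-1. Covariates would be appended as extra entries in each row of `X`.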
Tom Hsiung

#4

03 Jan 2019, 09:32

Thank you, Carlo.

From your answer, the key word is 'cluster', correct? Could you explain the statistical logic behind it?

Tom
Rich Goldstein

#5

03 Jan 2019, 09:40

Here is another alternative: just as in a one-sample (paired) t-test, generate a new variable equal to the difference between the two measures; that becomes your new outcome (dependent) variable in the regression.
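Rich's suggestion rests on a simple identity: regressing the difference on a constant alone gives the same estimate, standard error and t statistic as the paired t-test. A minimal Python sketch (standard library only; `paired_t` and `regress_on_constant` are hypothetical names, and the third patient is made up) that checks this on Tom's layout:

```python
import math
import statistics


def paired_t(before, after):
    """Paired t-test of after - before: mean difference, its standard error and t."""
    d = [a - b for b, a in zip(before, after)]
    mean_d = statistics.mean(d)
    se = statistics.stdev(d) / math.sqrt(len(d))
    return mean_d, se, mean_d / se


def regress_on_constant(y):
    """OLS of y on a constant only, i.e. the model behind Stata's `regress diff`."""
    n = len(y)
    b0 = sum(y) / n                      # the OLS intercept is the sample mean
    rss = sum((v - b0) ** 2 for v in y)  # residual sum of squares
    se = math.sqrt(rss / (n - 1) / n)    # sqrt(s^2 * (X'X)^-1), where X'X = n
    return b0, se, b0 / se


# Jim and John from Tom's example; the third patient is hypothetical.
week1 = [3.35, 2.74, 3.10]
week1plus = [4.58, 3.21, 3.40]
diff = [a - b for b, a in zip(week1, week1plus)]

print(paired_t(week1, week1plus))
print(regress_on_constant(diff))
```

Both calls use n-1 degrees of freedom, so the p-values match as well; adjusting for covariates then amounts to adding columns to the regression of `diff`.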
Tom Hsiung

#6

03 Jan 2019, 09:51

Originally posted by Rich Goldstein

Here is another alternative: just as in a one-sample (paired) t-test, generate a new variable equal to the difference between the two measures; that becomes your new outcome (dependent) variable in the regression.

Thank you, Rich. Unfortunately, besides the time variable (as in so-called longitudinal studies), there are other variables which specific their groups. These other variables must be adjusted for, which a paired t-test cannot do.
Carlo Lazzaro

#7

03 Jan 2019, 09:54

Tom:
the statistical logic (which can easily be found in any decent statistics textbook; see, for instance, Chapter 31 in https://www.wiley.com/en-it/Essentia...9780865428713; Chapter 7 of the same textbook covers the one-sample paired t-test that Rich suggested) is that observations belonging to the same id (here, the same patient) are more similar to each other than to observations belonging to the other ids, because they share a common individual effect that is left unexplained by the predictors on the right-hand side of the regression equation. Clustering the standard errors takes this within-id correlation into account: the coefficients are unchanged, but their standard errors no longer assume that all observations are independent.

Kind regards,
Carlo
(Stata 19.0)
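Carlo's explanation corresponds to the usual cluster-robust ("sandwich") variance estimator for pooled OLS. With G clusters (patients), N observations, K coefficients, and X_g and û_g the rows of the design matrix and the OLS residuals belonging to cluster g, the scores are summed within each patient before being multiplied out. Residuals from the same patient may therefore be correlated; independence is assumed only across patients. The leading factor is the finite-sample adjustment Stata applies after -regress, vce(cluster)-.

```latex
\widehat{V}_{\mathrm{cluster}}(\hat\beta)
  = \frac{G}{G-1}\,\frac{N-1}{N-K}\,
    (X'X)^{-1}
    \Bigl(\sum_{g=1}^{G} X_g'\hat u_g\,\hat u_g' X_g\Bigr)
    (X'X)^{-1}
```

By contrast, the default OLS formula \hat\sigma^2 (X'X)^{-1} treats all N rows as independent pieces of information, which overstates the precision when each patient contributes two correlated rows.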
Tom Hsiung

#8

03 Jan 2019, 10:09

OK, I will read it later. I also found a webpage from Harvard about this; see: https://catalyst.harvard.edu/docs/bi...hop-Slides.pdf

Rich Goldstein

03 Jan 2019, 10:38

What I gave you was a way to use regression to (1) reproduce a paired t-test and (2) extend it by adding covariates.

I don't have the faintest idea what "there are other variables which specific their groups" means, and thus can't respond to it.

Here is an example of first reproducing a paired t-test with a regression and then adding further predictors:

Code:

. sysuse bpwide
(fictional blood-pressure data)

. gen diff=bp_after-bp_before

. ttest bp_before=bp_after

Paired t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
bp_bef~e |     120      156.45    1.039746    11.38985    154.3912    158.5088
bp_after |     120    151.3583    1.294234    14.17762    148.7956     153.921
---------+--------------------------------------------------------------------
    diff |     120    5.091667    1.525736     16.7136    2.070557    8.112776
------------------------------------------------------------------------------
     mean(diff) = mean(bp_before - bp_after)                      t =   3.3372
 Ho: mean(diff) = 0                              degrees of freedom =      119

 Ha: mean(diff) < 0           Ha: mean(diff) != 0           Ha: mean(diff) > 0
 Pr(T < t) = 0.9994         Pr(|T| > |t|) = 0.0011          Pr(T > t) = 0.0006

. ttest diff=0

One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    diff |     120   -5.091667    1.525736     16.7136   -8.112776   -2.070557
------------------------------------------------------------------------------
    mean = mean(diff)                                             t =  -3.3372
Ho: mean = 0                                     degrees of freedom =      119

    Ha: mean < 0                 Ha: mean != 0                 Ha: mean > 0
 Pr(T < t) = 0.0006         Pr(|T| > |t|) = 0.0011          Pr(T > t) = 0.9994

. regress diff

      Source |       SS           df       MS      Number of obs   =       120
-------------+----------------------------------   F(0, 119)       =      0.00
       Model |           0         0           .   Prob > F        =         .
    Residual |  33241.9917       119  279.344468   R-squared       =    0.0000
-------------+----------------------------------   Adj R-squared   =    0.0000
       Total |  33241.9917       119  279.344468   Root MSE        =    16.714

------------------------------------------------------------------------------
        diff |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |  -5.091667   1.525736    -3.34   0.001    -8.112776   -2.070557
------------------------------------------------------------------------------

. regress diff i.sex

      Source |       SS           df       MS      Number of obs   =       120
-------------+----------------------------------   F(1, 118)       =      0.77
       Model |  216.008333         1  216.008333   Prob > F        =    0.3815
    Residual |  33025.9833       118  279.881215   R-squared       =    0.0065
-------------+----------------------------------   Adj R-squared   =   -0.0019
       Total |  33241.9917       119  279.344468   Root MSE        =     16.73

------------------------------------------------------------------------------
        diff |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |
     Female  |  -2.683333   3.054402    -0.88   0.381    -8.731882    3.365215
       _cons |      -3.75   2.159789    -1.74   0.085    -8.026969    .5269695
------------------------------------------------------------------------------
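How to read the last regression: with one binary covariate, `_cons` is the mean difference in the base group (males) and the `Female` coefficient is the female-minus-male difference in mean differences. The check below uses only numbers printed in the output; the 60/60 split by sex is inferred because it reproduces both the intercept-only estimate and the Model SS, not read from the log.

```latex
\begin{aligned}
\bar d_{\mathrm{male}}   &= \hat\beta_0 = -3.75 \\
\bar d_{\mathrm{female}} &= \hat\beta_0 + \hat\beta_{\mathrm{Female}} = -3.75 - 2.683333 = -6.433333 \\
\tfrac{1}{2}\bigl(\bar d_{\mathrm{male}} + \bar d_{\mathrm{female}}\bigr) &= \tfrac{1}{2}(-3.75 - 6.433333) = -5.091667 \\
\text{Model SS} &= \frac{n_{\mathrm{m}}\, n_{\mathrm{f}}}{N}\,\hat\beta_{\mathrm{Female}}^{2}
  = \frac{60 \cdot 60}{120}\,(2.683333)^{2} \approx 216.008
\end{aligned}
```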
