Removal of batch effects for survival models

John LeQuesne

Join Date: Jan 2018

Posts: 6
#1

27 Jan 2018, 13:22

I have data on protein levels from several hundred tumour tissue samples, which I am using in longitudinal survival models (Cox). Each experiment quantifies one protein and is run in 20 batches, each containing 120 samples. Batch x contains the same tumours in every experiment. However, I can see that *some* of my experiments show strong artefactual associations with batch. If I'm using one protein in a survival model, I can include batch as a covariate, or stratify by batch in stcox, either of which ameliorates the batch effect.
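For concreteness, the two single-protein approaches look roughly like this. The variable names (time, died, protein1, batch) are placeholders rather than my actual dataset, so treat this as a sketch:

```stata
* Placeholder variable names (assumptions, not the real data):
*   time     = follow-up time
*   died     = event indicator (1 = event, 0 = censored)
*   protein1 = measured level of one protein
*   batch    = batch identifier (1 to 20)
stset time, failure(died)

* Option 1: batch as a categorical covariate
stcox protein1 i.batch

* Option 2: a separate baseline hazard for each batch
stcox protein1, strata(batch)
```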
The problem comes when I want to fit Cox models including several variables that are affected by batch in different ways. I feel there should be a way to correct my data for batch, but I really don't know how to go about it. These corrected values would also be useful as input for other methods that do not allow batch to be included as a covariate.
What might you advise?
Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#2

27 Jan 2018, 16:27

What you are describing here is what an interaction term accomplishes. See -help fvvarlist- for how to have Stata create them for you. And see https://www3.nd.edu/~rwilliam/stats/Margins01.pdf for Richard Williams' excellent introduction to the -margins- command which, following -stcox-, will be very helpful in understanding the results.
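As a rough sketch of what that could look like, assuming hypothetical variable names (time, died, protein1, protein2, batch) that are not taken from the original post: the ## factor-variable operator gives each protein its own batch-specific slope, and -margins- then reports those slopes on the log relative-hazard scale.

```stata
* Hypothetical variable names; substitute your own.
stset time, failure(died)

* c. marks a continuous variable, i. a categorical one.
* ## expands to the main effects plus their interaction,
* so each protein's effect is allowed to differ by batch.
stcox c.protein1##i.batch c.protein2##i.batch

* Batch-specific slope of the linear predictor (log relative hazard)
* with respect to protein1
margins batch, dydx(protein1) predict(xb)
```

With 20 batches this adds 19 interaction terms per protein, so it is worth checking that each batch contains enough events to support them.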
John LeQuesne

Join Date: Jan 2018

Posts: 6
#3

27 Jan 2018, 17:04

Many thanks for this. I will read with interest. I can see how interaction terms may accomplish what I need in survival models. Do you know if -margins- can be used to generate adjusted/corrected data for other analyses, such as clustering? I can't immediately see how. Again, thanks.
Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#4

27 Jan 2018, 17:20

-margins- is a general-purpose tool for calculating adjusted means and marginal effects. Issues such as clustering are dealt with in the regression analyses that come before it.

If the clustering you are referring to is actual cluster analysis (e.g. Stata's -cluster kmeans- command), then no, -margins- does not work with that. Only analyses that return results in e() can feed into -margins-.
John LeQuesne

Join Date: Jan 2018

Posts: 6
#5

28 Jan 2018, 01:05

Thanks!