Joint Risk Genotype Interaction

Dew FR

Join Date: Sep 2019

Posts: 7
#1

20 Sep 2019, 07:32

Hi,

Does anyone know how to analyze a joint risk genotype interaction?

I conducted a case-control study of the effect of two polymorphisms (HTRA and ARMS) on AMD. I excluded all observations with missing data from the analysis.

I used this command:

. logistic Outcome i.HTRA##i.ARMS

HTRA: 1=GG 2=GA 3=AA
ARMS: 1=GG 2=GT 3=TT
AMD (Outcome): 1=disease 0=no disease

The output was:
note: 2.HTRA#1.ARMS != 0 predicts failure perfectly
2.HTRA#1.ARMS dropped and 1 obs not used

note: 3.HTRA#1.ARMS != 0 predicts failure perfectly
3.HTRA#1.ARMS dropped and 3 obs not used

note: 2.HTRA#3.ARMS omitted because of collinearity
note: 3.HTRA#3.ARMS omitted because of collinearity

My expectation is that I can get the OR for each of these genotype combinations:
HTRA GG + ARMS GG = reference
HTRA GG + ARMS GT
HTRA GG + ARMS TT
HTRA GA + ARMS GG
HTRA GA + ARMS GT
HTRA GA + ARMS TT
HTRA AA + ARMS GG
HTRA AA + ARMS GT
HTRA AA + ARMS TT

I don't know why I can't get OR for all 9 possibilities.
Clyde Schechter

Join Date: Apr 2014

Posts: 30066
#2

20 Sep 2019, 12:03

Your expectations are misguided. Even in the best of circumstances, there are only 9 possible combinations of HTRA and ARMS, so at most there could only be 8 odds ratios, as one of the categories must serve as the reference (denominator) for the odds ratios of the others.

But, worse, you are not in the best of circumstances. As Stata has told you in its output, two combinations, 2.HTRA#1.ARMS and 3.HTRA#1.ARMS, had to be omitted from the model because the outcome, disease, never occurs with either of those combinations. When you have a level of a variable (or of an interaction, as here) where the outcome never varies, that is called perfect prediction. With perfect prediction, the maximum likelihood estimate of the logistic regression coefficient is negative infinity (if the outcome never occurs) or positive infinity (if the outcome always occurs). Since perfect prediction means that the estimation of the logistic regression model cannot converge to finite coefficients, Stata (and every other statistical package I am familiar with) omits those levels from the model.
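One way to see the perfect prediction directly is to cross-tabulate the two genotypes separately among controls and among cases. This is only a minimal sketch, assuming the variable names and coding given in #1:

```stata
* Genotype combinations among controls (Outcome == 0)
tabulate HTRA ARMS if Outcome == 0

* Genotype combinations among cases (Outcome == 1)
tabulate HTRA ARMS if Outcome == 1
```

A combination that has observations in the first table but is empty (or absent) in the second is one in which the disease never occurs, which is what the "predicts failure perfectly" notes are reporting.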

Then other problems have cascaded from that. When HTRA = 2 or 3, ARMS = 1 is now excluded, which means that ARMS must be either 2 or 3. So ARMS is reduced to a dichotomous variable when HTRA = 2 or 3; consequently ARMS = 2 and 3 can't both be in the model: there is collinearity (which Stata also told you). So for each of HTRA = 2 and HTRA = 3, one of the ARMS levels of the interaction has to get dropped.

These explain all of your empty and omitted categories. None of them are estimable with logistic regression from this data.

If you want to work around this, try using Joseph Coveney's -firthlogit- (available from SSC). It uses penalized maximum likelihood estimation, which enables you to retain levels of variables that lead to perfect prediction. It also provides better estimates of the coefficients when the number of observations in a group is small. (The exclusion of 2.HTRA#1.ARMS and 3.HTRA#1.ARMS only led to 4 observations being dropped, so these classes are very small and, even if they were not omitted, their estimates from -logistic- would have been biased.)
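As a rough sketch of what that would look like, assuming the same variable names as in #1 and that -firthlogit- is not yet installed:

```stata
* One-time installation of the user-written command from SSC
ssc install firthlogit

* Same model as before, fitted by penalized maximum likelihood;
* the -or- option requests odds ratios instead of coefficients
firthlogit Outcome i.HTRA##i.ARMS, or
```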
Dew FR

Join Date: Sep 2019

Posts: 7
#3

06 Oct 2019, 15:14

Dear Clyde Schechter

Thanks a lot for a great explanation.

My mistake, I meant that I expected to get 8 ORs (with 1 reference). I have tried using the firthlogit command, but it didn't give 8 ORs. It gave only 4 ORs.
. firthlogit Outcome i.HTRA##i.ARMS, or

Also, when I tried to apply it to data on another gene, I ran into a similar collinearity problem, even though no missing data were included in the analysis.
. firthlogit Outcome i.HTRA##i.CFH, or
. firthlogit Outcome i.ARMS##i.CFH, or

I am not sure how to solve this. I would be glad if you could help me.

I really appreciate your help and guidance.

Kind regards,
Dew
Clyde Schechter

Join Date: Apr 2014

Posts: 30066
#4

06 Oct 2019, 16:13

Ok, let's review how to count degrees of freedom in interaction models. Suppose variable x1 has n1 levels, and x2 has n2 levels. When you run -regression_command outcome i.x1##i.x2-, the n1*n2 combinations of x1 and x2 (assuming none are omitted due to perfect prediction or non-occurrence in the data) are represented by n1-1 "main" effects for x1, n2-1 "main" effects for x2, and (n1-1)*(n2-1) "interaction" effects (which represent combinations of the levels of x1 and x2, omitting the reference categories of each). If you add that up it's (n1-1) + (n2-1) + (n1-1)*(n2-1) = n1*n2 - 1 different terms.

And that's exactly what you have. HTRA and ARMS each have 3 levels. So you should expect 2 HTRA terms, 2 ARMS terms, and 4 HTRA#ARMS terms. If you count them you can see that they are all there. The 9 possible combinations of HTRA and ARMS are represented by these 8 terms (and the constant, which carries the information when HTRA and ARMS are both in their reference categories).
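To turn those terms into an odds ratio for one joint genotype against the double-reference category (HTRA GG with ARMS GG), the relevant coefficients are added on the log-odds scale and then exponentiated. The following is a minimal sketch for HTRA = 2 (GA) with ARMS = 2 (GT). It assumes that -lincom- works after -firthlogit- as it does after other estimation commands, and the variable name joint is hypothetical:

```stata
firthlogit Outcome i.HTRA##i.ARMS

* Joint OR for HTRA GA + ARMS GT versus HTRA GG + ARMS GG:
* exp(b[2.HTRA] + b[2.ARMS] + b[2.HTRA#2.ARMS])
lincom 2.HTRA + 2.ARMS + 2.HTRA#2.ARMS, eform

* Alternative (an assumption, not something used above): code the
* 9 combinations as a single variable, so that each reported OR is
* already a joint OR against the first combination
egen joint = group(HTRA ARMS), label
firthlogit Outcome i.joint, or
```

When one of the two variables is at its reference level, the sum collapses to a single term. For example, the joint OR for HTRA GA with ARMS GG is just the OR reported for 2.HTRA.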

Putting this all together is difficult if you don't have a lot of experience with it. The -margins- command makes it much easier to see what is going on. Re-run the first -firthlogit- command and then run -margins HTRA#ARMS- and you will get the predicted probabilities of your outcome (AMD) for each of the 9 combinations.
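A sketch of that sequence, using the variable names from #1. Whether the first -margins- call reports probabilities or linear predictions depends on what -predict- returns by default after -firthlogit-, which is an assumption to check in its help file. The second call requests probabilities explicitly:

```stata
firthlogit Outcome i.HTRA##i.ARMS

* Predictions for each of the 9 HTRA-by-ARMS combinations
margins HTRA#ARMS

* If the values above are on the log-odds scale, convert them
* to probabilities with the inverse logit of the linear prediction
margins HTRA#ARMS, expression(invlogit(predict(xb)))
```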

For the other models, your data do not contain all of the possible combinations of HTRA with CFH and ARMS with CFH. Consequently the analysis cannot provide any information about the categories for which there are no data. Again, it is simpler to see exactly what's going on if you run -margins- after the regression. But in this case, the -margins- command will not show you all 9 combinations either, because your data are not informative about all of them: -margins- will show you what your data do have information about.
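A quick way to check which combinations are missing is a plain cross-tabulation. This is a sketch that assumes CFH is coded like the other two genotype variables:

```stata
* Cells with a count of 0 are genotype combinations absent from the data
tabulate HTRA CFH
tabulate ARMS CFH
```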

So, there is nothing wrong with what Stata is doing, and there is nothing wrong with your commands. The problem is that your data set is inadequate for the task you have set for it.
Dew FR

Join Date: Sep 2019

Posts: 7
#5

06 Oct 2019, 17:58

Thanks a lot, Clyde Schechter, for your time and the great explanation. Now I understand what the problem is: my data on CFH are not sufficient.