
  • Joint Risk Genotype Interaction

    Hi,

    Does anyone know how to do analysis of Joint Risk Genotype Interaction?

    I did a case-control study analyzing the effect of polymorphisms (HTRA and ARMS) on AMD. I have excluded all missing data from the analysis.


    I used this command:

    . logistic Outcome i.HTRA##i.ARMS

    HTRA: 1=GG 2=GA 3=AA
    ARMS: 1=GG 2=GT 3=TT
    AMD (Outcome): 1=disease 0=no disease

    It resulted in:
    note: 2.HTRA#1.ARMS != 0 predicts failure perfectly
    2.HTRA#1.ARMS dropped and 1 obs not used

    note: 3.HTRA#1.ARMS != 0 predicts failure perfectly
    3.HTRA#1.ARMS dropped and 3 obs not used

    note: 2.HTRA#3.ARMS omitted because of collinearity
    note: 3.HTRA#3.ARMS omitted because of collinearity

    My expectation is that I can get the OR for
    HTRA GG vs ARMS GG = reference
    HTRA GG vs ARMS GT
    HTRA GG vs ARMS TT
    HTRA GA vs ARMS GG
    HTRA GA vs ARMS GT
    HTRA GA vs ARMS TT
    HTRA AA vs ARMS GG
    HTRA AA vs ARMS GT
    HTRA AA vs ARMS TT

    I don't know why I can't get OR for all 9 possibilities.
    [Attached screenshot: Screenshot 2019-09-20 at 14.28.08.png]

  • #2
    Your expectations are misguided. Even in the best of circumstances, there are only 9 possible combinations of HTRA and ARMS, so at most there could only be 8 odds ratios, as one of the categories must serve as the reference (denominator) for the odds ratios of the others.

    But, worse, you are not in the best of circumstances. As Stata has told you in its output, two combinations, 2.HTRA#1.ARMS and 3.HTRA#1.ARMS, had to be omitted from the model because the outcome disease never occurs with either of those combinations. When you have a level of a variable (or of an interaction, as here) where the outcome never varies, that is called perfect prediction. With perfect prediction, the maximum likelihood estimate of the logistic regression coefficient is negative infinity (if the outcome never occurs) or positive infinity (if the outcome always occurs). Since perfect prediction implies that the logistic regression model cannot be convergently estimated, Stata (and all other statistical packages I am familiar with) omits those levels from the model.

    Then other problems have cascaded from that. When HTRA = 2 or 3, ARMS = 1 is now excluded, which means that ARMS must be either 2 or 3. So ARMS is reduced to a dichotomous variable when HTRA = 2 or 3; consequently ARMS = 2 and 3 can't both be in the model: there is collinearity (which Stata also told you). So for each of HTRA = 2 and HTRA = 3, one of the ARMS levels of the interaction has to get dropped.

    These explain all of your empty and omitted categories. None of them are estimable with logistic regression from this data.

    If you want to work around this, try using Joseph Coveney's -firthlogit- (available from SSC). It uses penalized maximum likelihood estimation, which enables you to retain levels of variables that lead to perfect prediction. It also provides better estimates of the coefficients when the number of observations in a group is small. (The exclusion of 2.HTRA#1.ARMS and 3.HTRA#1.ARMS only led to 4 observations being dropped, so these classes are very small and, even if they were not omitted, their estimates from -logistic- would have been biased.)
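
    A minimal sketch of how one might check this, assuming the data from #1 are in memory with the variable names and codings given there (Outcome, HTRA, ARMS). The cross-tabulations show which genotype combinations have no cases or no controls; -firthlogit- is user-written, so it has to be installed from SSC before use.

    ```stata
    * Cell counts of the 3 x 3 genotype table, separately for controls and cases.
    * A combination that appears in only one of the two tables is perfectly
    * predicted; one that appears in neither does not occur in the data at all.
    tabulate HTRA ARMS if Outcome == 0
    tabulate HTRA ARMS if Outcome == 1

    * Install the user-written command once, then refit with penalized likelihood
    ssc install firthlogit
    firthlogit Outcome i.HTRA##i.ARMS, or
    ```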



    • #3
      Dear Clyde Schechter

      Thanks a lot for a great explanation.

      My mistake, I meant that I expect to get 8 ORs (with 1 reference). I have tried using the firthlogit command, but it didn't give 8 ORs. It gave 4 OR results.
      . firthlogit Outcome i.HTRA##i.ARMS, or

      [Attached screenshot: Screenshot 2019-10-06 at 21.49.19.png]



      Also, when I tried to apply this to other gene data, it resulted in a similar collinearity problem, although no missing data were included in the analysis.
      . firthlogit Outcome i.HTRA##i.CFH, or
      . firthlogit Outcome i.ARMS##i.CFH, or

      [Attached screenshot: Screenshot 2019-10-06 at 22.10.12.png]

      [Attached screenshot: Screenshot 2019-10-06 at 22.10.23.png]


      I am not sure how to solve this. I am glad if you can help me.

      I really appreciate your help and guidance.

      Kind regards,
      Dew




      • #4
        Ok, let's review how to count degrees of freedom in interaction models. Suppose variable x1 has n1 levels, and x2 has n2 levels. When you run -regression_command outcome i.x1##i.x2-, the n1*n2 combinations of x1 and x2 (assuming none are omitted due to perfect prediction or non-occurrence in the data) are represented by n1-1 "main" effects for x1, n2-1 "main" effects for x2, and (n1-1)*(n2-1) "interaction" effects (which represent combinations of the levels of x1 and x2, omitting the reference categories of each). If you add that up it's (n1-1) + (n2-1) + (n1-1)*(n2-1) = n1*n2 - 1 different terms.

        And that's exactly what you have. HTRA and ARMS each have 3 levels. So you should expect 2 HTRA terms, 2 ARMS terms, and 4 HTRA#ARMS terms. If you count them you can see that they are all there. The 9 possible combinations of HTRA and ARMS are represented by these 8 terms (and the constant which carries the information when HTRA and ARMS are both in their reference categories.)
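
        To make the link between the 8 terms and the 9 combinations explicit, here is the standard parameterization written out. The symbols (beta for HTRA, gamma for ARMS, delta for the interaction) are notation introduced here for illustration, not something Stata prints; all terms involving a reference level are zero.

        ```latex
        \[
        \log\text{odds}(\mathrm{HTRA}=j,\ \mathrm{ARMS}=k)
          = \beta_0 + \beta_j + \gamma_k + \delta_{jk},
        \qquad \beta_1 = \gamma_1 = \delta_{1k} = \delta_{j1} = 0
        \]
        \[
        \mathrm{OR}_{jk \text{ vs } 11}
          = \exp(\beta_j + \gamma_k + \delta_{jk})
          = e^{\beta_j}\, e^{\gamma_k}\, e^{\delta_{jk}}
        \]
        \[
        \text{number of terms: } (3-1) + (3-1) + (3-1)(3-1) = 2 + 2 + 4 = 8 = 3 \cdot 3 - 1
        \]
        ```

        So the odds ratio for a joint genotype against GG/GG is the product of up to three of the exponentiated coefficients in the output, not any single one of them.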

        Putting this all together is difficult if you don't have a lot of experience with it. The -margins- command makes it much easier to see what is going on. Re-run the first -firthlogit- command and then run -margins HTRA#ARMS- and you will get the predicted probabilities of your outcome (AMD) for each of the 9 combinations.
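
        A sketch of those steps, assuming the data from #1 are in memory. Two points here are assumptions rather than something confirmed in this thread: that -margins- after -firthlogit- may report the linear prediction (log odds) by default, in which case the -expression()- form below asks for probabilities explicitly; and that -lincom- with the -or- option is a convenient way to get the joint odds ratio for one combination against the GG/GG reference.

        ```stata
        * Refit the model from #3
        firthlogit Outcome i.HTRA##i.ARMS, or

        * Predictions for each of the 9 HTRA x ARMS combinations
        margins HTRA#ARMS

        * Assumption: if the line above is on the log-odds scale, request the
        * inverse-logit of the linear prediction to get probabilities
        margins HTRA#ARMS, expression(invlogit(predict(xb)))

        * Joint odds ratio for one combination versus GG/GG,
        * here HTRA = GA (2) together with ARMS = GT (2)
        lincom 2.HTRA + 2.ARMS + 2.HTRA#2.ARMS, or
        ```

        Repeating the -lincom- line with other level numbers gives the remaining combinations; for a combination where one gene is at its reference level, only the other gene's main-effect term is needed.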

        For the other models, your data does not contain all of the possible combinations of HTRA with CFH and ARMS with CFH. Consequently the analysis cannot provide any information about the categories for which there are no data. Again, it is simpler to see exactly what's going on if you run -margins- after the regression. But in this case, the -margins- command will not show you all 9 combinations either, because your data are not informative about all of them: -margins- will show you what your data does have information about.
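
        One way to see which combinations are missing, assuming CFH is in the same data set as the other variables (its coding is not given in this thread):

        ```stata
        * Any cell with a count of zero is a combination that never occurs,
        * so no coefficient can be estimated for it
        tabulate HTRA CFH
        tabulate ARMS CFH
        ```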

        So, there is nothing wrong with what Stata is doing, and there is nothing wrong with your commands. The problem is that your data set is inadequate for the task you have set for it.



        • #5
          Thanks a lot, Clyde Schechter, for your time and a great explanation. Now I understand the problem: my data on CFH are not sufficient.
