  • Obtaining marginal estimates of an interaction term

    Hi,

    I am running a logit model at the patient level, where the outcome is whether or not a participant entered a certain program. The model has patient- and PCP-level characteristics as predictors and is clustered at the PCP level. As part of my analysis, I fit a model that included an interaction between Practice (each patient is in one of 30+ practices) and a dummy indicating whether the patient was older than 85. I then ran margins on the practice-by-age interaction to get, for each practice, the estimated probability of entering the program with and without the age condition.

    In order to justify using this model, I tried running a full interaction model, interacting every predictor with the age dummy and running an LR test. The LR test was significant, indicating that the additional interaction terms did add predictive capacity to the model. Thus, I tried to obtain the marginal estimates for the interaction term from the full interaction model; however, every single practice/age 85+=yes combo had "not estimable".

    I noticed that when I ran the full interaction model, 2 variables fell out of the model (or rather, one variable fell out of the model, but for the other only one level of the categorical variable fell out). I tried re-running the full interaction model again without those variables and this time, was able to obtain marginal estimates.

    I'd like to understand exactly why the margins command won't run correctly unless the two variables that fell out of the full interaction model were excluded. Relevant portions of my code and output are attached, but I can probably provide more if need be.

    Thanks!
    Attached Files

  • #2
    Please don't post Excel files as attachments. Some of the most active responders on this forum don't use Microsoft Office products. I do, but because these files can contain active and malicious content, I never download them when I don't personally know the sender.

    Everybody on this forum uses Stata! So you could have posted your .smcl log as an attachment. But simpler and better still is to just copy the relevant output from your log or the Results window and paste it into a code block on this forum. (If you don't know how to create a code block, see the FAQ).



    • #3
      This is the code I used:

      Code:
      logit bene_Y2_status_cd_Appr_vs_NHR ///
          female##age_85_and_up ///
          i.pct_below_pov_lev_cat##age_85_and_up ///
          bene_hospice_12_months_pre##age_85_and_up ///
          i.bene_snf_days_12_mths_pre_cat##age_85_and_up ///
          bene_first_pcp_vis_before_2011##age_85_and_up ///
          PCP_female##age_85_and_up ///
          i.pcp_age_cat##age_85_and_up ///
          Pract_Identifier##age_85_and_up, ///
          vce(cluster PCP_NPI) or
      This is the notice I got about the two variables falling out of the model (from the log):

      note: 1.bene_hospice_12_months_pre#1.age_85_and_up != 0 predicts success perfectly
      1.bene_hospice_12_months_pre#1.age_85_and_up dropped and 16 obs not used

      note: 3.bene_snf_days_12_mths_pre_cat#1.age_85_and_up != 0 predicts success perfectly
      3.bene_snf_days_12_mths_pre_cat#1.age_85_and_up dropped and 67 obs not used

      This is the margins command I used and a portion of the output in which no practice/age 85+=1 combination was estimable:

      Code:
      margins Pract_Identifier#age_85_and_up, post asbalanced

      Adjusted predictions
      Model VCE    : Robust

      Expression   : Pr(bene_Y2_status_cd_Appr_vs_NHR), predict()
      at           : female                         (asbalanced)
                     age_85_and_up                  (asbalanced)
                     pct_below_pov_lev_cat          (asbalanced)
                     bene_hospice_12_months_pre     (asbalanced)
                     bene_snf_days_12_mths_pre_cat  (asbalanced)
                     bene_first_pcp_vis_before_2011 (asbalanced)
                     PCP_female                     (asbalanced)
                     pcp_age_cat                    (asbalanced)
                     Pract_Identifier               (asbalanced)

      -----------------------------------------------------------------------------------------------------
                                          |            Delta-method
                                          |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
      ------------------------------------+----------------------------------------------------------------
           Pract_Identifier#age_85_and_up |
                                      5 0 |   .9452707    .040892    23.12   0.000     .8651238    1.025418
                                      5 1 |          .  (not estimable)
                                      8 0 |    .915886   .0575876    15.90   0.000     .8030165    1.028756
                                      8 1 |          .  (not estimable)
                                     10 0 |   .8979817   .0844838    10.63   0.000     .7323964    1.063567
                                     10 1 |          .  (not estimable)



      • #4
        Thanks for the clarifications. The first two warnings:
        Code:
        note: 1.bene_hospice_12_months_pre#1.age_85_and_up != 0 predicts success perfectly
        1.bene_hospice_12_months_pre#1.age_85_and_up dropped and 16 obs not used
        
        note: 3.bene_snf_days_12_mths_pre_cat#1.age_85_and_up != 0 predicts success perfectly
        3.bene_snf_days_12_mths_pre_cat#1.age_85_and_up dropped and 67 obs not used
        mean that whenever bene_hospice_12_months_pre == 1 & age_85_and_up == 1, the outcome bene_Y2_status_cd_Appr_vs_NHR is always 1, so it is impossible to estimate an effect for this interaction term. Similarly, whenever bene_snf_days_12_mths_pre_cat == 3 & age_85_and_up == 1, the outcome is again always 1, and that interaction term cannot be estimated either. Both interaction terms are therefore dropped, and the corresponding observations (16 and 67, respectively) are excluded from the estimation sample. Remember that -logit- fits the model by maximum likelihood, and when a particular variable (or interaction term) has only 1's as corresponding outcomes, the likelihood keeps increasing as that coefficient grows, so the maximum likelihood estimate for that coefficient is infinite!
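
        If you want to see the perfect prediction directly, one quick check (a sketch using the variable names from your command) is to tabulate the outcome within each flagged cell. I am assuming here that missing values on the other predictors are not an issue; if they are, the raw data may show some variation that the estimation sample does not.

        Code:
        * Outcome distribution within the cell flagged by the first note
        tabulate bene_Y2_status_cd_Appr_vs_NHR if bene_hospice_12_months_pre == 1 & age_85_and_up == 1

        * Same check for the cell flagged by the second note
        tabulate bene_Y2_status_cd_Appr_vs_NHR if bene_snf_days_12_mths_pre_cat == 3 & age_85_and_up == 1

        If the notes describe your data, each table should contain only the value 1.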

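        As for the -margins- problem, non-estimable margins usually arise from combinations of factor levels that are not represented in the estimation sample. (The non-estimable margins are not necessarily just the ones which are not represented in the data, by the way.) In your output every age_85_and_up == 1 margin is affected, and you report that the problem goes away when the two variables are removed, so the likely culprits are the dropped interaction terms themselves rather than empty Pract_Identifier by age_85_and_up cells: with asbalanced, each age_85_and_up == 1 margin averages over all levels of the other factors, including the hospice and SNF cells whose coefficients could not be estimated.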
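
        To see which cells are empty or thin in the sample that -logit- actually used, you could run something like the following right after the -logit- command, while its results are still in memory. This is only a diagnostic sketch built from the variable names in your post:

        Code:
        * Cell counts for the margins of interest, restricted to the estimation sample
        tabulate Pract_Identifier age_85_and_up if e(sample)

        * Cell counts for the two predictors involved in the dropped interaction terms
        tabulate bene_hospice_12_months_pre age_85_and_up if e(sample)
        tabulate bene_snf_days_12_mths_pre_cat age_85_and_up if e(sample)

        Cells with a count of 0 in the last two tables correspond to the interaction terms that were dropped.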

        Looking at your command, you have included a very large number of variables and interaction terms. Remember that a categorical variable with k levels contributes k-1 indicator variables to the count, not 1. With the 30+ practices you describe, Pract_Identifier alone accounts for roughly 30 variables, and when you throw in the interactions with age_85_and_up, you're talking roughly 60. (Similar considerations may apply to all your other variables.) It does not surprise me that there are numerous combinations of variables that are either not instantiated in your data, or are represented in such small numbers that, by chance alone, there is no variation in the outcome among them.

        My overall "diagnosis" is that your model is just too complicated. It might work out in a much larger data set, but basically you are trying to squeeze out more information than is there. Give some careful consideration to all those variables and interactions, and choose a smaller subset of variables that are really important from a scientific (not statistical) perspective, and work with those. I'd probably start by dropping interactions unless they are highly motivated by theory or are the focus of the study. And you might also be able to simplify by reducing the number of categories of some of your other variables by combining levels that are sufficiently similar.
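
        As one possible starting point (not the only sensible specification), here is a sketch that keeps every predictor as a main effect and retains only the practice-by-age interaction you are actually interested in. It reuses the names from your command and assumes the two-level predictors are coded 0/1:

        Code:
        * Main effects for all predictors; interaction only for practice by age
        logit bene_Y2_status_cd_Appr_vs_NHR i.female i.pct_below_pov_lev_cat ///
            i.bene_hospice_12_months_pre i.bene_snf_days_12_mths_pre_cat ///
            i.bene_first_pcp_vis_before_2011 i.PCP_female i.pcp_age_cat ///
            i.Pract_Identifier##i.age_85_and_up, vce(cluster PCP_NPI) or

        * Adjusted predictions for each practice, with and without the age condition
        margins Pract_Identifier#age_85_and_up, asbalanced

        Whether this particular reduction is defensible is a substantive question about your study, not something the code can settle.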
        Last edited by Clyde Schechter; 12 May 2015, 08:32.
