  • teffects loops, inconsistent covariate coverage, and perfect predictors

    Hello,
    I am stumped on a coding problem. I am estimating the treatment effect on each of multiple outcomes, for each of multiple treatments, within each of multiple groups in my data set. I am using the canned teffects commands. Unfortunately, in some groups certain variables are perfect predictors, while in other groups those same variables are not. For each group I want to include as many covariates as possible from the full set. Further, my data set will be updated periodically, and bins that are currently 0 may become non-zero in the future, so I need code that is dynamic enough to pick up on the changing data with each future iteration. The problem is that teffects errors out when a perfect predictor is included, rather than omitting that variable. So, I want to create a loop that will

    1. preserve the data
    2. run the logit model with full set of covariates, many of which are categorical factor variables
    3. detect perfect predictors from #2
    4. run teffects while omitting perfect predictors detected in #3
    5. restore the data

    I'm good with steps 1, 2, and 5, but I don't know how (or whether there is a way) to code steps 3 and 4. Any help on this is greatly appreciated.

    On a side note, I understand why teffects errors out when perfect predictors are included, and I tend to agree with the reasoning for balanced data sets and one-off runs of teffects. But my situation is a little different. To this end, I have sent a technical request to Stata to add a non-default option that runs the model while omitting perfect predictors and notes this in the output, as happens elsewhere in Stata.

    Michael

  • #2
    A slight follow-up: I found that logit and probit models both produce the stored matrix result e(rules), which is described as "information about perfect predictors." But I don't know what information it contains, and what is reported is not useful:

    Code:
    . matrix list e(rules)
    
    e(rules)[1,4]
        c1  c2  c3  c4
    r1   0   0   0   0

    • #3
      While it is true that all questions are easier answered if you provide example data (using dataex), this kind of really complex question is particularly difficult to answer without an example to experiment with.

      With regard to the e(rules) matrix, did you run the above after a logit model with perfect prediction? Here is an example of what the matrix looks like after a logit in which variables were dropped due to perfect prediction.

      Code:
      . use http://www.stata-press.com/data/r15/repair.dta, clear
      (1978 Automobile Data)
      
      . logit foreign i.repair
      
      note: 1.repair != 0 predicts failure perfectly
            1.repair dropped and 10 obs not used
      
      note: 3.repair omitted because of collinearity
      Iteration 0:   log likelihood = -26.992087  
      Iteration 1:   log likelihood = -22.483187  
      Iteration 2:   log likelihood = -22.230498  
      Iteration 3:   log likelihood = -22.229139  
      Iteration 4:   log likelihood = -22.229138  
      
      Logistic regression                             Number of obs     =         48
                                                      LR chi2(1)        =       9.53
                                                      Prob > chi2       =     0.0020
      Log likelihood = -22.229138                     Pseudo R2         =     0.1765
      
      ------------------------------------------------------------------------------
           foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
            repair |
                1  |          0  (empty)
                2  |  -2.197225   .7698003    -2.85   0.004    -3.706005   -.6884436
                3  |          0  (omitted)
                   |
             _cons |  -1.85e-17   .4714045    -0.00   1.000    -.9239359    .9239359
      ------------------------------------------------------------------------------
      
      . matrix list e(rules)
      
      e(rules)[2,4]
                 c1  c2  c3  c4
      1b.repair   1   0   0  10
       3.repair   4   0   0   0
      I don't do enough work with matrices to tell you how to extract the row names from a matrix but perhaps someone here can give some advice on how you could extract the variable names of dropped variables, which you could then use to strip those variables out of your logit command for that iteration of the loop.
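
      One possible way to pull the row names out, offered as a minimal sketch rather than a tested solution: copy e(rules) to a named matrix, read its row names with the `: rownames` macro function, and loop over the rows. The names R, terms, and perfect are made up for illustration. The test on column 4 (observations not used) is an assumption inferred only from the output above, not from documentation, so check it against your own models.

      ```stata
      * Sketch only: the meaning of the e(rules) columns is inferred from the
      * output shown in this thread, not from documentation.
      use http://www.stata-press.com/data/r15/repair.dta, clear
      logit foreign i.repair

      matrix R = e(rules)              // copy the stored result to a named matrix
      local terms : rownames R         // row names, one per rule

      local perfect                    // will hold the terms flagged as perfect predictors
      local i = 0
      foreach t of local terms {
          local ++i
          * Assumption: a positive count in column 4 (obs not used) marks a
          * perfect predictor; a collinearity-only row has 0 there.
          if R[`i', 4] > 0 {
              local perfect `perfect' `t'
          }
      }
      display "flagged terms: `perfect'"
      ```

      When nothing is dropped, e(rules) is the single row of zeros shown in #2, so the loop flags nothing and the local stays empty.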

      • #4
        Thanks for the reply, Sarah. This is helpful: I didn't know what the e(rules) matrix should look like, and I could not find any guidance on e(rules) online. (I thought my prior code included a perfect predictor, but I have realized that I misspecified it.) Yes, I think you posed my question correctly: is there a way to pull the row name out of e(rules) when e(rules) indicates a perfect predictor? It looks like this is the case if c1 == 1 or if c4 > 0, but some guidance on this would be helpful.

        Then, is there an automated way to specify a factor variable, in this case i.repair, in the teffects command line so that it omits, in this case, 1.repair based on the output from e(rules)?
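
        One idea for the factor-variable part, sketched under assumptions and not tested with teffects: rather than editing i.repair itself, mark the observations that logit actually used and restrict the later command to them. Factor-variable levels are expanded over the estimation sample, so a level that no longer occurs there should not appear in the expansion. The variable name touse is made up, and the teffects line is a template whose names (y, treat) are hypothetical. Note that this drops the perfectly predicted observations, as logit itself does, which changes the sample the effect is estimated on.

        ```stata
        * Sketch only: restrict to the sample logit kept, then let the
        * factor-variable expansion pick up whichever levels remain.
        use http://www.stata-press.com/data/r15/repair.dta, clear
        logit foreign i.repair

        generate byte touse = e(sample)   // 1 for observations logit actually used

        fvexpand i.repair if touse        // expand levels within the restricted sample
        display "`r(varlist)'"            // inspect which levels remain

        * Template with hypothetical variable names (y = outcome, treat = treatment):
        * teffects ipw (y) (treat i.repair) if touse
        ```

        A continuous perfect predictor would still have to be removed from the covariate list by name, for example with the macro list difference `: list covars - perfect`, where covars and perfect are hypothetical locals holding the full and the flagged variable names.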

        Unfortunately, I am not allowed to share any of the data that I am working with, not even snippets. I should have generated a deidentified and obscured data set or a simulated data set with the same problem that I was experiencing. I will do this in the future.



