
  • Help with the commands and specifics of a Logit Regression model in STATA

    Hi everyone, I'm working with a logit regression model and having some difficulties with it in STATA. If anyone can offer any help, it would be greatly appreciated. My task is to estimate a logit model with robust, clustered standard errors that allow observations to be correlated within a group; additionally, I have to include dummy variables (one for each federal state).
    1) From what I found online, the command is simply "logit depvar [indepvars]". Is there anything else I need to do, or is it really that simple?
    2) Do I need to put an "i." in front of the dummy variables that I include?
    3) STATA keeps dropping some variables due to collinearity. Is there an option to stop it from doing this? (They aren't perfectly collinear and I want to keep them.)
    4) I keep ending up with the error "var_1 != 0 predicts failure perfectly". Does anyone know what this means? I can't figure it out.
    5) Lastly, how do I compute the marginal effects at the mean?

    I know it's a lot, but if anyone has any advice or suggestions for any of these, I'd love to hear it.

  • #2
    You write "from what I found online". It sounds to me (and I might well be wrong) as though you are not making use of the excellent documentation included with Stata.

    When I began using Stata in a serious way, I started by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. All of these manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through Stata's Help menu. The objective in doing this was not so much to master Stata as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax.

    With regard to your specific questions:

    1) is discussed at great length in help logit, help logistic, and the documentation for the logit and logistic commands found in the Stata Base Reference Manual PDF.

    2) is discussed in the section on Factor Variables in chapter 11 of the Stata User's Guide PDF.

    3) cannot be addressed without a better understanding of your data and the command you issued. Follow the guidance in the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. See especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using CODE delimiters, as described in section 12 of the FAQ.

    4) means exactly what it says: every observation with var_1 not equal to zero is a failure (that is, has the dependent variable equal to zero). Perhaps someone else will elaborate on the practical implications of this for your model.

    5) is discussed in help margins and the documentation for the margins command found in the Stata Base Reference Manual PDF, which search margins would have directed you to.
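
    A minimal sketch of the whole workflow asked about in questions 1, 2 and 5, assuming the nlsw88 example dataset that ships with Stata, with union membership as the binary outcome and industry standing in for the federal-state grouping. With your own data the analogous commands would look something like -logit depvar indepvars i.state, vce(cluster state)- followed by -margins, dydx(*) atmeans-, where depvar, indepvars and state are placeholder names. Check the output against the manual entries cited above rather than taking this as the definitive specification.

    ```stata
    * Example data shipped with Stata (industry plays the role of "state")
    sysuse nlsw88, clear

    * Logit with cluster-robust SEs (observations may be correlated within
    * an industry) plus a full set of industry indicators via i.industry
    logit union age grade i.race i.industry, vce(cluster industry)

    * Marginal effects at the means (MEM) for age, grade and race, with all
    * covariates, including the industry indicators, held at their means
    margins, dydx(age grade race) atmeans

    * The same model reported as odds ratios instead of coefficients
    logistic union age grade i.race i.industry, vce(cluster industry)
    ```

    Dropping -atmeans- gives average marginal effects instead; in a logit model the two can differ noticeably, so which one to report is a substantive choice.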

    Note also that the correct spelling is Stata rather than STATA.



    • #3
      The answers to all of your questions are in the help files and user's manual entries for the commands you're trying to use. All of them. Advice would be to try there first.



      • #4
        Michelle:
        I agree with all the previous helpful replies.
        As far as your question 3) is concerned, possible bugs apart, Stata is often smarter than the user (that's my experience, at least) at detecting methodological errors and misdemeanors; hence, I find it difficult to believe that the variable(s) omitted due to collinearity should have been kept in the model.
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          2) Do I need to put an "i." in front of the dummy variables that I include?
          3) STATA keeps dropping some variables due to collinearity. Is there an option to stop it from doing this? (They aren't perfectly collinear and I want to keep them.)
          4) I keep ending up with the error "var_1 != 0 predicts failure perfectly". Does anyone know what this means? I can't figure it out.
          Regarding 2. If you have a variable that is already dichotomous, or a polytomous variable, you should not create your own indicator ("dummy") variables for its levels. Rather, you should enter it with i. in front of it. While this is convenient in all circumstances, it is essential if you want to make proper use of the -margins- command later. The i. notation is called factor-variable notation in Stata, and you should read about it in -help fvvarlist- and the corresponding manual section. So if you have, say, a variable called religion which encompasses many levels such as "Christian, Muslim, Jewish, Hindu, Buddhist, Shinto" and more, you just enter it as i.religion, and Stata takes care of the rest for you.
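
          A small sketch of why factor-variable notation matters for -margins-, assuming the nlsw88 example data (race is coded 1 white, 2 black, 3 other). The indicators race_black and race_other are hypothetical hand-made dummies created only for the contrast; the logit coefficients are the same either way, but -margins- only knows about the categories in the first version.

          ```stata
          sysuse nlsw88, clear

          * Factor-variable notation: logit knows race is categorical
          logit union age grade i.race
          * Predicted probability of union membership for each race category
          margins race
          * Discrete change in that probability relative to the base (white)
          margins, dydx(race)

          * Hand-made indicators give the same logit coefficients, but margins
          * now sees two unrelated continuous variables: -margins race- is no
          * longer possible, and dydx() returns derivatives, not discrete changes
          generate byte race_black = race == 2
          generate byte race_other = race == 3
          logit union age grade race_black race_other
          margins, dydx(race_black race_other)
          ```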

          Regarding 3. If there were such an option, you would regret using it. Including variables that are close enough to perfectly collinear to cause Stata to drop one or more of them would destabilize the estimation of their coefficients. You would get coefficient results that are meaningless, with astronomical standard errors. Now, if these are variables that you want to include in the model solely to adjust for their nuisance effects, and you have zero interest in their effects directly, that might not be so terrible, but there is really nothing gained by doing that. You are still better off with a reduced selection from among them.
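
          To see the destabilization in action, a sketch assuming the auto example data and a hypothetical variable z built to be nearly, but not exactly, 2*mpg. Because the collinearity is not exact, Stata keeps both variables, which shows roughly what forcing near-collinear variables into a model would buy you. The seed and the noise level are arbitrary choices.

          ```stata
          sysuse auto, clear
          set seed 12345

          * Hypothetical z: 2*mpg plus a little random noise, so it is nearly
          * (but not exactly) collinear with mpg
          generate double z = 2*mpg + rnormal(0, 0.1)

          * Baseline model: note the standard error of the mpg coefficient
          logit foreign mpg weight
          scalar se_alone = _se[mpg]

          * Add the near-duplicate: both variables stay in the model, but the
          * mpg and z coefficients become unstable and their SEs balloon
          logit foreign mpg z weight
          scalar se_with = _se[mpg]

          display "SE of mpg without z: " se_alone
          display "SE of mpg with z:    " se_with
          ```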

          I should point out, however, that even if the offending variables are not exactly collinear in the entire data set, they might be in the estimation sample. Remember that any observation in your data set which has a missing value on any variable that is part of the model is excluded from the estimation. It is entirely possible that, when restricted to the remaining observations, the variables in question are in fact perfectly collinear.
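
          A contrived sketch of this situation, assuming the auto example data, where rep78 has five missing values. The hypothetical variable z equals 2*mpg everywhere except in those five rows, so mpg and z are collinear only in the estimation sample. The programmer's command -_rmcoll- (documented in the Programming Reference Manual) is used here to check a varlist for collinearity with and without the estimation-sample restriction.

          ```stata
          sysuse auto, clear

          * Hypothetical z: exactly 2*mpg, except where rep78 is missing
          generate z = 2*mpg
          replace z = z + 1 if missing(rep78)

          * In the full data, mpg and z are not collinear (0 omitted)...
          _rmcoll mpg z
          display r(k_omitted)

          * ...but including rep78 removes the rows with missing rep78, and in
          * the remaining estimation sample z is exactly 2*mpg, so one is omitted
          logit foreign mpg z rep78

          * Repeat the check restricted to the estimation sample (1 omitted)
          _rmcoll mpg z if e(sample)
          display r(k_omitted)
          ```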

          Regarding 4. William has already pointed out what this means. I am here responding to his invitation for somebody to explain the practical implications. Logistic regression models are estimated by maximum likelihood in the -logit- command. When a variable perfectly predicts the outcome at one of its levels, the maximum likelihood estimate of its coefficient does not exist as a finite number: the likelihood keeps increasing as the coefficient heads toward positive or negative infinity. If Stata were to proceed with the model including this variable, you would get an endless series of iterations with that coefficient diverging toward infinity, and convergence could never be achieved. Stata is smart enough to recognize this situation ahead of time and avoids it by dropping the variable together with the observations it predicts perfectly. If your research goals truly require inclusion of this variable in your model, there are penalized maximum likelihood estimators for logistic regression which can handle this situation and provide a finite estimate. If you run -search penalized logistic- you will find links to some user-written programs you can download for the purpose. There is also an "exact" logistic command (-search exact logistic-; it is "exact" in the same sense that the Fisher exact test is exact) which, I believe, also handles this situation but is only usable with relatively small data sets because it requires large amounts of memory and is computationally intensive.
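
          A sketch of the diagnosis and of the two behaviors described above, assuming the auto example data and a hypothetical indicator lowrep (repair record of 1 or 2). In these data no car with a repair record of 1 or 2 is foreign, so lowrep reproduces the "predicts failure perfectly" message. The -asis- option of -logit- forces the perfect predictor to stay in the model, which exposes the diverging coefficient; the built-in -exlogistic- command is shown as one exact alternative and should report a median unbiased estimate for lowrep.

          ```stata
          * Example data shipped with Stata
          sysuse auto, clear

          * Hypothetical indicator: 1 if the 1978 repair record is 1 or 2
          generate byte lowrep = rep78 <= 2 if !missing(rep78)

          * Diagnose: no car with lowrep == 1 is foreign, so every such
          * observation is a "failure" (foreign == 0)
          tabulate lowrep foreign

          * Default: logit notes that lowrep != 0 predicts failure perfectly,
          * then drops lowrep and the 10 observations it predicts
          logit foreign lowrep mpg
          estimates store dropped

          * -asis- keeps the perfect predictor; expect a huge negative
          * coefficient and standard error rather than a usable estimate
          logit foreign lowrep mpg, asis
          estimates store kept

          * Exact logistic regression; feasible here only because N is small
          exlogistic foreign lowrep
          ```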
          Last edited by Clyde Schechter; 12 Jun 2016, 12:06. Reason: Correct grammatical error.

