
  • Logistic regression: Note: 7 failures and 0 successes completely determined.

    Dear Community,

    I am doing a logistic regression.

    My Y-Variable is 1 if a company records non-GAAP earnings and 0 otherwise.
    My X-Variables are several continuous and dummy variables as for example log(market capitalization) and so on.
    Since I have 5 observations per firm, I clustered the standard errors per firm.

    Following is the output of my regression:

    [Screenshot of the logit regression output (Unbenannt2.png): standard errors reported for all predictors; most coefficients not statistically significant.]


    My first problem is that nearly none of the x-variables is statistically significant. However, even if the results are not significant, they can still be valuable for interpretation.

    However, I receive the comment "Note: 7 failures and 0 successes completely determined." According to the Stata FAQ (https://www.stata.com/support/faqs/s...ic-regression/), there are two possible explanations for this message.

    The "important note" on that page says: "Here there will be no missing standard errors. If you have a missing standard error in your output, see Case 2 below."
    Now my first question is: how do I determine whether there are missing standard errors in my model?
    And my second question: if there really are missing standard errors, how can I best proceed?

    Thank you in advance for your help.

    Andrea

  • #2
    how do I determine whether there are missing standard errors in my model?
    You just look at the Std. Err. column in the regression output and see whether any of them are missing (which would be shown as a . character). In the example you show, there are no missing standard errors.
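    A quick check can also be scripted, as a sketch (the outcome and predictor names below are placeholders, not from your model): the reported standard errors are the square roots of the diagonal of the stored variance matrix e(V), so a missing standard error corresponds to a missing diagonal entry.

    Code:
    logit y x1 x2, vce(cluster firm)    // placeholder model
    matrix V = e(V)
    forvalues i = 1/`=rowsof(V)' {
        display sqrt(V[`i',`i'])        // a missing SE displays as .
    }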

    Not raised as a question by you, but why are you using -logit- when you have nested data? Shouldn't you be using -xtlogit- or -melogit-?

    The use of cluster-robust standard errors requires a sufficiently large number of clusters for validity. You have only 12. While some might say that 12 is enough, others will object that it is not. Be prepared for criticism on this basis. Probably it would be best to do the analysis twice, once with clustering and once without--if the results are similar, you are home free.
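    As a sketch, that side-by-side comparison could be done with -estimates store- and -estimates table- (variable names are placeholders):

    Code:
    logit y x1 x2, vce(cluster firm)    // cluster-robust SEs
    estimates store clustered
    logit y x1 x2                       // conventional SEs
    estimates store conventional
    estimates table clustered conventional, b(%9.4f) se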

    Finally, please read the FAQ where it very clearly asks people not to use screenshots to show Stata output. This one was readable (on my computer), but often they are not, and an unreadable screenshot just delays your getting a useful response and wastes somebody's time asking you to repost. So in the future, do it right the first time. FAQ #12 tells you how to use code delimiters to post Stata code and output. Please read the FAQ and follow the excellent advice therein; it will make your use of the Forum more efficient and productive.

    • #3
      Welcome to the Stata Forum/Statalist.

      With regard to your first question, the output shared in #1 shows that there are standard errors for all predictors.

      Note: crossed with Clyde's reply, who gave a broader (and better) perspective.
      Last edited by Marcos Almeida; 23 Aug 2017, 09:35.
      Best regards,

      Marcos

      • #4
        Andrea:
        some asides to the previous helpful comments:
        - the fact that most of your coefficients do not reach statistical significance is simply a matter of fact (that is, no good or evil in itself); hence they are surely worth explaining/interpreting;
        - you are seemingly dealing with a panel dataset composed of 128 firms with 5 observations each. Although with 128 clusters the robustified standard errors work well, I share Clyde's suggestion to switch to -xtlogit-;
        - finally, I would devote some time to learning the highly rewarding magic of -fvvarlist- notation for categorical variables (and interactions, too). Moreover, -fvvarlist- works very well with two other wonderful Stata commands, -margins- and -marginsplot-.
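        As a minimal sketch of that notation (the variable names are placeholders): the i. prefix marks a categorical variable, c. a continuous one, and # builds interactions; -margins- and -marginsplot- then summarize and plot the effects.

        Code:
        logit y i.group c.size i.group#c.size
        margins group          // predicted probability of y=1 by group
        marginsplot            // plot those margins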
        Kind regards,
        Carlo
        (Stata 19.0)

        • #5
          First I'd like to say thank you to all of you for the helpful comments.

          Sorry, Clyde. It's the first time I have used such a forum, and I normally only read an FAQ page if I have questions, which in this case I didn't have. Now I know, and I won't post any screenshots anymore.

          Regarding your input about -xtlogit-: my statistics knowledge is very basic. I just checked the Stata manual on -xtlogit- and found that there are basically three ways to run such a model: one that fits random-effects models, one that fits conditional fixed-effects models, and a third that fits population-averaged logit models. Carlo is right - I have data on 128 firms, each with 5 observations. Which model do you think would be the most applicable?
          From what I just read about these different models, I would go with a fixed-effects model like this:
          Code:
          xtlogit Y X1 X2 ... Xn, fe
          As I understand it, fitting such a model already accounts for the grouping, so I no longer have to cluster the standard errors by company (cluster(company)) as I did in the example in my first post. Is that assumption correct?

          Carlo, yes, that's correct, and I'll definitely keep some of them in the model even though they don't prove significant, just because they have explanatory character. Thanks for the comments on -fvvarlist-, -margins-, and -marginsplot-. I'll have a look at them over the weekend.

          Best,

          Andrea

          • #6
            In the world of finance and economics, of the three -xtlogit- estimators, -fe- is far and away the most commonly used. That is not to say that it is always the best, but it should probably be the first one you try. The -pa- estimator is not nearly as widely used; it estimates something different from the other two, and its use is reserved for when that is what is wanted.

            If you use -xtlogit, fe-, you gain automatic adjustment of the results for any time-invariant factors, observable or not, that characterize the different firms. You do not automatically get adjustment for correlation of errors within firms. For that, you need to specify -vce(robust)- [which, in this estimator, though generally not in others, Stata interprets to mean -vce(cluster firm)-]. However, as I noted earlier, the use of the cluster robust estimator is questionable with only 12 clusters. So, as before, I would do it both with and without -vce(robust)- and hope the results aren't very different.

            Let's revisit -fe- vs -re-. The random-effects estimator is more efficient: it will generally provide smaller standard errors (which translate into narrower confidence intervals and more statistical power) than the fixed-effects estimator. However, the random-effects estimator does not provide consistent estimates if there are correlations between the error terms and the predictors in the model. For that reason, people often use the Hausman test (-help hausman-) to choose between -fe- and -re-.
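            As a sketch (placeholder variable names; the consistent -fe- model is listed first, the efficient -re- model second, as -hausman- expects):

            Code:
            xtlogit y x1 x2, fe
            estimates store fe
            xtlogit y x1 x2, re
            estimates store re
            hausman fe re       // a small p-value favors -fe-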

            • #7
              Thank you Clyde. I'll definitely have to read some more literature about fe and re models.
              I tried now to do an fe model. I started with defining the data set as panel data:

              Code:
              xtset company fy, yearly

              the answer was:

              panel variable: company (strongly balanced)
              time variable: fy, 2012 to 2016
              delta: 1 year

              Then I fitted the model with
              Code:
              xtlogit nongaap12 rel_sales listing leverage sd_roa abnorm intang pb_ratio report_dummy goodwill_dummy earn_surp sd_price size, fe vce(robust)

              note: multiple positive outcomes within groups encountered.
              note: 99 groups (495 obs) dropped because of all positive or all negative outcomes.
              note: listing omitted because of no within-group variance.

              Iteration 0:   log pseudolikelihood = -55.883912
              Iteration 1:   log pseudolikelihood = -53.326164
              Iteration 2:   log pseudolikelihood = -53.301835
              Iteration 3:   log pseudolikelihood = -53.301831

              Conditional fixed-effects logistic regression   Number of obs      =       145
              Group variable: company                         Number of groups   =        29

                                                              Obs per group: min =         5
                                                                             avg =       5.0
                                                                             max =         5

                                                              Wald chi2(11)      =     10.63
              Log likelihood = -53.301831                     Prob > chi2        =    0.4751

                                             (Std. Err. adjusted for clustering on company)
              ------------------------------------------------------------------------------
                           |               Robust
                 nongaap12 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                 rel_sales |  -.6484157   1.235027    -0.53   0.600    -3.069024    1.772192
                  leverage |   .0173991   .0560088     0.31   0.756    -.0923762    .1271744
                    sd_roa |  -.0128168   .0926885    -0.14   0.890    -.1944829    .1688494
                    abnorm |  -.0019152   .0072584    -0.26   0.792    -.0161414    .0123111
                    intang |   -5.10209   5.421687    -0.94   0.347     -15.7284    5.524222
                  pb_ratio |  -.2060383   .3148468    -0.65   0.513    -.8231268    .4110501
              report_dummy |  -.7111733   1.999535    -0.36   0.722    -4.630191    3.207844
              goodwill_d~y |   1.941722   1.079259     1.80   0.072    -.1735864    4.057029
                 earn_surp |  -.0021266   .0030136    -0.71   0.480    -.0080331    .0037799
                  sd_price |  -.0295124   .0202485    -1.46   0.145    -.0691987    .0101739
                      size |   1.058993   2.724276     0.39   0.697    -4.280491    6.398476
              ------------------------------------------------------------------------------

              (Sorry, I couldn't format it more nicely.) Anyway, the model is now completely insignificant, and I have no idea what the three notes mean.

              • #8
                Andrea:
                with such a scant number of observations included in the regression model (145, that is, 5 observations for each of the 29 retained groups), the lack of statistical significance (for what it's worth) is not surprising at all.
                As an overall opinion, your sample seems to have little variation, especially in the dependent variable (495 obs were deleted).
                Kind regards,
                Carlo
                (Stata 19.0)

                • #9
                  Carlo, this is what I don't understand.
                  It's a logistic model, so my dependent variable is binary. Out of 640 observations, the dependent variable takes the value 1 in 310 observations.

                  • #10
                    Andrea:
                    my guess is that you have 310 observations that always take on the value 1 within their firm and (495-310)=185 that always take on the value 0 (hence: no within-group variance).
                    Kind regards,
                    Carlo
                    (Stata 19.0)

                    • #11
                      Hi Carlo
                      So I have 128 firms with 5 observations each. And yes, of course there are firms that always (in each of the 5 observations) have the value 1 for the dependent variable, and there are also firms that always have the value 0.
                      Is this what you mean? If yes, how can I alter the model to account for that?

                      • #12
                        Andrea Zaugg You may be dealing with a "perfect prediction" or "perfect separation" issue.


                        You may wish to read this thread, particularly my suggestion in #3 to use a penalized MLE. There is -firthlogit-, written by Joseph Coveney, a very active member of this forum.

                        Actually, I recommend you take a look at the user-written program and see for yourself, for I haven't so far needed to use penalized MLE myself, hence I'm not aware whether this method "embraces" the remaining assumptions you selected for the model.
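                        Installing and trying it might look like this (a sketch; the predictors x1 and x2 are placeholders, and -firthlogit- is a user-written command from SSC):

                        Code:
                        ssc install firthlogit
                        firthlogit nongaap12 x1 x2    // penalized (Firth) logistic regression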

                        Good luck!

                        Best regards,

                        Marcos

                        • #13
                          Andrea:
                          yes, that is what I meant.
                          I would follow Marcos' suggestion.
                          Kind regards,
                          Carlo
                          (Stata 19.0)

                          • #14
                            Re #7:

                            Let's go over the three notes one at a time. All of them are warnings to notify you about conditions in the data that you may not have expected. They are not necessarily serious problems, but they could be serious if these conditions are not supposed to arise in your particular data.

                            note: multiple positive outcomes within groups encountered.
                            -xtlogit- is often used in a situation where within each group (panel) there is exactly one positive outcome and all the others are negative. Stata is telling you that your data are not like that, that your groups may include more than one positive outcome. There is nothing wrong with that in principle, but if your data are supposed to only have one positive outcome per group, then your data are not correct and you need to look into it.

                            note: 99 groups (495 obs) dropped because of all positive or all negative outcomes.
                            In a fixed-effects logistic regression, any groups where the outcome is the same for all observations in the group are uninformative, and they are dropped from the estimation sample. Again, this isn't illegal, and it isn't necessarily a problem. But if you were expecting your data to have a mix of positive and negative outcomes within each group (as is typically the case in data sets where people use -xtlogit-), Stata is just letting you know that you need to re-check your data. If it is expected that some groups will have all-0 or all-1 outcomes, then there is nothing wrong here. Just bear in mind that those groups are uninformative; there is no "adjusting" the model for it--they don't carry any information about the outcome-predictor relationships, and there is no legitimate analysis that will tell you otherwise.

                            note: listing omitted because of no within-group variance.
                            In any fixed-effects model, a covariate that is constant within groups is necessarily collinear with the fixed effects, and therefore has to be omitted from the model. Stata is letting you know that this is true of your variable listing. Again, this isn't necessarily a problem. It's only a problem if estimating the effect of listing is an important part of your project. In that case it's a fatal problem, because it cannot be done! But as long as listing was included only to adjust for its possible confounding effects, then there is nothing to worry about--the fixed effects themselves do that job for you, and listing is just ignored.
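                            As a sketch, you can flag in advance the firms that -xtlogit, fe- will drop, using the variable names from #7: exactly the groups whose outcome never varies.

                            Code:
                            bysort company: egen min_y = min(nongaap12)
                            bysort company: egen max_y = max(nongaap12)
                            generate novariation = (min_y == max_y)
                            tabulate novariation    // counts observations in all-0 or all-1 firms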

                            • #15
                              Thanks for your explanations!
                              I had a discussion with my professor yesterday. We will most likely stay with the plain logistic regression. Thanks anyway for your help and input.
