
  • Probit regression including (but not limited to) firm fixed effects

    Hello everybody,

    I am doing a research project. I recently finished collecting my data and am now running my regressions, but I have an issue with the fixed effects. My supervisor did not know how to do this either, so he advised me to look online for help. Google and YouTube did not work out, so I'm hoping this forum will!

    My dependent variable is a dummy, so I cannot run a normal OLS regression; I am running a probit regression instead. The first problem I had was the following note:
    note: X9_D != 1 predicts failure perfectly.
    As a result, my sample decreased by 70%. On the internet I found advice to delete this variable from the regression, and then it was fine.
    (I think I understand the principle: in my sample, whenever Y=1, X9=1 as well, always. So that is OK.)
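
    The perfect-prediction pattern described above can be checked directly with a cross-tabulation. A minimal sketch, using the variable names from this post:

    ```stata
    * Cross-tabulate the predictor against the outcome
    tabulate X9_D Y_D
    * Count successes that occur without X9_D == 1; a count of zero matches the note from probit
    count if Y_D == 1 & X9_D != 1
    ```

    If the count is zero, every observation with X9_D != 1 has Y_D == 0, which is what "predicts failure perfectly" refers to.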

    I still have a problem, though, and that is firm fixed effects. In a normal OLS regression I can include them, but not in a probit regression. I made an example dataset with dataex and will explain the variables and the problem. I simplified the names to keep it as simple as possible.

    I collected my data by analysing news articles that report on fraud in a company. I investigated three companies (Ari, Ben and Clair). Variables such as size and press coverage are always the same for a company, regardless of the particular article I am analysing. Many other things, though, vary from article to article (certain words used in the article, the number of words).



    These are the variables that do not vary per article, only per company:
    X1FE
    X2FE
    X3FE_D



    The others (X4_D X5_D X6_D X7 X8_D X9_D) vary per article.

    My dependent variable is Y_D, coded either 1 or 0 depending on the content of the article.

    "probit Y_D X1FE X2FE X3FE_D X4_D X5_D X6_D X7 X8_D" is what I used for my regression, but this does not take into account that X1 X2 X3 are firm fixed effects. Note: I deleted X9_D here (for the reasons I explained earlier in this post).

    Hopefully you can help me with a probit regression with fixed effects!

    Thanks in advance!

    Ruud

    The data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str5 Firm double X1FE int X2FE byte(X3FE_D X4_D X5_D X6_D) int X7 byte(X8_D X9_D Y_D L M N)
    
    "Ari"   9.985248048844232 1339 0 1 0 0  823 0 0 0 . . .
    "Ari"   9.985248048844232 1339 0 1 0 1  226 0 1 1 . . .
    "Ari"   9.985248048844232 1339 0 1 0 0  114 0 1 0 . . .
    "Ari"   9.985248048844232 1339 0 1 0 0  192 1 0 0 . . .
    "Ari"   9.985248048844232 1339 0 1 0 0  244 0 0 0 . . .
    "Ari"   9.985248048844232 1339 0 1 0 1  262 1 1 0 . . .
    "Ari"   9.985248048844232 1339 0 1 0 0  128 0 0 0 . . .
    "Ben"   9.226599905207358 1521 1 0 1 0  519 1 0 0 . . .
    "Ben"   9.226599905207358 1521 1 0 0 0  166 0 0 0 . . .
    "Ben"   9.226599905207358 1521 1 0 0 0  325 0 0 0 . . .
    "Ben"   9.226599905207358 1521 1 0 1 0  469 1 1 0 . . .
    "Ben"   9.226599905207358 1521 1 0 0 0  525 0 0 0 . . .
    "Ben"   9.226599905207358 1521 1 0 1 0  222 1 1 0 . . .
    "Ben"   9.226599905207358 1521 1 0 0 0  708 1 0 0 . . .
    "Ben"   9.226599905207358 1521 1 0 0 0  770 1 0 0 . . .
    "Ben"   9.226599905207358 1521 1 0 0 0 1421 0 0 0 . . .
    "Ben"   9.226599905207358 1521 1 0 0 0  435 0 1 0 . . .
    "Ben"   9.226599905207358 1521 1 0 0 0  370 1 0 0 . . .
    "Ben"   9.226599905207358 1521 1 0 1 0  800 0 0 0 . . .
    "Clair" 7.879730947834448   24 1 1 0 0   97 0 1 0 . . .
    "Clair" 7.879730947834448   24 1 0 0 0  272 1 0 0 . . .
    "Clair" 7.879730947834448   24 1 0 0 1  227 0 0 0 . . .
    "Clair" 7.879730947834448   24 1 0 0 1  281 0 1 0 . . .
    "Clair" 7.879730947834448   24 1 1 0 0  171 0 1 0 . . .
    
    end
    Last edited by Ruud Elzen; 16 Jun 2016, 12:05. Reason: Deleted a bunch of the data so you do not have to scroll/copy so much here ;)

  • #2
    Dear Ruud,

    With respect to the first problem: dropping X9 is not a good option. I would leave it in and let Stata deal with it.

    On the fixed effects: probit with fixed effects is not consistent. So I would suggest that you switch to a logit and use -xtlogit- with FE.
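
    A minimal sketch of that suggestion, using the variable names from post #1. The numeric identifier firm_id is an assumption introduced here, since -xtset- needs a numeric panel variable and Firm is a string:

    ```stata
    * firm_id is a hypothetical name; Firm is the string identifier from the dataex example
    encode Firm, generate(firm_id)
    xtset firm_id
    * Conditional (fixed-effects) logit
    xtlogit Y_D X4_D X5_D X6_D X7 X8_D X9_D, fe
    ```

    X1FE, X2FE and X3FE_D are left out of the sketch because they are constant within a firm, so their effects cannot be separated from the firm fixed effects.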

    Best regards,

    Joao



    • #3
      Dear Joao,

      Thank you for your response!
      I switched to xtlogit.

      First I turned my data into panel data with the command:
      xtset X1FE
      and then do the same for all variables.

      Next command:
      xtlogit Y_D X1FE X2FE X3_D X4_D X5_D X6_D X7 X8_D X9_D
      ml
      ml display

      Then I see the following:

      Fitting comparison model:

      Iteration 0: log likelihood = -110.35618
      Iteration 1: log likelihood = -76.047226
      Iteration 2: log likelihood = -67.157988
      Iteration 3: log likelihood = -66.125677
      Iteration 4: log likelihood = -65.921194
      Iteration 5: log likelihood = -65.872987
      Iteration 6: log likelihood = -65.862886
      Iteration 7: log likelihood = -65.861288
      Iteration 8: log likelihood = -65.861104
      Iteration 9: log likelihood = -65.861068
      Iteration 10: log likelihood = -65.86106 (not concave)
      Iteration 11: log likelihood = -65.86106 (not concave)
      .
      .(this continues to number 1000 eventually)
      .
      Iteration 999: log likelihood = -52.773798 (not concave)
      Iteration 1000: log likelihood = -52.773798 (not concave)
      convergence not achieved

      Fitting full model:

      tau = 0.0 log likelihood = -52.773798
      tau = 0.1 log likelihood = -35.588533
      tau = 0.2 log likelihood = -27.290324
      tau = 0.3 log likelihood = -21.911782
      tau = 0.4 log likelihood = -17.930294
      tau = 0.5 log likelihood = -14.740434
      tau = 0.6 log likelihood = -12.034424
      tau = 0.7 log likelihood = -9.599892
      tau = 0.8 log likelihood = -7.3295819

      Iteration 0: log likelihood = -9.6020362
      Iteration 1: log likelihood = -1.5379352 (not concave)
      Iteration 2: log likelihood = -1.4171534 (not concave)
      cannot compute an improvement -- discontinuous region encountered
      r(430);

      .
      . ml
      no ml model defined

      .
      . ml display
      last estimates not found
      r(301);
      Could you please tell me how to fix this, Joao? (If I delete the variable X9_D, the regression does work and I see the results, so I suspect it has something to do with this variable.)


      Kind regards,

      Ruud
      Last edited by Ruud Elzen; 16 Jun 2016, 14:39.



      • #4
        Dear Ruud,

        That suggests that -xtlogit- is not able to deal with perfect predictors, which I find surprising.

        The solution depends on how interesting X9_D is. If this is not an important variable from a theory point of view, you might as well leave it out. If you want to keep it in, one possible solution is to first estimate the model using plain -logit- and then estimate it with -xtlogit- on the estimation subsample from the -logit- (saved in e(sample)). I am not sure this will work, but it is worth a try.

        All the best,

        Joao



        • #5
          Dear Joao,

          Thanks for getting back to me! The variable is not a necessity, but it would be nice to have another variable that is a significant predictor of my dependent variable (currently, I only have one).

          I ran the regression with -logit-, which worked once it reached iteration 16000. The output is normal (R2 is just 0.25 instead of 0.40, but okay)
          except for the variable X9_D: its coefficient is 0 and the std. error says 'omitted'. The other values for X9_D in the regression output show only '.'.
          Is that a problem?
          Next, I did the following command
          estimates save "logic", replace
          then stata said:
          (note: file logic.ster not found)
          file logic.ster saved
          then I did this command:
          estimates use "logic"

          and then this command:
          xtlogit Y_D X1FE X2FE X3_D X4_D X5_D X6_D X7 X8_D X9_D
          ml
          ml display

          then stata said:
          note: X9_D != 1 predicts failure perfectly
          X9_D dropped and 319 obs not used


          Fitting comparison model:

          Iteration 0: log likelihood = -71.003427
          Iteration 1: log likelihood = -55.384707
          Iteration 2: log likelihood = -53.220854
          Iteration 3: log likelihood = -52.869259
          Iteration 4: log likelihood = -52.795908
          Iteration 5: log likelihood = -52.778511
          Iteration 6: log likelihood = -52.7748
          Iteration 7: log likelihood = -52.774024
          Iteration 8: log likelihood = -52.77385
          Iteration 9: log likelihood = -52.773808
          Iteration 10: log likelihood = -52.773799 (not concave)
          Iteration 11: log likelihood = -52.773798 (not concave)
          .
          .
          .
          Iteration 999: log likelihood = -52.773798 (not concave)
          Iteration 1000: log likelihood = -52.773798 (not concave)
          convergence not achieved

          Fitting full model:

          tau = 0.0 log likelihood = -52.773798
          tau = 0.1 log likelihood = -35.588533
          tau = 0.2 log likelihood = -27.290324
          tau = 0.3 log likelihood = -21.911782
          tau = 0.4 log likelihood = -17.930294
          tau = 0.5 log likelihood = -14.740434
          tau = 0.6 log likelihood = -12.034424
          tau = 0.7 log likelihood = -9.599892
          tau = 0.8 log likelihood = -7.3295819

          Iteration 0: log likelihood = -9.6020362
          Iteration 1: log likelihood = -1.5379352 (not concave)
          Iteration 2: log likelihood = -1.4171534 (not concave)
          cannot compute an improvement -- discontinuous region encountered
          r(430);

          .
          . ml
          no ml model defined

          .
          . ml display
          last estimates not found
          r(301);
          Did I take the right steps? I hope to hear back from you again. Thanks again, I really appreciate it!

          Ruud



          • #6
            Dear Ruud,

            If the variable is not very important, you might as well drop it because you won't be able to estimate a meaningful coefficient associated with it.

            What you did is not exactly what I had in mind. My idea was something like this:
            Code:
            logit y x1 x2....
            xtlogit y x1 x2...  if e(sample)==1, fe
            So, two differences: a) you need to include the -fe- option; b) use -if- to restrict the estimation to the subsample selected by logit.
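
            Applied to the variables in this thread, the two steps might look like the sketch below. It assumes the data have already been -xtset- on a numeric firm identifier; in_logit is a hypothetical helper name that stores the logit estimation sample before -xtlogit- replaces the estimation results.

            ```stata
            * Step 1: plain logit; observations dropped for perfect prediction are excluded from e(sample)
            logit Y_D X4_D X5_D X6_D X7 X8_D X9_D
            * in_logit is a hypothetical name: 1 if the observation was used by logit, 0 otherwise
            generate byte in_logit = e(sample)
            * Step 2: fixed-effects logit restricted to that subsample
            xtlogit Y_D X4_D X5_D X6_D X7 X8_D X9_D if in_logit, fe
            ```

            Saving e(sample) in a variable first is a design choice: it keeps the subsample available for later commands as well.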

            All the best,

            Joao



            • #7
              Dear Joao,

              Sorry for my late response.

              Thanks for getting back to me again. You were right, it was not meaningful... I did more research and discovered that I had left out the 'fe' option at the end. Right after I found that out, I saw it here, haha. Well, either way, it works now! When I do:

              xtlogit Y_D X1FE X2FE X3_D X4_D X5_D X6_D X7 X8_D X9_D, fe
              ml
              ml display

              then it works! X9_D had a p-value of 0.95, so it is not significant and therefore indeed not meaningful at all. Thank you very much for the help. It is much appreciated!

              Have a good day/night,

              Ruud



              • #8
                Hi,

                I am trying to run an xtlogit with firm fixed effects using a matched-pair sample and panel data. The outcome variable is whether a firm went bankrupt or not (0 = non-bankrupt control firms; 1 = bankrupt firms in the year of bankruptcy and all following years). My trouble is that when I include firm fixed effects, Stata drops them, I assume because they are perfectly collinear with the outcome variable for the matched control firms (which always equals zero).

                Is there any way to include firm fixed effects using a matched sample in a logistic regression model where the dependent variable does not vary at all for one group?

                Thank you in advance.

                RC



                • #9
                  Originally posted by Roger Clements
                  Hi,

                  I am trying to run an xtlogit with firm fixed effects using a matched-pair sample and panel data. The outcome variable is whether a firm went bankrupt or not (0 = non-bankrupt control firms; 1 = bankrupt firms in the year of bankruptcy and all following years). My trouble is that when I include firm fixed effects, Stata drops them, I assume because they are perfectly collinear with the outcome variable for the matched control firms (which always equals zero).

                  Is there any way to include firm fixed effects using a matched sample in a logistic regression model where the dependent variable does not vary at all for one group?

                  Thank you in advance.

                  RC

                  Hello there, I was wondering if you found a solution to your problem. I have the same issue: I used logit with firm and year fixed effects and it just kept dropping the firm variables. It worked with areg, but it just doesn't with logit (my independent variable is binary).

                  xi: logit Y X1 X2 i.gvkey2 i.year , robust

                  Please help



                  • #10
                    I did not, Ruby, so sorry I can't be much help. Still waiting for some help myself. Anyone?



                    • #11
                      I'm not sure what you mean by a "matched pairs" analysis. You can't match on the outcome variable -- at least not in general. And, yes, any firm whose response doesn't change over time does not contribute to the logit FE (conditional MLE) estimates. It is NOT true that these are kept in a linear estimation using areg or xtreg, fe. Stata doesn't tell you these firms are being dropped, but they are.
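
                      A quick way to see how many firms can contribute to the conditional (fixed-effects) logit is to check which firms have any variation in the outcome. A minimal sketch, where firm_id and bankrupt are hypothetical names for the panel identifier and the 0/1 outcome:

                      ```stata
                      * firm_id and bankrupt are hypothetical names for the panel identifier and the 0/1 outcome
                      egen byte y_min = min(bankrupt), by(firm_id)
                      egen byte y_max = max(bankrupt), by(firm_id)
                      * 1 if the firm's outcome changes at least once over the panel
                      generate byte y_varies = (y_min != y_max)
                      * Flag one observation per firm so that firms, not firm-years, are counted
                      egen byte one_per_firm = tag(firm_id)
                      tabulate y_varies if one_per_firm
                      ```

                      Firms with y_varies equal to 0, such as control firms that never go bankrupt, are the ones that drop out of the fixed-effects logit.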

                      What is the explanatory variable that you're interested in? This seems more like a duration analysis with grouped duration data: essentially a model of how long it takes before a firm goes bankrupt. Firms that do not go bankrupt play a role in duration analysis, but that is because you make functional form assumptions.



                      • #12
                        Sorry for the delay, Jeff. I wanted to make more progress with the data before responding to you. The outcome variable is the likelihood that a firm goes bankrupt, based on whether or not the firm was targeted by consumer boycotts. I see now how the sample size drops a lot when using FE, so that is helpful. So is it true that you can never include a control group AND include firm FE when the treatment variable (e.g., boycott) equals 0 and is constant throughout the panel (i.e., does not change from zero)?

                        Using another dependent variable (firm sales), I have two types of boycotts -- boycotts by consumers and boycotts by employees. How would I test which type of boycott has a stronger effect on sales? I assume I generate one treatment variable (consumer_boycott) using just consumer boycotts in one model and one treatment variable (employee_boycott) in another model and then test whether the coefficients on the treatment variables are significantly different between the two models? E.g., using suest. Does that make sense?
