  • STATA omitting observations

    Hi all. I have a panel data set with 20 103 companies observed over 4 years, so I have 80 412 observations in total. When I run a fixed-effects logistic regression with a binary dependent variable to measure the effect of a treatment (which happens in only one of the four years), Stata tells me that I only have 9 788 observations. I understand that this is due to some criterion, but which one? Does the fixed-effects command only use the observations where the binary variable = 1? I know that I have more than 2 447 (9 788/4) companies with the binary variable = 1 in that specific year, so could it also be because of my control variables? Does Stata omit the whole observation if one of the control variables is missing?

    Br
    Sebastian

  • #2
    Fixed effects logistic regression can only use companies whose explained/dependent/left-hand-side/y-variable changes over time. So all companies with a dependent variable that remains constant (either all 0s or all 1s) will be dropped. In addition, Stata will drop an observation (a company within a year, not all 4 years for that company) if one or more of the explanatory/independent/right-hand-side/x-variables is missing.
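
    The two rules above can be made concrete with a minimal sketch. It is written in plain Python only to spell out the logic; it is not Stata code and not how Stata implements the estimator. The toy panel and the names `panel`, `usable_rows`, `firm`, `y` and `x` are hypothetical. It assumes that company-years with a missing value are removed first and that the "outcome never changes" check is then applied to what is left.

```python
def panel(firm, ys, xs):
    """Build one company's rows (hypothetical layout: one dict per company-year)."""
    return [{"firm": firm, "year": t + 1, "y": y, "x": x}
            for t, (y, x) in enumerate(zip(ys, xs))]

# Toy data; None marks a missing control variable.
rows = (
    panel(1, [0, 0, 1, 1], [1.0, 1.2, 1.1, 0.9])     # outcome changes
    + panel(2, [0, 0, 0, 0], [2.0, 2.1, 2.2, 2.3])   # all zeros
    + panel(3, [1, 1, 1, 1], [0.5, 0.4, 0.6, 0.7])   # all ones
    + panel(4, [0, 1, 0, 0], [1.0, None, 1.5, 2.0])  # the only 1 has a missing x
    + panel(5, [0, 0, 1, 0], [None, 1.0, 2.0, 3.0])  # one missing x, outcome still changes
)

def usable_rows(rows, outcome="y", covariates=("x",)):
    # Step 1: drop single company-years with any missing value.
    complete = [r for r in rows
                if r[outcome] is not None
                and all(r[c] is not None for c in covariates)]
    # Step 2: among the remaining rows, keep only companies whose outcome varies.
    seen = {}
    for r in complete:
        seen.setdefault(r["firm"], set()).add(r[outcome])
    return [r for r in complete if len(seen[r["firm"]]) > 1]

kept = usable_rows(rows)
print(len(rows), "rows in the data,", len(kept), "rows usable")
print("companies used:", sorted({r["firm"] for r in kept}))
```

    Note company 4 in the toy data: only one of its years has a missing control, but that year is the only one with y = 1, so the remaining three years are all zeros and the whole company drops out. The two reasons for losing observations can therefore interact.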
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      Thank you very much for your reply, Maarten. Much appreciated!


      • #4
        Maarten's advice was good, of course, but glossed over an important point: you ought to figure out where and why your cases are going missing. If it's due to no change in the dependent variable, then that's one thing; if it's due to missing independent variables, that's another. It can be messy, but it is important to know. Some cases presumably did not change because they already were in the state (call it category 1) that the treatment was supposed to induce. Some never changed (call it category 0), with the treatment not having an impact. Some cases are missing simply because they have no data on certain questions, and some may be unbalanced (two or three years of data instead of all four). The source of missingness matters.

        So I'm not 100% sure a simple fixed effects logistic regression model is appropriate to begin with, given only four years. To judge whether it's appropriate, check the number of cases not at risk (all ones), the number of cases that were zeros the whole time, and the number of cases missing due to non-response or some other process.
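
        The check suggested above could be sketched roughly as follows. This is plain Python used only to show the bookkeeping, not Stata syntax; the toy data and the names `panel`, `tally_firms`, `firm`, `y` and `x` are hypothetical, and the outcome is assumed to be coded 0/1.

```python
from collections import Counter, defaultdict

def panel(firm, ys, xs):
    """Build one company's rows (hypothetical layout: one dict per company-year)."""
    return [{"firm": firm, "year": t + 1, "y": y, "x": x}
            for t, (y, x) in enumerate(zip(ys, xs))]

# Toy data; None marks a missing control variable.
rows = (
    panel(1, [0, 0, 1, 1], [1.0, 1.2, 1.1, 0.9])     # outcome changes
    + panel(2, [0, 0, 0, 0], [2.0, 2.1, 2.2, 2.3])   # zeros the whole time
    + panel(3, [1, 1, 1, 1], [0.5, 0.4, 0.6, 0.7])   # not at risk (all ones)
    + panel(4, [0, 1], [1.0, 1.5])                   # only two years observed
    + panel(5, [0, 0, 0, 0], [1.0, None, 2.0, 3.0])  # all zeros and a missing control
)

def tally_firms(rows, n_years=4, outcome="y", covariates=("x",)):
    """Count companies by outcome pattern, plus two data-quality flags."""
    by_firm = defaultdict(list)
    for r in rows:
        by_firm[r["firm"]].append(r)

    counts = Counter()
    for firm_rows in by_firm.values():
        ys = {r[outcome] for r in firm_rows if r[outcome] is not None}
        # Exactly one outcome pattern per company.
        if ys == {0}:
            counts["all zeros"] += 1
        elif ys == {1}:
            counts["all ones"] += 1
        elif len(ys) > 1:
            counts["outcome varies"] += 1
        else:
            counts["no outcome data"] += 1
        # Flags that can overlap with the pattern above.
        if any(r[c] is None for r in firm_rows for c in covariates):
            counts["has missing covariate"] += 1
        if len(firm_rows) < n_years:
            counts["fewer than n_years rows"] += 1
    return counts

counts = tally_firms(rows)
for label, n in sorted(counts.items()):
    print(f"{label}: {n}")
```

        The outcome patterns are mutually exclusive, while the two flags can overlap with any pattern, so the flag counts should be read alongside the pattern counts rather than added to them.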

        Cases with all zeros: those cases *could* have changed, but within the four observed years there is no chance to measure the impact of the treatment. Some sort of censoring model might fix things: they *might* have changed in year five or six, and such a model can capture that possibility.

        Cases with all ones: there is no chance to measure the impact of the treatment. I'm not sure what to do with those. I guess you can legitimately throw them out as out-of-scope/not-at-risk, like throwing out males when predicting pregnancy.

        Cases with missing data on independent variables: an imputation model or FIML (full information maximum likelihood) might help, but hopefully you don't have many cases like that.

        Just some thoughts, maybe discuss them with somebody familiar with the literature in your field, or ignore my thoughts. But at a first reading, it seems more complicated than your discussion with Maarten would indicate. He answered your question, but to my reading, there are a lot more questions to be asked. Or maybe I'm overly complicating matters.
