  • Question of fixed effects N

    This is an embarrassing question, but I would appreciate an answer.

    I have a panel of individuals surveyed in two time periods. There is quite a bit of missing data in both the x and y variables. xtset says the panel is strongly balanced. I run FE, but the output table suggests that some groups have fewer than 2 observations, and the number of groups multiplied by 2 does not equal the number of obs. I was assuming Stata would drop individuals with no observation in one of the two years and keep only those observed in both periods, for whom a change can be computed. Could someone please explain this?

    Here is an example of output:

    Fixed-effects (within) regression          Number of obs     =  23,881
    Group variable: caseid                     Number of groups  =  13,364

    R-sq:                                      Obs per group:
         within  = 0.0377                                    min =       1
         between = 0.0027                                    avg =     1.8
         overall = 0.0000                                    max =       2



  • #2
    Mol:
    Stata simply omits observations with missing values in any of the variables used (and applies this approach in estimation commands generally).
    In your case, some panels are left with only one observation, which Stata still includes in -xtreg, fe-.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      That I understand, but why would it then count cases where there is only one observation in two periods?


      • #4
        When xtset says the panel is strongly balanced, it is looking only at the individual identifier and the time identifier. It does not know what variables will be used in your regression. So all it tells you is that every individual has a row in the data for each time period, whether or not the values in that row are missing.

        It is only when you run your regression that Stata excludes the observations with missing values in any variable of the model you fit. So some individuals will have the observation for one time period excluded but not the other.
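
        To see the difference between what xtset reports and what the regression actually uses, here is a minimal sketch on invented data. The variable names (id, time, y, x, used, nused) and all values are made up for illustration; e(sample) is the indicator Stata leaves behind after estimation to mark the observations it used.

        ```stata
        clear
        input id time y x
        1 1 2.0 1.0
        1 2 3.0 2.0
        2 1 1.5 0.5
        2 2 .   1.5
        3 1 4.0 .
        3 2 5.0 3.0
        4 1 2.5 1.0
        4 2 4.0 3.0
        5 1 3.0 2.0
        5 2 3.5 2.5
        end

        * xtset looks only at id and time, so the panel counts as strongly balanced
        xtset id time

        * the estimation command drops the rows where y or x is missing
        xtreg y x, fe

        * mark the rows that were used and count them within each individual
        generate byte used = e(sample)
        bysort id: egen nused = total(used)

        * rows with nused == 1 belong to individuals who contribute one period only
        tabulate nused if used
        ```

        In this toy panel, individuals 2 and 3 each lose one row to missing values, so they enter the regression with a single observation, which is the pattern behind "min = 1" in the output above.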

        You will have to tell Stata if you want to omit both observations for individuals with missing values in either observation. Something like the following untested code may start you on your way
        Code:
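        * rowmiss() is an egen function: it counts the missing values among x and y in each row
        egen hasmiss = rowmiss(x y)
        * within individual, rows are sorted by hasmiss, so the last row holds the largest count; if that is 0, nothing is missing
        bysort individual (hasmiss): generate full = hasmiss[_N]==0
        sort individual time
        * fixed-effects regression restricted to individuals complete in both periods
        xtreg y x if full, fe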
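
        One way to check that the restriction did what was intended is sketched below on invented data. It keeps the placeholder names individual, time, y and x from the code above; the data values are made up. It relies on rowmiss() being an egen function and on the group counts that xtreg stores in e().

        ```stata
        clear
        input individual time y x
        1 1 2.0 1.0
        1 2 3.0 2.0
        2 1 1.5 0.5
        2 2 .   1.5
        3 1 4.0 .
        3 2 5.0 3.0
        4 1 2.5 1.0
        4 2 4.0 3.0
        5 1 3.0 2.0
        5 2 3.5 2.5
        end
        xtset individual time

        * flag individuals with no missing x or y in either period
        egen hasmiss = rowmiss(x y)
        bysort individual (hasmiss): generate full = hasmiss[_N] == 0

        * fixed-effects regression on the complete individuals only
        xtreg y x if full, fe

        * every retained individual should now contribute exactly two observations
        assert e(g_min) == 2 & e(g_max) == 2
        assert e(N) == 2 * e(N_g)
        ```

        If either assert fails on your own data, some individual is still entering the regression with a single observation, for example because the model includes a variable that was not listed in rowmiss().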


        • #5
          Great, thank you so much!
