
  • Fixed-effects on sibling data: which N to report?

    Dear colleagues, good day! I would be very grateful if you could help me with an issue.

    Setup: I have data on siblings, 2-3 children per family. In total, there are about 4000 families. The dependent variable is "Educational attainment" (continuous) and the independent variable is "Cognitive ability" (continuous). I am running family fixed effects (xtreg) to account for any confounding effects shared by siblings.

    Questions:
    1. Which N should I report in my paper? The nominal N that Stata reports? Or the N from discordant sibling families only (where, for example, all the children have different educational attainment and cognitive ability)? For example, xtlogit simply drops those groups (families) in which individuals do not differ on the dependent variable (that is, the non-discordant families) and reports only the N from discordant families. But xtreg obviously does not work that way, because it always reports the nominal N.

    2. Which families are really involved in calculating the b-coefficients and standard errors? All cases, regardless of whether the children are discordant? Only those that differ within family on the depvar? Only those that differ on the indepvar? Or only those where both depvar and indepvar are discordant?

    With best regards,
    V


  • #2
    Vardan:
    welcome to this forum.
    The number of observations reported by Stata is the consequence of the code you typed.
    Hence you should make up your mind about what you're after (by the way, I fail to get why you mention -xtreg-, then -xtlogit- and finally -xtreg- again: is your regressand discrete or continuous?).
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Dear Carlo, thank you for your reply. This is indeed my first message on this forum, but I come here quite often to check different things related to data analysis in Stata.

      About your question: both of my variables (depvar and indepvar) are continuous. That's why I am using -xtreg-.
      I mentioned -xtlogit- just because -xtlogit- throws out all the groups that are non-discordant in terms of a dependent variable and reports the effective sample size. But -xtreg- reports the full N even when 50% of the cases do not differ from each other within groups in terms of a dependent variable.

      Best,
      Vardan



      • #4
        Vardan:
        thanks for clarifying.
        What you experience is due to the different machineries underlying -xtlogit- and -xtreg- (the latter is actually the way to go if your regressand is continuous).
        -logistic- regression works well when there's a good combination of 0 and 1 in the dependent variable and/or when predictors do not predict the outcome perfectly (in the latter case Stata omits some observations as a workaround).
        -xtreg- demeans all the variables (dependent and independent): hence, if there are no missing values, all the observations are considered in the analysis.
        That's why the number of observations can differ between -xtreg- and -xtlogit-.
        As an aside, -xtlogit- allows conditional fixed-effects estimation (incidental parameters bias, you know), which differs from the fixed-effects estimation you obtain from -xtreg-.
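
The demeaning step described above can be sketched in a few lines. This is plain Python on made-up toy data, not Stata's implementation of -xtreg-; the function names (`within_demean`, `family_contributions`) and the three hypothetical families are assumptions introduced purely for illustration. It demeans y and x within each family and tallies what each family adds to the numerator and denominator of the single-regressor within slope.

```python
from collections import defaultdict


def within_demean(values, groups):
    """Subtract each group's mean from its members (the within transformation)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def family_contributions(y, x, groups):
    """Per-family [numerator, denominator] terms of the within slope."""
    yd = within_demean(y, groups)
    xd = within_demean(x, groups)
    contrib = defaultdict(lambda: [0.0, 0.0])
    for a, b, g in zip(xd, yd, groups):
        contrib[g][0] += a * b  # sum of (x deviation) * (y deviation)
        contrib[g][1] += a * a  # sum of squared x deviations
    return dict(contrib)


# Hypothetical toy data: three families, two siblings each.
family = [1, 1, 2, 2, 3, 3]
ability = [100.0, 110.0, 95.0, 105.0, 90.0, 90.0]  # family 3: x does not vary
educ = [12.0, 14.0, 13.0, 13.0, 11.0, 15.0]        # family 2: y does not vary

contributions = family_contributions(educ, ability, family)
slope_all = (sum(c[0] for c in contributions.values())
             / sum(c[1] for c in contributions.values()))

for fam, (num, den) in sorted(contributions.items()):
    print(f"family {fam}: numerator term {num}, denominator term {den}")
print("within slope:", slope_all)
```

In this toy example every observation is kept in the calculation, which is why the reported N is the nominal one. Family 3 (no variation in x) adds zero to both sums. Family 2 (no variation in y) adds zero to the numerator, but its x variation still enters the denominator.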
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Dear Carlo,

          Thank you for the clarification.

          Best regards,
          Vardan



          • #6
            I would report N as the number of families and note that, when the response is binary and you use conditional logit, families where siblings all have the same outcome do not contribute to the estimation. Apparently it is not well known that if you use a linear model -- which is fine as an approximation -- any family where y doesn't vary also falls out of the estimation. It's just that it happens without Stata needing to check for it. Look at the formula for the within estimator and you'll easily see that if y(i,t) = y(i) for all t then unit i does not contribute to the FE estimate. This is true whether y is discrete, continuous, mixed.
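
For reference, the within-estimator formula mentioned above can be written out for the single-regressor case. This is a sketch under the assumption of one regressor x (cognitive ability in this thread), with i indexing families and t indexing siblings; the bar notation for family means is introduced here and is not in the original post.

```latex
\hat{\beta}_{FE}
  = \frac{\sum_{i}\sum_{t} (x_{it} - \bar{x}_{i})(y_{it} - \bar{y}_{i})}
         {\sum_{i}\sum_{t} (x_{it} - \bar{x}_{i})^{2}},
\qquad
\bar{x}_{i} = \frac{1}{T_i}\sum_{t} x_{it},
\quad
\bar{y}_{i} = \frac{1}{T_i}\sum_{t} y_{it}.
```

If y(i,t) = y(i) for all t, then every deviation y(i,t) - ybar(i) is zero, so family i adds nothing to the numerator. In this single-regressor form its squared x deviations stay in the denominator unless x is also constant within the family, in which case the family adds nothing to either sum.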

            And I prefer to view the linear fixed effects estimator as using the within transformation that eliminates the heterogeneity. That's why there's no incidental parameters problem. The fact that it is identical to including N dummy variables is more like an algebraic accident. Conditional logit eliminates the heterogeneity in a way similar to the within transformation. As Carlo says, with logit and small family sizes you should not include family dummy variables.

            I'd try a linear model for all outcomes estimated by FE and conditional logit when y is binary. I might also try correlated random effects probit.
