  • xtset in cross-sectional data

    Dear Statalist,

    In my research, I am studying an issue called "sibling rivalry".
    Sibling rivalry occurs when siblings compete for parental investments.

    To account for unobserved family characteristics common to all siblings, I think I may need household-level fixed effects.
    However, my dataset comes from a single survey conducted in 2017, so it is cross-sectional.

    An example of my estimation strategy is the following (since my dataset is from a survey, I use the svy: prefix):
    Code:
    svy: reg log_birthweight rain_shock i.race i.gender i.birth_district i.birth_month
    This is the first-stage regression, where log_birthweight is the endogenous variable and rain_shock is its instrument.

    Ideally, I would add household-level fixed effects to this specification, but if I run the following code,
    Code:
    svy: reg log_birthweight rain_shock i.race i.gender i.birth_district i.birth_month i.hhid
    the error
    Code:
    r(103);  too many variables specified
    occurs. I understand why this happens (hhid takes a very large number of distinct values), so I considered the different approach described below.

    I recalled that the professor explained in the econometrics class that cross-sectional data can be handled like panel data.
    The professor used the following example.
    HTML Code:
    Y_{ij} = \beta_0 + \beta_1 X_{ij} + a_j + u_{ij}
    i: individual, j: school, Y: test score, X: teacher's experience, a: fixed effect, and u: error term.
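
    As a sketch of why the group effect a_j drops out, assuming the usual within (demeaning) transformation is what the professor had in mind: averaging the equation over the individuals i in school j and subtracting that average from the original equation removes both \beta_0 and a_j. The bars below denote school-level means and are notation introduced here for illustration.

    ```latex
    \bar{Y}_{j} = \beta_0 + \beta_1 \bar{X}_{j} + a_j + \bar{u}_{j}
    \quad\Longrightarrow\quad
    Y_{ij} - \bar{Y}_{j} = \beta_1 \left( X_{ij} - \bar{X}_{j} \right) + \left( u_{ij} - \bar{u}_{j} \right)
    ```

    Only variation within a school identifies \beta_1, so any regressor that is constant within j is not identified separately from a_j.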

    I thought that this approach could be used in my own setting as well. That is, my idea is to use
    Code:
    xtset sibling_id child_id
    Is this approach useful in my setting?
    Also, if so, how should I estimate the 2SLS regression with the
    Code:
    svy:
    prefix?

    I would greatly appreciate any suggestions you might have.

    Best,
    Kentaro

  • #2
    "because the number of hhid is very large"
    On one technical detail, you can absorb the household dummies, since there is no requirement to declare a time variable when you xtset your data. Whether your approach in its entirety makes sense, I do not know. Here is an example based on cross-sectional data:

    Code:
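    sysuse auto, clear
    encode make, gen(mk)
    set seed 1978
    * assign each car to one of 48 random groups (a stand-in for households)
    gen car = runiformint(1, 48)
    * dummy-variable estimator: one indicator per group
    regress mpg price weight i.car
    * declare only the group variable; no time variable is needed
    xtset car
    * within estimator: absorbs the group effects, same slopes as above
    xtreg mpg price weight, fe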
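
    To make the "absorbing" step concrete, here is a minimal sketch on the same auto data, not a recommendation for the survey setting above. It demeans the variables by group by hand and compares the slopes with xtreg, fe and with areg, absorb(). The variable names mean_* and dm_* are made up for illustration; the slopes should coincide, while the standard errors of the hand-demeaned regression differ because it does not adjust the degrees of freedom for the absorbed group means.

    ```stata
    sysuse auto, clear
    set seed 1978
    * random group identifier, as in the example above
    gen car = runiformint(1, 48)

    * demean each variable within its group (hypothetical helper names)
    foreach v of varlist mpg price weight {
        egen double mean_`v' = mean(`v'), by(car)
        gen double dm_`v' = `v' - mean_`v'
    }

    * OLS on demeaned data: same slopes as the fixed-effects estimator
    regress dm_mpg dm_price dm_weight, noconstant

    * within estimator via xtreg
    xtset car
    xtreg mpg price weight, fe

    * same slopes again, absorbing the group dummies without xtset
    areg mpg price weight, absorb(car)
    ```

    None of these three commands creates one dummy per group, so a large number of households is no longer a problem. Whether they can be combined with the svy: prefix and with 2SLS is a separate question that this sketch does not address.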
