  • mixed model with endogenous variable

    Hello everyone,

    I have multilevel data (individuals nested within municipalities). My dataset contains information on individuals and on the municipality where they live. The table below shows an example of the structure of my data:
    individual  age  sex     fear  municipality  homicide_rate  gini  security_spending
    1           20   male    0.4   1             8              0.2   1000
    2           25   male    0.2   1             8              0.2   1000
    3           50   female  0.8   2             12             0.5   500
    4           89   male    0.8   3             21             0.4   1200
    5           75   male    0.4   3             21             0.4   1200
    6           12   female  0.2   3             21             0.4   1200
    7           54   female  0.1   4             17             0.3   3000
    8           33   female  0.5   4             17             0.3   3000
    9           60   female  0.7   4             17             0.3   3000
    Overall, there are 740 different municipalities in my dataset, with a minimum of 19 and a maximum of 1700 individuals per municipality.
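    A minimal sketch for reproducing the example rows above and checking municipality sizes (19 to 1,700 in the full data). The storage types are assumptions, and sex is assumed to be stored as a string, so it is encoded as a numeric variable because mixed does not accept string regressors. The names sex_num, n_j, and first_obs are hypothetical.

```stata
clear
* Example rows from the table; storage types are assumptions
input byte individual byte age str6 sex float fear byte municipality byte homicide_rate float gini int security_spending
1 20 "male"   0.4 1  8 0.2 1000
2 25 "male"   0.2 1  8 0.2 1000
3 50 "female" 0.8 2 12 0.5  500
4 89 "male"   0.8 3 21 0.4 1200
5 75 "male"   0.4 3 21 0.4 1200
6 12 "female" 0.2 3 21 0.4 1200
7 54 "female" 0.1 4 17 0.3 3000
8 33 "female" 0.5 4 17 0.3 3000
9 60 "female" 0.7 4 17 0.3 3000
end

* Turn the string sex into a labelled numeric variable (usable as i.sex_num)
encode sex, generate(sex_num)

* Number of individuals in each municipality
bysort municipality: generate n_j = _N

* Tag one observation per municipality so each municipality counts once
egen first_obs = tag(municipality)
summarize n_j if first_obs
count if first_obs
```

    On these nine rows the summary reports 4 municipalities with between 1 and 3 individuals each; on the full data the same lines should report 740 municipalities with sizes from 19 to 1,700.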

    My main goal is to estimate the impact of a municipality's income inequality on the fear of its residents. However, I suspect the Gini coefficient is endogenous. Because my data are hierarchical, I estimate my model with a two-stage least squares (2SLS) procedure, using the mixed command (Stata 14) in the second stage.

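    I first regress the Gini coefficient on the instrumental variables and the municipality-level controls, and then store the predicted Gini:
    Code:
    * First stage: Gini on the instruments and the municipality-level controls
    reg gini IV1 IV2 homicide_rate security_spending, cluster(municipality)
    * Store the linear prediction (fitted Gini) as gini_est
    predict gini_est, xb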
    My first question is: because my observations are at the individual level, my first-stage results are potentially biased, since some municipalities contribute 19 observations and others 1,700. Is adding the cluster(municipality) option enough to correct this problem?
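    Because gini, the instruments, and the controls vary only across municipalities, an individual-level OLS first stage is numerically identical to a municipality-level regression weighted by the number of respondents in each municipality. The cluster(municipality) option adjusts the standard errors, not this implicit weighting. A minimal sketch on simulated data (50 hypothetical municipalities; all coefficients and sizes are made up) compares the individual-level, frequency-weighted, and unweighted first stages:

```stata
* Hypothetical simulated data: gini, instruments, and controls vary only
* across municipalities, and municipalities differ in size
clear all
set seed 12345
set obs 50
generate municipality = _n
generate IV1 = rnormal()
generate IV2 = rnormal()
generate homicide_rate = 10 + 5*runiform()
generate security_spending = 1000 + 500*runiform()
generate gini = 0.3 + 0.05*IV1 + 0.03*IV2 + 0.05*rnormal()
generate n_indiv = 19 + floor(200*runiform())
expand n_indiv    // one row per (simulated) individual

* Individual-level first stage, as in the post
regress gini IV1 IV2 homicide_rate security_spending, vce(cluster municipality)
scalar b_indiv = _b[IV1]

* One row per municipality, keeping the number of individuals as n
preserve
collapse (mean) gini IV1 IV2 homicide_rate security_spending (count) n = gini, by(municipality)

* Weighted by municipality size: reproduces the individual-level coefficients
regress gini IV1 IV2 homicide_rate security_spending [fweight = n]
scalar b_fw = _b[IV1]

* Unweighted: every municipality counts once
regress gini IV1 IV2 homicide_rate security_spending
scalar b_unw = _b[IV1]
restore

display "individual: " b_indiv "  weighted: " b_fw "  unweighted: " b_unw
```

    The first two coefficients agree; the unweighted municipality-level regression is one way to give each municipality equal influence in the first stage.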

    I then use the predicted Gini to estimate the mixed model as follows:
    Code:
    mixed fear gini_est homicide_rate security_spending age sex || municipality: , vce(cluster municipality)
    I added the vce(cluster municipality) option to obtain cluster-robust standard errors.
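    The vce(cluster municipality) option makes the standard errors robust to within-municipality correlation, but it does not account for gini_est itself being estimated in the first stage. The single-level sketch below illustrates this on simulated data. It uses regress instead of mixed to keep things short, and the variables y, x, z and the 100 simulated municipalities are hypothetical. ivregress 2sls and the manual two-step approach give the same point estimate, but the manual second stage computes its residuals from the predicted rather than the observed regressor, so its standard errors differ.

```stata
* Hypothetical simulated data: x is endogenous (correlated with u), z is the instrument
clear all
set seed 2024
set obs 2000
generate municipality = ceil(_n/20)         // 100 municipalities of 20 individuals
generate z = rnormal()
generate u = rnormal()
generate x = 0.8*z + 0.5*u + rnormal()
generate y = 1 + 2*x + u                     // true effect of x is 2

* One-step 2SLS: residuals are built from the observed x
ivregress 2sls y (x = z), vce(cluster municipality)
scalar b_iv  = _b[x]
scalar se_iv = _se[x]

* Manual two-step: same coefficient, but residuals are built from x_hat
regress x z
predict x_hat, xb
regress y x_hat, vce(cluster municipality)
scalar b_manual  = _b[x_hat]
scalar se_manual = _se[x_hat]

display "2SLS: " b_iv " (" se_iv ")   manual: " b_manual " (" se_manual ")"
```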

    My second question is: does it seem correct to estimate my model this way, or could I improve something?

    Thank you very much,
    Lucie

  • #2
    Doing the instrumental-variables estimation manually can run into problems; for example, the second-stage standard errors do not account for the first stage. Another option would be reghdfe, which allows for endogenous variables and multiple dimensions of fixed effects. However, it only does fixed effects. Alternatively, you could use SEM/GSEM to model this explicitly. Clustered standard errors fix problems with the standard errors but generally do not fix problems with the consistency of the betas. If you're really worried about varying sample sizes per municipality, a weighted estimator might be considered.
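    A minimal single-level sketch of the SEM route, on simulated data with hypothetical variables y, x, z and municipality: the instrumental-variables model is written in sem as two equations whose errors are allowed to correlate, with z excluded from the y equation. In this just-identified case the sem point estimate should match ivregress 2sls. A multilevel version with a municipality random intercept would require gsem and is outside this sketch.

```stata
* Hypothetical simulated data: x is endogenous, z is the excluded instrument
clear all
set seed 2024
set obs 2000
generate municipality = ceil(_n/20)
generate z = rnormal()
generate u = rnormal()
generate x = 0.8*z + 0.5*u + rnormal()
generate y = 1 + 2*x + u

* Recursive system: z enters only the x equation, and cov(e.x*e.y)
* lets the two errors correlate, which is what makes x endogenous
sem (x <- z) (y <- x), cov(e.x*e.y) vce(cluster municipality)
scalar b_sem = _b[y:x]

* Single-equation 2SLS for comparison
ivregress 2sls y (x = z), vce(cluster municipality)
scalar b_iv = _b[x]

display "sem: " b_sem "   2SLS: " b_iv
```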

    • #3
      Thanks for your suggestions. GSEM was my first choice, but it often failed to converge, which is why I turned to 2SLS. I probably need to examine this possibility again. I had never heard of reghdfe; I will take a look at it.
