Hi!
I am running a first-differenced model with immigration change as the key explanatory variable. My units are municipalities, observed from 2004 to 2019, and changes are calculated over 4-year intervals. Δm_it is the immigration change and ΔVote_it is the change in a vote-share outcome.
ΔVote_it = a + b * Δm_it + e_it
To exogenize immigration change, I use a past-settlement IV. It uses the past distribution of immigrants across municipalities to instrument future immigration inflows. My base year for the instrument is 2003 (one year before my sample starts). With this, I obtain a "prediction" of immigration change, as if current immigrants had distributed across municipalities the way their peers did in the past.
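For readers unfamiliar with this type of instrument, one common way to write it is sketched below. This is only an illustrative form: the split by origin group c, the population scaling P, and all symbols other than Δm_it hat are assumptions and may not match my exact construction.

```latex
% Hypothetical past-settlement (shift-share) form; notation is assumed, not taken from the data.
% M_{ic,2003}  : immigrants from origin group c living in municipality i in the 2003 base year
% M_{c,2003}   : national total of immigrants from group c in 2003
% \Delta M_{ct}: national change in immigrants from group c over the 4-year window ending in t
% P_{i,t-4}    : population of municipality i at the start of the window
\widehat{\Delta m}_{it}
  = \frac{100}{P_{i,t-4}} \sum_{c} \frac{M_{ic,2003}}{M_{c,2003}} \, \Delta M_{ct}
```

The base-year shares allocate each group's national inflow across municipalities, and the factor of 100 expresses the result in percentage points of population.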
Please look at this scatter plot for all the actual and predicted immigration changes in my dataset: https://imgur.com/a/XDkG4Hp. Or for a specific year: https://imgur.com/FOBrfqF. Or doing a binscatter, absorbing the year fixed effects: https://imgur.com/Iimo0jx.
Δm_it (Y axis) is the actual change in immigrants (as a share of population, in pp terms). Δm_it hat (X axis) is the predicted change in immigrants. (Note that the latter is not the fitted value from the first stage, but simply the change in immigrants calculated as if they had distributed as their peers did in the past.)
Now: when I run my IV model, the first stage gives Δm_it hat a coefficient of 0.70, statistically significant at the 1% level. The Cragg-Donald Wald F statistic in the ivreg2 output is 1.4e+04. This is before adding any control variables or fixed effects. After adding all of them, the F statistic goes down to 3742. Any idea why these F statistics are so huge?
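As a rough back-of-the-envelope reading of the first number: with one endogenous regressor, one instrument, and no controls, the Cragg-Donald F coincides with the ordinary (homoskedastic) first-stage F, which is a simple function of the first-stage R². The calculation below takes N ≈ 14,000 and F ≈ 14,000 at face value; R² is a symbol introduced here, not a number read from the output.

```latex
% First-stage F with a single instrument and an intercept (i.i.d. errors assumed):
F = t^2 = (N-2)\,\frac{R^2}{1-R^2}
\quad\Longrightarrow\quad
R^2 = \frac{F}{F + N - 2}
% Plugging in the approximate values from the post:
R^2 \approx \frac{14{,}000}{14{,}000 + 14{,}000 - 2} \approx 0.50,
\qquad
|\mathrm{corr}(\Delta m_{it}, \widehat{\Delta m}_{it})| \approx \sqrt{0.50} \approx 0.71
```

So, under these assumptions, an F of this size corresponds to a first-stage R² of about one half with 14,000 observations. The Cragg-Donald statistic also assumes i.i.d. errors, so it does not account for heteroskedasticity or within-municipality correlation.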
My dataset consists of 14,000 data points (around 3,000 municipalities). If it helps, in the attached dataset you will see my variables: Δm_it, Δm_it hat, ΔVote, postalcode (municipality), and year. The output I mention can be replicated with "ivreg2 vs_RIGHT_D (m_D = m_hat_D), first"
Many thanks!