2SLS with polynomial distributed lags in both endogenous variables and instruments

Ryan Long

Join Date: Feb 2019
Posts: 10

10 Mar 2019, 07:28

Hi all,

I am investigating the effect of air pollution (measured by pollutant concentration) on health (proxied by number of hospital visits) through a 2SLS regression. Here's a sample of my data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(year month week) int hosp_visits float(pm25 radiatpw temp prec windsp)
2012  8 34 2669      24.3  6.699171 27.797144 4.5666666 7.519048
2012  9 35 2533  26.57143  6.798932 26.885714  7.595238 6.380952
2012  9 36 2343 33.914284 33.018562  27.67857 2.8619046 8.088095
2012  9 37 2619  27.97143  6.873756  27.59524 2.9285715 6.266667
2012  9 38 2579 35.685715 13.724868 28.207144  3.447619 6.157143
2012  9 39 2517 27.685715 17.431175     27.87  6.035714 6.381905
2012 10 40 2575      28.8  4.580237  28.14857 2.3809524  6.17619
2012 10 41 2638  30.34286  .7070137  27.52238 4.4690475 5.316667
2012 10 42 2695      23.8  .3770984  26.83524 13.909524  5.27619
2012 10 43 2768  18.22857 1.0381021  28.19762  1.545238 6.088095
2012 11 44 2605 14.942857  .7895027  27.38524  5.447619 6.454762
2012 11 45 2578 13.085714 .23479086 26.830954  8.711905 5.733333
2012 11 46 2581 17.114286 .43059835 27.064285 14.583333 5.254762
2012 11 47 2504 17.657143  .2927277 27.190475 11.383333 5.442857
2012 12 48 2542  19.77143 .54017836 27.140953  8.892857 4.780952
2012 12 49 2681 12.857142 1.0204597  26.94286 14.038095 5.707143
2012 12 50 2604  14.17143 1.4329002  26.43333 14.869048 5.057143
2012 12 51 2497 13.685715  2.502648  26.22143 14.228572 4.795238
2012 12 52 2812 11.342857  3.386352 26.524763  2.990476 5.947619
2013  1  1 3022      11.6  3.368256  26.80857 16.009523 6.016667
end

hosp_visits is the number of hospital visits for respiratory conditions in the week. pm25 is the average pollutant concentration. radiatpw is a measure of the radiative power of fire hot spots. temp is the average temperature, prec the average amount of precipitation, and windsp the average wind speed.

For one of my specifications, I plan to estimate something like this: hosp_visits = pm25 + pm25squared + pm25cubed + pm25(t-1) + pm25(t-1)squared + pm25(t-1)cubed + weather controls + time fixed effects + constant, where pm25(t-1) is the first lag of pm25 (the time unit is the week). I use windsp and radiatpw as instruments, on the assumption that they affect hosp_visits only through pm25. I would also like to capture the non-linear and lingering effect of forest fires on pollution by including the square of radiatpw and the first lags of radiatpw and its square as instruments.
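Written out, the specification described above might look like the following. The symbols (\alpha, \beta_k, \gamma_k, \delta, \mu, \lambda, W_t) are notation assumed here for illustration and do not come from the post; the month and year effects are read off the i.month i.year terms in the ivregress call.

```latex
\text{hosp\_visits}_t
  = \alpha
  + \sum_{k=1}^{3} \beta_k \,\text{pm25}_t^{\,k}
  + \sum_{k=1}^{3} \gamma_k \,\text{pm25}_{t-1}^{\,k}
  + W_t'\delta
  + \mu_{\text{month}(t)} + \lambda_{\text{year}(t)}
  + \varepsilon_t
```

Here W_t stands for the weather controls, and the six pm25 terms are the ones treated as endogenous.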

I would like to check whether this specification is sound. If it is, I would like to know the proper way to implement it in Stata.

This is my clunky attempt at the code:

Code:

ivregress 2sls hosp_visits `weather' i.month i.year (c.pm25##c.pm25##c.pm25 L.c.pm25##L.c.pm25##L.c.pm25 = c.radiatpw##c.radiatpw L.c.radiatpw##L.c.radiatpw windsp), r

Any advice would be much appreciated.

Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

11 Mar 2019, 12:52

Many of the issues around specification depend on substantive and theoretical knowledge of your field which we don't have for the most part. But, let me offer you a few impressions.

First, if you had tried to run this, you probably would have found you can't use factor variable notation on the lhs of the instrument equation. You'll have to calculate these before the ivreg.

Second, I would be surprised if you can estimate a model with all these very similar variables without a collinearity problem. Maybe there is little serial correlation in pm, but most variables most of the time have serial correlation.

Third, you may also want to look at the user-written ivreg2. It provides a number of built-in diagnostics that ivreg doesn't.
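A minimal sketch of what "calculate these before the ivreg" could look like, combined with ivreg2. It assumes the -dataex- sample from the first post is in memory, that year and week identify consecutive weeks numbered 1 to 52, and that temp and prec are the weather controls; every new variable name (wk, pm25sq, pm25_l1, rad_sq and so on) is made up for illustration. One point worth checking by simple counting: the cubic-with-lags model has six endogenous terms but only five excluded instruments (radiatpw, its square, their first lags, and windsp), so it fails the order condition. The sketch therefore uses the quadratic version, which has four endogenous terms against the same five instruments.

```stata
* Sketch only, not a recommended specification.
* ssc install ranktest    // required by ivreg2; install once
* ssc install ivreg2      // user-written; install once

* Weekly time variable (assumes week runs 1-52 with no gaps)
gen wk = yw(year, week)
format wk %tw
tsset wk

* Polynomial and lagged terms built by hand (illustrative names)
gen pm25sq    = pm25^2
gen pm25_l1   = L.pm25
gen pm25sq_l1 = L.pm25sq
gen rad_sq    = radiatpw^2
gen rad_l1    = L.radiatpw
gen rad_sq_l1 = L.rad_sq

* Weather controls (assumed; windsp is left out because it is an instrument)
local weather temp prec

* 4 endogenous terms, 5 excluded instruments
ivreg2 hosp_visits `weather' i.month i.year ///
    (pm25 pm25sq pm25_l1 pm25sq_l1 = radiatpw rad_sq rad_l1 rad_sq_l1 windsp), robust
```

On the 20-row sample this is far too heavily parameterized to give meaningful estimates; it is meant to be run on the full dataset. The ivreg2 output includes underidentification, weak-identification and overidentification statistics, which is where the collinearity concern above would be expected to show up.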
Ryan Long

Join Date: Feb 2019

Posts: 10
#3

13 Mar 2019, 08:51

Hi Phil, thanks for your reply.

Originally posted by Phil Bromiley:

"Many of the issues around specification depend on substantive and theoretical knowledge of your field which we don't have for the most part. But, let me offer you a few impressions.

First, if you had tried to run this, you probably would have found you can't use factor variable notation on the lhs of the instrument equation. You'll have to calculate these before the ivreg."

As a matter of fact, factor notation works for me. In any case, I have generated new variables for my non-linear terms, as factor notation can get really messy.

"Second, I would be surprised if you can estimate a model with all these very similar variables without a collinearity problem. Maybe there is little serial correlation in pm, but most variables most of the time have serial correlation."

That's true. That said, I did get significant coefficients on my pm25 terms in a cubic model with the instruments specified the way I described.

"Third, you may also want to look at the user-written ivreg2. It provides a number of built-in diagnostics that ivreg doesn't."

Good advice. I have switched to ivreg2.

I do have some quick questions as I proceed with my research.

1) If I specify a model with pm25 and its first lag as my endogenous variables, how can I use the estimated coefficients to calculate the effect on hospital visits when the pm25 level rises from, say, 30 to 70?

2) In a quadratic specification of pm (pm and pmsq), if pm is not significant while pmsq is, will using

Code:

lincom (70-30) *_b[pm25] + (70^2-30^2) *_b[pm25sq]

to estimate the overall effect on hospital visits of pm increasing from 30 to 70 still be reliable?
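For reference, the arithmetic behind both questions can be sketched with lincom. This assumes the relevant model has just been fitted and that its coefficient names are pm25, pm25_l1 (a hand-made first lag) and pm25sq; the last two names are hypothetical. In the linear model with one lag, a rise from 30 to 70 changes predicted visits by 40 times the pm25 coefficient in the first week, and by 40 times the sum of the two coefficients once the higher level has persisted for a second week. In the quadratic model the multipliers are 70 - 30 = 40 and 70^2 - 30^2 = 4000.

```stata
* Sketch only: run each block right after fitting the model it refers to.

* --- After the linear model in pm25 and pm25_l1 (question 1) ---
* Effect in the week of the rise (lag still at its old level)
lincom 40*pm25
* Effect once the rise from 30 to 70 has lasted two weeks or more
lincom 40*pm25 + 40*pm25_l1

* --- After the quadratic model in pm25 and pm25sq (question 2) ---
* Same combination as (70-30)*_b[pm25] + (70^2-30^2)*_b[pm25sq]
lincom 40*pm25 + 4000*pm25sq
```

lincom reports a standard error and confidence interval for the whole combination, computed from the estimated variances and covariance of the coefficients involved, so the output speaks to the joint quantity and not to either coefficient separately.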