Panel regression with dependent variables at different frequencies (year and decade)

Lucas Mation

Join Date: Mar 2014

Posts: 39
#1

11 Dec 2014, 07:00

I'm running a regression with yearly panel data on 1300 municipalities over 14 years (2000-2013). Let's say Y on X, including municipality fixed effects.

As a robustness check, I would like to run an additional regression that controls for a variable Z that is only available in Census years (2000 and 2010). The idea is to control for the long-term impact of that variable within each municipality. I actually also have data on Z for 1991, so I could do a quadratic interpolation.

Is it too much of a statistical sin to interpolate Z for every year within municipalities and re-estimate the regression with the interpolated Z?
Are there any specific estimation procedures that I should use in this case?
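
For concreteness, here is a minimal Python sketch of what interpolating Z within a single municipality could look like, using either a straight line between the 2000 and 2010 Census values or the unique quadratic through the 1991, 2000 and 2010 values. The Z values and the function names are made up for illustration. Note that 2011-2013 lie beyond the last Census year, so for those years both variants extrapolate rather than interpolate.

```python
def lagrange_quadratic(points, year):
    """Evaluate the unique quadratic through three (year, value) points."""
    (x0, y0), (x1, y1), (x2, y2) = points
    return (
        y0 * (year - x1) * (year - x2) / ((x0 - x1) * (x0 - x2))
        + y1 * (year - x0) * (year - x2) / ((x1 - x0) * (x1 - x2))
        + y2 * (year - x0) * (year - x1) / ((x2 - x0) * (x2 - x1))
    )


def linear(p0, p1, year):
    """Evaluate the straight line through two (year, value) points."""
    (x0, y0), (x1, y1) = p0, p1
    return y0 + (y1 - y0) * (year - x0) / (x1 - x0)


# Hypothetical Census values of Z for one municipality (made-up numbers).
census = [(1991, 10.0), (2000, 14.0), (2010, 20.0)]

panel_years = range(2000, 2014)  # 2000, ..., 2013

# Quadratic variant: uses all three Census points.
z_quadratic = {yr: lagrange_quadratic(census, yr) for yr in panel_years}

# Linear variant: uses only the 2000 and 2010 Census points.
z_linear = {yr: linear(census[1], census[2], yr) for yr in panel_years}

for yr in panel_years:
    flag = "extrapolated" if yr > 2010 else ""
    print(yr, round(z_linear[yr], 2), round(z_quadratic[yr], 2), flag)
```

In a real panel the same calculation would be repeated separately for each municipality, so the filled-in Z varies smoothly within each unit by construction.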

I could of course be more agnostic and run the regression of Y on X including fixed effects and municipality-specific time trends. It just seems counterintuitive not to use the observed signal from Z.

thanks in advance
Lucas

PS: I just found this review of time-series methods for different frequency data (http://www.norges-bank.no/pages/9311...er_2013_06.pdf). However, in this case I have a panel structure that is long in N, not T.
Lucas Mation

Join Date: Mar 2014

Posts: 39
#2

11 Dec 2014, 07:13

Sorry, I meant INDEPENDENT (right-hand-side) variables at different frequencies.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#3

11 Dec 2014, 22:51

Lucas:
Interpolating from existing data to fill in missing observations in panel data is advice that comes up quite often on this list.
I would perform the panel data analysis with both missing and interpolated Z, take a look at the difference (mainly, how many observations are dropped via casewise deletion due to missing Z if you do not interpolate), and report the results of both approaches.
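
As a rough back-of-the-envelope check on how much casewise deletion would cost here, the following Python sketch counts observations using the figures from the first post (1300 municipalities, 2000-2013, Z observed in 2000 and 2010). It assumes a balanced panel with no other missing values, which the thread does not confirm.

```python
# Figures taken from the original post; a balanced panel is an assumption.
n_municipalities = 1300
years = list(range(2000, 2014))   # 2000, ..., 2013
census_years = {2000, 2010}       # years in which Z is observed

total_obs = n_municipalities * len(years)
kept_obs = n_municipalities * sum(1 for y in years if y in census_years)
dropped_obs = total_obs - kept_obs

print(total_obs, kept_obs, dropped_obs)  # 18200 2600 15600
```

Under those assumptions, including the uninterpolated Z leaves two observations per municipality, so roughly 86% of the sample is dropped.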

Kind regards,
Carlo
(Stata 19.0)
Lucas Mation

Join Date: Mar 2014

Posts: 39
#4

12 Dec 2014, 06:13

Carlo,

thank you for your response.
I take your point and will run a regression using only years 2000 and 2010.

I'm still doubtful, though. In this case there is A LOT of interpolation: for every cross-sectional unit (municipality) I observe Z in only 2 years (2000 and 2010) and would interpolate the remaining 12 years, to the point that I don't even know whether interpolation is the right way to think about this.

regards
Lucas
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#5

12 Dec 2014, 10:25

Lucas:
Another road to follow is -mi-, which entails all the missing-data management machinery, starting with the question of whether your missing data are ignorable or not.
However, you can justify, as a research assumption, the exclusion of Z from the right-hand side of the equation because of too many missing values (due to the fact that the Census took place only twice during the 2000-2013 span of time). Just to keep things even simpler, I would also consider what the expected effect is of a variable that was measured only twice in a 13-year period (probably a negligible one, so you can get rid of it harmlessly).

Kind regards,
Carlo
(Stata 19.0)