Linear Regression or Panel Regression

Lucca Mancini

Join Date: Mar 2019

Posts: 27
#1

13 Jul 2019, 14:43

Hello

I just read a paper and would like to try to replicate it. However, I don't exactly understand whether the author of the paper is doing a linear regression or a panel regression. His regression equation looks like this

with y_it denoting subjective inequality indices for individual i in year t. The regressor of main interest is East_it, a dummy variable denoting current residency in East Germany. The regression also includes a series of control variables and survey-year fixed effects, denoted by x_it and λ_t, respectively.

I know that he runs the regression on a dataset covering 3 years (1987, 1992 and 1999), but he does not say whether it is a linear regression or a panel regression. He only writes, and I quote: "I run a series of simple regression models".
What do you think the author does in his paper: a panel regression or a linear regression? I'm confused because I'm not sure whether a linear regression with a dummy variable is possible on a dataset spanning 3 years.

Thank you so much for your help.

Last edited by Lucca Mancini; 13 Jul 2019, 14:49.
Richard Williams

Join Date: Apr 2014

Posts: 5011
#2

13 Jul 2019, 18:17

My guess is panel regression, e.g. xtreg. That is why there are all these t subscripts. I don't see why you couldn't have the dummy.

When you say you are going to try to replicate, do you mean that you have the same data? If so you should be able to tell whether you are doing it the same way or not.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://academicweb.nd.edu/~rwilliam/
Lucca Mancini

Join Date: Mar 2019

Posts: 27
#3

14 Jul 2019, 03:24

Thank you Mr Williams for your feedback.

I am trying to estimate the same equation with the same data, but I only have data for 2 years (1992 and 1999), whereas the author has 3 years (1987, 1992 and 1999).

I'm unsure whether to run a panel regression or a linear regression on a 2-year dataset. I don't know what the difference between them is.
Wouter Wakker

Join Date: Nov 2018

Posts: 621
#4

14 Jul 2019, 03:50

"with y_it denoting subjective inequality indices for individual i in year t."

This already answers your question. You have data for both i and t, so it's panel data.

"I'm unsure whether to do panel regression or linear regression with a 2 year dataset."

A linear regression is a regression where you estimate a linear relationship between your y and x variables. That is the case above. Thus, it's a linear regression with panel data. Panel data doesn't mean that you cannot do linear regression.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2189
#5

14 Jul 2019, 04:58

Originally posted by Wouter Wakker:

"This already answers your question. You have data for both i and t, so it's panel data.

A linear regression is a regression where you estimate a linear relationship between your y and x variables. That is the case above. Thus, it's a linear regression with panel data. Panel data doesn't mean that you cannot do linear regression."

I agree with Wouter. Empirical researchers would do well to remember the difference between an estimation method and a model. The equation you posted is a model. It can be estimated many different ways. By "regression" you presumably mean "pooled OLS." One could also use random effects, which is a particular GLS estimator. Or, one could use fixed effects, which removes time averages. Yes, these estimation methods are intended for different models, but any estimation method can be applied to the problem.

I would guess that removing heterogeneity is important to infer causality, and so I would tend to use fixed effects. But I almost always check pooled OLS, and maybe even random effects, and also maybe first differencing. One can learn a lot by doing all four.
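
As a rough sketch of what running all four looks like in Stata, the commands below use the nlswork example data from the Stata manuals as a stand-in. It is not the data from the paper, and the choice of ln_wage as outcome, south as the dummy regressor, and tenure as a control is purely illustrative.

```stata
* Stand-in panel data from the Stata manuals (assumption: NOT the paper's data)
webuse nlswork, clear
xtset idcode year

* 1. Pooled OLS: ignores the panel structure when estimating coefficients
regress ln_wage south tenure i.year

* 2. Random effects: a GLS estimator
xtreg ln_wage south tenure i.year, re

* 3. Fixed effects: removes individual time averages
xtreg ln_wage south tenure i.year, fe

* 4. First differences: D. takes one-year changes, so only
*    consecutive survey years contribute observations
regress D.ln_wage D.south D.tenure i.year
```

Comparing the coefficient on south across the four sets of output gives a sense of how much the conclusion depends on the way heterogeneity is handled.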

And as was mentioned above, there is no issue in East(i,t) being a dummy variable. It gets transformed just like any other variable. Don't overthink. Pick up a panel data book and you'll notice that no special treatment is given to dummy variables.

With two years of data, FE and FD will be the same. So there are not two separate ways to remove heterogeneity.
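
The two-period equivalence can be checked on simulated data. This is only a sketch: the sample size, coefficients and variable names (id, t, a, x, y) are made up for illustration.

```stata
* Simulated balanced two-period panel (all numbers invented for illustration)
clear
set seed 12345
set obs 200
gen id = _n
gen a = rnormal()                 // unobserved individual heterogeneity
expand 2                          // two rows per individual
bysort id: gen t = _n             // period index: 1, 2
gen x = 0.5*a + rnormal()         // regressor correlated with a
gen y = 1 + 2*x + 0.3*(t == 2) + a + rnormal()
xtset id t

* Fixed effects with a period dummy
xtreg y x i.t, fe
scalar b_fe = _b[x]

* First differences; the constant plays the role of the period dummy
regress D.y D.x
scalar b_fd = _b[D.x]

display "FE slope: " b_fe "   FD slope: " b_fd
```

With a balanced panel and exactly two periods, the two displayed slopes should agree up to rounding error.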

JW
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2189
#6

14 Jul 2019, 05:12

I will add that the fact that the error term, epsilon(i,t), in the equation is not separated into a heterogeneity term and an idiosyncratic error strongly suggests the original study did not use FE or FD. My guess is pooled OLS was used. It's likely that there is little variation in the variable East(i,t) across t for each individual, and so FE probably wipes out the effect.

If pooled OLS is used, cluster the standard errors at a minimum. And I would use FE to see what happens.
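
Both suggestions are short to try when the data really are a panel. The sketch below again uses the nlswork example data as a stand-in (not the paper's data), with south playing the role of the East dummy; xtsum is included as one way to see how much a dummy varies within individuals.

```stata
* Stand-in panel data from the Stata manuals (assumption: NOT the paper's data)
webuse nlswork, clear
xtset idcode year

* Between versus within variation in the dummy regressor
xtsum south

* Pooled OLS with standard errors clustered by individual
regress ln_wage south tenure i.year, vce(cluster idcode)

* Fixed effects with clustered standard errors, for comparison
xtreg ln_wage south tenure i.year, fe vce(cluster idcode)
```

A small within standard deviation in the xtsum output would be a warning that the FE estimate of the dummy's coefficient rests on few movers.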
Lucca Mancini

Join Date: Mar 2019

Posts: 27
#7

14 Jul 2019, 09:37

Thank you very much Mr Wooldridge

I would like to show here the table of the paper to be replicated in which the regression equation shown above was applied over the years 1987, 1992 and 1999 for region east and west:

However, when I try to run a pooled OLS regression, I get the error "repeated time values within panel" as soon as I type "xtset panel_id year". I don't know exactly why I get this error message.

Both region and panel_id take 2 values (East and West), and year also takes 2 values (1992 and 1999). However, these are survey data from 2 years in which the same persons were not re-interviewed.

- In 1992, 579 participants from East and 170 participants from West participated.

- In 1999, 678 participants from East and 201 participants from West participated.

Can one estimate a pooled OLS with these data? Or should I use a different approach?
Richard Williams

Join Date: Apr 2014

Posts: 5011
#8

14 Jul 2019, 09:56

If the same people aren’t being interviewed, it isn’t panel data, is it? It is successive cross-sections. Which to me seems to go counter to what I thought you were saying before.
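
This would also explain the xtset error reported in #7. xtset needs each combination of panel_id and year to appear at most once, but if panel_id only distinguishes East from West, every respondent from the same region and year shares the same pair. A toy example with made-up rows (an assumption about how the data are coded) reproduces the situation:

```stata
* Toy data: panel_id codes the region (1 = East, 2 = West), not the person
clear
input panel_id year
1 1992
1 1992
2 1992
1 1999
2 1999
2 1999
end

* Shows that panel_id-year pairs are not unique
duplicates report panel_id year

* xtset refuses such data; capture keeps the do-file running
capture noisily xtset panel_id year
display "xtset return code: " _rc
```

Since regress does not require xtset, the declaration can simply be skipped for repeated cross-sections.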

Lucca Mancini

Join Date: Mar 2019

Posts: 27
#9

14 Jul 2019, 11:13

Absolutely correct, Mr Williams, these are successive cross-sections. Do you have any idea what kind of regression I should use here to create a table similar to the one above? Maybe a linear regression?
Richard Williams

Join Date: Apr 2014

Posts: 5011
#10

14 Jul 2019, 11:49

It looks like an OLS regression to me, e.g.

reg y i.year i.region i.region#i.year

If you really have the same data, I would think you could reproduce the results for the two years you have.

Lucca Mancini

Join Date: Mar 2019

Posts: 27
#11

14 Jul 2019, 12:31

Thank you very much for your help Mr Williams. I appreciate it very much.
Richard Williams

Join Date: Apr 2014

Posts: 5011
#12

14 Jul 2019, 13:53

What is the citation for the paper? The notation still seems weird to me if it isn’t panel data. But maybe it makes sense in context. Or, maybe it is just wrong.

Lucca Mancini

Join Date: Mar 2019

Posts: 27
#13

14 Jul 2019, 14:05

Dear Mr Williams

The following link contains the paper: http://www.econ.uzh.ch/static/wp/econwp009.pdf
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2189
#14

14 Jul 2019, 17:50

The notation in the equation is misleading. This looks like repeated cross sections, not panel data. With East(i,t) in the equation one gets the impression that not only are the same individuals being followed over time, but that some individuals switch from East to West, or vice versa.

The analysis is like a difference in differences, where East acts like the treatment variable. Then it is interacted with the time dummies. In this case, the interest is in how perceptions have changed over time in the East versus the West.

Estimate using OLS, as suggested by Richard. I would make the standard errors robust to heteroskedasticity.
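
A minimal sketch of that regression on simulated repeated cross-sections follows. Only the group sizes are taken from #7 (579 East and 170 West in 1992, 678 East and 201 West in 1999); the outcome y and every coefficient used to generate it are invented, and the paper's control variables are left out.

```stata
* Simulated repeated cross-sections (outcome and effects are made up)
clear
set seed 2019
set obs 1628                                  // 579 + 170 + 678 + 201 respondents
gen year = cond(_n <= 749, 1992, 1999)        // first 749 rows are the 1992 wave
gen east = inrange(_n, 1, 579) | inrange(_n, 750, 1427)
gen y = 0.5*east + 0.2*(year == 1999) - 0.3*east*(year == 1999) + rnormal()

* OLS with heteroskedasticity-robust standard errors;
* i.east##i.year expands to both main effects plus their interaction
regress y i.east##i.year, vce(robust)

* Change in the East-West gap between 1992 and 1999
lincom 1.east#1999.year

* East-West gap in each survey year
margins year, dydx(east)
```

The interaction coefficient is the difference-in-differences quantity: how much the East-West gap in y changed between the two waves.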