Panel data preparation and xtreg function

Cleo Chor

Join Date: Aug 2020

Posts: 4
#1


30 Aug 2020, 01:13

Hi

I am new to econometrics and Stata, and would appreciate some help with the following questions related to my dissertation. I am investigating the impact on property prices of a rail station opening in 2010.

The intention is to build a fixed-effects model on a panel dataset, using a difference-in-differences design with treated and control transactions, in order to:
(a) find price growth within 0-1km and 1-2km (treated) against 2-3km (control) of rail stations
(b) investigate any anticipation effect, i.e. in which year before 2010 prices started increasing and by how much (in percent) each year, as well as the trend in prices after 2010.

I have property transactions from 1995 to 2019 and, using GIS software, I have filtered the transactions within 0-1km, 1-2km and 2-3km of stations. I have created a column named buffer_km with values 1/2/3 to denote the corresponding distance band.

However, I am rather confused about categorical variables and where to put 0 and 1. I have tried two methods and they gave very different coefficients for buffer_km 1/2 and r1km/r2km.

Code:

gen logprice = log(price)
encode type, gen(typehouse)
gen YearsStr = substr(date,1,4)
encode YearsStr, gen(YearsS)
encode lsoa11, gen(LSOA_num)
encode tenure, gen(Tenure)
gen YearsN = real(YearsStr)
gen r1km = (YearsN>=2010 & buffer_km==1)
gen r2km = (YearsN>=2010 & buffer_km==2)
gen r3km = (YearsN>=2010 & buffer_km==3)
xtset LSOA_num
xtreg logprice ib4.typehouse ib2.Tenure i.YearsS ib3.buffer_km, fe
xtreg logprice ib4.typehouse ib2.Tenure i.YearsS r1km r2km, fe

Which method should I use to answer question (a) above, so that it reflects the price growth within 0-1km and 1-2km, comparing transactions before and after 2010?

For question (b) on the anticipation effect, I was told I have to find a way to regress against year 2010, but I don't really know how to do it. What do the coefficients in the results output under YearsS 1996 to 2019 mean in statistical terms?

Grateful for any advice on the queries above. Thank you!

Regards
Cleo

Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

31 Aug 2020, 11:57

Welcome to Statalist. You will increase your chances of a useful answer by following the FAQ on asking questions: provide Stata code in code delimiters, readable Stata output, and sample data using dataex. You can also simplify your model down to the minimum necessary to demonstrate the problem.

What I suspect is going on is that r1km and r2km equal 1 only for the years after 2009, while buffer_km covers all the years. Given that you want to look at the effect of a change in 2010, it makes sense to use dummies restricted to after the change.
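The difference between the two methods can be seen by putting both sets of terms in one regression. What follows is only a minimal difference-in-differences sketch, assuming the data and the variables created by the code in post #1 are in memory: it keeps the distance-band indicators (the level difference between bands in all years) and adds r1km and r2km (the extra difference from 2010 onward). A separate post-2010 main effect is not needed because the year dummies already absorb it.

```stata
* Assumes the data and variables from the code in post #1 are in memory.
* ib3.buffer_km : average log-price gap of each band vs. 2-3km, all years
* r1km, r2km    : additional gap vs. 2-3km from 2010 onward (the DiD terms)
xtset LSOA_num
xtreg logprice ib4.typehouse ib2.Tenure i.YearsS ib3.buffer_km r1km r2km, fe
```

Under this specification the coefficient on r1km is the change in the log-price gap between the 0-1km and 2-3km bands after 2009 relative to before, which reads approximately as a percentage change when it is small. The buffer_km main effects are identified only from LSOAs that have transactions in more than one band; if no LSOA does, Stata will omit them as collinear with the fixed effects.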

The year dummies largely show that prices have slowly increased over the years. The values are smooth enough that you might even consider using year as a continuous variable since it's just a control.

Anticipation effect probably means: if you created a dummy for the distance band in 2009 or 2008, would it explain the price in 2010? This could be done by generating the variables using lags, or egen by the panel variable, to move all the older dummies up into the observation for 2010 and then running the estimate only on the data for 2010. So you would have the price in 2010 as the dependent variable, and the dummies for the coming train station as right-hand-side variables. However, you should consider whether you can simply add that set of dummies to your current model. Since the dummy would equal zero for all the observations except the one associated with 2009, the dummy would essentially be explaining 2010.
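The simpler option of adding the dummies to the current model might look like the sketch below. It assumes the data and variables from the code in post #1 are in memory; the names a1km_2008, a1km_2009, a2km_2008 and a2km_2009 are hypothetical, and the choice of 2008 and 2009 as the anticipation years is an assumption, not something established in this thread.

```stata
* Assumes the data and variables from the code in post #1 are in memory.
* Hypothetical lead dummies: band-specific indicators for the two years
* before the 2010 opening.
gen byte a1km_2008 = (YearsN == 2008 & buffer_km == 1)
gen byte a1km_2009 = (YearsN == 2009 & buffer_km == 1)
gen byte a2km_2008 = (YearsN == 2008 & buffer_km == 2)
gen byte a2km_2009 = (YearsN == 2009 & buffer_km == 2)

* Band levels + lead dummies + post-2010 dummies, with year dummies as controls.
xtreg logprice ib4.typehouse ib2.Tenure i.YearsS ib3.buffer_km ///
    a1km_2008 a1km_2009 a2km_2008 a2km_2009 r1km r2km, fe
```

With the band main effects included, each lead coefficient compares that band's gap to the 2-3km band in 2008 or 2009 with its average gap over 1995-2007, so a positive and significant lead would be consistent with prices rising before the opening.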
Cleo Chor

Join Date: Aug 2020

Posts: 4
#3

01 Sep 2020, 03:24

Hi Phil

Thank you for your reply!

For the r1km and r2km columns, I have put 1 for transactions in the corresponding band that occur after 2009; the rest of the transactions are 0. For buffer_km, every transaction has 1/2/3 depending on which radius it falls under.
Actually, what I want is to make r3km the control group and r1km and r2km the treated groups, and find the coefficients of r1km and r2km relative to r3km, taking into account that the change (station opening) takes place in 2010. But neither of my methods seems to be doing that?

My understanding of the anticipation effect is that there is an uplift (significant growth) in price starting from, say, 2008 or 2009, because people are willing to pay more as they anticipate the station opening in 2010. So it may not be a case of explaining the price in 2010? In that case, how would you suggest going about it?
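One way to look for an uplift that starts before 2010 is to let the gap between each treated band and the 2-3km band vary by year. This is only a sketch under stated assumptions: the data and variables from the code in post #1 are in memory, YearsN holds whole-number years, and 2007 is picked as the reference year purely for illustration.

```stata
* Assumes the data and variables from the code in post #1 are in memory.
* Band-by-year interactions, with 2-3km as the base band and 2007 as the
* base year (an assumed choice). r1km and r2km are left out because the
* interactions already span the post-2010 period.
xtset LSOA_num
xtreg logprice ib4.typehouse ib2.Tenure ib3.buffer_km##ib2007.YearsN, fe
```

In the output, a term such as 1.buffer_km#2009.YearsN is the 0-1km versus 2-3km log-price gap in 2009 relative to the same gap in 2007. Coefficients near zero up to some year and positive afterwards would suggest when the uplift began, and the terms for 2010 onward trace the trend after opening.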

Thank you!