Repeat sales regression data setup

Amrik Singh

Join Date: May 2016

Posts: 7
#1

Repeat sales regression data setup

25 May 2016, 21:10

Dear Stata Forum Members:
I would appreciate any help I can get in setting up the data so I can run a repeat sales regression (RSR) which is a methodology commonly used in the real estate literature. I have searched the archives and could not find a solution. I have a dataset of property prices in which each unique property has been sold repeatedly over a specified time period. It is based on the Case and Shiller (1987, 1989) repeat sales methodology for creating a price index. In the first step, an OLS regression is used with no constant term (intercept) in the analysis. In the second step, the squared residuals are regressed on the holding period (number of quarters between each sale) and a constant term. In the third step, a weighted least squares regression (WLS) is applied where the weights use the inverse of the square roots of the predicted values from the second step. The weighted repeat sales or price index going from period 0 to period t is obtained by exponentiating the corresponding regression coefficients.

My original sample data has been sorted by property id and date of sale for each property in Excel and imported into Stata 14 SE.

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte propid int date long price 1 20075 1380000 2 19463 3100000 2 19534 3200000 3 19432 2000000 3 19857 3800000 4 19358 2600910 4 19627 2500000 5 19038 4500000 5 19536 5100000 6 19065 3605276 6 20065 2825000 7 19319 5000000 7 19667 5054085 7 20013 5830000 8 19211 4445885 8 19474 6100000 9 19106 3670400 9 19362 5150000 9 19633 7400000 9 19921 10650000 end format %tdnn/dd/CCYY date

What I need is for the new dataset (without changing the original data) to look like the following:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte propid int date2 long price2 int date1 long price1 byte(holdperiod y2012q2 y2012q3 y2012q4 y2013q1 y2013q2 y2013q3 y2013q4 y2014q1 y2014q2 y2014q3 y2014q4) 1 . . 20075 1380000 . . . . . . . . . . . . 2 19534 3200000 19463 3100000 0 . . . . . . . . . . . 3 19857 3800000 19432 2000000 4 0 0 0 -1 0 0 0 0 1 0 0 4 19627 2500000 19358 2600910 3 0 0 -1 0 0 1 0 0 0 0 0 5 19536 5100000 19038 4500000 5 0 0 0 0 1 0 0 0 0 0 0 6 20065 2825000 19065 3605276 11 0 0 0 0 0 0 0 0 0 0 1 7 19667 5054085 19319 5000000 4 0 0 -1 0 0 0 1 0 0 0 0 7 20013 5830000 19667 5054085 3 0 0 0 0 0 0 -1 0 0 0 1 8 19474 6100000 19211 4445885 2 0 -1 0 0 1 0 0 0 0 0 0 9 19362 5150000 19106 3670400 3 -1 0 0 1 0 0 0 0 0 0 0 9 19633 7400000 19362 5150000 3 0 0 0 -1 0 0 1 0 0 0 0 9 19921 10650000 19633 7400000 3 0 0 0 0 0 0 -1 0 0 1 0 end format %tdnn/dd/CCYY date2 format %tdnn/dd/CCYY date1

In the results, date2 and price2 represent the second sale date and price of the property, date1 and price1 represent the first sale (date1<date2), holdperiod is defined as the difference in numbers of quarters between the two sale dates since this is done on a quarterly basis. In the repeat sales regression, the dependent variable is the difference in the log of the two sale prices (logprice2 - logprice1) regressed on a set of dummy variables corresponding to the time periods (quarters in this case). The dummy variable for the second sale (date 2 price2) has the value of +1, the first sale has a value of -1 (date1 price1), and 0 for no sale over the sample period. The time dummy for the initial (base) period is excluded to set the price index at 1.

Additional information about the results:
* The first and last quarter is identified from the sample dates and used as the basis for creating the required number of time dummy variables, whether in years, quarters or months, in this case quarters.
* Each repeat sales is a pair of observations. Only properties that sell multiple times help in identifying the price index so only observations with repeat sales are used in the analysis.
* Property id no. 1 has only one observation so it is set to missing for the dates and dummy variables and will be excluded from the analysis.
* Property id no. 2 has both sales occurring in the same quarter so it is set to missing for the dummy variables since -1 and 1 in the same quarter equates to 0.
* Property id no. 7 has been sold 3 times (11/12, 11/13, and 10/14). In coding this sale, 11/12 and 11/13 would form one pair of observation (-1, 1) and 11/13 and 10/14 would form the second pair of observation (-1,1) for this property.
* Property id no. 9 has been sold 4 times (4/12, 1/13, 10/13, 7/2014) forming 3 pairs of observations. A property could be sold multiple times over the sample period but the process is the same.

Thank you very much in advance for your help.

References:

Case, K., & Shiller, R. (1987). Prices of Single-Family Homes since 1970. New Indexes for Four Cities. New England Economic Review, (September/October): 45-56.

Case, K., & Shiller, R. (1989). The Efficiency of the Market for Single Family Homes. The American Economic Review, 79(1): 125-137.
Tags: None
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

26 May 2016, 14:22

Your question is a bit too complex. You're asking the group to solve a lot of issues all at once. You need to work on this and ask simpler questions. You might look at the FAQ on asking questions - it will help you get useful answers.

It looks to me like you'll need to set up the data by property and date. Then egen with a by property and date to calculate some variables from which you can determine which observations to keep.
Comment
Amrik Singh

Join Date: May 2016

Posts: 7
#3

26 May 2016, 17:28

Thanks for your response Phil. I will work on that.
Comment

Announcement

Repeat sales regression data setup

Comment

Comment