Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Repeat sales regression data setup

    Dear Stata Forum Members:
    I would appreciate any help I can get in setting up the data so I can run a repeat sales regression (RSR) which is a methodology commonly used in the real estate literature. I have searched the archives and could not find a solution. I have a dataset of property prices in which each unique property has been sold repeatedly over a specified time period. It is based on the Case and Shiller (1987, 1989) repeat sales methodology for creating a price index. In the first step, an OLS regression is used with no constant term (intercept) in the analysis. In the second step, the squared residuals are regressed on the holding period (number of quarters between each sale) and a constant term. In the third step, a weighted least squares regression (WLS) is applied where the weights use the inverse of the square roots of the predicted values from the second step. The weighted repeat sales or price index going from period 0 to period t is obtained by exponentiating the corresponding regression coefficients.

    My original sample data has been sorted by property id and date of sale for each property in Excel and imported into Stata 14 SE.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte propid int date long price
    1 20075  1380000
    2 19463  3100000
    2 19534  3200000
    3 19432  2000000
    3 19857  3800000
    4 19358  2600910
    4 19627  2500000
    5 19038  4500000
    5 19536  5100000
    6 19065  3605276
    6 20065  2825000
    7 19319  5000000
    7 19667  5054085
    7 20013  5830000
    8 19211  4445885
    8 19474  6100000
    9 19106  3670400
    9 19362  5150000
    9 19633  7400000
    9 19921 10650000
    end
    format %tdnn/dd/CCYY date
    What I need is for the new dataset (without changing the original data) to look like the following:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte propid int date2 long price2 int date1 long price1 byte(holdperiod y2012q2 y2012q3 y2012q4 y2013q1 y2013q2 y2013q3 y2013q4 y2014q1 y2014q2 y2014q3 y2014q4)
    1     .        . 20075 1380000  .  .  .  .  . . .  . . . . .
    2 19534  3200000 19463 3100000  0  .  .  .  . . .  . . . . .
    3 19857  3800000 19432 2000000  4  0  0  0 -1 0 0  0 0 1 0 0
    4 19627  2500000 19358 2600910  3  0  0 -1  0 0 1  0 0 0 0 0
    5 19536  5100000 19038 4500000  5  0  0  0  0 1 0  0 0 0 0 0
    6 20065  2825000 19065 3605276 11  0  0  0  0 0 0  0 0 0 0 1
    7 19667  5054085 19319 5000000  4  0  0 -1  0 0 0  1 0 0 0 0
    7 20013  5830000 19667 5054085  3  0  0  0  0 0 0 -1 0 0 0 1
    8 19474  6100000 19211 4445885  2  0 -1  0  0 1 0  0 0 0 0 0
    9 19362  5150000 19106 3670400  3 -1  0  0  1 0 0  0 0 0 0 0
    9 19633  7400000 19362 5150000  3  0  0  0 -1 0 0  1 0 0 0 0
    9 19921 10650000 19633 7400000  3  0  0  0  0 0 0 -1 0 0 1 0
    end
    format %tdnn/dd/CCYY date2
    format %tdnn/dd/CCYY date1
    In the results, date2 and price2 represent the second sale date and price of the property, date1 and price1 represent the first sale (date1<date2), holdperiod is defined as the difference in numbers of quarters between the two sale dates since this is done on a quarterly basis. In the repeat sales regression, the dependent variable is the difference in the log of the two sale prices (logprice2 - logprice1) regressed on a set of dummy variables corresponding to the time periods (quarters in this case). The dummy variable for the second sale (date 2 price2) has the value of +1, the first sale has a value of -1 (date1 price1), and 0 for no sale over the sample period. The time dummy for the initial (base) period is excluded to set the price index at 1.

    Additional information about the results:
    * The first and last quarter is identified from the sample dates and used as the basis for creating the required number of time dummy variables, whether in years, quarters or months, in this case quarters.
    * Each repeat sales is a pair of observations. Only properties that sell multiple times help in identifying the price index so only observations with repeat sales are used in the analysis.
    * Property id no. 1 has only one observation so it is set to missing for the dates and dummy variables and will be excluded from the analysis.
    * Property id no. 2 has both sales occurring in the same quarter so it is set to missing for the dummy variables since -1 and 1 in the same quarter equates to 0.
    * Property id no. 7 has been sold 3 times (11/12, 11/13, and 10/14). In coding this sale, 11/12 and 11/13 would form one pair of observation (-1, 1) and 11/13 and 10/14 would form the second pair of observation (-1,1) for this property.
    * Property id no. 9 has been sold 4 times (4/12, 1/13, 10/13, 7/2014) forming 3 pairs of observations. A property could be sold multiple times over the sample period but the process is the same.

    Thank you very much in advance for your help.

    References:

    Case, K., & Shiller, R. (1987). Prices of Single-Family Homes since 1970. New Indexes for Four Cities. New England Economic Review, (September/October): 45-56.

    Case, K., & Shiller, R. (1989). The Efficiency of the Market for Single Family Homes. The American Economic Review, 79(1): 125-137.



  • #2
    Your question is a bit too complex. You're asking the group to solve a lot of issues all at once. You need to work on this and ask simpler questions. You might look at the FAQ on asking questions - it will help you get useful answers.

    It looks to me like you'll need to set up the data by property and date. Then egen with a by property and date to calculate some variables from which you can determine which observations to keep.

    Comment


    • #3
      Thanks for your response Phil. I will work on that.

      Comment

      Working...
      X