Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating moving average variables from another data set and merge them onto the main data set

    Dear all,

    My data set is of housing prices in a city. I will call this data "sales data set." The unit of analysis is a housing unit being sold. And the data set has a district ID variable(D_id), year, month, and the dependent variable of interest (Y). The year and month information indicate the time of a house being sold. The number of observations is 872.

    D_id year month Y
    a 2012 3 70.2
    a 2015 6 55.6
    b 2013 5 44.5
    b 2013 11 23.5
    c 2013 2 11.4
    c 2013 7 41.5
    c 2013 7 82.0
    c 2013 9 50.5
    c 2014 11 21.1
    ...

    And I have another data set, which contains the average housing price(AP) for all the districts and all months. I will call it "city data set." The years are from 2011 to 2016. The number of observation is 57,239. The number of district is 1,031.

    D_id year month AP
    a 2011 1 23.4
    a 2011 2 35.6
    a 2011 3 99.8
    a 2011 4 60.4
    a 2011 5 70.2
    a 2011 6 55.6
    a 2011 7 23.4
    a 2011 8 35.6
    a 2011 9 99.8
    a 2011 10 60.4
    a 2011 11 70.2
    a 2011 12 55.6
    a 2012 1 23.4
    a 2012 2 35.6
    a 2012 3 99.8
    a 2012 4 60.4
    a 2012 5 70.2
    a 2012 6 55.6
    a 2012 7 23.4
    ...

    My goal is to calculate the 3-month, 6-month, 9-month, and 12-month moving averages before 1 month for a housing unit sold using the city data set. For example, as for the first observation in the sales data set, I have to calculate the 3-month moving average of the district a's average housing prices from 2010.12 to 2011.2 by using the city data set. For the 6-month moving average, I have to calculate it from 2010.9 to 2011.2 for the district a. Then, I have to merge these moving average variables onto the sales data set.

    I am trying to solve this problem for a couple of days, but I am failing. I hope this problem interests you, and experts in Stata programming could provide some help and advice.

    Thank you very much.

    Sincerely,

    Tony Yoon
    Last edited by Tony Yoon; 17 Nov 2017, 23:35.

  • #2
    The first obstacle in your path is the lack of a proper monthly date variable. The separate month and year variables are not useful for calculatng time intervals. Once you have that, the rest is simple if you use the -rangestat- command, written by Robert Picard, Nick Cox, and Roberto Ferrer, and available from SSC (-ssc install rangestat-).

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str1 d_id int year byte month float ap
    "a" 2011  1 23.4
    "a" 2011  2 35.6
    "a" 2011  3 99.8
    "a" 2011  4 60.4
    "a" 2011  5 70.2
    "a" 2011  6 55.6
    "a" 2011  7 23.4
    "a" 2011  8 35.6
    "a" 2011  9 99.8
    "a" 2011 10 60.4
    "a" 2011 11 70.2
    "a" 2011 12 55.6
    "a" 2012  1 23.4
    "a" 2012  2 35.6
    "a" 2012  3 99.8
    "a" 2012  4 60.4
    "a" 2012  5 70.2
    "a" 2012  6 55.6
    "a" 2012  7 23.4
    end
    tempfile avg_data
    save `avg_data'
    
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str1 d_id int year byte month float y
    "a" 2012  3 70.2
    "a" 2015  6 55.6
    "b" 2013  5 44.5
    "b" 2013 11 23.5
    "c" 2013  2 11.4
    "c" 2013  7 41.5
    "c" 2013  7   82
    "c" 2013  9 50.5
    "c" 2014 11 21.1
    end
    tempfile sales_data
    save `sales_data'
    
    //    CREATE A MONTHLY DATE VARIABLE AND
    //    THEN COMPUTE BACK AVERAGES
    use `avg_data', clear
    gen monthly_date = mofd(mdy(month, 1, year))
    format monthly_date %tm
    drop month year
    rangestat (mean) ma_3 = ap, interval(monthly_date -3 -1) by(d_id)
    rangestat (mean) ma_6 = ap, interval(monthly_date -6 -1) by(d_id)
    rangestat (mean) ma_12 = ap, interval(monthly_date -12 -1) by(d_id)
    isid d_id monthly_date, sort
    tempfile moving_averages
    save `moving_averages'
    
    //    BRING IN SALES DATA AND CALCULATE MONTHLY DATE VARIABLE
    use `sales_data', clear
    gen monthly_date = mofd(mdy(month, 1, year))
    format monthly_date %tm
    drop month year
    
    //    MERGE WITH MOVING AVERAGES
    merge m:1 d_id monthly_date using `moving_averages', keep(master match)
    The example data you gave doesn't really illustrate the solution well because most of your average data is for time periods that do not relate to your sales data. Nevertheless, with real data, this code should do what you ask.

    In the future, please use -dataex- to show example data, as I have done here. Listings of the type you show are inconvenient for importing into Stata, and they cannot include full details about data storage types, formatting, etc., that are sometimes crucial to the solution (though not in this case). By using -dataex- you make it possible for those who want to help you to create a complete and faithful replica of your Stata example with a simple copy/paste operation. Use -dataex- every time. If you are running Stata version 15.1, you already have it as part of official Stata. If you are running 15.0, then you should in any case update to 15.1 by running -update all-. If you are running an earlier version of Stata, you can get -dataex- (by Robert Picard) from SSC, with -ssc install dataex-. Instructions for using -dataex- are obtained by running -help dataex-. Thank you.

    Comment


    • #3
      I am sorry for the late response. Thank you very much! - Tony Yoon.
      Last edited by Tony Yoon; 02 Jan 2018, 05:52.

      Comment

      Working...
      X