Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Fixed-Effect regression help

    Hi guys, I'm a 2nd year undergrad student and need some help with the commands to run my regression.
    I am studying the effects solar farms have on nearby property prices. ( i have a list of 300 separate farms)
    For each solar farm i have created a (farm_id) which will be my fixed effect using the command 'egen', i have around 3000 individual house sales, and for each transaction i have created a dummy to show whether the transaction was before its corresponding solar farm was made operational or after (sold_after_farm_operational_dummy). I also have the transaction prices for each transaction (deflated_house_price).

    So for a few of the solar farms i will have transactions shown by the dummy to be before(0) and after(1) the farm was made operational, and i want to see the effects of these solar farms being made operational have had on the house price.

    The regression i want to run on STATA is deflated_house_price = farm_id (fixed effect) + b*sold_after_farm_operational_dummy

    i have attached a screenshot of some of the data (ignore the HOUSE_ID and HPI column)

    Some guidance on how to run this regression would be really appreciated!

    thank you

    Screen Shot 2017-08-18 at 17.05.38.png


    Last edited by sladmin; 08 Sep 2017, 08:29. Reason: anonymize original poster

  • #2
    So assuming you have no need to involve any other variables:

    Code:
    xtset farm_id
    xtreg deflated_house_price i.sold_after_farm_operational_dummy, fe vce(robust)
    Note: This would be the bare bones approach. Usually there are other variables to adjust for. One might also want to adjust for time trends in real estate prices by including either a time-trend term or indicator variables for the date sold.

    In any case, before you actually run anything, I suggest you go to the [XT] volume of the PDF manuals that come with your Stata installation and read the -xt- chapter. That will give you an overview of the commands that are available in Stata for longitudinal or serial cross-sectional data and the overall approach that Stata uses. Then read carefully the -xtset- command, as that is the first step in any such analysis and must be done correctly if all else is to work properly. Then you can read about the other specific commands that may apply to your particular situation.

    In the future, when showing example data, please do not use screenshots. Yours was easy to read, many are not, and some are completely illegible. Moreover, in the event of a more complicated question that might warrant trying to test some code on the data, there is no way to import data into Stata from a screenshot. So to be helpful to those who want to help you, you should use the -dataex- command to post example data. You can get the -dataex- command by running -ssc install dataex-. Then read -help dataex- for the simple instructions. Please use -dataex- whenever you post example data going forward.

    Comment


    • #3
      Hi sorry i will keep that in mind for future reference. Yes it is really simple, its the first regression I'm running. i typed in the commands however on the second 'xtreg' command i get an error 'no observations', any idea why?

      xtset farm_id
      panel variable: farm_id (unbalanced)

      xtreg deflated_house_price i. sold_after_farm_operational_dumm, fe vce(robust)
      no observations
      r(2000);

      I used the dataex command to add the data here

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str37 paon str24 saon str31 street str21 town str30 county str7 postcode str10(soldhouseprice date operational) float capacitymw int farm_id str5 house_id float countyspecifichpi byte sold_after_farm_operational_dumm str10 deflated_house_price
      "50"                     ""            "CHITTERNE ROAD"      "WARMINSTER"     "WILTSHIRE"                    "BA120LZ" "366,000"   "07/09/2011" "19/03/2014"   7.2  1 "1"    92.9 0 "393,972"  
      "39"                     ""            "HILL STREET"         "TROWBRIDGE"     "WILTSHIRE"                    "BA147RR" "48,500"    "29/08/1997" "19/03/2015"     4  2 "2"   32.55 0 "149,002"  
      "4"                      ""            "TROWLE"              "TROWBRIDGE"     "WILTSHIRE"                    "BA149BJ" "134,668"   "21/03/1996" "15/03/2015"   1.7  3 "3"   27.46 0 "490,415"  
      "8"                      ""            "TROWLE"              "TROWBRIDGE"     "WILTSHIRE"                    "BA149BJ" "180,000"   "29/11/1999" "15/03/2015"   1.7  3 "4"   43.12 0 "417,440"  
      "8"                      ""            "TROWLE"              "TROWBRIDGE"     "WILTSHIRE"                    "BA149BJ" "362,500"   "04/03/2009" "15/03/2015"   1.7  3 "4"   83.57 0 "433,768"  
      "10"                     ""            "TROWLE"              "TROWBRIDGE"     "WILTSHIRE"                    "BA149BJ" "275,000"   "20/10/2003" "15/03/2015"   1.7  3 "5"   76.72 0 "358,446"  
      "12"                     ""            "TROWLE"              "TROWBRIDGE"     "WILTSHIRE"                    "BA149BJ" "385,000"   "09/09/2003" "15/03/2015"   1.7  3 "6"   75.85 0 "507,581"  
      "12"                     ""            "TROWLE"              "TROWBRIDGE"     "WILTSHIRE"                    "BA149BJ" "415,000"   "12/05/2010" "15/03/2015"   1.7  3 "6"   91.72 0 "452,464"  
      "14"                     ""            "TROWLE"              "TROWBRIDGE"     "WILTSHIRE"                    "BA149BJ" "175,000"   "28/06/2001" "15/03/2015"   1.7  3 "7"   53.64 0 "326,249"  
      "14"                     ""            "TROWLE"              "TROWBRIDGE"     "WILTSHIRE"                    "BA149BJ" "289,000"   "27/08/2004" "15/03/2015"   1.7  3 "7"   84.91 0 "340,360"  
      "14"                     ""            "TROWLE"              "TROWBRIDGE"     "WILTSHIRE"                    "BA149BJ" "340,000"   "30/09/2005" "15/03/2015"   1.7  3 "7"   86.64 0 "392,428"  
      "HILLVIEW HOUSE"         ""            "WHITLEY LANE"        "STREET"         "SOMERSET"                     "BA169RW" "290,000"   "05/03/2007" "19/12/2014"     8  4 "8"   99.63 0 "291,077"  
      "4"                      ""            "HUNSTRETE ROAD"      "BATH"           "BATH AND NORTH EAST SOMERSET" "BA20BS"  "53,550"    "08/07/2002" "04/03/2015" 21.25  5 "9"   53.86 0 "99,424"  
      "6"                      ""            "HUNSTRETE ROAD"      "BATH"           "BATH AND NORTH EAST SOMERSET" "BA20BS"  "71,000"    "03/04/2000" "04/03/2015" 21.25  5 "10"  37.76 0 "188,030"  
      "6"                      ""            "HUNSTRETE ROAD"      "BATH"           "BATH AND NORTH EAST SOMERSET" "BA20BS"  "270,000"   "24/03/2017" "04/03/2015" 21.25  5 "10" 117.56 1 "229,670"  
      "KEEPERS COTTAGE"        ""            ""                    "YEOVIL"         "SOMERSET"                     "BA228BQ" "335,000"   "07/08/2002" "14/03/2013"  7.01  6 "11"   61.7 0 "542,950"  
      "KEEPERS COTTAGE"        ""            ""                    "YEOVIL"         "SOMERSET"                     "BA228BQ" "812,500"   "07/10/2015" "14/03/2013"  7.01  6 "11"  105.1 1 "773,073"  
      "BEARLEY COTTAGES"       "3"           "BEARLEY LANE"        "YEOVIL"         "SOMERSET"                     "BA228PE" "77,500"    "26/09/1996" "21/03/2013"   6.3  7 "12"  28.73 0 "269,753"  
      "FAIRFIELD HOUSE"        ""            ""                    "RADSTOCK"       "SOMERSET"                     "BA34DW"  "195,000"   "03/10/1997" "01/07/2012"   3.6  8 "13"  31.73 0 "614,560"  
      "CHELWOOD"               ""            "CHELYNCH ROAD"       "SHEPTON MALLET" "SOMERSET"                     "BA44PZ"  "380,000"   "18/09/2007" "29/03/2013"   6.4  9 "14" 104.91 0 "362,215"  
      "ELIANA"                 ""            "CHELYNCH ROAD"       "SHEPTON MALLET" "SOMERSET"                     "BA44PZ"  "145,000"   "22/08/2001" "29/03/2013"   6.4  9 "15"  49.27 0 "294,297"  
      "LYNCH COURT"            ""            "CHELYNCH ROAD"       "SHEPTON MALLET" "SOMERSET"                     "BA44PZ"  "225,000"   "02/11/2005" "29/03/2013"   6.4  9 "16"  89.23 0 "252,157"  
      "LYNCH COURT"            ""            "CHELYNCH ROAD"       "SHEPTON MALLET" "SOMERSET"                     "BA44PZ"  "279,950"   "26/07/2012" "29/03/2013"   6.4  9 "16"  92.25 0 "303,469"  
      "MIDWOOD"                ""            "CHELYNCH ROAD"       "SHEPTON MALLET" "SOMERSET"                     "BA44PZ"  "60,000"    "09/08/1995" "29/03/2013"   6.4  9 "17"  28.01 0 "214,209"  
      "SHADAROBA"              ""            "CHELYNCH ROAD"       "SHEPTON MALLET" "SOMERSET"                     "BA44PZ"  "345,000"   "02/05/2017" "29/03/2013"   6.4  9 "18" 115.42 1 "298,908"  
      "WHIDDON"                ""            "CHILCOTE DROVE"      "WELLS"          "SOMERSET"                     "BA53DU"  "712,488"   "26/02/2003" "26/02/2016"   8.6 10 "19"  67.97 0 "1,048,239"
      "77"                     ""            "BUTLEIGH HILL"       "GLASTONBURY"    "SOMERSET"                     "BA68TW"  "272,500"   "10/06/2011" "30/06/2015"   4.9 11 "20"  92.99 0 "293,042"  
      "LARKS RISE"             ""            "DOLMANS HILL"        "POOLE"          "DORSET"                       "BH166HP" "299,500"   "23/07/2001" "01/12/2011"     5 12 "21"  48.17 0 "621,756"  
      "LARKS RISE"             ""            "DOLMANS HILL"        "POOLE"          "DORSET"                       "BH166HP" "590,000"   "29/03/2016" "01/12/2011"     5 12 "21"  107.5 1 "548,837"  
      "OLD PARK COTTAGE"       ""            "DOLMANS HILL"        "POOLE"          "DORSET"                       "BH166HP" "750,000"   "19/07/2007" "01/12/2011"     5 12 "22"  96.98 0 "773,355"  
      "OLD PARK COTTAGE"       ""            "DOLMANS HILL"        "POOLE"          "DORSET"                       "BH166HP" "630,000"   "01/10/2012" "01/12/2011"     5 12 "22"  91.77 1 "686,499"  
      "OLD PARK COTTAGE"       ""            "DOLMANS HILL"        "POOLE"          "DORSET"                       "BH166HP" "677,000"   "26/08/2016" "01/12/2011"     5 12 "22" 108.94 1 "621,443"  
      "2"                      ""            "SLEPE FARM COTTAGES" "POOLE"          "DORSET"                       "BH166HS" "129,950"   "22/12/1999" "19/03/2014"   7.2 13 "23"  38.94 0 "333,719"  
      "SLEPE GREEN"            ""            "DORCHESTER ROAD"     "POOLE"          "DORSET"                       "BH166HS" "600,000"   "19/08/2003" "19/03/2014"   7.2 13 "24"  70.63 0 "849,497"  
      "SLEPE HOUSE"            ""            "DORCHESTER ROAD"     "POOLE"          "DORSET"                       "BH166HS" "900,000"   "28/07/2005" "19/03/2014"   7.2 13 "25"  82.45 0 "1,091,571"
      "NEWTON FARMHOUSE"       ""            "DORCHESTER ROAD"     "POOLE"          "DORSET"                       "BH166HS" "780,000"   "20/10/2015" "19/03/2014"   7.2 13 "26" 105.01 1 "742,786"  
      "BIRCHWOOD"              ""            ""                    "WAREHAM"        "DORSET"                       "BH206AN" "183,000"   "01/11/2000" "20/12/2015"     8 14 "27"  45.59 0 "401,404"  
      "BIRCHWOOD"              ""            ""                    "WAREHAM"        "DORSET"                       "BH206AN" "295,000"   "02/08/2002" "20/12/2015"     8 14 "27"  59.77 0 "493,559"  
      "BIRCHWOOD"              ""            ""                    "WAREHAM"        "DORSET"                       "BH206AN" "489,000"   "21/09/2015" "20/12/2015"     8 14 "27" 103.73 0 "471,416"  
      "FROME VALE"             ""            ""                    "WAREHAM"        "DORSET"                       "BH206AN" "410,000"   "31/10/2014" "20/12/2015"     8 14 "28" 101.27 0 "404,858"  
      "PRIMROSE FARMHOUSE"     ""            ""                    "WAREHAM"        "DORSET"                       "BH206AN" "249,950"   "26/07/1999" "20/12/2015"     8 14 "29"  35.84 0 "697,405"  
      "REED COTTAGE"           ""            ""                    "WAREHAM"        "DORSET"                       "BH206AN" "69,500"    "19/01/1999" "20/12/2015"     8 14 "30"  34.09 0 "203,872"  
      "REED COTTAGE"           ""            ""                    "WAREHAM"        "DORSET"                       "BH206AN" "221,950"   "12/05/2004" "20/12/2015"     8 14 "30"  77.56 0 "286,166"  
      "REED COTTAGE"           ""            ""                    "WAREHAM"        "DORSET"                       "BH206AN" "239,073"   "31/03/2011" "20/12/2015"     8 14 "30"  87.66 0 "272,728"  
      "REED COTTAGE"           ""            ""                    "WAREHAM"        "DORSET"                       "BH206AN" "250,000"   "20/01/2012" "20/12/2015"     8 14 "30"  89.82 0 "278,334"  
      "SHAANTIH"               ""            ""                    "WAREHAM"        "DORSET"                       "BH206AN" "195,500"   "04/04/2000" "20/12/2015"     8 14 "31"  40.65 0 "480,935"  
      "STOKEFORD COTTAGE"      ""            ""                    "WAREHAM"        "DORSET"                       "BH206AN" "380,000"   "08/06/2007" "20/12/2015"     8 14 "32"  95.05 0 "399,790"  
      "STOKEFORD COTTAGE"      ""            ""                    "WAREHAM"        "DORSET"                       "BH206AN" "249,950"   "20/02/2014" "20/12/2015"     8 14 "32"  94.07 0 "265,706"  
      "SUNNYDALE"              ""            ""                    "WAREHAM"        "DORSET"                       "BH206AN" "110,000"   "26/02/1997" "20/12/2015"     8 14 "33"  26.84 0 "409,836"  
      "SUNNYDALE"              ""            ""                    "WAREHAM"        "DORSET"                       "BH206AN" "345,000"   "14/09/2006" "20/12/2015"     8 14 "33"  88.73 0 "388,820"  
      "SUNNYDALE"              ""            ""                    "WAREHAM"        "DORSET"                       "BH206AN" "530,000"   "19/05/2016" "20/12/2015"     8 14 "33" 108.68 1 "487,670"  
      "THE OLD CHURCH"         "THE NAVE"    ""                    "WAREHAM"        "DORSET"                       "BH206AN" "115,300"   "27/09/1996" "20/12/2015"     8 14 "34"  26.38 0 "437,074"  
      "THE OLD CHURCH"         "THE TOWER"   ""                    "WAREHAM"        "DORSET"                       "BH206AN" "156,000"   "09/06/1997" "20/12/2015"     8 14 "34"  28.51 0 "547,176"  
      "THE OLD CHURCH"         "THE CHANCEL" ""                    "WAREHAM"        "DORSET"                       "BH206AN" "121,000"   "12/02/1998" "20/12/2015"     8 14 "34"  30.72 0 "393,880"  
      "THE OLD CHURCH"         "THE TOWER"   ""                    "WAREHAM"        "DORSET"                       "BH206AN" "400,000"   "27/04/2007" "20/12/2015"     8 14 "34"  92.32 0 "433,276"  
      "KENSTON"                ""            "RINGWOOD ROAD"       "WIMBORNE"       "DORSET"                       "BH216RE" "220,000"   "14/02/1997" "15/12/2013"  4.78 15 "35"  26.84 0 "819,672"  
      "ASHDEAN"                ""            "RINGWOOD ROAD"       "WIMBORNE"       "DORSET"                       "BH216RE" "262,500"   "01/10/2015" "15/12/2013"  4.78 15 "36" 105.01 1 "249,976"  
      "WICK STREET COTTAGES"   "1"           ""                    "POLEGATE"       "EAST SUSSEX"                  "BN266TN" "195,000"   "22/12/2000" "19/12/2015"  7.06 16 "37"  44.53 0 "437,907"  
      "TIMBERS"                ""            "FORD LANE"           "STEYNING"       "WEST SUSSEX"                  "BN443AT" "490,000"   "01/11/2002" "16/03/2016"   2.2 17 "38"  61.09 0 "802,095"  
      "CLAYLANDS FARM HOUSE"   ""            "FORD LANE"           "STEYNING"       "WEST SUSSEX"                  "BN443AT" "700,000"   "21/02/2007" "16/03/2016"   2.2 17 "39"  83.19 0 "841,447"  
      "THE LODGE"              ""            "HONEYBRIDGE LANE"    "STEYNING"       "WEST SUSSEX"                  "BN443AW" "72,500"    "02/12/1999" "28/03/2014"  9.18 18 "40"  37.89 0 "191,343"  
      "FURZEFIELD"             ""            "HONEYBRIDGE LANE"    "STEYNING"       "WEST SUSSEX"                  "BN443AW" "250,000"   "28/01/2011" "28/03/2014"  9.18 18 "41"     82 0 "304,878"  
      "COPPER BEECH"           ""            "UCKFIELD ROAD"       "LEWES"          "EAST SUSSEX"                  "BN85RR"  "173,000"   "27/01/1998" "16/12/2015"   4.9 19 "42"  29.78 0 "580,927"  
      "9"                      ""            "POPLAR WAY EAST"     "BRISTOL"        "CITY OF BRISTOL"              "BS110YH" "700,000"   "27/05/2015" "18/09/2016"     5 20 "43" 102.92 0 "680,140"  
      "COLESHOP FARM"          ""            "NORTHLOAD LANE"      "WEDMORE"        "SOMERSET"                     "BS284SY" "375,000"   "11/10/2002" "25/03/2017"     5 21 "44"  64.58 0 "580,675"  
      "COLESHOP FARM"          ""            "NORTHLOAD LANE"      "WEDMORE"        "SOMERSET"                     "BS284SY" "690,000"   "08/06/2007" "25/03/2017"     5 21 "44" 102.16 0 "675,411"  
      "3"                      ""            "TOWERHEAD"           "BANWELL"        "NORTH SOMERSET"               "BS296PG" "101,000"   "02/12/1997" "27/03/2015"  7.15 22 "45"  30.34 0 "332,894"  
      "3"                      ""            "TOWERHEAD"           "BANWELL"        "NORTH SOMERSET"               "BS296PG" "435,000"   "28/08/2007" "27/03/2015"  7.15 22 "45"  95.55 0 "455,259"  
      "2"                      ""            "OVER LANE"           "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS324BP" "285,500"   "29/06/1999" "22/02/2016"     8 23 "46"  32.86 0 "868,837"  
      "2"                      ""            "OVER LANE"           "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS324BP" "606,000"   "03/05/2013" "22/02/2016"     8 23 "46"  87.32 0 "693,999"  
      "8"                      ""            "OVER LANE"           "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS324BP" "620,000"   "27/11/2006" "22/02/2016"     8 23 "47"  87.91 0 "705,267"  
      "12"                     ""            "OVER LANE"           "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS324BP" "240,000"   "08/04/1998" "22/02/2016"     8 23 "48"  30.75 0 "780,488"  
      "16"                     ""            "OVER LANE"           "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS324BP" "480,000"   "07/08/2006" "22/02/2016"     8 23 "49"  84.96 0 "564,972"  
      "20"                     ""            "OVER LANE"           "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS324BP" "335,000"   "14/08/1998" "22/02/2016"     8 23 "50"  31.24 0 "1,072,343"
      "20"                     ""            "OVER LANE"           "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS324BP" "650,000"   "22/07/2004" "22/02/2016"     8 23 "50"  76.99 0 "844,265"  
      "22"                     ""            "OVER LANE"           "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS324BP" "1,250,000" "27/08/2015" "22/02/2016"     8 23 "51" 106.69 0 "1,171,619"
      "26"                     ""            "OVER LANE"           "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS324BP" "611,550"   "29/02/2012" "22/02/2016"     8 23 "52"  86.39 0 "707,894"  
      "28"                     ""            "OVER LANE"           "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS324BP" "168,000"   "27/07/2000" "22/02/2016"     8 23 "53"  42.99 0 "390,789"  
      "30"                     ""            "OVER LANE"           "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS324BP" "165,000"   "20/07/1999" "22/02/2016"     8 23 "54"  33.41 0 "493,864"  
      "32"                     ""            "OVER LANE"           "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS324BP" "284,000"   "04/03/2005" "22/02/2016"     8 23 "55"  76.31 0 "372,166"  
      "38"                     ""            "OVER LANE"           "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS324BP" "495,000"   "30/09/2011" "22/02/2016"     8 23 "56"  85.92 0 "576,117"  
      "40"                     ""            "OVER LANE"           "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS324BP" "485,000"   "10/01/2000" "22/02/2016"     8 23 "57"  38.45 0 "1,261,378"
      "PARK LODGE"             ""            "OLD GLOUCESTER ROAD" "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS361RR" "129,500"   "13/09/1999" "12/02/2015"  7.25 24 "58"  35.14 0 "368,526"  
      "PARK LODGE"             ""            "OLD GLOUCESTER ROAD" "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS361RR" "406,000"   "05/10/2006" "12/02/2015"  7.25 24 "58"  86.54 0 "469,147"  
      "CHESTNUT FARM"          ""            "MAYS HILL"           "BRISTOL"        "AVON"                         "BS362NS" "247,500"   "15/08/1995" "31/03/2016"  49.6 25 "59"   28.7 0 "862,369"  
      "CHESTNUT FARM"          ""            "MAYS HILL"           "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS362NS" "330,000"   "22/10/1999" "31/03/2016"  49.6 25 "59"  35.94 0 "918,197"  
      "CHESTNUT FARM COTTAGES" "2"           "MAYS HILL"           "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS362NS" "300,000"   "06/03/2006" "31/03/2016"  49.6 25 "60"  80.78 0 "371,379"  
      "CHESTNUT FARM COTTAGES" "2"           "MAYS HILL"           "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS362NS" "240,000"   "12/10/2007" "31/03/2016"  49.6 25 "60"  96.69 0 "248,216"  
      "STABLE HOUSE"           ""            "MAYS HILL"           "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS362NS" "445,000"   "29/09/2006" "31/03/2016"  49.6 25 "61"  85.52 0 "520,346"  
      "STILLINGS BARN"         ""            "MAYS HILL"           "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS362NS" "290,000"   "29/06/2001" "31/03/2016"  49.6 25 "62"  45.52 0 "637,083"  
      "SYCAMORE COTTAGE"       ""            "MAYS HILL"           "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS362NS" "356,000"   "01/08/2011" "31/03/2016"  49.6 25 "63"  86.92 0 "409,572"  
      "THE BUNGALOW"           ""            "MAYS HILL"           "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS362NS" "149,950"   "30/07/1999" "31/03/2016"  49.6 25 "64"  33.41 0 "448,818"  
      "THE BUNGALOW"           ""            "MAYS HILL"           "BRISTOL"        "SOUTH GLOUCESTERSHIRE"        "BS362NS" "427,000"   "29/06/2007" "31/03/2016"  49.6 25 "64"  94.35 0 "452,570"  
      "CHOLWELL HOUSE"         ""            "THE BATCH"           "BRISTOL"        "NORTH SOMERSET"               "BS407UY" "175,000"   "03/08/1998" "15/03/2014"    11 26 "65"  33.09 0 "528,861"  
      "CHOLWELL HOUSE"         ""            "THE BATCH"           "BRISTOL"        "NORTH SOMERSET"               "BS407UY" "168,500"   "29/02/2000" "15/03/2014"    11 26 "65"   39.2 0 "429,847"  
      "HOWGROVE COTTAGE"       ""            "THE BATCH"           "BRISTOL"        "BATH AND NORTH EAST SOMERSET" "BS407UY" "520,000"   "09/12/2009" "15/03/2014"    11 26 "66"  79.12 0 "657,230"  
      "LANE END"               ""            "THE BATCH"           "BRISTOL"        "BATH AND NORTH EAST SOMERSET" "BS407UY" "390,000"   "02/09/2003" "15/03/2014"    11 26 "67"  65.67 0 "593,878"  
      "LINDEN GROVE"           ""            "THE BATCH"           "BRISTOL"        "NORTH SOMERSET"               "BS407UY" "190,000"   "01/12/1999" "15/03/2014"    11 26 "68"  38.54 0 "492,994"  
      "SPRING COTTAGE"         ""            "THE BATCH"           "BRISTOL"        "NORTH SOMERSET"               "BS407UY" "235,000"   "22/08/1997" "15/03/2014"    11 26 "69"  29.74 0 "790,182"  
      "SPRING COTTAGE"         ""            "THE BATCH"           "BRISTOL"        "BATH AND NORTH EAST SOMERSET" "BS407UY" "545,000"   "18/08/2008" "15/03/2014"    11 26 "69"  84.28 0 "646,654"  
      end
      Last edited by sladmin; 08 Sep 2017, 08:30. Reason: anonymize original poster

      Comment


      • #4
        The problem is that in your data, the variable deflated_house_price is a string variable. You can only have numeric variables in a regression. -destring- will not like fixing this, because it also has commas. So you have to tell -destring- to ignore those:

        Code:
        destring deflated_house_price, replace ignore(",")
        xtreg deflated_house_price i.sold_after_farm_operational_dumm, fe vce(robust)
        Note: the space between i. and sold_after... doesn't seem to be getting in the way of anything at the moment, but it doesn't actually belong there and in other commands it might well trigger a syntax error. So I would get in the habit of putting these i. and c. prefixes immediately before the variables they apply to.

        Comment


        • #5
          Hi Clyde, thank you it worked perfectly. With that regression it compares all houses. However i have also created the column HOUSE_ID, where then i can compare the exact same houses being sold before and after the date the solar farm was made operational. Im not entirely sure what regression i need to run to do a repeat sales model, could you perhaps point me in the right direction.

          The results for the first regression are below, could you help me interpret this, if solar farms were a bad addition to a neighbourhood, the coefficient would be negative and positive otherwise, but I'm not sure what the size of the coefficients signify?
          Sorry for all the questions, I'm pretty new to econometrics and this is my first project. Really appreciate the guidance.

          Code:
          . xtset farm_id
                 panel variable:  farm_id (unbalanced)
          
          . destring deflated_house_price, replace ignore(",")
          deflated_house_price already numeric; no replace
          
          . xtreg deflated_house_price i.sold_after_farm_operational_dumm, fe vce(robust)
          
          Fixed-effects (within) regression               Number of obs      =      2825
          Group variable: farm_id                         Number of groups   =       337
          
          R-sq:  within  = 0.0003                         Obs per group: min =         1
                 between = 0.0031                                        avg =       8.4
                 overall = 0.0001                                        max =        81
          
                                                          F(1,336)           =      0.76
          corr(u_i, Xb)  = -0.0136                        Prob > F           =    0.3844
          
                                                            (Std. Err. adjusted for 337 clusters in farm_id)
          --------------------------------------------------------------------------------------------------
                                           |               Robust
                      deflated_house_price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          ---------------------------------+----------------------------------------------------------------
          1.sold_after_farm_operational_~m |   16649.06   19115.16     0.87   0.384     -20951.4    54249.52
                                     _cons |   332984.8   2523.878   131.93   0.000     328020.2    337949.4
          ---------------------------------+----------------------------------------------------------------
                                   sigma_u |  281747.41
                                   sigma_e |  336316.31
                                       rho |  .41239282   (fraction of variance due to u_i)
          --------------------------------------------------------------------------------------------------

          Comment


          • #6
            So, first the interpretation of the output. The coefficient of any variable represents the difference in the expected value of the outcome associated with a unit change in the variable. The _cons term represents the expected value of the outcome when all other model variables are zero.

            So your _cons coefficient tells us as that the expected value of deflated house-price, in the average solar farm's region, before the farm became operational was $332,984.8. The row above that tells us that houses sold after the farm became operational sold for, on average, $16,649.06 more. (95% CI 20,951.40 less to $54,249.52 more).

            The bottom panel of the table gives the standard deviations of the farm effects and the residuals. Both are very large, of the same general magnitude as the _cons term, so you have a very heterogeneous set of house prices here. The rho of 0.41 says that of the variation not explained by the start of operations of the solar farm, 41% is specific to the solar farm, and the rest is idosyncratic to the particular sale transaction.

            As for doing a paired comparison of the same house sold before and after, you can do that with:

            Code:
            xtreg deflated_house_price i.sold_after_farm_operational_dummy i.house_id, fe vce(robust)
            If you have a very large number of houses, as I suspect is the case, you might not want all the output for the house_id indicators. You could do this with Sergio Correa's -reghdfe-, (available from SSC):

            Code:
            reghdfe deflated_house_price i.sold_after_farm_operational_dummy, absorb(farm_id house_id) vce(cluster farm_id house_id)
            help re

            Comment


            • #7
              Ok yes, i understand that, thank for taking the time to help me out Clyde!

              Im going to take natural log of dependent variable (deflated house price) so the coefficient on the "sold after dummy" is the percentage change in house prices. - This would just be creating a new column with the logged deflated house prices and running the same regression?

              I would also like to compare the fixed effects and random effects using a Hausman test. Am i right in saying i can just use the same command but replace the 'fe' option with 're' to get the random effect? And then how can i compare these two using the Huasman test.

              The final part I'm not sure on how to run is to add an additional variable which is the sold after dummy multiplied by the capacity variable, so in the regression there is a constant term, "sold-after" and "sold-after x capacity" variables. And would i need to then to a fixed effect and random effect on this model too? (And how would this model be able to determine the effect of the solar farms capacity?)

              Comment


              • #8
                Im going to take natural log of dependent variable (deflated house price) so the coefficient on the "sold after dummy" is the percentage change in house prices. - This would just be creating a new column with the logged deflated house prices and running the same regression?
                Yes, you would create a new variable (not column!) with the logged deflated house prices and run the regression with that replacing the dependent variable. By the way, it is not, strictly speaking, true that the coefficient then becomes the percentage change in house prices. That is approximately true provided the percentage change is fairly small, say not exceeding 10%. What is exactly true is that the coefficient is the expected difference in the logarithms of the price after and before solar farm activation, and the difference of logarithms is the logarithm of the ratio. So coefficient = log(price_after/price_before). Now, suppose that the price after is X% higher than the price before. So price_after = (1+X/100)*price_before. So the ratio of prices is 1+X/100, so the coefficient is log(1+X/100). Now it is a property of the log function that log(1+X/100) is approximately equal to X/100, but this is only accurate when X/100 is small. Once you get X/100 above about 0.10, X/100 and log(1+X/100) begin to diverge appreciably, and the discrepancy continues to increase as X/100 gets larger. Since, in the real world, most percentage effects are small, we usually get away with speaking sloppily and saying that the coefficient is the percentage change. And so, go ahead and say that, but be aware that it isn't quite right.

                I would also like to compare the fixed effects and random effects using a Hausman test. Am i right in saying i can just use the same command but replace the 'fe' option with 're' to get the random effect? And then how can i compare these two using the Huasman test.
                You run each regression and follow each one by storing the estimates (-help estimates store-). Then you run the -hausman- command (-help hausman-).

                The final part I'm not sure on how to run is to add an additional variable which is the sold after dummy multiplied by the capacity variable, so in the regression there is a constant term, "sold-after" and "sold-after x capacity" variables. And would i need to then to a fixed effect and random effect on this model too? (And how would this model be able to determine the effect of the solar farms capacity?)
                Really bad idea. Don't do this. It's almost certainly a badly mis-specified model. It sounds like you are trying to do a model where the sold-after effect interacts with the capacity effect. But what you have described is incorrect. You need to include not just sold-after and sold-after x capacity, but you also need to include capacity itself. So I'll assume that this is what you really want to do. You don't need to actually calculate the product variable, and, in fact, you should not do that. You should use Stata's factor variable notation. (-help fvvarlist-).

                Code:
                xtreg outcome_variable i.sold_after_farm_operational_dummy##c.capacity  perhaps_other_variables, perhaps_some_options_like_fe 
                In the above code I assume that capacity is a continuous variable (because the name sounds like that.) If it's actually discrete, then use i.capacity instead of c.capacity.

                The advantage of using factor variable notation is that then you will be able to use the -margins- command afterward to get marginal effects and predicted margins, which are really the best way to understand the implications of your model.

                Let's say that an interesting set of values for the variable capacity is 10 20 30 40 and 50, just to illustrate the use of -margins-. Then you could do things like:

                Code:
                margins i.sold_after_farm_operational_dummy, at(capacity = (10(10)50))
                margins, dydx(sold_after_farm_operational_dummy) at(capacity = (10(10)50))
                marginsplot
                margins i.sold_after_farm_operational_dummy, dydx(capacity) at(capacity = (10(10)50))
                marginsplot
                The first command gives you the predicted values of your outcome variable (deflated price, or perhaps its logarithm), for both before and after sales at each of the interesting values of capacity. The second -margins- command gives you the marginal effect of being sold after at each capacity level. The -marginsplot- command makes a nice graph of the marginal effects as functions of capacity. The third -margins- command gives you the marginal effect of capacity for both sold_before and sold_after transactions at the interesting values of capacity and, again, -marginsplot- makes a nice graph.

                I think the best introduction to the wonderful -margins- command is Richard Williams' outstanding Stata Journal article at http://www.stata-journal.com/sjpdf.h...iclenum=st0260. It covers the basics and clearly explains what it does and how it works. Once you have learned that, the more extensive presentation of all its features in the PDF manuals that come with your installation is easier to read and very helpful. Both the SJ article and the PDF manuals have many good worked examples.

                Comment


                • #9
                  Hi Clyde, thanks for the help once again, the clarification on the logged model is very insightful! too i did not know that! - Further to this, why is it beneficial to create a logged model apart from the coefficient having the effect of being the percentage change in house price (small percentages of course!)

                  And In terms of the repeat sales model, i think it would be helpful to include another variable in the regression which accounts for house price depreciation over time. How can i incorporate a variable which accounts for the time in between the same property being sold?

                  thanks you!

                  Comment


                  • #10
                    There are several reasons people use log-transformations, some of them good, some of them not so good.

                    1. Theory or graphical exploration suggests that the relationship between y and x is actually a log-linear one. This is, in my view, the best reason for doing a log transformation, and really the only one that has a solid rationale.

                    2. The dependent variable varies over many orders of magnitude, making calculations numerically unstable or difficult, so to improve the computations, the log transformed variable, which will typically have a much smaller range, is used.

                    3. To get a dependent variable whose distribution is normal, or closer to normal, than that of y itself. In my view this is a terrible reason for log-transforming and it shouldn't be done.

                    4. To get a model where the coefficient represents a semi-elasticity, as in your situation. This is reasonable to do provided one understands that the "coefficient = percent change" is an approximation, and also provided that the range of values of the y variable is fairly narrow. That last condition arises because if the true x-y relationship is linear, then log y = b0 + b1x is a mis-specified model. Now, it happens to be the case that over relatively narrow ranges of values of y, log y has an approximately linear relationship to y, and so this mis-specification is small enough that for practical purposes it can be ignored. But if y spans many orders of magnitude, the relationship between log y and y is very non-linear and the mis-specification can be quite severe and the results quite misleading.

                    I don't know how to help with your second question. You have already inflation-adjusted your price variable, so that component of price rises is taken care of. But I don't know how you would deal with the secular trend towards rising house prices. First of all, it's irregular. While over the long run, real house prices have risen, they do oscillate up and down. I know: I lost a bundle when I sold my second owned residence, but I had no choice about moving then. Worse still, there is a lot of regional variation in both the level and rate of change in house prices. So I don't think simply incorporating a variable that accounts for elapsed time will do this adequately. I'm not a real estate economist. In fact, I'm not an economist at all: I'm an epidemiologist. This is outside my domain of expertise, even outside my domain of informed-layman knowledge. I think you need to consult a real estate economist about this.

                    Comment


                    • #11
                      okk yes i was taught that the log transformation model is mostly the more interesting model to work with. Thanks for those pointers.

                      For the second question i want to run a regression using OLS which is LogPrice(After)-LogPrice(Before) = a + b*days_between_sales.
                      I already have the HOUSE_ID variable to distinguish individual properties, but struggling to find a way (a command) to produce the days_between_sales variable and the LogPrice(after)-LogPrice(before) dependent variable.

                      Comment


                      • #12
                        I assume you have sale date variable, which is a Stata internal format numeric date variable. We'll call it sales_date for purposes of illustration

                        Code:
                        //  KEEP ONLY HOUSES THAT HAVE BOTH A BEFORE & AFTER SALE, AND NO OTHER SALES
                        by house_id (sold_after_farm_operational), sort: keep if sold_after_farm_operational[_N] == 1 & sold_after_farm_operational[1] == 0 & _N == 2
                        
                        // CALCULATE BEFORE - AFTER DIFFERENCES
                        by house_id (sold_after_farm_operational): gen delta_log_price = log(deflated_house_price[_N]/deflated_house_price[1])
                        by house_id (sold_after_farm_operational): gen days_between_sales = sales_date[_N] - sales_date[1]
                        Note: For delta_log_price I rely on the fact that the difference of logarithms is the logarithm of the ratio.

                        Comment


                        • #13

                          Code:
                                         ---- Coefficients ----
                                       |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
                                       |       fe           re         Difference          S.E.
                          -------------+----------------------------------------------------------------
                          1.sold_aft~m |    16649.06      18687.5       -2038.443        5044.227
                          ------------------------------------------------------------------------------
                                                     b = consistent under Ho and Ha; obtained from xtreg
                                      B = inconsistent under Ha, efficient under Ho; obtained from xtreg
                          
                              Test:  Ho:  difference in coefficients not systematic
                          
                                            chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                                                    =        0.16
                                          Prob>chi2 =      0.6861

                          Im not entirely sure how to interpret the data, prob chi2 being large mean to accept null and the random effect is more efficient?


                          Given the approx assumption for the logged model providing the house price percentage change, and previously finding the random effects model is preferred would it make sense just having to run the random effects for the logged model?
                          Results for both FE and RE below

                          Code:
                          . xtreg deflatedHP_log i.sold_after_farm_operational_dumm, fe
                          
                          Fixed-effects (within) regression               Number of obs      =      2825
                          Group variable: farm_id                         Number of groups   =       337
                          
                          R-sq:  within  = 0.0018                         Obs per group: min =         1
                                 between = 0.0002                                        avg =       8.4
                                 overall = 0.0022                                        max =        81
                          
                                                                          F(1,2487)          =      4.45
                          corr(u_i, Xb)  = 0.0258                         Prob > F           =    0.0350
                          
                          --------------------------------------------------------------------------------------------------
                                            deflatedHP_log |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                          ---------------------------------+----------------------------------------------------------------
                          1.sold_after_farm_operational_~m |  -.0525253   .0248944    -2.11   0.035    -.1013412   -.0037093
                                                     _cons |   12.47881    .008407  1484.33   0.000     12.46232    12.49529
                          ---------------------------------+----------------------------------------------------------------
                                                   sigma_u |   .6216963
                                                   sigma_e |  .41127211
                                                       rho |  .69559176   (fraction of variance due to u_i)
                          --------------------------------------------------------------------------------------------------
                          F test that all u_i=0:     F(336, 2487) =    17.86           Prob > F = 0.0000
                          Code:
                          . xtreg deflatedHP_log i.sold_after_farm_operational_dumm, re
                          
                          Random-effects GLS regression                   Number of obs      =      2825
                          Group variable: farm_id                         Number of groups   =       337
                          
                          R-sq:  within  = 0.0018                         Obs per group: min =         1
                                 between = 0.0002                                        avg =       8.4
                                 overall = 0.0022                                        max =        81
                          
                                                                          Wald chi2(1)       =      4.59
                          corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0322
                          
                          --------------------------------------------------------------------------------------------------
                                            deflatedHP_log |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                          ---------------------------------+----------------------------------------------------------------
                          1.sold_after_farm_operational_~m |  -.0526362   .0245691    -2.14   0.032    -.1007909   -.0044816
                                                     _cons |   12.66766   .0336783   376.14   0.000     12.60165    12.73367
                          ---------------------------------+----------------------------------------------------------------
                                                   sigma_u |  .57039422
                                                   sigma_e |  .41127211
                                                       rho |  .65794399   (fraction of variance due to u_i)
                          --------------------------------------------------------------------------------------------------
                          .
                          Last edited by sladmin; 26 Aug 2017, 07:30.

                          Comment


                          • #14
                            Finally, how can i tell if the date of each house sale transaction is stored in state as a date? At the moment they were stored as dates in excel in the format dd/mm/yyyy, how can i make this readable for STATA?
                            Thank you!
                            Last edited by sladmin; 26 Aug 2017, 07:45.

                            Comment


                            • #15
                              Your Hausman test results mean what you think they mean: it finds that the -re- and -fe- results agree sufficiently that the -re- model will have consistent estimates and also be more efficient. So you can use it.

                              But that doesn't mean that this will still be true if you log transform the outcome variable. You have to run the Hausman test using the actual model that you want to estimate. So re-do Hausman using the log-transformed outcome. You may or may not get the same answer (though looking at the outputs you show, I'm guessing it will say -re- is OK). If you do, then you can use the -re- estimates of the model with the log-transformed outcome.

                              Finally, how can i tell if the date of each house sale transaction is stored in state as a date? At the moment they were stored as dates in excel in the format dd/mm/yyyy, how can i make this readable for STATA?
                              Well, when you import data from Excel, you never know what you will get in Stata. If the Excel spreadsheet itself was created in a disciplined way, and all of the cells in the date-variable columns contain valid dates, then Stata's -import excel- command will bring those in as Stata internal format date variables. But if some of the "dates" in the spreadsheet are invalid (say, April 31, 2017), then you probably are getting a string variable in Stata. Similarly if the column has other non-date information contaminating the column. The way to tell what you have in Stata is to -describe- the variable in Stata. If it is a Stata date variable the storage type will be numeric (usually int, but possibly long, float, or double). That's a start, but not the end of the story, because it might contain numbers like 20170826, which look like dates to us but are not the correct numbers for the corresponding Stata dates. So the next step is to look at the display format. For a properly imported date variable, the display format will be %td... (what follows the %td varies, the key is that the format begin with %td). Then -list- or -browse- the date variable and see that what appears on your screen looks to your eyes like correct dates. If it passes those three tests, it's a Stata internal format date variable. If your variable's display format does not begin with %td, run -format name_of_date_variable %td- and then -list- or -browse- to see if it looks like correct dates. If so, you have a proper Stata internal format date variable.

                              Note: Stata also has internal clock, half-year, quarter, and month variables. In those situations, the display format will be %tc ,%th, %tq, and %tm, respectively. Also, for a clock (%tc) variable, the storage type must be double for the data to be correct.

                              Comment

                              Working...
                              X