I am using Stata 18 and trying to capture location (London ward) and time fixed effects for property prices in London with a hedonic pricing model.
The data are property transaction prices with dummies for property characteristics.
I estimate the model with reghdfe and then create two groups, wards with high and wards with low numbers of residents born in a given foreign country, with the idea of taking the difference in mean residuals between those two groups in each month.
The problem is that when I create the price difference variable (pdiff_) for each country, Stata does not take the difference between the two groups but within each observation. An observation (a transaction) cannot be in a ward that has both a high and a low population of residents from country X at the same time; it is one or the other. Hence one of the two terms is always missing, and the difference is missing for every observation.
Can someone please point out how I can group these categories so that Stata does not take the difference at the individual observation level?
Many thanks
Code:
encode stata_date, generate(stata_date1)
encode ward_code, generate(ward_num)

*Calculating ward fixed effects and time effects
. reghdfe ln_price_paid propType_* oldNew_* duration_*, absorb(ward_fe=ward_num time_fe=stata_date1) cluster(ward_num) residuals(resid)
(MWFE estimator converged in 5 iterations)
note: propType_5 omitted because of collinearity
note: oldNew_2 omitted because of collinearity
note: duration_2 omitted because of collinearity

HDFE Linear regression                            Number of obs   =  1,172,017
Absorbing 2 HDFE groups                           F(   6,    679) =    1301.66
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.4221
                                                  Adj R-squared   =     0.4217
                                                  Within R-sq.    =     0.2309
Number of clusters (ward_num) =        680        Root MSE        =     0.6098

                             (Std. err. adjusted for 680 clusters in ward_num)
------------------------------------------------------------------------------
             |               Robust
ln_price_p~d | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  propType_1 |   .4683893   .0138025    33.94   0.000     .4412886      .49549
  propType_2 |   .1664204   .0305239     5.45   0.000     .1064879    .2263529
  propType_3 |  -.5524648   .0416714   -13.26   0.000     -.634285   -.4706446
  propType_4 |   .1341584   .0059325    22.61   0.000     .1225101    .1458066
  propType_5 |          0  (omitted)
    oldNew_1 |  -.1989015   .0101902   -19.52   0.000    -.2189096   -.1788934
    oldNew_2 |          0  (omitted)
  duration_1 |   .7961511   .0327772    24.29   0.000     .7317943     .860508
  duration_2 |          0  (omitted)
       _cons |   12.75263   .0318903   399.89   0.000     12.69001    12.81525
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
    ward_num |       680         680           0    *|
 stata_date1 |       124           1         123     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

.
. predict residuals, xbd
Code:
*Looping through countries to calculate the price difference based on residuals
foreach country in country1 country2 {
    *Categorising wards into quantiles based on the number of residents from each country.
    *country variables represent the residents demography in each ward.
    by stata_date1: egen quantile_`country' = xtile(`country'), nquantiles(5)

    *Calculating the mean residual for wards in the highest and lowest quantile.
    by stata_date1: egen high_resid_`country' = mean(resid) if quantile_`country' == 5
    by stata_date1: egen low_resid_`country' = mean(resid) if quantile_`country' == 1

    *Calculating the price difference for each time period within the same country and date.
    by stata_date1: gen pdiff_`country' = high_resid_`country' - low_resid_`country'
}
.
(940,103 missing values generated)
(933,254 missing values generated)
(1,172,017 missing values generated)
(939,203 missing values generated)
(928,913 missing values generated)
(1,172,017 missing values generated)
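For reference, one possible way to avoid the empty differences is to drop the `if` qualifier and put the group condition inside mean() using cond(), so that each monthly group mean is written to every observation in that month rather than only to the rows in that group. The block below is only a minimal sketch on a toy dataset: month, quint, high, low and pdiff are hypothetical stand-ins for stata_date1, the quantile variable, the two mean-residual variables and pdiff_, and the numbers are made up for illustration.

```stata
* Toy data: two months, with a quintile label and a residual per transaction.
* All names and values here are illustrative assumptions, not the real data.
clear
input month quint resid
1 5  0.4
1 5  0.2
1 1 -0.1
1 3  0.0
2 5  0.5
2 1  0.1
2 1  0.3
end

* cond() returns resid inside the group and missing outside it.
* egen's mean() ignores missing values, and because there is no -if-
* qualifier the monthly group mean is stored on every row of that month.
bysort month: egen high = mean(cond(quint == 5, resid, .))
bysort month: egen low  = mean(cond(quint == 1, resid, .))

* Both terms are now non-missing on every row, so the difference is a
* single group-level value per month, repeated across that month's rows.
gen pdiff = high - low

list month quint resid high low pdiff, sepby(month)
```

In this toy example pdiff comes out at about 0.4 for month 1 and 0.3 for month 2 on every row. If that is the intended quantity, the same mean(cond(...)) expression could presumably replace the two `if`-qualified egen lines inside the loop.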