DID - right choice of regression?

Nils Haccius

Join Date: Nov 2017
Posts: 12

16 Jan 2018, 16:51

Dear Stata community,

I want to estimate the effect of reforms in twelve EU countries on the net fund flows of investment funds. Each fund is matched to another fund on country and performance, and is labeled as cheap, equally expensive, or expensive according to its own costs relative to those of its matched counterpart.
Following the approach widely discussed in other threads, I created dummies for time and treatment with this code:

Code:

* Time dummy: 0 before 1 Nov 2007, 1 after (the strict inequalities leave 1 Nov 2007 itself missing)
generate d_time = 0 if date < date("20071101","YMD") & date > date("20051101","YMD")
replace d_time = 1 if date > date("20071101","YMD") & date < date("20091101","YMD")

* Treatment dummy: 1 for countries coded below 10 or equal to 11, 0 otherwise
generate d_treat_all = 0 if date < date("20091101","YMD")
replace d_treat_all = 1 if country < 10 & date < date("20091101","YMD")
replace d_treat_all = 1 if country == 11 & date < date("20091101","YMD")
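A quick sanity check on these dummies could look like the sketch below. It only assumes that the variables d_time, d_treat_all and date exist as created above; nothing here is part of the original analysis.

```stata
* Cross-tabulate the two dummies, showing missing values as their own category
tabulate d_time d_treat_all, missing

* Count observations dated exactly 1 Nov 2007, which the strict inequalities exclude
count if date == date("20071101", "YMD")
```

Any cell with unexpected missing values points to a date range or country code that the conditions do not cover.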

My aim is to show that more expensive funds had lower net fund flows than cheap funds after being treated. Unfortunately, I am unsure which model to use and how to obtain the desired results.

I used this code:

Code:

* Declare the panel: fund id as the panel variable, date as the time variable
xtset id date
xtreg netfundflow i.costs d_time##d_treat, cluster(id)

Question 1) Even though costs has three different values, my regression reports only two coefficients for it. Do you have any idea why this could be the case?

Code:

Linear regression                               Number of obs      =     35916
                                                F( 5, 1446)        =      7.42
                                                Prob > F           =    0.0000
                                                R-squared          =    0.0006
                                                Root MSE           =    548.99

                                  (Std. Err. adjusted for 1447 clusters in id)
--------------------------------------------------------------------------------
               |                Robust
   netfundflow |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
         costs |
          High |  -16.27294   5.539093    -2.94   0.003    -27.13846   -5.407421
         Equal |  -6.636716    6.54486    -1.01   0.311    -19.47515    6.201721
               |
      1.d_time |   33.29411    12.3204     2.70   0.007     9.126352    57.46187
     1.d_treat |   6.112733   5.944947     1.03   0.304    -5.548909    17.77438
               |
d_time#d_treat |
           1 1 |  -27.25072   12.22966    -2.23   0.026     -51.2405   -3.260937
               |
         _cons |   5.901751   5.675747     1.04   0.299    -5.231828    17.03533
--------------------------------------------------------------------------------

2) I thought about running the regression separately for each costs level so that the results can then be compared, which would circumvent this problem. I used:

Code:

bysort costs: xtreg netfundflow i.costs d_time##d_treat, cluster(id)

Is there any other possibility to get this in one regression?
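One possible way to get all cost levels into a single regression is a triple interaction. The following is only a sketch under the assumption that the variables are coded as in the regressions above; it is not a recommendation on whether random effects are appropriate here.

```stata
* Let the DID term (d_time#d_treat) vary by cost level in a single model
xtreg netfundflow i.costs##i.d_time##i.d_treat, vce(cluster id)

* Joint test that the DID effect is the same across the three cost levels
testparm i.costs#i.d_time#i.d_treat
```

In this specification the coefficient on 1.d_time#1.d_treat is the DID effect for the base cost level, and the triple-interaction terms are the differences in that effect for the other two levels.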

However, I am not sure whether my regression is the right choice for my problem. The Breusch/Pagan test for random effects suggests an FE model, as does the Hausman test.
But using

Code:

 xtreg netfundflow i.costs d_time##d_treat, cluster(id) fe

seems to misrepresent my data: I don't want to rely only on within-fund variation in netfundflow, because I assume that the variable costs has a major influence on netfundflow.

I therefore looked for a command that controls for the fixed effects of costs but not at the individual fund level. I tried the following code:

Code:

reghdfe netfundflow i.costs i.d_treat##i.d_time, absorb(id) vce(cluster id costs)

I thought that this should absorb the fixed effects at the individual level but add fixed effects for the costs variable. Am I right?

Since I get these results, I think I might have been wrong with this code.
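For comparison, here is a sketch of the same call with standard errors clustered on the fund only. In reghdfe, absorb(id) sweeps out fund fixed effects, i.costs stays an ordinary regressor, and vce(cluster id costs) requests two-way clustering on id and costs rather than fixed effects for costs. Whether one-way clustering on id is the right choice for this data is an assumption of the sketch.

```stata
* reghdfe is user-written: ssc install reghdfe
* absorb(id) removes fund fixed effects; i.costs remains an ordinary regressor
* Standard errors are clustered on the fund only (one-way clustering)
reghdfe netfundflow i.costs i.d_treat##i.d_time, absorb(id) vce(cluster id)
```

If d_treat does not vary within a fund, 1.d_treat is collinear with the absorbed fund effects and is omitted in either version; the interaction with d_time is still estimated.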

Code:

(dropped 66 singleton observations)
(converged in 1 iterations)
note: 1.d_treat omitted because of collinearity
Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied.
WARNING: Missing F statistic (dropped variables due to collinearity or too few clusters).

HDFE Linear regression                            Number of obs   =     35,850
Absorbing 1 HDFE group                            F( 3, 2)        =          .
Statistics robust to heteroskedasticity           Prob > F        =          .
                                                  R-squared       =     0.0137
                                                  Adj R-squared   =    -0.0260
Number of clusters (id)      =      1,381         Within R-sq.    =     0.0009
Number of clusters (costs)   =          3         Root MSE        =   556.7085

                               (Std. Err. adjusted for 3 clusters in id costs)
--------------------------------------------------------------------------------
               |                Robust
   netfundflow |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
         costs |
          High |  -61.58262   49.06534    -1.26   0.336    -272.6937    149.5285
         Equal |   -80.7145   58.56219    -1.38   0.302    -332.6873    171.2583
               |
     1.d_treat |          0  (empty)
      1.d_time |   45.10881   25.86124     1.74   0.223    -66.16312    156.3807
               |
d_treat#d_time |
           1 1 |  -38.66821   21.77473    -1.78   0.218    -132.3573    55.02087
--------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------------------+
   Absorbed FE |  Num. Coefs.  =   Categories  -   Redundant     |
---------------+-------------------------------------------------|
            id |            0            1381           1381 *   |
-----------------------------------------------------------------+
* = fixed effect nested within cluster; treated as redundant for DoF computation

Any comment would be highly appreciated

Best regards
Nils

Nils Haccius

Join Date: Nov 2017

Posts: 12
#2

16 Jan 2018, 16:54

As the outputs are not shown correctly:

Output 1)

Random-effects GLS regression                   Number of obs      =     35916
Group variable: id                              Number of groups   =      1447

R-sq:  within  = 0.0007                         Obs per group: min =         1
       between = 0.0000                                        avg =      24.8
       overall = 0.0006                                        max =        48

                                                Wald chi2(5)       =     37.11
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

                                  (Std. Err. adjusted for 1447 clusters in id)
--------------------------------------------------------------------------------
               |                Robust
   netfundflow |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
         costs |
          High |  -16.27294   5.539093    -2.94   0.003    -27.12936   -5.416516
         Equal |  -6.636716    6.54486    -1.01   0.311    -19.46441    6.190975
               |
      1.d_time |   33.29411    12.3204     2.70   0.007     9.146582    57.44164
     1.d_treat |   6.112733   5.944947     1.03   0.304    -5.539148    17.76461
               |
d_time#d_treat |
           1 1 |  -27.25072   12.22966    -2.23   0.026    -51.22042   -3.281017
               |
         _cons |   5.901751   5.675747     1.04   0.298    -5.222509    17.02601
---------------+----------------------------------------------------------------
       sigma_u |          0
       sigma_e |  556.70848
           rho |          0   (fraction of variance due to u_i)
--------------------------------------------------------------------------------

Output 2 of
reghdfe netfundflow i.costs i.d_treat##i.d_time, absorb(id) vce(cluster id costs)

(dropped 66 singleton observations)
(converged in 1 iterations)
note: 1.d_treat omitted because of collinearity
Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied.
WARNING: Missing F statistic (dropped variables due to collinearity or too few clusters).

HDFE Linear regression                            Number of obs   =     35,850
Absorbing 1 HDFE group                            F( 3, 2)        =          .
Statistics robust to heteroskedasticity           Prob > F        =          .
                                                  R-squared       =     0.0137
                                                  Adj R-squared   =    -0.0260
Number of clusters (id)      =      1,381         Within R-sq.    =     0.0009
Number of clusters (costs)   =          3         Root MSE        =   556.7085

                               (Std. Err. adjusted for 3 clusters in id costs)
--------------------------------------------------------------------------------
               |                Robust
   netfundflow |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
         costs |
          High |  -61.58262   49.06534    -1.26   0.336    -272.6937    149.5285
         Equal |   -80.7145   58.56219    -1.38   0.302    -332.6873    171.2583
               |
     1.d_treat |          0  (empty)
      1.d_time |   45.10881   25.86124     1.74   0.223    -66.16312    156.3807
               |
d_treat#d_time |
           1 1 |  -38.66821   21.77473    -1.78   0.218    -132.3573    55.02087
--------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------------------+
   Absorbed FE |  Num. Coefs.  =   Categories  -   Redundant     |
---------------+-------------------------------------------------|
            id |            0            1381           1381 *   |
-----------------------------------------------------------------+
* = fixed effect nested within cluster; treated as redundant for DoF computation

Last edited by Nils Haccius; 16 Jan 2018, 16:59.
Nils Haccius

Join Date: Nov 2017

Posts: 12
#3

16 Jan 2018, 16:55

Sorry for not just editing the post; I forgot about that feature.
Any comments are highly welcome.

Last edited by Nils Haccius; 16 Jan 2018, 17:00.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#4

17 Jan 2018, 08:06

Originally posted by Nils Haccius

...

Question 1) Even though costs has three different values, my regression reports only two coefficients for it. Do you have any idea why this could be the case?

Code:

Linear regression                               Number of obs      =     35916
                                                F( 5, 1446)        =      7.42
                                                Prob > F           =    0.0000
                                                R-squared          =    0.0006
                                                Root MSE           =    548.99

                                  (Std. Err. adjusted for 1447 clusters in id)
--------------------------------------------------------------------------------
               |                Robust
   netfundflow |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
         costs |
          High |  -16.27294   5.539093    -2.94   0.003    -27.13846   -5.407421
         Equal |  -6.636716    6.54486    -1.01   0.311    -19.47515    6.201721
               |
      1.d_time |   33.29411    12.3204     2.70   0.007     9.126352    57.46187
     1.d_treat |   6.112733   5.944947     1.03   0.304    -5.548909    17.77438
               |
d_time#d_treat |
           1 1 |  -27.25072   12.22966    -2.23   0.026     -51.2405   -3.260937
               |
         _cons |   5.901751   5.675747     1.04   0.299    -5.231828    17.03533
--------------------------------------------------------------------------------

...

This is expected. Stata is treating fund type as categorical, so you're getting results relative to the base level, which here is the cheap funds. By default the base is the level with the lowest numeric code, which is presumably how you coded that category, but you can specify a different base level if you want, e.g.

Code:

xtreg netfundflow ib1.costs d_time##d_treat, cluster(id)

Just change the number in ib1.costs to whatever code corresponds to the base category you want.
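To find the right number, it can help to look at the numeric codes behind the value labels first. A minimal sketch, assuming costs is a labeled numeric variable; the code 2 below is a hypothetical value, not one taken from the data:

```stata
* Show the levels of costs with labels, then with the underlying numeric codes
tabulate costs
tabulate costs, nolabel

* Use the level coded 2 as the base category (2 is an assumed code)
xtreg netfundflow ib2.costs d_time##d_treat, vce(cluster id)

* Alternatively, let Stata pick the most frequent level as the base
xtreg netfundflow ib(freq).costs d_time##d_treat, vce(cluster id)
```

Changing the base only changes which category the other coefficients are compared against; the fitted model is the same.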

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.
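As a rough illustration, a dataex call for this thread might look like the following. The variable list is taken from the commands posted above and is an assumption about what is in the dataset; the range 1/20 is arbitrary.

```stata
* dataex ships with recent Stata releases; on older installations: ssc install dataex
* Print the first 20 observations of the listed variables in a copy-and-paste format
dataex id date country costs netfundflow d_time d_treat in 1/20
```

The output can be pasted directly into a post between code delimiters, so others can recreate the example data.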

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Nils Haccius

Join Date: Nov 2017

Posts: 12
#5

17 Jan 2018, 11:01

Thanks a lot. That will help me!