  • DID - right choice of regression?

    Dear Stata community,

    I want to estimate the effect of reforms in twelve EU countries on the net fund flows of investment funds. Each fund is matched to another fund with respect to its country and performance, and has been labeled as cheap, equally expensive, or expensive according to its own costs and those of its counterpart.
    As already widely discussed in other topics, I created dummies for time and treatment with this code:

    Code:
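    * Period dummy: 0 = two years before 1 Nov 2007, 1 = two years after
    * (observations dated exactly 1 Nov 2007 or outside the window stay missing)
    generate d_time = 0 if date < date("20071101","YMD") & date > date("20051101","YMD")
    replace d_time = 1 if date > date("20071101","YMD") & date < date("20091101","YMD")

    * Treatment dummy: country codes below 10 and country 11 are treated
    generate d_treat_all = 0 if date < date("20091101","YMD")
    replace d_treat_all = 1 if country <10 & date < date("20091101","YMD")
    replace d_treat_all = 1 if country == 11 & date < date("20091101","YMD")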
    My aim is to show that more expensive funds had lower net fund flows than cheap funds after being treated. Unfortunately, I am unsure which model to use and how to get the desired results.

    I used this code:
    Code:
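    * Declare the panel: fund id as panel variable, date as time variable
    xtset id date
    * xtreg defaults to random effects; standard errors clustered by fund
    xtreg netfundflow i.costs d_time##d_treat, cluster(id)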
    1) Even though costs takes three different values, my regression reports coefficients for only two of them. Do you have any idea why this could be the case?

    HTML Code:
    Linear regression                               Number of obs     =      35916
                                                    F(5, 1446)        =       7.42
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.0006
                                                    Root MSE          =     548.99

                                      (Std. Err. adjusted for 1447 clusters in id)
    --------------------------------------------------------------------------------
                   |               Robust
       netfundflow |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
             costs |
              High |  -16.27294   5.539093    -2.94   0.003    -27.13846   -5.407421
             Equal |  -6.636716    6.54486    -1.01   0.311    -19.47515    6.201721
                   |
          1.d_time |   33.29411    12.3204     2.70   0.007     9.126352    57.46187
         1.d_treat |   6.112733   5.944947     1.03   0.304    -5.548909    17.77438
                   |
    d_time#d_treat |
               1 1 |  -27.25072   12.22966    -2.23   0.026     -51.2405   -3.260937
                   |
             _cons |   5.901751   5.675747     1.04   0.299    -5.231828    17.03533
    --------------------------------------------------------------------------------
    2) I thought about running the regression separately for each costs level and then comparing the results, which would circumvent this problem. I used:
    Code:
    bysort costs: xtreg netfundflow i.costs d_time##d_treat, cluster(id)
    Is there any other possibility to get this in one regression?
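
    One option that might work here is a triple interaction of cost level, period, and treatment. The following is only a sketch under the assumption that the variables are named as in the commands above; it has not been run on the actual data:

    ```stata
    * Full factorial of cost level, period, and treatment in a single model.
    * The costs#d_time#d_treat terms measure how the DID effect for each
    * cost level differs from the DID effect of the base cost level.
    xtreg netfundflow i.costs##i.d_time##i.d_treat, vce(cluster id)

    * Joint test that the DID effect is the same for all cost levels
    testparm i.costs#i.d_time#i.d_treat
    ```

    Compared with bysort, this keeps all funds in one sample, so the difference between cost levels comes with its own standard error.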

    However, I am not sure whether my regression is the right choice for my problem. Both the Breusch-Pagan test for random effects and the Hausman test suggest an FE model.
    But using
    Code:
     xtreg netfundflow i.costs d_time##d_treat, cluster(id) fe
    seems to misrepresent my data: I don't want to rely only on within-fund variation in netfundflow, because I assume that the variable costs has a major influence on it.

    I therefore thought of using the reghdfe command to control for the fixed effects of costs but not at the individual level. I tried the following code:
    Code:
    reghdfe netfundflow i.costs i.d_treat##i.d_time, absorb(id) vce(cluster id costs)
    I thought that this would absorb the fixed effects at the individual level but add fixed effects for the costs variable. Am I right?

    Since I get these results, I think I might have been wrong with this code.
    HTML Code:
    (dropped 66 singleton observations)
    (converged in 1 iterations)
    note: 1.d_treat omitted because of collinearity
    Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied.
    WARNING: Missing F statistic (dropped variables due to collinearity or too few clusters).

    HDFE Linear regression                            Number of obs   =     35,850
    Absorbing 1 HDFE group                            F(3, 2)         =          .
    Statistics robust to heteroskedasticity           Prob > F        =          .
                                                      R-squared       =     0.0137
                                                      Adj R-squared   =    -0.0260
    Number of clusters (id)      =      1,381         Within R-sq.    =     0.0009
    Number of clusters (costs)   =          3         Root MSE        =   556.7085

                                      (Std. Err. adjusted for 3 clusters in id costs)
    --------------------------------------------------------------------------------
                   |               Robust
       netfundflow |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
             costs |
              High |  -61.58262   49.06534    -1.26   0.336    -272.6937    149.5285
             Equal |   -80.7145   58.56219    -1.38   0.302    -332.6873    171.2583
                   |
         1.d_treat |          0  (empty)
          1.d_time |   45.10881   25.86124     1.74   0.223    -66.16312    156.3807
                   |
    d_treat#d_time |
               1 1 |  -38.66821   21.77473    -1.78   0.218    -132.3573    55.02087
    --------------------------------------------------------------------------------

    Absorbed degrees of freedom:
    -----------------------------------------------------------------+
     Absorbed FE | Num. Coefs.  =   Categories  -   Redundant        |
    -------------+---------------------------------------------------|
              id |           0            1381           1381 *      |
    -----------------------------------------------------------------+
    * = fixed effect nested within cluster; treated as redundant for DoF computation
    Any comment would be highly appreciated.

    Best regards
    Nils

  • #2
    As the outputs above may not display correctly, here they are in plain text:

    Output 1)

    Random-effects GLS regression                   Number of obs      =     35916
    Group variable: id                              Number of groups   =      1447

    R-sq:  within  = 0.0007                         Obs per group: min =         1
           between = 0.0000                                        avg =      24.8
           overall = 0.0006                                        max =        48

                                                    Wald chi2(5)       =     37.11
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

                                      (Std. Err. adjusted for 1447 clusters in id)
    --------------------------------------------------------------------------------
                   |               Robust
       netfundflow |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
             costs |
              High |  -16.27294   5.539093    -2.94   0.003    -27.12936   -5.416516
             Equal |  -6.636716    6.54486    -1.01   0.311    -19.46441    6.190975
                   |
          1.d_time |   33.29411    12.3204     2.70   0.007     9.146582    57.44164
         1.d_treat |   6.112733   5.944947     1.03   0.304    -5.539148    17.76461
                   |
    d_time#d_treat |
               1 1 |  -27.25072   12.22966    -2.23   0.026    -51.22042   -3.281017
                   |
             _cons |   5.901751   5.675747     1.04   0.298    -5.222509    17.02601
    ---------------+----------------------------------------------------------------
           sigma_u |          0
           sigma_e |  556.70848
               rho |          0   (fraction of variance due to u_i)
    --------------------------------------------------------------------------------


    Output 2 of
    reghdfe netfundflow i.costs i.d_treat##i.d_time, absorb(id) vce(cluster id costs)

    (dropped 66 singleton observations)
    (converged in 1 iterations)
    note: 1.d_treat omitted because of collinearity
    Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied.
    WARNING: Missing F statistic (dropped variables due to collinearity or too few clusters).

    HDFE Linear regression                            Number of obs   =     35,850
    Absorbing 1 HDFE group                            F(3, 2)         =          .
    Statistics robust to heteroskedasticity           Prob > F        =          .
                                                      R-squared       =     0.0137
                                                      Adj R-squared   =    -0.0260
    Number of clusters (id)      =      1,381         Within R-sq.    =     0.0009
    Number of clusters (costs)   =          3         Root MSE        =   556.7085

                                      (Std. Err. adjusted for 3 clusters in id costs)
    --------------------------------------------------------------------------------
                   |               Robust
       netfundflow |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
             costs |
              High |  -61.58262   49.06534    -1.26   0.336    -272.6937    149.5285
             Equal |   -80.7145   58.56219    -1.38   0.302    -332.6873    171.2583
                   |
         1.d_treat |          0  (empty)
          1.d_time |   45.10881   25.86124     1.74   0.223    -66.16312    156.3807
                   |
    d_treat#d_time |
               1 1 |  -38.66821   21.77473    -1.78   0.218    -132.3573    55.02087
    --------------------------------------------------------------------------------

    Absorbed degrees of freedom:
    -----------------------------------------------------------------+
     Absorbed FE | Num. Coefs.  =   Categories  -   Redundant        |
    -------------+---------------------------------------------------|
              id |           0            1381           1381 *      |
    -----------------------------------------------------------------+
    * = fixed effect nested within cluster; treated as redundant for DoF computation
    Last edited by Nils Haccius; 16 Jan 2018, 16:59.



    • #3
      Sorry for not just editing the post. I forgot about this feature.
      Any comments are highly welcome.
      Last edited by Nils Haccius; 16 Jan 2018, 17:00.



      • #4
        Originally posted by Nils Haccius
        ...

        1) Even though costs takes three different values, my regression reports coefficients for only two of them. Do you have any idea why this could be the case?

        HTML Code:
        Linear regression                               Number of obs     =      35916
                                                        F(5, 1446)        =       7.42
                                                        Prob > F          =     0.0000
                                                        R-squared         =     0.0006
                                                        Root MSE          =     548.99

                                          (Std. Err. adjusted for 1447 clusters in id)
        --------------------------------------------------------------------------------
                       |               Robust
           netfundflow |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
        ---------------+----------------------------------------------------------------
                 costs |
                  High |  -16.27294   5.539093    -2.94   0.003    -27.13846   -5.407421
                 Equal |  -6.636716    6.54486    -1.01   0.311    -19.47515    6.201721
                       |
              1.d_time |   33.29411    12.3204     2.70   0.007     9.126352    57.46187
             1.d_treat |   6.112733   5.944947     1.03   0.304    -5.548909    17.77438
                       |
        d_time#d_treat |
                   1 1 |  -27.25072   12.22966    -2.23   0.026     -51.2405   -3.260937
                       |
                 _cons |   5.901751   5.675747     1.04   0.299    -5.231828    17.03533
        --------------------------------------------------------------------------------
        ...
        This is expected. Stata is treating fund type as categorical, so you're getting results relative to the base level, which Stata takes to be cheap funds. I think the base is the category with the lowest numeric code, but you can specify a different base level if you want, e.g.

        Code:
         
         xtreg netfundflow ib1.costs d_time##d_treat, cluster(id)
        Just change the number after ib (the 1 in ib1.costs, bolded in the original post) to whatever corresponds to the base category you want.
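
        As a rough sketch of how you might check the coding and switch the base, assuming costs is coded with integer values and that 3 is the code of the category you want as reference (that value is only a placeholder, so check your own coding first):

        ```stata
        * Show the numeric codes behind the categories of costs
        tabulate costs, nolabel

        * Use category 3 as the base for this command only
        xtreg netfundflow ib3.costs d_time##d_treat, vce(cluster id)

        * Or change the base for every later command that uses i.costs,
        * and list the omitted base level in the output table
        fvset base 3 costs
        xtreg netfundflow i.costs d_time##d_treat, vce(cluster id) baselevels
        ```

        Either way the fit is the same; only the category that the two reported coefficients are compared against changes.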
        Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.
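
        For example, something along these lines would post a small extract of the variables in your model. The variable list is taken from your commands, and the choice of the first 20 observations is arbitrary:

        ```stata
        * dataex is included in recent Stata releases; on older versions it
        * can be installed from SSC with: ssc install dataex
        dataex id date netfundflow costs d_time d_treat in 1/20
        ```

        The output is already wrapped in code delimiters, so it can be pasted straight into a post.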

        When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



        • #5
          Thanks a lot. That will help me!
