Dear Statalist users,
I'm currently working on my master's thesis and have estimated a fixed-effects model in Stata. As this is my first time working with longitudinal data, a few things confuse me, and I'd appreciate help or a nudge in the right direction. A little background: I'm exploring the effect of structural positions in a company network on the success of those companies. For that, I surveyed 43 companies and how they are interlocked over a span of 15 years, giving me 180 (monthly) time points. My independent variables are (1) local network measures, (2) global network measures and (3) the general behaviour of the German equity index; the dependent variable is the equity price.
My current output looks like this (the number of 38 IDs instead of 43 is due to missing values):
Code:
xtreg price dax indegree outdegree closeness constraint centralization density, fe robust

Fixed-effects (within) regression               Number of obs      =      5537
Group variable: id                              Number of groups   =        38

R-sq:  within  = 0.0492                         Obs per group: min =         3
       between = 0.0581                                        avg =     145.7
       overall = 0.0431                                        max =       180

                                                F(7,37)            =      4.16
corr(u_i, Xb)  = 0.0356                         Prob > F           =    0.0018

                                    (Std. Err. adjusted for 38 clusters in id)
--------------------------------------------------------------------------------
               |               Robust
         price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
           dax |   1.372309   .7611821     1.80   0.080    -.1699926     2.91461
      indegree |   1.091147   2.588484     0.42   0.676     -4.15362    6.335913
     outdegree |   .8330976    .844596     0.99   0.330    -.8782164    2.544412
     closeness |   12689.14   7250.301     1.75   0.088    -2001.366    27379.65
    constraint |   15.38289   12.43083     1.24   0.224    -9.804358    40.57013
centralization |   147.7793   80.39707     1.84   0.074    -15.12063    310.6792
       density |  -335.4107   445.2656    -0.75   0.456    -1237.604     566.783
         _cons |   18.54576   23.84009     0.78   0.442    -29.75884    66.85036
---------------+----------------------------------------------------------------
       sigma_u |  65.845741
       sigma_e |   37.70635
           rho |  .75305497   (fraction of variance due to u_i)
--------------------------------------------------------------------------------

I already tested for heteroskedasticity using the xttest3 command (as suggested in this overview: https://www.princeton.edu/~otorres/Panel101.pdf), hence the robust standard errors. Is that correct? I also used the Hausman test to check whether an FE model is the right choice.
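For anyone who wants to reproduce the two checks mentioned in this post (heteroskedasticity and the Hausman test), here is a minimal sketch of how they are typically run. It is not necessarily the exact sequence used here: the variable names are taken from the output above, "month" is a hypothetical name for the time variable, and the stored-estimate names "fixed" and "random" are arbitrary.

```stata
* Assumes the panel has been declared, e.g.: xtset id month
* ("month" is a hypothetical name for the time variable)

* Fixed-effects fit without robust standard errors
xtreg price dax indegree outdegree closeness constraint centralization density, fe
estimates store fixed

* Modified Wald test for groupwise heteroskedasticity in the FE model
* (xttest3 is user-written; install once with: ssc install xttest3)
xttest3

* Random-effects fit for comparison
xtreg price dax indegree outdegree closeness constraint centralization density, re
estimates store random

* Hausman test: FE (consistent) against RE (efficient under H0)
hausman fixed random
```

Both models are fitted without the robust option because hausman relies on the conventional variance estimates.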
First off, I am very happy that at least parts of my model are significant, even though they were more so before the robust standard errors - but I guess I'll have to live with that. But I do have a few concerns:
1. The surveyed companies are, in my case, not independent of each other. They are active in the same "field", and the independent variables have been computed specifically from the way the companies are connected to each other. If I'm correct, this should be a pretty big violation of my regression's assumptions. I've stumbled across the term "permutation" as a way to deal with this, but so far I have failed to understand whether it is suitable, let alone how to do it in Stata. Is this a good approach, and can someone explain it in rather simple terms?
2. Are there any goodness-of-fit measures or diagnostics I should definitely not miss? I am currently only aware of the test for heteroskedasticity and the Hausman test, following both a beginner's book on panel regression and the Princeton presentation linked above.
3. I've read that areg and xtreg will give me different R-sq values (https://www.stata.com/support/faqs/s...sus-xtreg-fe/#) and that the one from areg is preferable. Am I correct that it would be best to report both in my thesis?
4. What worries me the most is that my dependent variable is very skewed. This is apparent both graphically and in numeric measures (such as sktest). While normality of the dependent variable is not a formal assumption of the FE model, I've come across a lot of posts where people voice concerns about skewness. By eye, the distribution almost resembles a Poisson distribution (which, unfortunately, I am not familiar with either). Would it make sense to work with something like the xtgee command instead of xtreg, or am I on the wrong track here? Is there a sensible way to deal with the skewed distribution?
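To make question 3 concrete, here is a minimal sketch of the two commands being compared, using the variable names from the output above. It assumes the panel is already declared with xtset and that clustering on id matches what the robust option did in the original run.

```stata
* Within estimator: reports within, between and overall R-squared
xtreg price dax indegree outdegree closeness constraint centralization density, fe vce(cluster id)
display "xtreg within R-squared = " e(r2_w)

* Same slope coefficients with the company dummies absorbed; this
* R-squared also counts the variance explained by those dummies
areg price dax indegree outdegree closeness constraint centralization density, absorb(id) vce(cluster id)
display "areg R-squared = " e(r2)
```

The coefficients on the regressors should agree between the two fits, while the clustered standard errors may differ slightly because the two commands apply different degrees-of-freedom adjustments.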
Thanks for reading this far, and apologies if some of my questions seem too simple. It all feels a little over my head right now, and since collecting and preparing my data has taken up quite some time, I want to make sure I do this final step correctly. I am already super excited that my research hypotheses seem to hold some ground, judging by the results. :-)
Best regards