Dear all,
my name is Marc and I am currently working on a panel data model with Stata. I think I have some rather simple questions (in bold) for experienced Stata users, though I couldn't find answers in previous posts - so hopefully you can help me with the below.
I am using panel data with approx. 1500-2500 observations (depending on which independent variable I use) and approx. 220 entities (different banks). The dependent variable is metric, as are the independent variable and the eight control variables. For example, the variables are financial ratios or total assets of the entities. To prepare the sample I treated outliers by winsorising the respective variables (at 1% and 99%) and transformed four of the control variables (with log, square and 1/cubic) - btw, is there any rule of thumb for the values of kurtosis and skewness at which you should transform a variable? In all the models stated below I am using lagged independent and control variables and, in addition, the lagged dependent variable (all L1.): DV = L1.IV + L1.controls + L1.DV
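To make the sample preparation concrete, here is a minimal sketch of the winsorising step and the skewness/kurtosis check. It uses Stata's grunfeld example dataset as a stand-in (not my bank data), and the variable name invest_w is only illustrative; I am assuming a simple pooled 1%/99% cut-off.

```stata
* Sketch only: grunfeld is a stand-in for the bank panel
webuse grunfeld, clear

* Pooled 1st and 99th percentiles of the variable to be winsorised
quietly summarize invest, detail
local lo = r(p1)
local hi = r(p99)

* Winsorise: values beyond the cut-offs are set to the cut-offs
generate double invest_w = min(max(invest, `lo'), `hi') if !missing(invest)

* Skewness and kurtosis before and after (shown in the detail output)
summarize invest invest_w, detail
```

The point of the sketch is only to show what I mean by winsorising: extreme values are replaced by the percentile values rather than dropped, so the number of observations stays the same.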
Next, I am setting up a random effects model and running the Breusch-Pagan LM test for random effects (xttest0). The results show a p-value of 1, implying that there are no random effects, i.e. the variance of u is zero. Afterwards, I am also conducting the Hausman test to investigate whether a fixed effects model would be preferred over the random effects model. The p-value of zero shows that I should use FE over RE. When running the FE model, the reported F-test shows a p-value of zero, implying that FE can be used (?).
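The sequence of tests described above can be sketched roughly as follows. Again this runs on the grunfeld example data rather than my own, with invest, mvalue and kstock standing in for the DV, the IV and a control; the stored-estimate names re_mod and fe_mod are just labels I picked.

```stata
* Sketch only: grunfeld stands in for the bank panel
webuse grunfeld, clear
xtset company year

* Random effects model with lagged regressors and lagged DV
xtreg invest L.mvalue L.kstock L.invest, re
xttest0                         // Breusch-Pagan LM test, H0: Var(u) = 0
estimates store re_mod

* Fixed effects model; the F test that all u_i = 0 is reported
* at the bottom of the output
xtreg invest L.mvalue L.kstock L.invest, fe
estimates store fe_mod

* Hausman test: consistent estimator (FE) first, efficient one (RE) second
hausman fe_mod re_mod
```

So the three results I am trying to reconcile are the xttest0 p-value, the hausman p-value, and the F test printed below the FE coefficient table.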
Btw: When omitting my independent variable, the BP test gives me a p-value of 8%. I don't know if this helps, but I cannot omit this variable anyway, and I think 8% is still not significant for the test. Omitting totalassets also gives me a lower p-value, but again this variable is too important for the model.
Now my main question: Which model should I apply? Does a non-significant Breusch-Pagan test (= no random effects) also tell me that OLS is the preferred choice over FE? Or can I just argue as follows: 1. the BP test shows that OLS > RE, 2. the Hausman test shows that FE > RE, 3. the F test shows that FE is suitable, and therefore use FE?
Afterwards, I am testing the FE model for autocorrelation (no AC) and heteroskedasticity (yes). Because of the heteroskedasticity I am changing the FE model to ..., fe vce(cluster variable). I am also testing for a normally distributed error term (according to swilk and sfrancia it is non-normally distributed, but kdensity, pnorm and qnorm actually do not look that bad), multicollinearity (no), linearity (yes), zero population mean (yes) and model misspecification, by rebuilding the linktest with the FE model predictions and their squares (no omitted variables). I also wanted to test for exogeneity but could not find a working test - any suggestions?
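For the residual checks and the rebuilt linktest, this is roughly what I mean, sketched on the grunfeld example data (a stand-in, not my sample). The names ehat, xbhat and xbhat2 are illustrative, and clustering on the panel identifier is my assumption for what "cluster variable" should be. The autocorrelation and heteroskedasticity tests are left out of the sketch.

```stata
* Sketch only: grunfeld stands in for the bank panel
webuse grunfeld, clear
xtset company year

* FE model with cluster-robust standard errors
xtreg invest L.mvalue L.kstock L.invest, fe vce(cluster company)

* Idiosyncratic residuals and linear prediction from the FE model
predict double ehat, e
predict double xbhat, xb

* Normality of the residuals: formal tests and graphical checks
swilk ehat
sfrancia ehat
kdensity ehat, normal
qnorm ehat

* Manual version of linktest: regress the DV on the prediction and its
* square; a significant squared term hints at misspecification
generate double xbhat2 = xbhat^2
xtreg invest xbhat xbhat2, fe vce(cluster company)
test xbhat2
```

I rebuilt the linktest by hand in this way rather than calling linktest directly after xtreg.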
I hope you can follow my approach - I am happy to provide more information if anything is too vague. If you find any steps (sample preparation, conducted tests, conclusions) to be questionable or wrong, please let me know. Isn't it unusual to use an OLS model for panel data? I am reluctant to apply OLS and would rather use FE, because I think it fits the research question better and the data should actually include fixed effects.
Many thanks & best regards
Marc