Hello Statalist Community,
I hope you are well.
I am running a panel data analysis and I am trying to figure out how to properly model my predictors using square or log transformations. I would like to know whether I am taking the right approach to deciding how to transform a variable in order to get more accurate estimates.
I am presenting two of my independent variables as examples. The first one is NetInterestMargin. The Stata output is reproduced further down: sktest rejects normality (Prob>chi2 = 0.0000),
and tabstat shows positive skewness (0.21).
So the variable is positively skewed and not normally distributed. I also looked at the histogram with the hist command, but I am not able to paste it here.
For NetInterestMargin, the log transformation does not seem to help: the variable is still not normally distributed, and the skewness becomes negative and larger in absolute value (-2.15 versus 0.21).
So the log transformation does not seem appropriate here. However, ceteris paribus, the log transformation can change whether a variable is statistically significant, as the two fixed-effects regressions show.
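In the first attempt I used NetInterestMargin in levels (the first variable in the output), and this variable is not statistically significant (p = 0.636).
In the second attempt I used the log transformation (first variable, named log_NetInterestMargin), and the variable becomes statistically significant at the 5% level (p = 0.040).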
To sum up, what would you recommend I do about the transformation? Do you think this approach is reasonable? And what would you do when the log transformation neither makes a variable's distribution normal nor reduces its skewness, yet a variable that is not significant in levels becomes significant in logs?
For the second variable, AvgEquityAvgAssets, the log transformation seems more appropriate (skewness goes from 1.61 to -0.35), and the variable is significant in both cases. Therefore, I would say that the log transformation is appropriate. Do you agree with me?
More generally, do you think that combining hist, tabstat, and sktest with a plot of the dependent variable against the independent variable is a valid and appropriate approach, and the right set of tools among those Stata provides?
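For reference, the checks described above can be sketched roughly as follows. This is a minimal sketch rather than my exact do-file: it assumes the dependent variable is log_NPL_perc (as in the regressions), uses only built-in commands, and is meant to be run from a do-file because of the `//` comments and `///` line continuation.

```stata
* Distribution checks for one predictor
histogram NetInterestMargin, normal        // histogram with a normal density overlaid
tabstat NetInterestMargin, statistics(mean sd skewness kurtosis n)
sktest NetInterestMargin                   // skewness/kurtosis test for normality

* Dependent variable against the predictor, with a lowess smoother
* to make any curvature in the relationship visible
twoway (scatter log_NPL_perc NetInterestMargin) ///
       (lowess log_NPL_perc NetInterestMargin)
```

The first three commands describe the marginal distribution of the predictor; only the last plot looks at its relationship with the dependent variable.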
I am presenting two of my independent variables as examples. The first one is NetInterestMargin. Here are the normality test and the skewness statistic for it.
Code:
sktest NetInterestMargin

Skewness and kurtosis tests for normality
                                                              ----- Joint test -----
         Variable |       Obs   Pr(skewness)   Pr(kurtosis)   Adj chi2(2)  Prob>chi2
------------------+-----------------------------------------------------------------
NetInterestMargin |     1,199         0.0030         0.0002         20.40     0.0000
Code:
tabstat NetInterestMargin, stats (sk)

    Variable |  Skewness
-------------+----------
NetInteres~n |  .2111044
------------------------
For NetInterestMargin, the log transformation does not seem to help: the variable is still not normally distributed, and the skewness becomes negative and larger in absolute value.
Code:
tabstat NetInterestMargin log_NetInterestMargin , stats (sk)

   Stats |  NetInt~n  log_Ne~n
---------+--------------------
Skewness |  .2111044 -2.151288
------------------------------
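How log_NetInterestMargin was created is not shown above. A minimal sketch, assuming it is the natural log of NetInterestMargin: since ln() returns missing for zero or negative values, it may be worth counting those first. The name ln_NIM_check is hypothetical and only used to avoid overwriting the existing variable.

```stata
* Count observations for which the log is undefined (zero or negative margins)
count if NetInterestMargin <= 0

* Natural log under a hypothetical name; non-positive values become missing
generate double ln_NIM_check = ln(NetInterestMargin)

* Compare skewness, kurtosis and the number of usable observations
tabstat NetInterestMargin ln_NIM_check, statistics(skewness kurtosis n)
```

If the two columns report a different n, the transformation has dropped observations, and the levels and log results are not based on the same sample.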
In the first attempt I used NetInterestMargin in levels (the first variable in the output), and this variable is not statistically significant.
Code:
Fixed-effects (within) regression               Number of obs     =      1,197
Group variable: id                              Number of groups  =        109

R-squared:                                      Obs per group:
     Within  = 0.7067                                         min =          9
     Between = 0.0007                                         avg =       11.0
     Overall = 0.2757                                         max =         11

                                                F(21,1067)        =     122.40
corr(u_i, Xb) = -0.0924                         Prob > F          =     0.0000

-------------------------------------------------------------------------------------
       log_NPL_perc | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------------+----------------------------------------------------------------
  NetInterestMargin |  -.0170019   .0359333    -0.47   0.636    -.0875098    .0535061
 AvgEquityAvgAssets |  -.0340388   .0072217    -4.71   0.000    -.0482091   -.0198684
       CosttoIncome |  -.0038549   .0010209    -3.78   0.000     -.005858   -.0018518
               ROAA |  -.1165616   .0466176    -2.50   0.013    -.2080341   -.0250891
                LLP |  -1.21e-07   2.63e-07    -0.46   0.646    -6.36e-07    3.95e-07
             Assets |     26.958   10.27897     2.62   0.009     6.788705    47.12729
     deltabankloans |  -.0233581   .0057626    -4.05   0.000    -.0346655   -.0120507
       deltaFTSEMIB |   .0056594   .0008663     6.53   0.000     .0039596    .0073592
      RealGDPGrowth |   .0743855   .0041371    17.98   0.000     .0662677    .0825033
   deltaNCLDeposits |  -.0564172   .0032417   -17.40   0.000     -.062778   -.0500564
           dummy_25 |  -.1135303   .0980685    -1.16   0.247    -.3059594    .0788987
        dummy_50_75 |   .0063863    .059466     0.11   0.914    -.1102973      .12307
        dummy_25_50 |   -.165802   .0798618    -2.08   0.038    -.3225061    -.009098
       SIZE_25_ROAA |    .001881   .0640499     0.03   0.977    -.1237971    .1275591
       SIZE_50_ROAA |  -.0522251   .0774365    -0.67   0.500    -.2041703      .09972
       SIZE_75_ROAA |  -.0612316   .0615014    -1.00   0.320     -.181909    .0594457
   L1_RealGDPGrowth |   .1579416    .006119    25.81   0.000     .1459349    .1699482
   L2_RealGDPGrowth |    .090434    .005898    15.33   0.000      .078861    .1020071
    L1_deltaFTSEMIB |   .0041794   .0005927     7.05   0.000     .0030165    .0053423
  L1_deltabankloans |    -.06782   .0060265   -11.25   0.000    -.0796451   -.0559949
L1_deltaNCLDeposits |  -.0751297   .0032144   -23.37   0.000     -.081437   -.0688224
              _cons |  -2.004619   .1169712   -17.14   0.000    -2.234139   -1.775099
--------------------+----------------------------------------------------------------
            sigma_u |  .65158704
            sigma_e |   .3224297
                rho |  .80330051   (fraction of variance due to u_i)
-------------------------------------------------------------------------------------
F test that all u_i=0: F(108, 1067) = 26.96                      Prob > F = 0.0000
In the second attempt I used the log transformation (first variable, named log_NetInterestMargin).
Code:
Fixed-effects (within) regression               Number of obs     =      1,197
Group variable: id                              Number of groups  =        109

R-squared:                                      Obs per group:
     Within  = 0.7077                                         min =          9
     Between = 0.0102                                         avg =       11.0
     Overall = 0.2452                                         max =         11

                                                F(21,1067)        =     123.05
corr(u_i, Xb) = -0.1402                         Prob > F          =     0.0000

---------------------------------------------------------------------------------------
         log_NPL_perc | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
----------------------+----------------------------------------------------------------
log_NetInterestMargin |  -.1176943   .0573763    -2.05   0.040    -.2302774   -.0051111
   AvgEquityAvgAssets |  -.0319452   .0071842    -4.45   0.000    -.0460419   -.0178484
         CosttoIncome |  -.0040174   .0010115    -3.97   0.000    -.0060022   -.0020325
                 ROAA |  -.1093108   .0466724    -2.34   0.019    -.2008909   -.0177306
                  LLP |  -8.41e-08   2.63e-07    -0.32   0.749    -5.99e-07    4.31e-07
               Assets |   31.68001   10.45083     3.03   0.002      11.1735    52.18651
       deltabankloans |  -.0262004   .0056822    -4.61   0.000    -.0373499   -.0150508
         deltaFTSEMIB |   .0058295   .0008614     6.77   0.000     .0041394    .0075197
        RealGDPGrowth |   .0737951   .0041224    17.90   0.000     .0657063     .081884
     deltaNCLDeposits |  -.0576963   .0032084   -17.98   0.000    -.0639918   -.0514008
             dummy_25 |  -.1096407   .0978907    -1.12   0.263    -.3017208    .0824395
          dummy_50_75 |   .0115595   .0594111     0.19   0.846    -.1050163    .1281352
          dummy_25_50 |  -.1628232   .0797169    -2.04   0.041    -.3192428   -.0064035
         SIZE_25_ROAA |  -.0033336   .0639285    -0.05   0.958    -.1287733    .1221062
         SIZE_50_ROAA |  -.0571957    .077265    -0.74   0.459    -.2088043    .0944128
         SIZE_75_ROAA |  -.0649643   .0613895    -1.06   0.290    -.1854221    .0554935
     L1_RealGDPGrowth |   .1595236   .0060378    26.42   0.000     .1476762    .1713709
     L2_RealGDPGrowth |   .0894354   .0058843    15.20   0.000     .0778893    .1009814
      L1_deltaFTSEMIB |   .0040979   .0005889     6.96   0.000     .0029423    .0052535
    L1_deltabankloans |  -.0646026   .0057948   -11.15   0.000    -.0759731   -.0532321
  L1_deltaNCLDeposits |  -.0744212    .003186   -23.36   0.000    -.0806728   -.0681697
                _cons |  -1.987116    .101714   -19.54   0.000    -2.186698   -1.787534
----------------------+----------------------------------------------------------------
              sigma_u |  .67220621
              sigma_e |  .32182958
                  rho |  .81352599   (fraction of variance due to u_i)
---------------------------------------------------------------------------------------
F test that all u_i=0: F(108, 1067) = 26.62                        Prob > F = 0.0000
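The commands behind the two outputs are not shown above. A sketch of how they could be run and compared side by side, assuming the panel is declared with id as the group variable (as in the output), that the time variable is called year (an assumed name), and that the regressors are exactly those listed in the tables. The stored-estimate names m_level and m_log are hypothetical.

```stata
* Declare the panel; "year" is an assumed name for the time variable
xtset id year

* Controls common to both specifications, as listed in the output above
local controls AvgEquityAvgAssets CosttoIncome ROAA LLP Assets ///
    deltabankloans deltaFTSEMIB RealGDPGrowth deltaNCLDeposits ///
    dummy_25 dummy_50_75 dummy_25_50 ///
    SIZE_25_ROAA SIZE_50_ROAA SIZE_75_ROAA ///
    L1_RealGDPGrowth L2_RealGDPGrowth L1_deltaFTSEMIB ///
    L1_deltabankloans L1_deltaNCLDeposits

* Specification 1: NetInterestMargin in levels
xtreg log_NPL_perc NetInterestMargin `controls', fe
estimates store m_level

* Specification 2: NetInterestMargin in logs
xtreg log_NPL_perc log_NetInterestMargin `controls', fe
estimates store m_log

* Coefficient, standard error and p-value of the variable of interest, side by side
estimates table m_level m_log, ///
    keep(NetInterestMargin log_NetInterestMargin) b(%9.4f) se(%9.4f) p(%6.3f)
```

Because the local macro is only defined within one run, the whole block has to be executed together from a do-file.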
For the second variable, AvgEquityAvgAssets, the log transformation seems more appropriate, and the variable is significant in both cases. Therefore, I would say that the log transformation is appropriate. Do you agree with me?
Code:
tabstat log_AvgEquityAvgAssets AvgEquityAvgAssets,stats (sk)

   Stats |  log_Av~s  AvgEqu~s
---------+--------------------
Skewness |  -.346894   1.60862
------------------------------
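Beyond comparing the raw and logged skewness by hand, Stata's ladder-of-powers commands run the same kind of comparison over several candidate transformations (including square and log) at once. A sketch for the two variables discussed here; no output is shown because this is only an illustration of the commands.

```stata
* Normality test for each transformation on the ladder of powers
* (cubic, square, identity, square root, log, and inverse powers)
ladder NetInterestMargin
ladder AvgEquityAvgAssets

* The same candidates as a grid of histograms with normal densities overlaid
gladder AvgEquityAvgAssets
```

The transformation with the smallest chi2 in the ladder output is the closest to normality by that test; whether it also makes sense for the regression is a separate question.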
Thanks to everybody who can help. Greetings to everyone.
Kind Regards,