Hello Statalist Community,
I hope you are well.
I am running a panel data analysis and I am trying to work out how to properly model my predictors using squared or log transformations. I would like to know whether I am taking the right approach in deciding how to transform a variable to obtain more accurate estimates.
I present two of my independent variables as examples. The first is NetInterestMargin. Here you can see some of its summary statistics.
And here is some further information on the positive skewness.
The variable is positively skewed and not normally distributed. I also used the hist command, but I am not able to paste the graph here.
For NetInterestMargin, the log transformation does not seem to help: the distribution is still not normal, and the skewness changes from 0.21 to -2.15, that is, it becomes negative and larger in magnitude.
So the log transformation does not seem appropriate. However, ceteris paribus, the log transformation can change whether a variable is statistically significant. The fixed-effects regressions here illustrate this.
In the first attempt I used NetInterestMargin in levels (the first variable in the table, named NetInterestMargin), and this variable is not statistically significant.
Here I used the log transformation (the first variable in the table, named log_NetInterestMargin).
To sum up, what would you recommend I do about the transformation? Do you think this approach is reasonable? And what would you do when the log transformation neither makes a variable's distribution normal nor removes the skewness, yet turns a non-significant variable into a significant one?
For this other variable (AvgEquityAvgAssets), the log transformation seems more appropriate, and the variable is significant in both cases. Therefore, I would say that the log transformation is appropriate. Do you agree with me?
More generally, do you think that combining hist, tabstat, and sktest with a plot of the dependent variable against each independent variable is a valid and appropriate approach, and the right set of tools among those Stata provides?
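To make the question concrete, here is a minimal sketch of the checks I mean, applied to one predictor. The variable names are the ones from my dataset; the specific options (the normal overlay and the lowess smoother) are only illustrative choices, not the only way to do this.

```stata
* Sketch of the diagnostic checks described above, for one predictor.

* Distribution of the predictor, with a normal density overlaid
histogram NetInterestMargin, normal

* Summary statistics, including skewness and kurtosis
tabstat NetInterestMargin, statistics(n mean sd skewness kurtosis)

* Skewness and kurtosis test for normality
sktest NetInterestMargin

* Dependent variable against the predictor, with a lowess smoother
* to show the shape of the relationship
twoway (scatter log_NPL_perc NetInterestMargin) ///
       (lowess log_NPL_perc NetInterestMargin)
```

The `///` line continuation works in a do-file; typed interactively, the twoway command would need to go on a single line.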
Thanks to everybody who helps me. Greetings to everyone.
Kind Regards,
I present two of my independent variables as examples. The first is NetInterestMargin. Here you can see some of its summary statistics.
Code:
sktest NetInterestMargin
Skewness and kurtosis tests for normality
                                                        ----- Joint test -----
         Variable |    Obs   Pr(skewness)   Pr(kurtosis)   Adj chi2(2)   Prob>chi2
------------------+-----------------------------------------------------------------
NetInterestMargin |  1,199         0.0030         0.0002         20.40      0.0000
Code:
tabstat NetInterestMargin, stats (sk)
Variable | Skewness
-------------+----------
NetInteres~n | .2111044
------------------------
For NetInterestMargin, the log transformation does not seem to help: the distribution is still not normal, and the skewness changes from 0.21 to -2.15, that is, it becomes negative and larger in magnitude.
Code:
tabstat NetInterestMargin log_NetInterestMargin, stats(sk)

   Stats |  NetInt~n  log_Ne~n
---------+--------------------
Skewness |  .2111044 -2.151288
------------------------------
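For completeness, this is roughly how a log variable of this kind can be built and checked. It is a sketch under assumptions: `log_NIM_check` is a hypothetical name used only here so that nothing in the dataset is overwritten, and the count of non-positive values is included because `ln()` returns missing for zero or negative input.

```stata
* Sketch only: log_NIM_check is a hypothetical name for illustration.

* ln() is undefined for zero or negative values and returns missing,
* so count how many observations would be lost
count if NetInterestMargin <= 0 & !missing(NetInterestMargin)

* Natural log of the predictor
generate double log_NIM_check = ln(NetInterestMargin)

* Compare sample size, skewness, and kurtosis before and after
tabstat NetInterestMargin log_NIM_check, ///
    statistics(n skewness kurtosis) columns(statistics)
```

Comparing the number of observations alongside the skewness shows whether the transformation drops any cases as well as how it changes the shape of the distribution.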
In the first attempt I used NetInterestMargin in levels (the first variable in the table, named NetInterestMargin), and this variable is not statistically significant.
Code:
Fixed-effects (within) regression Number of obs = 1,197
Group variable: id Number of groups = 109
R-squared: Obs per group:
Within = 0.7067 min = 9
Between = 0.0007 avg = 11.0
Overall = 0.2757 max = 11
F(21,1067) = 122.40
corr(u_i, Xb) = -0.0924 Prob > F = 0.0000
-------------------------------------------------------------------------------------
log_NPL_perc | Coefficient Std. err. t P>|t| [95% conf. interval]
--------------------+----------------------------------------------------------------
NetInterestMargin | -.0170019 .0359333 -0.47 0.636 -.0875098 .0535061
AvgEquityAvgAssets | -.0340388 .0072217 -4.71 0.000 -.0482091 -.0198684
CosttoIncome | -.0038549 .0010209 -3.78 0.000 -.005858 -.0018518
ROAA | -.1165616 .0466176 -2.50 0.013 -.2080341 -.0250891
LLP | -1.21e-07 2.63e-07 -0.46 0.646 -6.36e-07 3.95e-07
Assets | 26.958 10.27897 2.62 0.009 6.788705 47.12729
deltabankloans | -.0233581 .0057626 -4.05 0.000 -.0346655 -.0120507
deltaFTSEMIB | .0056594 .0008663 6.53 0.000 .0039596 .0073592
RealGDPGrowth | .0743855 .0041371 17.98 0.000 .0662677 .0825033
deltaNCLDeposits | -.0564172 .0032417 -17.40 0.000 -.062778 -.0500564
dummy_25 | -.1135303 .0980685 -1.16 0.247 -.3059594 .0788987
dummy_50_75 | .0063863 .059466 0.11 0.914 -.1102973 .12307
dummy_25_50 | -.165802 .0798618 -2.08 0.038 -.3225061 -.009098
SIZE_25_ROAA | .001881 .0640499 0.03 0.977 -.1237971 .1275591
SIZE_50_ROAA | -.0522251 .0774365 -0.67 0.500 -.2041703 .09972
SIZE_75_ROAA | -.0612316 .0615014 -1.00 0.320 -.181909 .0594457
L1_RealGDPGrowth | .1579416 .006119 25.81 0.000 .1459349 .1699482
L2_RealGDPGrowth | .090434 .005898 15.33 0.000 .078861 .1020071
L1_deltaFTSEMIB | .0041794 .0005927 7.05 0.000 .0030165 .0053423
L1_deltabankloans | -.06782 .0060265 -11.25 0.000 -.0796451 -.0559949
L1_deltaNCLDeposits | -.0751297 .0032144 -23.37 0.000 -.081437 -.0688224
_cons | -2.004619 .1169712 -17.14 0.000 -2.234139 -1.775099
--------------------+----------------------------------------------------------------
sigma_u | .65158704
sigma_e | .3224297
rho | .80330051 (fraction of variance due to u_i)
-------------------------------------------------------------------------------------
F test that all u_i=0: F(108, 1067) = 26.96 Prob > F = 0.0000
Code:
Fixed-effects (within) regression Number of obs = 1,197
Group variable: id Number of groups = 109
R-squared: Obs per group:
Within = 0.7077 min = 9
Between = 0.0102 avg = 11.0
Overall = 0.2452 max = 11
F(21,1067) = 123.05
corr(u_i, Xb) = -0.1402 Prob > F = 0.0000
---------------------------------------------------------------------------------------
log_NPL_perc | Coefficient Std. err. t P>|t| [95% conf. interval]
----------------------+----------------------------------------------------------------
log_NetInterestMargin | -.1176943 .0573763 -2.05 0.040 -.2302774 -.0051111
AvgEquityAvgAssets | -.0319452 .0071842 -4.45 0.000 -.0460419 -.0178484
CosttoIncome | -.0040174 .0010115 -3.97 0.000 -.0060022 -.0020325
ROAA | -.1093108 .0466724 -2.34 0.019 -.2008909 -.0177306
LLP | -8.41e-08 2.63e-07 -0.32 0.749 -5.99e-07 4.31e-07
Assets | 31.68001 10.45083 3.03 0.002 11.1735 52.18651
deltabankloans | -.0262004 .0056822 -4.61 0.000 -.0373499 -.0150508
deltaFTSEMIB | .0058295 .0008614 6.77 0.000 .0041394 .0075197
RealGDPGrowth | .0737951 .0041224 17.90 0.000 .0657063 .081884
deltaNCLDeposits | -.0576963 .0032084 -17.98 0.000 -.0639918 -.0514008
dummy_25 | -.1096407 .0978907 -1.12 0.263 -.3017208 .0824395
dummy_50_75 | .0115595 .0594111 0.19 0.846 -.1050163 .1281352
dummy_25_50 | -.1628232 .0797169 -2.04 0.041 -.3192428 -.0064035
SIZE_25_ROAA | -.0033336 .0639285 -0.05 0.958 -.1287733 .1221062
SIZE_50_ROAA | -.0571957 .077265 -0.74 0.459 -.2088043 .0944128
SIZE_75_ROAA | -.0649643 .0613895 -1.06 0.290 -.1854221 .0554935
L1_RealGDPGrowth | .1595236 .0060378 26.42 0.000 .1476762 .1713709
L2_RealGDPGrowth | .0894354 .0058843 15.20 0.000 .0778893 .1009814
L1_deltaFTSEMIB | .0040979 .0005889 6.96 0.000 .0029423 .0052535
L1_deltabankloans | -.0646026 .0057948 -11.15 0.000 -.0759731 -.0532321
L1_deltaNCLDeposits | -.0744212 .003186 -23.36 0.000 -.0806728 -.0681697
_cons | -1.987116 .101714 -19.54 0.000 -2.186698 -1.787534
----------------------+----------------------------------------------------------------
sigma_u | .67220621
sigma_e | .32182958
rho | .81352599 (fraction of variance due to u_i)
---------------------------------------------------------------------------------------
F test that all u_i=0: F(108, 1067) = 26.62 Prob > F = 0.0000
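The two regressions can be reproduced along the following lines. This is a sketch: the variable names are taken from the output tables, `id` is the group variable reported in the output, and the global macro `controls` and the stored-estimate names `m_level` and `m_log` are conveniences introduced only for this example.

```stata
* Sketch only: "controls", "m_level", and "m_log" are names introduced
* here for illustration; variable names come from the output tables.
xtset id

global controls AvgEquityAvgAssets CosttoIncome ROAA LLP Assets ///
    deltabankloans deltaFTSEMIB RealGDPGrowth deltaNCLDeposits ///
    dummy_25 dummy_50_75 dummy_25_50 ///
    SIZE_25_ROAA SIZE_50_ROAA SIZE_75_ROAA ///
    L1_RealGDPGrowth L2_RealGDPGrowth L1_deltaFTSEMIB ///
    L1_deltabankloans L1_deltaNCLDeposits

* Model 1: NetInterestMargin in levels
xtreg log_NPL_perc NetInterestMargin $controls, fe
estimates store m_level

* Model 2: NetInterestMargin in logs
xtreg log_NPL_perc log_NetInterestMargin $controls, fe
estimates store m_log

* Coefficient of interest from both models, with standard errors and p-values
estimates table m_level m_log, ///
    keep(NetInterestMargin log_NetInterestMargin) se p
```

Storing both sets of estimates makes it easier to see that only the first regressor changes between the two specifications, while the control set stays the same.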
For this other variable (AvgEquityAvgAssets), the log transformation seems more appropriate, and the variable is significant in both cases. Therefore, I would say that the log transformation is appropriate. Do you agree with me?
Code:
tabstat log_AvgEquityAvgAssets AvgEquityAvgAssets, stats(sk)

   Stats |  log_Av~s  AvgEqu~s
---------+--------------------
Skewness |  -.346894   1.60862
------------------------------