Hello everyone, I'm having some trouble with the following regression analysis.
As my last post did not get much of a reaction, and I was advised to be more specific, I'm trying again with a similar question. If I got any technicalities wrong, please tell me; I'm eager to get this right.
I'm using Stata 13.
My variables are as follows:
Dependent:
par30 - portfolio at risk > 30 (percentage of loans overdue more than 30 days)
Independent:
perfem - percentage of female borrowers
TAK - total assets (in $1000)
PSK - average portfolio size per borrower (in $1000)
MFIage - indicates whether an MFI is new, young or mature (1 = 1-4 years old, 2 = 4-8 years old, 3 = 8+ years old)
- I previously used a different variable (0 for new, +1 for each year the MFI has been active), but I ran into huge problems with linearity, so I decided to use this one
MFI = Microfinance Institution, also called financial service provider
Concerning the group variable, my data contains a variable called mfiname, which holds the individual name of each MFI, so I did:
Code:
. egen numMFI = group(mfiname)
. xtset numMFI fiscalyear
panel variable: numMFI (unbalanced)
time variable: fiscalyear, 2003 to 2012, but with gaps
delta: 1 unit
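As I want to run a linear regression on my data, I first tested for linearity with -nlcheck-, following advice I had been given:
Code:
. quietly xtreg par30 perfem TAK PSK MFIage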
Code:
. nlcheck perfem
Nonlinearity test:
chi2( 9) = 10.11
Prob > chi2 = 0.3414
Code:
. nlcheck TAK
Nonlinearity test:
chi2( 9) = 7.91
Prob > chi2 = 0.5434
Code:
. nlcheck PSK
Nonlinearity test:
chi2( 9) = 20.61
Prob > chi2 = 0.0145
Code:
. nlcheck MFIage
Nonlinearity test:
chi2( 1) = 0.13
Prob > chi2 = 0.7229
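This implies that perfem, TAK and MFIage are linear, but PSK is not.
As I also wanted to look at this graphically, to include it in my thesis, I then checked linearity with the following commands:
Code:
. quietly regress par30 perfem TAK PSK MFIage
. predict r, resid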
Code:
. acprplot perfem, lowess
. acprplot TAK, lowess
. acprplot PSK, lowess
. acprplot MFIage, lowess
TAK and MFIage looked great; perfem and PSK, however, did not:
Code:
. acprplot perfem, lowess
Code:
. acprplot PSK , lowess
Code:
. kdensity PSK, normal
Code:
. summarize PSK, detail
                             PSK
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0266504       .0149792
 5%     .0484809       .0195255
10%      .061839       .0266504       Obs                 209
25%     .1161648       .0330702       Sum of Wgt.         209

50%     .2528287                      Mean            .431633
                        Largest       Std. Dev.      .4882636
75%     .5544006       2.138109
90%     1.126634       2.336616       Variance       .2384014
95%     1.406009        2.36487       Skewness       2.467618
99%     2.336616       3.398852       Kurtosis       11.24819
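The skewness of about 2.47 in the output above is the kind of right skew that a log transformation is usually meant to reduce. As a rough illustration only, the sketch below uses simulated lognormal values rather than the real data (PSK_sim and lnPSK_sim are made-up names, and the distribution parameters are assumptions) to show how skewness before and after taking logs could be compared:

```stata
* Simulated stand-in for PSK: 209 right-skewed (lognormal) values.
* This is NOT the MFI data; names and parameters are hypothetical.
clear
set seed 2013
set obs 209
generate double PSK_sim = exp(rnormal(-1.3, 1))

* Skewness of the raw variable
quietly summarize PSK_sim, detail
scalar skew_raw = r(skewness)

* Skewness after the natural-log transform
generate double lnPSK_sim = ln(PSK_sim)
quietly summarize lnPSK_sim, detail
scalar skew_ln = r(skewness)

* Show both values side by side
display "skewness raw = " %6.3f skew_raw "   skewness ln = " %6.3f skew_ln
```

On the real data the same two -summarize, detail- calls would be run on PSK and lnPSK.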
Now I have two questions:
1. -nlcheck- indicated that perfem is linear, but the plot doesn't look too great to me. Any thoughts on that?
2. Any ideas on transforming PSK? I tried taking the natural log, which gave me the following results:
Code:
. generate lnPSK = ln(PSK)
. quietly regress par30 perfem TAK lnPSK MFIage
. predict r, resid
(81 missing values generated)
Code:
. acprplot lnPSK , lowess
Code:
. kdensity lnPSK, normal
That does look better, I guess; however, -nlcheck- still rejects the null:
Code:
. quietly xtreg par30 perfem TAK lnPSK MFIage
. nlcheck lnPSK
Nonlinearity test:
chi2( 9) = 20.58
Prob > chi2 = 0.0146
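As a complement to -nlcheck-, curvature can also be probed by adding a squared term to the model and testing it. The sketch below is only an illustration on a simulated panel: id, year, x and y are hypothetical names and the data-generating process is invented, so nothing here is a result from the MFI data.

```stata
* Simulated toy panel: 40 units observed for 5 years each (hypothetical data)
clear
set seed 2013
set obs 200
generate id = ceil(_n/5)
bysort id: generate year = 2007 + _n
xtset id year

* Right-skewed regressor; the outcome depends on ln(x), so it is nonlinear in x
generate double x = exp(rnormal())
generate double y = 1 + 0.5*ln(x) + rnormal()

* Random-effects model with linear and squared terms in x
quietly xtreg y c.x##c.x

* Wald test of the squared term; a small p-value points to curvature in x
test c.x#c.x
```

On the real data the analogous specification would presumably use c.PSK##c.PSK (or c.lnPSK##c.lnPSK) alongside perfem, TAK and MFIage.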
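Any ideas on how to properly transform PSK to get it linear, or is there no chance of running this regression linearly with this model?
Thanks in advance!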
