Hello everyone, I'm having some trouble with the following regression analysis.
As my last post did not get much of a reaction, and I was advised to be more specific, I'm trying again with a similar question. If I got any technicalities wrong, please tell me; I'm eager to get this right.
I'm using Stata 13.
My variables are as follows:
Dependent:
par30 - portfolio at risk > 30 (percentage of loans overdue more than 30 days)
Independent:
perfem - percentage of female borrowers
TAK - total assets (in $1000)
PSK - average portfolio size per borrower (in $1000)
MFIage - indicates whether an MFI is new, young or mature (1 = 1-4 years old, 2 = 4-8 years old, 3 = 8+ years old)
- I used a different variable beforehand (0 for new, +1 for each year the MFI is active), but I ran into big problems with linearity, so I decided to use this one.
MFI = Microfinance Institution, also called financial service provider
Concerning the group (panel) variable: my data contain a variable called mfiname that holds the name of each MFI, so I did:
Code:
. egen numMFI = group(mfiname)
. xtset numMFI fiscalyear
       panel variable:  numMFI (unbalanced)
        time variable:  fiscalyear, 2003 to 2012, but with gaps
                delta:  1 unit
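For readers less familiar with this setup, here is a minimal sketch on a made-up toy dataset (the MFI names, years and values are hypothetical, not from the real data) showing what -egen group()- and -xtset- do. The -xtdescribe- call is an extra built-in command, not part of the original workflow, that lists which years each panel is observed in.

```stata
* Hypothetical toy data (NOT the real MFI panel): two MFIs, unbalanced years
clear
input str10 mfiname fiscalyear par30
"Beta"  2003 4.1
"Beta"  2005 3.8
"Alpha" 2003 2.0
"Alpha" 2004 2.5
"Alpha" 2005 2.2
end

* group() numbers the distinct names 1, 2, ... in sorted (alphabetical) order
egen numMFI = group(mfiname)

* declare numMFI as the panel identifier and fiscalyear as the time variable;
* Beta has no 2004 observation, so the panel is unbalanced and has a gap
xtset numMFI fiscalyear

* show the observation pattern of each panel
xtdescribe
```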
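As I want to do a linear regression analysis on my data, I tried to follow this advice, so first I tested for linearity with -nlcheck-:
Code: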
. quietly xtreg par30 perfem TAK PSK MFIage
Code:
. nlcheck perfem
Nonlinearity test:
          chi2( 9) =   10.11
       Prob > chi2 =  0.3414
Code:
. nlcheck TAK
Nonlinearity test:
          chi2( 9) =    7.91
       Prob > chi2 =  0.5434
Code:
. nlcheck PSK
Nonlinearity test:
          chi2( 9) =   20.61
       Prob > chi2 =  0.0145
Code:
. nlcheck MFIage
Nonlinearity test:
          chi2( 1) =    0.13
       Prob > chi2 =  0.7229
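My understanding (an assumption; please check -help nlcheck-) is that the user-written -nlcheck- from SSC cuts the predictor into bins, refits the model with bin dummies added to the linear term, and jointly tests those dummies. The sketch below reproduces that idea by hand on simulated data; the variable names mirror the post, but the data, the seed and the choice of 10 quantile bins are assumptions. With 10 bins the joint test has 9 degrees of freedom, which is at least consistent with the chi2(9) values above.

```stata
* Simulated stand-in data (hypothetical; NOT the real MFI data)
clear
set seed 2013
set obs 209
generate perfem = 100*runiform()
generate PSK    = exp(rnormal(-1.4, 1))    // right-skewed and always positive
generate par30  = 5 + 0.02*perfem - 3*ln(PSK) + rnormal(0, 2)

* cut PSK into 10 groups of roughly equal size (the bin count is an assumption)
xtile PSKbin = PSK, nquantiles(10)

* keep the straight-line term for PSK and add dummies for the bins
regress par30 perfem c.PSK i.PSKbin

* joint test of the bin dummies: under linearity they should add nothing,
* so a small p-value points to curvature that the straight line misses
testparm i.PSKbin
```

For a three-valued variable such as MFIage, the same idea reduces to comparing the linear coding with -i.MFIage-, which leaves only one extra degree of freedom; that would fit the chi2(1) reported for MFIage.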
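According to -nlcheck-, linearity is not rejected for perfem, TAK and MFIage, but it is rejected for PSK (p = 0.0145). As I also wanted to look at this graphically, to include it in my thesis, I did another test following this advice and checked linearity with the following commands:
Code:
. quietly regress par30 perfem TAK PSK MFIage
. predict r, resid
Code:
. acprplot perfem, lowess
. acprplot TAK, lowess
. acprplot PSK, lowess
. acprplot MFIage, lowess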
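As I read the Stata documentation for -acprplot- (worth double-checking under -help regress postestimation-), the augmented component-plus-residual plot refits the model with the square of the examined variable added, then plots the fitted linear-plus-quadratic part plus the residual against that variable. The hand-built version below, on simulated data, only shows what the lowess curve in those plots is smoothing. Note also that -acprplot- works directly after -regress-, so the -predict r, resid- step is not needed for the plots themselves.

```stata
* Simulated stand-in data (hypothetical; NOT the real MFI data)
clear
set seed 2013
set obs 209
generate perfem = 100*runiform()
generate PSK    = exp(rnormal(-1.4, 1))
generate par30  = 5 + 0.02*perfem - 3*ln(PSK) + rnormal(0, 2)

* augmented model: add the square of the variable being examined
regress par30 perfem c.PSK##c.PSK
predict e, residuals

* augmented component plus residual: linear and quadratic parts of PSK plus the residual
generate acpr = _b[PSK]*PSK + _b[c.PSK#c.PSK]*PSK^2 + e

* scatter with a lowess smooth, comparable in spirit to -acprplot PSK, lowess-
twoway (scatter acpr PSK) (lowess acpr PSK)
```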
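TAK and MFIage looked great; however, these are the plots for perfem and PSK:
Code:
. acprplot perfem, lowess
(graph not included)
Code:
. acprplot PSK, lowess
(graph not included)
Code:
. kdensity PSK, normal
(graph not included)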
Code:
. summarize PSK, detail

                             PSK
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0266504       .0149792
 5%     .0484809       .0195255
10%      .061839       .0266504       Obs                 209
25%     .1161648       .0330702       Sum of Wgt.         209

50%     .2528287                      Mean            .431633
                        Largest       Std. Dev.      .4882636
75%     .5544006       2.138109
90%     1.126634       2.336616       Variance       .2384014
95%     1.406009        2.36487       Skewness       2.467618
99%     2.336616       3.398852       Kurtosis       11.24819
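Two points that may help with the skewness numbers above, stated with some caution. First, linear regression does not assume that the predictors themselves are normally distributed, so the -kdensity PSK, normal- comparison is not by itself a linearity check; what -nlcheck- tests is whether par30 changes linearly with PSK. Second, to compare several power transformations at once, Stata's built-in -ladder- command tabulates common powers with a normality test for each. The sketch uses a simulated, right-skewed stand-in for PSK (hypothetical data) and compares skewness before and after taking logs.

```stata
* Simulated right-skewed stand-in for PSK (hypothetical; NOT the real data)
clear
set seed 2013
set obs 209
generate PSK = exp(rnormal(-1.4, 1))

* common power transformations (cubic down to inverse cubic), each with a normality test
ladder PSK

* skewness of the raw variable ...
quietly summarize PSK, detail
scalar skew_raw = r(skewness)

* ... and of its natural log
generate lnPSK = ln(PSK)
quietly summarize lnPSK, detail
scalar skew_log = r(skewness)

display "skewness of PSK: " scalar(skew_raw) "   skewness of lnPSK: " scalar(skew_log)
```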
Now I have two questions:
1. -nlcheck- indicated that perfem is linear; however, to me the plot doesn't look too great. Any thoughts on that?
2. Any ideas on transforming PSK? I tried taking the natural logarithm, which gave me the following results:
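Code:
. generate lnPSK = ln(PSK)
. quietly regress par30 perfem TAK lnPSK MFIage
. * r already exists from the first regression, so drop it before predicting again
. capture drop r
. predict r, resid
(81 missing values generated)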
Code:
. acprplot lnPSK, lowess
(graph not included)
Code:
. kdensity lnPSK, normal
(graph not included)
This does look better, I guess; however, -nlcheck- still rejects the null hypothesis of linearity:
Code:
. quietly xtreg par30 perfem TAK lnPSK MFIage
. nlcheck lnPSK
Nonlinearity test:
          chi2( 9) =   20.58
       Prob > chi2 =  0.0146
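If the log version is kept, its coefficient is read differently from the one on PSK. A short worked equation, assuming the model par30 = b0 + b1 ln(PSK) + (other regressors held fixed):

```latex
\mathit{par30} = \beta_0 + \beta_1 \ln(\mathit{PSK}) + \dots
\;\Rightarrow\;
\Delta\mathit{par30} = \beta_1\bigl[\ln(1.01\,\mathit{PSK}) - \ln(\mathit{PSK})\bigr]
= \beta_1 \ln(1.01) \approx 0.01\,\beta_1
```

So a 1% larger average portfolio per borrower goes with a change of roughly b1/100 percentage points in par30, holding the other regressors fixed.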
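Any ideas on how to properly transform PSK to make it linear, or is there no way of doing this regression linearly with this model?
Thanks in advance!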
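A possible direction for the last question, offered as a hedged sketch rather than a recommendation: a model is still a linear regression as long as it is linear in the coefficients, so PSK can enter through a quadratic term, a linear spline or fractional-polynomial powers and still be estimated with -regress- (or -xtreg-). The data below are simulated stand-ins, the three spline pieces with knots at percentiles are an arbitrary choice, and -fp- is the fractional-polynomial prefix that, as far as I know, was added in Stata 13.

```stata
* Simulated stand-in data (hypothetical; NOT the real MFI data)
clear
set seed 2013
set obs 209
generate perfem = 100*runiform()
generate PSK    = exp(rnormal(-1.4, 1))
generate par30  = 5 + 0.02*perfem - 3*ln(PSK) + rnormal(0, 2)

* (a) quadratic in PSK, written with factor-variable notation
regress par30 perfem c.PSK##c.PSK
test c.PSK#c.PSK                        // is the squared term needed?

* (b) linear spline: 3 pieces, knots placed at percentiles of PSK
mkspline PSKs 3 = PSK, pctile
regress par30 perfem PSKs1 PSKs2 PSKs3

* (c) fractional polynomials: searches powers of PSK and keeps the best-fitting ones
fp <PSK>: regress par30 <PSK> perfem
```

Whichever form is chosen, -nlcheck- or -acprplot- on the new specification would be the natural way to see whether the remaining curvature has gone.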