I encountered a problem with linearity and transformation in linear regression

Dejan Toscano

Join Date: Feb 2019

Posts: 13
#1

08 Feb 2019, 12:33

Hello everyone, I'm having some trouble with the following regression analysis.
My last post did not get much of a reaction, and I was advised to be more specific, so I'm trying again with a similar question. If I got any technicalities wrong, please tell me; I'm eager to get this right.
I'm using Stata 13.

My variables are as follows:
Dependent:
par30 - portfolio at risk > 30 (percentage of loans overdue more than 30 days)
Independent:
perfem - percentage of female borrowers
TAK - total assets (in $1000)
PSK - average portfolio size per borrower (in $1000)
MFIage - indicates whether an MFI is new, young or mature (1 = 1-4 years old, 2 = 4-8 years old, 3 = 8+ years old)
I previously used a different age variable (0 for new, +1 for each year the MFI is active), but I ran into serious linearity problems with it, so I decided to use this one instead.

Group variable: numMFI
MFI = Microfinance Institution, also called financial service provider

Concerning the group variable: my data contain a variable called mfiname, which holds the individual name of each MFI, so I did

Code:

. egen numMFI = group(mfiname)
. xtset numMFI fiscalyear
       panel variable:  numMFI (unbalanced)
        time variable:  fiscalyear, 2003 to 2012, but with gaps
                delta:  1 unit

As I want to do a linear regression analysis on my data, I tried to follow this advice. First I tested for linearity with -nlcheck-, and because I also wanted to look at it graphically and include the plots in my thesis, I did another check following this advice.

Code:

. quietly xtreg par30 perfem TAK PSK MFIage

Code:

. nlcheck perfem

Nonlinearity test:

           chi2( 9) =    10.11
        Prob > chi2 =   0.3414

Code:

. nlcheck TAK

Nonlinearity test:

           chi2( 9) =     7.91
        Prob > chi2 =   0.5434

Code:

. nlcheck PSK

Nonlinearity test:

           chi2( 9) =    20.61
        Prob > chi2 =   0.0145

Code:

. nlcheck MFIage

Nonlinearity test:

           chi2( 1) =     0.13
        Prob > chi2 =   0.7229

So -nlcheck- does not reject linearity for perfem, TAK and MFIage, but it does for PSK (p = 0.0145).

Code:

. quietly regress par30 perfem TAK PSK MFIage
. predict r, resid

Then I checked linearity graphically with the following commands:

Code:

. acprplot perfem , lowess
. acprplot TAK , lowess
. acprplot PSK , lowess
. acprplot MFIage , lowess

TAK and MFIage looked great; perfem and PSK, however, did not:

Code:

. acprplot perfem , lowess

Code:

. acprplot PSK , lowess

Code:

. kdensity PSK, normal

Code:

. summarize PSK, detail

                             PSK
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0266504       .0149792
 5%     .0484809       .0195255
10%      .061839       .0266504       Obs                 209
25%     .1161648       .0330702       Sum of Wgt.         209

50%     .2528287                      Mean            .431633
                        Largest       Std. Dev.      .4882636
75%     .5544006       2.138109
90%     1.126634       2.336616       Variance       .2384014
95%     1.406009        2.36487       Skewness       2.467618
99%     2.336616       3.398852       Kurtosis       11.24819

Now I have two questions:

1. -nlcheck- did not reject linearity for perfem, but the plot doesn't look too great to me. Any thoughts on that?
2. Any ideas on transforming PSK? I tried taking its natural log, which gave me the following results:

Code:

. generate lnPSK = ln(PSK)
. quietly regress par30 perfem TAK lnPSK MFIage
. predict r, resid
(81 missing values generated)

Code:

. acprplot lnPSK , lowess

Code:

. kdensity lnPSK, normal

That does look better, I guess, but -nlcheck- still rejects the null of linearity:

Code:

. quietly xtreg par30 perfem TAK lnPSK MFIage
. nlcheck lnPSK

Nonlinearity test:

           chi2( 9) =    20.58
        Prob > chi2 =   0.0146

Any ideas on how to properly transform PSK so that its relationship with par30 is linear, or is there no way to fit this model as a linear regression?
Thanks in advance!

Last edited by Dejan Toscano; 08 Feb 2019, 12:36.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

08 Feb 2019, 13:05

I think you're being far too finicky about PSK. My view of the acprplot is that things are rather close to linear, except for a rather tiny blip when PSK is close to zero. Predictably enough, when the problem lies near zero, log-transforming makes matters worse. If any such transformation were likely to be helpful in this setting, it would be something like 1/x or 1/sqrt(x) or 1/x², but in all honesty I think that any simple transformation is likely to just wreck the rather good linearity you are seeing everywhere but that tiny region. Bear in mind also that the lowess fit that you are using as a guide here is going to be less stable at the edges of the data than it is in the middle, and you have some data points at the left edge that are rather outlying and may be distorting the lowess picture, so I'm not sure how seriously I would even take this.
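
For anyone who wants to see for themselves what those reciprocal-type transformations do to the plot, a minimal sketch could look like the following. It reuses the variable names from post #1; the new variable names (invPSK, invsqrtPSK, invsqPSK) are made up for illustration, and the code relies on PSK being strictly positive, as the -summarize- output above shows.

```stata
* Hypothetical variable names; PSK > 0 in the posted summary, so 1/PSK is defined
generate invPSK     = 1/PSK
generate invsqrtPSK = 1/sqrt(PSK)
generate invsqPSK   = 1/PSK^2

* Re-check the augmented component-plus-residual plot for one candidate
quietly regress par30 perfem TAK invPSK MFIage
acprplot invPSK, lowess
```

Repeating the last two lines with invsqrtPSK or invsqPSK in place of invPSK gives the corresponding plots for the other candidates.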

I suppose the p-value from -nlcheck- bothers you, but it wouldn't bother me: I very rarely let p-values influence my model specifications.

If it really irks you to have that degree of possible non-linearity near zero (and I can't fault you for wanting to dot your i's and cross your t's on the model specification), then I might use a linear spline with a knot at around 0.1 or something like that.
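
A minimal sketch of that spline approach, assuming the variable names from post #1 and a knot at 0.1, might look like this. The names PSK_lo, PSK_hi, linear and spline are placeholders, not anything from the original analysis.

```stata
* Baseline model with PSK entered linearly, stored for comparison
quietly xtreg par30 perfem TAK PSK MFIage
estimates store linear

* Linear spline in PSK with one knot at 0.1. With the marginal option,
* the coefficient on PSK_hi is the CHANGE in slope above the knot.
mkspline PSK_lo 0.1 PSK_hi = PSK, marginal

xtreg par30 perfem TAK PSK_lo PSK_hi MFIage
estimates store spline

* Does the slope change at the knot?
test PSK_hi

* Side-by-side coefficients and standard errors
estimates table linear spline, b se
```

If the coefficient on PSK_hi is small and the other coefficients barely move between the two models, the simpler linear specification is probably adequate.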
Dejan Toscano

Join Date: Feb 2019

Posts: 13
#3

08 Feb 2019, 14:13

Clyde thank you so much for your quick response!
I'll take your advice and use PSK as it is. As you understood correctly, the p-value from -nlcheck- really bothered me, so it's great to know I don't have to rely on it completely. I'll also run the model a second time with a spline with a knot at around 0.1 and check how the results differ.
Thanks again, your reply really helped me a great amount!