  • I encountered a problem with linearity and transformation in linear regression

    Hello everyone, I'm having some trouble with the following regression analysis.
    As my last post did not get much of a reaction, and I was advised to be more specific, I'm trying again with a similar question. If I got any technicalities wrong, please tell me; I'm eager to get this right.
    I'm using Stata 13.

    My variables are as follows:
    Dependent:
    par30 - portfolio at risk > 30 (percentage of loans overdue more than 30 days)
    Independent:
    perfem - percentage of female borrowers
    TAK - total assets (in $1000)
    PSK - average portfolio size per borrower (in $1000)
    MFIage = indicates if an MFI is new, young or mature (1 = 1-4 years old, 2 = 4-8 years old, 3 = 8+ years old)
    • I used a different variable beforehand (0 for new, +1 for each year the MFI is active); however, I encountered huge problems with linearity, so I decided to use this one
    Group variable: numMFI
    MFI = Microfinance Institution, also called financial service provider

    Concerning the group variable, my data contain a variable called mfiname, which holds the individual name of each MFI, so I did

    Code:
    . egen numMFI = group(mfiname)
    
    . xtset numMFI fiscalyear
           panel variable:  numMFI (unbalanced)
            time variable:  fiscalyear, 2003 to 2012, but with gaps
                    delta:  1 unit
    As I want to do a linear regression analysis on my data, I tried to follow this advice, so first I tested for linearity with -nlcheck-. As I also wanted to look at it graphically and include it in my thesis, I did another test following this advice.

    Code:
    . quietly xtreg par30 perfem TAK PSK MFIage
    Code:
    . nlcheck perfem
    
    Nonlinearity test:
    
               chi2(  9) =   10.11
             Prob > chi2 =    0.3414
    Code:
    . nlcheck TAK
    
    Nonlinearity test:
    
               chi2(  9) =    7.91
             Prob > chi2 =    0.5434
    Code:
    . nlcheck PSK
    
    Nonlinearity test:
    
               chi2(  9) =   20.61
             Prob > chi2 =    0.0145
    Code:
    . nlcheck MFIage
    
    Nonlinearity test:
    
               chi2(  1) =    0.13
             Prob > chi2 =    0.7229
    This implies perfem, TAK and MFIage are linear, PSK however isn't.

    Code:
    . quietly regress par30 perfem TAK PSK MFIage
    
    . predict r, resid
    Then I checked linearity with the following command
    Code:
    . acprplot perfem , lowess
    
    . acprplot TAK , lowess
    
    . acprplot PSK , lowess
    
    . acprplot MFIage , lowess
    TAK and MFIage looked great, however
    Code:
    . acprplot perfem , lowess
    [Image: acprplot perfem.png]

    Code:
    . acprplot PSK , lowess
    [Image: acprplot PSK.png]

    Code:
    . kdensity PSK, normal
    [Image: kdensity PSK.png]

    Code:
    . summarize PSK, detail
    
                                 PSK
    -------------------------------------------------------------
          Percentiles      Smallest
     1%     .0266504       .0149792
     5%     .0484809       .0195255
    10%      .061839       .0266504       Obs                 209
    25%     .1161648       .0330702       Sum of Wgt.         209
    
    50%     .2528287                      Mean            .431633
                            Largest       Std. Dev.      .4882636
    75%     .5544006       2.138109
    90%     1.126634       2.336616       Variance       .2384014
    95%     1.406009        2.36487       Skewness       2.467618
    99%     2.336616       3.398852       Kurtosis       11.24819
    Now I have two questions

    1. -nlcheck- indicated perfem was linear; however, the plot doesn't look too great to me. Any thoughts on that?
    2. Any ideas on transforming PSK? I tried taking the natural log (base e), which gave me the following results:

    Code:
    . generate lnPSK = ln(PSK)
    
    . quietly regress par30 perfem TAK lnPSK MFIage
    
    . predict r, resid
    (81 missing values generated)
    Code:
    . acprplot lnPSK , lowess
    [Image: acprplot lnPSK.png]

    Code:
    . kdensity lnPSK, normal
    [Image: kdensity lnPSK.png]

    This does look better, I guess; however, -nlcheck- still rejects the null:

    Code:
    . quietly xtreg par30 perfem TAK lnPSK MFIage
    
    . nlcheck lnPSK
    
    Nonlinearity test:
    
               chi2(  9) =   20.58
             Prob > chi2 =    0.0146
    Any ideas on how to properly transform PSK to get it linear, or is there no chance of doing this regression linearly with this model?
    Thanks in advance!
    Last edited by Dejan Toscano; 08 Feb 2019, 12:36.

  • #2
    I think you're being far too finicky about PSK. My view of the acprplot is that things are rather close to linear, except for a rather tiny blip when PSK is close to zero. Predictably enough, when the problem lies near zero, log-transforming makes matters worse. If any such transformation were likely to be helpful in this setting, it would be something like 1/x or 1/sqrt(x) or 1/x^2, but in all honesty I think that any simple transformation is likely to just wreck the rather good linearity you are seeing everywhere but that tiny region. Bear in mind also that the lowess fit that you are using as a guide here is going to be less stable at the edges of the data than it is in the middle, and you have some data points at the left edge that are rather outlying and may be distorting the lowess picture, so I'm not sure how seriously I would even take this.
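
    For anyone who wants to look at the reciprocal-type transformations mentioned above anyway, a minimal sketch follows. The new variable names (invPSK, invsqrtPSK, invsqPSK) are made up for illustration, and the sketch assumes PSK is strictly positive, as the -summarize- output in #1 suggests. It only shows how to generate the variables and redraw the diagnostic plot; it does not imply that any of them improves the fit.

    ```stata
    * Hypothetical variable names; PSK is assumed to be strictly positive
    generate invPSK     = 1/PSK
    generate invsqrtPSK = 1/sqrt(PSK)
    generate invsqPSK   = 1/PSK^2

    * Refit with one candidate at a time and inspect the same diagnostic plot
    quietly regress par30 perfem TAK invPSK MFIage
    acprplot invPSK, lowess
    ```

    Swapping invsqrtPSK or invsqPSK into the last two commands gives the corresponding plots for the other candidates.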

    I suppose the p-value from -nlcheck- bothers you, but it wouldn't bother me: I very rarely let p-values influence my model specifications.

    If it really irks you to have that degree of possible non-linearity near zero (and I can't fault you for wanting to dot your i's and cross your t's on the model specification), then I might use a linear spline with a knot at around 0.1 or something like that.
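
    The linear spline suggested above could be set up roughly as follows. This is a sketch under assumptions: the names PSK_lo and PSK_hi are invented for illustration, the knot at 0.1 is just the approximate value mentioned in this reply, and the -xtreg- call mirrors the one in #1.

    ```stata
    * Linear spline in PSK with a single knot at 0.1 (hypothetical variable names).
    * With -marginal-, PSK_lo equals PSK and PSK_hi equals max(0, PSK - 0.1).
    mkspline PSK_lo 0.1 PSK_hi = PSK, marginal

    * Same model as in #1, with the two spline terms in place of PSK
    xtreg par30 perfem TAK PSK_lo PSK_hi MFIage

    * The coefficient on PSK_hi is the change in slope above the knot
    test PSK_hi
    ```

    With the marginal option, the coefficient on PSK_lo is the slope below the knot, and the coefficient on PSK_hi estimates how much the slope changes once PSK exceeds 0.1, so a small and imprecise estimate there would be consistent with treating PSK as linear.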

    • #3
      Clyde thank you so much for your quick response!
      I'll take your advice and use PSK as it is. As you correctly understood, the p-value from -nlcheck- really bothered me, so it's great to know I don't have to rely on it completely. I'll also run the model a second time with a spline at around 0.1 and check how the results differ.
      Thanks again, your reply really helped me a great amount!
