Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • FYI: Bug in lnlsq() option of nl (nonlinear least squares) and discussion of nl's R2

    I am using nl with the lnlsq(0) option in Stata 12.1 with Windows 7.

    Bug in lnlsq(0) option of nl
    The lnlsq(#) option permits log-linear errors, where ln(depvar #) is assumed to be normally distributed. predict generates error r(198): invalid syntax. "capture noisily predict" will not identify the error, but the error code can be shown with "display _rc". Predictions can be calculated with the generate command. Sample code:
    Code:
    tempname b1out b2out
    scalar `b1out' =  _b[b1:_cons]
    scalar `b2oout' =  _b[b2:_cons]
    gen double pred = `b1out'*x1 + `b2out'*x2
    Discussion of nl and R2
    Stata's manual states: "All other statistics are calculated analogously to those in linear regression, except that the nonlinear function f(xi,beta) plays the role of the linear function x'beta. See [R] regress." Wikipedia's entry on the Coefficient of determination states, "There are several definitions of R2 that are only sometimes equivalent. " I concur.

    The R2 as presented by nl can seem high when compared with casual inspection of a graph of actual and predicted values. When run without the lnlsq() option, R2 appears to match this formula:

    R2 = 1 - (residuals^2/actuals^2) . Call that R2(a).

    In nonlinear contexts it is not unusual for the average residual to equal something other than zero. R2(b) gives a different answer:

    R2(b) = 1 - ((residuals-ave(residuals))^2/ (actuals-ave(actuals))^2

    And if you square the correlations you get a third answer:

    R2(c) = (corr(predicted,actuals))^2

    I find the last to be most intuitive. In the particular sample run that I worked with, the values differed widedly:

    nl gave R2= .8283
    R2(a) = .8283
    R2(b) = -.87145797 (yes, <0)
    R2(c) = .4237^2 = .1795

    When nl is run assuming log-linear errors (the lnlsq(0) option), residuals and predicted values must be worked out with the generate command. I can no longer replicate Stata's R2 figure, though I haven't checked all possibilities. In particular, I understand the generalized R2 apparently encompasses the ordinary R2 in a linear context and produces values between 0 and 1 when the model is nonlinear, according to the wikipedia article, which includes citations to published work. I haven't tried that calculation, nor am I familiar with it. In my nl,lnlsq(0) example, I produced values as follows :

    nl gave R2=.9943
    R2(a) = .7645
    R2(b) = -2.7902443
    R2(c) = .3535^2 = .1250

    That's a fairly wide range.

    Researchers using nonlinear least squares might want calculate R2 by hand and report their method. Researchers uneasy with the lnlsq(0)option can get an approximation by taking logs of both sides of their equation. In my model, the coefficients matched closely.

    Code available by request, but please let me know what you want presented.
    ---

    Past discussion
    http://www.stata.com/statalist/archi.../msg00738.html

    Nice treatment:
    http://www.stata.com/support/faqs/statistics/r-squared/

    A program by UCLA Statistical Consulting that will compute R-squared and several other fit indices for rreg. I have not tried it. I assume the .ado file could be adapted.
    http://www.ats.ucla.edu/stat/stata/faq/rregr2.htm
    ---

    Bonus advice on debugging nl function evaluator programs
    The tried and true method involves the command "set trace on", submitted immediately before the nl command. But also check for these common errors:

    1. syntax varlist(min=5 max=5) if, at(name)
    Q: Do the min and max numbers match the number of variables passed to the program? Do the variables in the nl command match those in the program?

    2.
    local y : word 1 of `varlist'
    local x1 : word 2 of `varlist'
    Q: Were any numbers skipped or mis-numbered?

    3. Did you remember to include all the parameters in the tempname command? Do the `at'[1,n] locals have the proper numerical ordering? Do they match the parameters provided in the nl command? Does the parameters() option match the initial() option, with the same number of parameters and plausible initial values?
    Last edited by David Howe; 22 Jan 2015, 13:13.
Working...
X