
  • Lowess vs. Lpoly and scatter/raw data suppression

    Hi, I'm running a polynomial regression of property value on distance, and am graphing multiple versions of property value against the same continuous distance variable. I can suppress the raw data via noscatter in the lpoly graphical output, but cannot find a similar option for lowess. I am also not entirely certain of the difference between the two, as they both claim to be doing localized regression at the bin distances. Can anyone tell me 1. how to suppress the lowess scatter data, leaving only a smoothed line, and 2. what exactly I'm trading off between lowess and lpoly?

    Here's a look at my .do file. I tried mcolor to at least make the scatter points fade a bit (which it didn't do), and I have a similar file for the lpoly version:

    graph twoway (scatter resid_78 resid_9 near_km if near_km<100) ///
    (lowess resid_9 near_km, mcolor(*.1) clpattern(_)) ///
    (lowess resid_78 near_km, mcolor(*.1))

    Thanks

  • #2
    With regard to 1, with -lowess- you can use the -nograph- option along with the -generate()- option. Then do a line plot of the generated variable against your independent:

    Code:
    lowess y x, nograph gen(yhat)
    graph twoway line yhat x, sort
    With regard to 2, -lowess- fits a linear regression at each point based on a small neighborhood of points around that point (how small depends on the specified bandwidth), and pieces those together. -lpoly- fits a polynomial regression at each point, again based on a neighborhood of points around that one, depending on the specified bandwidth. I have very limited experience with -lpoly- and am venturing beyond my expertise here, but I have found that the -lpoly- results provide a "prettier" fit to the data. By that I mean that with -lpoly- you can tune the bandwidth to find something that both twists and turns with the data and is not very jagged, whereas with -lowess- you generally have to trade off smoothness against fit, and there is often no "sweet spot" that gives both. But I haven't used -lpoly- enough to be sure that this is a general property of the procedure and not just what I have seen in a few "lucky" samples. It does make sense, though, when you think about how the two procedures work.

    I have also read that the -lpoly- tends to provide better fit to the data at the extremes of the range of the independent variable than -lowess-.
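
    One rough way to see this trade-off is to overlay the two smoothers on the same data. The following is only a sketch using the auto dataset shipped with Stata; the bandwidth values (0.4 as a fraction of the sample for -lowess-, 400 in units of weight for -lpoly-) are arbitrary choices for illustration, not recommendations:

    Code:
    sysuse auto, clear
    * gray scatter as backdrop, then one curve per smoother
    twoway (scatter mpg weight, mcolor(gs12)) ///
           (lowess mpg weight, bwidth(0.4)) ///
           (lpoly mpg weight, degree(1) bwidth(400)), ///
           legend(order(2 "lowess" 3 "lpoly, degree 1"))

    Varying the two bwidth() values shows how each curve moves between jagged and oversmoothed.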

    I tend to use these commands only for exploratory purposes, to get a sense of the form of the relationship between two variables before I start building models. I haven't delved deeply into their technical properties, so I can't really say much more.
    Last edited by Clyde Schechter; 04 Apr 2016, 14:42.



    • #3
      Thanks, Clyde, this helps a lot.



      • #4
        It is even easier to suppress the data points than Clyde reports.

        Code:
         
        sysuse auto 
        twoway lowess mpg weight 
        lowess mpg weight, ms(none)
        Both commands show just the smooth curve.

        Broadly speaking, lpoly is more general than lowess as the latter is only local linear regression, with the option of a local mean, whereas lpoly offers any degree of polynomial. In practice I haven't often wanted quadratics or higher.
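
        As a hedged illustration of that generality, the sketch below overlays local mean, local linear, and local quadratic fits from lpoly, followed by the local-mean variant of lowess. It uses the auto dataset and leaves the bandwidth at whatever lpoly chooses by default, so treat the picture as exploratory only:

        Code:
        sysuse auto, clear
        * degree(0) = local mean, degree(1) = local linear, degree(2) = local quadratic
        twoway (lpoly mpg weight, degree(0)) ///
               (lpoly mpg weight, degree(1)) ///
               (lpoly mpg weight, degree(2)), ///
               legend(order(1 "local mean" 2 "local linear" 3 "local quadratic"))
        * lowess with running means instead of running lines
        twoway lowess mpg weight, mean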

        lowess was long a personal favourite but I've come to favour lpoly. I like its greater flexibility even though I haven't much exploited it.

        Although lowess has a clear pedigree, in practice implementations of the lowess method (often under other names, such as loess or locfit) in various software are far from standardised. For example, some implementations include iterated robust fits, which I don't think Stata has ever offered. So, if publishing results, in principle you need to explain what is idiosyncratic about Stata's implementation.

        By contrast, what lpoly does is, so far as I am aware, quite standard.

        I found lpoly disappointing at first but I think the main reason is that its defaults are (perhaps deliberately) not well chosen. After a burst of experimentation I settled on defaults I like better and coded the whole thing as localp (SSC), which is just lpoly set up as I like it. I don't offer cross-validation (which although thought up by very smart people has I think been oversold).

        http://www.statalist.org/forums/foru...ial-regression



        • #5
          Thanks, Nick. I should have thought of the -ms(none)- approach myself. But I never knew about -twoway lowess-.



          • #6
            Harking back to #1, the reason mcolor(*.1) does nothing to the data points is that it applies only to each lowess curve. Options given inside a pair of parentheses apply only to that twoway call, so the scatter call in the first pair of parentheses is unaffected.

            If you want data as backdrop, one simple tip is to reach for a gray (grey) shade.
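
            A minimal sketch of that tip, using the auto dataset rather than the variables in #1: the mcolor() option goes inside the parentheses of the scatter call, where it applies to the data points, and the curve options go inside the lowess call. The shade gs12 and the dashed line pattern are just illustrative choices:

            Code:
            sysuse auto, clear
            * mcolor() belongs to the scatter call; lpattern() to the lowess call
            twoway (scatter mpg weight, mcolor(gs12)) ///
                   (lowess mpg weight, lpattern(dash))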

            See also http://www.stata-journal.com/sjpdf.h...iclenum=gr0040
            http://www.stata-journal.com/sjpdf.h...iclenum=gr0046

            Code:
            webuse motorcycle
            * ssc inst localp
            localp accel time, bw(3) mc(gs10) scheme(s1color)
            [Figure: lpoly.png]


            Note that I am typing on an ancient computer with Stata 10. In Stata 11 up, the ugly "-sq" is replaced automatically by a superscript 2.



            • #7
              Thanks, I really appreciate it. Thesis draft due in one week!
