
  • Breakpoint found outside Domain for a "Hockey Stick" Regression

    Hello!

    I am trying to identify a kink in a distribution. To do so, I installed the loghockey package in order to perform a piecewise linear regression. I am running Stata 13.1 on Windows 7.
    Below is the code I used to run my piecewise regression (with a linear regression included for comparison). The scatter plot is not necessary, but I attached an image of it to help show why I expected to find a kink in my data.

    Stata code:
    Code:
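    sort y
    gen x = _n/_N          // position of each observation in the sorted data, in (0, 1]
    nl hockey y x          // piecewise (hockey-stick) fit via the user-written nlhockey program
    regress y x            // straight-line fit, for comparison
    twoway (scatter y x)   // scatter plot of y against x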
    Unfortunately, my piecewise regression results appear to be incorrect. The estimated breakpoint (the point in the domain where the kink occurs, called c in nlhockey.hlp) is 1.007157, which lies outside my domain (x takes values between 1/7367 and 1). Yet even though every observation lies to the left of that breakpoint, the slope to the left of it (called b in nlhockey.hlp) does not match the slope from the linear regression.
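To make the roles of b and c concrete, here is a minimal sketch of a two-segment least-squares fit. It is written in Python rather than Stata so that it runs on its own, and it assumes the hinge form y = a + b*x + d*max(x - c, 0), with slope b to the left of c and b + d to the right. That parameterization, the helper names, and the strategy of trying each observed x as a candidate breakpoint (instead of nl's iterative search) are assumptions for illustration, not a description of how nlhockey works.

```python
# Sketch of a two-segment ("hockey stick") least-squares fit with a profiled breakpoint.
# ASSUMED model (may differ from nlhockey): y = a + b*x + d*max(x - c, 0)


def solve(A, v):
    """Solve the square system A z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [list(row) + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def ols(X, y):
    """Least squares via the normal equations; returns (coefficients, residual sum of squares)."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    sse = sum((yi - sum(bk * xk for bk, xk in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    return beta, sse


def fit_hockey(x, y):
    """Hypothetical helper: try each interior observed x as breakpoint c and keep the lowest SSE."""
    xs = sorted(set(x))
    best = None
    for c in xs[1:-1]:  # interior candidates only, so both segments contain data
        X = [[1.0, xi, max(xi - c, 0.0)] for xi in x]
        (a, b, d), sse = ols(X, y)
        if best is None or sse < best[0]:
            best = (sse, c, a, b, d)
    sse, c, a, b, d = best
    return {"c": c, "a": a, "b": b, "right_slope": b + d, "sse": sse}


# Illustrative noise-free data, built like gen x = _n/_N: nearly flat, then steep after 0.95.
N = 200
x = [i / N for i in range(1, N + 1)]
y = [1.0 + 0.5 * xi + 40.0 * max(xi - 0.95, 0.0) for xi in x]

fit = fit_hockey(x, y)
print(f"c = {fit['c']:.3f}, left slope b = {fit['b']:.3f}, right slope = {fit['right_slope']:.3f}")
```

Because every candidate c here lies strictly inside the observed x values, this search cannot report a breakpoint outside the domain. An iterative nl fit that treats c as a free parameter has no such restriction unless the evaluator program imposes one.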
    I am not sure if I am doing something incorrectly. I went through the help file of nlhockey, and I could not find an explanation for this result. If any of you have any ideas or insights into the matter, I would appreciate your input!

    Thanks!
    Attached file: graph generated by the included Stata code
    Last edited by Chady Gemayel; 20 Sep 2014, 21:19. Reason: Removing broken image link

  • #2
    This sounds as if you have a J-shaped distribution and you are using a method designed for examining bivariate relationships, especially kinks in time series. Either way I guess your results are telling you indirectly that the model doesn't fit well.



    • #3
      Originally posted by Nick Cox
      This sounds as if you have a J-shaped distribution and you are using a method designed for examining bivariate relationships, especially kinks in time series. Either way I guess your results are telling you indirectly that the model doesn't fit well.

      Agreed. However, given that the breakpoint is outside of my domain, I would expect the piecewise regression to match the results of the linear regression. The fact that it doesn't suggests there's an issue with the package, or that I am missing something.
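That expectation holds for at least one common way of writing the model. The Python sketch below assumes the hinge form y = a + b*x + d*max(x - c, 0) (nlhockey's help file may use a different parameterization) and rebuilds the thread's x = _n/_N grid for 7,367 observations to show what the hinge term looks like at the reported breakpoint.

```python
# ASSUMED model (may differ from nlhockey): y = a + b*x + d*max(x - c, 0).
N = 7367                                   # observations in the thread (x runs from 1/7367 to 1)
x = [i / N for i in range(1, N + 1)]       # same construction as: gen x = _n/_N
c = 1.007157                               # breakpoint reported in the thread

# Hinge term at that breakpoint: positive only where x > c.
hinge = [max(xi - c, 0.0) for xi in x]
n_right = sum(1 for h in hinge if h > 0.0)

# No observation lies to the right of c, so the hinge column is identically zero:
# the fit reduces to y = a + b*x, whose least-squares b is the plain OLS slope,
# and d has no effect on the fit at all (it is not identified).
print(f"largest x = {max(x)}, observations right of c = {n_right}")
```

Under this form, any stationary point of the least-squares criterion with c above the largest x must have b equal to the regress slope. If nlhockey fits this model, a mismatch would point to nl not having reached a least-squares solution (for instance because d is unidentified there), or else to nlhockey using a different form; this sketch cannot tell which.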



      • #4
        1.007 is close enough to 1 that I'm thinking it's just a precision issue. The breakpoint is 1. Since all the cases with a value of 1 are taken out of the first piece (b), its slope (based on the scatterplot) should be pretty close to zero.



        • #5
          Originally posted by ben earnhart
          1.007 is close enough to 1 that I'm thinking it's just a precision issue. The breakpoint is 1. Since all the cases with a value of 1 are taken out of the first piece (b), its slope (based on the scatterplot) should be pretty close to zero.
          I see. The slope is much smaller in magnitude on the left than it is on the right. I initially thought the precision issue was related to using small values of x, so I tried the following code:

          Code:
          sort y
          gen x = _n             // rank of each observation, 1 to _N, instead of _n/_N
          nl hockey y x
          regress y x
          However, the breakpoint was still outside of the domain. Does this mean that the precision issue occurs in the actual regression calculation?
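On the rescaling question: multiplying x by a positive constant cannot move the breakpoint into the domain, because max(k*x - k*c, 0) = k*max(x - c, 0). If x becomes k*x, the least-squares breakpoint becomes k*c and the slopes become b/k, so a breakpoint beyond the data stays beyond the data in the new units. A small Python sketch checking this, assuming the hinge form y = a + b*x + d*max(x - c, 0) and a breakpoint grid at the observed x values (both assumptions, not nlhockey's internals; helper names are hypothetical):

```python
# ASSUMED model (may differ from nlhockey): y = a + b*x + d*max(x - c, 0)


def solve(A, v):
    """Solve the square system A z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [list(row) + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def ols(X, y):
    """Least squares via the normal equations; returns (coefficients, residual sum of squares)."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    sse = sum((yi - sum(bk * xk for bk, xk in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    return beta, sse


def fit_hockey(x, y):
    """Hypothetical helper: try each interior observed x as breakpoint c and keep the lowest SSE."""
    xs = sorted(set(x))
    best = None
    for c in xs[1:-1]:  # interior candidates only, so both segments contain data
        X = [[1.0, xi, max(xi - c, 0.0)] for xi in x]
        (a, b, d), sse = ols(X, y)
        if best is None or sse < best[0]:
            best = (sse, c, a, b, d)
    sse, c, a, b, d = best
    return {"c": c, "b": b, "sse": sse}


# Illustrative noise-free data with a kink at 0.95 on the share scale.
N = 200
x_share = [i / N for i in range(1, N + 1)]      # like: gen x = _n/_N
x_rank = [float(i) for i in range(1, N + 1)]    # like: gen x = _n
y = [1.0 + 0.5 * s + 40.0 * max(s - 0.95, 0.0) for s in x_share]

fit_share = fit_hockey(x_share, y)
fit_rank = fit_hockey(x_rank, y)

# Rescaling x by N moves the breakpoint to N*c and divides the left slope by N.
print(fit_share["c"] * N, fit_rank["c"])
print(fit_share["b"], fit_rank["b"] * N)
```

So switching from _n/_N to _n changes only the units of c and b, not where the kink sits relative to the data.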

          Either way, thank you both for helping explain my result!
          Last edited by Chady Gemayel; 21 Sep 2014, 10:16. Reason: Tried to make writing clearer



          • #6
            Kinks supposed to exist in distribution functions are rarely credible without a substantive rationale. In any case, much depends on your values, but if they are all positive, you will see more structure more plausibly by adopting a log scale on either or both axes.



            • #7
              While normally throwing away information is a bad thing, given the relationship between x and y, I'm wondering if simple dummy variable regression might be preferable and more interpretable.

              Can you try turning x into a dummy variable? If the difference in R-squared between treating x as a dummy and treating it as a continuous variable is negligible, then treating it as a dummy might simplify interpretation. However, if you really care about the little bit of variation at the lower levels of x, the methods described here might help, giving you flexibility in defining your cut-points.
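A minimal Python sketch of that comparison. The cut point of 0.95 and the synthetic data are hypothetical choices for illustration; with real data you would substitute your own y and candidate cut. With a single regressor (continuous or 0/1), R-squared equals the squared correlation, so no matrix algebra is needed.

```python
def r_squared(x, y):
    """R-squared of a straight-line regression of y on a single regressor x (squared correlation)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy * sxy / (sxx * syy)


# Illustrative noise-free data: nearly flat, then a steep rise after x = 0.95.
N = 200
x = [i / N for i in range(1, N + 1)]
y = [1.0 + 0.5 * xi + 40.0 * max(xi - 0.95, 0.0) for xi in x]

cut = 0.95  # hypothetical cut point
dummy = [1.0 if xi > cut else 0.0 for xi in x]

r2_continuous = r_squared(x, y)
r2_dummy = r_squared(dummy, y)
print(f"R2 continuous: {r2_continuous:.3f}  R2 dummy: {r2_dummy:.3f}")
```

On this synthetic data the dummy gives an R-squared of roughly 0.71 against roughly 0.46 for the straight line, but neither captures the slope within the upper tail, which is exactly the information the dummy approach gives up.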



              • #8
                Originally posted by Nick Cox
                Kinks supposed to exist in distribution functions are rarely credible without a substantive rationale. In any case, much depends on your values, but if they are all positive, you will see more structure more plausibly by adopting a log scale on either or both axes.

                I do find a breakpoint in the domain if I use log(y) as my dependent variable.
                I was trying to reproduce a result identified in LaBrie et al. (http://eurpub.oxfordjournals.org/con...4/410.full.pdf). On page 412 (the 3rd page of the PDF), they visually identify a kink in the distribution at 95%. I was hoping to validate this using the piecewise linear regression.

                Full citation:
                LaBrie, Richard A., et al. "Inside the virtual casino: A prospective longitudinal study of actual Internet casino gambling." The European Journal of Public Health 18.4 (2008): 410-416.
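A sketch of how a kink located in advance, like the one at 95% read off the published figure, could be checked formally: fix c at that value, fit the hinge model y = a + b*x + d*max(x - c, 0) alongside a straight line, and compare the two with an F statistic for d = 0. It is written in Python with synthetic noisy data; the model form, the random seed, and the helper names are assumptions for illustration. The conventional F reference distribution applies only because c is fixed beforehand; if c were instead chosen by searching the same data, that test would no longer be valid as stated.

```python
import random


def solve(A, v):
    """Solve the square system A z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [list(row) + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def ols(X, y):
    """Least squares via the normal equations; returns (coefficients, residual sum of squares)."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    sse = sum((yi - sum(bk * xk for bk, xk in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    return beta, sse


# Illustrative synthetic data with noise: a kink placed at x = 0.95.
rng = random.Random(2014)
N = 200
x = [i / N for i in range(1, N + 1)]
y = [1.0 + 0.5 * xi + 40.0 * max(xi - 0.95, 0.0) + rng.gauss(0.0, 0.1) for xi in x]

c = 0.95  # breakpoint fixed in advance, not searched for
X_line = [[1.0, xi] for xi in x]
X_kink = [[1.0, xi, max(xi - c, 0.0)] for xi in x]

_, sse_line = ols(X_line, y)
_, sse_kink = ols(X_kink, y)

# F statistic for H0: d = 0 (one restriction; the kinked model has 3 parameters).
F = (sse_line - sse_kink) / (sse_kink / (N - 3))
print(f"SSE line = {sse_line:.3f}, SSE kink = {sse_kink:.3f}, F(1, {N - 3}) = {F:.1f}")
```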

