
  • Linear regression on a string of ranges of indep var

    Hi all,

    I am trying to split the independent variable into different ranges and then regress the dependent variable on those separate ranges in one regression. The regression output will have different slopes for each range.

    To explain simply, I am trying to replicate
    Code:
     twoway (lfit Y Range1) || (lfit Y Range2) || (lfit Y Range3)
    in a single regression command. I understand lfit is plotting predicted values but in my regression output I want slopes of individual ranges.

    For this I created,
    Code:
    egen range = cut(X), at(X1, X2, X3) icodes
    where X1, X2, X3 are the starting points in the respective ranges.

    And then ran the regression
    Code:
    reg Y i.range
    I am not sure if this is correct. Can we specify ranges inside the regression command for example by using "if inrange()" multiple times or in any other way?

    Any help is much appreciated. Thank you.


  • #2
    No, that won't get you what you want. The variable your -egen, cut()- command creates will just give you a dichotomous variable distinguishing observations with X between X1 and X2 from those with X between X2 and X3. Regressing Y on that will not give you slopes within those ranges. It will only tell you the expected values of Y within each range.
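
    To see the point about expected values, here is a minimal sketch on Stata's bundled auto data rather than your own. The variables mpg and weight, the breaks 1500, 2500, 3500 and 5000, and the name wrange are stand-ins chosen only for illustration.

    ```stata
    * Illustration only: auto data, arbitrary breaks that span all of weight
    sysuse auto, clear
    egen wrange = cut(weight), at(1500, 2500, 3500, 5000) icodes
    * Regressing on the range indicators alone estimates one level per range
    regress mpg i.wrange
    * The predictive margins are the per-range means of mpg, not slopes
    margins wrange
    * Compare with the raw means of mpg within each range
    tabulate wrange, summarize(mpg)
    ```

    The margins should match the group means from -tabulate-, which is why this specification carries no information about how Y changes with X inside a range.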

    What you want to do is best done with linear splines.

    Code:
    mkspline range0 X1 range1 X2 range2 X3 range3 = X
    regress Y range1-range3
    (The variable range0 will represent values of X < X1, and since you are interested only in the intervals [X1, X2), [X2, X3), and [X3, infinity), you don't want to include range0 in the regression.)

    The coefficient of range1 will be the slope of the Y:X relationship when X lies in [X1, X2), etc.
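
    As a concrete sketch of the spline approach, again on the bundled auto data: mpg and weight, the knots 2500 and 3500, and the names w1-w3 and mpg_hat are assumptions made for illustration, not anything from your data. Here weight has observations below the first knot, so all three spline variables enter the regression.

    ```stata
    * Illustration only: auto data, arbitrary knots at 2500 and 3500
    sysuse auto, clear
    * w1 covers weight < 2500, w2 covers [2500, 3500), w3 covers weight >= 3500
    mkspline w1 2500 w2 3500 w3 = weight
    * Without the marginal option, each coefficient is the slope in its interval
    regress mpg w1 w2 w3
    * Plot the fit: the line segments join at the knots
    predict mpg_hat
    twoway (scatter mpg weight) (line mpg_hat weight, sort)
    ```

    The plot makes the key property of a linear spline visible: the slope changes at each knot, but the fitted line has no jumps.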



    • #3
      Actually, the code I show in #2 will not give you a model that corresponds to the graph you described in #1. The reason is that the code in #2 implicitly forces the linear graphs of Y vs X over the ranges to meet at the cutpoints, whereas the graphs you created in #1 allow for there to be jumps in the value of Y at the cutpoints. Assuming that the graphs in #1 represent the model you really want, you were closer to correct than I was:

      Code:
      summ X, meanonly
      * cut() sets X >= the last break to missing, so place that break just above the maximum
      local highest = r(max) + 1
      egen range = cut(X), at(X1, X2, X3, `highest') icodes label
      regress Y i.range##c.X
      margins range, dydx(X)
      The slopes you want will be found in the -margins- output. Note that you must calculate and include `highest' in the -at()- option as shown. If you just use -at(X1, X2, X3)-, you will get no value of range for X >= X3, and you will only have two ranges.
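
      If you want to convince yourself that this reproduces the separate -lfit- lines, a minimal sketch on the bundled auto data follows. The variables mpg and weight, the breaks 1500, 2500 and 3500, and the names top and wrange are stand-ins for illustration. Because -egen, cut()- treats the last break as exclusive, the sketch puts the top break just above the maximum so that the largest observation is kept.

      ```stata
      * Illustration only: auto data, arbitrary breaks
      sysuse auto, clear
      summarize weight, meanonly
      local top = r(max) + 1
      egen wrange = cut(weight), at(1500, 2500, 3500, `top') icodes label
      * Full interaction: a separate intercept and slope for every range
      regress mpg i.wrange##c.weight
      margins wrange, dydx(weight)
      * Cross-check: the slope from a regression restricted to the middle range
      regress mpg weight if wrange == 1
      ```

      The coefficient on weight in the last regression should equal the -margins- slope for the middle range. Its standard error will differ, because the interaction model pools the residual variance across all ranges.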
      Last edited by Clyde Schechter; 04 Dec 2021, 12:29.



      • #4
        The method prescribed in #3 worked perfectly, Clyde. Thank you very much.
        Last edited by Lars Pete; 04 Dec 2021, 12:38.
