
  • What's the best way to deal with a nonlinear DV: transformation, polynomial regressors or piecewise regression?

    Dear List,

    I'm analyzing a time-series cross-sectional data set with wind capacity in US states as the dependent variable.

    My DV is strongly nonlinear (ranging from S-shaped to growth-curve-like) and to deal with this, three approaches have been proposed:
    1. Log-transform the dependent variable.
    2. Add a squared or cubic term to the right-hand-side of the equation.
    3. Perform a piecewise regression after splitting the dataset.
    I'm not really sure how to assess which of these approaches is best -- aside from looking at post-estimation statistics. Is there any statistical method or package that can help me assess the best approach to tackling a regression with a nonlinear DV?

    Thanks!

    -nick

  • #2
    The answer to this question is more likely to be in the domain of your subject matter than in the domain of statistics.

    A few points. First, I assume that by "My DV is strongly nonlinear" you mean that in exploratory graphical analyses it shows strongly non-linear relationships with some of your predictor variables.

    While log transforming the DV (assuming it has no zero or negative values) would straighten out a growth curve, it will not help you much with an S-shaped relationship. The latter would suggest something more like an inverse logit transform is needed. And that might also work for something that looks like a growth curve, because the beginning of a logit curve starts out looking like a growth curve. Also, a logistic model is fairly close to the kind of model you would get for an epidemic spreading in a homogeneous population--which might be a mechanism that has relevance in this context.
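
    A small synthetic check of the point above, as a sketch only: the logistic curve, its ceiling (cap), rate and midpoint are made-up values, not estimates from the wind data. It compares year-to-year changes after a log transform and after a logit-type transform of the DV expressed as a share of the assumed ceiling.

```python
import math

def logistic(t, cap, rate, midpoint):
    """S-shaped curve that levels off at `cap` (hypothetical saturation level)."""
    return cap / (1.0 + math.exp(-rate * (t - midpoint)))

def logit_of_share(y, cap):
    """Logit of the DV expressed as a share of the assumed ceiling."""
    p = y / cap
    return math.log(p / (1.0 - p))

def first_differences(values):
    """Change from one period to the next; constant if the series is linear."""
    return [b - a for a, b in zip(values, values[1:])]

# Synthetic S-shaped series; all parameters are illustrative assumptions.
cap = 1000.0
years = list(range(11))
y = [logistic(t, cap, 0.8, 5.0) for t in years]

# Log transform: straightens the early, growth-curve-like part only.
log_slopes = first_differences([math.log(v) for v in y])

# Logit-type transform: straightens the whole S-curve.
logit_slopes = first_differences([logit_of_share(v, cap) for v in y])

print("log slopes:  ", [round(s, 3) for s in log_slopes])
print("logit slopes:", [round(s, 3) for s in logit_slopes])
```

    The catch is that the logit-type transform needs a value for the ceiling, which in real data would have to come from theory or be estimated.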

    Polynomial models usually lack any theoretical justification, although if all you are trying to do is find a model that fits the data well (but lacks generalizability) it might be suitable. As for your third suggestion, I don't know what kind of splitting of the data set you have in mind. If you think that a piecewise linear relationship between your DV and your independent variables applies, you can keep the data set intact and use linear splines (-help mkspline-). Or cubic splines might be useful here as well (also, -help mkspline-). By the way, don't overlook the possibility that apparent piecewise linearity arises from the population under study being heterogeneous, with different rates prevailing in different subpopulations, and the subpopulations also segregating on some of your other independent variables. If you can identify such subpopulations, adding indicator variables and interaction terms may linearize the model.
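
    To make the linear-spline idea concrete outside Stata, here is a minimal sketch of a piecewise-linear basis. It is meant to mirror the variables that -mkspline- creates by default, as I understand its documentation, so treat that correspondence as an assumption. The function name and the knot values are hypothetical.

```python
def linear_spline_basis(x, knots):
    """Split x into piecewise-linear segments, one variable per segment.

    With knots k1 < k2 < ..., the first variable is min(x, k1) and each later
    variable measures how far x has advanced within its own segment.
    """
    basis = []
    lower = None
    for upper in sorted(knots) + [None]:
        v = x
        if upper is not None:
            v = min(v, upper)          # cap at the segment's upper knot
        if lower is not None:
            v = max(v, lower) - lower  # start counting at the lower knot
        basis.append(v)
        lower = upper
    return basis

# Hypothetical knots at t = 3 and t = 7 on a 0..10 time axis.
knots = [3, 7]
table = {t: linear_spline_basis(t, knots) for t in range(11)}
for t, row in table.items():
    print(t, row)
```

    Regressing the DV on these variables in place of the raw predictor gives one slope per segment while keeping the fitted line continuous at the knots.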

    Those are some purely mathematical/statistical techniques (tricks?). But if the purpose of your analysis is to develop understanding of the forces influencing the growth of wind capacity in the US, it is more important, in my opinion, to develop a model that makes sense pragmatically. In particular, you would be well advised to search the literature in your field to see how others have dealt with this problem before.



    • #3
      Dear Dr. Schechter,

      Thank you for your thoughtful and detailed reply. Let me provide a bit more context: I'm a doctoral student in public policy so I've done my best to specify a model based on well-grounded theory.

      The issue (according to my dissertation committee) is that exploratory research using the Curvefit package (Liu, 2013) has shown that models such as the S-curve, Gompertz and Rational fit the dependent variable approximately 30% more closely than a linear fit line (see two of the attached plots below). These patterns vary by state, but generally the Gompertz model best fits the data. As you note, the disease-spread mechanism is something I have identified as a possible causal driver (imitation by neighboring states).

      Now that I've moved to perform a regression analysis (after a previous chapter involved just curve-fitting), my committee is suggesting various approaches to tackle the observed non-linearity in the data. So to your first point, exploratory graphical analyses have shown that the data is nonlinear when graphed against time in an X-Y plot. Given all this, one committee member is suggesting a simple log-transformation, while another is suggesting adding a squared or cubic term that accounts for technological improvements to wind machines to the right-hand side. My idea is to perform a piecewise regression (or perhaps a spline-based analysis) as the growth patterns vary over time.

      My hope was that there would be a statistical technique to assess the "best" approach without veering into data-mining. But it sounds like my best approach is to use theory to find my approach and then go for it?

      Thanks again for any further thoughts,

      -nick


      • #4
        I have a question about curvefit, and I hope an answer here helps with my situation. I am using the curvefit command to compare different fits to my data. However, I need to fit curves separately for each cluster of my data, and apparently curvefit cannot be combined with by (group). Is there another way to run curvefit on each group separately?



        • #5
          It looks to me like the vertical lines in your graphs for State RPS and Treasury 1603 are of some interest to you, or they wouldn't be there. If you want to test whether the time path of the outcome variable changes in level, slope, or both at those time points, you want to consider splines. Neither curve fitting with polynomials nor log transforms will tell you what you want to know.
          Richard T. Campbell
          Emeritus Professor of Biostatistics and Sociology
          University of Illinois at Chicago
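
          One way to picture a change in level, slope, or both at a known time point is a regression with a step indicator and a hinge term. The sketch below uses noise-free synthetic data; the break point, the coefficients and the helper names are all hypothetical, and the small OLS solver is there only to keep the example self-contained.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def design_row(t, break_at):
    """Columns: intercept, time trend, level shift after the break, slope change after the break."""
    after = 1.0 if t >= break_at else 0.0
    return [1.0, float(t), after, max(0.0, float(t - break_at))]

# Synthetic series: the trend jumps by 5 and steepens by 3 per period at t = 6.
break_at = 6
times = list(range(12))
true_coefs = [10.0, 2.0, 5.0, 3.0]
rows = [design_row(t, break_at) for t in times]
y = [sum(c * v for c, v in zip(true_coefs, row)) for row in rows]

coefs = ols(rows, y)
print([round(c, 6) for c in coefs])
```

          The third and fourth coefficients are the quantities of interest: the shift in level and the change in slope at the break. With real data their standard errors would tell you whether either change is distinguishable from zero.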
