
  • Spline with coefficient from linear regression model

    Dear all,
    I'm running an analysis of the association between a continuous variable and a clinical characteristic (a categorical variable). I've found a discrepancy between a linear regression model and a logistic one: if I categorise my continuous variable, I get a positive association (patients with the clinical characteristic have increased odds of a high value), but if I keep it continuous, the coefficient is no longer significant. So I was wondering whether the association might not be linear. I then started thinking about using a spline model, displaying the coefficient from a linear model (adjusted for covariates) across the values of my continuous variable. However, I'm having problems. Can anyone help me out?

    Here's the code

    Dependent variable: ntprobnp
    Categorical variable: clinPH

    Code used:

    mkspline ntprobnpspl = ntprobnp, cubic nknots(5) displayknots
    mat knots = r(knots) // creates spline variables with knots

    stset sos_outtime_hosphf, failure(sos_out_deathcvhosphf=2) scale(30.44) exit(time 30.44*12)

    regress ntprobnpspl* i.clinPH i.agec i.sexc i.NYHAdic i.diur i.FA i.egfrc i.SV i.agec i.bmic I.LAarea i.MAP


    tabstat ntprobnp, statistics(p1 p5 p10 p25 p50 p75 p90 p95 p99)


    levelsof ntprobnp
    xbrcspline ntprobnpspl, values(35 85 191 459 1588 4070 7701 14737 3500) ref(35) matknots(knots) eform gen(Ntprobnp coeff_ntprobnp lb_events ub_events)

    At this point Stata returned the message: [ntprobnpspl1] not found

  • #2
    I'm confused by your code. You have -stset- your data, but there is no subsequent survival analysis.

    And your use of the spline variables is baffling! You have put the first spline variable (which, I believe, coming from -mkspline- is identical to ntprobnp itself) as the dependent variable in your -regress- command, and the other spline variables are independent variables. In other words, you are predicting ntprobnp in a linear model that includes the rest of the spline of ntprobnp. That makes no sense to me at all.
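
    For contrast, here is a minimal sketch of the more usual arrangement, in which every spline basis variable sits on the right-hand side of the model. It assumes, purely for illustration, that the binary characteristic is the outcome and ntprobnp is the exposure (the reverse of the roles described in the original post). The data are simulated and the coefficients are hypothetical, so the numbers themselves mean nothing.

    ```stata
    * Illustrative only: simulated data, hypothetical coefficients
    clear
    set seed 12345
    set obs 500
    generate ntprobnp = exp(rnormal(6, 1.2))   // right-skewed, made-up values
    generate byte clinPH = runiform() < invlogit(-4 + 0.5*ln(ntprobnp))

    * 5 knots produce 4 basis variables: ntprobnpspl1 ... ntprobnpspl4
    mkspline ntprobnpspl = ntprobnp, cubic nknots(5) displayknots

    * All spline terms are predictors; none of them is the dependent variable
    logit clinPH ntprobnpspl*

    * Joint test of the nonlinear terms (ntprobnpspl1 is ntprobnp itself)
    testparm ntprobnpspl2 ntprobnpspl3 ntprobnpspl4
    ```

    A small p-value from -testparm- would point to a departure from linearity on the logit scale. With real data the model would also carry the adjustment covariates.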

    Your final command, -xtbrcspline- is not part of official Stata, and -search- turns up nothing on it. What is it supposed to do? Where does it come from?

    Putting aside all this confusing code, to get a feel for the association between ntprobnp and some clinical predictor variable (I'll just refer to it as x since I can't tell which of the many variables in your regression is the one you are focusing on), you can graph the relationship with -lowess ntprobnp x, logit-. If the fitted curve is more or less a straight line, that would support using a logistic model.

    I'll make a couple of generic comments about this situation. Dichotomizing an inherently continuous variable is almost always a bad idea. "Significant" findings with the resulting dichotomous variable are typically artifacts of the cutpoint used to establish the dichotomy and are not really interpretable. Dichotomizing also discards information and treats some similar observations as if they were radically different. It is really only justifiable if you believe that there is some real-world mechanism that causes the relationship with the clinical predictor to actually jump abruptly at some cutpoint that can be understood, and, better still, predicted, in mechanistic terms.
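
    A small simulation can make the cutpoint sensitivity concrete. The sketch below uses made-up data (the names x, y, high25, high50 and high75 are hypothetical) in which the true relationship is linear on the logit scale, and then refits the model after splitting x at three arbitrary percentiles.

    ```stata
    * Illustrative only: simulated data with a known, smooth relationship
    clear
    set seed 2024
    set obs 400
    generate x = rnormal()
    generate byte y = runiform() < invlogit(0.3*x)

    * Model that keeps x continuous
    logit y x

    * Same data, dichotomized at three different cutpoints
    foreach p in 25 50 75 {
        _pctile x, percentiles(`p')
        generate byte high`p' = x > r(r1)
        logit y high`p', or
    }
    ```

    Comparing the odds ratios and p-values across the three splits with the single slope from the continuous model gives a sense of how much the apparent result can depend on where the cut is placed.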

    Finally, remember that picking the analysis that gives you the results you want is not science, it is pseudo-science at best, and is increasingly viewed as scientific misconduct. You should be able to justify your analysis choices independently of the results they produce, and, in fact, they should have been decided upon and finalized before you even looked at the data. If that has not been done, you should, in reporting your results, disclose that fact, and characterize all your results as exploratory.
