  • Association between binary exposure and numerical outcome

    Good morning!

    I have been struggling to work out the most appropriate way to examine the association between a binary exposure and a numerical outcome.

    The primary goal of my analysis is to look at the distribution of CRP levels in a multi-ethnic study group of men and women and see whether differences exist.


    My plan was to compare the two group means for each exposure (race, gender) using a t-test. For both gender and race, the results gave no evidence against the null hypothesis of no difference.
    To assess the effect of potential confounders on the relationship between race/gender and CRP, I wanted to use linear regression with log-transformed CRP concentrations.

    I log-transformed the CRP levels because the histogram showed that the data were positively skewed.

    dependent variable- crp levels
    independent variable- race or gender

    Now I am unsure whether linear regression is the right choice, since my exposures are binary categorical variables: (a) gender and (b) race. Am I right to doubt it?

    I was thinking of using logistic regression instead, but everything I read says it is meant for dichotomous outcomes, which is not my case.

    Hope to hear from someone soon. Thank you.

    Jana


  • #2
    It is fine to use linear regression with a continuous outcome and a categorical predictor variable. It's done all the time. If there are no other variables in the model it is equivalent to doing a t-test (or if the predictor has more than two levels, ANOVA).

    So your code would look like:

    Code:
    regress crp i.gender and_perhaps_other_variables
    regress crp i.race and_perhaps_other_variables
    And you have correctly understood in your reading that logistic regression is used for a dichotomous outcome variable.

    I'm not going to comment specifically on whether it is appropriate to log-transform this data. But let me say that it makes no sense to do so for the regressions but not for the earlier t-tests. A regression like -regress crp i.gender- is no different from -ttest crp, by(gender)- statistically. So if a log transform is needed, it is needed in both cases. If it is not needed for one, then it is also unnecessary for the other.
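
    To see the equivalence concretely, here is a minimal sketch. The actual data are not shown in this thread, so it uses Stata's shipped auto dataset as a stand-in (price in place of crp, foreign in place of gender); ln_price is just an illustrative variable name.

    ```stata
    * Stand-in data: price plays the role of crp, foreign the role of gender
    sysuse auto, clear

    * Two-sample t-test (equal variances is the default)
    ttest price, by(foreign)

    * The same comparison as a regression: the t statistic on 1.foreign has
    * the same magnitude (the sign may be reversed) and the same p-value
    regress price i.foreign

    * If a log transform is warranted, apply it in both analyses
    generate ln_price = ln(price)
    ttest ln_price, by(foreign)
    regress ln_price i.foreign
    ```

    The match holds only for the default equal-variance t-test; with the unequal option on -ttest- the two results would generally differ.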



    • #3


      Thank you so much for your quick response, Clyde. I have been told it is important to look at the distribution first to see whether linear regression is appropriate. As I said before, the data are positively skewed (I wanted to attach a picture of the histogram, but somehow it does not work). In what cases would it be inappropriate to use linear regression, then?



      • #4
        For these purposes, the distribution of crp itself is not relevant. What may matter is the distribution of the residuals after you do the regression. In the classical theory of linear regression the residuals are assumed to have normal distributions in order to get the t-distribution for coefficient/standard error. (And the exact same logic applies to the distributions of crp in each group separately (but not the overall distribution) for the classical Student t-test.)

        But, in fact, even if the distributions are not normal, the central limit theorem comes to the rescue and you still get, asymptotically, the same distribution. So the question is whether your sample is large enough. If you have only small numbers in each gender or race-based group that you are comparing, then you cannot directly apply the t-test (or the regression approach). You would either have to find a transformation (logarithm might work, there are other options as well) that makes the within-group distributions closer to normal, or you could use non-parametric approaches such as the Wilcoxon rank-sum test, or, for multiple groups, Kruskal-Wallis ANOVA. But if your samples are hefty, then you don't have to go through that rigmarole--just do the regression straight out.
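
        A rough sketch of those options, using Stata's shipped auto dataset as a stand-in for the real data (price in place of crp, foreign as a two-level group, rep78 as a multi-level group); res is an illustrative variable name.

        ```stata
        sysuse auto, clear

        * Fit the model, then inspect the residuals rather than the raw outcome
        regress price i.foreign
        predict res, residuals
        histogram res, normal
        qnorm res

        * Small groups with clearly non-normal within-group distributions:
        * rank-based alternatives
        ranksum price, by(foreign)
        kwallis price, by(rep78)
        ```

        The residual plots are only an informal check; how much departure from normality matters depends on the group sizes, as described above.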



        • #5
          Thank you very much again, Clyde. It is much clearer to me now!
