  • Decimal points in regression

    Hi, apologies, I am very new to Stata.

    I am running a logistic regression where my dependent variable is 'depression' and one of my independent variables is income. In my dataset, income is 'income category midpoint' and has values such as 330.5, 380.5, etc. When I run the regression (treating income as continuous), is it fine that the values are stored with decimal points?
    In subsequent interpretation, would I still for example say 'A one $ increase in income increases the probability of depression..' even though I have decimal points?

    Thanks!

  • #2
    Hello Gergana,

    Welcome to the Stata Forum / Statalist.

    Precision is part and parcel of a statistical analysis. There is no point in getting rid of that.

    You may wish to take a look at the types of variables stored in Stata: byte, int, long, float and double. It seems float and double are mostly related to your query.
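
    As a small illustration of why the storage type matters little for midpoints like these, the sketch below checks whether a value survives being stored as a float. It uses only Stata's built-in float() function; the two numbers are examples (330.5 comes from post #1, 330.1 is a hypothetical value added for contrast).

    ```stata
    * 330.5 has an exact binary representation, so float storage loses nothing
    display float(330.5) == 330.5

    * 330.1 has no exact binary representation, so its float version differs
    * from the double-precision literal on the right-hand side
    display float(330.1) == 330.1
    ```

    The first line should display 1 and the second 0. To see how your own variable is stored, describe income lists its storage type (assuming the variable is named income).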

    Yes, once the model has been estimated, the interpretation won't change (much) because of slight differences in the decimal places. That said, at the estimation stage, higher precision in the stored values is generally better.

    Also, depending on the range of the variable (and it seems yours has a large range), taking it as continuous usually works fine.
    Last edited by Marcos Almeida; 19 Feb 2017, 07:59.
    Best regards,

    Marcos

    • #3
      Gergana:
      beware of the risk of reverse causality (endogeneity) in your regression model.
      At first sight, -income- may explain variation in the probability of being depressed, but being depressed or not may in turn explain variation in income.
      Last edited by Carlo Lazzaro; 19 Feb 2017, 08:46.
      Kind regards,
      Carlo
      (Stata 19.0)

      • #4
        would I still for example say 'A one $ increase in income increases the probability of depression..' even though I have decimal points?
        Carlo's point about reverse causality is quite important, and you could not properly say this even if your income variable took only integer values.

        But if you strip out the causal language, the statement would be just as valid with values like 330.5 and 380.5 as it would with 330 and 331, etc. In fact, in most applications of regression analysis, the continuous predictor variables take on non-integer values and we still often interpret coefficients by noting the change in outcome associated with a 1 unit change in the predictor.

        So, what you could say is: a $1 difference in income is associated with X percentage points higher probability of depression.
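
        A minimal sketch of how such a statement might be obtained in Stata follows. The data are simulated purely so that the block runs on its own: the variable names depression and income are taken from post #1, but the seed, the sample size, and the coefficients used to generate the outcome are hypothetical. With real data you would skip the simulation lines and run only logit and margins.

        ```stata
        * --- simulated stand-in data (hypothetical; replace with your own dataset) ---
        clear
        set seed 12345
        set obs 500
        * income category midpoints: 30.5, 80.5, ..., 480.5
        generate income = 30.5 + 50*floor(10*runiform())
        * binary outcome drawn from an assumed logistic relationship
        generate byte depression = runiform() < invlogit(-1 - 0.002*income)

        * --- the analysis itself ---
        * logistic regression, treating income as continuous
        logit depression income

        * average marginal effect of income on Pr(depression), in probability units
        margins, dydx(income)
        ```

        The dy/dx value reported by margins is the average change in the predicted probability per $1 of income; multiplying it by 100 gives percentage points. The raw logit coefficient, by contrast, is on the log-odds scale, so it should not be read directly as a change in probability.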
