
  • OLS model that includes related covariates that add up to 100%

    Hi group,
    I'm running an OLS model where the dependent variable is patients' average value of HgA1c, and among the predictors is a group of measures from the American Community Survey (ACS) that report the percent of people in a census tract who have achieved various levels of education. I matched the patient census tract with the ACS census tract for the relevant years to obtain these measures, so these values represent the entire CT rather than the individual person (it's the best we can do - we don't have a measure of each patient's own education). There are 5 of these measures - (1) % < high school education, (2) % high school education/GED, (3) % some college, (4) % bachelor's degree, (5) % master's degree or higher. These five measures sum to 100%. I don't have any experience modeling something like this, where there are separate measures relating to the same thing. We discussed instituting a cut-point to create an indicator variable, such as high_school_educ = 1 if % high school education/GED is > 50% for that particular census tract, but then we get into the business of 'choosing' the cut-point. Anyone have advice or experience working with something like this?
    Thanks.

  • #2
    I would simply enter four of those five variables into the regression model. (If you try to enter all five, one of them will be omitted due to collinearity--this is not a problem.) It is bad enough that the variable is already split into 5 groups instead of being provided as a continuous variable. If you further coarsen it with a cutpoint you will just add even more noise to your model.

    That said, if you find that, say, your first two variables have the same coefficient (or very nearly so) and the last two have a different but also common value, then that would justify inserting a cutpoint between those and combining the two lower and two upper categories. But absent some specific finding justifying coarsening of the variable, I advise you to leave it alone.
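The collinearity point can be checked with a little arithmetic. The sketch below is only an illustration: the tract shares and coefficients are made-up numbers, and `predict`, `b0`, `b`, `b_alt` and `b_ref` are hypothetical names, not anything from the poster's data. It shows that when five shares sum to 100, many different coefficient sets give identical predictions, and that omitting one share turns each remaining coefficient into a contrast with the omitted category.

```python
import math

# Hypothetical tract-level education shares in percent (made-up numbers).
# Order: <HS, HS/GED, some college, bachelor's, master's or higher.
tracts = [
    (10.0, 30.0, 25.0, 20.0, 15.0),
    (25.0, 35.0, 20.0, 15.0, 5.0),
    (5.0, 20.0, 30.0, 30.0, 15.0),
    (15.0, 25.0, 25.0, 25.0, 10.0),
]
assert all(math.isclose(sum(t), 100.0) for t in tracts)


def predict(intercept, coefs, shares):
    """Linear prediction: intercept plus the sum of coefficient * share."""
    return intercept + sum(b * x for b, x in zip(coefs, shares))


# Arbitrary illustrative coefficients for a model with all five shares.
b0 = 7.0
b = [0.03, 0.02, 0.01, -0.01, -0.02]

# Subtract any constant c from every slope and add 100*c to the intercept:
# the predictions do not change, so the five-share model is not identified.
c = 0.5
b0_alt = b0 + 100 * c
b_alt = [bk - c for bk in b]

# Omitting the last share (the reference category) picks one unique version:
# each remaining slope becomes a contrast with the omitted category.
b0_ref = b0 + 100 * b[4]
b_ref = [bk - b[4] for bk in b[:4]]

for t in tracts:
    full = predict(b0, b, t)
    assert math.isclose(full, predict(b0_alt, b_alt, t))
    assert math.isclose(full, predict(b0_ref, b_ref, t[:4]))
    print(round(full, 2))
```

Read this way, a coefficient in the four-variable model describes a one-point increase in that share offset by a one-point decrease in the omitted share, so the choice of which category to omit changes the interpretation but not the fit.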



    • #3
      Brian:
      it is always risky to carry over to the individual level data that were collected at a wider level.
      The risk is committing the so-called ecological fallacy (https://en.wikipedia.org/wiki/Ecological_fallacy).
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        I respectfully disagree with Carlo in #3. The use of ecological variables does not inevitably lead to fallacy, though it can.

        What is critical is, when discussing results, to acknowledge the ecological nature of the analysis. So your findings will be valid as regards what happens in census tracts with particular distributions of education, but the findings do not necessarily apply to individuals with particular levels of education. As long as you always make that clear, you will avoid fallacies.

        But if you begin to speak sloppily about effects of education, without specifying that you are talking about the distribution of education levels in a census tract, not an individual person's education, then you may well draw misleading and erroneous conclusions.

        In fact, there are some circumstances where the distribution of a variable in a geographic region has a greater impact on person-level outcomes than the individual person's value of that variable does. In that case using individual level data would give an "individual" fallacy. The point is that you need to be crystal clear about whether you are making ecological claims or individual claims. Neither one automatically generalizes to the other. Both can be valid, even if they contradict each other. What is never valid is to treat an ecological result as individual or vice versa.
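A toy calculation can make the "both can be valid, even if they contradict each other" point concrete. The sketch below uses entirely made-up numbers, and `slope`, `people`, `ecological_slope` and `within_slopes` are hypothetical names chosen for illustration. The data are built so that the relationship across tract means is positive while the relationship among individuals inside every tract is negative.

```python
import math


def slope(xs, ys):
    """Least-squares slope of y on x for a single predictor."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Made-up data: three tracts with mean x of 10, 20, 30 and three people each.
# Tract mean of y is 2 * (tract mean of x); within a tract y falls as x rises.
people = {}
for m in (10.0, 20.0, 30.0):
    people[m] = [(m + d, 2 * m - d) for d in (-1.0, 0.0, 1.0)]

# Ecological analysis: one point per tract, using tract means.
tract_x = [sum(x for x, _ in p) / len(p) for p in people.values()]
tract_y = [sum(y for _, y in p) / len(p) for p in people.values()]
ecological_slope = slope(tract_x, tract_y)

# Individual analysis: slope among the people inside each tract.
within_slopes = [
    slope([x for x, _ in p], [y for _, y in p]) for p in people.values()
]

print(ecological_slope, within_slopes)
```

Neither slope is wrong here: one describes tracts and the other describes people within a tract. The error would be reporting one of them as if it were the other.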



        • #5
          Clyde is right: I should have been clearer.
          Obviously risk differs from certainty: what I would fear, assuming that a manuscript on Brian's analysis were submitted to a technical journal in his research field, is a reviewer raising a methodological objection to combining variables collected at different levels.
          It may well be, as Clyde suggested, that being clear in the Methods section of the paper suffices to avoid potentially serious criticism.
          That said, I particularly agree with Clyde's last sentence:
          "What is never valid is to treat an ecological result as individual or vice versa."
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Thank you for your responses Clyde and Carlo. Very insightful.
