
  • Sensitivity Analysis

    I have a variable, age, that I want to categorize, but I'm not sure which version of the categorization I should use. I chose these two versions based on past research and the categorizations used there.

    Age is continuous on the interval [15, 90].
    I categorize it as:
    Age_cat_1:
    Less than 20 = young
    20-50 = old
    > 50 = really old

    Age_cat_2:
    Less than 40 = young
    > 40 = old
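A minimal Stata sketch of how the two categorizations above could be coded. It simulates ages 15 to 90 purely for illustration and assumes the real variable is called age. The question leaves age exactly 40 unassigned in Age_cat_2, so the sketch assumes 40 counts as "old". Because Stata treats missing values as larger than any number, a condition like age > 50 needs a !missing(age) guard.

```stata
* Sketch only: simulated ages 15-90 stand in for the real variable
clear
set obs 76
generate age = 14 + _n            // ages 15, 16, ..., 90

* Age_cat_1: 1 = young (< 20), 2 = old (20-50), 3 = really old (> 50)
generate byte age_cat_1 = 1 if age < 20
replace age_cat_1 = 2 if age >= 20 & age <= 50
replace age_cat_1 = 3 if age > 50 & !missing(age)   // missing sorts above all numbers

* Age_cat_2: 1 = young (< 40), 2 = old; assumption: age == 40 is coded "old"
generate byte age_cat_2 = cond(age < 40, 1, 2) if !missing(age)

label define agecat1 1 "young" 2 "old" 3 "really old"
label values age_cat_1 agecat1
label define agecat2 1 "young" 2 "old"
label values age_cat_2 agecat2

* Cross-tabulation shows how the two schemes relate
tabulate age_cat_1 age_cat_2
```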

    My main analysis is whether age is associated with math scores (a continuous variable), after controlling for differences in sex, high school, and English scores.

    Under model 1:
    ologit Age_cat_1 Mathscores i.sex i.high_school English

    Under model 2:
    ologit Age_cat_2 Mathscores i.sex i.high_school English

    Let's say both models are statistically significant, i.e. math scores are clearly statistically significant predictors of age.
    Now how do I determine which age categorization to use?

  • #2
    Why do you want to categorize (discretize) age? Wouldn't the answer to that question guide your choice between the two?



    • #3
      A couple of comments:
      • Your ologit models assume that your age depends on your math score. So if you feel you are old and want to change that, you should study math and you automatically become younger. If that were true, the math departments in universities would look very different than they do now... So what you want to explain instead is the math score, and how that depends on (categorized) age.
      • "Statistically significant" is the beginning of an analysis, not the end. We humans are very good at seeing patterns in random noise, so we need a procedure to protect ourselves against over-interpreting patterns in the data when none exist. That is the purpose of statistical tests. So whether both are significant or not is not really relevant to this question.
      • Age in that range will in all likelihood have a non-linear effect, so just adding age linearly is unlikely to fit well. But that does not mean that categorizing age is any better: you throw away a lot of information by doing so. Instead, you could look at help npregress.
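A sketch of the first and third points in Stata, run on simulated data. The variable names follow the question, but the data-generating process (including its curved age effect) is an assumption made purely for illustration. Math score is the outcome and age is an explanatory variable: first categorized, then kept continuous with npregress (available from Stata 15), which can be slow on large samples.

```stata
* Sketch only: simulated data using the variable names from the question
clear
set seed 20171207
set obs 500
generate age              = runiform(15, 90)
generate byte sex         = runiform() < 0.5
generate byte high_school = runiformint(1, 3)
generate English          = rnormal(50, 10)
* Assumed data-generating process with a non-linear (quadratic) age effect
generate Mathscores = 20 + 0.5*English + 3*sex - 0.01*(age - 50)^2 + rnormal(0, 5)

* Math score is the outcome; categorized age is an explanatory variable
generate byte age_cat_2 = cond(age < 40, 1, 2)
regress Mathscores i.age_cat_2 i.sex i.high_school English

* Alternative: keep age continuous and let its effect be non-linear
npregress kernel Mathscores age English i.sex i.high_school
```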
      Last edited by Maarten Buis; 07 Dec 2017, 01:41.
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------



      • #4
        Thanks for the information.
        Age is just an example here.
        The actual variable is a continuous financial variable. I thought it would take too much time to describe it, so I used age as a simple example of a continuous variable.


        It isn't quite as simple as deciding which one fits better or is the better choice. Both categorized versions of age appeal to me, and I weigh them equally when choosing one for the model.

        My goal is to determine how changing the limits I use for age would affect the statistical analysis, i.e. switching between:
        Age_cat_1:
        Less than 20 = young
        20-50 = old
        > 50 = really old

        Age_cat_2:
        Less than 40 = young
        > 40 = old


        What is the impact on my model of changing the cutoff to 40? How sensitive is the model to changes in this limit? Such a test would give me a better idea of what I could change my limits to if I wanted to.
        Is there any statistical test to inspect this?
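One rough, descriptive way to look at how sensitive a dichotomized version is to where the limit is placed: re-estimate the model over a grid of cutoffs and compare the coefficient and R-squared. This is a sketch on simulated data with the variable names from the question; the cutoff grid (30 to 50 in steps of 5) and the variable name old are hypothetical, and this is not a formal test.

```stata
* Sketch only: simulated data using the variable names from the question
clear
set seed 20171207
set obs 500
generate age              = runiform(15, 90)
generate byte sex         = runiform() < 0.5
generate byte high_school = runiformint(1, 3)
generate English          = rnormal(50, 10)
generate Mathscores = 20 + 0.5*English + 3*sex - 0.01*(age - 50)^2 + rnormal(0, 5)

* Re-estimate the model for each hypothetical cutoff and report the
* coefficient on the "old" indicator, its standard error, and R-squared
foreach c of numlist 30(5)50 {
    generate byte old = age >= `c' if !missing(age)
    quietly regress Mathscores i.old i.sex i.high_school English
    display "cutoff " `c' ":  b = " %7.3f _b[1.old] "  se = " %6.3f _se[1.old] "  R2 = " %5.3f e(r2)
    drop old
}
```

Models with different cutoffs are not nested in each other, so comparing R-squared across them is only descriptive. Choosing the best-fitting cutoff this way is itself a search, so the usual p-values for the chosen model will be too optimistic.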



        • #5
          It is usually better to stick to the variables you care about; otherwise you will get answers that do not apply to you.

          So there is a mysterious continuous variable that can legitimately be used as a dependent variable (so it cannot be age). You want to categorize it (which you should not do, as that way you throw away information, and throwing away information is bad). You have two ways of categorizing it, and you want to figure out which one is "best".

          The simplest way of doing this (wrong) task is to make a third categorization in which both are nested. Then you can impose constraints on that model and see what happens.
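One way this nesting suggestion could be implemented, as a sketch on simulated data. The variable names follow the question; the data, the variable name age_comb, and putting age 40 in the upper group are assumptions. Using all cutoffs at once (20, 40 and 50) gives four groups. Each original scheme is this finer one with some groups merged, so each can be expressed as linear constraints and checked with test.

```stata
* Sketch only: simulated data using the variable names from the question
clear
set seed 20171207
set obs 500
generate age              = runiform(15, 90)
generate byte sex         = runiform() < 0.5
generate byte high_school = runiformint(1, 3)
generate English          = rnormal(50, 10)
generate Mathscores = 20 + 0.5*English + 3*sex - 0.01*(age - 50)^2 + rnormal(0, 5)

* Finer categorization using every cutoff, so both schemes are nested in it:
* 1 = under 20, 2 = 20 to under 40, 3 = 40-50, 4 = over 50
generate byte age_comb = 1 if age < 20
replace age_comb = 2 if age >= 20 & age < 40
replace age_comb = 3 if age >= 40 & age <= 50
replace age_comb = 4 if age > 50 & !missing(age)

regress Mathscores i.age_comb i.sex i.high_school English

* Age_cat_1 merges groups 2 and 3 (20-50 = old)
test 2.age_comb = 3.age_comb

* Age_cat_2 merges groups 1 and 2 (base is group 1) and groups 3 and 4
test (2.age_comb = 0) (3.age_comb = 4.age_comb)
```

Both coarser schemes are nested in the finer one but not in each other, so each test compares one scheme against the finer categorization, not against the other scheme. A small p-value suggests the corresponding merge hides differences between groups that the data can detect.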
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------
