Sensitivity Analysis

Senor Edward

Join Date: Feb 2016

Posts: 29
#1

Sensitivity Analysis

06 Dec 2017, 22:34

I have the variable age, which I want to categorize, but I'm not sure which categorization to use. I chose these two versions because both have been used in past research.

Age is continuous on the interval [15 - 90]
I categorize this as:
Age_cat_1:
< 20 = young
20 - 50 = old
> 50 = really old

Age_cat_2:
< 40 = young
> 40 = old

My main analysis is then whether age is associated with math scores (a continuous variable), after adjusting for differences in sex, high school and English scores.

Under model 1:
ologit Age_cat_1 Mathscores i.sex i.high_school English

Under model 2:
ologit Age_cat_2 Mathscores i.sex i.high_school English

Let's say both models are statistically significant - i.e. math scores are clearly statistically significant predictors of age.
Now how do I determine which model to use for the age categorization?
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#2

07 Dec 2017, 01:10

Why do you want to categorize (discretize) age? Wouldn't the answer to that question guide your choice between the two?
Maarten Buis

Join Date: Mar 2014

Posts: 3456
#3

07 Dec 2017, 01:39

A couple of comments:
Your ologit models assume that your age depends on your mathscore. So if you feel you are old and want to change that, you should study math and you automatically become younger. If that were true, the math departments in universities would look very different than they do now... So, what you want to explain instead is the mathscore, and how that depends on the (categorized) age.

"Statistically significant" is the beginning of an analysis, not the end. We humans are very good at seeing patterns in random noise, so we need a procedure to protect ourselves against over-interpreting patterns in the data when none exist. That is the purpose of statistical tests. So whether both models are significant or not is not really relevant for choosing between them.

Age in that range will in all likelihood have a non-linear effect, so just adding age linearly is in all likelihood not going to fit well. But that does not mean that categorizing age is any better. You will throw away a lot of information by doing so. Instead you could look at help npregress.
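
To make the two points above concrete, here is a minimal sketch rather than a definitive analysis. It assumes a hypothetical continuous variable age and simulates stand-in data so that it runs on its own; the variable names follow the original post, and every coefficient in the simulation is made up. Math score is the outcome, and age stays continuous with a quadratic term as one simple way to allow a non-linear effect. npregress kernel is shown as the nonparametric alternative and needs Stata 15 or later.

```stata
* Simulated stand-in data; all values below are hypothetical
clear
set seed 12345
set obs 500
generate age = runiformint(15, 90)
generate sex = runiformint(0, 1)
generate high_school = runiformint(1, 3)
generate English = rnormal(60, 10)
* Outcome with a curved age profile plus noise (made-up coefficients)
generate Mathscores = 40 + 0.8*age - 0.01*age^2 + 0.3*English + rnormal(0, 5)

* Math score is the outcome; age enters as a quadratic, not as categories
regress Mathscores c.age##c.age i.sex i.high_school English

* Predicted score at selected ages, averaged over the other covariates
margins, at(age = (20(10)80))

* Nonparametric alternative (Stata 15+), here with age only
npregress kernel Mathscores age
```

The margins output gives a quick view of the shape of the age profile without having to pick any cut points.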

Last edited by Maarten Buis; 07 Dec 2017, 01:41.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Senor Edward

Join Date: Feb 2016

Posts: 29
#4

08 Dec 2017, 22:57

Thanks for the information.
Age is just an example here.
The actual variable is a continuous financial variable. I just thought it would take too much time to describe the actual financial variable, so I used a simple example of a continuous variable such as age.

It isn't quite as simple as deciding which one fits better or is the better choice. Both categorized versions of age are appealing to me, and I have no strong reason to prefer one over the other.

The goal would be for me to determine how a change in the limits that I use for age would have an impact on the statistical analysis.
i.e.
Categorization 1 (Age_cat_1):
< 20 = young
20 - 50 = old
> 50 = really old

Categorization 2 (Age_cat_2):
< 40 = young
> 40 = old

What is the impact on my model of changing the cut-off to 40? How sensitive is the model to changes in this limit? Such a test would give me a better idea of how far I could move my limits if I wanted to.
Is there any statistical test to inspect this?
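
One informal way to explore this kind of sensitivity is to refit the model over a grid of candidate cut points and watch how the estimate moves. The sketch below is only an illustration under stated assumptions: the data are simulated, the grid 30(5)50 is arbitrary, the indicator old is a hypothetical two-group split, and math score is used as the outcome (as suggested in #3) rather than the ologit specification above.

```stata
* Simulated stand-in data; all values below are hypothetical
clear
set seed 42
set obs 1000
generate age = runiformint(15, 90)
generate Mathscores = 50 + 0.2*age + rnormal(0, 10)

* Refit the model for each candidate cut point and report how the
* young/old difference and its standard error change
forvalues cut = 30(5)50 {
    generate byte old = age >= `cut'
    quietly regress Mathscores i.old
    display "cut = `cut'   b = " %6.3f _b[1.old] "   se = " %6.3f _se[1.old]
    drop old
}
```

This is a descriptive check, not a formal test. If the coefficient changes little across the grid, the conclusions are not very sensitive to where the limit is placed.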
Maarten Buis

Join Date: Mar 2014

Posts: 3456
#5

11 Dec 2017, 02:21

It is usually better to stick to the variables you care about, otherwise you will get answers that do not apply to you.

So there is a mysterious continuous variable that can legitimately be used as a dependent variable (so it cannot be age). You want to categorize it (which you should not do, as that way you throw away information, and throwing away information is bad). You have two ways of categorizing and you want to figure out which one is "best".

The simplest way of doing this (wrong) task is to make a third categorization in which both are nested, that is, one that uses every cut point from either scheme. Now you can impose constraints on that model and see what happens.
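
A minimal sketch of that idea, using the age example from #1 and simulated stand-in data. The name age_cat_3 is hypothetical, the other covariates are left out for brevity, and each boundary value is assumed to belong to the upper category, which the original definitions leave open.

```stata
* Simulated stand-in data; all values below are hypothetical
clear
set seed 2017
set obs 1000
generate age = runiformint(15, 90)
generate Mathscores = 50 + 0.2*age + rnormal(0, 10)

* Third categorization using every cut point from both schemes (20, 40, 50)
* 1: under 20   2: 20 to under 40   3: 40 to under 50   4: 50 and over
generate byte age_cat_3 = 1 + (age >= 20) + (age >= 40) + (age >= 50)

regress Mathscores i.age_cat_3

* Age_cat_1 merges levels 2 and 3 (20 to 50): one constraint
test 2.age_cat_3 = 3.age_cat_3

* Age_cat_2 merges levels 1 and 2 (under 40) and levels 3 and 4 (40 and over).
* Level 1 is the base category, so "level 2 equals level 1" is 2.age_cat_3 = 0.
test (2.age_cat_3 = 0) (3.age_cat_3 = 4.age_cat_3)
```

Each test asks whether collapsing the finer categories into one of the coarser schemes is compatible with the data. A small p-value suggests that the collapsed scheme hides a difference that the finer categorization picks up.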
