how to use quintiles in the dataset to create variables

ALKEBSEE RADWAN

Join Date: Mar 2019

Posts: 240
#1

03 Feb 2021, 13:44

Hello everyone,
My question is: how do I create a variable like this one: "VAR: an explanatory variable that equals 1 for observations in the lowest quintile of tax aggressiveness and 0 for all other observations"? In other words, how can quintiles be used to create a dummy variable?
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#2

03 Feb 2021, 14:10

-help xtile-. Or, if you have installed -egenmore- from SSC, there is also -egen, xtile()-.
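For readers working outside Stata, here is a minimal Python sketch of the same idea. In Stata the -xtile- route amounts to roughly `xtile taxq = taxagg, nquantiles(5)` followed by `generate byte low_taxagg = (taxq == 1) if !missing(taxq)`, where `taxagg`, `taxq` and `low_taxagg` are hypothetical variable names. The Python version assumes missing values have already been dropped and takes its cut points from `statistics.quantiles`; that percentile convention is an assumption and is not guaranteed to match Stata's, so observations at or near a boundary may be grouped differently.

```python
import statistics
from bisect import bisect_left

def quintile_groups(values):
    """Assign each value a quintile number, 1 (lowest) to 5 (highest).

    Assumption: the four cut points come from statistics.quantiles with
    method="inclusive"; this is not necessarily Stata's percentile definition.
    """
    cuts = statistics.quantiles(values, n=5, method="inclusive")
    # bisect_left counts the cut points strictly below v, so a value equal
    # to a cut point falls into the lower group.
    return [bisect_left(cuts, v) + 1 for v in values]

def bottom_quintile_dummy(values):
    """1 for observations in the lowest quintile, 0 otherwise."""
    return [1 if g == 1 else 0 for g in quintile_groups(values)]

# Hypothetical tax-aggressiveness scores for ten firm-years
tax_aggressiveness = [0.9, -0.4, 0.1, 0.7, -0.8, 0.3, 0.0, -0.2, 0.5, 0.6]
print(bottom_quintile_dummy(tax_aggressiveness))  # -> [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
```

Changing `g == 1` to `g == 5` gives an upper-quintile dummy instead.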

That said, creating discrete variables from continuous ones is usually a very bad idea. It discards information and introduces noise into the analysis. Also, results are sometimes sensitive to whether one uses, say, quintiles, or deciles, or terciles, etc. I realize that doing this is very popular in many disciplines (unfortunately including my own, epidemiology) but it makes for bad data analysis and can easily underlie misleading conclusions. Unless you are under duress to do that, or unless there is really convincing external evidence that the bottom quintile differs drastically and qualitatively in your study outcome from the rest of the pack, please don't do it.
ALKEBSEE RADWAN

#3

04 Feb 2021, 12:25

Originally posted by Clyde Schechter (#2):

Clyde Schechter, first of all, thank you so much.
Second, I don't prefer to use it, but I saw that some authors created a binary variable from the continuous one, so I posted this question.
I will not do that.
Thank you so much.
ALKEBSEE RADWAN

#4

05 Feb 2021, 09:55

Originally posted by Clyde Schechter (#2):

Professor Clyde Schechter, could you kindly give me a reference showing that creating discrete variables from continuous ones may distort the results?
Clyde Schechter

#5

05 Feb 2021, 12:44

See Frank Harrell's take on this at https://www.fharrell.com/post/errmed/#catg. There he focuses specifically on dichotomies, but the exact same reasoning applies to any categorization with a small number of categories.
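For the special case of a normally distributed predictor with a linear effect, the information a dichotomy throws away has a closed form. Assume (for illustration only) X ~ N(0, 1), Y = bX + e with e independent of X, and D = 1 if X is at or below its p-quantile z_p. Then Corr(D, Y) = Corr(X, D) × Corr(X, Y), and |Corr(X, D)| = φ(z_p) / √(p(1 − p)), where φ is the standard normal density. The sketch simply evaluates that retained fraction for a few cut choices.

```python
from statistics import NormalDist

def retained_correlation(p):
    """Fraction of |Corr(X, Y)| kept when X ~ N(0, 1) is replaced by 1{X <= z_p}.

    Uses Cov(X, D) = -phi(z_p), Var(D) = p * (1 - p) and Var(X) = 1.
    """
    std_normal = NormalDist()
    z_p = std_normal.inv_cdf(p)
    return std_normal.pdf(z_p) / (p * (1 - p)) ** 0.5

for label, p in [("median split", 0.5), ("bottom tercile", 1 / 3),
                 ("bottom quintile", 0.2), ("bottom decile", 0.1)]:
    print(f"{label:16s} {retained_correlation(p):.3f}")
```

Under these assumptions a median split keeps about 0.80 of the correlation and a bottom-quintile split about 0.70; equivalently, the dummy needs a noticeably larger sample to reach the same power.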
Nick Cox

Join Date: Mar 2014

Posts: 35726
#6

06 Feb 2021, 06:36

Although I have been thumping the drum on "Is quantile binning really a good idea?" for a long while, I think there is an interesting difference between one facet of a standard biomedical setting and a standard business setting.

In particular, the clinical consequences of, say, BMI, blood pressure or cholesterol aren't better analysed by knowing also, or instead, where a patient belongs on any of those measures in (bins of) a series of quantiles. (This is a standard point, but I think I first saw it made by Frank Harrell somewhere.)

However, the worst- or best-performing firms on some measure are usually to be considered in the context of a market with competition between firms (or cartels, or whatever, but please don't assume I am any kind of expert here). So a firm being better (worse) than others may be part of what is being analysed.
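If relative standing among competitors is what matters, the quantile bins arguably should be formed within each market or period rather than over the pooled sample. A minimal sketch, assuming hypothetical records with `year` and `score` fields (a market-year key would work the same way), at least two observations per group (which `statistics.quantiles` requires in most Python versions), and inclusive-method cut points rather than Stata's exact convention:

```python
import statistics
from bisect import bisect_left
from collections import defaultdict

def quintiles_within(rows, group_key, value_key):
    """Quintile number (1..5) of each row's value relative to its own group."""
    values_by_group = defaultdict(list)
    for row in rows:
        values_by_group[row[group_key]].append(row[value_key])
    # Four cut points per group (inclusive method, an assumption)
    cuts = {g: statistics.quantiles(v, n=5, method="inclusive")
            for g, v in values_by_group.items()}
    # Values equal to a cut point go to the lower group
    return [bisect_left(cuts[row[group_key]], row[value_key]) + 1 for row in rows]

# Hypothetical firm-years: every 2020 score is higher than every 2019 score
rows = [{"year": 2019, "score": s} for s in range(1, 11)] + \
       [{"year": 2020, "score": s} for s in range(11, 21)]
print(quintiles_within(rows, "year", "score"))
```

Pooled over both years, every 2019 firm would fall in the lower half of the distribution; computed within years, each year has its own bottom and top quintile.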
ALKEBSEE RADWAN

#7

06 Feb 2021, 10:39

Originally posted by Clyde Schechter (#5):

Thank you so much.
ALKEBSEE RADWAN

#8

06 Feb 2021, 10:48

Originally posted by Nick Cox (#6):

I take your point. But I actually created a binary variable from my continuous variable and found no evidence, whereas with the continuous variable I do find evidence. My topic is CEO turnover and tax avoidance; the dummy is coded 1 if a firm observation falls in the upper quintile and 0 otherwise. Do you have any suggestions?
Nick Cox

#9

06 Feb 2021, 11:06

Who knows? An optimistic interpretation is that you didn't find an effect that doesn't exist. A pessimistic interpretation is that coarsening the data made it harder to find anything. If a predictor has an influence, why expect that the form of the influence is a step function?
ALKEBSEE RADWAN

#10

06 Feb 2021, 11:33

Originally posted by Nick Cox (#9):

I can't answer that. But I think that creating a dummy variable from a continuous variable is a kind of manipulation: if the actual values of a variable do not influence the dependent variable, how and why would the dummy version do so? This is especially so when the variable ranges from -1 to +1, and some observations will be classified as 0 when they arguably should not be, and vice versa. I am just thinking aloud.