  • How to use quintiles in a dataset to create variables

    Hello everyone,
    My question is how to create a variable like this: VAR is an explanatory variable that equals 1 for observations in the lower quintile of tax aggressiveness and 0 for all other observations. In other words, how can I use quintiles to create a dummy variable?


  • #2
    -help xtile-. Or, if you have installed -egenmore- from SSC, there is also -egen, xtile()-.
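For readers outside Stata, the idea behind a quintile dummy can be sketched in plain Python. This is a minimal sketch under stated assumptions, not Stata's implementation. The cutpoints come from the standard library's statistics.quantiles with its default 'exclusive' method, which need not match Stata's percentile definition. A value equal to a cutpoint is put in the lower group. The function names are hypothetical.

```python
from statistics import quantiles

def quintile_cutpoints(values):
    """Return the four cutpoints (20th, 40th, 60th, 80th percentiles).

    Uses statistics.quantiles' default 'exclusive' method, which may
    differ from Stata's percentile definition.
    """
    return quantiles(values, n=5)

def quintile_of(x, cuts):
    """Quintile number 1..5; a value equal to a cutpoint goes to the lower group."""
    return 1 + sum(x > c for c in cuts)

def lower_quintile_dummy(values):
    """Hypothetical helper: 1 for observations in the lowest quintile, 0 otherwise."""
    cuts = quintile_cutpoints(values)
    return [1 if quintile_of(x, cuts) == 1 else 0 for x in values]

if __name__ == "__main__":
    data = list(range(1, 11))
    print(lower_quintile_dummy(data))
```

For example, with the values 1 to 10 the lowest cutpoint is 2.2, so only 1 and 2 are flagged.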

    That said, creating discrete variables from continuous ones is usually a very bad idea. It discards information and introduces noise into the analysis. Also, results are sometimes sensitive to whether one uses, say, quintiles, or deciles, or terciles, etc. I realize that doing this is very popular in many disciplines (unfortunately including my own, epidemiology), but it makes for bad data analysis and can easily underlie misleading conclusions. Unless you are under duress to do that, or unless there is really convincing external evidence that the bottom quintile differs drastically and qualitatively in your study outcome from the rest of the pack, please don't do it.
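The information loss can be illustrated with a small simulation. This is a hypothetical sketch, not an analysis of any data in this thread. It assumes a normally distributed predictor x with a linear effect on an outcome y plus independent noise. It then replaces x with a lowest-quintile dummy and compares the two correlations with y. Because the dummy is 1 for low x, its correlation is negative, so the magnitudes are what matter.

```python
import random
from statistics import fmean, pstdev, quantiles

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = fmean([(x - ma) * (y - mb) for x, y in zip(a, b)])
    return cov / (pstdev(a) * pstdev(b))

def simulate(n=2000, seed=1):
    """Hypothetical simulation: y depends linearly on x plus noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) for xi in x]
    cut = quantiles(x, n=5)[0]                  # 20th-percentile cutpoint
    d = [1 if xi <= cut else 0 for xi in x]     # lowest-quintile dummy
    return pearson(x, y), pearson(d, y)

if __name__ == "__main__":
    r_cont, r_dummy = simulate()
    print(f"corr(x, y) = {r_cont:.3f}, corr(dummy, y) = {r_dummy:.3f}")
```

Under these assumptions the dummy's correlation with y is noticeably weaker in absolute value than that of x itself, even though the dummy is built entirely from x.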



    • #3
      Originally posted by Clyde Schechter (#2)
      Clyde Schechter, first of all, thank you so much.
      Second, I don't prefer to use it, but I saw some authors create a binary variable from a continuous one, so I posted this question.
      I will not do that.
      Thank you so much.



      • #4
        Originally posted by Clyde Schechter (#2)
        Professor Schechter, could you please give me a reference showing that creating discrete variables from continuous ones may distort the results?



        • #5
          See Frank Harrell's take on this at https://www.fharrell.com/post/errmed/#catg. There he focuses specifically on dichotomies, but the exact same reasoning applies to any categorization with a small number of categories.
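The same reasoning extends beyond dichotomies, and this can be sketched with simulated data. The assumptions match the previous kind of sketch: normal x, a linear effect, and a hypothetical helper name. The sketch bins x into k equal-frequency groups and scores the groups 1..k. It then reports what share of the continuous correlation with y survives the binning. Scoring groups as 1..k is itself an assumption; it is one simple way to use the binned variable.

```python
import random
from statistics import fmean, pstdev, quantiles

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = fmean([(x - ma) * (y - mb) for x, y in zip(a, b)])
    return cov / (pstdev(a) * pstdev(b))

def retained_share(k, n=5000, seed=2):
    """Share of corr(x, y) kept when x is replaced by its k-group score (1..k)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) for xi in x]
    cuts = quantiles(x, n=k)                          # k - 1 cutpoints
    g = [1 + sum(xi > c for c in cuts) for xi in x]   # group score 1..k
    return pearson(g, y) / pearson(x, y)

if __name__ == "__main__":
    for k in (2, 3, 5, 10):
        print(k, round(retained_share(k), 3))
```

On the same simulated data, a median split (k = 2) keeps the least information, and the share rises as the number of groups grows. This fits the point that coarse categorizations lose more.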



          • #6
            Although I have been thumping the drum on "Is quantile binning really a good idea?" for a long while, I think there is an interesting difference between one facet of a standard biomedical setting and a standard business setting.

            In particular, the clinical consequences of, say, BMI, blood pressure, or cholesterol aren't better analysed by knowing also, or instead, where a patient belongs on any of those measures in (bins of) a series of quantiles. (This is a standard point, but I think I first saw it made by Frank Harrell somewhere.)

            However, the worst- or best-performing firms on some measure usually need to be considered in the context of a market with competition between firms (or cartels, or whatever, but please don't assume I am any kind of expert here). So a firm's being better (or worse) than others may be part of what is being analysed.
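If relative standing within a market is the quantity of interest, the quintiles would be computed against the relevant comparison group rather than the whole sample. This is a minimal sketch under an assumption: each observation carries a group label, such as an industry or a year, that defines "the market". The cutpoints come from statistics.quantiles, and the helper name is hypothetical.

```python
from collections import defaultdict
from statistics import quantiles

def within_group_quintile(records):
    """Hypothetical helper. records: list of (group, value) pairs.

    Returns a quintile number 1..5 for each record, with cutpoints
    computed only from records in the same group.
    """
    values_by_group = defaultdict(list)
    for group, value in records:
        values_by_group[group].append(value)
    cuts = {g: quantiles(vs, n=5) for g, vs in values_by_group.items()}
    return [1 + sum(value > c for c in cuts[group]) for group, value in records]

if __name__ == "__main__":
    records = [("A", v) for v in range(1, 11)] + [("B", v) for v in range(101, 111)]
    print(within_group_quintile(records))
```

In this made-up example, the value 10 is in the top quintile of group A, while 101 is in the bottom quintile of group B. Each firm's position is relative to its own competitors, not to the pooled sample.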



            • #7
              Originally posted by Clyde Schechter (#5)
              Thank you so much.



              • #8
                Originally posted by Nick Cox (#6)
                I take your point. But I actually created a binary variable from my continuous variable and found no evidence, whereas with the continuous variable I do find evidence. My topic is CEO turnover and tax avoidance, and the binary variable is coded 1 if a firm observation falls in the upper quintile and 0 otherwise. Do you have any suggestions?



                • #9
                  Who knows? An optimistic interpretation is that you didn't find an effect that doesn't exist. A pessimistic interpretation is that coarsening the data made it harder to find anything. If a predictor has an influence, why expect that the form of the influence is a step function?
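The step-function question can be made concrete with a simulation. The sketch compares how much variance in y is explained by x and by an upper-quintile dummy, which is the coding described in #8. It does this under two hypothetical truths: a linear effect of x, and a genuine step at the upper quintile. The data are simulated, and the effect sizes are arbitrary assumptions. For a single-regressor linear fit, R² is computed as the squared correlation.

```python
import random
from statistics import fmean, pstdev, quantiles

def r_squared(a, b):
    """R^2 of a one-regressor linear fit = squared Pearson correlation."""
    ma, mb = fmean(a), fmean(b)
    cov = fmean([(x - ma) * (y - mb) for x, y in zip(a, b)])
    return (cov / (pstdev(a) * pstdev(b))) ** 2

def compare(true_effect, n=2000, seed=3):
    """Return (R^2 using x, R^2 using upper-quintile dummy) for simulated data.

    true_effect is "linear" (y = x + noise) or "step"
    (y = 2 * dummy + noise); both are hypothetical assumptions.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    cut = quantiles(x, n=5)[-1]                 # 80th-percentile cutpoint
    d = [1 if xi > cut else 0 for xi in x]      # 1 = upper quintile
    if true_effect == "linear":
        signal = x
    elif true_effect == "step":
        signal = [2.0 * di for di in d]
    else:
        raise ValueError("true_effect must be 'linear' or 'step'")
    y = [s + rng.gauss(0, 1) for s in signal]
    return r_squared(x, y), r_squared(d, y)

if __name__ == "__main__":
    for effect in ("linear", "step"):
        r2_x, r2_d = compare(effect)
        print(f"{effect}: R2(x) = {r2_x:.3f}, R2(dummy) = {r2_d:.3f}")
```

The dummy wins only when the truth really is a step at the cutpoint. When the effect is gradual, the continuous predictor explains more.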



                  • #10
                    Originally posted by Nick Cox (#9)
                    I can't answer that. But I think that creating a dummy variable from a continuous variable is just a kind of manipulation: if the actual values of a variable do not influence the dependent variable, how and why would the dummy version do so? This is especially true when the variable naturally ranges from -1 to +1, and some observations will be classified as 0 when they should not be, and vice versa. I am just thinking aloud.
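One concrete aspect of this concern is that a quintile dummy depends on where the other observations fall, not only on an observation's own value. This is a minimal sketch with two made-up samples of a variable on the -1 to +1 scale; the samples and the helper name are hypothetical. The same value, 0.5, lands above the upper-quintile cutpoint in one sample and below it in the other.

```python
from statistics import quantiles

def upper_quintile_flag(value, sample):
    """Hypothetical helper: 1 if value lies above the sample's 80th-percentile cutpoint."""
    cut = quantiles(sample, n=5)[-1]
    return 1 if value > cut else 0

# Two made-up samples of a variable bounded by -1 and +1.
sample_low = [-1.0, -0.8, -0.6, -0.4, -0.2, 0.0, 0.2, 0.4, 0.5]
sample_high = [0.0, 0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 1.0]

if __name__ == "__main__":
    print(upper_quintile_flag(0.5, sample_low))
    print(upper_quintile_flag(0.5, sample_high))
```

Here the cutpoint is 0.4 in the first sample and 0.95 in the second. An identical observation is therefore coded 1 in one dataset and 0 in the other.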

