  • Explanation for Why Quantile Regression Estimates Are Very Small (Discrete Dependent Variable)

    Hi,

    I'm using quantile regression to examine the distribution of a self-reported subjective well-being variable. My dependent variable is "HAPPY", ordered on a scale of 1-10. I know quantile regression is supposed to be used with continuous dependent variables; however, this method is increasingly being used in the subjective well-being literature, so I'm trying to learn more about the method and why it is used.

    For example: Martin Binder & Alex Coad, 2010. "Going Beyond Average Joe's Happiness: Using Quantile Regressions to Analyze the Full Subjective Well-Being Distribution," Papers on Economics and Evolution 2010-10, Philipps University Marburg, Department of Geography.

    I'm trying to understand why, under certain circumstances, quantile regression gives strange results with discrete data like this. I have the following code; all my independent variables are dummies.

    Code:
     xi: qreg2 HAPPY i.mars1 i.mars2 i.mars3 i.mars4 i.mars5 i.employ7 i.employ8 i.male i.eth1, quantile(.5)
    For quantiles (0.5, 0.9) I seem to be getting strange parameter estimates of -1.26e-10 or -1 and p values of 1. I've attached an image of the output below. Can anyone provide some guidance on this?


    Many thanks.





    [Attached image: Screen Shot 2017-08-25 at 02.12.15.png]

  • #2
    Dalia:
    - the "weird" results might be due to the different scale of the coefficients;
    - as per the FAQ, please post what you got from Stata via CODE delimiters (screenshots are difficult to read and comment on);
    - as per the FAQ again, posting an example/excerpt of your data via -dataex- is the best way to let other listers delve into your query;
    - the -xi- prefix is redundant if your Stata release is reasonably recent (factor-variable notation has been available since Stata 11).

    As a final aside, it's good to know that geographers are so versatile (admittedly, I already had some suspicions about that from reading Nick's excellent replies!)
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Personally I would advise against using quantile regression for discrete outcomes. (1) Inference for quantile regression is based on asymptotic normality. But the quantile estimators are not asymptotically normally distributed when the outcome is discrete. The quantile function is a step function for these outcomes. Quantiles in the middle of the "flat segments" are superefficient (converge faster than square root n), quantiles right at a jump are not even consistent. (2) The assumption of linearity for the conditional quantiles cannot be justified. For instance, even a linear Poisson regression model implies nonlinear conditional quantile functions.

      This explains your results: all the coefficients are round values (plus or minus some small numerical imprecision) because in quantile regression the estimated line goes exactly through at least k observations, and with discrete outcomes these interpolated observations all take round values. The values must also lie in the middle of "flat segments", so the s.e. are estimated to be very small. But these s.e. are not consistent anyway.
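
      The point about round coefficients can be checked on simulated data. The following is a minimal sketch, not the original poster's data: the variables d and happy, the seed, and the sample size are all made up for illustration, and official -qreg- is used in place of -qreg2-. With a single dummy regressor the median regression is saturated, so the fitted line passes through observed (integer) outcomes in both groups.

      ```stata
      * Simulated illustration only: d and happy are hypothetical variables
      clear
      set seed 12345
      set obs 1001
      generate byte d = runiform() < 0.5
      * integer outcome on a 1-10 scale, shifted up by one point when d == 1
      generate happy = min(10, max(1, round(6 + d + 1.5*rnormal())))

      * median regression on a constant and one dummy
      qreg happy d, quantile(.5)

      * both coefficients should be whole numbers up to numerical imprecision
      display "constant = " _b[_cons] "   slope on d = " _b[d]
      assert abs(_b[_cons] - round(_b[_cons])) < 1e-6
      assert abs(_b[d] - round(_b[d])) < 1e-6
      ```

      The reported standard errors from such a run should not be interpreted, for the reasons given above.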

      Personally I would suggest using "distribution regression" instead of "quantile regression". This is an old and simple idea: simply estimate one binary regression such as logit or probit for each level of the outcome. This was already suggested for ordered outcomes in 1972 by Williams, O. D., Grizzle, J. E., 1972. Analysis of contingency tables having ordered response categories. Journal of the American Statistical Association 67 (337), 55–63.
      If you are interested in reporting and making inference about quantiles after having estimated distribution regressions, then you can use the methods that we have developed in a recent paper: "Generic inference on quantile and quantile effect functions for discrete outcomes". https://arxiv.org/abs/1608.05142
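
      As a rough sketch of the estimation step only, one logit per threshold of the outcome could look like the code below. The data are simulated and the variable names (happy, male, employed, le1-le9) are hypothetical; the inference methods of the linked paper are not implemented here.

      ```stata
      * Simulated data: all variable names are hypothetical
      clear
      set seed 2017
      set obs 2000
      generate byte male = runiform() < 0.5
      generate byte employed = runiform() < 0.7
      generate byte happy = min(10, max(1, round(6 + 0.5*employed - 0.3*male + 2*rnormal())))

      * one binary regression per threshold k: models Pr(happy <= k | x)
      forvalues k = 1/9 {
          generate byte le`k' = happy <= `k'
          capture logit le`k' i.male i.employed
          if _rc == 0 {
              estimates store dr`k'
          }
      }

      * compare coefficients across thresholds
      estimates table dr*, b(%9.3f) se
      ```

      The threshold k = 10 is omitted because happy <= 10 holds for every observation. Thresholds with very few observations on one side may fail to converge, which is why each fit is wrapped in -capture-.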



      • #4
        Dalia: In addition to Carlo's and Blaise's comments, I would suggest that you take a look at the paper "Quantiles for Counts" by José Machado and Joao Santos Silva (JASA, December 2005) in which the authors discuss a particular form of jittering the outcome data (to "smooth" it) and how such jittering enables quantile-regression identification of various parameters that may be of interest.
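
        A minimal sketch of the jittering step alone, on simulated data with hypothetical variable names: uniform noise on the unit interval is added to the integer outcome so that it no longer has ties, and the quantile regression is run on the jittered variable. This is only the basic idea; the estimator in the paper involves further steps (a transformation of the jittered outcome and averaging over repeated jitters) that are not reproduced here.

        ```stata
        * Simulated illustration only: d, happy and happy_jit are hypothetical
        clear
        set seed 2017
        set obs 1001
        generate byte d = runiform() < 0.5
        generate byte happy = min(10, max(1, round(6 + d + 1.5*rnormal())))

        * jitter: add uniform noise so the outcome is continuous
        generate double happy_jit = happy + runiform()

        * median regression on the jittered outcome
        qreg happy_jit d, quantile(.5)
        ```

        A single jitter depends on the random draw, so results will change with the seed.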



        • #5
          Thank you all for your help.

          Blaise Melly, could you clarify something for me? I'm exploring quantile regression because some papers on subjective well-being suggest that, theoretically, life satisfaction could be treated as a continuous variable:
          "We treat life satisfaction as a continuous variable rather than an ordered variable as studies have shown there is little difference to (Ferrer-I-Carbonell and Paul Frijters, 2004) results in treating life satisfaction as a continuous or ordered variable but allowing for fixed effects is important."

          However, I don't really understand how this assumption ties in practically with quantile regression, because the actual data points are discrete. So, as you mentioned, it brings issues with estimating coefficients and standard errors. But many papers seem to be reporting quantile regression estimates with discrete life satisfaction data using bootstrapped standard errors, and their estimates are not round values and p values are not 1. So technically, should these reported standard error estimates be considered inconsistent, given that it's quantile regression with discrete data? Should the coefficient estimates be taken with caution?



          • #6
            Dear Dalia Su,

            Originally posted by Dalia Su
            I'm exploring quantile regression because some papers on subjective well being suggest that theoretically life satisfaction could be treated as a continuous variable.
            I do not know this literature. But what is clear is that, in all the papers that I know, continuity of the dependent variable is an assumption to get asymptotic normality of the QR estimator. To the best of my knowledge, all existing methods to make inference on QR coefficients are based on this continuity assumption. You mention the bootstrap but it is known that the bootstrap does not mimic the asymptotic distribution of the sample quantiles when the outcome is discrete, see "Bootstrapping sample quantiles of discrete data" by Carsten Jentsch and Anne Leucht in the Annals of the Institute of Statistical Mathematics 2016. Sample quantiles are special cases of QR (with only a constant). I do not think that adding regressors can help.
            Of course, finite-sample data are always discrete. It may be that assuming continuity is not too problematic if there are many points in the support. This is an empirical question.

            Originally posted by Dalia Su
            But many papers seem to be reporting quantile regression estimates with discrete life satisfaction data using bootstrapped standard errors and their estimates are not round values and p values are not 1.
            I should have been more precise. Every QR line goes through at least k observations (residual=0 for these k observations). Since the observed values are round values, the fitted values will be round values for at least these k observations. If, as is the case in your regression, the regressors consist of categorical indicator variables, then the coefficients themselves will be round values (I should have added this sentence in my first reply).
            Take a trivial example: We have only observations with age=20 or age=23. Assume that the median is 2 for the observations with age=20 and the median is 3 for the observations with age=23. If you regress the outcome on a constant and an indicator variable for being 23, then the coefficient of the median regression on the constant will be 2 and the coefficient on age23 will be 1. These are round values. If you estimate the same model with a constant and a variable age in years, then the constant will be -4.6667 and the slope will be 0.3333. So, whether the coefficients themselves are round values depends on the parametrization of the regressors.
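
            For reference, the arithmetic behind the age example, written out under the stated assumption that the fitted median line passes through the two points (20, 2) and (23, 3), with a denoting the constant and b the slope:

            ```latex
            \[
            b = \frac{3 - 2}{23 - 20} = \frac{1}{3} \approx 0.333,
            \qquad
            a = 2 - 20 \cdot \frac{1}{3} = -\frac{14}{3} \approx -4.667,
            \qquad
            a + 23\,b = -\frac{14}{3} + \frac{23}{3} = 3 .
            \]
            ```

            With the indicator parametrization the same two fitted values give a constant of 2 and a coefficient of 3 - 2 = 1.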
            Anyway, this was not the main point. The main points: with discrete outcomes (1) existing inference for QR is not consistent, (2) I do not see how the conditional quantile function could be linear (except in fully saturated models).

