  • Using quantile regression to model the distribution of the dependent variable

    Dear all,

    I have a dataset with a dependent variable y and three independent variables x1 x2 x3. Since the distribution of my dependent variable y is U-shaped, an OLS regression is not sufficient to predict the values of y. Therefore, I want to predict the values with a quantile regression. Is there a command that predicts the y values accurately across the whole distribution of y?
    In the end I want to:
    1. get a variable ypred that has a density similar to that of y (U-shaped), unlike the OLS predictions, which in this case look normally distributed. So kdensity ypred should look similar to kdensity y.
    2. generate a new variable which looks like this in case of an OLS regression: gen ystressed = _b[_cons] + _b[x1]*x1 + _b[x2]*x2 + _b[x3]*z with z being a fixed number.
    I know how to get those variables in case of an OLS regression, but I am not sure how to get these variables in case of a quantile regression.

    Kind regards,
    Steffen



  • #2
    Hi Steffen,
    It isn't clear (to me) what the final purpose of your strategy is. It sounds a little like the Machado and Mata (2005) approach to estimating quantile decompositions.
    I have done something like that for a class before. Perhaps this is what you have in mind:
    Code:
    use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
    drop if lnwage==.
    drop if exp(lnwage)>100

    gen wage=exp(lnwage)
    gen id=_n
    * make 99 copies of each observation, one per percentile
    expand 99
    gen wage_hat=.
    bysort id: gen idq=_n
    forvalues q=1/99 {
        local fq=`q'/100
        * fit on the original sample only (idq==1); predict fills every copy
        qui qreg wage educ exper tenure female if idq==1, q(`fq')
        capture drop aux
        predict aux
        * copy number q of each observation keeps the q-th percentile prediction
        replace wage_hat=aux if idq==`q'
    }
    two kdensity wage if idq==1 || kdensity wage_hat, legend(order(1 "original Wage distribution" 2 "Predicted Wage distribution"))
    [Attached image: Graph.png]

    Machado, J. A. and Mata, J. (2005), Counterfactual decomposition of changes in wage distributions using quantile regression. J. Appl. Econ., 20: 445-465. doi:10.1002/jae.788


    • #3
      Hi Fernando,
      thanks for your suggestion. What I am aiming for is to predict y as accurately as possible by using a quantile regression (not only the 5th percentile etc. but over the whole distribution).
      I think your proposal is doing exactly this. I sorted the dependent variable before generating the variable id to not only model the distribution but to also have a variable y_hat that is close to y.
      Furthermore, I want to see how the distribution/values of my predicted dependent variable change if I use the quantile regression coefficients and constant as well as the variables of my observations for x1 and x2. For x3 I plug in a fixed value. Why am I doing this? x3 is a macroeconomic variable and I want to predict the values of the dependent variable in a bad macroeconomic state.
      To get this result, I included in the loop the generate command which I mentioned in the original post: ystressed = _b[_cons] + _b[x1]*x1 + _b[x2]*x2 + _b[x3]*`z'
      I tried to add this to the code. In this case x3 is the exper-variable I stressed so I can see the distribution of wages in the case that exper is the maximum value for all observations.

      Code:
      use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
      drop if lnwage==.
      drop if exp(lnwage)>100

      gen wage=exp(lnwage)
      * stressed value: the sample maximum of exper
      quietly sum exper, detail
      local max=r(max)
      sort wage
      gen id=_n
      expand 99
      gen wage_hat=.
      gen wage_stressed=.
      bysort id: gen idq=_n
      forvalues q=1/99 {
          local fq=`q'/100
          qui qreg wage educ exper tenure female if idq==1, q(`fq')
          * same linear prediction, but with exper fixed at its maximum
          replace wage_stressed= _b[_cons] + _b[educ]*educ + _b[exper]*`max' + _b[tenure]*tenure + _b[female]*female if idq==`q'
          capture drop aux
          predict aux
          replace wage_hat=aux if idq==`q'
      }
      two kdensity wage if idq==1 || kdensity wage_hat || kdensity wage_stressed, legend(order(1 "original Wage distribution" 2 "Predicted Wage distribution" 3 "Stressed Wage distribution"))
      I have a follow-up question: why did you expand the observations by 99, and how can I reduce the number of observations to the original number in the end?

      Kind regards
      Steffen



      • #4
        Hi Steffen
        In your original question you indicated that you wanted a method to obtain the whole distribution of y conditional on X. Using OLS, the only thing you get is the conditional mean. To recover the original distribution, you would also need to "impute" the value of the errors, assuming that the errors follow some distribution and that you know which percentile of that distribution an observation belongs to.
        The alternative, as you have in mind, is to do this by quantile regression, obtaining predictions for the nth quantile conditional on X. We still have no precise information on which conditional quantile a person belongs to, so my approach was to obtain 99 quantiles (from the 1st to the 99th) for each person.
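        That is also why the data in #2 are expanded by 99: each copy of a person holds one of the 99 predicted quantiles. A minimal sketch of one way to get back to one row per person, assuming it is run right after the loop in #2 so that id, idq, wage and wage_hat exist (the names wage_hat1 to wage_hat99 are created by reshape):
        Code:
        * assumption: run directly after the loop in #2
        capture drop aux
        * keep only the identifiers and the variables that matter
        keep id idq wage wage_hat
        * one row per id, with the 99 predictions side by side
        reshape wide wage_hat, i(id) j(idq)
        With the code in #3, wage_stressed would have to be kept and listed in reshape wide as well.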
        If you only want 1 prediction per person, perhaps a good alternative would be to randomly assign a quantile to each person, and take their prediction from that quantile regression.
        Code:
        use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
        drop if lnwage==.
        drop if exp(lnwage)>100
        gen wage=exp(lnwage)
        gen wage_hat1=.
        gen wage_hat2=.
        * two independent random draws of a percentile (1 to 99) per person
        gen idq1=int(runiform()*99)+1
        gen idq2=int(runiform()*99)+1

        forvalues q=1/99 {
            local fq=`q'/100
            qui qreg wage educ exper tenure female, q(`fq')
            capture drop aux
            predict aux
            * keep the prediction only for people assigned to percentile q
            replace wage_hat1=aux if idq1==`q'
            replace wage_hat2=aux if idq2==`q'
        }
        two kdensity wage || kdensity wage_hat1 || kdensity wage_hat2, legend(order(1 "original Wage distribution" 2 "Predicted Wage distribution (draw 1)" 3 "Predicted Wage distribution (draw 2)"))
        I just want to stress that the predicted values are good for seeing what happens to the overall distribution of wages, but not to the wage of a single observation. In other words, it does not make sense to keep the yhat that is close to the original Y.

        Also, two imputed/predicted wages for the same person will NOT be the same, since we now have a random component in which quantile a person belongs to.
        From a macro perspective, I think that is what you want.

        I think, however, that what you really are looking for is to see the impact of a macro variable on unconditional quantiles. You can look for that under that name or under RIF regressions.
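        A minimal sketch of what a RIF regression for an unconditional quantile could look like, using only built-in commands and the same example data. The choice of the median, the names q50, f50, qpt, fq and rif50, and the default kernel bandwidth are all assumptions for illustration. The OLS standard errors ignore that the quantile and the density are themselves estimated, so they are only rough.
        Code:
        use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
        drop if lnwage==.
        drop if exp(lnwage)>100
        gen wage=exp(lnwage)
        * unconditional median of wage
        _pctile wage, percentiles(50)
        scalar q50=r(r1)
        * kernel density of wage evaluated at the median (default bandwidth)
        gen qpt=q50 in 1
        kdensity wage, at(qpt) generate(fq) nograph
        scalar f50=fq[1]
        * RIF for the median: q + (tau - 1(y<=q))/f(q), with tau=0.5
        gen rif50=q50+(0.5-(wage<=q50))/f50
        * OLS of the RIF on the covariates
        regress rif50 educ exper tenure female
        The coefficients are read as the approximate effect of a small shift in the distribution of a covariate on the unconditional median of wage, not as the effect for any single person.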
        Fernando



        • #5
          Hi Fernando,
          thanks for your help. I appreciate it, since I'm new to quantile regressions. Let me try to express myself a bit more clearly. Suppose I want to model the n-th quantile given the three variables (x1, x2, x3). The regression equation is Yi = β(n)'xi + εin, where i indexes the i-th observation, xi is the covariate vector, β(n) is the unknown vector of parameters and εin is the error.
          One assumption is that Qn(εin) = 0 and that εin is uncorrelated. Does this help?

          In the end I want to
          (1) study in-sample and out-of-sample goodness of fit (for example with a probability-probability plot) and compare it to that of an OLS regression, and
          (2) compare (to an OLS regression) how the predicted values react if I stress one covariate (e.g. plugging the vector of parameters and the observed values for x1,x2 as well as a stressed value for x3 in Yi = β(n)'xi + εin to obtain Y_hati)
          I think my first statement about only modelling the distribution was confusing.
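          For goal (1), one possible in-sample check is sketched below, assuming the same example data as earlier in the thread. For each observation it computes the share of the 99 fitted conditional quantiles that lie below the observed wage (pit and pit_cdf are hypothetical names), then plots the empirical distribution of that share against the 45-degree line. If the quantile model fits well, the share should be roughly uniform and the curve should stay close to the diagonal.
          Code:
          use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
          drop if lnwage==.
          drop if exp(lnwage)>100
          gen wage=exp(lnwage)
          * assumption: restrict to complete cases so every prediction exists
          drop if missing(educ, exper, tenure, female)
          gen pit=0
          forvalues q=1/99 {
              local fq=`q'/100
              qui qreg wage educ exper tenure female, q(`fq')
              capture drop aux
              qui predict aux
              * add 1/99 for every fitted quantile below the observed wage
              qui replace pit=pit+(aux<wage)/99
          }
          * empirical CDF of pit, compared with the uniform CDF (45-degree line)
          cumul pit, generate(pit_cdf)
          twoway (line pit_cdf pit, sort) (function y=x, range(0 1)), legend(order(1 "Quantile regression" 2 "45-degree line")) xtitle("Predicted probability") ytitle("Empirical probability")
          An out-of-sample version would fit the 99 regressions on one part of the data and compute pit on the rest. The OLS comparison would additionally need an assumption about the error distribution, which is not shown here.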

          Regards,
          Steffen
