Quantile Regression with Clustered Standard Errors and Factor Variables

Arne RW

Join Date: Jun 2015

Posts: 10
#1

12 Jun 2015, 09:04

Hello everyone,

For my thesis, I'm trying to run a quantile regression on an income variable called yearlyincome for two groups separately. I would like to run this quantile regression with both factor variables and clustered standard errors, as I'm using panel data.

From what I know, there are multiple options for this:

- Using qreg command: this allows for quantiles and factor variables, but not clustered standard errors.
- Using qreg2 command: this allows for quantiles and clustered standard errors, but not factor variables.
- Using reg: this allows for quantiles, clustered standard errors and factor variables. The normal reg command therefore seems the best choice, but you would need to split your income variable into quantiles with the xtile command first.

So basically, I would like to use the normal reg command to do this as it allows everything I want (factor vars and clustered standard errors). I used the xtile command to divide my data into 4 quantiles and that's all fine.

The problem is, I wanted to check that I was doing things right, so I ran both a quantile regression using the qreg command and a regression using the reg command on one of the created quantiles. For clarification, I first created a variable to indicate quantiles:

Code:

xtile Quantiles = yearlyincome, nq(4)

Then, I ran both regressions:

Code:

reg yearlyincome age88 age88sq i.civstatus i.education potworkexp if Quantiles==1
qreg yearlyincome age88 age88sq i.civstatus i.education potworkexp, quantile(.25)

This should yield the same results, right? I figured that if I created 4 quantiles (I want the 25th, 50th and 75th percentiles), quantile no. 1 created with the xtile command is the 25th percentile, right? Hence I thought the two commands above should give exactly the same results, as I'm estimating the first quantile in both.

However, for some reason they don't. The thing is, I have a valid sample size of 13,713 observations. The reg command only takes 2,611 observations into account, while the qreg command takes into account all 13,713 observations.

So from my thinking, that's the reason they're not yielding the same results. But why not? If anyone could tell me, that would be greatly appreciated!

Best,

Last edited by Arne RW; 12 Jun 2015, 09:10.
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#2

12 Jun 2015, 14:56

Dear Arne,

I am afraid you are making a very common mistake. The command -reg- runs an OLS regression, which estimates (an approximation to) the conditional mean. The commands -qreg- and -qreg2- run quantile regressions, which estimate (an approximation to) a conditional quantile. The important thing to keep in mind is that a conditional quantile is not the conditional mean of some sub-sample of your data. In fact, quantiles have nothing to do with means, except that they all provide information on the location of the distribution. Therefore, you really cannot use -reg- to estimate quantiles.

So, if you want to estimate a quantile regression with clustered standard errors you will have to use -qreg2-. You are right in saying that it does not allow you to use factor variables, but that is not a problem because you can start by using -xi- to create all the variables you need, and then use -qreg2- with the variables you just created. Alternatively, you can simply use

Code:

xi: qreg2 yearlyincome age88 age88sq i.civstatus i.education potworkexp, quantile(.25)
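
To add the clustered standard errors asked about in #1, a minimal sketch could look like the lines below. It assumes that -qreg2- has been installed from SSC and that the panel identifier is called personid, which is a hypothetical name because the thread never gives the actual one.

Code:

* qreg2 is a user-written command; install it once from SSC
ssc install qreg2
* personid is a hypothetical name for the panel identifier
xi: qreg2 yearlyincome age88 age88sq i.civstatus i.education potworkexp, quantile(.25) cluster(personid)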

Hope this helps and thanks for your interest in -qreg2-.

Joao

Last edited by Joao Santos Silva; 12 Jun 2015, 14:59.
Comment
Arne RW

Join Date: Jun 2015

Posts: 10
#3

13 Jun 2015, 08:48

Dear Joao,

Thanks very much for your quick and helpful reply.

Now that you mention it, it does indeed seem illogical to use the -reg- command for estimating quantiles, since it estimates means. I was thinking I could use it because I first split the yearlyincome variable into quantiles. But if I understand it correctly now, using the -reg- command with an if statement to run the regression on one quantile would still estimate the mean within that quantile, right?

I've just tried using the -xi- command together with the -qreg2- command and it works perfectly fine.

Thanks again for your help. I didn't know you were one of the creators of -qreg2-, but good job!

Arne
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#4

13 Jun 2015, 10:13

Dear Arne,

What you were doing was estimating the conditional mean for the observations in a certain unconditional quantile, which is unlikely to provide interesting or meaningful results. Surprisingly, a lot of people think that is how quantile regression is performed.
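
The difference can be seen in a small sketch that uses the auto dataset shipped with Stata instead of the income data from #1. The variable name pricegroup is made up for this illustration.

Code:

sysuse auto, clear
* Split the outcome into four groups at its unconditional quartiles
xtile pricegroup = price, nq(4)
* OLS on the lowest group: a conditional mean within a truncated sub-sample (about a quarter of the cars)
regress price weight if pricegroup == 1
* Quantile regression: the conditional .25 quantile, fitted on all the cars
qreg price weight, quantile(.25)

The two commands report different numbers of observations and answer different questions, which is the pattern described in #1.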

Thanks for your feedback on -qreg2-; I am glad you found it useful. If you end up using the clustered standard errors, please cite:

Parente, P.M.D.C. and Santos Silva, J.M.C. (2016), Quantile Regression with Clustered Data, Journal of Econometric Methods, forthcoming.

Best of luck,

Joao
Comment
Arne RW

Join Date: Jun 2015

Posts: 10
#5

18 Jun 2015, 03:58

Dear Joao,

Like you said, many people misinterpret quantile regression. I myself have some trouble interpreting the results as well.

Let's assume, using the formula you provided above, I get coefficients like this (I've left some out as interpretation should be the same for all) for the .25 quantile, assuming dollar amounts:

Code:

_cons 65,932 age88 -4,681 civstatus 19,488 potworkexp 6,290

How would I properly interpret these results (without considering significance, standard errors or CIs)? Does it mean people at the 25th percentile have a yearly income of $65,932 when the rest of the variables are equal to zero? And would a one-unit change in civstatus increase this yearly income by $19,488 for people at the 25th percentile? Or does that hold for people up to and including the 25th percentile, in other words, the entire range from 0 to the 25th percentile? I would guess these are the estimates for people at the 25th percentile, but I would like to be sure.

Thanks in advance, and I will cite you properly in my final thesis.
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#6

18 Jun 2015, 15:08

Dear Arne,

Think about how you would interpret the OLS results: what you get are estimates of (an approximation to) the conditional mean; the intercept tells you the location of the mean when the regressors are zero, and the slopes tell you how the mean changes when you change the regressors. So, all of this is about the mean of the distribution, not about people at the mean (for instance, for a binary variable the mean is generally between 0 and 1, and therefore the variable is never equal to the mean).

For quantile regression the reasoning is the same: you get information about where the quantile is located and how it shifts with the regressors. So, it is not about people in a certain region; it is about the location of the quantile.
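
As a rough sketch of this reading, again with the auto dataset shipped with Stata rather than the income data, the stored coefficients can be combined into a fitted quantile. The weight value of 3000 is an arbitrary choice for the illustration.

Code:

sysuse auto, clear
qreg price weight, quantile(.25)
* _b[_cons]: estimated .25 quantile of price when weight is zero
* _b[weight]: estimated shift in the .25 quantile of price for a one-unit increase in weight
* Fitted .25 quantile of price for cars weighing 3000 lbs
display _b[_cons] + _b[weight]*3000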

One final note about terminology: for a continuous variable, the quantile is a point, not a range. For example, for a uniform (0,1) variable, the first quartile is 0.25, not the observations between 0 and 0.25, OK?
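
The uniform example can be checked with a short simulation. This is only a sketch; the seed, the sample size and the scalar name q1 are arbitrary choices.

Code:

clear
set seed 12345
set obs 100000
generate u = runiform()
* The first quartile is a single number, close to 0.25 in a large sample
_pctile u, percentiles(25)
scalar q1 = r(r1)
display scalar(q1)
* The observations at or below that point form a range (about a quarter of the sample), not the quartile itself
count if u <= scalar(q1)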

All the best,

Joao
Comment
Marcos Gonzalez

Join Date: Nov 2015

Posts: 36
#7

30 Nov 2015, 03:36

Dear Joao,

I would like to know how to interpret the Parente-Santos Silva test for intra-cluster correlation. What do I need from this test? Do I want to accept or reject the null hypothesis? It is not clear to me. I would appreciate your answer.

Thanks in advance
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#8

30 Nov 2015, 06:55

Hi Marcos,

In the future, please open a new thread for a new question, OK?

Anyway, the null hypothesis of the test is that there is no intra-cluster correlation. Whether you want to accept it or not is up to you.

Please do let me know if you have further questions,

Joao
Comment
Marcos Gonzalez

Join Date: Nov 2015

Posts: 36
#9

30 Nov 2015, 16:15

Thanks a lot for your answer. I was not sure whether I should ask again in this post or open a new thread. I am analyzing the determinants of debt maturity for a set of European countries, so I am using quantile regressions clustered by country. From an econometric point of view, I do not know whether the existence of intra-cluster correlation is good or bad in my case. In all cases I obtain p-values that reject the null hypothesis, so there is intra-cluster correlation, and I do not know how to interpret that.

Thanks in advance
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#10

30 Nov 2015, 16:44

If you have intra-cluster correlation it is safer to use the cluster-robust standard errors. That's essentially what you have to do. You may also try to reduce or eliminate the correlation by changing the specification of the model, if that is feasible.
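
A minimal sketch of the comparison, assuming -qreg2- is installed. The names debtmaturity, size, leverage and country are hypothetical stand-ins for the variables in your data, which are not given in the thread.

Code:

* Without cluster(): standard errors that do not allow for correlation within countries
qreg2 debtmaturity size leverage, quantile(.5)
* With cluster(): cluster-robust standard errors, clustering on country
qreg2 debtmaturity size leverage, quantile(.5) cluster(country)

The coefficient estimates should be the same in both runs; only the standard errors and the inference based on them change.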

All the best,

Joao
Comment
Fortune Ganda

Join Date: Jan 2017

Posts: 3
#11

15 Jan 2017, 21:36

Hello everyone

I also need help. I am running a panel quantile regression and I am getting only the coefficients of the independent variables. The p-values, standard errors and confidence intervals are not generated.

I used this command:

Code:

qregpd LogEmissions LogGDP LogGDP2 Indu manu Trade Apopn, id(Country) fix(Year) quantile(0.7)

Please help. Is this the right command?
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#12

16 Jan 2017, 11:10

Dear Fortune,

You should open a new thread for this because you are talking about something totally different.

Best wishes,

Joao
Comment
Aamina Khurraa

Join Date: Sep 2019

Posts: 87
#13

26 Sep 2019, 03:52

Dear Joao

In #2 you mentioned the xi: qreg2 code. Could you explain what the xi prefix means here?
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#14

26 Sep 2019, 14:01

It allows the use of factor notation.
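
As a small illustration with the auto dataset shipped with Stata (not the data from #1, and assuming -qreg2- is installed):

Code:

sysuse auto, clear
* xi expands i.rep78 into indicator variables named _Irep78_2 to _Irep78_5 (the first category is omitted)
xi i.rep78
describe _Irep78_*
* The prefix form does the expansion and runs the command in one step
xi: qreg2 price weight i.rep78, quantile(.25)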
Comment
Mirzakishi Aliyev

Join Date: Feb 2020

Posts: 8
#15

04 Mar 2020, 02:55

Good morning, Prof. Santos Silva. I have the following code to run a quantile regression, but when I try to run it for the 0.1 quantile it returns an error. The code and the error are given below. I would appreciate any help. Thanks in advance!

global prodvars logY3a logY4a logY3asq logY4asq logY34a logY3apk logY4apk logY3apl logY4apl logY3aequity_ logY4aequity_ logpk logpl logequity_ logpksq logplsq logequity_sq logpkpl logequity_pk logequity_pl

global envars logOffBalance Z1a Z2 Z12 Z13 Z14 Z15 NPLsh2001 NPLsh2002 NPLsh2003 NPLsh2004 NPLsh2005 NPLsh2006 NPLsh2007 NPLsh2008 NPLsh2009 NPLsh2010 NPLsh2011 NPLsh2012 NPLsh2013

global annvars fukushima2002 fukushima2003 fukushima2004 fukushima2005 fukushima2006 fukushima2007 fukushima2008 fukushima2009 fukushima2010 fukushima2011 fukushima2012 fukushima2013 d2002 d2003 d2004 d2005 d2006 d2007 d2008 d2009 d2010 d2011 d2012 d2013

global ylist logtc

global xlist $prodvars $envars $annvars

describe $ylist $xlist
summarize $ylist $xlist

qreg2 $ylist $xlist, quantile (.1)

convergence not achieved.
VCE computation failed; try increasing the maximum number of iterations or try bsqreg
Comment
