Quantile regression and interaction term between continuous variable and a categorical variable

Simone Angioloni

Join Date: May 2014

Posts: 48
#1

03 Jul 2018, 08:57

Hi,

I am having trouble understanding how Stata handles interaction terms between a continuous variable and a categorical variable in quantile regression (qreg).

My model looks like:

Y_i = B0 + B1*D1_i + B2*D2_i + B3*D3_i + B4*D1_i*X_i + B5*D2_i*X_i + B6*D3_i*X_i + E_i, i = 1, ..., n

where B0 through B6 are the coefficients, D1, D2, D3 are three binary variables such that D1_i + D2_i + D3_i = 1 for every i, and X_i is the continuous variable.

If I run the previous model with regress y i.D c.X#i.D

Stata drops the first binary variable, D1, and estimates all the other coefficients, in particular B4, B5, and B6. If I remember correctly, this is possible because there is no perfect collinearity between the intercept and the sum of the three interaction terms, that is, D1_i*X_i + D2_i*X_i + D3_i*X_i.

However, when I run xi: qreg y i.D c.X#i.D

1) Stata drops the first interacted term, that is D1_i*X_i.
2) Stata estimates the second interacted term, that is D2_i*X_i, in every state. In other words, Stata estimates two different coefficients for D2_i*X_i, one when D2_i=0 and one when D2_i=1.
3) Stata estimates the third interacted term, that is D3_i*X_i, when D3_i=0 and omits the case when D3_i=1.

Any help would be greatly appreciated.

Simone
Clyde Schechter

Join Date: Apr 2014

Posts: 30191
#2

03 Jul 2018, 09:13

This has nothing to do with -qreg-. Your model is mis-specified. Including the interaction terms between the D indicators and X without also including X is an invalid model. So you need:

Code:

whatever_regression_command_you_like y c.X##i.D

The use of the ## operator (not #) will cause Stata to include the "main effects" of X and D along with the interactions. When it does that, it will also find the collinearities that will lead it to omit one D indicator and also omit the interaction between X and that indicator.
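
A minimal sketch of the ## expansion described above. It uses the auto dataset that ships with Stata purely as a stand-in (price for y, mpg for X, foreign for D); those variable choices are assumptions for illustration and are not the original poster's data.

```stata
* Illustration only: auto.dta stands in for the poster's data
sysuse auto, clear

* c.mpg##i.foreign expands to mpg, 1.foreign, and 1.foreign#c.mpg;
* the base level 0.foreign and its interaction with mpg are omitted
qreg price c.mpg##i.foreign

* The same model with the main effects written out by hand
qreg price c.mpg i.foreign c.mpg#i.foreign
```

Both commands should report the same coefficients, since a##b is shorthand for a b a#b.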
Simone Angioloni

Join Date: May 2014

Posts: 48
#3

03 Jul 2018, 09:49

Hi Clyde,

Thank you very much for your points.

I was aware that including X is required in cases like #1. However, I did not do so because my theoretical model implies the specification given in #1. Does this change anything with respect to #2?

Simone
Clyde Schechter

Join Date: Apr 2014

Posts: 30191
#4

03 Jul 2018, 10:26

Then I think there is something wrong with your theoretical model. That kind of model specification is inherently invalid. You apparently have some theory which suggests that the "main effect" of X will have a zero coefficient, i.e. the coefficient of X will be zero in the reference category of D. But that cannot be right, because it is completely contingent on which category of D you choose as the reference category.
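
The reference-category point can be written out as a short sketch. It assumes D1 is the reference category and that the interaction with D1 is omitted along with the D1 indicator; the Greek symbols are notation introduced here for illustration and do not appear in #1.

```latex
\begin{align*}
Y_i &= \alpha + \beta X_i + \gamma_2 D_{2i} + \gamma_3 D_{3i}
      + \delta_2 D_{2i} X_i + \delta_3 D_{3i} X_i + \varepsilon_i \\[4pt]
\text{slope of } X &=
\begin{cases}
\beta            & \text{if } D_{1i} = 1 \\
\beta + \delta_2 & \text{if } D_{2i} = 1 \\
\beta + \delta_3 & \text{if } D_{3i} = 1
\end{cases}
\end{align*}
```

Under these assumptions, leaving X out of the model amounts to imposing \beta = 0, so the slope in the reference category is forced to zero. Picking D2 as the reference instead would force the zero slope onto the D2 group, which is a different model.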

Now, it may be that you have a theory that says that the effect of X will be zero when D = some specific category (as opposed to whichever category happens to be the reference category in your data). In that case, you still run the regression including the main effect of X. And then you can test the theory by seeing whether the effect of X in that category actually is zero (or very close to it). Or you can impose that as a constraint on your model if you really believe it. (See -cnsreg-).
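
A hedged sketch of the two options just described, again with the auto dataset as a stand-in (price, mpg, and foreign are illustrative choices, not the poster's variables). It uses least-squares commands, because -cnsreg- fits constrained linear regression rather than quantile regression.

```stata
* Illustration only: auto.dta stands in for the poster's data
sysuse auto, clear

* Fit the full model, including the main effect of mpg
regress price c.mpg##i.foreign

* Slope of mpg in the foreign == 1 group: main effect plus interaction.
* The test reported by lincom is of whether this slope equals zero.
lincom mpg + 1.foreign#c.mpg

* Alternatively, impose a zero slope in that group as a constraint
constraint 1 mpg + 1.foreign#c.mpg = 0
cnsreg price c.mpg##i.foreign, constraints(1)
```

The constraint names a specific category (foreign == 1) instead of relying on whichever level happens to be the base.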
Masa Mihailovic

Join Date: Aug 2019

Posts: 4
#5

20 Aug 2019, 06:03

Hello Clyde,

I have a question related to this.
I am trying to see whether there is any evidence of avoidance behavior in rooms with exceptionally high CO2.
I have a very large sample, and an OLS regression simply shows a positive relationship between CO2 and occupancy. However, I want to explore the non-linear relationship by creating quintiles of the CO2 variable and seeing if the relationship becomes negative in the final quintile where CO2 is highest.
When I do the code:

xtile co2Q = co2, n(10)
reg occupancy i.co2Q, robust

I find that the coefficient gets more and more positive. This is not what I was expecting.

However, when I do the interactions like this:

reg occupancy c.co2##i.co2Q

the "main effects" are now more and more negative with each quantile.
The interaction coefficients, on the other hand, get more positive.

I am not sure why the main effects change so dramatically when I include the interaction. I am also not entirely sure how to interpret such findings.

Can you please give me your input on this?

Thank you,
Masa
FernandoRios

Join Date: Apr 2014

Posts: 2495
#6

20 Aug 2019, 06:30

Hi Masa
Just a couple of comments on what you are trying to do.
First of all, this is NOT quantile regression, as qreg refers to models where the heterogeneity comes from variation across the distribution of the dependent, not the independent, variable.
Second of all, your model with dummies is already giving you the right relationship between CO2 and occupancy. You could make the model more flexible (if you have Stata 16 you could look into npregress series, or npregress kernel if you have Stata 15 or above.)
If the only variable of interest is CO2, you could also just use lpoly, and see if you still observe that positive relationship between both.
From an econometric/economic point of view, perhaps the problem you are facing is an endogeneity problem caused by reverse causality or omitted variables.
HTH
Fernando
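
A minimal sketch of the commands suggested in this post, using the auto dataset as a stand-in (price for occupancy and mpg for CO2 are assumptions for illustration only).

```stata
* Illustration only: auto.dta stands in for the poster's data
sysuse auto, clear

* Local polynomial smooth of the outcome on one regressor, with a confidence band
lpoly price mpg, ci

* Kernel nonparametric regression (Stata 15 or later)
npregress kernel price mpg

* Series nonparametric regression (Stata 16 or later)
npregress series price mpg
```

With control variables, further regressors can be added to the npregress commands, while lpoly handles a single regressor.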
Masa Mihailovic

Join Date: Aug 2019

Posts: 4
#7

20 Aug 2019, 07:57

Thanks for your comments Fernando.
I do have more variables of interest as well as a number of control variables too. Also, I am using Stata 16, so I will try out what you recommended.

I have tried to restrict the dataset to only observations where CO2 is exceptionally high (I see this from the normal distribution). The regression output using this sample does indicate a negative coefficient for CO2. However, I believe my dataset is just too big to capture this 'avoidance behavior' when I try non-linear regressions using the entire dataset.
Thanks for your help anyways!