  • Constant Term in Regressions with Multiple Dummies and Interaction

    I’m working with a regression that includes multiple interaction terms, and I’m trying to understand how Stata calculates the constant (_cons) when the model includes several categorical variables and their interactions.

    I have variables y (outcome), `a' and `g', which are both binary dummies, and `r', which takes values {1, 2, 3, 4}.

    Data Setup
    Code:
    mem_id    y    a    g    r
    1    25000    0    1    1
    1    10000    1    1    1
    1    20000    0    1    2
    1    10000    1    1    2
    1    15000    0    1    3
    1    5000    1    1    3
    1    15000    0    1    4
    1    15000    1    1    4
    2    20000    0    1    1
    2    10000    1    1    1
    2    5000    0    1    2
    2    0    1    1    2
    2    5000    0    1    3
    2    5000    1    1    3
    2    5000    0    1    4
    2    15000    1    1    4
    Code:
    . sum y a g r

        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
               y |      7,288    24578.49    12550.92          0      35000
               a |      7,288          .5    .5000343          0          1
               g |      7,288    .5192097    .4996651          0          1
               r |      7,288         2.5    1.118111          1          4
    I want to understand how Stata computes the constant (_cons) and what the base category is in a regression like:
    Code:
    reg y i.a i.a#i.r i.r i.r#i.g i.g
    But before jumping into this, I stepped back and checked with simpler models to understand how _cons behaves.

    Code:
    **# (1) Reg w dummy a
    reg y a 
    sum y if a==0 // recovers _cons = 24240.67
    
    sum y if a == 1
    local m1 = r(mean)
    sum y if a == 0 
    local m0 = r(mean)
    di `m1'-`m0' // recovers _b[a] 675.6312
    
    
    **# (2) Reg w dummy g
    reg y g 
    sum y if g==0 // recovers _cons = 24894.41
    
    sum y if g == 1
    local m1 = r(mean)
    sum y if g == 0 
    local m0 = r(mean)
    di `m1'-`m0' // recovers _b[g] -608.4656
    
    
    **# (3) Reg w dummies & interaction
    reg y i.a##i.g
    sum y if a == 0 & g == 0 // recovers _cons = 24791.67
    
    qui sum y if a == 1 & g == 0
    local m10 = r(mean)
    qui sum y if a == 0 & g == 0
    local m00 = r(mean)
    di `m10'-`m00' // recovers _b[a] = 205.4795
    
    qui sum y if a == 0 & g == 1
    local m01 = r(mean)
    di `m01'-`m00' // recovers _b[g] = -1061.223
    
    qui sum y if a == 1 & g == 1 
    local m11 = r(mean)
    di (`m11'-`m10')-(`m01'-`m00') // recovers _b[1.a#1.g] = 905.5142
    
    
    **# (4) Reg w dummies
    reg y i.a i.g
    _b[1.a] matches the coefficient obtained in (1), and _b[1.g] matches the coefficient obtained in (2).
    But I don't understand how the constant in this regression is calculated; it is not recovered by -sum y if a == 0 & g == 0-.
    So, what exactly is the base category here, and how should it be interpreted? Is there a simple way to calculate it manually, as in (1), (2), and (3)?
    I'm hoping that once I understand what this constant represents, and how it is calculated, I will be better able to interpret the coefficients in terms of base-category means in the full regression: -reg y i.a i.a#i.r i.r i.r#i.g i.g-

    [Attached screenshots of the regression output: reg_ya.png, reg_yg.png, reg_yag_int.png, reg_yag_dum.png]

  • #2
    Ayesha, I think you have confused what the linear regression model implies for the predicted values of y with the sample average of y.

    The general implication of the first-order conditions for linear regression is that the estimated constant will be equal to the predicted value of y when all the right-hand side variables are set to zero, not the sample average of y. The latter is what you are getting from the sum command. What you need instead are the predicted values, which you can get using the predict command. In general, the only relevant implication of linear regression is that the predicted value of y will equal the sample average of y at one specific point: when the explanatory variables are all set to their respective sample averages; it is not generally true at arbitrary values of the explanatory variables. (In a simple linear regression, the line of best fit will NOT generally pass through all the points that represent the sample averages of y for each possible value of x, and that includes the particular case of x = 0.)
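
    To make this concrete, here is a minimal sketch (using your variable names, with factor-variable notation as in your model (4)) showing that the constant is the model's linear prediction at a = 0 and g = 0, which -margins- recovers but -sum- generally does not:
    Code:
    * the constant is the model's prediction at a = 0, g = 0,
    * not the subsample mean of y in that cell
    reg y i.a i.g
    margins, at(a = 0 g = 0)     // linear prediction at the base cell = _b[_cons]
    di _b[_cons]
    sum y if a == 0 & g == 0     // subsample mean; generally different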

    The reason your method seems to work is the very special case of using binary variables. When you run a linear regression on just one binary variable, this is effectively a fully nonparametric model: you have just two points (x = 0 and x = 1) at which you are looking for predictions, and the regression ends up predicting that y equals its sample average at each of those points. Similarly, with two binary explanatory variables, model (3) is effectively a fully nonparametric (saturated) model, so once again the predicted values turn out to be the respective sample averages. However, this is not true of model (4), where you are effectively imposing the restriction that the response of y to a (respectively, g) does not depend on g (respectively, a). That is no longer a fully nonparametric estimation, so you do not end up with predicted values that are just the respective cell averages.
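
    One way to check this in your data (again just a sketch, using your variable names) is to compare the fitted values from the saturated model (3) and the additive model (4) with the a-by-g cell means of y:
    Code:
    * saturated model: fitted values reproduce the four a-by-g cell means of y
    reg y i.a##i.g
    predict double yhat_sat, xb

    * additive model: fitted values are restricted and generally differ from the cell means
    reg y i.a i.g
    predict double yhat_add, xb

    egen cell = group(a g)
    tabstat y yhat_sat yhat_add, by(cell) nototal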
    Last edited by Hemanshu Kumar; 10 May 2025, 02:22.



    • #3
      That makes sense — thanks for clarifying the difference between predicted values and the sample mean.

      the estimated constant will be equal to the predicted value of y, when all the right-hand side variables are set to zero, not the sample average of y.
      Does this mean that even in a parametric model, the constant is still some weighted version of the sample mean of `y` when all covariates are set to zero?

      Also, how should we interpret the coefficients relative to the constant in a regression like (4)?

      I’ve seen some analyses where the control group mean is reported alongside the regression table, presumably to help interpret treatment effects. For a regression like (4), what would be the appropriate control group mean? Should it be the mean of `y` when the treatment variable of interest is `a == 0`, ignoring the time variable (`g`) and the control variable (`r`)?



      • #4
        Originally posted by Ayesha Ahmed
        Does this mean that even in a parametric model, the constant is still some weighted version of the sample mean of `y` when all covariates are set to zero?
        No, in general it is neither a weighted version of the sample mean of y nor the mean of y over some subgroup. The formula should be easy to find in any elementary econometrics or statistics book that covers linear regression. In matrix form, the vector of linear regression coefficients equals (X'X)^(-1)(X'Y).
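
        As a check, here is a small sketch in Mata (using your variable names, and assuming no missing values so that the samples coincide) that builds the design matrix with a column of ones and applies that formula; the last entry of b is the constant and matches _b[_cons] from -regress-:
        Code:
        reg y a g
        mata:
            y = st_data(., "y")
            X = (st_data(., ("a", "g")), J(rows(y), 1, 1))   // columns: a, g, constant
            b = invsym(X'*X) * (X'*y)                        // (X'X)^(-1) X'Y
            b'                                               // coefficients on a, g, and the constant
        end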

        Also, how should we interpret the coefficients relative to the constant in a regression like (4)?
        For these kinds of exercises, it is instructive to write down the population regression function, take expectations at various combinations of the explanatory variables, and back out the values of the population parameters from those. The regression coefficients are then simply the least squares estimators of those parameters.

        Here, writing the population regression function as
        E[Y | a, g] = b0 + b1*a + b2*g

        we can see that

        E[Y | a = 0, g = 0] = b0
        E[Y | a = 0, g = 1] = b0 + b2
        E[Y | a = 1, g = 0] = b0 + b1
        E[Y | a = 1, g = 1] = b0 + b1 + b2


        Thus
        b1 = E[Y | a = 1, g = 0] - E[Y | a = 0, g = 0] = E[Y | a = 1, g = 1] - E[Y | a = 0, g = 1]

        Thus b1 is the marginal impact of a on the expected value of y (in your case, the average treatment effect), holding g constant, and it is the same for either value of g. The regression coefficient is the corresponding least squares estimator of that parameter.
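
        In Stata, a quick way to see this (just a sketch with your variable names) is that, because there is no interaction in model (4), the estimated marginal effect of a is the same at both values of g and equals the coefficient on 1.a:
        Code:
        reg y i.a i.g
        margins, dydx(a) at(g = (0 1))   // identical at g = 0 and g = 1, equal to _b[1.a]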
        Last edited by Hemanshu Kumar; 11 May 2025, 03:10.



        • #5
          Thanks so much for the explanation!
