I’m working with a regression that includes multiple interaction terms, and I’m trying to understand how Stata calculates the constant (_cons) when the model includes several categorical variables and their interactions.
I have an outcome y, two binary dummies a and g, and a categorical variable r = {1, 2, 3, 4}.
Data Setup
Code:
mem_id      y   a   g   r
     1  25000   0   1   1
     1  10000   1   1   1
     1  20000   0   1   2
     1  10000   1   1   2
     1  15000   0   1   3
     1   5000   1   1   3
     1  15000   0   1   4
     1  15000   1   1   4
     2  20000   0   1   1
     2  10000   1   1   1
     2   5000   0   1   2
     2      0   1   1   2
     2   5000   0   1   3
     2   5000   1   1   3
     2   5000   0   1   4
     2  15000   1   1   4
Code:
. sum y a g r

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
           y |      7,288    24578.49    12550.92          0      35000
           a |      7,288          .5    .5000343          0          1
           g |      7,288    .5192097    .4996651          0          1
           r |      7,288         2.5    1.118111          1          4
I want to understand how Stata computes the constant, and what the base category is, in a regression like:
Code:
reg y i.a i.a#i.r i.r i.r#i.g i.g
Before jumping into this, I stepped back and checked simpler models to understand how _cons behaves.
Code:
**# (1) Reg w dummy a
reg y a
sum y if a == 0                  // recovers _cons = 24240.67
sum y if a == 1
local m1 = r(mean)
sum y if a == 0
local m0 = r(mean)
di `m1'-`m0'                     // recovers _b[a] = 675.6312

**# (2) Reg w dummy g
reg y g
sum y if g == 0                  // recovers _cons = 24894.41
sum y if g == 1
local m1 = r(mean)
sum y if g == 0
local m0 = r(mean)
di `m1'-`m0'                     // recovers _b[g] = -608.4656

**# (3) Reg w dummies & interaction
reg y i.a##i.g
sum y if a == 0 & g == 0         // recovers _cons = 24791.67
qui sum y if a == 1 & g == 0
local m10 = r(mean)
qui sum y if a == 0 & g == 0
local m00 = r(mean)
di `m10'-`m00'                   // recovers _b[1.a] = 205.4795
qui sum y if a == 0 & g == 1
local m01 = r(mean)
di `m01'-`m00'                   // recovers _b[1.g] = -1061.223
qui sum y if a == 1 & g == 1
local m11 = r(mean)
di (`m11'-`m10')-(`m01'-`m00')   // recovers _b[1.a#1.g] = 905.5142

**# (4) Reg w dummies
reg y i.a i.g
In (4), _b[1.a] matches the coefficient obtained in (1), and _b[1.g] matches the coefficient obtained in (2).
But I don't understand how the constant in (4) is calculated: it is not the mean from `sum y if a == 0 & g == 0'.
So what exactly is the base category here, and how should I interpret it? Is there a simple way to calculate it manually, as in (1), (2), and (3)?
I'm hoping that if I understand what this constant represents, or how it is calculated, I'll be better able to interpret the coefficients in terms of base-category means in the full regression: reg y i.a i.a#i.r i.r i.r#i.g i.g
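For reference, here is a sketch of the checks I think are relevant, in case someone can confirm or correct the reasoning. It relies on two general OLS facts rather than anything specific to my data: a regression with a constant passes through the sample means, and _cons is the fitted value when every factor variable is at its base level. I am assuming Stata's default base levels (a = 0, g = 0, r = 1). Because (4) has no a#g term, that fitted value need not equal the raw cell mean.

```stata
* Sketch only: assumes default base levels (a = 0, g = 0, r = 1)

**# (4) additive model: two ways to look at _cons
reg y i.a i.g

* (i) OLS with a constant passes through the sample means, so
*     _cons = mean(y) - _b[1.a]*mean(a) - _b[1.g]*mean(g)
qui sum y if e(sample)
local ybar = r(mean)
qui sum a if e(sample)
local abar = r(mean)
qui sum g if e(sample)
local gbar = r(mean)
di `ybar' - _b[1.a]*`abar' - _b[1.g]*`gbar'   // should reproduce _b[_cons]
di _b[_cons]

* (ii) _cons is the model's fitted value at a = 0 and g = 0,
*      which differs from the raw cell mean unless a#g is included
margins, at(a=0 g=0)

**# Full model: display the base levels and the fitted value there
reg y i.a i.a#i.r i.r i.r#i.g i.g, allbaselevels
margins, at(a=0 r=1 g=0)                      // should equal _b[_cons]
```

If that is right, then the constant in the full regression is a model-based prediction for a = 0, r = 1, g = 0, not the raw mean of y in that cell, since the model omits a#g and the three-way interaction.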