  • Constant Term in Regressions with Multiple Dummies and Interaction

    I’m working with a regression that includes multiple interaction terms, and I’m trying to understand how Stata calculates the constant (_cons) when the model includes several categorical variables and their interactions.

    I have variables y (outcome), `a' and `g', which are both binary dummies, and `r', which takes values {1, 2, 3, 4}.

    Data Setup
    Code:
    mem_id    y    a    g    r
    1    25000    0    1    1
    1    10000    1    1    1
    1    20000    0    1    2
    1    10000    1    1    2
    1    15000    0    1    3
    1    5000    1    1    3
    1    15000    0    1    4
    1    15000    1    1    4
    2    20000    0    1    1
    2    10000    1    1    1
    2    5000    0    1    2
    2    0    1    1    2
    2    5000    0    1    3
    2    5000    1    1    3
    2    5000    0    1    4
    2    15000    1    1    4
    Code:
    . sum y a g r

        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
               y |      7,288    24578.49    12550.92          0      35000
               a |      7,288          .5    .5000343          0          1
               g |      7,288    .5192097    .4996651          0          1
               r |      7,288         2.5    1.118111          1          4
    I want to understand how Stata computes the constant (_cons) and what the base category is in a regression like:
    Code:
    reg y i.a i.a#i.r i.r i.r#i.g i.g
    But before jumping into this, I stepped back and checked with simpler models to understand how _cons behaves.

    Code:
    **# (1) Reg w dummy a
    reg y a 
    sum y if a==0 // recovers _cons = 24240.67
    
    sum y if a == 1
    local m1 = r(mean)
    sum y if a == 0 
    local m0 = r(mean)
    di `m1'-`m0' // recovers _b[a] 675.6312
    
    
    **# (2) Reg w dummy g
    reg y g 
    sum y if g==0 // recovers _cons = 24894.41
    
    sum y if g == 1
    local m1 = r(mean)
    sum y if g == 0 
    local m0 = r(mean)
    di `m1'-`m0' // recovers _b[g] -608.4656
    
    
    **# (3) Reg w dummies & interaction
    reg y i.a##i.g
    sum y if a == 0 & g == 0 // recovers _cons = 24791.67
    
    qui sum y if a == 1 & g == 0
    local m10 = r(mean)
    qui sum y if a == 0 & g == 0
    local m00 = r(mean)
    di `m10'-`m00' // recovers _b[a] = 205.4795
    
    qui sum y if a == 0 & g == 1
    local m01 = r(mean)
    di `m01'-`m00' // recovers _b[g] = -1061.223
    
    qui sum y if a == 1 & g == 1 
    local m11 = r(mean)
    di (`m11'-`m10')-(`m01'-`m00') // recovers _b[1.a#1.g] = 905.5142
    
    
    **# (4) Reg w dummies
    reg y i.a i.g
    _b[1.a] matches the coefficient obtained in (1), and _b[1.g] matches the coefficient obtained in (2).
    But I don't understand how the constant in this regression is calculated; it is not recovered by -sum y if a == 0 & g == 0-.
    So, what exactly is the base category here, and how should it be interpreted? Is there a simple way to calculate it manually, as in (1), (2), and (3)?
    I'm hoping that once I understand what this constant represents, and how it is calculated, I will be better able to interpret the coefficients in terms of base-category means in the full regression: -reg y i.a i.a#i.r i.r i.r#i.g i.g-

    [Attached screenshots of the regression output: reg_ya.png, reg_yg.png, reg_yag_int.png, reg_yag_dum.png]

  • #2
    Ayesha, I think you have confused what the linear regression model implies for the predicted values of y with the sample average of y.

    The general implication of the first-order conditions for linear regression is that the estimated constant will be equal to the predicted value of y when all the right-hand side variables are set to zero, not the sample average of y. The latter is what you are getting from the sum command. What you need instead are the predicted values, which you can get using the predict command. In general, the only relevant implication of linear regression is that the predicted value of y will equal the sample average of y at one specific point: when the explanatory variables are all set to their respective sample averages; it is not generally true at arbitrary values of the explanatory variables. (In a simple linear regression, the line of best fit will NOT generally pass through all the points that represent the sample averages of y for each possible value of x, and that includes the particular case of x = 0.)
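
    To make this concrete, here is a minimal sketch (using your variable names, with factor-variable notation as in your model (4)) showing that the constant is the model's linear prediction at a = 0 and g = 0, which -margins- recovers but -sum- generally does not:
    Code:
    * the constant is the model's prediction at a = 0, g = 0,
    * not the subsample mean of y in that cell
    reg y i.a i.g
    margins, at(a = 0 g = 0)     // linear prediction at the base cell = _b[_cons]
    di _b[_cons]
    sum y if a == 0 & g == 0     // subsample mean; generally different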

    The reason your method seems to work is the very special case of using binary variables. When you run a linear regression on just one binary variable, this is effectively a fully nonparametric model: you have just two points (x = 0 and x = 1) at which you are looking for predictions, and the regression ends up predicting that y equals its sample average at each of those points. Similarly, with two binary explanatory variables, model (3) is effectively a fully nonparametric (saturated) model, so once again the predicted values turn out to be the respective sample averages. However, this is not true of model (4), where you are effectively imposing the restriction that the response of y to a (respectively, g) does not depend on g (respectively, a). That is no longer a fully nonparametric estimation, so you do not end up with predicted values that are just the respective cell averages.
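
    One way to check this in your data (again just a sketch, using your variable names) is to compare the fitted values from the saturated model (3) and the additive model (4) with the a-by-g cell means of y:
    Code:
    * saturated model: fitted values reproduce the four a-by-g cell means of y
    reg y i.a##i.g
    predict double yhat_sat, xb

    * additive model: fitted values are restricted and generally differ from the cell means
    reg y i.a i.g
    predict double yhat_add, xb

    egen cell = group(a g)
    tabstat y yhat_sat yhat_add, by(cell) nototal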
    Last edited by Hemanshu Kumar; 10 May 2025, 02:22.



    • #3
      That makes sense — thanks for clarifying the difference between predicted values and the sample mean.

      the estimated constant will be equal to the predicted value of y, when all the right-hand side variables are set to zero, not the sample average of y.
      Does this mean that even in a parametric model, the constant is still some weighted version of the sample mean of `y` when all covariates are set to zero?

      Also, how should we interpret the coefficients relative to the constant in a regression like (4)?

      I’ve seen some analyses where the control group mean is reported alongside the regression table, presumably to help interpret treatment effects. For a regression like (4), what would be the appropriate control group mean? Should it be the mean of `y` when the treatment variable of interest is `a == 0`, ignoring the time variable (`g`) and the control variable (`r`)?



      • #4
        Originally posted by Ayesha Ahmed
        Does this mean that even in a parametric model, the constant is still some weighted version of the sample mean of `y` when all covariates are set to zero?
        No, in general it is neither a weighted version of the sample mean of y nor the mean of y over some subgroup. The formula should be easy to find in any elementary econometrics or statistics book that covers linear regression. In matrix form, the vector of linear regression coefficients equals (X'X)^(-1)(X'Y).
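
        As a check, here is a small sketch in Mata (using your variable names, and assuming no missing values so that the samples coincide) that builds the design matrix with a column of ones and applies that formula; the last entry of b is the constant and matches _b[_cons] from -regress-:
        Code:
        reg y a g
        mata:
            y = st_data(., "y")
            X = (st_data(., ("a", "g")), J(rows(y), 1, 1))   // columns: a, g, constant
            b = invsym(X'*X) * (X'*y)                        // (X'X)^(-1) X'Y
            b'                                               // coefficients on a, g, and the constant
        end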

        Also, how should we interpret the coefficients relative to the constant in a regression like (4)?
        For these kinds of exercises, it is instructive to write down the population regression function, take expectations at various combinations of the explanatory variables, and back out the values of the population parameters from those. The regression coefficients are then simply the least squares estimators of those parameters.

        Here, writing the population regression function as
        E[Y | a, g] = b0 + b1*a + b2*g

        we can see that

        E[Y | a = 0, g = 0] = b0
        E[Y | a = 0, g = 1] = b0 + b2
        E[Y | a = 1, g = 0] = b0 + b1
        E[Y | a = 1, g = 1] = b0 + b1 + b2


        Thus
        b1 = E[Y | a = 1, g = 0] - E[Y | a = 0, g = 0] = E[Y | a = 1, g = 1] - E[Y | a = 0, g = 1]

        Thus b1 is the marginal impact of a on the expected value of y (in your case, the average treatment effect), holding g constant, and it is the same for either value of g. The regression coefficient is the corresponding least squares estimator of that parameter.
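
        In Stata, a quick way to see this (just a sketch with your variable names) is that, because there is no interaction in model (4), the estimated marginal effect of a is the same at both values of g and equals the coefficient on 1.a:
        Code:
        reg y i.a i.g
        margins, dydx(a) at(g = (0 1))   // identical at g = 0 and g = 1, equal to _b[1.a]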
        Last edited by Hemanshu Kumar; 11 May 2025, 03:10.



        • #5
          Thanks so much for the explanation!
