How to estimate a constrained linear regression

Chinmay Sharma

Join Date: Nov 2015

Posts: 351
#1


23 Mar 2016, 08:45

I am currently estimating a linear regression in which I regress my outcome variable on a full set of dummy variables. More specifically, I have transactions data with buyer and seller dummies, and I am regressing loans on dummy variables corresponding to buyers and sellers. As a result, transaction(i,j) is a function of I buyer dummies and J seller dummies, which take the value 1 when the buyer is i and the seller is j. Of course, I cannot estimate coefficients on all the dummies, because the buyer dummies sum to 1 and so do the seller dummies (even after removing the constant). Alternatively, I can include a constant and impose 2 constraints: the coefficients on the buyer dummies sum to 0, and the coefficients on the seller dummies sum to 0.

I created the dummy variables with -tabulate buyer, gen(buyerdummy)- for the buyers and the corresponding command for the sellers (Stata 13.0).

I understand I have to use a constrained regression. First, I specified the constraints:
constraint 1 sum(buyerdummy1-buyerdummy200)=0
constraint 2 sum(sellerdummy1-sellerdummy150)=0

and I ran the regression

cnsreg y buyerdummy1-buyerdummy200 sellerdummy1-sellerdummy150, c(1-2)

For some reason, Stata returns error r(131) for constraints 1 and 2. Mathematically, I am certain that my design matrix is correct, so the problem must be in my syntax. Any help is much appreciated.

Last edited by Chinmay Sharma; 23 Mar 2016, 09:02.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

23 Mar 2016, 11:22

We can better help you if we know what commands you have tried and what Stata told you to indicate that there was a problem. Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. See especially sections 9-12 on how to best pose your question, including

12. What should I say about the commands and data I use?

Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly! If you can, reproduce the error with one of Stata's provided datasets, a small fragment of your dataset, or a simple concocted dataset that you include in your posting.

It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using CODE delimiters, as described in section 12 of the FAQ.

I find your narrative description incomplete; it leaves me with several questions. I understand that 350 dummy variables may produce a lot of output; perhaps you could follow the advice of the FAQ and create an example using just a few buyer and seller dummies (collapse the buyers and sellers into groups of 50, say).

Added in edit: having said that, it is not clear that your use of sum() is appropriate in this context, although if it is not, you are in for serious difficulty expressing your constraints. (The documented sum() function does not accept a variable list.) Perhaps you could do better using factor variables, or perhaps you could do better using old-fashioned ANOVA coding techniques, where dummies 1-199 are coded as usual, and category 200 is coded as -1 for each of the 199 dummies. I think.
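A minimal sketch of the ANOVA (effects) coding idea, assuming the 12-observation example data that the original poster gives in #3 below. The names eb1-eb3 and es1-es3 are hypothetical, and the loops assume buyers and sellers are numbered 1 to K without gaps. With a constant in the model, the coefficients should be sum-to-zero effects, and the last level's effect is minus the sum of the others.

```stata
* Example data posted in #3 of this thread
clear
input float(Transaction Buyer Seller)
121 1 2
342 1 3
342 1 4
55 2 1
443 2 3
2324 2 4
324 3 1
55 3 2
334 3 4
55 4 1
33 4 2
22 4 3
end

* Effects coding (hypothetical names eb*, es*): column k is +1 for level k,
* -1 for the last level K, and 0 otherwise. Assumes levels are 1..K, no gaps.
quietly summarize Buyer
local KB = r(max)
forvalues k = 1/`=`KB'-1' {
    generate eb`k' = (Buyer == `k') - (Buyer == `KB')
}
quietly summarize Seller
local KS = r(max)
forvalues k = 1/`=`KS'-1' {
    generate es`k' = (Seller == `k') - (Seller == `KS')
}

* With a constant, the coefficients are the sum-to-zero effects
regress Transaction eb* es*

* The last buyer's effect is minus this sum
lincom eb1 + eb2 + eb3
```

With 200 buyers the loops are unchanged; only the -lincom- expression grows to 199 terms.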

Last edited by William Lisowski; 23 Mar 2016, 11:33.
Chinmay Sharma

Join Date: Nov 2015

Posts: 351
#3

23 Mar 2016, 11:50

Sure. I'll try to be more precise. The data originally looks like the following:

Code:

clear
input float(Transaction Buyer Seller)
121 1 2
342 1 3
342 1 4
55 2 1
443 2 3
2324 2 4
324 3 1
55 3 2
334 3 4
55 4 1
33 4 2
22 4 3
end

The first column corresponds to the transaction type. The second and third columns correspond to the buyer and seller. So the first row is a transaction in which 1 was the buyer and 2 the seller. Now, I wish to create a full set of dummy variables. There are multiple ways to do this; one is the -tabulate- command:

Code:

tab Buyer, gen(buyerdummy)
tab Seller, gen(sellerdummy)

The new data looks like this:

Code:

clear
input float Transaction byte(buyerdummy1 buyerdummy2 buyerdummy3 buyerdummy4 sellerdummy1 sellerdummy2 sellerdummy3 sellerdummy4)
121 1 0 0 0 0 1 0 0
342 1 0 0 0 0 0 1 0
342 1 0 0 0 0 0 0 1
55 0 1 0 0 1 0 0 0
443 0 1 0 0 0 0 1 0
2324 0 1 0 0 0 0 0 1
324 0 0 1 0 1 0 0 0
55 0 0 1 0 0 1 0 0
334 0 0 1 0 0 0 0 1
55 0 0 0 1 1 0 0 0
33 0 0 0 1 0 1 0 0
22 0 0 0 1 0 0 1 0
end

There are multiple ways to estimate a regression in which I regress the transaction on the whole set of dummy variables. Multicollinearity will inevitably show up, because the sum of the buyer dummies (buyerdummy1+buyerdummy2+buyerdummy3+buyerdummy4) = the sum of the seller dummies = 1 (even after dropping the constant). If I run a regression of the form:

Code:

regress Transaction buyerdummy* sellerdummy*, noconstant

Stata automatically drops a variable (the omitted category). Another way to resolve this multicollinearity without dropping a category (reference case) is to estimate the model subject to a constraint, for instance that the buyer-dummy coefficients sum to 0. In this case, all coefficients should be identified, subject to this constraint. I am not sure how to do this.
When I typed:

Code:

constraint 1 buyerdummy1+buyerdummy2+buyerdummy3+buyerdummy4=0
constraint 2 sellerdummy1+sellerdummy2+sellerdummy3+sellerdummy4=0
cnsreg Transaction buyerdummy* sellerdummy*, noconstant c(1)

I obtained:

------------------------------------------------------------------------------
 Transaction |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 buyerdummy1 |  -246.7917   350.2317    -0.70   0.504    -1074.958    581.3748
 buyerdummy2 |   481.4583   350.2317     1.37   0.212    -346.7081    1309.625
 buyerdummy3 |  -234.6667   350.2317    -0.67   0.524    -1062.833    593.4998
 buyerdummy4 |          0  (omitted)
sellerdummy1 |  -98.70833   350.2317    -0.28   0.786    -926.8748    729.4581
sellerdummy2 |   69.04167   350.2317     0.20   0.849    -759.1248    897.2081
sellerdummy3 |   29.66667   350.2317     0.08   0.935    -798.4998    857.8331
sellerdummy4 |          0  (omitted)
       _cons |   370.8333   202.2064     1.83   0.109    -107.3088    848.9755
------------------------------------------------------------------------------
I think I get what I want. However, in my original dataset, short of typing buyerdummy1+buyerdummy2+...+buyerdummy200 by brute force, I tried using the sum() function:

Code:

constraint 1 sum(buyerdummy*)=0
constraint 2 sum(sellerdummy*)=0
cnsreg Transaction buyerdummy* sellerdummy*, c(1,2)

The output I obtained was:
. cnsreg Transaction buyerdummy* sellerdummy*, c(1,2)
note: buyerdummy4 omitted because of collinearity
note: sellerdummy4 omitted because of collinearity
(note: constraint number 1 caused error r(198))
(note: constraint number 2 caused error r(198))

Constrained linear regression                   Number of obs     =         12
                                                F(6, 5)           =       1.24
                                                Prob > F          =     0.4174
                                                Root MSE          =   596.9006

------------------------------------------------------------------------------
 Transaction |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 buyerdummy1 |    -60.125   516.9311    -0.12   0.912    -1388.939    1268.689
 buyerdummy2 |    668.125   516.9311     1.29   0.253    -660.6888    1996.939
 buyerdummy3 |        -48   516.9311    -0.09   0.930    -1376.814    1280.814
 buyerdummy4 |          0  (omitted)
sellerdummy1 |   -875.375   516.9311    -1.69   0.151    -2204.189    453.4388
sellerdummy2 |   -707.625   516.9311    -1.37   0.229    -2036.439    621.1888
sellerdummy3 |       -747   516.9311    -1.45   0.208    -2075.814    581.8138
sellerdummy4 |          0  (omitted)
       _cons |   813.3333   544.8932     1.49   0.196    -587.3594    2214.026
------------------------------------------------------------------------------

The errors above (r(198)) indicate invalid syntax, and I am not sure why. I want to be able to apply the constraints above to a case with many dummy variables. Thanks!
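Since sum() cannot take a variable list, one workaround (a sketch checked only against this small example) is to build the text "buyerdummy1 + buyerdummy2 + ..." in a local macro with -unab- and the -subinstr- macro function, and hand that text to -constraint define-. The -collinear- option of -cnsreg- is an addition not used in the thread: without it, cnsreg may drop buyerdummy4 and sellerdummy4 for collinearity before the constraints are applied, as in the output above. If it behaves as I expect, the two constraints plus the constant then identify every coefficient.

```stata
* Example data from this post, dummies created with -tabulate-
clear
input float(Transaction Buyer Seller)
121 1 2
342 1 3
342 1 4
55 2 1
443 2 3
2324 2 4
324 3 1
55 3 2
334 3 4
55 4 1
33 4 2
22 4 3
end
tabulate Buyer, generate(buyerdummy)
tabulate Seller, generate(sellerdummy)

* Expand the wildcards into explicit variable lists
unab blist : buyerdummy*
unab slist : sellerdummy*

* Turn "v1 v2 v3" into "v1 + v2 + v3"
local bsum : subinstr local blist " " " + ", all
local ssum : subinstr local slist " " " + ", all

constraint drop _all
constraint define 1 `bsum' = 0
constraint define 2 `ssum' = 0

* -collinear- keeps all dummies in the model; the constraints do the identifying
cnsreg Transaction `blist' `slist', constraints(1 2) collinear
```

The same lines work for 200 buyer and 150 seller dummies, since nothing is typed out by hand.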

Last edited by Chinmay Sharma; 23 Mar 2016, 11:53.
Clyde Schechter

Join Date: Apr 2014

Posts: 30116
#4

23 Mar 2016, 12:23

Code:

regress Transaction buyerdummy* sellerdummy*, noconstant

No, don't do that! Don't create your own dummy variables. Just leave it alone and use factor variable notation

Code:

regress Transaction i.Buyer i.Seller

Stata will automatically create virtual indicator variables and deal with creating a reference category for buyer and seller to omit. (If you want to specify which one, you can--see -help fvvarlist-.) Then to get the expected values of transaction for each buyer and seller, including the omitted categories, it's just:

Code:

margins Buyer Seller

Do familiarize yourself with factor-variable notation (-help fvvarlist-) and the -margins- command. For the latter, I think the clearest introduction is http://www.stata-journal.com/article...article=st0260, after which you can get more detail and advanced learning from the Stata manual section.
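A minimal sketch of this approach on the 12-observation example from #3, using that example's variable names Buyer and Seller. The last line, -contrast g.Buyer-, is an addition not suggested in the thread: as I understand -help contrast-, the g. operator reports each buyer's deviation from the balanced grand mean, which in this additive model should coincide with the sum-to-zero buyer coefficients the original poster wants.

```stata
* Example data from #3
clear
input float(Transaction Buyer Seller)
121 1 2
342 1 3
342 1 4
55 2 1
443 2 3
2324 2 4
324 3 1
55 3 2
334 3 4
55 4 1
33 4 2
22 4 3
end

* Factor-variable notation: Stata builds the indicators and picks base levels
regress Transaction i.Buyer i.Seller

* Expected Transaction for every buyer and every seller, no level omitted
margins Buyer Seller

* Each buyer's margin minus the grand mean of the buyer margins
contrast g.Buyer
```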
Chinmay Sharma

Join Date: Nov 2015

Posts: 351
#5

23 Mar 2016, 12:31

Thanks, Clyde. I am aware of factor-variable notation. The thing is, I do not want a base category to be dropped; rather, I want to estimate a constrained regression subject to the constraint that the sum of all coefficients equals 0. The factor notation you mentioned will omit a category rather than impose b1+b2+b3+...+bN=0 (which is what I want). The two are statistically equivalent, but one lets me identify all coefficients.
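To make the equivalence concrete, here is the two-way model written out. The notation is mine, not the thread's: μ is the constant, α_i the buyer effects and β_j the seller effects, with superscript r for reference coding and s for sum-to-zero coding. The two parametrizations differ only by a shift that leaves every fitted value unchanged. The numbers plug in the reference-coded estimates from the -xi: reg- output later in the thread (buyer 4 and seller 4 as base categories).

```latex
\begin{aligned}
y_{ij} &= \mu + \alpha_i + \beta_j + \varepsilon_{ij} \\
\text{reference: } &\alpha_I^{r} = \beta_J^{r} = 0, \qquad
\text{sum-to-zero: } \textstyle\sum_i \alpha_i^{s} = \sum_j \beta_j^{s} = 0 \\
\alpha_i^{s} &= \alpha_i^{r} - \bar\alpha^{r}, \quad
\beta_j^{s} = \beta_j^{r} - \bar\beta^{r}, \quad
\mu^{s} = \mu^{r} + \bar\alpha^{r} + \bar\beta^{r} \\
\mu^{s} + \alpha_i^{s} + \beta_j^{s} &= \mu^{r} + \alpha_i^{r} + \beta_j^{r}
  \quad \text{(identical fitted values)} \\
\bar\alpha^{r} &= \tfrac{1}{4}(-60.125 + 668.125 - 48 + 0) = 140 \\
\bar\beta^{r} &= \tfrac{1}{4}(-875.375 - 707.625 - 747 + 0) = -582.5 \\
\mu^{s} &= 813.3333 + 140 - 582.5 = 370.8333, \qquad
\alpha_1^{s} = -60.125 - 140 = -200.125
\end{aligned}
```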
Clyde Schechter

Join Date: Apr 2014

Posts: 30116
#6

23 Mar 2016, 12:41

Chinmay, you're too focused on what the regress command does. The output of the -margins- command afterwards will give you exactly what you wanted from the -regress- specification that is so awkward to write.
Chinmay Sharma

Join Date: Nov 2015

Posts: 351
#7

23 Mar 2016, 13:16

Thanks a lot! I'll try it out.

Chinmay Sharma

Join Date: Nov 2015
Posts: 351

23 Mar 2016, 14:25

I tried what you suggested:

Code:

xi: reg Transaction i.Buyer i.Seller
i.Buyer           _IBuyer_1-4         (naturally coded; _IBuyer_4 omitted)
i.Seller          _ISeller_1-4        (naturally coded; _ISeller_4 omitted)

      Source |       SS       df       MS              Number of obs =      12
-------------+------------------------------           F(  6,     5) =    1.24
       Model |  2641313.75     6  440218.958           Prob > F      =  0.4174
    Residual |  1781451.92     5  356290.383           R-squared     =  0.5972
-------------+------------------------------           Adj R-squared =  0.1139
       Total |  4422765.67    11  402069.606           Root MSE      =   596.9

------------------------------------------------------------------------------
 Transaction |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _IBuyer_1 |    -60.125   516.9311    -0.12   0.912    -1388.939    1268.689
   _IBuyer_2 |    668.125   516.9311     1.29   0.253    -660.6888    1996.939
   _IBuyer_3 |        -48   516.9311    -0.09   0.930    -1376.814    1280.814
  _ISeller_1 |   -875.375   516.9311    -1.69   0.151    -2204.189    453.4388
  _ISeller_2 |   -707.625   516.9311    -1.37   0.229    -2036.439    621.1888
  _ISeller_3 |       -747   516.9311    -1.45   0.208    -2075.814    581.8138
       _cons |   813.3333   544.8932     1.49   0.196    -587.3594    2214.026
------------------------------------------------------------------------------

and thereafter:

Code:

margins Buyer Seller
'Buyer' not found in list of covariates
r(322);

I am not sure why it is returning this error message. Thanks a lot for your consideration.


Clyde Schechter

Join Date: Apr 2014

Posts: 30116
#9

23 Mar 2016, 18:14

No, you didn't try what I suggested. I didn't say to use xi:. xi: is a largely obsolete command, and it completely nullifies the effects of factor variable notation. Consequently -margins- didn't work. You have to run the code exactly as I posted it.

Not only is the use of xi: here a problem, you should pretty much forget you ever heard of it going forward in Stata. It is only needed by a handful of old commands, most of which do things that can be better done with newer commands anyway. There are a few oddball situations where xi: is much more convenient, but they are few and far between. You should abandon xi: and use factor-variable notation. To learn factor-variable notation, read -help fvvarlist- and the corresponding manual section. It, with -margins-, will make your life much easier, improve your programming efficiency, and reduce the frequency with which you make errors in situations where you would previously have used -xi-.
Chinmay Sharma

Join Date: Nov 2015

Posts: 351
#10

24 Mar 2016, 08:38

Thanks a lot for the good advice. I'll read up on fvvarlist now. I appreciate the guidance.
Chinmay Sharma

Join Date: Nov 2015

Posts: 351
#11

24 Mar 2016, 11:29

Clyde, the margins command did indeed work. However, I have an interpretation question. I seem to be getting coefficients (marginal effects) for all variables, which surely cannot be right: not all coefficients can be uniquely identified, because with 2 sets of dummy variables, each set sums to 1. Even if I drop the constant, I still have to drop something. The margins command is giving me values for all the variables, excluding the constant. Some normalization must be taking place, although I cannot see what. Do you have any suggestions?
Clyde Schechter

Join Date: Apr 2014

Posts: 30116
#12

24 Mar 2016, 12:07

What the margins command is giving you is the expected value of transaction for each buyer and each seller. Though some of them are equal to corresponding coefficients in the regression output, they are not coefficients themselves. And these expected values are uniquely identified for all categories of buyer and seller.

Now, what you originally sought, as I understand it, was to do a regression with no constant term and including all of the indicator variables for buyers and sellers with no omitted reference levels. Had you successfully coded that and run it, the corresponding coefficients in that regression would be the expected values of the transaction for each buyer and seller: in other words, exactly what -margins- is showing you.
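Spelling out the arithmetic for buyer 1 in the example may help. The notation is mine, not Clyde's: μ^r, α_i^r, β_j^r are the reference-coded estimates from the -xi: reg- output (buyer 4 and seller 4 as bases), and s(n) is the seller in observation n. Because each seller appears exactly 3 times among the 12 observations, a buyer's margin is the constant plus that buyer's coefficient plus the plain average of the seller coefficients, and the deviation of a buyer's margin from the mean of the buyer margins reproduces the sum-to-zero buyer effect. In unbalanced data the margin levels would include a frequency-weighted rather than plain average of the seller effects, but that term is common to all buyers, so the deviations would still equal the sum-to-zero effects.

```latex
\begin{aligned}
\text{margin}(\text{Buyer}=i)
  &= \frac{1}{N}\sum_{n=1}^{N}\bigl(\hat\mu^{r} + \hat\alpha_i^{r} + \hat\beta^{r}_{s(n)}\bigr)
   = \hat\mu^{r} + \hat\alpha_i^{r} + \tfrac{1}{4}\textstyle\sum_{j=1}^{4}\hat\beta_j^{r}
   \quad (N = 12,\ \text{each seller 3 times}) \\
\text{margin}(\text{Buyer}=1)
  &= 813.3333 - 60.125 - 582.5 = 170.7083 \\
\tfrac{1}{4}\textstyle\sum_{k=1}^{4}\text{margin}(\text{Buyer}=k)
  &= 813.3333 + 140 - 582.5 = 370.8333 \\
\text{margin}(\text{Buyer}=1) - \tfrac{1}{4}\textstyle\sum_{k}\text{margin}(\text{Buyer}=k)
  &= 170.7083 - 370.8333 = -200.125
\end{aligned}
```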
Petar Soric

Join Date: Apr 2021

Posts: 3
#13

23 Apr 2021, 09:29

Hello everybody!
I am wondering whether it is possible to estimate a constrained linear regression (using cnsreg, I guess) with Newey-West standard errors. I need a constrained linear regression model, and my residuals seem to be both autocorrelated and heteroskedastic. As far as I can see, cnsreg allows vce(robust) but not Newey-West standard errors. Any thoughts on how to do it?
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#14

23 Apr 2021, 10:43

What is the constraint that you want to impose? You might be able to reparametrise the model so that the constraint is imposed.

Originally posted by Petar Soric

Hello everybody!
I am wondering whether it is possible to estimate a constrained linear regression (using cnsreg, I guess) with Newey-West standard errors. I need a constrained linear regression model, and my residuals seem to be both autocorrelated and heteroskedastic. As far as I can see, cnsreg allows vce(robust) but not Newey-West standard errors. Any thoughts on how to do it?
Petar Soric

Join Date: Apr 2021

Posts: 3
#15

23 Apr 2021, 11:08

Originally posted by Joro Kolev

What is the constraint that you want to impose? You might be able to reparametrise the model so that the constraint is imposed.

y=b0+b1*x1+b2*x2+b3*x3+b4*x4, with the restriction b1+b2=b3+b4.

Could it be done in Stata?
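Following Joro's reparametrization suggestion, a sketch (not tested on Petar's data): solving b1+b2=b3+b4 for b1 gives b1 = b3+b4-b2, so y = b0 + b2*(x2-x1) + b3*(x3+x1) + b4*(x4+x1), and plain -newey- can then be run on the transformed regressors. The simulated data, the seed, the names t and z2-z4, and lag(4) are all assumptions for illustration; the lag length should be chosen for your own series.

```stata
* Simulated data (assumption): 200 periods, regressors x1-x4,
* true coefficients chosen so that b1 + b2 = b3 + b4 (1 + 2 = 1.5 + 1.5)
clear
set seed 12345
set obs 200
generate t = _n
tsset t
forvalues k = 1/4 {
    generate x`k' = rnormal()
}
generate y = 0.5 + 1*x1 + 2*x2 + 1.5*x3 + 1.5*x4 + rnormal()

* Substitute b1 = b3 + b4 - b2 into the model
generate z2 = x2 - x1
generate z3 = x3 + x1
generate z4 = x4 + x1

* Newey-West standard errors; lag(4) is an arbitrary choice here
newey y z2 z3 z4, lag(4)

* Recover b1 and its Newey-West standard error
lincom _b[z3] + _b[z4] - _b[z2]
```

The point estimates should match -cnsreg- with the constraint x1 + x2 = x3 + x4; only the standard errors differ.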