  • How to estimate a constrained linear regression

    I am currently estimating a linear regression in which I regress my outcome variable on a full set of dummy variables. More specifically, I have transactions data with buyer and seller dummies, so I am regressing loans on dummy variables corresponding to buyers and sellers. As a result, transaction(i,j) is a function of I buyer dummies and J seller dummies, which take on a value of 1 when the buyer is i and the seller is j. Of course, I cannot estimate all the dummy coefficients because the sum of buyer dummies=sum of seller dummies=1 (even after removing the constant). Alternatively, I can include a constant and 2 constraints: the first is that the buyer dummy coefficients sum to 0, and the second is that the seller dummy coefficients sum to 0.

    To create the dummy variables, I used the tabulate buyer, gen(buyerdummy) command for the buyers, and the symmetrical version for the sellers (Stata 13.0).

    I understand I have to use a constrained regression. First, I specified the constraints:
    constraint 1 sum(buyerdummy1-buyerdummy200)=0
    constraint 2 sum(sellerdummy1-sellerdummy150)=0


    and I ran the regression

    cnsreg y buyerdummy1-buyerdummy200 sellerdummy1-sellerdummy150, c(1-2)

    For some reason, Stata returns an error message r(131) for constraint 1 and 2 respectively. Mathematically, I am certain that my design matrix is correct. The problem originates in my syntax. Any help is much appreciated.
    Last edited by Chinmay Sharma; 23 Mar 2016, 09:02.

  • #2
    We can better help you if we know what commands you have tried and what Stata told you to indicate that there was a problem. Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. See especially sections 9-12 on how to best pose your question, including

    12. What should I say about the commands and data I use?

    Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly! If you can, reproduce the error with one of Stata's provided datasets, a small fragment of your dataset, or a simple concocted dataset that you include in your posting.
    It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using CODE delimiters, as described in section 12 of the FAQ.

    I find your narrative description incomplete, and it leaves me with several questions. I understand that 350 dummy variables may result in a lot of output; perhaps you could follow the advice of the FAQ and create an example using just a few buyer and seller dummies (collapse the buyers and sellers into groups of 50, say).

    Added in edit: having said that, it is not clear that your use of sum() is appropriate in this context, although if it is not, you are in for serious difficulty expressing your constraints. (The documented sum() function does not accept a variable list.) Perhaps you could do better using factor variables, or perhaps you could do better using old-fashioned ANOVA coding techniques, where dummies 1-199 are coded as usual, and category 200 is coded as -1 for each of the 199 dummies. I think.
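
    A minimal sketch of one possible workaround for expressing the constraints without typing every term: let Stata expand the wildcard and join the names with plus signs. It assumes the dummies are named buyerdummy* and sellerdummy* as in the original post; the local macro names (buyers, sellers, lhs) are hypothetical, and the sketch has not been run against the full 350-variable data.

    ```stata
    * expand the wildcard into the full list of buyer dummies
    unab buyers : buyerdummy*
    * turn "buyerdummy1 buyerdummy2 ..." into "buyerdummy1 + buyerdummy2 + ..."
    local lhs : subinstr local buyers " " " + ", all
    constraint 1 `lhs' = 0

    * same steps for the seller dummies
    unab sellers : sellerdummy*
    local lhs : subinstr local sellers " " " + ", all
    constraint 2 `lhs' = 0

    cnsreg y buyerdummy* sellerdummy*, constraints(1 2)
    ```

    Each constraint then reads as a sum of coefficients set equal to zero, which is the form that constraint accepts; sum() is not needed.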
    Last edited by William Lisowski; 23 Mar 2016, 11:33.


    • #3
      Sure. I'll try to be more precise. The data originally looks like the following:


      Code:
      clear
      input float(Transaction Buyer Seller)
       121 1 2
       342 1 3
       342 1 4
        55 2 1
       443 2 3
      2324 2 4
       324 3 1
        55 3 2
       334 3 4
        55 4 1
        33 4 2
        22 4 3
      end

      The first column corresponds to the transaction type. The second and third columns correspond to the buyer and seller. So, the first row corresponds to a transaction that took place when 1 was the buyer and 2 the seller. Now, I wish to create a full set of dummy variables. There are multiple ways I can do this. One method is the tabulate command, which I use as follows:


      Code:
      tab Buyer, gen(buyerdummy)
      tab Seller, gen(sellerdummy)


      The new data looks like this:

      Code:
      clear
      input float Transaction byte(buyerdummy1 buyerdummy2 buyerdummy3 buyerdummy4 sellerdummy1 sellerdummy2 sellerdummy3 sellerdummy4)
       121 1 0 0 0 0 1 0 0
       342 1 0 0 0 0 0 1 0
       342 1 0 0 0 0 0 0 1
        55 0 1 0 0 1 0 0 0
       443 0 1 0 0 0 0 1 0
      2324 0 1 0 0 0 0 0 1
       324 0 0 1 0 1 0 0 0
        55 0 0 1 0 0 1 0 0
       334 0 0 1 0 0 0 0 1
        55 0 0 0 1 1 0 0 0
        33 0 0 0 1 0 1 0 0
        22 0 0 0 1 0 0 1 0
      end

      There are multiple ways to estimate a regression of the transaction on the whole set of dummy variables. The problem of multicollinearity will inevitably show up, as the sum of the buyer dummies (buyerdummy1+buyerdummy2+buyerdummy3+buyerdummy4) = the sum of the seller dummies = 1 (even after dropping the constant). If I run a regression of the form:

      Code:
      regress Transaction buyerdummy* sellerdummy*, noconstant
      Stata automatically drops a variable (the omitted category). Another way to resolve the multicollinearity without dropping a reference category is to estimate the model subject to a constraint, for instance that the buyer dummy coefficients sum to 0. In this case, all coefficients should be identified, subject to this constraint. I am not sure how to do this.
      When I typed:

      Code:
      constraint 1 buyerdummy1+buyerdummy2+buyerdummy3+buyerdummy4=0
      constraint 2 sellerdummy1+sellerdummy2+sellerdummy3+sellerdummy4=0
      cnsreg Transaction buyerdummy* sellerdummy*, noconstant c(1)
      I obtained:

      ------------------------------------------------------------------------------
       Transaction |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
       buyerdummy1 |  -246.7917   350.2317    -0.70   0.504    -1074.958    581.3748
       buyerdummy2 |   481.4583   350.2317     1.37   0.212    -346.7081    1309.625
       buyerdummy3 |  -234.6667   350.2317    -0.67   0.524    -1062.833    593.4998
       buyerdummy4 |          0  (omitted)
      sellerdummy1 |  -98.70833   350.2317    -0.28   0.786    -926.8748    729.4581
      sellerdummy2 |   69.04167   350.2317     0.20   0.849    -759.1248    897.2081
      sellerdummy3 |   29.66667   350.2317     0.08   0.935    -798.4998    857.8331
      sellerdummy4 |          0  (omitted)
             _cons |   370.8333   202.2064     1.83   0.109    -107.3088    848.9755
      ------------------------------------------------------------------------------
      I think I get what I want. However, in my original dataset, rather than typing buyerdummy1+buyerdummy2+...+buyerdummy200 by brute force, I tried using sum():


      Code:
      constraint 1 sum(buyerdummy*)=0
      constraint 2 sum(sellerdummy*)=0
      cnsreg Transaction buyerdummy* sellerdummy*, c(1,2)
      The output I obtained was:
      . cnsreg Transaction buyerdummy* sellerdummy*, c(1,2)
      note: buyerdummy4 omitted because of collinearity
      note: sellerdummy4 omitted because of collinearity
      (note: constraint number 1 caused error r(198))
      (note: constraint number 2 caused error r(198))

      Constrained linear regression                     Number of obs   =         12
                                                        F(  6,     5)   =       1.24
                                                        Prob > F        =     0.4174
                                                        Root MSE        =   596.9006

      ------------------------------------------------------------------------------
       Transaction |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
       buyerdummy1 |    -60.125   516.9311    -0.12   0.912    -1388.939    1268.689
       buyerdummy2 |    668.125   516.9311     1.29   0.253    -660.6888    1996.939
       buyerdummy3 |        -48   516.9311    -0.09   0.930    -1376.814    1280.814
       buyerdummy4 |          0  (omitted)
      sellerdummy1 |   -875.375   516.9311    -1.69   0.151    -2204.189    453.4388
      sellerdummy2 |   -707.625   516.9311    -1.37   0.229    -2036.439    621.1888
      sellerdummy3 |       -747   516.9311    -1.45   0.208    -2075.814    581.8138
      sellerdummy4 |          0  (omitted)
             _cons |   813.3333   544.8932     1.49   0.196    -587.3594    2214.026
      ------------------------------------------------------------------------------

      The errors above indicate invalid syntax, and I am not sure why. I want to be able to extend the constraints above to a case with many dummy variables. Thanks!

      Last edited by Chinmay Sharma; 23 Mar 2016, 11:53.


      • #4
        Code:
        regress Transaction buyerdummy* sellerdummy*, noconstant
        No, don't do that! Don't create your own dummy variables. Just leave it alone and use factor variable notation

        Code:
        regress Transaction i.Buyer i.Seller
        Stata will automatically create virtual indicator variables and deal with creating a reference category for buyer and seller to omit. (If you want to specify which one, you can--see -help fvvarlist-.) Then to get the expected values of transaction for each buyer and seller, including the omitted categories, it's just:

        Code:
        margins Buyer Seller
        Do familiarize yourself with factor variable notation (-help fvvarlist-) and the -margins- command. For the latter, I think the clearest introduction is http://www.stata-journal.com/article...article=st0260, after which you can get more detail and advanced learning from the Stata manual section.
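
        A minimal sketch of choosing the reference levels explicitly, assuming the variables are named Buyer and Seller as in the example data in #3 (Stata variable names are case-sensitive). Using level 4 as the base for both factors is purely illustrative.

        ```stata
        * ib4. makes level 4 the base (omitted) category of each factor
        regress Transaction ib4.Buyer ib4.Seller
        * expected Transaction for every buyer and seller level, base included
        margins Buyer Seller
        ```

        The margins output should not depend on which base level is chosen, because the fitted values are the same under any reference category.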


        • #5
          Thanks, Clyde. I am aware of the factor notation. The thing is, I do not want a base category to be dropped, but rather estimate a constrained regression subject to the constraint that the sum of all coefficients equals 0. Using factor notation of the form you mentioned will omit a category, rather than estimate b1+b2+b3+...bN=0 (which is what I want). The two are statistically equivalent, but one lets me identify all coefficients.


          • #6
            Chinmay, you're too focused on what the -regress- command does. Running -margins- afterwards will give you exactly what you wanted from the constrained regression, which is very awkward to write.


            • #7
              Thanks a lot! I'll try it out.


              • #8
                I tried what you suggested:

                Code:
                xi: reg Transaction i.Buyer i.Seller
                i.Buyer           _IBuyer_1-4         (naturally coded; _IBuyer_4 omitted)
                i.Seller          _ISeller_1-4        (naturally coded; _ISeller_4 omitted)
                
                      Source |       SS       df       MS              Number of obs =      12
                -------------+------------------------------           F(  6,     5) =    1.24
                       Model |  2641313.75     6  440218.958           Prob > F      =  0.4174
                    Residual |  1781451.92     5  356290.383           R-squared     =  0.5972
                -------------+------------------------------           Adj R-squared =  0.1139
                       Total |  4422765.67    11  402069.606           Root MSE      =   596.9
                
                ------------------------------------------------------------------------------
                 Transaction |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                   _IBuyer_1 |    -60.125   516.9311    -0.12   0.912    -1388.939    1268.689
                   _IBuyer_2 |    668.125   516.9311     1.29   0.253    -660.6888    1996.939
                   _IBuyer_3 |        -48   516.9311    -0.09   0.930    -1376.814    1280.814
                  _ISeller_1 |   -875.375   516.9311    -1.69   0.151    -2204.189    453.4388
                  _ISeller_2 |   -707.625   516.9311    -1.37   0.229    -2036.439    621.1888
                  _ISeller_3 |       -747   516.9311    -1.45   0.208    -2075.814    581.8138
                       _cons |   813.3333   544.8932     1.49   0.196    -587.3594    2214.026
                ------------------------------------------------------------------------------
                and thereafter:

                Code:
                margins Buyer Seller
                'Buyer' not found in list of covariates
                r(322);

                I am not sure why it is returning this error message. Thanks a lot for your consideration.


                • #9
                  No, you didn't try what I suggested. I didn't say to use xi:. xi: is a largely obsolete command, and it completely nullifies the effects of factor variable notation. Consequently -margins- didn't work. You have to run the code exactly as I posted it.

                  Not only is the use of xi: here a problem, you should pretty much forget you ever heard of it going forward in Stata. It is only needed by a handful of old commands, most of which do things that can be better done with newer commands anyway. There are a few oddball situations where xi: is much more convenient, but they are few and far between. You should abandon xi: and use factor-variable notation. To learn factor variable notation, read -help fvvarlist- and the corresponding manual section. It, with -margins-, will make your life much easier, improve your programming efficiency, and reduce the frequency with which you make errors in situations where you would previously have used -xi-.


                  • #10
                    Thanks a lot for the good advice. I'll read up on fvvarlist now. I appreciate the guidance.


                    • #11
                      Clyde, the margins command did indeed work. However, there seems to be an interpretation issue. It seems that I am getting the coefficients (marginal effects) for all variables! This surely cannot be the case (all coefficients cannot be uniquely identified): if I have 2 sets of dummy variables, the sum of each set equals 1, so even if I drop the constant, I still have to drop something. The margins command is giving me the coefficients for all the variables, excluding the constant. There has to be some normalization taking place, although I cannot see what. Do you have any suggestions?


                      • #12
                        What the margins command is giving you is the expected value of transaction for each buyer and each seller. Though some of them are equal to corresponding coefficients in the regression output, they are not coefficients themselves. And these expected values are uniquely identified for all categories of buyer and seller.

                        Now, what you originally sought, as I understand it, was to do a regression with no constant term and including all of the indicator variables for buyers and sellers with no omitted reference levels. Had you successfully coded that and run it, the corresponding coefficients in that regression would be the expected values of the transaction for each buyer and seller: in other words, exactly what -margins- is showing you.
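
                        A sketch of how the sum-to-zero effects asked about earlier in the thread might be recovered from the same factor-variable fit, assuming the variable names Transaction, Buyer and Seller from the example data in #3. The g. contrast operator compares each level with the balanced grand mean, so within each factor the reported contrasts should sum to zero.

                        ```stata
                        regress Transaction i.Buyer i.Seller
                        * each buyer (seller) level minus the balanced grand mean
                        contrast g.Buyer g.Seller
                        ```

                        In an additive model such as this one, these deviations should play the role of the coefficients from a regression constrained so that each set of effects sums to zero, while the -margins- results remain expected values rather than effects.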


                        • #13
                          Hello everybody!
                          I am wondering whether it is somehow possible to estimate a constrained linear regression (using cnsreg, I guess) with Newey-West standard errors. I need a constrained linear regression model, and my residuals seem to be both autocorrelated and heteroskedastic. As far as I can see, cnsreg allows vce(robust) but not Newey-West. Any thoughts on how to do it?


                          • #14
                            What is the constraint that you want to impose? You might be able to reparametrise the model so that the constraint is imposed.

                            Originally posted by Petar Soric
                            Hello everybody!
                            I am wondering whether it is somehow possible to estimate a constrained linear regression (using cnsreg, I guess) with Newey-West standard errors. I need a constrained linear regression model, and my residuals seem to be both autocorrelated and heteroskedastic. As far as I can see, cnsreg allows vce(robust) but not Newey-West. Any thoughts on how to do it?


                            • #15
                              Originally posted by Joro Kolev
                              What is the constraint that you want to impose? You might be able to reparametrise the model so that the constraint is imposed.


                              y=b0+b1*x1+b2*x2+b3*x3+b4*x4, with the restriction b1+b2=b3+b4.

                              Could it be done in Stata?
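
                              A minimal sketch of the reparametrisation suggested in #14, under stated assumptions. Substituting b4 = b1 + b2 - b3 into the model gives y = b0 + b1*(x1+x4) + b2*(x2+x4) + b3*(x3-x4), which can be estimated without cnsreg. The sketch assumes the data have already been tsset and that the variables are named y and x1-x4; the names z1-z3 and the choice lag(4) are hypothetical placeholders.

                              ```stata
                              * transformed regressors that impose b1 + b2 = b3 + b4
                              generate z1 = x1 + x4
                              generate z2 = x2 + x4
                              generate z3 = x3 - x4

                              * unconstrained fit with Newey-West standard errors
                              * (lag(4) is an arbitrary placeholder, not a recommendation)
                              newey y z1 z2 z3, lag(4)

                              * implied estimate of b4 = b1 + b2 - b3, with its standard error
                              lincom z1 + z2 - z3
                              ```

                              The coefficients on z1, z2 and z3 are then b1, b2 and b3 directly, and the restriction holds by construction.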
