  • Running a regression by groups

    Hi,
    I want to run a regression by two (or several) groups, so I tried by group: regress y x1 x2 x3. But Stata returned the message "not sorted", r(5).
    I also tried a second alternative, regress y x1 x2 x3 if group==1 and regress y x1 x2 x3 if group==2. That worked, but it's not practical if I need to do it for many groups.
    So the main issue here is the "not sorted" message. The second question is whether the second alternative I used is correct.
    Thanks.

  • #2
    Either sort first or use bysort instead of by.
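A minimal sketch of both of Sarah's options, run on Stata's shipped auto dataset as a stand-in (price, mpg, weight and foreign are that dataset's variables, not the poster's; foreign plays the role of group). It also bears on the second question in the original post: restricting -regress- with if, one group at a time, estimates the same per-group regressions as the by prefix, so that approach is correct, just tedious with many groups.

```stata
* Stand-in data: Stata's shipped auto dataset (an assumption, not the poster's data)
sysuse auto, clear

* Option 1: sort on the grouping variable first, then use the by prefix
sort foreign
by foreign: regress price mpg weight

* Option 2: bysort sorts and runs the command in one step
bysort foreign: regress price mpg weight

* Same per-group estimates, one group at a time, using if
regress price mpg weight if foreign == 0
regress price mpg weight if foreign == 1
```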



    • #3
      Originally posted by Sarah Edgington
      Either sort first or use bysort instead of by.
      I did both, but I still got the same message: "not sorted"!



      • #4
        Show us the exact code you ran and Stata's exact response. Do not retype them into a post. Instead, copy both the command and the results from Stata's Results window into a code block.



        • #5
          Salma, you use bys group: ... to create a new variable or to modify an existing one. Try a loop if you have many groups:

          Code:
          su group
          forval i = `r(min)'/`r(max)' {
              regress y x1 x2 x3 if group == `i'
          }
          The local macro i must be written with a left single quote (the backtick) before it and a right single quote after it; I couldn't find the backtick on my iPhone. Abraham



          • #6
            @ Abraham

            You use bys group: ... to create a new variable or to modify an existing one
            Not so. bys group:... can be used with many commands, including -regress-. Salma is doing something wrong in the way she is writing the command, or else there is some problem with her Stata installation as

            Code:
            bysort group: regress y x1 x2 x3
            should work. So it is important to see what she is actually typing to figure out why it isn't working.

            Also, your suggested code relying on -su group- and looping over values from `r(min)' to `r(max)' will break if not every value between the min and max actually appears in the data. And it also won't work if group is a string variable.
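A minimal sketch of a loop that avoids both problems, again assuming Stata's auto dataset as stand-in data (grp and origin are hypothetical variables created here for illustration). -levelsof- stores only the values that actually occur, so gaps between the minimum and maximum do no harm, and string values are compared inside double quotes.

```stata
* Stand-in data: Stata's shipped auto dataset (an assumption, not the poster's data)
sysuse auto, clear

* Hypothetical numeric group variable with a gap: only the values 1 and 10 occur
generate grp = cond(foreign, 10, 1)

* levelsof collects the values that actually occur, so 2 through 9 are never visited
levelsof grp, local(levels)
foreach g of local levels {
    display as text _n "Group `g'"
    regress price mpg weight if grp == `g'
}

* Hypothetical string group variable: compare against the quoted value
decode foreign, generate(origin)
levelsof origin, local(olevels)
foreach g of local olevels {
    regress price mpg weight if origin == "`g'"
}
```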



            • #7
              Code:
              bysort Sin: regress CSR_str l.CSI_con FirmSize ROA ROE Booktomarketratio Financialleverage Capitalexpenditureratio RandDratio Advertisingratio Sizeinvestorbase
              
              -------------------------------------------------------------------------------------------------------------
              -> Sin = 0
              not sorted
              r(5);
              That's what I typed and Stata's output.
              I even tried -regress- without the bysort prefix, and it returns the same error message, "not sorted"! Thanks for your help.
              Last edited by salma ktat; 18 Nov 2014, 13:07.



              • #8
                It isn't obvious at first glance why the above shouldn't work. My eye is drawn to the l.CSI_con term.

                Playing around with some of my own panel-type data sets, I can reproduce this error when I am sorting on a variable that is different from the panel variable in data that is -xtset- and using a lag operator on one of the variables.

                I guess there is a conflict between the sorting that Stata needs to do for the lag operator to work and the sorting being specified in the -bysort-.

                Here's a workaround:

                Code:
                gen lag_CSI_con = l.CSI_con
                bysort Sin: regress CSR_str lag_CSI_con FirmSize ROA ROE Booktomarketratio Financialleverage ///
                    Capitalexpenditureratio RandDratio Advertisingratio Sizeinvestorbase



                • #9
                  Code:
                  gen lag_CSI_con = l.CSI_con
                  not sorted
                  r(5);
                  Thanks, Clyde, but here is the output: still the same error message, "not sorted". I don't know which variables Stata needs the data to be sorted on.



                  • #10
                    Try sorting on CSI_con and see if that helps. (This is just a guess, so it may not fix the problem.)

                    Actually, on second thought, sorting on whatever variable you used in xtset probably makes more sense.
                    Last edited by Sarah Edgington; 18 Nov 2014, 13:34. Reason: added an afterthought

                    Comment


                    • #11
                      I'm not sure what is going on here; for the problem with -sort-, I suggest contacting tech support.

                      However, I also don't know why you want to do the regressions this way. What is your goal? And would -statsby- get you there more easily?
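A minimal sketch of the -statsby- route, assuming Stata's auto dataset as stand-in data (price, mpg, weight and foreign are that dataset's variables). With the clear option, the data in memory are replaced by one observation per group holding that group's coefficients (_b) and standard errors (_se), which makes a side-by-side comparison easy; reload the original data afterwards, or use the saving() option instead of clear.

```stata
* Stand-in data: Stata's shipped auto dataset (an assumption, not the poster's data)
sysuse auto, clear

* One regression per level of foreign; results replace the data in memory
statsby _b _se, by(foreign) clear: regress price mpg weight

* One row per group with its coefficients and standard errors
list
```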



                      • #12
                        Salma,

                        Have you -xtset- your data first? You can't use the lag operator until you do. I think if you do all of this in sequence it will work:

                        Code:
                        xtset your_panel_var your_time_var
                        // OPTIONAL OTHER STUFF YOU DO BEFORE THE REGRESSIONS
                        xtset   // RESTORE APPROPRIATE SORT ORDER FOR LAG OPERATOR
                        gen lag_CSI_con = L.CSI_con
                        bysort group: regress CSR_str lag_CSI_con FirmSize ROA ROE Booktomarketratio Financialleverage ///
                            Capitalexpenditureratio RandDratio Advertisingratio Sizeinvestorbase
                        where your_panel_var and your_time_var are placeholders to be replaced by your actual panel and time variables, respectively.

                        If this does not work, then I think you need to contact technical support; but I'm pretty confident this will work if you do it exactly this way.



                        • #13
                          @Rich
                          My goal is to run a regression by groups of firms, which is why I tried using the by prefix with the -regress- command. But my goal is to make a comparison between two main groups of firms. Thanks.



                          • #14
                            Salma,

                            You are contradicting yourself. First you say your goal is to run a regression by groups of firms. Then you say your goal is to make a comparison between two main groups of firms. Those are different goals and are accomplished in different ways. You need to make up your mind exactly what you want to do and then focus on that.



                            • #15
                              @Clyde
                              Sorry for the confusion (it may be because of my poor English; sorry again!). The point is that I'm trying to run a regression for each of 2 groups of firms, which will then allow me to compare these groups. I hope that's clear enough now. Thanks.
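For comparing coefficients between two groups, one common approach (a sketch, not something proposed earlier in this thread) is a single fully interacted regression: each group gets its own intercept and slopes, and the interaction terms measure the between-group differences, which can then be tested directly. The example assumes Stata's auto dataset as stand-in data, with the 0/1 variable foreign in the role of the two groups of firms.

```stata
* Stand-in data: Stata's shipped auto dataset (an assumption, not the poster's data)
sysuse auto, clear

* Fully interacted model: separate intercept and slopes for each level of foreign
regress price i.foreign##c.(mpg weight)

* Joint test that the slopes on mpg and weight are equal across the two groups
testparm i.foreign#c.(mpg weight)

* Test a single difference: does the mpg slope differ between the groups?
test 1.foreign#c.mpg = 0
```

The group-specific coefficients match those from separate regressions, but the standard errors differ because this model assumes one common residual variance; -suest- applied to the two separately stored regressions is an alternative that does not impose that assumption.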

