  • Separate regression for each group in Panel data

    Hi,

    I have a panel dataset that includes observations of returns and turnover for the past 52 weeks. I want to extract the coefficient on LMSW_2002 from the following regression, which is estimated on the 52 observations for each firm-year.

    I can get the regression results using the following command, but I cannot store them. I am looking for a solution that runs the regression for each group (Index) and stores either the full results or just the coefficient on LMSW_2002.

    Code:
    by Index : reg Ret lagret LMSW_2002
    Sample data follow:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int Index long PERMNO byte Weeks double Ret float(lagret LMSW_2002)
    1 22592 -52   .03066  -.02839   .0015403944
    1 22592 -51 -.029901   .03066   -.002951392
    1 22592 -50  .023084 -.029901  -.0001493088
    1 22592 -49 -.034812  .023084   -.008187853
    1 22592 -48  .016831 -.034812   -.006448435
    1 22592 -47 -.040725  .016831   .0006774608
    1 22592 -46 -.048069 -.040725   -.005754218
    1 22592 -45 -.000719 -.048069   -.014784833
    1 22592 -44 -.010654 -.000719 .000031164695
    1 22592 -43  .003201 -.010654    -.00345002
    1 22592 -42  .029156  .003201   .0010735776
    1 22592 -41 -.011275  .029156    .005789125
    1 22592 -40   .04747 -.011275     .00033129
    1 22592 -39  .000136   .04747    .001350522
    1 22592 -38 -.009508  .000136  2.311508e-06
    1 22592 -37 -.009408 -.009508   .0018295203
    1 22592 -36 -.033659 -.009408   .0016629375
    1 22592 -35  .013875 -.033659   -.001987037
    1 22592 -34  .036066  .013875   .0046342714
    1 22592 -33 -.044441  .036066     .02636297
    1 22592 -32 -.069834 -.044441   -.008258095
    1 22592 -31 -.160062 -.069834    -.02828788
    1 22592 -30  .041098 -.160062    -.14564113
    1 22592 -29  .055231  .041098    .026202515
    1 22592 -28  .078678  .055231     .03807466
    1 22592 -27  .008243  .078678    .036639154
    1 22592 -26 -.027302  .008243   .0009457734
    1 22592 -25  -.01964 -.027302   -.007746905
    1 22592 -24  .091843  -.01964   -.009994903
    1 22592 -23 -.105782  .091843   -.006197574
    1 22592 -22 -.063659 -.105782    -.04772526
    1 22592 -21   .01035 -.063659   -.013357646
    1 22592 -20 -.015542   .01035    .000224061
    1 22592 -19  .061894 -.015542    .017627724
    1 22592 -18 -.024497  .061894    -.04321004
    1 22592 -17 -.022861 -.024497    .010510028
    1 22592 -16 -.062389 -.022861    .007009152
    1 22592 -15  .016824 -.062389     .02297962
    1 22592 -14 -.026027  .016824 -.00028896355
    1 22592 -13  -.05669 -.026027   .0005485007
    1 22592 -12 -.033566  -.05669  .00006714724
    1 22592 -11 -.037884 -.033566    .006307902
    1 22592 -10  -.07985 -.037884   -.002673329
    1 22592  -9  .147502  -.07985   -.017015157
    1 22592  -8 -.043125  .147502    .007189305
    1 22592  -7  .095798 -.043125    .004106716
    1 22592  -6  .035367  .095798   -.008555142
    1 22592  -5  .019574  .035367   -.005936099
    1 22592  -4  .012799  .019574   -.012329363
    1 22592  -3  .059283  .012799   -.002809522
    1 22592  -2  .015439  .059283     .00299079
    1 22592  -1  .038182  .015439  -.0027095284
    2 22592 -52 -.036113  .038182   -.008072878
    2 22592 -51 -.023458 -.036113    .009752043
    2 22592 -50   .01837 -.023458    .008100242
    2 22592 -49   .06725   .01837   -.009800303
    2 22592 -48  .000985   .06725   -.009077097
    2 22592 -47 -.026721  .000985 -.00024920338
    2 22592 -46 -.001853 -.026721    .006144799
    2 22592 -45  .016706 -.001853   .0004743336
    2 22592 -44 -.007635  .016706    -.00557728
    2 22592 -43   .05235 -.007635   .0002223419
    2 22592 -42  .103306   .05235    -.01228291
    2 22592 -41  .015846  .103306    .009666022
    2 22592 -40  .033749  .015846   -.003186107
    2 22592 -39 -.021674  .033749  -.0039089997
    2 22592 -38  .025846 -.021674   .0023565795
    2 22592 -37 -.009912  .025846   -.008456712
    2 22592 -36 -.007925 -.009912   .0031727066
    2 22592 -35  .037141 -.007925   .0023518275
    2 22592 -34  .008378  .037141    -.01484172
    2 22592 -33 -.010989  .008378  -.0021790809
    2 22592 -32 -.025339 -.010989   .0044569033
    2 22592 -31  .038927 -.025339    .004950087
    2 22592 -30  .017262  .038927   -.014910948
    2 22592 -29  .023678  .017262   -.004745167
    2 22592 -28 -.054613  .023678   .0022792008
    2 22592 -27   .02501 -.054613  -.0026666506
    2 22592 -26  .025328   .02501   -.005092273
    2 22592 -25 -.002297  .025328   -.006771228
    2 22592 -24  .001435 -.002297   .0007454183
    2 22592 -23  .019414  .001435  -.0009510986
    2 22592 -22  .044862  .019414  -.0018849007
    2 22592 -21 -.009541  .044862     .01090874
    2 22592 -20  .019513 -.009541   -.002094009
    2 22592 -19  .001454  .019513   -.018156344
    2 22592 -18  .019959  .001454  -.0007767971
    2 22592 -17 -.011267  .019959   .0008321834
    2 22592 -16  -.02267 -.011267    .001602538
    2 22592 -15  -.01215  -.02267  -.0002425309
    2 22592 -14 -.024227  -.01215   -.003120765
    2 22592 -13  .008149 -.024227   -.005951347
    2 22592 -12  .036298  .008149   .0018901667
    2 22592 -11 -.016806  .036298   -.011348456
    2 22592 -10  .028571 -.016806  -.0005825338
    2 22592  -9 -.012858  .028571   -.004710372
    2 22592  -8  .007127 -.012858  -.0008782291
    2 22592  -7 -.011713  .007127    .001869973
    2 22592  -6  .035185 -.011713    -.00105196
    2 22592  -5 -.004293  .035185    .002762586
    end
    Thank You

  • #2
    This can be done with a loop:

    Code:
    * Number the groups consecutively from 1 (the sample data has 2 groups)
    egen group = group(Index)

    * Store the number of groups before another command can overwrite r()
    summarize group, meanonly
    local ngroups = r(max)

    * Placeholder for the group-specific coefficient
    gen coeff = .

    * Run the regression group by group and save the LMSW_2002 coefficient
    forvalues i = 1/`ngroups' {
        regress Ret lagret LMSW_2002 if group == `i'
        replace coeff = _b[LMSW_2002] if group == `i'
    }


    • #3
      By the same procedure, any statistic that -regress- produces can be extracted and saved. Generate one variable for each statistic you need before the loop starts, and then replace each of them within the loop, just as for coeff.
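
      A minimal sketch of that idea, assuming the variable names from #1. The data below are simulated stand-ins, not the sample posted above, and the names se, r2 and rmse are chosen here only for illustration:

      ```stata
      * Simulated stand-in for the thread's data (assumption: not the real sample)
      clear
      set seed 12345
      set obs 100
      gen Index = cond(_n <= 52, 1, 2)
      gen lagret = rnormal(0, 0.04)
      gen LMSW_2002 = rnormal(0, 0.01)
      gen Ret = 0.1*lagret + 0.5*LMSW_2002 + rnormal(0, 0.04)

      * One placeholder variable per statistic to keep
      gen coeff = .
      gen se    = .
      gen r2    = .
      gen rmse  = .

      * Loop over the distinct values of Index and fill in each statistic
      levelsof Index, local(groups)
      foreach g of local groups {
          quietly regress Ret lagret LMSW_2002 if Index == `g'
          quietly replace coeff = _b[LMSW_2002]  if Index == `g'
          quietly replace se    = _se[LMSW_2002] if Index == `g'
          quietly replace r2    = e(r2)          if Index == `g'
          quietly replace rmse  = e(rmse)        if Index == `g'
      }
      ```

      Typing -ereturn list- after any single -regress- shows which other e() results are available to save in the same way.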


      • #4
        Simpler still, and noticeably faster if your data set is large, is to do it with -rangestat-, written by Robert Picard, Nick Cox, and Roberto Ferrer, available from SSC.

        Code:
        rangestat (reg) Ret lagret LMSW_2002, by(Index) interval(LMSW_2002 . .)
        This will save the coefficient in a variable b_LMSW_2002. It will also save the standard error, the number of observations in the estimation sample and the R2.



        • #5
          I do not use user-contributed packages, and I am ready to dispute any time the claim that using user-contributed packages is "simpler" and "faster" (in terms of getting the job done, not in terms of how fast the calculations are made).

          In my view, the story goes like this:

          1. Learning how to use loops, that is, learning the loops syntax, takes a couple of hours. Ever after that, everything is the same. You do not need to spend your time on learning new syntax (which for me is not fun at all), but you spend your time on thinking how you can make the syntax you already know work (this is more fun).

          2. User-contributed packages are a zoo of syntaxes. So you end up spending your time learning idiosyncratic syntaxes (which for me is not fun at all), instead of thinking how to make the one syntax you know work (this is more fun).

          Case study -rangestat-.

          The other day, on another thread, somebody asked how to accomplish a task similar to the one in this thread, except that the user wanted to save the root mean squared error (RMSE) from the regression. The user was already using -rangestat-. I wanted to help, so I read the help file of -rangestat- for almost half an hour, and I could not figure out how to save the RMSE. At that point I decided that I had more fun things to do than dig into the idiosyncratic syntax of a command that does something I can do with 2 lines of code and without reading any extra help files.

          So far I have been talking about the total time needed to accomplish the task (figuring out which user-contributed command does what you want, reading the help file to learn the syntax, and running it).

          In terms of speed of accomplishing the task in the narrow sense, -rangestat- is fast as lightning.

          For a speed test I loaded the nlswork data, and ran a regression of log wage on hours worked for every individual in the data. The speeds in seconds are as follows:

          . timer list
          1: 65.07 / 1 = 65.0740 (the loop in #2)
          2: 0.80 / 1 = 0.7960 (-rangestat-, the syntax that Clyde showed in #4)
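
          The exact commands behind these timings are not shown, so the following is only a sketch of how such a comparison could be set up, assuming the nlswork variables ln_wage, hours and idcode, an internet connection for -webuse-, and -rangestat- already installed from SSC. The name b_loop is chosen here for illustration, and timings will differ by machine:

          ```stata
          webuse nlswork, clear
          keep if !missing(ln_wage, hours)

          timer clear

          * Timer 1: explicit loop over individuals, as in #2
          timer on 1
          gen b_loop = .
          levelsof idcode, local(ids)
          foreach id of local ids {
              * capture skips individuals with too few observations to estimate
              capture quietly regress ln_wage hours if idcode == `id'
              if _rc == 0 {
                  quietly replace b_loop = _b[hours] if idcode == `id'
              }
          }
          timer off 1

          * Timer 2: -rangestat-, same form as in #4 (requires: ssc install rangestat)
          timer on 2
          rangestat (reg) ln_wage hours, by(idcode) interval(hours . .)
          timer off 2

          timer list
          ```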




          • #6
            #5 That view is a trifle exaggerated, I suggest.

            On rangestat (SSC, as you are asked to explain): I am sorry you were disappointed, but the documentation of rangestat nowhere claims that it produces RMSE, so the implication is that you need to get it in other ways. The authors of rangestat (@Robert Picard and friends) are happy to underline that their command, like almost any other, does not purport to produce everything related that users might want. You wouldn't want the arbitrarily long help file that tells you that rangestat doesn't do this, that or the other.

            Indeed, most of the long help file of rangestat is devoted to showing how it is extensible, in particular how you can write extra routines that are compatible with its framework.

            Naturally it is a really good idea for users to master loops directly and I guess I've posted hundreds of answers here showing the same.

            Optimising the trade-off between writing code yourself and trying to work out if someone else has done it already is a dark and difficult art -- for experienced users, too. But dismissing all user-written (community-contributed) commands on the grounds that you'd always prefer to write your own code is a fairly eccentric stance.
            Last edited by Nick Cox; 20 Jan 2019, 08:17.


            • #7
              On the theme of simplicity and speed, I shall present one more alternative: the asreg program, which can estimate both rolling and by-group regressions very quickly.
              Code:
              ssc install asreg
              bys Index : asreg Ret lagret LMSW_2002
              sort Weeks
              list _* in 1/2
                   +-------------------------------------------------------------------------------+
                   | Index   _Nobs         _R2       _adjR2    _b_lagret   _b_LM~2002      _b_cons |
                   |-------------------------------------------------------------------------------|
                1. |     1      52   .01020853   -.03019112   -.09419167    .23169832   -.00212456 |
                2. |     2      48   .03060532   -.01247888   -.06874114   -.82258502    .00695246 |
                   +-------------------------------------------------------------------------------+
              On a side note, asreg can also report RMSE, though that was not requested in this thread.
              Regards
              --------------------------------------------------
              Attaullah Shah, PhD.
              Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
              FinTechProfessor.com
              https://asdocx.com
              Check out my asdoc program, which sends outputs to MS Word.
              For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.


              • #8
                Re #5, let me just comment on how I decide which community-contributed packages to use, and when to just write my own code. This may or may not be helpful to anyone else, but I hope it might be.

                Joro Kolev lays out the downsides of community-contributed packages. In fact, he leaves one out: official Stata commands have been developed by an organization that has a history of producing and standing behind high-quality software and that follows good software development practices. While this doesn't mean there are never any bugs (we wouldn't need the -update- command if there weren't), it means that bugs are infrequent and generally arise only in unusual use cases. With community-contributed software you are taking more of a gamble on the quality of the software, and there is no assurance at all that bugs that are found will be fixed.

                So, the truth is, I don't use very many community-contributed programs either. And when posting here, I am even more conservative about recommending them to others. There are only a few that I regularly tout here. I also enjoy figuring out how to code things from the basics--it's what makes programming fun.

                Nevertheless, there are a few circumstances where I find community-contributed programs the better way to go:

                1. The program solves a problem or class of problems that comes up often in my work. Alternatively, it solves a problem that comes up very seldom in my work, that I don't really know how to solve, and, so, it is not worth my while to invest my time learning to do it myself. (-matchit- is a good example of the latter.)
                2. The code from basics is finicky to write, or produces a solution that is clunky or too inefficient.
                3. The community-contributed program does it better than any code that I could come up with (mostly this means that it makes good use of Mata.)
                4. The author of the program is somebody who has consistently produced high-quality programs.

                -rangestat- is a good example of this in my case. The work I do leads me to use it (or the closely related -rangejoin-) several times a week. The code I previously crafted myself to solve the same problems was annoying to write and modify for each application and ran slowly. -rangestat- is very fast (in large part because it uses Mata well). And Robert Picard and Nick Cox are two of the best Stata programmers around. I should also add that the syntax of -rangestat-, except perhaps the (reg) operator and the -interval()- option, will already be completely familiar to users of the official command -collapse-.

                Just sayin'.


                • #9
                  I agree 100% with everything Clyde is saying in #8.

                  Regarding Nick's observations in #6:

                  1. I have been using Stata since 2000. In those 18 years, I have used the user-contributed command -nnmatch- and a few (about 5) user-written egen functions. (Then, a couple of years ago, I verified that iterated -sureg- produces the maximum likelihood estimates by using the user-written routines -mysureg- by Gould et al. 2010 and -cmp- by David Roodman. The last two "uses" are special cases, as I already had the results that I needed but wanted to verify that what theory says is empirically true.)

                  So yes, the view that I do not use user contributed packages is "a trifle exaggerated", but really, just a trifle.

                  2. I never said that one "should never ever use user contributed packages" (normative statement). All I said was, for reasons that are very interesting from economics point of view, I end up not using user contributed packages (positive statement).

                  3. Giving rangestat (SSC) as an example was not supposed to diminish in any way the great job the authors of the package have done. Neither was it an expression of disappointment that the package does not accomplish some particular task. It was an example illustrating how things work even with the best of the best user-contributed packages: you go and search for a solution online and, assuming you find a possible solution, you dig in the documentation, and you dig some more... and then sometimes you figure out how to do it, and sometimes you figure out that the package does not do what you have in mind.

                  4. Choosing whether to do the job yourself or to search for a user-contributed solution is, in my view, not nearly as "dark and difficult" an art as Nick suggests. In my view, it is really simple: if you see how you can do the job in a couple of lines of code, you don't bother searching for user-contributed solutions and reading help files. If the solution would take you days/months/years to program, well, you search on the web and hope that somebody has programmed it for you. Empirically speaking, such solutions that take days/months/years to program are extremely rare, and are mostly provided by StataCorp themselves. Here are some examples:
                  a) -gmm- and -nlsur- were introduced in Stata 11. I had needed both of these things badly since Stata 7. I would have loved to use a user-contributed package doing those things. Such a package never arrived.
                  b) I have needed badly all the things that -sem- and -gsem- do since Stata 7. But they were introduced in Stata 12 by StataCorp, and no user took on the task of programming these things.
                  c) I would love it if a user sat down and replaced the ancient -reg3- (which has all the linear system estimators you would find in an econometrics textbook from the 1970s) with a modern vision of linear system estimators (say, all the estimators that you can find in Wooldridge, J.M., 2010. Econometric analysis of cross section and panel data, the two chapters on linear system estimation). But nobody has taken on the task.

                  On point 4, everything Clyde said about how he chooses between a user-written solution and writing one himself is relevant.




                  • #10
                    Hi all,

                    May I ask a question that is similar to this one?

                    I also want to run a separate regression for each group in panel data, but I want to save the R-squared from each regression as a variable.

                    How can I do that?

                    Thanks!


                    • #11
                      Hi all,

                      I think I have solved this problem through the asreg program.

                      Thanks.


                      • #12
                        -rangestat- will do that for you. -rangestat- is available from SSC.
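
                        For readers who prefer to stay within official Stata, the same per-group R-squared can also be obtained with -statsby-. This is a different technique from -rangestat-, shown only as a sketch; it assumes the variable names from #1, the data are simulated stand-ins, and the variable name r2 is chosen here for illustration:

                        ```stata
                        * Simulated stand-in for the thread's data (assumption: not the real sample)
                        clear
                        set seed 2019
                        set obs 100
                        gen Index = cond(_n <= 52, 1, 2)
                        gen lagret = rnormal(0, 0.04)
                        gen LMSW_2002 = rnormal(0, 0.01)
                        gen Ret = 0.1*lagret + 0.5*LMSW_2002 + rnormal(0, 0.04)

                        * Collect one R-squared per Index in a temporary file
                        tempfile r2file
                        statsby r2 = e(r2), by(Index) saving(`r2file', replace): regress Ret lagret LMSW_2002

                        * Attach the group-level R-squared to every observation of that group
                        merge m:1 Index using `r2file', nogenerate
                        ```

                        After the merge, r2 is constant within each value of Index.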


                        • #13
                          Hi Clyde,

                          Thanks! That's really helpful!

                          Zihao


                          • #14
                            Zihao Chen, as shown in post #7 above, asreg creates several variables from the regression statistics. These include the R2 and adjusted R2.
                            More on asreg can be found here https://fintechprofessor.com/stata-p...ions-in-stata/
                            Regards


                            • #15
                              Hello!

                              I am stuck on a similar but slightly different problem.

                              I now have i.Period for each year from 1970 to 2015, and I also have the total hours worked (L) for different countries from 1970 to 2015. Each country is identified by a number (Country = 1, 2, ...).

                              I need to run the regression for each country to work out the coefficient of i.Period and see whether it is positive or negative (the influence of i.Period on total hours worked). What code should I use?

                              Really appreciate the help!! Thanks!!
