Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Abnormal Returns/ Average return of the market, value-weighted

    Hi everyone.

    I am trying to generate abnormal returns in STATA.
    To do so, I need to calculate the expected returns (Expected return = α+β*R)
    R is the average return of the market, for which I want to do a value weighted calculation (with returns and market capitalization)
    I have a dataset with around 550 companies and for each company I have more than 300 daily returns and the associated market cap for the day.

    I was able to calculate the average market return for one day by doing the following:

    egen num=total(QJ*QK)
    egen den=total(QJ)
    gen wtmean=num/den


    My Variables are named from QJ, QK, QL,...,ATI.
    QJ is daily market cap, QK daily return
    QL is daily market cap, QM daily return
    and so on (every second variable is market cap)

    I want to create wtmean for every day and then calculate the mean, so that I have the average return for the market for the time period I am looking at.

    Can anyone help me on how to create the average return for the market? I already tried foreach but I didn't find the right code, as I don't know how to address only every second variable.

  • #2
    For working with this kind of data, you will need to -rename- your variables and then -reshape- to long layout. Stata is not a spreadsheet, and trying to work with it as if it were is a recipe for pain and failure.

    Code:
    //  CREATE A TOY DATA SET TO DEMONSTRATE THE CODE
    clear*
    set obs 100
    set seed 1234
    foreach x in `c(alpha)' {
        gen q`x' = runiform()
        gen r`x' = runiform()
    }
    order q* r*, alphabetic
    
    
    //  THE CODE BELOW RENAMES THE VARIABLES AND CHANGES THE LAYOUT
    //  TO LONG BY FIRM
    rename (qa-rz) _v#, addnumber
    gen long obs_no = _n
    
    reshape long _v, i(obs_no) j(_j)
    gen which_var = "market_cap" if mod(_j, 2) == 1, before(v)
    replace which_var = "return" if missing(which_var)
    gen firm_num = ceil(_j/2)
    drop  _j
    reshape wide _v, i(firm_num obs_no) j(which_var) string
    rename _v* *
    drop obs_no
    
    //  CALCULATE THE WEIGHTED MEAN RETURN FOR EACH FIRM
    by firm_num: egen num = total(market_cap*return)
    by firm_num: egen den = total(market_cap)
    gen wtmean = num/den
    Note: For simplicity, I have named the variables qa-rz, but if you just change that to QJ-ATI, this should work with your data. Be patient with the -reshape- commands: they may be slow if your data set is very large.

    You should leave your data in this long layout for further work on your analyses. The wide layout you started from is only useful for a few limited purposes, mostly relating to creating tables or graphs for human eyes to read, and a handful of special-purpose commands. For data management and analysis the wide layout makes things difficult to impossible. In particular, going from these mean returns to calculating abnormal returns through a regression cannot be done in wide layout.

    In the future, when requesting help with coding, please post example data, and use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Hello Clyde,

      thank you so much for your answer, this helps me a lot!
      I will keep in mind for the future, that working with a long format is better.

      The code works nearly perfect for me. I still have two more questions about it though. Maybe you are able to help here as well

      1. Is there a possibility to add a date to each observation? In my original dataset, each of the return/ market cap variable has a date in its description which is gone after reshaping into long format. It wold be very helpful for further analysis to keep the date somehow in a separate variable.

      2. I want to match the return with the market cap from the day before (market cap from day 1 with the return from day 2, the market cap from day 2 with the return from day 3 and so on)
      Therefore, in my original dataset, I have one day with only market cap at the beginning and one with only return at the end.
      somehow Stata still matches the market cap day 2 with return day 2. This leads to the first firm_num of market caps being without returns and the last firm_num having no market cap but just returns.


      This is an example of the description box for the variables. you can see that the description is the date. The first variable of one date is always the return and the second one is market cap (_v2 is market cap, _v3 return, _v4 market cap,..)

      _v2 9/30/2020
      _v3 9/29/2020
      _v4 9/29/2020
      _v5 9/28/2020
      _v6 9/28/2020


      (I tried to insert a screenshot here but that wasn't possible for some reason)


      With the code you provided, Stata matches _v3 with _v4. This leaves the "first" market cap which is supposed to be matches with the following variable, _v2, alone.

      example:

      firm_ num market_cap return CompanyName
      224 798479 - CompanyA
      224 453439 - CompanyB
      224 123443 - CompanyC
      225 723423 1,32 CompanyA




      Maybe you can help!

      Comment


      • #4
        I don't know what you mean by "description box." And I am not able to visualize what your data look like. For help with this, you need to post actual example data, using the -dataex- command, as recommended in #2. I'm sure there is some way to retain the date information, and use it in the way you describe, but without seeing the actual layout of the data I can't proceed.

        Comment


        • #5
          Hello Clyde,

          Here is the example data from the -dataex- command.


          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str65 CompanyName double(_v1 _v2 _v3 _v4)
          "GSW Immobilien AG"                                   0       5894403840 -1.88679245283019 5910893287.23354
          "Draegerwerk AG & Co KGaA"            -.134408602150549 1392782583.91451 -.626666666666659 1407410211.36748
          "Tonkens Agrar AG"                     .442477876106205 7521354.29451648                 0 7519657.36964375
          "BHB Brauholding Bayern Mitte AG"     -13.9240506329114          8432000  16.1764705882353          9796000
          "Stroeer SE & Co KGaA"                -2.66106442577032 3931445305.56908  1.56472261735421     4044565169.4
          "Mountain Alliance AG"                -.854700854700852 39880682.5105874  .862068965517238 40393350.5562207
          "Formycon AG"                         -2.75080906148867 660177874.270744             -1.12 681701723.487843
          "RIB Software SE"                       .54844606946983     1432506872.5 -.905797101449275    1424693198.65
          "IBU tec advanced materials AG"                   -5.25        180025000 -.744416873449124        1.900e+08
          "HanseYachts AG"                      -1.76991150442479      87088907.25 -.877192982456137      88658076.75
          "Consus Real Estate AG"               -.145985401459851 1101968296.46653  -.72463768115943 1108212370.79629
          "Vapiano SE"                          -13.3333333333333      4065867.156                 0       4691385.18
          "Delivery Hero SE"                    -3.06252658443215 28335944540.9727 -6.14770459081837   29271986083.55
          "Metro AG"                             .677535464746984  3451477222.2501  2.60699543775799 3443451598.71464
          "JOST Werke AG"                       -2.39410681399631 788598498.429294  1.11731843575418 811333353.077831
          "AlzChem Group AG"                    -3.17460317460318        248302574  -1.1764705882353        256443642
          "Voltabox AG"                         -1.61764705882352         52934625 -2.57879656160459         53805000
          end

          I was able to find a way how to handle the dates.

          I still haven't found a solution for the second question of #3.

          In the example given in this post, I would want _v2 and _v3 to be in one observation when transforming the data into long format.
          right now, Stata "matches" _v1 and _v2. As described in #3 as well, I already tried to delete _v1 (the first return data) and the last market cap data, so that there is an equal number of market caps and returns. But that did not work.

          Comment


          • #6
            In the example given in this post, I would want _v2 and _v3 to be in one observation when transforming the data into long format.
            I don't know how you handled the dates, but good that you did. I assume you have a new variable called date now, and that it's a proper Stata internal format date variable. You can do this:

            Code:
            xtset firm_num date
            gen lagged_market_cap = L1.market_cap
            Now you have an extra variable, called lagged_market_cap that contains the previous day's market cap and it's in the same observation with today's return--as you wanted. If you are sure you have no use at all for today's market cap, then you can -drop market_cap- to make your data set a little more compact.

            Do read -help tsvarlist- for more information about time-series operators like L1 and other useful ones.

            Comment


            • #7
              Unfortunately, I wasn't able to create a date variable. I selected the variables in the wanted time-frame before converting the data into long format and "losing" the information about the dates.

              Hence why the code didn't work.
              I tried to lag the market cap without a date somehow, but it (as expected) didn't work.

              I had the idea to convert the firm_num into a date (which is obviously not the correct but, but I would just use it for lagging the market cap.)

              The firm_numb is a numerical variable (and as far as I know, in the %td format, Stata will interpret the numbers as of "number of days since January 1, 1960.") and I tried the following code to change it into a date:


              Code:
              tostring firm_num, replace
              gen date1=date(firm_num,"MDY")
              the generated date1 variable is empty though.

              I don't know how to lag the market cap without a date (or how to turn firm_num into a date only used for lagging the market cap).

              Comment


              • #8
                I'm beginning to see the problem differently, but I still don't have a grasp of what's going on. The fact that you would even think of using firm_num as a date variable suggests to me that I misconceived your data organization when I wrote #2. I assumed then that each row in the spreadsheet from which you imported the data originally corresponded to a date, and that each of the QJ-ATI variables corresponded to the market cap and return of a different company. I'm starting to think that I had this backwards: perhaps each of those QJ-ATI variables actually corresponded to market cap and return on different dates, and each spreadsheet row was a different company. It's still unclear to me where those dates are recorded in the data, but I don't think it actually matters: we can follow through on your idea to use the variable "firm_num" (which I now realize is a misnomer) as a date variable. It just has to be implemented differently.

                Code:
                rename firm_num sequential_date
                xtset CompanyName sequential_date
                gen lagged_market_cap = L1.market_cap
                There is just one big concern I have about this approach. It assumes that the original spreadsheet columns QJ-ATI were in correct chronological order. That is, QJ and QK were the market cap and return of one date, and QL and QM correspond to the next date, and so on all the way out. If the dates in a scrambled order then the code shown here will give incorrect results, and there will be no way to make progress without finding a way to retrieve the actual date information and turn it into a date variable.

                Comment


                • #9
                  Yes, you're right! Each spreadsheet row was a different company and the QJ-ATI variables corresponded to market caps and returns on different dates.
                  Luckily, the spreadsheet columns QJ-ATI were in correct chronological order, so the approach with using firm_num as a date can be used.

                  I used the -xset- command and got the following error message:

                  string variables not allowed in varlist;
                  CompanyName is a string variable

                  Comment


                  • #10
                    Code:
                    encode CompanyName, gen(Company)
                    xtset Company sequential_date

                    Comment


                    • #11
                      Code:
                      rename firm_num sequential_date
                      encode CompanyName, gen(Company)
                      xtset Company sequential_date
                      gen lagged_market_cap = L1.market_cap
                      For -xset- the error message "repeated time values within panel" appeared.

                      Comment


                      • #12
                        I have not been following the previous discussion, but I think that Clyde made the assumption that you have panel data. Therefore, you should have at most one "CompanyName" observation for a given "sequential_date". The error implies one of two things: Either, you have some duplicated observations or your data structure is such that you have multiple observations for a given "sequential_date". Which one, only you know. If you cannot figure it out, copy and paste the result of the following:

                        Code:
                        duplicates tag Company sequential_date, gen(dup)
                        dataex in 1/100 if dup

                        Comment


                        • #13
                          I have multiple observations for each CompanyName. For each sequential_date, there are different companies.
                          Lets say for example, that for sequential_date=1 there are Company1, Company2 and Company3. For each of the companies there is a market cap and a return given.
                          For sequential_date=2 there are market caps and returns for the same companies Company1, Company2 and Company3 again.

                          Your code gives the error "no observations; nothing to generate" which means there are no duplicates as far as I know.

                          Comment


                          • #14
                            The only way you get a "repeated time values within panel" error is if you have duplicates. Can you show the result of the following:

                            Code:
                            rename firm_num sequential_date
                            encode CompanyName, gen(Company)
                            count if missing(Company)
                            count if missing(sequential_date)
                            bys Company sequential_date: gen dup= _N>1
                            count if dup

                            Comment


                            • #15
                              It says that I have 9,048 duplicates (result of count if dup)

                              Comment

                              Working...
                              X