
  • Event Study with r2000 'no observations' coding issue

    Dear Statalist,

    I'd really appreciate any help members have to offer with my problem.

    I'm running an event study following the Princeton method (http://dss.princeton.edu/online_help...ventstudy.html). I've successfully merged my data for the FTSE350, with 72 events over ~29 years, and I am now trying to organise and clean the data but have run into trouble.

    My steps:
    1. I calculated 'dif', which counts the number of trading days to each event. This worked fine.
    Code:
    sort group_id date
    by group_id: gen datenum=_n
    by group_id: gen target=datenum if date==event_date
    egen td=min(target), by(group_id)
    drop target
    gen dif=datenum-td
    2. Next I try to create variables that check each group has the minimum number of observations after
    the event date (the event window) and the minimum number before it (the estimation window).
    Code:
    by group_id: gen event_window=1 if dif>=1 & dif<=20
    
    egen count_event_obs=count(event_window), by(group_id)
    
    by group_id: gen estimation_window=1 if dif<-10 & dif>=-136
    
    egen count_est_obs=count(estimation_window), by(group_id)
    
    replace event_window=0 if event_window==.
    
    replace estimation_window=0 if estimation_window==.
    
    drop if count_event_obs < 20
    
    drop if count_est_obs < 126


    However, at this stage no observations are dropped, because 'count_event_obs' and 'count_est_obs' always equal 7239 (the same as the maximum of 'datenum'). So when I later run the regressions in a loop, Stata stops with the r(2000) 'no observations' error, because there is no return ['ret'] for the first company on that date. All the data are in numeric form.

    How do I code the above so that it recognises this properly?

    Below is a sample from company 1 (where there is no data for return) for the first event.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(company_id ret) float(rmrf smb hml) str9 date byte set str9 event_date byte group_id int(datenum td) byte(dif event_window estimation_window predicted_return id) int(count_event_obs count_est_obs target)
    1 . -.00042 -.003461 -.000548 "29-Sep-89" 1 "16-Oct-89" 1 260 271 -11 0 1 . 1 7239 7239   .
    1 .  -.0005  .000021  .000906 "02-Oct-89" 1 "16-Oct-89" 1 261 271 -10 0 0 . 1 7239 7239   .
    1 . -.00031 -.005591 -.002691 "03-Oct-89" 1 "16-Oct-89" 1 262 271  -9 0 0 . 1 7239 7239   .
    1 . -.00032 -.007059 -.000465 "04-Oct-89" 1 "16-Oct-89" 1 263 271  -8 0 0 . 1 7239 7239   .
    1 . -.00042  -.01392  .005066 "05-Oct-89" 1 "16-Oct-89" 1 264 271  -7 0 0 . 1 7239 7239   .
    1 .  .00051  .005306 -.007944 "06-Oct-89" 1 "16-Oct-89" 1 265 271  -6 0 0 . 1 7239 7239   .
    1 . -.00006 -.000687 -.002605 "09-Oct-89" 1 "16-Oct-89" 1 266 271  -5 0 0 . 1 7239 7239   .
    1 . -.00044  .003091  .000139 "10-Oct-89" 1 "16-Oct-89" 1 267 271  -4 0 0 . 1 7239 7239   .
    1 . -.00015 -.000951  .004418 "11-Oct-89" 1 "16-Oct-89" 1 268 271  -3 0 0 . 1 7239 7239   .
    1 . -.00053  .014565 -.003523 "12-Oct-89" 1 "16-Oct-89" 1 269 271  -2 0 0 . 1 7239 7239   .
    1 . -.00006  .006328  .002321 "13-Oct-89" 1 "16-Oct-89" 1 270 271  -1 0 0 . 1 7239 7239   .
    1 .  .00041  .006482 -.005457 "16-Oct-89" 1 "16-Oct-89" 1 271 271   0 0 0 . 1 7239 7239 271
    1 .  .00032  .003703 -.002295 "17-Oct-89" 1 "16-Oct-89" 1 272 271   1 1 0 . 1 7239 7239   .
    1 . -.00025 -.001736  .003873 "18-Oct-89" 1 "16-Oct-89" 1 273 271   2 1 0 . 1 7239 7239   .
    end
    Thank you

  • #2
    Welcome to Statalist, and thank you for taking the time to post sample data using the dataex command.

    As an initial piece of advice, which doesn't seem to directly address your problem, let me suggest that you need to convert your strings date and event_date from "human readable form" (HRF) to Stata Internal Format (SIF) daily date values. Sorting string dates orders them alphabetically rather than chronologically, so after sorting, 01-Apr of some year would become the first observation in your data, and so forth.

    You will want to follow the instructions for converting your string variable to a SIF daily date variable following the guidance in help datetime, which is without a doubt the most visited documentation on my system, with the second-most-visited being Chapter 24 (Working with dates and times) of the Stata User's Guide PDF available from the PDF Documentation item on Stata's Help menu.

    Before working with dates and times, any Stata user should read the very detailed Chapter 24 thoroughly. After that, the help datetime documentation will usually be enough to point the way. Some people may be able to remember everything without having to continually refer to the documentation, but I for one am not such a person.
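A minimal sketch of that conversion, assuming the dates look like the "29-Sep-89" strings in #1 and that two-digit years should resolve to 2030 or earlier. The 2030 cutoff and the variable name date_sif are illustrative assumptions, not something stated in the thread.

```stata
* Toy strings in the same layout as the sample in #1
clear
input str9 date
"16-Oct-89"
"29-Sep-89"
"02-Oct-89"
end

* Third argument is the "topyear": two-digit years map to the
* latest year that does not exceed it (assumed cutoff: 2030)
gen date_sif = date(date, "DMY", 2030)
format date_sif %td

* Sorting the numeric date is chronological; sorting the string is not
sort date_sif
list date date_sif
```

With only two-digit years in the strings, a plain "DMY" mask without a century hint would return missing values, which is why the sketch passes a topyear.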

    Now, looking at the results you obtained for dif, it appears to me that you may have done so at some point, and then lost the SIF date, or else you have only been working with a very small subset of your data, or your sample data does not reflect the data the code was applied to. It is important that the data as presented is the data that the code was applied to.

    So perhaps you can reflect on this, and clarify your problem a little further.



    • #3
      Hi William,

      Thanks for taking the time to reply.

      I suspect that the date and event_date read as string variables because the sample was copied out of the Stata data table into Excel and emailed before being put back into Stata (presumably taking on the Excel default format).

      When I initially converted the dates from string to numeric I used this code:
      Code:
      gen date2 = date(date, "DMY")
      format date2 %td
      In my current dataset all variables are float, other than company_id which is simply an integer. Is there an advantage to converting into SIF? The date has sorted perfectly over the whole time period, displayed in the form 'DDmonYYYY'.

      I think that my problem is that while I have rows for all the dates on which each event occurs, there is no [daily stock] return data for every company on these dates (or on the window dates), only a '.'. So when I run my regression it cannot regress on missing return data, hence the 'r2000' error. Would I be able to code something that ignores the cases where 'ret' has no value (some form of 'capture' perhaps)?

      Regression:
      Code:
      gen predicted_return=.
      egen id=group(group_id)
      forvalues i=1(1)350 {
      l id group_id if id==`i' & dif==0
      reg ret rmrf smb hml if id==`i' & estimation_window==1
      predict p if id==`i'
      replace predicted_return = p if id==`i' & event_window==1
      drop p
      }
      I re-ran the program and now have figures for count_event_obs and count_est_obs that simply match the respective window lengths (this is correct, I think). I hope this screenshot shows my problem: it's an example of event no. 1 for company_id 1 with no return data (in total there are 24,500 combinations of the 72 events for the 350 companies).
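As a rough sketch of the 'capture' idea asked about above: the loop below traps the r(2000) and r(2001) return codes so that groups without usable returns are skipped instead of stopping the loop. The toy data (two groups, one with no returns at all) and the seed are made up for illustration; only the variable names follow the thread.

```stata
* Toy data: id 1 has no returns, id 2 has returns throughout
clear
set seed 12345
set obs 60
gen id = cond(_n <= 30, 1, 2)
bysort id: gen dif = _n - 25
gen rmrf = rnormal()
gen smb = rnormal()
gen hml = rnormal()
gen ret = 0.5*rmrf + rnormal() if id == 2
gen estimation_window = inrange(dif, -24, -5)
gen event_window = inrange(dif, 1, 5)

gen predicted_return = .
forvalues i = 1/2 {
    * capture suppresses the error and stores the return code in _rc
    capture regress ret rmrf smb hml if id == `i' & estimation_window == 1
    * 2000 = no observations, 2001 = insufficient observations: skip group
    if inlist(_rc, 2000, 2001) continue
    * any other error is still reported
    if _rc exit _rc
    predict double p if id == `i'
    replace predicted_return = p if id == `i' & event_window == 1
    drop p
}
```

Skipping silently can hide data problems, so it may be worth counting how many groups were skipped before relying on the results.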

      [Screenshot attached: image (2).png]



      • #4
        I had a look at the Princeton pages referred to in #1 and I note that the code uses old merge syntax and is at times overly complicated. For those who want to try this out, the two test datasets can be saved to Stata's current directory using:
        Code:
        copy "http://dss.princeton.edu/sampleData/eventdates.dta" "eventdates.dta", replace
        copy "http://dss.princeton.edu/sampleData/stockdata.dta" "stockdata.dta", replace
        The whole data preparation page code can be reduced to:
        Code:
        use "eventdates.dta", clear
        by company_id: gen event_id = _n
        joinby company_id using "stockdata.dta"
        egen group_id = group(company_id event_id)
        isid group_id date, sort
        save "data2useRP.dta", replace
        The code that prepares the data for estimation can be reduced to
        Code:
        use "data2useRP.dta", clear
        by group_id: gen tradeday = _n
        by group_id: gen event_obs = event_date == date
        * the number of trading days from the event date
        bysort group_id (event_obs): gen days = tradeday - tradeday[_N] if event_date[_N] == date[_N]
        
        gen w_event = inrange(days,-2,2)
        gen w_estim = inrange(days,-60,-31)
        by group_id: egen N_w_event = total(w_event)
        by group_id: egen N_w_estim = total(w_estim)
        
        drop if N_w_event < 5
        drop if N_w_estim < 30
        From there, you can use rangestat (from SSC) to perform the regressions by group_id using returns from the estimation window. You can then calculate the predicted returns over the event window. From there, the rest of the calculations follow the Princeton code.
        Code:
        * Estimating Normal Performance
        gen ret2use = ret if w_estim
        rangestat (reg) ret2use market_return, interval(group_id 0 0)
        gen p_return = b_market_return * market_return + b_cons if w_event
        
        * Abnormal and Cumulative Abnormal Returns
        gen abnormal_return = ret - p_return
        by group_id: egen cumulative_abnormal_return = total(abnormal_return)
        
        * Testing for Significance
        by group_id: egen ar_sd = sd(abnormal_return) 
        gen test =(1/sqrt(N_w_event)) * ///
            (cumulative_abnormal_return /ar_sd) 
        
        * Testing Across All Events
        reg cumulative_abnormal_return if days==0
        I haven't tried to figure out what's wrong in #1 but I hope this simpler approach will make it easier to track down the problem.



        • #5
          Hi Robert,

          Thank you for the updated code. I imagine everyone who has been using the outdated code will also be very grateful.

          I was able to run the code successfully on a reduced dataset. However, when running it on the whole dataset (~11GB) I received the error:
          Code:
          get_indices():  3900  unable to allocate real <tmp>[177355500,2]
                           <istmt>:     -  function returned error
          This suggests that the OS doesn't have sufficient memory to compute this. Typing 'memory' gave these results:

          [Screenshot attached: t (2).png]

          Do you have any suggestions to help?

          Regards



          • #6
            Maybe I'm missing something, but how does 72 events balloon to 11GB of data? Are you sure that you combined the data correctly? How many groups (firm-event pairs) do you have? You can find out using:
            Code:
            by group_id: gen one = _n == 1
            tab one
            You should first make sure that each group has only one observation per date. Then make sure that the event_date is constant within each group. You can do this using:
            Code:
            isid group_id date, sort
            bysort group_id (event_date): assert event_date[1] == event_date[_N]
            If you are really dealing with an 11GB dataset, you will need to be more efficient memory-wise. Fortunately, there's plenty you can do. Here's a revised example that prunes out the excess fat:
            Code:
            use "data2useRP.dta", clear
            
            * the number of observations and the number of groups of firm and event
            by group_id: gen one = _n == 1
            tab one
            
            * make sure that you have one observation per date in each group
            isid group_id date, sort
            
            * make sure that you have one event per group
            bysort group_id (event_date): assert event_date[1] == event_date[_N]
            
            * reduce to variables needed
            keep group_id date event_date ret market_return
            
            * drop groups with no date observation on event_date
            by group_id: gen long tradeday = _n
            by group_id: gen byte event_obs = event_date == date
            bysort group_id (event_obs): drop if event_obs[_N] == 0
            
            * the number of trading days from the event date
            by group_id: gen long days = tradeday - tradeday[_N]
            drop tradeday event_obs
            
            * identify observations within the event and estimation windows
            gen byte w_event = inrange(days,-2,2)
            gen byte w_estim = inrange(days,-60,-31)
            
            * reduce to observations within the event and estimation windows
            keep if w_event | w_estim
            
            * drop groups when there are insufficient trading days within the windows
            by group_id: egen N_w_event = total(w_event)
            by group_id: egen N_w_estim = total(w_estim)
            drop if N_w_event < 5
            drop if N_w_estim < 30
            drop N_w_estim
            
            * recover as much memory as possible
            compress
            
            * Estimating Normal Performance
            gen ret2use = ret if w_estim
            rangestat (reg) ret2use market_return, interval(group_id 0 0)
            drop reg_*_r2 se_*
            gen p_return = b_market_return * market_return + b_cons if w_event
            
            * Abnormal and Cumulative Abnormal Returns
            gen abnormal_return = ret - p_return
            by group_id: egen cumulative_abnormal_return = total(abnormal_return)
            
            * Testing for Significance
            by group_id: egen ar_sd = sd(abnormal_return) 
            gen test =(1/sqrt(N_w_event)) * ///
                (cumulative_abnormal_return /ar_sd) 
            
            * Testing Across All Events
            reg cumulative_abnormal_return if days==0
            If you do not have enough memory to run this code, then you can split your data in parts and reduce to observations where days == 0. You then use append to combine the results from each part and then run the final regression.
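The split-and-append idea could look roughly like the sketch below. It uses toy data, splits by ranges of group_id, and uses a plain egen total() as a stand-in for the per-group calculations. The chunk size, the tempfile names, and the variable car are illustrative assumptions.

```stata
* Toy data: 4 groups of 10 trading days each
clear
set obs 40
gen group_id = ceil(_n/10)
bysort group_id: gen days = _n - 5
gen abnormal_return = days/100
tempfile full results
save `full'

* Process two groups at a time, keep one row per group, then append
forvalues part = 1/2 {
    use if inrange(group_id, 2*`part' - 1, 2*`part') using `full', clear
    bysort group_id: egen car = total(abnormal_return)
    keep if days == 0
    if `part' > 1 append using `results'
    save `results', replace
}

* Combined results: one observation per group, ready for the final regression
use `results', clear
sort group_id
list group_id car
```

Reading each part with `use if ... using` means only that part is ever held in memory, which is the point of splitting.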



            • #7
              Thanks Robert, reducing the observations to those inside the event and estimation windows cut the dataset size down to ~400MB - well within the capability of my computer.
              The 'tab one' returned 1 with a frequency of 24500 (which is just the 70 events * the 350 firms).

              Since I am trying to run a Fama-French 3 Factor model (instead of the market model) is the following code appropriate for estimating normal performance? (rmrf = Market Return- Risk Free Return, rf = Risk Free Return)

              Code:
              gen ret2use = ret if w_estim
              rangestat (reg) ret2use rmrf smb hml, interval(group_id 0 0)
              drop reg_*_r2 se_*
              gen p_return = rf + b_rmrf * rmrf + b_smb * smb + b_hml * hml + b_cons if w_event
              My aim, once I have calculated the abnormal returns for each day, is to rank the individual stocks according to their AR on the event day and construct portfolios from the top (winner) and bottom (loser) deciles. The contrarian strategy return (long the winners, short the losers) can then be measured by the average cumulative abnormal return over the event window. Would this be achievable in Stata, or do I have to pull the data out to Excel?
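On the ranking step only, a minimal sketch for a single event, assuming one event-day abnormal return per company. The toy data, the seed, and the names decile, winner and loser are made up for illustration.

```stata
* Toy event-day abnormal returns for 100 companies, one event
clear
set seed 2718
set obs 100
gen company_id = _n
gen abnormal_return = rnormal(0, 0.02)

* Rank into ten equal-sized bins; 10 = highest AR, 1 = lowest
xtile decile = abnormal_return, nq(10)
gen byte winner = decile == 10
gen byte loser = decile == 1
tab decile
```

xtile does not accept a by prefix, so with many events the ranking would have to be repeated per event, for example in a loop over events.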



              • #8
                Glad to see that the code now works with your data. Sorry but I'm not in finance so I can't help on your additional questions. If no one else jumps in, you might want to start a new thread.



                • #9
                  Hello,

                  I have read this thread with great interest, since I am trying to conduct an event study with Stata, too.
                  Unfortunately, I am a completely new user of Stata, and although I am reading and trying a lot, I do not fully understand the method you describe.
                  To be more precise, at the moment I am struggling to merge my event dates with the stock and market returns. I cannot figure out how to assign the identifying variable to the stock returns,
                  which might be because they are in wide format (unlike the event dates). I hope that you see my problem. Below you can see the two data sets, the first one being the event dates and the second one the companies with the daily returns.
                  I hope I have described my problem properly, if not, please do not hesitate to ask.

                  I am very grateful for any advice, thanks a lot in advance!

                  Attached Files



                  • #10
                    Hi guys.

                    I was wondering, for the calculation of the test statistic, it can only be applied if you have an event window which is 2+ days.

                    What if you want the test statistic for event day AR only?

                    The event window would then be dif>=0 & dif<=0 (that is, dif==0).

                    You cannot calculate the test statistic in that instance. So how do you calculate it? Does it mean that the sd will be calculated using the estimation window then?
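One possibility, which the question itself hints at, is to scale the event-day abnormal return by the standard deviation of abnormal returns over the estimation window. The sketch below uses toy data and made-up names (ar_estim, ar_sd_estim, test_day0). It also assumes abnormal returns are available in the estimation window, which the thread's code does not produce, since p_return is only filled in for the event window there.

```stata
* Toy data: one group, abnormal returns for days -90..9
clear
set seed 42
set obs 100
gen group_id = 1
gen days = _n - 91
gen abnormal_return = rnormal(0, 0.01)
gen byte w_estim = inrange(days, -60, -31)

* Standard deviation of abnormal returns over the estimation window only
gen ar_estim = abnormal_return if w_estim
bysort group_id: egen ar_sd_estim = sd(ar_estim)

* Event-day statistic: AR on day 0 scaled by the estimation-window sd
gen test_day0 = abnormal_return / ar_sd_estim if days == 0
list group_id days abnormal_return test_day0 if days == 0
```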

                    Thanks for any replies.



                    • #11
                      If I understood the very first question right, one practical solution would be:

                      Code:
                      gen present = !missing(ret)
                      by group_id: egen count_event_obs = total(present*event_window)
                      by group_id: egen count_est_obs = total(present*estimation_window)
                      drop if count_event_obs < 20
                      drop if count_est_obs < 126
                      Just in case anybody needs it.
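A small sketch of why total() is a safer choice than count() here: egen's count() counts nonmissing values, so once the missing values of an indicator are replaced with 0, every observation is counted. This may be what produced the 7239 in #1, though the thread does not confirm it. The toy data and the names n_before, n_after and n_total are illustrative.

```stata
* Toy data: one group of 10 days, event window = days 1 to 3
clear
set obs 10
gen group_id = 1
gen dif = _n - 5
gen event_window = 1 if inrange(dif, 1, 3)

* count() counts nonmissing values: 3 while the rest are missing
egen n_before = count(event_window), by(group_id)

* after filling in zeros, count() sees 10 nonmissing values
replace event_window = 0 if event_window == .
egen n_after = count(event_window), by(group_id)

* total() sums the 0/1 indicator and still gives 3
egen n_total = total(event_window), by(group_id)
list in 1
```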



                      • #12
                        Originally posted by Robert Picard in #4 (quoted text and code are identical to #4 above and are not repeated here).
                        I tried the code in Stata 16 and got:
                        Code:
                        rangestat (reg) ret2use mkt_ret, interval(com_id 0 0)
                        command rangestat is unrecognized
                        r(199);
                        and if I change it to
                        Code:
                        reg ret2use mkt_ret, interval(com_id 0 0)
                        option interval() not allowed
                        r(198);
                        What went wrong?



                        • #13
                          As #4 says, "you can use rangestat (from SSC)". It is a community-contributed command, so you need to install it first:
                          Code:
                          ssc install rangestat, replace



                          • #14
                            I appreciate all the time everyone has spent helping us with event studies.


                            A few questions about this post.

                            1) Does anyone have or know where I can get the Princeton data files mentioned in #4 above?

                            These files are no longer located at:

                            "http://dss.princeton.edu/sampleData/eventdates.dta"
                            "http://dss.princeton.edu/sampleData/stockdata.dta"


                            2) If the files are no longer available, could someone at least tell me what variables are in each file?


                            3) Does the above code (Robert P's code in #6) account for Saturdays and Sundays (the difference between calendar time and business time)? For example, if an event is on a Monday, the previous relevant day would be Friday, not Sunday.


                            Thanks in advance.
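On question 3, a small sketch suggests that the day counter in #6 follows business time, provided the data contain trading days only: days is built from the observation number within each group, not from calendar date differences. The toy dates below (a Friday, the following Monday, and Tuesday) and the string variable sdate are illustrative. The steps mirror #6, with date added to the sort order so the toy example is deterministic.

```stata
* Three trading days around Monday 16 Oct 1989; no weekend rows
clear
input str9 sdate
"13oct1989"
"16oct1989"
"17oct1989"
end
gen date = daily(sdate, "DMY")
format date %td
gen group_id = 1
gen event_date = mdy(10, 16, 1989)

* Same logic as #6: count observations, not calendar days
bysort group_id (date): gen long tradeday = _n
by group_id: gen byte event_obs = event_date == date
bysort group_id (event_obs date): gen long days = tradeday - tradeday[_N]

sort group_id date
list date days
```

Here the Friday gets days == -1 even though it is three calendar days before the event. If the data did contain weekend rows, they would be counted as well.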


