  • Large dataset of events: Event study

    Hello,

    I am a PhD student and I am trying to conduct an event study of 50000 events.
    When I merge the event data with the return data, Stata takes a long time and blocks, so I cannot get the event study results.

    Are there any tips or solutions for dealing with a large dataset of events?

    Thank you in advance.

  • #2
    You don't provide much information. It would be helpful to know the number of observations and variables in each of the data sets, and also to show the exact command you are using for the -merge-. You should also state how long it ran before you concluded that it is not working. Did your computer actually crash? How did you recognize the crash--did it freeze up entirely so you could not launch any other applications? Or did Stata just appear to be "hung"? You say it "blocks"--I don't know what that means. Did Stata issue any error messages along the way? If so, what did they say?

    All of that said, it is most likely that nothing is wrong and you are just not patient enough. -merge- is a slow command: it has to sort both data sets if they are not already sorted on the merge key variables, and it also has to read and write the data to disk several times. Sorting is slow, and the time required grows faster than just in proportion to the size of the data set. And disk operations will be even slower than you are accustomed to if you are accessing files on a remote computer.
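One way to get a feel for how long a -merge- takes before committing to the full files is to time a smaller one. The following is only a sketch on synthetic data: the variable names id, x, and ret and the tempfile names are invented for illustration and do not come from the original data sets.

```stata
* Sketch only: synthetic stand-ins for the real files (all names hypothetical)
clear
set seed 12345
set obs 1000
generate long id = _n                      // unique key in the master data
generate double x = runiform()
tempfile master_data
save `master_data'

clear
set obs 200000
generate long id = ceil(runiform()*1000)   // repeated keys in the using data
generate double ret = rnormal()
tempfile using_data
save `using_data'

* Time the merge itself with Stata's built-in timer
use `master_data', clear
timer clear 1
timer on 1
merge 1:m id using `using_data'
timer off 1
timer list 1                               // elapsed seconds for timer 1
```

Rerunning this with larger values in -set obs- gives a rough idea of how the run time grows with the size of the data.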


    • #3
      Thank you for your answer!

      I have the event dataset with 58,099 observations and the return dataset with 7,492,376 observations.

      I use the eventstudy2 command.

      . eventstudy2 PERMNO Daily_date using security_return, returns(Return)
      Generating dateline ...
      ...succeeded
      Preparation of event list ...
      ...succeeded
      Preparation of security return data...
      ...succeeded
      Merging event dates and stock market data...


      Stata appears to hang and the computer crashes at this step of the code, and I am forced to close and stop stata to continue using the computer.



      • #4
        OK. I'm not familiar with the -eventstudy2- command. Understandably, it does not show the -merge- command that it is using, so it is hard to say more about this.

        "I am forced to close and stop stata to continue using the computer."
        So you are unable to, for example, minimize the Stata window and then launch some other application? Or, if you are running Windows, you are unable to bring up the Task Manager?

        You still didn't say how long you waited before concluding that Stata was in trouble and your computer frozen.

        Try this: in each of the two data sets, -drop- all of the variables except the ones that are absolutely needed for the -eventstudy2- command, and run the -compress- command. Then save each data set under a new name. If the original data sets were on a remote computer, save these compressed data sets on the local hard drive. Then try running -eventstudy2- again. Immediately after you launch -eventstudy2-, launch the Task Manager program (or the equivalent if you are running on a Mac or Unix) so you can actually see whether Stata and other processes are still running or Stata is just taking a long time to do a large compute- and disk-intensive task.
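The trimming step might look like the sketch below. The variable names PERMNO, Daily_date, and Return are taken from the command shown in #3; the toy data, the extra variable unused_text, and the output file name security_return_small.dta are assumptions made only so the example runs on its own. With the real data, the toy block would be replaced by a -use- of the actual return file.

```stata
* Toy stand-in for the return file (hypothetical values); with real data,
* replace this block with something like: use "security_return.dta", clear
clear
set seed 1
set obs 1000
generate double PERMNO = 10000 + mod(_n, 50)
generate double Daily_date = td(01jan2015) + floor(_n/50)
format Daily_date %td
generate double Return = rnormal(0, 0.02)
generate str20 unused_text = "not needed"

* Keep only what the -eventstudy2- call uses, then shrink storage types
keep PERMNO Daily_date Return
compress
describe, short

* Save under a new (hypothetical) name on the local drive
save "security_return_small.dta", replace
```

The same -keep-, -compress-, -save- sequence would then be repeated for the event file.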


        • #5
          I dropped the variables that I don't need and compressed my 2 datasets.
          I ran the eventstudy2 command; it took more than 1 hour and Stata hung (black screen).
          I launched the Task Manager to stop Stata.


          • #6
            I'm not sure whether the black screen is really evidence that Stata hung. For example, my setup goes to black screen after a set period of time with no user activity, but the programs continue to run in the background. The fact that you were able to launch Task Manager shows that your system did not actually crash. When you looked at Task Manager, were there "signs of life" for the Stata application (changing amounts of CPU or memory allocation)? Again, I'm not convinced there is actually a problem here. You said you ran it for an hour, but I would not expect a merge of that size to complete in an hour on a normal desktop or high-end laptop. I would think of this more as an "overnight" job. Consider waiting until you are ready to leave for the day and then starting it up again and letting it run overnight. If the process is not actually hanging, I do think a merge of that size would complete within 8 hours on a typical desktop or laptop.


            • #7
              Hi Meryem,

              I concur; the merge will likely take an entire night to run. Don't forget to edit the power saving settings on your computer to ensure that it doesn't go to sleep, hibernate, update, or restart in the middle of the night while eventstudy2 is running!

              For reference, you can see the implementation of -eventstudy2- here: http://fmwww.bc.edu/repec/bocode/e/eventstudy2.ado. This is a large file, but you can use the find feature and search for "Merging event dates and stock market data..." to get to the relevant section of code. You can always download and edit this file and add your own -noisily: display- commands to get more console output, and therefore more of a sense of where you are in the script as it runs. Reading this is frankly overwhelming at first (at least for me), but taking some time to try to understand the underlying code is often worthwhile, particularly with free-to-use third-party commands like this.
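As a rough illustration of the kind of progress output meant here, a small helper can print a timestamped message even from inside a -quietly- block. The program name progress_note and its messages are hypothetical; nothing like it exists in eventstudy2.ado, and where to place the calls in that file is left to the reader.

```stata
* Hypothetical helper, not part of eventstudy2: prints a timestamped message
capture program drop progress_note
program define progress_note
    args label
    * -noisily- forces output even when the caller runs under -quietly-
    noisily display as text "`c(current_date)' `c(current_time)': `label'"
end

quietly {
    progress_note "starting merge step"
    * ... the slow commands would sit here ...
    progress_note "merge step finished"
}
```

Comparing the timestamps of consecutive messages shows which section of the script is taking the time.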


              • #8
                Dear all,

                An event study with 50 K events is huge. I recommend the following to Meryem:

                1) read my Stata Journal publication on "Event studies using daily stock returns in...." There is a section that simulates run times using different numbers of events and models. Note that these analyses were executed on an HPC with 256 GB RAM.

                2) use the nokolari and saveram (undocumented) options
                3) gradually increase the sample size (100 events, 1000, 5000, ...), watch RAM consumption while it executes, and find the critical number of events.
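Step 3 could be scripted along the following lines. This is a sketch under assumptions: events.dta is a hypothetical file name for the full event list, the -eventstudy2- call is copied unchanged from #3 and is assumed to be rerunnable as is, and the nokolari and saveram options from step 2 are not shown because their exact syntax is not given in this thread.

```stata
* Sketch: rerun eventstudy2 on growing random subsamples of the event list.
* "events.dta" is a hypothetical name for the file holding all events.
set seed 2024
foreach n of numlist 100 1000 5000 {
    use "events.dta", clear
    sample `n', count              // keep a random subsample of `n' events
    timer clear 1
    timer on 1
    eventstudy2 PERMNO Daily_date using security_return, returns(Return)
    timer off 1
    quietly timer list 1
    display "events: `n'   seconds: " r(t1)
}
```

Watching memory use in the Task Manager during each pass, and extending the numlist step by step, should show where run time or RAM becomes the limit.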

                Best
                Thomas Kaspereit
