Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Adding a list of variables to each observation

    How do I add a list of variables from a “using” file to every observation (for a list of entities) in the master file? I am attempting to run an event study (and using Stata 15). The variables to be added are dates of events which affect all entities.
    For example, to this data set
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id float(month ff)
    2708 657     -.005559
    2708 658 -.0023612394
    2708 659  .0009129993
    2708 660 -.0016764477
    2708 661 -.0025546604
    end
    format %tmYYMon month
    I hope to add the dates of events. For example
    generate event_month = mofd(date("2/15/15", "MDY", 2050))
    will add a single date to each observation
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id float(month ff event_month)
    2708 657     -.005559 661
    2708 658 -.0023612394 661
    2708 659  .0009129993 661
    2708 660 -.0016764477 661
    2708 661 -.0025546604 661
    end
    format %tmYYMon month
    however there are many event dates (in a seperate file) which I wish to add to each observation in the master file.
    Thank you for your help, Dan

  • #2
    Hello Dan, why not have a try with the command -merge-? It seems that's the simplest way to meet your goal.
    2B or not 2B, that's a question!

    Comment


    • #3
      Thank you for your prompt response Liu, I don't believe the merge command should work in this context as there is not a common key variable - i.e. there is a sequence of months for each observation for each entity but the events only occur on some specific months. I wish to attach each event's month to each observation. I hope that makes sense? Thank you, Dan

      Comment


      • #4
        sorry Dan, I am still very confused about it. To my best understating of the event study, the event time is used to generate a time dummy variable and the common key variable is id . Besides, you can also generate several event time variables if there are more than one. Given that, you may elaborate more on your research purpose and list the data structure for both the using and master file if possible.
        2B or not 2B, that's a question!

        Comment


        • #5
          I don't have any understanding of event studies, so in what follows I'm taking a try at accomplishing what I understand to be the objective described in post #1.
          Code:
          cls
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float event_month
          661
          665
          669
          673
          end
          format %tmCCYY_Mon event_month
          
          xpose, clear format(%tmCCYY_Mon)
          rename (v#) (ev_mon_#)
          list, clean abbreviate(12)
          generate i = _n
          tempfile events
          save `events'
          
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input long id float(month ff)
          2708 657     -.005559
          2708 658 -.0023612394
          2708 659  .0009129993
          2708 660 -.0016764477
          2708 661 -.0025546604
          end
          format %tmCCYY_Mon month
          
          generate i = 1
          joinby i using `events'
          drop i
          list, clean abbreviate(12)
          Code:
          . list, clean abbreviate(12)
          
                 ev_mon_1   ev_mon_2   ev_mon_3   ev_mon_4  
            1.   2015 Feb   2015 Jun   2015 Oct   2016 Feb
          Code:
          . list, clean abbreviate(12)
          
                   id      month          ff   ev_mon_1   ev_mon_2   ev_mon_3   ev_mon_4  
            1.   2708   2014 Oct    -.005559   2015 Feb   2015 Jun   2015 Oct   2016 Feb  
            2.   2708   2014 Nov   -.0023612   2015 Feb   2015 Jun   2015 Oct   2016 Feb  
            3.   2708   2014 Dec     .000913   2015 Feb   2015 Jun   2015 Oct   2016 Feb  
            4.   2708   2015 Jan   -.0016764   2015 Feb   2015 Jun   2015 Oct   2016 Feb  
            5.   2708   2015 Feb   -.0025547   2015 Feb   2015 Jun   2015 Oct   2016 Feb

          Comment


          • #6
            Thanks again for looking over the problem Liu. Yes, I should have added the structure for the event data file to help explain the problem, eg
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input float event_month str16 event
            661 "event_1"
            670 "event_2"
            end
            format %tmYYMon event_month
            Thank you William, that is a great help. Do you mind if I summarise the main elements of your solution (so I understand as the project gets more layers and more complexity)? 1. Add i = 1 to all the observations and all the events, 2. store the second ("using") file as a temporary file, and 3. use "joinby" to add the events to each observation.
            I just need to work out how to recognise the variable names (of the events) from the event data file - this will be the means by which I group them for the eventual testing). Thank you again, Dan

            Comment


            • #7
              Dan, before I can comment on your summary, I would appreciate it if you would take the 5-observation example dataset in post #1 and show what you would like to have after you combine it with the 2 observations of event data from post #6.

              Comment


              • #8
                I think William's answer in #5 has already made Dan's question in #1 very clearly solved after the event data are posted in #6. I didn't get Dan's point as I assume the event month should be id-specific.
                But the case itself is very strange as the event month is not id-specific, which would make the event time a constant for all observations. Namely, -gen- would solve your question in #1 if there are only several events ( less than 10 for example).
                2B or not 2B, that's a question!

                Comment


                • #9
                  Thank you for your suggestion of -gen- Liu. Yes, in this case there are a large number of events.
                  Thank you immensely for your follow up William. For completeness, here's my "master" file from #1 (but with your improved monthly formatting)
                  Code:
                  * Examples generated by -dataex-. To install: ssc install dataex
                  clear
                  input long id float(month ff)
                  2708 657     -.005559
                  2708 658 -.0023612394
                  2708 659  .0009129993
                  2708 660 -.0016764477
                  2708 661 -.0025546604
                  end
                  format %tmCCYY_Mon month
                  I am hoping to add the months of events from a second ("using") file which has the following structure.

                  Code:
                  clear
                  input float event_month str16 event
                  661 "event_1"
                  670 "event_2"
                  end
                  format %tmCCYY_Mon month
                  For example, if I manually added the months as variables, the commands would look like this
                  Code:
                  ​​​​​​​
                  generate event_1 = mofd(date("2/15/15", "MDY", 2050))
                  format %tmCCYY_Mon event_1
                  generate event_2 = mofd(date("11/15/15", "MDY", 2050))
                  format %tmCCYY_Mon event_2
                  and the resulting "master" data file would now look like
                  Code:
                    clear
                  input long id float(month ff event_1 event_2)
                  2708 657     -.005559 661 670
                  2708 658 -.0023612394 661 670
                  2708 659  .0009129993 661 670
                  2708 660 -.0016764477 661 670
                  2708 661 -.0025546604 661 670
                  end
                  format %tmCCYY_Mon month
                  format %tmCCYY_Mon event_1
                  format %tmCCYY_Mon event_2
                  following your approach, I'm getting close with
                  Code:
                  use using.dta
                  xpose, clear format(%tmCCYY_Mon)
                  rename (v#) (ev_mon_#)
                  list, clean abbreviate(12)
                  generate i = _n
                  tempfile events
                  save `events'
                  use master.dta
                  generate i = 1
                  joinby i using `events'
                  drop i
                  list, clean abbreviate(12)
                  but I'll need to replace the role of the v# and instruct the joinby to keep the original event variable names. Thank you very much for your help William, Dan

                  Comment


                  • #10
                    Thank you for post #9, which clearly displays your very useful example data and the intended result of the process.

                    Since your events file contains event names along with the event dates, and you want variables whose names are given by the event name (which thus must follow the requirements for Stata variable names), the xpose command is no longer the appropriate command to collapse your events dataset into a single observation to be added onto each of the observations in your master dataset. Instead, we use reshape wide.
                    Code:
                    use events.dta, clear
                    generate i = 1
                    rename event_month month
                    reshape wide month, i(i) j(event) string
                    rename (month*) (*)
                    tempfile events
                    save `events'
                    
                    use master.dta
                    generate i = 1
                    joinby i using `events'
                    drop i
                    list, clean abbreviate(12)
                    describe
                    Code:
                    . list, clean abbreviate(12)
                    
                             id      month          ff    event_1    event_2  
                      1.   2708   2014 Oct    -.005559   2015 Feb   2015 Nov  
                      2.   2708   2014 Nov   -.0023612   2015 Feb   2015 Nov  
                      3.   2708   2014 Dec     .000913   2015 Feb   2015 Nov  
                      4.   2708   2015 Jan   -.0016764   2015 Feb   2015 Nov  
                      5.   2708   2015 Feb   -.0025547   2015 Feb   2015 Nov  
                    
                    . describe
                    
                    Contains data
                      obs:             5                          
                     vars:             5                          
                     size:           100                          
                    ------------------------------------------------------------------------------------------------
                                  storage   display    value
                    variable name   type    format     label      variable label
                    ------------------------------------------------------------------------------------------------
                    id              long    %12.0g                
                    month           float   %tm..                 
                    ff              float   %9.0g                 
                    event_1         float   %tm..                 event_1 month
                    event_2         float   %tm..                 event_2 month
                    ------------------------------------------------------------------------------------------------
                    Sorted by:

                    Comment


                    • #11
                      That is extremely helpful thank you William. Thank you also for the straight forward explanation, Dan

                      Comment


                      • #12
                        Hi again William, I have a couple of additional steps if you are interested to consider?
                        Looking at the data file you created in #10, I have replaced the constant dates (the event_1 etc variables) with the difference between the date in the "month" variable and the date in the event_1 etc variables by using this code.
                        Code:
                        forvalues e = 1/122 {
                        replace event`e' = month - event`e'
                        format event`e' %15.0gc
                        }
                        There are currently 122 events at present but I am hoping to make the code more robust by including a step to calculate the total number of variables which have the stub "event" and use that count number in the forvalues command.
                        (pls note I have also added this question to an older thread on "Counting the number of variables with same stub" at
                        https://www.statalist.org/forums/forum/general-stata-discussion/general/1249342-counting-the-number-of-variables-with-same-stub )
                        Thank you for any suggestions you might have, Dan

                        Comment


                        • #13
                          If I understand correctly, I think the following demonstrates code that you can adapt to count your event variables.
                          Code:
                          . ds
                          z   x1  x2  x3  y
                          
                          . ds x*
                          x1  x2  x3
                          
                          . local nx : word count `r(varlist)'
                          
                          . display `nx'
                          3
                          
                          . forvalues i=1/`nx' {
                            2. display "x`i'"
                            3. }
                          x1
                          x2
                          x3
                          But for what you want to do, why not just loop over the variable list?
                          Code:
                          . foreach xvar of varlist x* {
                            2. display "`xvar'"
                            3. }
                          x1
                          x2
                          x3

                          Comment


                          • #14
                            Thank you William, yes, that makes a lot of sense - ie to use foreach rather than forvalues. Thank you also for the insight on how to produce the count of those variables - which will be useful for my write-up.

                            If you are still interested to point the project in the right direction, I replaced the "event" variables (from the data file format you prepared in #10) with the difference between the "month" variable and the "event" variables. By copying the structure but with two entities (id's), the data file now looks like
                            Code:
                            * Examples generated by -dataex-. To install: ssc install dataex
                            clear
                            input long id float(month ff event_1 event_2)
                            2708 657     -.005559 -1 -3
                            2708 658 -.0023612394 0 -2
                            2708 659  .0009129993 1 -1
                            2708 660 -.0016764477 2 0
                            2708 661 -.0025546604 3 1
                            2709 657     -.005559 -1 -3
                            2709 658 -.0023612394 0 -2
                            2709 659  .0009129993 1 -1
                            2709 660 -.0016764477 2 0
                            2709 661 -.0025546604 3 1
                            end
                            format %tmCCYY_Mon month
                            I am now trying to calculate an average across the "ff" observations when they have common "event" characteristics. Specifically, I am averaging “ff” where the variables of the “event”'s are -2, -1, 0, 1 or 2. The result should be a list of the average “ff” for each of these "event" variables - which I've renamed "diff" to stand for the difference between the month and an event ate. The result should look like
                            Code:
                            clear
                            input float (diff ave_ff)
                            -2 -0.0025
                            -1 -0.00025
                            0 -0.0035
                            1 0.0005
                            2 -0.0045
                            end
                            My initial plan of attack is to (1) sort by the event variables, (2) produce new observations from each of the event variables, (3) a second sort by the event variables, (4) remove observations with event values outside of the -2 to 2 range, and then (4) collapse to averages for each of the required event (renamed diff) observations.

                            I had a go at posting this question as a stand alone question at
                            https://www.statalist.org/forums/for...haracteristics
                            But I have failed to communicate the task clearly in that posting. Any observations would be greatly appreciated, thank you, Dan

                            Comment


                            • #15
                              This example applied to your data may point you in a useful direction.
                              Code:
                              . reshape long event_, i(id month) j(j)
                              (note: j = 1 2)
                              
                              Data                               wide   ->   long
                              -----------------------------------------------------------------------------
                              Number of obs.                       10   ->      20
                              Number of variables                   5   ->       5
                              j variable (2 values)                     ->   j
                              xij variables:
                                                      event_1 event_2   ->   event_
                              -----------------------------------------------------------------------------
                              
                              . rename event_ event
                              
                              . sort event
                              
                              . list, noobs sepby(event)
                              
                                +-----------------------------------------+
                                |   id      month   j          ff   event |
                                |-----------------------------------------|
                                | 2709   2014 Oct   2    -.005559      -3 |
                                | 2708   2014 Oct   2    -.005559      -3 |
                                |-----------------------------------------|
                                | 2708   2014 Nov   2   -.0023612      -2 |
                                | 2709   2014 Nov   2   -.0023612      -2 |
                                |-----------------------------------------|
                                | 2709   2014 Dec   2     .000913      -1 |
                                | 2709   2014 Oct   1    -.005559      -1 |
                                | 2708   2014 Oct   1    -.005559      -1 |
                                | 2708   2014 Dec   2     .000913      -1 |
                                |-----------------------------------------|
                                | 2708   2014 Nov   1   -.0023612       0 |
                                | 2709   2015 Jan   2   -.0016764       0 |
                                | 2708   2015 Jan   2   -.0016764       0 |
                                | 2709   2014 Nov   1   -.0023612       0 |
                                |-----------------------------------------|
                                | 2708   2014 Dec   1     .000913       1 |
                                | 2709   2014 Dec   1     .000913       1 |
                                | 2708   2015 Feb   2   -.0025547       1 |
                                | 2709   2015 Feb   2   -.0025547       1 |
                                |-----------------------------------------|
                                | 2709   2015 Jan   1   -.0016764       2 |
                                | 2708   2015 Jan   1   -.0016764       2 |
                                |-----------------------------------------|
                                | 2708   2015 Feb   1   -.0025547       3 |
                                | 2709   2015 Feb   1   -.0025547       3 |
                                +-----------------------------------------+
                              
                              . collapse (mean) ave_ff=ff if inlist(event,-2,-1,0,1,2), by(event)
                              
                              . list, noobs
                              
                                +-------------------+
                                | event      ave_ff |
                                |-------------------|
                                |    -2   -.0023612 |
                                |    -1    -.002323 |
                                |     0   -.0020188 |
                                |     1   -.0008208 |
                                |     2   -.0016764 |
                                +-------------------+

                              Comment

                              Working...
                              X