Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Binary variable

    Hi,
    my sample consists of M&A deals and peer companies which are not filtered yet. For each observation there is a unique identifier for the target company (Target6digitCUSIP) and the acquirer (a_cusip_six) of the deal as well as the year. The variable deal determines if the observation is a deal (1) or not (0). If it is not a deal there variable Target6digitCUSIP is empty.Since I want to proceed with Propensity Score Matching to create my final sample, I want to follow Beta and Li (2014) and exclude every company in year t which was part in a deal in the last 3 years, either as an acquirer or a target company.
    I tried many formulas but none of them led to the correct result for the variable participated which should be 1 if the company participated in a deal and 0 otherwise.

    Here is an excerpt of my data.

    Target6digitCUSIP deal Year a_cusip_six participated
    4C8250 1 2014 000307
    7C7919 1 2015 000307
    7C1485 1 2015 000307
    5C7040 1 2015 000307
    *empty* 0 2018 000307
    *empty* 0 2021 00032Q
    *empty* 0 2022 00032Q

    Excuse me if I provided something incorrectly, it's my first time using the forum. Many thanks for your support in advance.

    Best,
    Sebastian
    Last edited by Sebastian Grein; 18 Oct 2023, 08:02.

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str7 target6digitcusip byte deal int year str6 a_cusip_six
    "4C8250"  1 2014 "000307"
    "7C7919"  1 2015 "000307"
    "7C1485"  1 2015 "000307"
    "5C7040"  1 2015 "000307"
    "*empty*" 0 2018 "000307"
    "*empty*" 0 2021 "00032Q"
    "*empty*" 0 2022 "00032Q"
    end
    
    preserve
    rename target6digitcusip cusip1
    rename a_cusip_six cusip2
    gen `c(obs_t)' obs_no = _n
    reshape long cusip, i(obs_no)
    drop obs_no _j
    duplicates drop
    tempfile deals
    save `deals'
    
    restore
    gen low = year - 3
    gen high = year - 1
    clonevar cusip = target6digitcusip
    rangejoin year low high using `deals', by(cusip)
    drop if !missing(year_U)
    drop cusip *_U
    clonevar cusip = a_cusip_six
    rangejoin year low high using `deals', by(cusip)
    drop if !missing(year_U)
    drop cusip *_U
    -rangejoin- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.

    You do not define what "last three years" means: it could mean the current year and the two preceding, or the preceding year and the two years preceding that. I've assumed the latter. You will need to change the code if you meant otherwise.

    In the future, when showing example data, please use the -dataex- command, as I have done here. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Hi Clyde, many thanks for your fast response. You assumed the right regarding the 3 years, I meant the 3 years prior to the deal.

      Unfortunately, I get the error message 'no xij variables found' when trying to use the reshape command. I tried to understand by studying the command but I couldn't find an explanation.
      Last edited by Sebastian Grein; 19 Oct 2023, 02:33.

      Comment


      • #4
        I tested the code on your example data before posting it and it ran with no difficulty and produced a correct solution. So your example data is in some way not representative of your real data set. Please post a new data example that reproduces the problem you are having and I will troubleshoot.

        Comment


        • #5
          I want to provide a representative dataset but did not find a way to start the example data with a certain observation with the command dataex. Is there any way to start with a certain observation from which on you can specify the count() option? The first e.g. 100 observations are for example not representative and therefore I need to start from an observation which is representative.
          Last edited by Sebastian Grein; 19 Oct 2023, 15:37.

          Comment


          • #6
            Yes. Let's say you want to start from observation #249 and show 100 observations. You would issue the command:
            Code:
            dataex in 249/L, count(100)

            Comment


            • #7
              This is a representative example, especially due to company 531229 being an acquirer and a target.

              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input float Year str6(a_cusip_six Target6digitCUSIP) float deal
              1999 "530370" ""       0
              2000 "530370" "17881X" 1
              1999 "530436" ""       0
              2000 "530436" ""       0
              2016 "53046P" ""       0
              2017 "53046P" ""       0
              2018 "53046P" ""       0
              1999 "530582" ""       0
              2000 "530582" ""       0
              2001 "530582" ""       0
              2002 "530582" ""       0
              2003 "530582" ""       0
              2008 "53071M" "15100A" 1
              2018 "53115L" ""       0
              2019 "53115L" ""       0
              2020 "53115L" ""       0
              2021 "53115L" ""       0
              2022 "53115L" ""       0
              2013 "531229" ""       0
              2014 "531229" ""       0
              2015 "531229" "531229" 1
              2015 "531229" "531229" 1
              2015 "531229" "531229" 1
              2016 "531229" "24776M" 1
              2016 "531229" "24776M" 1
              2017 "531229" ""       0
              2017 "531229" ""       0
              2018 "531229" ""       0
              2018 "531229" ""       0
              2019 "531229" ""       0
              2019 "531229" ""       0
              2020 "531229" ""       0
              2020 "531229" ""       0
              2021 "531229" ""       0
              2021 "531229" ""       0
              2022 "531229" ""       0
              2022 "531229" ""       0
              2014 "531465" ""       0
              2015 "531465" ""       0
              2016 "531465" ""       0
              2017 "531465" ""       0
              2018 "531465" ""       0
              2019 "531465" ""       0
              2020 "531465" ""       0
              2021 "531465" ""       0
              2022 "531465" ""       0
              2021 "53190C" ""       0
              2022 "53190C" ""       0
              1999 "531914" ""       0
              2000 "531914" ""       0
              2001 "531914" ""       0
              2002 "531914" ""       0
              2003 "531914" ""       0
              2004 "531914" ""       0
              2005 "531914" ""       0
              2006 "531914" "42853H" 1
              2007 "531914" ""       0
              2008 "531914" ""       0
              2009 "531914" "35800E" 1
              2010 "531914" ""       0
              2011 "531914" ""       0
              2012 "531914" ""       0
              2013 "531914" ""       0
              2014 "531914" ""       0
              2015 "531914" ""       0
              2016 "531914" ""       0
              2017 "531914" ""       0
              2018 "531914" ""       0
              2019 "531914" ""       0
              2020 "531914" ""       0
              2021 "531914" ""       0
              2022 "531914" ""       0
              1999 "531925" ""       0
              2000 "531925" ""       0
              1999 "531927" ""       0
              2000 "531927" ""       0
              2001 "531927" ""       0
              2002 "531927" ""       0
              2003 "531927" ""       0
              2004 "531927" ""       0
              end

              Comment


              • #8
                The only substantial difference between this example and the previous one that I can see is that the Year and Target6digitCUSIP variables are no longer in lower case. When I adapt the code accordingly (Stata variable names are case sensitive) it runs properly:
                Code:
                . * Example generated by -dataex-. For more info, type help dataex
                . clear
                
                . input float Year str6(a_cusip_six Target6digitCUSIP) float deal
                
                          Year  a_cusip~x  Target6~P       deal
                  1. 1999 "530370" ""       0
                  2. 2000 "530370" "17881X" 1
                  3. 1999 "530436" ""       0
                  4. 2000 "530436" ""       0
                  5. 2016 "53046P" ""       0
                  6. 2017 "53046P" ""       0
                  7. 2018 "53046P" ""       0
                  8. 1999 "530582" ""       0
                  9. 2000 "530582" ""       0
                 10. 2001 "530582" ""       0
                 11. 2002 "530582" ""       0
                 12. 2003 "530582" ""       0
                 13. 2008 "53071M" "15100A" 1
                 14. 2018 "53115L" ""       0
                 15. 2019 "53115L" ""       0
                 16. 2020 "53115L" ""       0
                 17. 2021 "53115L" ""       0
                 18. 2022 "53115L" ""       0
                 19. 2013 "531229" ""       0
                 20. 2014 "531229" ""       0
                 21. 2015 "531229" "531229" 1
                 22. 2015 "531229" "531229" 1
                 23. 2015 "531229" "531229" 1
                 24. 2016 "531229" "24776M" 1
                 25. 2016 "531229" "24776M" 1
                 26. 2017 "531229" ""       0
                 27. 2017 "531229" ""       0
                 28. 2018 "531229" ""       0
                 29. 2018 "531229" ""       0
                 30. 2019 "531229" ""       0
                 31. 2019 "531229" ""       0
                 32. 2020 "531229" ""       0
                 33. 2020 "531229" ""       0
                 34. 2021 "531229" ""       0
                 35. 2021 "531229" ""       0
                 36. 2022 "531229" ""       0
                 37. 2022 "531229" ""       0
                 38. 2014 "531465" ""       0
                 39. 2015 "531465" ""       0
                 40. 2016 "531465" ""       0
                 41. 2017 "531465" ""       0
                 42. 2018 "531465" ""       0
                 43. 2019 "531465" ""       0
                 44. 2020 "531465" ""       0
                 45. 2021 "531465" ""       0
                 46. 2022 "531465" ""       0
                 47. 2021 "53190C" ""       0
                 48. 2022 "53190C" ""       0
                 49. 1999 "531914" ""       0
                 50. 2000 "531914" ""       0
                 51. 2001 "531914" ""       0
                 52. 2002 "531914" ""       0
                 53. 2003 "531914" ""       0
                 54. 2004 "531914" ""       0
                 55. 2005 "531914" ""       0
                 56. 2006 "531914" "42853H" 1
                 57. 2007 "531914" ""       0
                 58. 2008 "531914" ""       0
                 59. 2009 "531914" "35800E" 1
                 60. 2010 "531914" ""       0
                 61. 2011 "531914" ""       0
                 62. 2012 "531914" ""       0
                 63. 2013 "531914" ""       0
                 64. 2014 "531914" ""       0
                 65. 2015 "531914" ""       0
                 66. 2016 "531914" ""       0
                 67. 2017 "531914" ""       0
                 68. 2018 "531914" ""       0
                 69. 2019 "531914" ""       0
                 70. 2020 "531914" ""       0
                 71. 2021 "531914" ""       0
                 72. 2022 "531914" ""       0
                 73. 1999 "531925" ""       0
                 74. 2000 "531925" ""       0
                 75. 1999 "531927" ""       0
                 76. 2000 "531927" ""       0
                 77. 2001 "531927" ""       0
                 78. 2002 "531927" ""       0
                 79. 2003 "531927" ""       0
                 80. 2004 "531927" ""       0
                 81. end
                
                .
                . preserve
                
                . rename Target6digitCUSIP cusip1
                
                . rename a_cusip_six cusip2
                
                . gen `c(obs_t)' obs_no = _n
                
                . reshape long cusip, i(obs_no)
                (j = 1 2)
                
                Data                               Wide   ->   Long
                -----------------------------------------------------------------------------
                Number of observations               80   ->   160         
                Number of variables                   5   ->   5           
                j variable (2 values)                     ->   _j
                xij variables:
                                          cusip1 cusip2   ->   cusip
                -----------------------------------------------------------------------------
                
                . drop obs_no _j
                
                . duplicates drop
                
                Duplicates in terms of all variables
                
                (62 observations deleted)
                
                . tempfile deals
                
                . save `deals'
                file C:\Users\clyde\AppData\Local\Temp\ST_5594_000002.tmp saved as .dta format
                
                .
                . restore
                
                . gen low = Year - 3
                
                . gen high = Year - 1
                
                . clonevar cusip = Target6digitCUSIP
                (71 missing values generated)
                
                . rangejoin Year low high using `deals', by(cusip)
                  (using rangestat version 1.1.1)
                
                . drop if !missing(Year_U)
                (183 observations deleted)
                
                . drop cusip *_U
                
                . clonevar cusip = a_cusip_six
                
                . rangejoin Year low high using `deals', by(cusip)
                  (using rangestat version 1.1.1)
                
                . drop if !missing(Year_U)
                (13 observations deleted)
                
                . drop cusip *_U
                The adapted code is
                Code:
                preserve
                rename Target6digitCUSIP cusip1
                rename a_cusip_six cusip2
                gen `c(obs_t)' obs_no = _n
                reshape long cusip, i(obs_no)
                drop obs_no _j
                duplicates drop
                tempfile deals
                save `deals'
                
                restore
                gen low = Year - 3
                gen high = Year - 1
                clonevar cusip = Target6digitCUSIP
                rangejoin Year low high using `deals', by(cusip)
                drop if !missing(Year_U)
                drop cusip *_U
                clonevar cusip = a_cusip_six
                rangejoin Year low high using `deals', by(cusip)
                drop if !missing(Year_U)
                drop cusip *_U
                though, again, the only change from before is to conform the case of the variable names to that of the data set.

                Comment


                • #9
                  Unfortunately, I still get the same error message 'no xij variables found' when trying to execute the reshape command. Could that be bcs my dataset contains more variables than only the one's listed above?
                  There are many more but I only listed these because these are the important variables to be considered for the output of dropping the certain observations.
                  Last edited by Sebastian Grein; 21 Oct 2023, 03:02.

                  Comment


                  • #10
                    I still get the same error message 'no xij variables found' when trying to execute the reshape command. Could that be bcs my dataset contains more variables than only the one's listed above?
                    No. Stata is not complaining about excess variables that it can't handle, it's complaining about a lack of variables that match the pattern spelled out in the -reshape-. The xij variables that Stata is looking for in this instance are cusip1 and cusip2 (well, it doesn't know at that point that it's only 1 and 2, it's looking for variables with names of the form cusip# where # is any number). In that code cusip1 and cusip2 must exist because they were brought into being by the -rename- commands a couple of lines earlier in the code.

                    I suggest that you run the code again, and then post back copy/pasting everything that shows in the Results screen (echoed commands, results, and messages) for everything from the first line through the -reshape- command into your post so we can get a closer look at what's going wrong.

                    Comment


                    • #11
                      The reshape commands works now. Unfortunately, my Stata always collapses when I am executing the rangejoin command. Is there any way way to address this issue or is it (probably) just too much data for my RAM (I already closed everything else than Stata)?

                      Comment


                      • #12
                        What do you mean when you say Stata "collapses?" It is possible that the demands on RAM exceed what is possible in your setup and Stata is truly hung. But it is also true that in similar situations, -rangejoin- can take a very long time to put together two very large data sets, and perhaps you just need to be patient. Have you opened Task Manager to see what Stata is really doing? (I'm assuming you are on a Windows machine. I imagine Mac has something analogous.)

                        Assuming that you really are exceeding the available memory, try breaking up your data set into chunks, where each chunk contains all the observations involving some subset of the cusips, the chunks do not overlap each other, and all the chunks combined cover everything. You can run this code on each chunk separately, save the results, and then -append- them all together at the end.

                        Comment


                        • #13
                          By "collapses" I mean my Mac tells me to close Stata because the app Stata is not reacting anymore. Unfortunately, I can not break up my data by acquirer cusips because it could be that an acquirer cusip from another chunk has a target with a cusip which matches to an acquirer cusip which wouldn't be considered then.

                          Comment


                          • #14
                            I tried something else now:
                            a_participated checks if the respective firm (a_cusip_six) participated in a deal as an acquirer (a_cusip_six) in the last 3 years. The variable deal is 1 if the observation is an M&A deal and 0 otherwise. I used 1/99 since I checked before that there's a maximum of 99 observations per company.
                            t_participated checks if the respective firm (a_cusip_six) participated in a deal as a target (Target6digitCUSIP) is the last 3 years. After creating the 3 lists, the command "foreach cusip in `t_list1'...." does not lead to replacements. Can you support on why this might be the case and generally advise on my procedure? Many thanks!

                            Code:
                            gen a_participated=0
                            
                            forval i=1/99 {
                                bysort a_cusip_six (Year):replace a_participated = a_participated + deal[_n-`i'] if Year==Year[_n-`i']+1
                            }
                            forval i=1/99 {
                                bysort a_cusip_six (Year):replace a_participated = a_participated + deal[_n-`i'] if Year==Year[_n-`i']+2
                            }
                            forval i=1/99 {
                                bysort a_cusip_six (Year):replace a_participated = a_participated + deal[_n-`i'] if Year==Year[_n-`i']+3
                            }
                            
                            
                            // Do the same for targets.
                            tostring Year, gen (Year_s)
                            order Year_s
                            gen  a_combined = Year_s + a_cusip_six
                            order a_combined deal a_cusip_six Target6digitCUSIP
                            
                            
                            gen Year1 = Year +1
                            gen Year2 = Year +2
                            gen Year3 = Year +3
                            order Year1 Year2 Year3
                            
                            tostring Year1, replace
                            tostring Year2, replace
                            tostring Year3, replace
                            
                            gen  t_combined1 = Year1 + Target6digitCUSIP
                            replace t_combined1 = "0" if deal == 0
                            
                            gen  t_combined2 = Year2 + Target6digitCUSIP
                            replace t_combined2 = "0" if deal == 0
                            
                            gen  t_combined3 = Year3 + Target6digitCUSIP
                            replace t_combined3 = "0" if deal == 0
                            
                            drop Year1 Year2 Year3
                            
                            order a_combined t_combined1 t_combined2 t_combined3 deal Target6digitCUSIP
                            
                            levelsof t_combined1, local(t_list1)
                            levelsof t_combined2, local(t_list2)
                            levelsof t_combined3, local(t_list3)
                            
                            gen t_participated = 0
                            foreach cusip in `t_list1' {
                                replace t_participated = 1 if a_combined = `cusip'
                            }
                            
                            foreach cusip in `t_list2' {
                                replace t_participated = 1 if a_combined = `cusip'
                            }
                            
                            foreach cusip in `t_list3' {
                                replace t_participated = 1 if a_combined = `cusip'
                            }
                            that's the outcome for the last command:
                            r(110);

                            . do "/var/folders/n1/h6bdd40s7yd0qnd7g38vl3tr0000gn/T//SD01012.000000"

                            . foreach cusip in `t_list1' {
                            2. replace t_participated = 1 if a_combined = `cusip'
                            3. }

                            .
                            . foreach cusip in `t_list2' {
                            2. replace t_participated = 1 if a_combined = `cusip'
                            3. }

                            .
                            . foreach cusip in `t_list3' {
                            2. replace t_participated = 1 if a_combined = `cusip'
                            3. }

                            .
                            end of do-file
                            Last edited by Sebastian Grein; 28 Oct 2023, 07:17.

                            Comment


                            • #15
                              Try this. It is simpler than the approach I previously suggested, and it uses less memory (though whether sufficiently less that it will work with your massive data set, I can't be sure.)
                              Code:
                              rename a_cusip_six cusipa
                              rename Target6digitCUSIP cusipt
                              gen `c(obs_t)' obs_no = _n
                              reshape long cusip, i(obs_no) j(role) string
                              
                              rangestat (sum) deal, by(cusip) interval(Year -3 -1)
                              
                              by obs_no (deal_sum), sort: drop if inrange(deal_sum[_N], 1, .)
                              
                              drop deal_sum
                              reshape wide
                              rename cusipa a_cusip_six
                              rename cusipt Target6digitCUSIP
                              Having already tried using -rangejoin-, you already have -rangestat- installed.

                              It may be that you will encounter the same memory limitations with -rangestat- that you encountered with -rangejoin-, but I don't know.

                              The -reshape- and -rangestat- commands are going to take a long time. Be patient. That said, if you are running a fully updated version 18, you can use -reshape-'s -favor(speed)- option to make it faster. However, the price of -favor(speed)- is that more memory is used, so this might not be viable in your situation.

                              If this approach also exceeds memory limitations, then we will probably have to resort to some complicated code with explicit iteration over observations. It will be difficult to write and debug, and ugly to read. And it will be even slower than anything we have tried so far. But if we cannot avoid it, such an approach is possible as a last resort.

                              Comment

                              Working...
                              X