
  • Keep if/or across many disparate observations

    Hi statalists,

    I'm cleaning data for a cumulative abnormal returns M&A event study. In my stock market data I want to retain only those companies (identified by their co_code) that have associated M&A events. I have written code that keeps the variables I need and then, I thought, keeps only the observations whose co_code appears in my cleaned M&A data, as follows:

    Code:
    keep co_code company_name co_stkdate bse_closing_price bse_returns nse_closing_price nse_returns tyear tmonth fisc_year 
    keep if co_code=="365" | "3990" | "6484" | "7068" | "11063" | "18396" | "19408" | "21042" | "21420" | "23354" | "23668" | "24685" | "28469" | "28932" | "29678" | "33769" | "35548" | "37451" | "41557" | "42443" | "46286" | "47542" | "47757" | "49262" | "50505" | "51238" | "54885" | "62425" | "66192" | "68925" | "69565" | "70319" | "73325" | "73572" | "74600" | "74601" | "76120" | "76120" | "76722" | "81023" | "81038" | "82000" | "830108" | "3015" | "83025" | "85447" | "85649" | "86809" | "92894" | "94136" | "96379" | "96387" | "97115" | "97156" | "98445" | "98457" | "98659" | "98918" | "98964" | "99558" | "99667" | "100644" | "101070" | "102509" | "105677" | "108074" | "108841" | "109300" | "109809" | "112333" | "114749" | "119790" | "121140" | "122573" | "122902" | "122948" | "127106" | "136444" | "136444" | "141598" | "144626" | "144626" | "145425" | "146460" | "149369" | "150573" | "159120" | "159120" | "163705" | "165563" | "165693" | "170018" | "171151" | "172333" | "173299" | "174488" | "175632" | "175679" | "180201" | "182141" | "183791" | "190511" | "191865" | "193553" | "194815" | "196588" | "200352" | "201554" | "204843" | "205370" | "208602" | "212914" | "214742" | "215829" | "223422" | "224351" | "226821" | "228346" | "232916" | "236460" | "236667" | "237266" | "242059" | "243455" | "244856" | "248129" | "248136" | "249332" | "249469" | "249597" | "251109" | "253375" | "256146" | "257613" | "265398" | "265414" | "268177" | "269186" | "269225" | "272854" | "275062" | "317632" | "320945" | "337911" | "338882" | "356581" | "369199" | "369273" | "369866" | "369944"
    Running this, I receive a type mismatch error. I'm assuming this may be down to attempting to retain too many co_codes at once? Is there a method for implementing a bulk keep if/or process like the one I need? Any help greatly appreciated, thanks in advance!

  • #2
    You may wish to read this FAQ.

    In short, if you wish to keep the code above, you must repeat co_code == before each value.
    Last edited by Marcos Almeida; 18 Mar 2019, 08:51.
    Best regards,

    Marcos



    • #3
      Thanks Marcos, that was helpful, though some of the content is difficult for a total novice with Stata to interpret. Approach 4 seems the best one for me to take:

      Code:
      clear
      numlist "1/2 34/56 678/901"
      tokenize `r(numlist)'
      local N : word count `r(numlist)'
      set obs `N'
      gen id = .
      forval i = 1 / `N' {
          qui replace id = ``i'' in `i'
      }
      After completing this process, how do I perform the keep command to retain the selected companies?



      • #4
        A toy example using what Nick, in the FAQ shared in #2, calls the "obvious but tedious way":

        Code:
         keep co_code company_name if co_code=="365" | co_code== "3990" | co_code== "6484" | co_code == "7068"
        The alternative under - egen - seemed to me quite neat.
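        A minimal sketch of the -egen- route, assuming co_code contains only digits. As far as I know, egen's anymatch() works on numeric variables with an integer numlist in values(), so a numeric copy is made first with -destring-. The names co_num and tokeep are illustrative, and the toy data stand in for the real file; only a few codes from #1 are used.

```stata
* Toy data standing in for the stock market file
clear
input str6 co_code
"365"
"3990"
"1234"
"6484"
"99999"
end

* anymatch() needs a numeric variable, so make a numeric copy of the string code
destring co_code, generate(co_num)

* tokeep is 1 if co_num equals any value listed in values(), 0 otherwise
egen byte tokeep = anymatch(co_num), values(365 3990 6484 7068 11063)
keep if tokeep
drop co_num tokeep
```

With the full list from #1 the values() option simply gets longer; the codes are typed once, without repeating co_code ==.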
        Last edited by Marcos Almeida; 18 Mar 2019, 09:29.
        Best regards,

        Marcos



        • #5
          Re #3: it's one of "my" FAQs, and it is proving over-condensed here.

          Note the title of the last section in the FAQ: "4 A shortcut with the above when identifiers are numeric".

          Here "the above" means the previous section of the FAQ. In other words, you create a dataset of identifiers, then merge it with your original dataset; the intersection gives you what you want to keep. See also the other FAQ cited within that FAQ.
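          A minimal sketch of that merge approach with a string co_code, as in #1. The toy datasets and the tempfile name codes_to_keep are assumptions for illustration; with real data the first dataset would hold the codes from the cleaned M&A file.

```stata
* Dataset of wanted identifiers (in practice, the codes from the M&A data)
clear
input str6 co_code
"365"
"3990"
"6484"
end
* The using file of an m:1 merge must have unique keys; the list in #1 repeats some codes
duplicates drop co_code, force
tempfile codes_to_keep
save `codes_to_keep'

* Toy stock market data: several daily observations per company
clear
input str6 co_code double bse_returns
"365"   0.01
"365"  -0.02
"1234"  0.03
"3990"  0.00
"5555"  0.04
end

* Keep only observations whose co_code is also in the list (the intersection)
merge m:1 co_code using `codes_to_keep', keep(match) nogenerate
```

Because the identifiers live in a dataset, the list can be as long as needed and is never typed into an if condition.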

          It's vital to understand why something like

          Code:
           keep if co_code=="365" | "3990"
          fails completely as illegal. You want Stata to read that as
          Code:
           keep if co_code=="365" | co_code == "3990"
          but Stata doesn't work that way. Stata reads your code as if you had written
          Code:
           keep if (co_code=="365") | ("3990")
          See where the parentheses fall. The first parenthesised expression is evaluated as true or false, which means a numeric result of 0 or 1; but the second parenthesised expression is just a string, hence the error message of type mismatch. So, you might ask, why doesn't Stata support a syntax like
          Code:
           keep if co_code== ("365" | "3990" )
          ? And it's more or less the same answer. Logical operators expect numeric operands. But Stata does support inlist(), as already mentioned. That's not much use with a long list of possibilities, which is why other approaches need to be considered.
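          A short sketch contrasting the legal long-hand condition with inlist(), on toy data. As I recall the documented limit, inlist() accepts at most 10 arguments in total when they are strings (check -help inlist()- for your version), so a long list has to be split across several calls joined by |. The variable names tedious and viainlist are illustrative.

```stata
clear
input str6 co_code
"365"
"3990"
"1234"
"6484"
end

* Long-hand: the variable name is repeated in every comparison
gen byte tedious = co_code == "365" | co_code == "3990" | co_code == "6484"

* inlist(): same condition, split into chunks because of the string-argument limit
gen byte viainlist = inlist(co_code, "365", "3990") | inlist(co_code, "6484")

* Both forms flag the same observations
assert tedious == viainlist
keep if viainlist
```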




          • #6
            Sorry for the wrong advice in #4. I should have written -keep if-, that is, without the variables between "keep" and "if". Anyway, that wouldn't provide the complete solution, as Nick pointed out in #5.

            That said, I gather (hopefully correctly) that you can first -keep- observations using an "if" clause on specific values of the variable(s), then -keep- again, this time listing the variables you want, as in this toy example:

            Code:
            sysuse auto, clear
            * this is not correct: -keep- cannot take a varlist and an if qualifier together
            * keep price mpg rep78 foreign if rep78 == 2 | rep78 == 3
            * but this is correct
            keep if rep78 == 2 | rep78 == 3
            * then, as a second step, we may keep the variables we wish
            keep price mpg rep78 foreign
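            The same two-step order applied to string codes like those in #1 might look like this sketch. It swaps in a loop over a local macro to build a 0/1 flag, which is one alternative to the merge described in #5; the names wanted and tokeep and the toy data (including the extra variable junk) are hypothetical.

```stata
* Toy data standing in for the stock market file
clear
input str6 co_code double bse_returns str10 junk
"365"   0.01 "a"
"1234"  0.03 "b"
"3990"  0.00 "c"
end

* Step 1: flag observations whose co_code is in the list, then keep them
local wanted 365 3990 6484
gen byte tokeep = 0
foreach c of local wanted {
    replace tokeep = 1 if co_code == "`c'"
}
keep if tokeep

* Step 2: keep only the variables of interest (tokeep and junk are dropped)
keep co_code bse_returns
```

Each pass of the loop makes one comparison, so the codes are typed only once in the local macro.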
            Best regards,

            Marcos

