  • Drop observations after checking if match between var1 and part of var2 (numeric values)

    Hi Statalisters,

    I have data that looks like this:
    year  becameceo  ceoann  ..  ..
    2006  20060330   CEO
    2003  20100615   CEO
    2004  20040718   -
    2009  20170405   -
    I want to drop observations that have a missing value for ceoann AND where year and the first four characters of becameceo (which also indicates the year) do not match.
    I am still unsure how to do this in Stata.

    I'm assuming the code would be something like: drop if ceoann=="" &
    but I have no idea what the rest should look like.

    I came across string functions like strpos, but I guess those only work on string variables? In my case year is an integer and becameceo is a long.

    I would appreciate any help with this matter!

    KR,

    Shaquille Wijngaarde

  • #2
    Hi Shaquille,

    I'm assuming the 'becameceo' variable is just YMD notation, e.g. 30 March 2006 in the first row? If so, you can convert that numeric variable to a string, then to a date, and use Stata's datetime functions. This will make things easier for you.

    You can do it like this:
    Code:
    tostring becameceo, replace format(%20.0f)
    gen date_becomeceo = date(becameceo,"YMD")
    format date_becomeceo %td
    Now you can use the datetime functions to write a relatively simple if condition like this:

    Code:
    drop if ceoann=="" & year != year(date_becomeceo)
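A minimal, self-contained sketch of the same date-based approach, using the four example rows from the question and assuming that "-" in the listing stands for an empty string. It keeps the original numeric becameceo intact by using generate() instead of replace; the names becameceo_str and date_becameceo are illustrative, not from the original posts.

```stata
* Example data from the question ("-" assumed to mean an empty string)
clear
input int year long becameceo str3 ceoann
2006 20060330 "CEO"
2003 20100615 "CEO"
2004 20040718 ""
2009 20170405 ""
end

* String copy of the numeric date; becameceo itself is left untouched
tostring becameceo, generate(becameceo_str) format(%8.0f)

* Convert the YYYYMMDD string to a Stata daily date
generate date_becameceo = date(becameceo_str, "YMD")
format date_becameceo %td

* Stop if any value failed to parse as a date
assert !missing(date_becameceo)

* Drop rows with no announcement whose year differs from the CEO start year
drop if ceoann == "" & year != year(date_becameceo)
list
```

With these four rows, only the 2009 observation meets both conditions, so three observations should remain.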
    Last edited by Jesse Tielens; 26 Jul 2018, 14:43.



    • #3
      this would be easier if you had used -dataex-; please see the FAQ; here I assume that becameceo is a string variable:
      Code:
      drop if ceoann=="" & year!=real(substr(becameceo,1,4))
      if becameceo is actually numeric, the code is a little more complicated:
      Code:
      drop if ceoann=="" & year!=real(substr(string(becameceo,"%8.0f"),1,4))
      not tested because you did not supply the data in an easy-to-use form; again, please read the FAQ
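A sketch of how the substring approach could be previewed before anything is dropped, again using the example rows from the question with "-" assumed to mean an empty string. The helper variables ceoyear and todrop are hypothetical names introduced here. An explicit format is passed to string() as a precaution, since without one an eight-digit number may be rendered in scientific notation, in which case the first four characters would not be the year.

```stata
* Example data from the question ("-" assumed to mean an empty string)
clear
input int year long becameceo str3 ceoann
2006 20060330 "CEO"
2003 20100615 "CEO"
2004 20040718 ""
2009 20170405 ""
end

* Year taken from the first four digits of the numeric YYYYMMDD value
generate int ceoyear = real(substr(string(becameceo, "%8.0f"), 1, 4))

* Flag the rows that would be dropped, and inspect them first
generate byte todrop = ceoann == "" & year != ceoyear
list year becameceo ceoann ceoyear todrop

* Drop only after checking that the flagged rows are the intended ones
drop if todrop
```

Flagging first costs one extra variable but makes it easy to check the condition against the data before observations are lost.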



      • #4
        Here's another variation on the themes above. It assumes becameceo is a numeric variable.

        Code:
        clear *
        input long(year becameceo) str3 ceoann
        2006     20060330     "CEO"
        2003     20100615     "CEO"
        2004     20040718     ""         
        2009     20170405     ""
        end
        
        list
        drop if ceoann == "" & year != int(becameceo/10000)
        list
        --
        Bruce Weaver
        Version: Stata/MP 18.5 (Windows)



        • #5
          Originally posted by Rich Goldstein View Post
          this would be easier if you had used -dataex-; please see the FAQ; here I assume that becameceo is a string variable:
          Code:
          drop if ceoann=="" & year!=real(substr(becameceo,1,4))
          if becameceo is actually numeric, the code is a little more complicated:
          Code:
          drop if ceoann=="" & year!=real(substr(string(becameceo,"%8.0f"),1,4))
          not tested because you did not supply the data in an easy-to-use form; again, please read the FAQ
          Hi Rich,

          Thank you for your help, this works!
          Apologies for not using -dataex-, I will make sure to use it in the future.
