
  • How to remove all data for a firm if it starts at a certain date

    Hello!

    I have a small question: I want to remove all firms that have an observation for the year 1962. I want to do this as a robustness test: if I remove a subset of my sample, I should get the same results.

    A small sample of my data follows:

    datayearfiscal firm
    1970 1000
    1971 1000
    1972 1000
    1973 1000
    1974 1000
    1975 1000
    1976 1000
    1977 1000
    1962 1001
    1963 1001
    1964 1001
    1965 1003
    1967 1004
    1968 1004
    1969 1004
    1970 1004

    What it should end up looking like:

    datayearfiscal firm
    1970 1000
    1971 1000
    1972 1000
    1973 1000
    1974 1000
    1975 1000
    1976 1000
    1977 1000
    1967 1004
    1968 1004
    1969 1004
    1970 1004

    As you can see, the entire firm was deleted because it had the year 1962. This is just an example, not the actual data collected!

    Regards, and thank you!

  • #2
    Hello,

    I figured it out!

    First I ran the following code:
    Code:
     replace datayearfiscal = . if datayearfiscal < 1963
    This replaces every observation before 1963 (here, the 1962 points) with missing (".").

    Afterwards I ran the following:
    Code:
     bysort firm (datayearfiscal) : drop if missing(datayearfiscal[_N])
    This deletes the entire panel for any firm that has a missing year, because missing values sort last within each firm.

    Now this won't work every time, because you might have a lot of missing data and want to delete the panel only if the first point is missing. So I thought of running this instead:
    Code:
     bysort firm (datayearfiscal) : drop if missing(datayearfiscal[1])
    Feel free to correct me, but this was my conclusion, and option 1 worked for me since I had no gaps ^_^

    Regards,
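
One thing worth checking before relying on the `[1]` variant: Stata treats missing values as larger than any number, so after sorting they land at the end of each firm's block, not at the start. A minimal sketch on a subset of the sample data from the first post shows what each condition would actually pick up. The variables `any_missing` and `all_missing` are introduced here purely for illustration.

```stata
clear
input datayearfiscal firm
1970 1000
1971 1000
1962 1001
1963 1001
1964 1001
1965 1003
1967 1004
1968 1004
end

* Step 1 from the post above: blank out the observations before 1963
replace datayearfiscal = . if datayearfiscal < 1963

* Missing sorts last, so within each firm:
*   [_N] is missing if ANY year in the firm is missing
*   [1]  is missing only if EVERY year in the firm is missing
bysort firm (datayearfiscal): gen byte any_missing = missing(datayearfiscal[_N])
bysort firm (datayearfiscal): gen byte all_missing = missing(datayearfiscal[1])

list, sepby(firm)
```

In this sketch `any_missing` is 1 only for firm 1001, while `all_missing` is 0 for every firm, so `drop if missing(datayearfiscal[1])` would remove nothing here.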



    • #3
      If you want to remove all firms having any record for 1962 (not necessarily the first), you can do this:

      Code:
      gen u=datayearfiscal==1962
      bysort firm: egen v=max(u)
      drop if v
      drop u v
      Or alternatively:

      Code:
      gen u=datayearfiscal==1962
      bysort firm (u): drop if u[_N]
      drop u
      If, however, you want to remove only firms for which the first record (in date order) is for 1962, then one line is enough:

      Code:
      bysort firm (datayearfiscal): drop if datayearfiscal[1]==1962
      Replacing datayearfiscal with missing is a possibility, but it will fail if the variable already contains missing values for firms you don't want to remove. You could instead replace it with a small value (say 0, which is neither a valid year nor a missing value). However, using an additional indicator variable looks cleaner to me.
      Last edited by Jean-Claude Arbaut; 27 Apr 2018, 16:03.
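
The idea of replacing with a small value instead of missing can be sketched as follows on the sample data from the first post. This is only an illustration, and it assumes 0 never occurs as a genuine value of datayearfiscal, which the first `assert` checks before anything is changed.

```stata
clear
input datayearfiscal firm
1970 1000
1971 1000
1972 1000
1973 1000
1974 1000
1975 1000
1976 1000
1977 1000
1962 1001
1963 1001
1964 1001
1965 1003
1967 1004
1968 1004
1969 1004
1970 1004
end

* Assumption: 0 is not a real value of datayearfiscal (stops here if it is)
assert datayearfiscal != 0

* Mark the 1962 records with the sentinel; 0 sorts before every real year
replace datayearfiscal = 0 if datayearfiscal == 1962

* A firm whose smallest value is now 0 had a 1962 record, so drop all of it
bysort firm (datayearfiscal): drop if datayearfiscal[1] == 0

list, sepby(firm)
```

Unlike the missing-value version, existing gaps in datayearfiscal do not trigger the drop, because only the sentinel 0 is tested. It still overwrites the year variable, though, which is one reason a separate indicator variable may be the safer choice.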



      • #4
        What Jean said works too ^_^

        Thanks!



        • #5
          Yes Jean, your way actually works in more cases; mine was just a special case, I guess. I did benefit from your code because I had to do this for variables that had more than one missing value as well. Much appreciated.
