  • Dropping firms without a counterpart in the other group in each year, industry, and decile

    Dear Statalist users,

    I have tried several methods to do the following:

    - I have a dataset in which I need to check whether each firm has at least one counterpart from the other group (a 0 for a 1, and vice versa) in the same year, industry, and decile. Firms without such a counterpart have to be dropped.
    - In my dataset, I have already created a dummy variable called group, which is 0 for one group and 1 for the other.
    - I then computed the minimum and maximum of the group dummy within each year, industry, and decile with the following code:
    bysort year industry decile: egen max = max(group)
    bysort year industry decile: egen min = min(group)

    - I then generated a dummy that equals 1 if min and max are equal. If they are equal, the cell contains only one group, so the firm-year has no counterpart and should be dropped.
    generate check = (max==min)

    - As this would only drop firm-year observations and not the whole firm, I created a second dummy that should equal 1 for all of a firm's observations if any of its firm-year observations has check equal to 1.
    bysort firm (year): gen secondcheck = check[1] != check[_N]
    drop if secondcheck==1
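A minimal sketch of the firm-level step on hypothetical toy data (firms A, B and C are invented for illustration; grpmax, grpmin and anycheck are my own variable names). Note that `check[1] != check[_N]` is 1 only when a firm's check values are mixed, so a firm whose every firm-year has check == 1 gets 0 and is kept. Taking the maximum of check within each firm instead flags a firm if any of its firm-years lacks a counterpart, which is what the description above asks for.

```stata
* Hypothetical toy data (not from the post):
* A has a counterpart (B) in 2000 but not in 2001; C never has one
clear
input str1 firm int year byte(industry decile group)
"A" 2000 1 1 0
"B" 2000 1 1 1
"A" 2001 1 2 0
"C" 2000 1 4 1
"C" 2001 1 3 1
end

* Cell-level check: 1 if the year-industry-decile cell holds only one group
bysort year industry decile: egen grpmax = max(group)
bysort year industry decile: egen grpmin = min(group)
generate byte check = (grpmax == grpmin)

* Flag as written above: 1 only if the firm's check values are mixed
bysort firm (year): generate byte secondcheck = check[1] != check[_N]

* Alternative flag: 1 if ANY of the firm's firm-years has check == 1
bysort firm: egen byte anycheck = max(check)

list firm year check secondcheck anycheck, sepby(firm)

* secondcheck keeps C (check is 1 in every year); anycheck drops it
drop if anycheck == 1
```

Only firm B survives the last line. For reference, in the example data posted in #2 only one cell (1990, industry 50, decile 7) contains both groups, pairing 001004 with 001025, so every firm in that extract has at least one firm-year with check == 1 and the any-year rule would drop them all.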


    Another method that I tried is the following:

    egen test=group(year industry decile), label
    sort test
    by test: egen max=max(group)
    by test: egen min=min(group)
    by test: generate check = (max==min)

    Then, as above, I drop the whole firm instead of just the firm-year observation:
    bysort firm (year): gen secondcheck = check[1] != check[_N]
    drop if secondcheck==1
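One possible reason for slightly different results (an assumption, since the full dataset is not shown): by default, `egen group()` assigns a missing value to any observation with a missing year, industry, or decile, so `by test:` pools all such observations into a single cell. `bysort year industry decile` keeps them in separate cells instead, and the `missing` option of `egen group()` does the same. A minimal sketch on hypothetical two-firm data with a missing decile (all variable names are illustrative):

```stata
* Hypothetical toy data: decile is missing for both firms
clear
input str1 firm int year byte(industry decile group)
"A" 2000 1 . 0
"B" 2001 2 . 1
end

* Default group(): a missing decile gives test_default = .
egen test_default = group(year industry decile)
* With the missing option, missing values form their own cells
egen test_missing = group(year industry decile), missing

* Both firms share test_default == ., so they land in one cell
bysort test_default: egen max_d = max(group)
bysort test_default: egen min_d = min(group)
generate byte check_d = (max_d == min_d)

* Here each firm is alone in its own cell
bysort test_missing: egen max_m = max(group)
bysort test_missing: egen min_m = min(group)
generate byte check_m = (max_m == min_m)

list firm year test_default check_d test_missing check_m
```

With the default, the two firms appear to be each other's counterpart (check_d is 0) even though they differ in year and industry; with `missing`, both get check_m equal to 1, matching the `bysort year industry decile` version. If decile, industry, and year are never missing in the real data, this cannot be the explanation.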

    The two methods give slightly different results, and I am not sure how to check which one is correct.
    Can any of you experts give your opinion on this?
    Thank you so much in advance!
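One way to check the outcome of either method is to recompute the cell-level test on the data that remain after the drop and count the firm-years that still lack a counterpart. Dropping whole firms can remove the only counterpart of another firm's observation, so this count need not be zero after a single pass; whether the drop should then be repeated depends on the intended design, which the post does not specify. A sketch on hypothetical toy data (grpmax, grpmin and anycheck are illustrative names):

```stata
* Hypothetical toy data (not from the post): B's only counterpart is A
clear
input str1 firm int year byte(industry decile group)
"A" 2000 1 1 0
"B" 2000 1 1 1
"A" 2001 1 2 0
end

* One pass of the firm-level rule
bysort year industry decile: egen grpmax = max(group)
bysort year industry decile: egen grpmin = min(group)
generate byte check = (grpmax == grpmin)
bysort firm: egen byte anycheck = max(check)
drop if anycheck == 1

* Verification: recompute the cell test on the remaining data
drop grpmax grpmin check
bysort year industry decile: egen grpmax = max(group)
bysort year industry decile: egen grpmin = min(group)
count if grpmax == grpmin
display "firm-years still without a counterpart: " r(N)
```

Here firm A is dropped because it has no counterpart in 2001, which leaves B alone in its 2000 cell, so the verification count is 1.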

  • #2
    Here is an example of my dataset:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str6 firm double year byte(industry decile) float group
    "001003" 1989 57  3 0
    "001004" 1989 50  5 0
    "001004" 1990 50  7 0
    "001004" 1991 50  4 0
    "001004" 1992 50  5 0
    "001009" 1989 34  7 0
    "001009" 1990 34  6 0
    "001009" 1991 34  7 0
    "001009" 1992 34  9 0
    "001012" 1989 33  8 0
    "001013" 1989 36  8 0
    "001013" 1990 36 10 0
    "001013" 1991 36  9 0
    "001013" 1992 36  9 0
    "001017" 1989 38  7 0
    "001017" 1990 38  7 0
    "001017" 1991 38  8 0
    "001017" 1992 38  4 0
    "001019" 1989 73  9 1
    "001019" 1990 73 10 1
    "001019" 1991 73  9 1
    "001019" 1992 73  9 1
    "001020" 1989 32  5 0
    "001021" 1989 38  5 0
    "001021" 1990 38  4 0
    "001021" 1991 38  5 0
    "001021" 1992 38  9 0
    "001025" 1989 50  2 1
    "001025" 1990 50  7 1
    "001025" 1991 50  1 1
    "001025" 1992 50  7 1
    "001028" 1989 73  2 1
    "001028" 1990 73  5 1
    "001033" 1989 35  4 1
    "001033" 1990 35  5 1
    "001033" 1991 35 10 1
    "001033" 1992 35  1 1
    "001034" 1989 28  4 0
    "001034" 1990 28  5 0
    "001034" 1991 28  5 0
    "001034" 1992 28  6 0
    "001036" 1989 34  6 0
    "001036" 1990 34  5 0
    "001036" 1991 34  4 0
    "001036" 1992 34  7 0
    "001037" 1989 36  1 0
    "001037" 1990 36  1 0
    "001037" 1991 36  1 0
    "001037" 1992 36  2 0
    "001043" 1989 50  5 0
    "001043" 1990 50  3 0
    "001043" 1991 50  5 0
    "001043" 1992 50  2 0
    "001050" 1989 35  3 1
    "001050" 1990 35  2 1
    "001050" 1991 35  3 1
    "001055" 1989 35  6 0
    "001055" 1990 35 10 0
    "001055" 1991 35  7 0
    "001055" 1992 35  5 0
    "001056" 1989 38  4 0
    "001056" 1990 38  4 0
    "001056" 1991 38  6 0
    "001056" 1992 38  7 0
    "001065" 1989 36  1 0
    "001065" 1990 36  1 0
    end



    • #3
      Edit:
      I found a small mistake in the code in my first post. Here is the corrected code for the two methods I used:

      bysort year industry decile: egen max = max(group)
      bysort year industry decile: egen min = min(group)
      generate check = (max==min)
      bysort firm (check): gen secondcheck = check[1] != check[_N]
      drop if secondcheck==1

      And for the second method:

      egen test=group(year industry decile), label
      sort test
      by test: egen max=max(group)
      by test: egen min=min(group)
      by test: generate check = (max==min)
      bysort firm (check): gen secondcheck = check[1] != check[_N]
      drop if secondcheck==1
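In the corrected code, sorting by (check) within firm puts the largest value of check last, so check[_N] on its own already equals 1 whenever any firm-year has check == 1, while check[1] != check[_N] is 1 only when the values are mixed. A compact sketch of both steps using sorted subscripts instead of egen, assuming group is never missing (a missing group sorts last and would break the first comparison); the toy data and the name dropfirm are hypothetical:

```stata
* Hypothetical toy data (not from the post)
clear
input str1 firm int year byte(industry decile group)
"A" 2000 1 1 0
"B" 2000 1 1 1
"A" 2001 1 2 0
"C" 2000 1 4 1
"C" 2001 1 3 1
end

* Within each cell, sorted by group: only one group if first == last
bysort year industry decile (group): generate byte check = group[1] == group[_N]

* Within each firm, sorted by check: the last value is the firm's maximum
bysort firm (check): generate byte dropfirm = check[_N]

drop if dropfirm == 1
```

This gives the same result as taking `egen max()` of check within firm: A (no counterpart in 2001) and C (never a counterpart) are dropped, and only B remains.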

      Any thoughts? Thank you so much in advance! Any advice is appreciated.
