  • Dropping firms based on certain criteria

    Dear community,

    I have the following challenge with my data sample. The original sample has about 10,000 observations (100 are shown here as an example) and more variables, which are irrelevant for this topic, so I don't include them. The setup is as follows: a shock took place in 1990, so Post is a dummy equal to 1 for post-shock years (fyear >= 1990 in the example below) and 0 otherwise. gvkey is a firm id. I need to keep in my sample only those firms that have at least one observation in the pre-shock period and at least one in the post-shock period.

    For example, I do not need to keep gvkey = 1009, as it has no observations in the post-shock period. A firm that has observations only in the post-shock period (so, fyear >= 1990, i.e. Post = 1 for all observations) should be dropped from the sample as well. But for gvkey = 1017, for example, I need to keep ALL available observations, as this firm has at least one observation in both the pre- and post-shock periods.

    Another aspect: in the sample, not all observations are consecutive. E.g., it might be that a firm has one observation for 1985 and then one for 1991 (so, no observations between 1985 and 1991) but this firm still needs to remain in the final sample as it satisfies the requirement of having at least one observation for Pre-shock and at least one observation for Post-shock.

    Many thanks in advance for your help,
    Anastassia.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long gvkey float Post double fyear
    1009 0 1985
    1009 0 1986
    1009 0 1987
    1011 0 1985
    1011 0 1986
    1012 0 1985
    1012 0 1986
    1012 0 1987
    1012 0 1988
    1012 0 1989
    1017 0 1985
    1017 0 1988
    1017 0 1989
    1017 1 1990
    1017 1 1991
    1018 0 1985
    1018 0 1986
    1018 0 1987
    1020 0 1985
    1020 0 1986
    1020 0 1987
    1020 0 1988
    1028 0 1985
    1028 0 1986
    1028 0 1987
    1028 0 1988
    1028 0 1989
    1028 1 1990
    1034 0 1989
    1034 1 1990
    1034 1 1991
    1034 1 1992
    1034 1 1993
    1043 0 1989
    1043 1 1990
    1043 1 1991
    1045 0 1985
    1045 0 1986
    1045 0 1987
    1045 0 1988
    1045 0 1989
    1045 1 1990
    1045 1 1991
    1045 1 1992
    1045 1 1993
    1050 0 1985
    1050 0 1986
    1058 0 1985
    1058 0 1986
    1058 0 1987
    1058 0 1988
    1072 0 1985
    1072 0 1986
    1072 0 1987
    1082 0 1985
    1082 0 1986
    1082 0 1987
    1083 0 1987
    1083 0 1988
    1094 0 1985
    1094 0 1988
    1098 0 1985
    1098 0 1988
    1098 0 1989
    1098 1 1990
    1103 0 1985
    1103 0 1986
    1103 0 1987
    1103 0 1988
    1104 0 1985
    1104 0 1986
    1104 0 1987
    1108 0 1985
    1108 0 1986
    1108 0 1987
    1108 0 1988
    1108 1 1992
    1108 1 1993
    1109 0 1985
    1109 0 1986
    1109 0 1989
    1109 1 1990
    1109 1 1991
    1109 1 1992
    1111 0 1985
    1111 0 1989
    1111 1 1993
    1112 0 1985
    1112 0 1986
    1112 0 1987
    1115 1 1992
    1115 1 1993
    1120 0 1985
    1120 0 1986
    1120 0 1987
    1126 0 1985
    1126 0 1986
    1126 0 1987
    1130 0 1986
    1130 0 1987
    end

  • #2
    If you get to the essence of the problem, you are just checking whether Post varies within each firm:

    Code:
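    * Sort by Post within firm: first and last values differ only if the firm
    * has both Post == 0 and Post == 1 (assumes Post is never missing)
    bys gvkey (Post): gen byte tokeep = Post[1] != Post[_N]
    keep if tokeep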
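
    The same check can also be written with egen, which may be easier to read and does not count a missing Post as either period. This is only a sketch: the variable names has_pre and has_post and the small toy dataset are assumptions for illustration, not part of the original data.

```stata
* Toy data (illustrative values only): 1009 is pre-only, 1115 is post-only,
* 1017 and 1111 have observations in both periods
clear
input long gvkey float Post double fyear
1009 0 1985
1009 0 1986
1017 0 1985
1017 1 1990
1017 1 1991
1111 0 1985
1111 1 1993
1115 1 1992
1115 1 1993
end

* has_pre is 1 if the firm has any observation with Post == 0,
* has_post is 1 if it has any with Post == 1 (hypothetical variable names)
bysort gvkey: egen byte has_pre  = max(Post == 0)
bysort gvkey: egen byte has_post = max(Post == 1)

* Keep every observation of firms that appear in both periods
keep if has_pre & has_post
drop has_pre has_post
```

    Gaps between years do not matter here, since only the presence of each Post value within a firm is checked: gvkey 1111 stays in the toy sample even though it jumps from 1985 to 1993.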


    • #3
      Thank you so much, Andrew Musau! This solved my problem!


      • #4
        See also https://www.stata.com/support/faqs/d...ions-in-group/