Dropping firms based on certain criteria

ANASTASSIA VILDERSON

Join Date: Apr 2022

Posts: 16
#1

04 Jun 2022, 10:27

Dear community,

I have the following challenge with my data sample. The original sample has about 10,000 observations (100 are shown here as an example) and more variables, which are irrelevant to this question and therefore omitted. The setup is as follows: a shock took place in 1990, so Post is a dummy equal to 1 for post-shock years and 0 otherwise. gvkey is a firm identifier. I need to keep only those firms that have at least one pre-shock observation and at least one post-shock observation.

For example, take gvkey = 1009. I do not need to keep this firm, as it has no observations in the post-shock period. If a firm had observations only for the post-shock period (so Post = 1 for all of its observations; in the example data, Post = 1 from fyear 1990 onward), it should be dropped from the sample as well. But for gvkey = 1017, for example, I need to keep ALL available observations, as the firm has at least one observation in each of the pre- and post-shock periods.

Another aspect: not all observations in the sample are consecutive. A firm might, for example, have one observation for 1985 and the next for 1991, with nothing in between. Such a firm still needs to remain in the final sample, as it has at least one pre-shock and at least one post-shock observation.

Many thanks in advance for your help,
Anastassia.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long gvkey float Post double fyear
1009 0 1985
1009 0 1986
1009 0 1987
1011 0 1985
1011 0 1986
1012 0 1985
1012 0 1986
1012 0 1987
1012 0 1988
1012 0 1989
1017 0 1985
1017 0 1988
1017 0 1989
1017 1 1990
1017 1 1991
1018 0 1985
1018 0 1986
1018 0 1987
1020 0 1985
1020 0 1986
1020 0 1987
1020 0 1988
1028 0 1985
1028 0 1986
1028 0 1987
1028 0 1988
1028 0 1989
1028 1 1990
1034 0 1989
1034 1 1990
1034 1 1991
1034 1 1992
1034 1 1993
1043 0 1989
1043 1 1990
1043 1 1991
1045 0 1985
1045 0 1986
1045 0 1987
1045 0 1988
1045 0 1989
1045 1 1990
1045 1 1991
1045 1 1992
1045 1 1993
1050 0 1985
1050 0 1986
1058 0 1985
1058 0 1986
1058 0 1987
1058 0 1988
1072 0 1985
1072 0 1986
1072 0 1987
1082 0 1985
1082 0 1986
1082 0 1987
1083 0 1987
1083 0 1988
1094 0 1985
1094 0 1988
1098 0 1985
1098 0 1988
1098 0 1989
1098 1 1990
1103 0 1985
1103 0 1986
1103 0 1987
1103 0 1988
1104 0 1985
1104 0 1986
1104 0 1987
1108 0 1985
1108 0 1986
1108 0 1987
1108 0 1988
1108 1 1992
1108 1 1993
1109 0 1985
1109 0 1986
1109 0 1989
1109 1 1990
1109 1 1991
1109 1 1992
1111 0 1985
1111 0 1989
1111 1 1993
1112 0 1985
1112 0 1986
1112 0 1987
1115 1 1992
1115 1 1993
1120 0 1985
1120 0 1986
1120 0 1987
1126 0 1985
1126 0 1986
1126 0 1987
1130 0 1986
1130 0 1987
end
Andrew Musau

Join Date: Oct 2014

Posts: 10481
#2

04 Jun 2022, 10:35

The essence of the problem is checking whether Post varies within firms. After sorting on Post within gvkey, the first and last values differ only when a firm has both a 0 and a 1:

Code:

bys gvkey (Post): gen tokeep = Post[1] != Post[_N]
keep if tokeep
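
As a hedged alternative, the same requirement can be spelled out as two explicit flags, one for "has a pre-shock observation" and one for "has a post-shock observation". This is only a sketch, not the method given above: the variable names has_pre and has_post are made up for illustration, and the three firms are a subset of the example data so that the block runs on its own. One possible reason to prefer this form is that Stata sorts missing numeric values last, so a firm with a missing Post would look as if Post varied under the first-versus-last comparison; the flags below ignore missing values of Post.

```stata
* Three firms taken from the example data: 1009 (pre-shock only),
* 1017 (both periods), 1115 (post-shock only).
clear
input long gvkey float Post double fyear
1009 0 1985
1009 0 1986
1009 0 1987
1017 0 1985
1017 0 1988
1017 0 1989
1017 1 1990
1017 1 1991
1115 1 1992
1115 1 1993
end

* has_pre and has_post are hypothetical helper names, not from the thread.
* Each flag is 1 for every observation of a firm that has at least one
* observation with Post == 0 (respectively Post == 1), and 0 otherwise.
egen byte has_pre  = max(Post == 0), by(gvkey)
egen byte has_post = max(Post == 1), by(gvkey)

* Keep all observations of firms observed in both periods.
keep if has_pre & has_post
drop has_pre has_post
```

On this subset only the observations of gvkey 1017 should remain. Gaps between years do not matter here, because the flags depend only on Post and not on whether fyear is consecutive.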
ANASTASSIA VILDERSON

Join Date: Apr 2022

Posts: 16
#3

05 Jun 2022, 00:06

Thank you so much, Andrew Musau! This solved my problem!
Nick Cox

Join Date: Mar 2014

Posts: 36054
#4

05 Jun 2022, 01:17

See also https://www.stata.com/support/faqs/d...ions-in-group/