Subgroup in panel data

Ihab Man

Join Date: Jul 2020

Posts: 56
#1

17 May 2021, 13:12

Hi all.
I am using panel data on 500 firms over 2000-2010. I have analysed all 500 firms together using -logit-, but I would like to split them into two groups at the median of one of my independent variables, X1. Group 1 = firms that lie below the median of X1, and group 2 = firms that lie above the median of X1. How can I do this? Thanks a lot.
Andrew Musau

Join Date: Oct 2014

Posts: 10213
#2

17 May 2021, 13:51

One issue is that some firms may fall into both groups as you have panel data.

Code:

sum X1, d
gen hix1 = X1 > r(p50)
Felix Bittmann

Join Date: Aug 2018

Posts: 702
#3

17 May 2021, 13:52

One quick way is to look up the median manually and use this value like:

Code:

sum X1, detail
* store the median right away, since later commands can overwrite r()
local med = r(p50)
logit ... if X1 <= `med'
logit ... if X1 > `med'

EDIT: Andrew made a good point. You may first want to compute an average of X1 for each firm over all years and use these firm-level averages for the median split.

Best wishes

Ihab Man

Join Date: Jul 2020

Posts: 56
#4

17 May 2021, 14:26

Dear Andrew Musau & Felix Bittmann,
Thanks a lot for your answers.
I have two follow-up questions:
1-) Why might some firms fall into both groups, given that group 1 will contain the firms at or above the median and group 2 the firms below the median?
2-) Do you mean that, if the median equals 0.125, the commands will be:
logit ... if X1 <= 0.125
logit ... if X1 > 0.125
Thanks a lot.
Andrew Musau

Join Date: Oct 2014

Posts: 10213
#5

17 May 2021, 16:07

1-) Why might some firms fall into both groups, given that group 1 will contain the firms at or above the median and group 2 the firms below the median?

With panel data, firms are observed over several time points, say years, so that an observation is a firm-year. Therefore, if we compute the median based on individual observations, there is a chance that in some years, a firm has values below the median value and in other years, above. As Felix suggests, one way to overcome this is to compute a single average for a firm and then compute the median based on these averages. In this way, you guarantee that a firm is either above or below the median value.
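To see how common this is in practice, one can flag the firms whose indicator changes over time. The following is only a sketch on simulated data: the variable names firmid, year, both and tag are assumptions for illustration, and X1 is drawn at random rather than taken from any real dataset.

```stata
* Simulated panel (hypothetical data): 500 firms observed 2000-2010
clear
set seed 12345
set obs 500
gen firmid = _n
expand 11
bysort firmid: gen year = 1999 + _n
gen X1 = runiform()

* Median computed over all firm-year observations
quietly summarize X1, detail
gen byte hiX1 = X1 > r(p50) if !missing(X1)

* Within each firm, sort by hiX1: the first value is the minimum and the
* last is the maximum, so they differ only if the firm switches groups
bysort firmid (hiX1): gen byte both = hiX1[1] != hiX1[_N]

* Count each firm once
egen byte tag = tag(firmid)
count if tag & both
```

The final count is the number of firms that appear in both groups under an observation-level median split. With real data the number depends on how much X1 varies within firms.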

2-) Do you mean that, if the median equals 0.125, the commands will be:
logit ... if X1 <= 0.125
logit ... if X1 > 0.125

Precisely. The median is a single value. All in all:

Code:

bys firmid: egen meanX1= mean(X1)
egen tag= tag(firmid)
sum meanX1 if tag, d
gen hiX1= meanX1> r(p50)
logit ... if hiX1
logit ... if !hiX1

where you replace "firmid" with the name of your panel identifier.
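For anyone who wants to try the approach end to end, here is a self-contained sketch on simulated data. Everything in it besides the commands already shown is an assumption for illustration: the data are random, y is a made-up binary outcome, and the guard against missing means is an extra precaution rather than part of the original suggestion.

```stata
* Simulated panel (hypothetical data) with some missing X1
clear
set seed 2021
set obs 500
gen firmid = _n
expand 11
bysort firmid: gen year = 1999 + _n
gen X1 = runiform()
replace X1 = . if runiform() < 0.05
gen byte y = runiform() < invlogit(X1 - 0.5)   // hypothetical outcome

* Firm-level mean of X1; egen's mean() ignores missing values
bysort firmid: egen meanX1 = mean(X1)

* One tagged observation per firm, so each firm counts once in the median
egen byte tag = tag(firmid)
quietly summarize meanX1 if tag, detail

* Stata treats missing as larger than any number, so without the guard a
* firm with no usable X1 would land in the high group
gen byte hiX1 = meanX1 > r(p50) if !missing(meanX1)

* Check: the indicator is constant within each firm
bysort firmid (year): assert hiX1 == hiX1[1]

* Comparing to 1 and 0 explicitly keeps missing hiX1 out of both subsamples
logit y X1 if hiX1 == 1
logit y X1 if hiX1 == 0
```

Writing `if hiX1 == 1` rather than `if hiX1` matters only when hiX1 can be missing, because a missing value counts as true in an `if` condition.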

Last edited by Andrew Musau; 17 May 2021, 16:22.
Ihab Man

Join Date: Jul 2020

Posts: 56
#6

17 May 2021, 16:43

Dear Andrew Musau,
Thanks a lot. Much appreciated.
Now I understand what you mean by some firms falling into both groups. You mentioned that one way to overcome this is to compute a single average for each firm and then compute the median based on these averages. To be sure, do you mean that the commands that overcome this issue are:
bys firmid: egen meanX1= mean(X1)
egen tag= tag(firmid)
If so, hiX1 will be the first group, yes? And most importantly, does Stata know that !hiX1 selects the firms whose mean X1 is at or below the median? Thanks a lot.
Andrew Musau

Join Date: Oct 2014

Posts: 10213
#7

17 May 2021, 16:56

Correct.
Ihab Man

Join Date: Jul 2020

Posts: 56
#8

17 May 2021, 17:22

Thanks a lot, Andrew Musau.
Ihab Man

Join Date: Jul 2020

Posts: 56
#9

17 May 2021, 18:05

Dear Andrew Musau,
Sorry for bothering you with my questions. What if I have missing values in the variable X1? Would your commands in #5 stay the same?
Andrew Musau

Join Date: Oct 2014

Posts: 10213
#10

18 May 2021, 03:52

Missing values do not affect the calculations, since egen's mean() function ignores them. However, they can bias a firm's average upwards or downwards. For example, if your variable is market capitalization and there is a time effect, e.g., firms in the sample generally grew bigger over time, then when averaging nominal values, a firm whose mean is based only on the last 8 years of the sample will have an upward-biased average compared with a similar firm whose mean is calculated over a 20-year period.
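One way to judge how much this matters is to look at how many observations, and which years, sit behind each firm's mean. This is a rough sketch on simulated data, not a prescribed procedure: the names firmid, year, nX1, firstyr, lastyr and tag are assumptions for illustration.

```stata
* Simulated panel (hypothetical data) with some missing X1
clear
set seed 42
set obs 500
gen firmid = _n
expand 11
bysort firmid: gen year = 1999 + _n
gen X1 = runiform()
replace X1 = . if runiform() < 0.10

* Number of non-missing X1 values behind each firm's mean
bysort firmid: egen nX1 = count(X1)

* First and last year with a non-missing X1, per firm
bysort firmid: egen firstyr = min(cond(!missing(X1), year, .))
bysort firmid: egen lastyr  = max(cond(!missing(X1), year, .))

* One row per firm: how uneven are the averaging windows?
egen byte tag = tag(firmid)
tabulate nX1 if tag
summarize firstyr lastyr if tag
```

If most firms have nearly complete series, the bias described above is likely small. If many firms only enter late or have short windows, averaging over a common set of years may be worth considering.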