Panel data, rank and percentiles

Dilyana Berg

Join Date: Jun 2015

Posts: 32
#1


17 Jun 2015, 09:38

Dear Statalist,
I have an unbalanced panel of funds from 2000 to 2014. I have monthly observations of each fund's beta, but not every fund is observed for the whole period.
fund(id)   month     beta
1          2000m10   0.987
1          2000m11   0.654
1          2000m12   1.112
1          2001m1    1.022
1          2000m2    0.944
2          2001m1    0.888
2          2001m2    0.921
2          2001m3    0.765
2          2001m4    0.876
2          2001m5    0.645
2          2001m6    1.213
3          2005m1    1.001
3          2005m2    0.732

In addition, I have a dummy variable indicating whether the fund is managed by a man or a woman in the month for which the beta is provided. I want to rank the funds within each month (2000m1, 2000m2, 2000m3, ...) over the entire period 2000-2014. In the end I want to calculate the share of women in different percentile ranges of the beta distribution (top 1%, top 10-40%, middle 20%, bottom 10-40%, bottom 1%). First, I am not sure whether the statistics will be correct, since the number of observations differs from month to month. Second, I am not sure which command to use, -pctile- or -xtile-, or how to obtain the share of women per month in each rank group.

I have used the following command:

egen decile = xtile( Beta), by(month) p(10(10)90)

but it only generates the numbers 1 to 9, and I cannot "translate" these into top 1%, top 10-40%, middle 20%, bottom 10-40%, bottom 1%.

Any help is appreciated
Tags: panel data, percentiles, rank, stata
Rich Goldstein

Join Date: Mar 2014

Posts: 4466
#2

17 Jun 2015, 09:46

It seems to me that you have two options:

Code:

egen decile1=xtile(Beta), by(month) nq(100)

and then collapse, or recode, into the categories you want, or

Code:

egen decile2=xtile(Beta), by(month) p(1 10 40 60 90 99)

but note that (1) I'm not sure I understand your groupings and may have this wrong, and (2) your groupings are not exhaustive, whereas this second method will give every observation a value.
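A minimal sketch of the "recode into the categories you want" step, under stated assumptions. It swaps in a different technique from the -egen, xtile()- calls above: a within-month percentile rank built from official -egen- functions (rank() and count()), cut at 1, 10, 40, 60, 90 and 99. The data are simulated, the variable names n_month, rank_month, pctrank and betagrp are made up for illustration, and the seven labels are only one possible reading of the groupings asked about.

```stata
* Toy data standing in for the poster's panel (all values are simulated).
clear
set seed 2015
set obs 500
gen fund = _n
expand 12                          // 12 months per fund
bysort fund: gen month = tm(2000m1) + _n - 1
format month %tm
gen Beta = rnormal(1, 0.3)
drop if runiform() < 0.2           // make the panel unbalanced

* Within-month percentile rank (0-100), using official egen functions only.
egen n_month = count(Beta), by(month)
egen rank_month = rank(Beta), by(month)
gen pctrank = 100 * (rank_month - 0.5) / n_month

* irecode() returns 0 if pctrank <= 1, 1 if in (1,10], ..., 6 if > 99.
gen byte betagrp = 1 + irecode(pctrank, 1, 10, 40, 60, 90, 99)
label define betagrp 1 "bottom 1%" 2 "1-10%" 3 "10-40%" 4 "middle 20%" ///
    5 "60-90%" 6 "90-99%" 7 "top 1%"
label values betagrp betagrp
tabulate betagrp
```

Because the rank is computed separately within each month, a month with 300 funds and a month with 500 funds are both cut at the same percentiles. With only a few hundred funds per month, the 1% tails contain just a handful of funds each.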
Comment
Dilyana Berg

Join Date: Jun 2015

Posts: 32
#3

17 Jun 2015, 10:30

Rich, thanks for the immediate reply. Let me try to explain again.
I want to rank the funds' betas within each month. Then I have to estimate the percentage of women in the top 1%, top 10-40%, middle 20%, bottom 10-40% and bottom 1%. After that I want to draw a graph of the share of women in each category over the whole period (should I use mean or collapse here?).
In short, I need a graph with the percentage of women in each category. Is this the right way to do it?
Comment
Dilyana Berg

Join Date: Jun 2015

Posts: 32
#4

18 Jun 2015, 06:05

Hello everyone,

I am sorry for repeating my question. I will try to explain again with more detail and data.
I have an unbalanced panel of funds from 2000 to 2014. I have monthly observations of each fund's beta, but not every fund is observed for the whole period.

fund(id)   month     beta    deciles
1          2000m10   0.987   1
1          2000m11   0.654   2
1          2000m12   1.112   2
1          2001m1    1.022   3
1          2000m2    0.944   2
2          2001m1    0.888   1
2          2001m2    0.921   2
2          2001m3    0.765   7
2          2001m4    0.876   10
2          2001m5    0.645   14
2          2001m6    1.213   11
3          2005m1    1.001   5
3          2005m2    0.732   2

In addition, I have a dummy variable indicating whether the fund is managed by a man or a woman in the month for which the beta is provided. I want to rank the funds within each month (2000m1, 2000m2, 2000m3, ...) over the entire period 2000-2014. In the end I want to calculate the share of women in different percentile ranges of the beta distribution (top 1%, top 10-40%, middle 20%, bottom 10-40%, bottom 1%). First, I am not sure whether the statistics will be correct, since the number of observations differs from month to month. Second, I am not sure which command to use, -pctile- or -xtile-, or how to obtain the share of women per month in each rank group.

I want to replicate a paper which states that if women follow an extreme strategy (with respect to beta), then the betas of women should lie in the tails of the distribution. The authors compute the share of women in different percentiles of the beta distribution.

I have used the following command:

egen deciles = xtile(Beta), by(month) nq(20)
egen all_funds = count(wficn), by(month deciles)
egen male_funds = count(wficn) if females==0, by(month deciles)
egen female_funds = count(wficn) if females==1, by(month deciles)
gen p_male_funds = male_funds/all_funds if all_funds > 0
replace p_male_funds = 0 if male_funds == 0 & all_funds > 0
gen p_female_funds = female_funds/all_funds if all_funds > 0
replace p_female_funds = 0 if female_funds == 0 & all_funds > 0

Here is a quotation from the paper that I am trying to replicate. In the paper the authors do this for single- and team-managed funds. I have to apply the same idea to women and men in my work.

"To get a first idea about the extremity of a fund’s investment style we analyse
the distribution of the factor loadings β1 to β4 from model (1) for team- and single-managed
funds. If a fund follows an extreme strategy with respect to a specific style
dimension, its factor loadings are more likely to be in the tail of the distribution of
all fund’s factor loadings in the same year. Thus, if the diversification of opinions
Hypothesis 1 holds, we should observe a larger fraction of single-managed funds in
the most extreme percentiles of the distribution of factor loadings. These shares are calculated as the average of the
respective yearly shares over our sample period. This ensures that our results are not
driven by shifting style preferences within the mutual fund industry in combination
with the increased share of team-managed funds."

Their results show, for example, that single managers make up 73% of the top 10% of betas and 70% of the top 10-20% of betas.

There is something wrong with my code. My results show that men are only 48% of the top 10% of betas, and this cannot be true. In fact men follow more extreme strategies and should have more extreme betas than women at the top and the bottom of the distribution.

I am not sure how they compute this (cumulatively or not).

Could someone tell me whether this is the right way? I want to rank the funds' betas within each month. Then I have to estimate the percentage of women in the top 1%, top 10-40%, middle 20%, bottom 10-40% and bottom 1%. After that I want to draw a graph of the share of women in each category for the whole period of 13 years. In short, I need a graph with the percentage of women in each category.

Could someone help me please?
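One possible reading of "the average of the respective yearly shares" in the quoted passage, sketched for monthly data: first compute the share of women within each month-by-group cell, then average those monthly shares over the sample period. This is an illustration on simulated data, not the paper's code or a diagnosis of the results above. The grouping uses a rank-based decile from official -egen- functions rather than -egen, xtile()-, and the names n_month, rank_month, decile and share_female are made up here.

```stata
* Toy data standing in for the poster's panel (all values are simulated).
clear
set seed 2015
set obs 500
gen fund = _n
expand 12
bysort fund: gen month = tm(2000m1) + _n - 1
format month %tm
gen Beta = rnormal(1, 0.3)
gen byte females = runiform() < 0.2    // 1 = managed by a woman
drop if runiform() < 0.2               // make the panel unbalanced

* Decile of Beta within each month (1 = lowest betas, 10 = highest).
egen n_month = count(Beta), by(month)
egen rank_month = rank(Beta), by(month)
gen byte decile = ceil(10 * rank_month / n_month)

* Step 1: share of women in each month-decile cell.
* The mean of a 0/1 dummy is a share, so no -if- qualifier is needed.
collapse (mean) share_female = females, by(month decile)

* Step 2: average the monthly shares; every month gets equal weight.
collapse (mean) share_female, by(decile)

list decile share_female, noobs
```

Averaging the monthly shares in a second step gives each month the same weight regardless of how many funds it contains, which is one way to handle the differing numbers of observations per month. After the second -collapse-, `graph bar share_female, over(decile)` would draw one bar per group.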
Comment
Dilyana Berg

Join Date: Jun 2015

Posts: 32
#5

18 Jun 2015, 07:39

Anybody? I really need help please?
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35725
#6

18 Jun 2015, 08:36

Comments here http://www.statalist.org/forums/foru...nd-percentiles I think imply that you may need to rewrite this radically to get a response.
Comment
Dilyana Berg

Join Date: Jun 2015

Posts: 32
#7

18 Jun 2015, 10:26

OK, let me put it differently. An example of my data is shown above. I have monthly observations of the funds' betas, but not every fund has a beta in every month. For example, some months have 300 betas and others have 500. I also have a dummy indicating whether a fund is single- or team-managed. I want to rank the betas within each month and then estimate the percentage of single-managed funds in each rank group (top 1%, top 10-40%, middle 20%, bottom 10-40%, bottom 1%) for each month. After that I want to estimate the average percentage of single-managed funds over the whole period for each rank group. Here is a quotation from a paper I have read in which the authors do something similar, but with yearly observations.

"To get a first idea about the extremity of a fund’s investment style we analyse
the distribution of the factor loadings β1 to β4 from model (1) for team- and single-managed
funds. If a fund follows an extreme strategy with respect to a specific style
dimension, its factor loadings are more likely to be in the tail of the distribution of
all fund’s factor loadings in the same year. Thus, if the diversification of opinions
Hypothesis 1 holds, we should observe a larger fraction of single-managed funds in
the most extreme percentiles of the distribution of factor loadings. These shares are calculated as the average of the
respective yearly shares over our sample period. This ensures that our results are not
driven by shifting style preferences within the mutual fund industry in combination
with the increased share of team-managed funds."

Here is my code:

egen deciles = xtile(Beta), by(month) nq(10)
egen all_funds = count(wficn), by(month deciles)
egen singlef = count(wficn) if team==0, by(month deciles)
egen teamf = count(wficn) if team==1, by(month deciles)
gen p_single_funds = singlef/all_funds if all_funds > 0
replace p_single_funds = 0 if singlef == 0 & all_funds > 0
gen p_team_funds = teamf/all_funds if all_funds > 0
replace p_team_funds = 0 if teamf == 0 & all_funds > 0

The problem is that when I use -collapse- (collapse p_single_funds, by(deciles)) I obtain percentages that are very different from the expected results. For example, I obtain 45% single-managed funds in the top 10% of betas, 48% in the top 30%, 53% in the bottom 30% and 55% in the bottom 10%. The percentage of single-managed funds should actually be U-shaped (that is, there should be more single managers in both tails of the beta distribution): the highest percentages of single-managed funds should be in the top 10% and the bottom 10% of betas.

Is something wrong in my code, or in the way I am trying to do this compared with the paper above? I am fairly sure what the results should look like.
I would also like to know whether there is a command in Stata that can directly show the top 1%, top 10%, bottom 20%, bottom 10% and bottom 1% of a variable.

Thanks in advance
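On the last question, one way to flag tails directly is to build a within-month percentile rank and derive 0/1 indicators from it. The sketch below is an assumption-laden illustration on simulated data, not a statement about what the paper did: the flags are cumulative (the top 10% includes the top 1%), the names single, pctrank, top1, top10, bottom10 and bottom1 are made up, and only official commands are used rather than -egen, xtile()-.

```stata
* Toy data standing in for the poster's panel (all values are simulated).
clear
set seed 2015
set obs 500
gen fund = _n
expand 12
bysort fund: gen month = tm(2000m1) + _n - 1
format month %tm
gen Beta = rnormal(1, 0.3)
gen byte team = runiform() < 0.4       // 1 = team-managed
drop if runiform() < 0.2               // make the panel unbalanced
gen byte single = team == 0

* Within-month percentile rank of Beta (0-100).
egen n_month = count(Beta), by(month)
egen rank_month = rank(Beta), by(month)
gen pctrank = 100 * (rank_month - 0.5) / n_month

* Cumulative tail indicators.
gen byte bottom1  = pctrank <= 1
gen byte bottom10 = pctrank <= 10
gen byte top10    = pctrank > 90
gen byte top1     = pctrank > 99

* For each tail: monthly share of single-managed funds, then its average.
foreach v in bottom1 bottom10 top10 top1 {
    preserve
    keep if `v'
    collapse (mean) share_single = single, by(month)
    summarize share_single, meanonly
    display "`v': average monthly share of single-managed funds = " %5.3f r(mean)
    restore
}
```

Whether the tails should be cumulative or disjoint (for example top 10% versus top 10-20%) changes the numbers, so the choice needs to match the definition used in the paper being replicated.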
Comment
Dilyana Berg

Join Date: Jun 2015

Posts: 32
#8

18 Jun 2015, 14:31

Anybody?
Comment
Dilyana Berg

Join Date: Jun 2015

Posts: 32
#9

19 Jun 2015, 04:04

Could someone help me?
Comment

Yves Ongenda

Join Date: Aug 2020
Posts: 8

#10

04 Feb 2021, 11:59

Hello, I have the following data, which I have already grouped into quantiles:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte nq_incftx long totinc
5 44257
1  7067
1  6124
2 11323
5 53423
5 34801
2 10864
5 37855
5 52098
3 22048
3 26063
4 30409
2 11153
3 19812
1  5082
3 21691
1  6500
3 24261
4 28515
4 29481
3 17840
1  6212
1  7486
1  7895
5 42572
1  6420
3 21365
4 30075
1  5358
3 19847
3 22379
5 52834
4 26701
3 17445
1  7604
2 12507
1  6623
3 23164
4 26270
3 24024
4 26117
4 31047
4 32187
5 43226
2 15190
5 43257
4 32615
1  7092
3 23515
2 10471
4 26898
5 43160
2 10276
5 40343
4 28562
1  5400
4 28710
2 11704
5 35480
3 28512
1  8296
4 31798
4 31912
5 38767
1  5324
1  5970
5 45254
4 29520
4 31959
1  5091
3 21586
5 36165
5 44206
5 43224
2 10509
2 13508
4 31411
3 20315
1  4570
3 18264
5 38677
2  9334
5 37884
5 58665
1  6654
4 27402
4 30314
4 29864
2 13690
4 30850
1  5091
1  7429
4 27285
1  5091
4 33758
2  9427
3 23722
3 25700
4 26776
1  5552
end

I was wondering if anyone could help me calculate the growth rate of income using a percentile ratio.
Thanks in advance.
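A minimal sketch of a percentile ratio, assuming that what is meant is something like the 90th percentile of totinc divided by the 10th. It uses the first ten totinc values from the example above and the official -_pctile- command. The posted example has no time variable, so a growth rate cannot be computed from it; with two periods, the same ratio could be computed per period and compared. The scalar names p10, p50 and p90 are made up for illustration.

```stata
* First ten totinc values from the -dataex- example above.
clear
input long totinc
44257
7067
6124
11323
53423
34801
10864
37855
52098
22048
end

* _pctile leaves the requested percentiles in r(r1), r(r2), r(r3).
_pctile totinc, percentiles(10 50 90)
scalar p10 = r(r1)
scalar p50 = r(r2)
scalar p90 = r(r3)

display "p90/p10 = " %6.2f p90/p10
display "p90/p50 = " %6.2f p90/p50
display "p50/p10 = " %6.2f p50/p10
```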
