Comparing dummy variables regarding another variable

Tim Wolf

Join Date: Mar 2019

Posts: 25
#1

24 Mar 2019, 10:17

Hello,

My data set consists of funds and their historical returns from 2004-2018. As a first step, my goal is to compare the performance of the best-performing 20% and the worst-performing 20% of funds (ranked separately on each observation date) over the whole time period.

My variables are id (different for each fund), date, and hret (historical return).
I have already built dummy variables for the best and worst 20% of funds on each date. Now I want to analyze whether the performance difference between these two groups is statistically significant over the whole time period (2004-2018).
So far my code looks like this:

Code:

* Using egenmore's xtile() function
ssc install egenmore
egen hret_decile = xtile(hret), by(date) nq(10)
gen byte top_performer_hret = 1 if inlist(hret_decile, 9, 10)
gen byte bottom_performer_hret = 1 if inlist(hret_decile, 1, 2)

Now I have two dummy variables marking each date's top and bottom performers.

How can I now compare these two groups with respect to their hret (historical return)? I would like to see the mean hret of each group and test whether the difference is statistically significant with a t-test. However, I do not know how to do it. I would appreciate any advice.

Furthermore, is my approach of using dummy variables the best one or are there other (perhaps more suitable) possibilities?

Thank you for your help,

Tim Wolf
Clyde Schechter

Join Date: Apr 2014

Posts: 30174
#2

24 Mar 2019, 10:55

Representing two categories with two variables is not the best approach. You need, instead, to represent the two categories by two different values of a single variable. Your job is somewhat complicated by the existence of a third group (those falling between the bottom and top 20%).

Code:

label define top_bottom 0 "bottom" 1 "top"
gen bye top_bottom = 1 if inlist(hret_decile, 9, 10)
replace top_bottom = 0 if inlist(hret_decile, 1, 2)
label values top_bottom top_bottom
ttest hret, by(top_bottom)

That said, this strikes me as bordering on nonsensical, so I have a feeling I'm misunderstanding what you want. If the two groups are defined by their values of hret, there is no possibility that the returns in the top 20% and bottom 20% are the same (unless all your stocks have exactly the same return). That difference must exist, so that testing a null hypothesis of no difference is even more of a straw man than it is in most other situations. If you find that the difference is not statistically significant, all that will mean is that the data are so noisy or your sample so small (or both) that you do not have sufficient statistical power to detect the difference that by definition must be non-zero.
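To illustrate the point above, here is a minimal simulated sketch. The seed, the sample size of 1,000, and the single "date" are arbitrary assumptions, not taken from the thread. The returns are pure noise, so no fund is genuinely better than another. Yet because the two groups are formed by ranking on hret itself, the t-test still reports a highly significant difference.

```stata
* Hypothetical illustration: returns are pure noise,
* but the groups are selected on hret itself
clear
set seed 12345
set obs 1000
generate double hret = rnormal()

* Built-in xtile command; one "date" only, so no by() is needed here
xtile decile = hret, nquantiles(10)

* Same single-variable coding as above: 1 = top 20%, 0 = bottom 20%
generate byte top_bottom = 1 if inlist(decile, 9, 10)
replace top_bottom = 0 if inlist(decile, 1, 2)

* The test "detects" a difference that exists by construction
ttest hret, by(top_bottom)
```

The middle 60% of observations have top_bottom missing, so ttest simply ignores them.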

So what am I missing here?
Tim Wolf

Join Date: Mar 2019

Posts: 25
#3

24 Mar 2019, 12:41

Hello Clyde,

First of all, thank you very much for your quick reply. In my study I am analyzing differences within so-called "socially responsible investment" funds. Because these funds invest in a similar style due to the restrictions they face, as a first step I want to see how big the performance difference between the top 20% and the bottom 20% actually is. Given that they are all so-called "social" funds and my sample is not very large, I am concerned that the difference between the top and bottom performers may be quite small and therefore not statistically significant. In a way, I am testing whether my sample is too small or my data too noisy.

However, this is just the first step of my analysis and therefore a bit trivial.

Your recommendation to represent the two categories by two different values of a single variable is very helpful. I tried the code you suggested, but after entering the second line I keep getting the error "too many variables specified" with return code r(103).

What am I doing wrong?
Clyde Schechter

Join Date: Apr 2014

Posts: 30174
#4

24 Mar 2019, 13:25

Oh, sorry, typo in that second line. It should be:

Code:

gen byte top_bottom = 1 if inlist(hret_decile, 9, 10)
Tim Wolf

Join Date: Mar 2019

Posts: 25
#5

25 Mar 2019, 13:21

Thank you very much for your help, the code works perfectly fine now.
Tim Wolf

Join Date: Mar 2019

Posts: 25
#6

05 Apr 2019, 03:32

I have one follow-up question and hope you can help me.

My code looks like this and works fine:

Code:

ssc install egenmore
egen hret_decile = xtile(hret), by(date) nq(10)
label define top_bottom 0 "bottom" 1 "top"
gen byte top_bottom = 1 if inlist(hret_decile, 9, 10)
replace top_bottom = 0 if inlist(hret_decile, 1, 2)
label values top_bottom top_bottom
ttest hret, by(top_bottom)

However, as the t-test output shows, the numbers of observations in the "top" and "bottom" groups are not the same: there are 1,857 observations in "bottom" and 1,736 in "top". I do not have any observations with missing hret, so that cannot be the problem.

Do you have any idea how this difference can be explained?
Nick Cox

Join Date: Mar 2014

Posts: 35791
#7

05 Apr 2019, 06:10

The question in #6 no longer matches the thread title and would have been better posted as a new thread.

But it's very easy to explain. The rule Stata follows is that observations with the same value must be assigned to the same bin, here decile bin. So ties can and in practice often will frustrate the ideal of equal frequencies in each bin. A simple check is that

Code:

quantile hret, ms(none) mla(hret_decile) mlabpos(0)

will show you stripes of equal values that frustrate this exercise.
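A tiny made-up example shows the rule at work. The ten values below are invented purely for illustration. Four observations are tied at 0.02, so xtile cannot put exactly two observations in each quintile. The sketch uses the built-in xtile command rather than egenmore's xtile() function, and assumes both follow the same tie rule described above.

```stata
* Hypothetical toy data: ten returns, four of them tied at 0.02
clear
input double hret
0.01
0.02
0.02
0.02
0.02
0.03
0.04
0.05
0.06
0.07
end

* Ask for quintiles: ideally 2 observations per bin
xtile quint = hret, nquantiles(5)

* Tied values must share a bin, so the counts come out unequal
tabulate quint
```

Here the first quintile takes 0.01 together with all four tied values, giving five observations. The second quintile's upper cutoff coincides with the first one, so it is empty and does not appear in the tabulation at all.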

More discussion within https://www.stata-journal.com/articl...article=pr0054

https://www.stata-journal.com/articl...article=dm0095

Last edited by Nick Cox; 05 Apr 2019, 06:26.
Tim Wolf

Join Date: Mar 2019

Posts: 25
#8

05 Apr 2019, 09:41

Thank you very much, next time I will open a new thread.
ighoiye fortune

Join Date: Apr 2019

Posts: 3
#9

05 Apr 2019, 12:46

Good day. I am trying to create a new dummy variable that marks the top 20% of the original variable. How can I generate this?
Nick Cox

Join Date: Mar 2014

Posts: 35791
#10

05 Apr 2019, 12:50

#9 already answered in https://www.statalist.org/forums/for...4-market-share

ighoiye fortune Please don't ask the same question in more than one thread!
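For readers arriving at the question in #9, the same xtile logic used earlier in this thread gives one way to build such a dummy. This is a minimal sketch under stated assumptions: the original variable is called x (a hypothetical name), and simulated data stand in for real data. With tied values, the "top 20%" may hold somewhat more or fewer than 20% of observations, as explained in #7. For per-date groups, egenmore's xtile() with by() would be used instead, as in #6.

```stata
* Hypothetical example: x stands for "the original variable"
clear
set seed 2019
set obs 100
generate double x = runiform()

* Split x into quintiles; quintile 5 holds (roughly) the top 20%
xtile x_quint = x, nquantiles(5)

* 0/1 indicator, left missing wherever x itself is missing
generate byte top20 = (x_quint == 5) if !missing(x)
tabulate top20
```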