t-test error, no observations even though there are lots of observations

Victoria Rogers

Join Date: Oct 2014

Posts: 138
#1

24 Oct 2014, 03:45

I'm running a t-test to compare the returns of men with the returns of women.
There are about 100,000 return observations for men and 10,000 for women, but when I run the following code I get an error:

Code:

ttest return_men=return_women

no observations
r(2000);

Could you tell me how to solve this? It seems there must be a logical explanation.
Aljar Meesters

Join Date: Apr 2014

Posts: 30
#2

24 Oct 2014, 03:51

Because of the difference in the number of observations, I guess that you want to use the unpaired option.
Best,

Aljar
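A minimal sketch on simulated data (the seed, sample sizes and return distribution are made up for illustration), assuming the data look the way the thread later suspects: each row has a value of return_men or of return_women, never both. It shows why the paired form finds no observations and what the unpaired option does instead.

```stata
* Simulated data (hypothetical): 900 rows with only return_men,
* 100 rows with only return_women, so no row has both values
clear
set seed 2014
set obs 1000
generate return_men   = rnormal(0.01, 0.05) in 1/900
generate return_women = rnormal(0.01, 0.05) in 901/1000

* Rows usable by a paired test need both values nonmissing: here 0
count if !missing(return_men, return_women)

* The paired test therefore has nothing to work with and fails with r(2000)
capture noisily ttest return_men == return_women
display "return code: " _rc

* The unpaired test uses all nonmissing values of each variable separately
ttest return_men == return_women, unpaired
```

If the two groups may have different variances, adding the `unequal` option (`ttest return_men == return_women, unpaired unequal`) requests the unequal-variance version of the two-sample test.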
Victoria Rogers

Join Date: Oct 2014

Posts: 138
#3

24 Oct 2014, 04:07

Thank you for your advice, but shouldn't it be a paired t-test, considering that men and women are related regardless of the different numbers of observations?

http://i.imgur.com/liNvlJm.png

The output doesn't seem to be correct because, unfortunately, there's no t-value. Does anyone know what to do in this case?

Last edited by Victoria Rogers; 24 Oct 2014, 04:11.
Svend Juul

Join Date: Apr 2014

Posts: 515
#4

24 Oct 2014, 04:11

Is it that the men have valid values of return_men and the women valid values of return_women, but nobody has valid values of both variables? In that case, a paired t-test is impossible, but you can make a two-sample test:

Code:

gen return = return_men
replace return = return_women if missing(return)
ttest return, by(sex)   // if that is its name
Nick Cox

Join Date: Mar 2014

Posts: 35708
#5

24 Oct 2014, 04:32

If men and women are paired like (e.g.) husbands and wives, then it should be a paired test, but not otherwise. The fact of different numbers alone shows that you can't have pairs. My guess is that Svend's guess is on target here, but you are not showing us how your data are structured.

Two further issues: with sample sizes like these, almost any difference will count as significant at conventional levels. In addition, if these are financial time series, independence assumptions are moot.

Last edited by Nick Cox; 24 Oct 2014, 04:35.

Victoria Rogers

Join Date: Oct 2014

Posts: 138
#6

24 Oct 2014, 04:37

Thank you Svend, I'll try your method.

Code:

gen Male=.
gen Female=.
replace Male=gender if gender==0
replace Female=gender if gender==1

sort caldt
gen alpha_male=.
regress Areturn risk_premium if Male==0
replace alpha_male = _b[_cons]

sort caldt
gen alpha_female=.
regress Areturn risk_premium if Female==1
replace alpha_female = _b[_cons]

replace alpha_male=. if Male==.
replace alpha_female=. if Female==.

ttest alpha_male=alpha_female

(Here alpha plays the role of return in my first post.)

EDIT: Svend, either your code didn't work or something else is wrong, judging by my Stata output: http://i.imgur.com/Bp9hvGg.png

Last edited by Victoria Rogers; 24 Oct 2014, 04:45.


Joseph Coveney

Join Date: Apr 2014

Posts: 4420
#7

24 Oct 2014, 04:41

Originally posted by Victoria Rogers

Thank you for your advice, but shouldn't it be a paired t-test considering men and women are related regardless of the different amount of observations?

http://i.imgur.com/liNvlJm.png

The output doesn't seem to be correct because there's no t-value unfortunately. Does someone know what to do in this case?

The output that you show in the attached image is different from the error in your first post in this thread. And it's not a paired t-test. From what's shown in the image, the problem is that you have 100,159 identical values for men and 11,227 identical values for women. The standard deviations and standard errors of the means are zero in both groups.
Nick Cox

Join Date: Mar 2014

Posts: 35708
#8

24 Oct 2014, 04:46

Much confusion here. Creating two variables Male and Female is pointless, as you already have gender. Sorting is irrelevant to regression. If you want to save a regression parameter estimate to a variable, you can do it in one command. Those are just extra commands that can be cut.

But there is an enormous problem left. Focusing on the nub of the matter:

Code:

regress RminRF MRP SMB HML MOM if gender==0
gen alpha_male = _b[_cons] if gender==0
regress RminRF MRP SMB HML MOM if gender==1
gen alpha_female = _b[_cons] if gender==1

_b[_cons] is the single intercept estimate from each regression. You can't usefully do a t-test on the same intercept, repeated thousands of times for each regression.

EDIT: Added if qualifiers to generate statements.

Last edited by Nick Cox; 24 Oct 2014, 05:22.
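To make the point concrete, a minimal sketch on simulated fund-month data (all numbers are hypothetical; the variable names follow the code above). Each generated alpha variable has zero standard deviation, whereas one pooled regression with every slope interacted with gender reproduces the same difference in intercepts together with a standard error. Clustering by month is an assumption, one way of allowing for funds that share the same month's shocks.

```stata
* Simulated fund-month panel (hypothetical numbers):
* 120 months x 50 funds, funds 46-50 run by female managers
clear
set seed 12345
set obs 120
generate month = _n
foreach f in MRP SMB HML MOM {
    generate `f' = rnormal(0, 0.04)
}
expand 50
bysort month: generate fund = _n
generate gender = fund > 45
generate RminRF = 0.001 + 0.0005*gender + MRP + 0.2*SMB + rnormal(0, 0.02)

* The approach in the thread: one intercept per regression, copied to rows
regress RminRF MRP SMB HML MOM if gender == 0
scalar a_m = _b[_cons]
generate alpha_male = _b[_cons] if gender == 0
regress RminRF MRP SMB HML MOM if gender == 1
scalar a_f = _b[_cons]
generate alpha_female = _b[_cons] if gender == 1
summarize alpha_male alpha_female   // Std. Dev. is 0: one number repeated

* One pooled regression with all slopes interacted with gender:
* the coefficient on 1.gender equals a_f - a_m and comes with a standard error
regress RminRF i.gender##c.(MRP SMB HML MOM), vce(cluster month)
lincom 1.gender
```

Because the pooled model lets every coefficient differ by gender, its point estimates match the two separate regressions exactly; only the way uncertainty is measured changes.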
Victoria Rogers

Join Date: Oct 2014

Posts: 138
#9

24 Oct 2014, 05:07

My boss's request is: "Once you have created a "male manager" and a "female manager" portfolio and have their returns, you should regress the returns on risk_premium and estimate an alpha (regression intercept) for each portfolio, and then test for any significance in the difference between the two alphas." "Each calendar month, form a portfolio of all funds with male manager and all funds with female manager. Compute the portfolio return for each, and estimate the alpha on each portfolio. The comparison of the two alphas will tell you the relative performance of female managers. Note that in this context, unfortunately, you will not be able to use any control variables."

That's why I created the two variables Male and Female: so that there are two different intercepts.
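The boss's description reads like a standard portfolio approach. One way to implement it, offered as a sketch under assumptions rather than as what he necessarily intends: average fund excess returns each month within each gender (equal weighting is an assumption), regress each portfolio series on the factors, and also regress the female-minus-male series on the same factors. Because OLS is linear in the dependent variable and both portfolios share the same regressors, the intercept of the difference regression equals alpha_female minus alpha_male, and its t statistic tests the difference while allowing for the two portfolios moving together. The variable names RminRF, MRP, SMB, HML, MOM and gender are taken from Nick's code in this thread, plus caldt for the month; adjust them to your data.

```stata
* Assumes fund-month data in memory: caldt (month), gender (0 = male,
* 1 = female manager), RminRF (fund excess return) and factors
* MRP SMB HML MOM that take one value per month
preserve
keep if inlist(gender, 0, 1)

* Equal-weighted portfolio return per month and gender (an assumption;
* value weights would need a fund-size variable and collapse's weights)
collapse (mean) RminRF MRP SMB HML MOM, by(caldt gender)

* One row per month: RminRF0 = male portfolio, RminRF1 = female portfolio
reshape wide RminRF, i(caldt) j(gender)

* Alpha of each portfolio, on months where both portfolios exist
regress RminRF0 MRP SMB HML MOM if !missing(RminRF1)
scalar a_m = _b[_cons]
regress RminRF1 MRP SMB HML MOM if !missing(RminRF0)
scalar a_f = _b[_cons]

* Female-minus-male portfolio: its intercept equals a_f - a_m, and the
* reported t statistic tests whether the two alphas differ
generate double RminRF_diff = RminRF1 - RminRF0
regress RminRF_diff MRP SMB HML MOM
scalar a_diff = _b[_cons]
display "alpha_F - alpha_M = " a_f - a_m "   from difference regression: " a_diff

restore
```

With monthly returns, serial correlation may matter (compare the independence caveat earlier in the thread); `newey`, which requires the data to be `tsset`, is one Stata command for autocorrelation-robust standard errors.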
Nick Cox

Join Date: Mar 2014

Posts: 35708
#10

24 Oct 2014, 05:18

Your code seems intended to compare 100,000 copies of one constant (namely, one estimate of an intercept) with 10,000 copies of another constant (similar in kind) in a t-test. That makes no sense to me statistically. You have thrown away all the information in the data on the uncertainty surrounding those estimates.

Sorry, but I don't understand the prescription enough to tell you what your code should be, or even whether it makes sense.

Not needing to create extra variables for male and female is just a matter of Stata style, on which I am better informed, but that is the least of your problems.

EDIT: Joseph Coveney is making a related and entirely consistent point.

Last edited by Nick Cox; 24 Oct 2014, 05:23.
Victoria Rogers

Join Date: Oct 2014

Posts: 138
#11

24 Oct 2014, 05:33

First, I tried to do what my boss asked by generating the mean excess return each month for all funds managed by males and by females, respectively, then generating alphas for each month, then doing a t-test.

But he replied: "Also, as far as I remember, the method I described was not simply a matter of t-testing average excess returns. Once you have created a "male manager" and a "female manager" portfolio and have their returns, you should regress the excess returns on mktrf, hml, smb, and umd and estimate an alpha for each portfolio, and then test for any significance in the difference between the two alphas."

Based on that, I believe I must generate two return variables, one called 'male manager' and one called 'female manager', and two alphas in total, one for the male managers and one for the female managers, each as a separate variable, and then run a t-test comparing those two alphas.

When I asked for more details, he said: "I think I explained clearly enough the first time. Please go back to that explanation, read it carefully, and run the method. I look forward to your update on Friday (this Friday)."

At first I thought it was my lack of experience that left me unsure what to do; however, you're the second or third great statistician on this forum who doesn't understand the request. Anyway, I'll probably get fired because I'm relatively new at this company.
Nick Cox

Join Date: Mar 2014

Posts: 35708
#12

24 Oct 2014, 05:42

Sympathies, multiplied, but I have to point out that I am not a statistician, still less a great one, although it's very flattering that you say so.

I have to say that a public forum is not a good place to discuss your job situation. I don't know how identifiable you are, but it's a small world.
Victoria Rogers

Join Date: Oct 2014

Posts: 138
#13

24 Oct 2014, 05:50

Originally posted by Joseph Coveney

The output that you show in the attached image is different from the error in your first post in this thread. And it's not a paired t-test. From what's shown in the image, the problem is that you have 100,159 identical values for men and 11,227 identical values for women. The standard deviations and standard errors of the means are zero in both groups.

That's correct. Based on what my boss said, I probably need to compare two different alphas, one for males and one for females, but apparently I still need a standard deviation to compare them. I'm very confused by his request. That is why I first tried a method that seemed much more logical, which I described in my previous post.
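Comparing two alphas needs the standard error of their difference, not a standard deviation across copied values. If the two portfolio regressions are kept separate, `suest` can combine them so that the test allows for the covariance between the two estimates (portfolios observed over the same months are unlikely to have independent alphas). A minimal sketch, assuming one row per month with the male and female portfolio excess returns in variables hypothetically named RminRF0 and RminRF1, plus the factors MRP SMB HML MOM:

```stata
* Assumes one row per month: RminRF0 (male portfolio), RminRF1 (female
* portfolio) and factors MRP SMB HML MOM (variable names are hypothetical)
regress RminRF0 MRP SMB HML MOM if !missing(RminRF1)
scalar a_m = _b[_cons]
estimates store male

regress RminRF1 MRP SMB HML MOM if !missing(RminRF0)
scalar a_f = _b[_cons]
estimates store female

* Combine both regressions; suest estimates a robust (sandwich) variance
* that includes the covariance between the two sets of coefficients
suest male female

* Difference in alphas (female minus male) with its standard error and z test
lincom [female_mean]_cons - [male_mean]_cons
```

The `lincom` point estimate is just the difference of the two intercepts; what `suest` adds is a standard error that does not assume the two alphas are independent.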
Victoria Rogers

Join Date: Oct 2014

Posts: 138
#14

24 Oct 2014, 06:01

I assumed that because you have written many Stata commands. Besides, I always check many websites before I ask something on this forum, and your name often shows up.

Thank you for your advice, I'll delete the most sensitive messages after I know how to deal with this gender alpha request. What would you do in my situation, based on the request of my boss?
Nick Cox

Join Date: Mar 2014

Posts: 35708
#15

24 Oct 2014, 06:06

Sorry, but I really can't give you the advice you seek. As said, I don't understand the prescription. I can only work backwards and sometimes see that what you are doing is wrong or could be done better from a Stata point of view.

Also, you really have to appreciate that what you are saying is entirely public. You can't delete stuff unilaterally except within 1 hour of posting.

Last edited by Nick Cox; 24 Oct 2014, 06:10.