  • different outputs for same ttest command

    Hello,
    I am currently using Stata as part of my research and am facing the following problem: when I run ttest var1 == var2, unpaired, I obtain slightly different results from the same do-file. In particular, the p-values on all tests vary. For instance, for the one-sided alternative var1 > var2, the largest p-value I got was about 10.8% and the lowest 9.5%. Note that these values only change when I run the entire do-file, not when I type the same command over and over.
    Should I worry about my code leading up to the ttest, or is it normal for Stata to give slightly different results? After all, I am thinking that computing a p-value could involve a simulation over a number of iterations, so that results could change slightly.
    Many thanks in advance!

  • #2
    This should not happen: ttest computes its p-values analytically from the t distribution, not by simulation, so the same command run on the same data always gives the same results. The do-file must therefore be doing something to the data - which I cannot see from this distance. Does it remove some observations or change the values of some variables? If that is the case, the problem is solved, but if not ... ???

    The ttest var1==var2, unpaired syntax is meant for the situation where the values of different variables in the same observation do not belong to the same unit of analysis. In general, this is not a recommended data structure.
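
    For concreteness, here is a minimal sketch of the two layouts (variable names are hypothetical, not from the poster's data):

    * wide layout: var1 and var2 in the same row belong to different units
    ttest var1 == var2, unpaired

    * long layout (generally preferred): one outcome variable plus a group indicator
    ttest outcome, by(group)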

    Most of us prefer to communicate with people who display their real names.

    • #3
      Please show more of your code and output. My first guess would be that there is some sort of sample selection going on, and that it is not consistent across repetitions. Use of the sort command can lead to situations like this, since the ordering of tied observations is not guaranteed to be the same every time.
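
      A hypothetical sketch of how that can happen: if the sort key has ties and later commands select observations by position, the selected sample can differ from run to run:

      * id does not uniquely identify observations, so ties are broken arbitrarily
      sort id
      * keeping the first observation per id then depends on that arbitrary order
      by id: keep if _n == 1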
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      Stata Version: 17.0 MP (2 processor)

      EMAIL: [email protected]
      WWW: https://www3.nd.edu/~rwilliam

      • #4
        The unpaired syntax does seem quite peculiar to me. The only time I have used it is to show, for pedagogical purposes, how results differ between a matched pairs t-test and an independent samples t-test. I am not sure why anybody would set up their data this way. Maybe data entry is slightly easier, because you don't have to create a separate variable for group membership. Even if that were so, you could probably use the reshape command to get a more conventional data structure. The original poster didn't ask about this, but I wonder why s/he is setting up the data this way and whether a mistake is being made.
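
        As a quick sketch of that reshape (assuming the wide variables are literally named var1 and var2, which is hypothetical):

        * give each row an identifier, then stack var1/var2 into one column
        gen long id = _n
        reshape long var, i(id) j(group)
        * group now takes the values 1 and 2; test the usual way
        ttest var, by(group)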
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        Stata Version: 17.0 MP (2 processor)

        EMAIL: [email protected]
        WWW: https://www3.nd.edu/~rwilliam

        • #5
          Dear all,
          First of all, my name is Louis Raffestin; I thought it was common practice to use nicknames, sorry about that.
          On the file itself: first, I take note of your comments on the unpaired t-test. This was really a preliminary test just to see what happens; I have not yet decided which exact test to use, though I was heading toward a t-test because I have a fairly large sample of variables that are nearly normal, so the mean should be approximately normally distributed.

          On the issue of changing p-values, thank you for your answers. I have since investigated and confirmed that something fishy in my code causes the data to differ slightly from one run to the next (I ran the code twice and merged the resulting datasets: 3 observations were not matched). I attach the code to this message.

          I also have the feeling that the group and sort commands may cause the problem, but I just can't put my finger on it.
          Thank you!
          Louis

          Attached Files

          • #6
            On your sort commands, I would add the -stable- option, e.g.,

            sort yoyo, stable

            Otherwise the data can get sorted differently each time when cases have the same value on the sorting variable or variables. There might be other issues but start with that.
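
            One quick diagnostic (a sketch; yoyo stands in for whatever your sort key is): isid exits with an error when the key does not uniquely identify the observations, which is exactly the situation in which the order of ties becomes arbitrary.

            * passes silently if yoyo uniquely identifies observations, errors otherwise
            isid yoyo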

            I can't really tell what your data set measures, but I would be surprised if you want the -unpaired- option.
            -------------------------------------------
            Richard Williams, Notre Dame Dept of Sociology
            Stata Version: 17.0 MP (2 processor)

            EMAIL: [email protected]
            WWW: https://www3.nd.edu/~rwilliam

            • #7
              -stable- does solve the problem, thanks! On the unpaired option: what I am trying to do is find out whether my variable of interest is significantly higher for a particular category. More precisely, I want to know whether the variable 'change_per_class' is higher when from=="BBBminus" than it is for other values of 'from'.
              I decided to test this by creating a variable 'key_in' that equals 'change_per_class' when from=="BBBminus", then replacing 'change_per_class' with missing when from=="BBBminus", and running the unpaired t-test on the two variables.
              I guess I could have just created a dummy that is 1 when from=="BBBminus" and run ttest change_per_class, by(dummy). I just tried it and the results are the same. Does this sound all right?
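
              A sketch of that simpler approach (the dummy's name is mine):

              * flag the BBBminus downgrades and compare the two groups directly
              gen byte bbbminus = (from == "BBBminus")
              ttest change_per_class, by(bbbminus)

              Note that by default ttest, by() assumes equal variances in the two groups; adding the unequal option relaxes that assumption.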

              • #8
                "Stable does solve the problem..." Well, does it actually solve the problem, or just mask it? Evidently you are doing a -sort- on some variables that do not uniquely identify the data, and then you select from the data in a way that depends on the sort order. The -stable- option assures that the ties will be broken the same way each time so your results are now replicable from run to run. But do you know that the particular order you have now locked into place is the "correct" one, or even that any order is "correct." I don't really understand the full context you are working in, so I don't have further suggestions. But in most contexts this kind of situation is the mark of a problematic analysis. Unless you are explicitly trying to do Monte Carlo simulations, code that produces indeterminate results unless forced not to is usually incorrect code.

                • #9
                  Clyde makes a very good point that I should have mentioned earlier. Bill Gould has commented on the problems with sort, stable in the past. Here is one post, but I think he has others:

                  http://www.stata.com/statalist/archi.../msg00582.html

                  Also see

                  http://www.stata-journal.com/sjpdf.h...iclenum=dm0019

                  I don't know why the program is doing all this sorting and dropping, but the procedure seems potentially problematic.
                  -------------------------------------------
                  Richard Williams, Notre Dame Dept of Sociology
                  Stata Version: 17.0 MP (2 processor)

                  EMAIL: [email protected]
                  WWW: https://www3.nd.edu/~rwilliam

                  • #10
                    I understand... So I have looked deeper into the code and finally found the command that caused the problem:

                    sort yoyo
                    by yoyo: replace killing_not_so_softly=1 if sum(missing(beta_before_group))>0

                    yoyo is a group variable that attaches a number to a given bond downgrade, which is what I study. I thought this command would assign a value of 1 to my dummy (killing_not_so_softly) for the entire yoyo group. Except it does not: sum() is a running sum, so the flag depends on where the missing value sits within the group. For instance, the data

                    yoyo   beta
                    1      1
                    1      .

                    yield killing_not_so_softly values of 0 and 1, which later leads me to drop the second observation but keep the first. Whenever Stata happened to sort differently and present the data as

                    yoyo   beta
                    1      .
                    1      1

                    killing_not_so_softly was 1 for both observations, and thus I dropped both. Hence the difference. I have changed this, and now the results are the same without the stable option.
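
                    The post does not show the actual fix, but one order-independent way to flag the whole group would be something along these lines (a sketch; the helper variable any_missing is hypothetical):

                    * flag every observation in a yoyo group containing any missing beta_before_group
                    bysort yoyo: egen byte any_missing = max(missing(beta_before_group))
                    replace killing_not_so_softly = 1 if any_missing
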
                    Thank you to both of you for the comments.
                    Louis

                    • #11
                      Good. I was careless in recommending the stable option; Clyde was correct that -stable- was masking the problem, not solving it. Glad he brought that up.
                      -------------------------------------------
                      Richard Williams, Notre Dame Dept of Sociology
                      Stata Version: 17.0 MP (2 processor)

                      EMAIL: [email protected]
                      WWW: https://www3.nd.edu/~rwilliam
