Troubles with ttest and bysort-option

Kevin Wuensch

Join Date: Mar 2019

Posts: 4
#1

12 Mar 2019, 05:59

Hello Statalist,

I have the following data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double(Sec Unit mean_sec) float(mean_end mean_start diff)
12 2215 2 3 4 -1
13 2215 2 3 4 -1
14 2216 4 5 6 -1
15 2216 4 5 6 -1
16 2216 4 5 6 -1
17 2217 3 4 5 -1
18 2217 3 4 5 -1
19 2218 3 6 7 -1
20 2218 3 6 7 -1
21 2218 5 6 7 -1
22 2218 5 6 7 -1
23 2218 5 6 7 -1
24 2219 5 4 5 -1
25 2219 4 4 5 -1
end
label values Unit v2_Num
label def v2_Num 2215 "05-002", modify
label def v2_Num 2216 "05-003", modify
label def v2_Num 2217 "05-004", modify
label def v2_Num 2218 "05-005", modify
label def v2_Num 2219 "05-006", modify

The dataset is much larger, but those are the relevant variables.
It is based on seconds.
'Unit' defines multiple units of various lengths. 'mean_start' is the mean for the first second of a unit, 'mean_end' the mean for the last.
'diff' is mean_end minus mean_start. I would like to run a dependent (paired) t-test to check whether this difference is significant, separately for each unit.

What I did is:
bysort Unit: ttest mean_end == mean_start

It runs, but it leaves the field for the t-value empty and therefore also for the p-value.

Does anybody know what went wrong?

Thank you!
Rich Goldstein

Join Date: Mar 2014

Posts: 4494
#2

12 Mar 2019, 06:03

You have no variability: look at your data within "Unit". Without any variability there are no standard errors, and thus no test.
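
One way to see this in the example data is to summarize the paired difference within each Unit. This is only a sketch, assuming the -dataex- example from #1 is loaded; the variable name `d` is introduced here for illustration.

```stata
* Paired difference; d is a hypothetical name used only for this check
generate double d = mean_end - mean_start

* Within every Unit the difference is the same value, so its SD is 0
bysort Unit: summarize d

* Same information in one compact table
tabstat d, by(Unit) statistics(n mean sd)
```

The paired t statistic is mean(d) / (sd(d) / sqrt(n)). With sd(d) equal to 0 in every Unit, the denominator is 0, so -ttest- has nothing to report for t or p.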

Kevin Wuensch

Join Date: Mar 2019
Posts: 4

12 Mar 2019, 06:09

Thank you for your reply. I have some trouble understanding it, though. Do you mean there is no variability in the mean? This is example data, as I can't post the original online, so it might look a bit strange.

Edit: I also do have a variable that gives the standard deviation for the mean for each second if that helps.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double(Sec Unit mean_sec) float(mean_end mean_start diff)
12 2215 2 3 4 -1
13 2215 1 3 4 -1
14 2216 4 5 6 -1
15 2216 4 5 6 -1
16 2216 6 5 6 -1
17 2217 3 4 5 -1
18 2217 4 4 5 -1
19 2218 3 6 7 -1
20 2218 4 6 7 -1
21 2218 5 6 7 -1
22 2218 5 6 7 -1
23 2218 3 6 7 -1
24 2219 4 4 5 -1
25 2219 4 4 5 -1
end
label values Unit v2_Num
label def v2_Num 2215 "05-002", modify
label def v2_Num 2216 "05-003", modify
label def v2_Num 2217 "05-004", modify
label def v2_Num 2218 "05-005", modify
label def v2_Num 2219 "05-006", modify

This is more like the original set: the mean by second varies within the unit. mean_start and mean_end are constant within a unit because they only contain the mean for the first/last second of the unit. I also have a dataset on a unit basis, with only one observation per unit:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double(Sec Unit mean_sec) float(mean_end mean_start diff)
12 2215 2 3 4 -1
14 2216 4 5 6 -1
17 2217 3 4 5 -1
19 2218 3 6 7 -1
24 2219 4 4 5 -1
end
label values Unit v2_Num
label def v2_Num 2215 "05-002", modify
label def v2_Num 2216 "05-003", modify
label def v2_Num 2217 "05-004", modify
label def v2_Num 2218 "05-005", modify
label def v2_Num 2219 "05-006", modify

But ttest doesn't work here either.

Last edited by Kevin Wuensch; 12 Mar 2019, 06:20.


Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#4

12 Mar 2019, 06:53

It also seems you have discrete data, and the range is quite narrow.

Additionally, you have just a few observations per Unit (from 2 to 5 in your example), so I don't see the reason for performing this test.

That being said, maybe a nonparametric test would be helpful:

Code:

bysort Unit: signrank mean_end=mean_start

Best regards,

Marcos
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17742
#5

12 Mar 2019, 08:53

Kevin:
as you do not have dispersion around the mean, -ttest- (just like any parametric inference on these data) is doomed to fail.

Kind regards,
Carlo
(Stata 19.0)
Kevin Wuensch

Join Date: Mar 2019

Posts: 4
#6

12 Mar 2019, 09:06

Thank you for your suggestion, Marcos!

I think I wasn't very clear about my problem earlier, so I will try to explain my data and what I'm trying to do in a bit more detail.

My data set consists of approx. 6000 seconds, divided into units of different length (between 2 and 25 seconds).
There are 122 participants, and for each participant there is a variable that takes a value from 1 to 7 for each second. This is an evaluation measure.
The variable 'mean_sec' indicates for each second the mean of all 122 participants.
Within each unit the participants were exposed to certain stimuli, which caused them to change their evaluation. To measure the change in evaluation, the difference between the mean in the first and the mean in the last second of the unit was calculated and stored in the variable 'diff'.
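
For reference, the construction described above could be sketched as follows. It assumes one row per second, with `Unit`, `Sec` and `mean_sec` as in the example data; `m_first`, `m_last` and `d_check` are hypothetical names chosen so they do not clash with the existing variables.

```stata
* Sort by second within unit, then pick the first and last per-second mean
bysort Unit (Sec): generate double m_first = mean_sec[1]
bysort Unit (Sec): generate double m_last  = mean_sec[_N]

* One difference per unit, repeated on every row of that unit
generate double d_check = m_last - m_first
```

Because each unit contributes a single first value and a single last value, the difference is one number per unit and is simply copied onto every row of that unit.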

Now I would like to be able to say for each unit whether this difference is significant.
For example, a ttest mean_end==mean_start does not help me, because then I only get the significance of all units together.

The problem is, as mentioned earlier, that the two relevant values - mean_end and mean_start - do not vary within the unit.
I wonder whether a Wilcoxon signed-rank test is reliable under these circumstances?

Thanks again for your input!

Edit: Thank you, Carlo! Do you have a suggestion how to calculate significances of the differences another way? Should I leave the mean and go back to the participant level?

Last edited by Kevin Wuensch; 12 Mar 2019, 09:09.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17742
#7

12 Mar 2019, 09:17

Kevin:
I think the best approach is going back to the original dataset (ie, participant level) and think about inference on that basis.

Kind regards,
Carlo
(Stata 19.0)
Kevin Wuensch

Join Date: Mar 2019

Posts: 4
#8

12 Mar 2019, 11:51

Thank you, I will.
The dataset on the participant level is a huge set with n=100 participants and 6000 obs (Seconds). Plus there is a weight. What I know I can do in SPSS is run a paired t-test using the first and the last Second of a Unit as a pair, all with weights on (I need to weight the data; the means in the other dataset described before were also weighted).
The trouble is that there are loooots of units and this is only one of many datasets.
But since Stata doesn't allow weights for ttest, I guess this is my only option.
Rich Goldstein

Join Date: Mar 2014

Posts: 4494
#9

12 Mar 2019, 12:23

anything you can do with a t-test can be replicated with regression - and regression does allow weights
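
A minimal sketch of that idea, under assumptions about the participant-level data that are not shown in this thread: long format with one row per participant per second, and hypothetical variable names `id` (participant), `rating` (the 1-7 evaluation) and `w` (the weight), next to `Unit` and `Sec`. The choice of `aweight` is also an assumption; which weight type matches the SPSS weighting depends on what the weights represent.

```stata
* Per participant and unit: last-second rating minus first-second rating
bysort id Unit (Sec): generate double d = rating[_N] - rating[1]

* Keep one row per participant per unit
bysort id Unit (Sec): keep if _n == 1

* Constant-only regression per unit: _cons is the weighted mean difference,
* and the test of _cons = 0 plays the role of the paired t-test
bysort Unit: regress d [aweight = w]

* With many units, collect the results in a dataset instead of reading output
* (note: clear replaces the data in memory)
statsby b=_b[_cons] se=_se[_cons] df=e(df_r), by(Unit) clear: regress d [aweight = w]
generate double t = b / se
generate double p = 2 * ttail(df, abs(t))
```

Without weights, `regress d` gives the same t and p as `ttest d == 0`, which in turn equals the paired test of the last against the first second. A unit in which every participant has the same `d` will still show a missing t, for the same reason as before.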