
  • Troubles with ttest and bysort-option

    Hello Statalist,

    I have the following data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(Sec Unit mean_sec) float(mean_end mean_start diff)
    12 2215 2 3 4 -1
    13 2215 2 3 4 -1
    14 2216 4 5 6 -1
    15 2216 4 5 6 -1
    16 2216 4 5 6 -1
    17 2217 3 4 5 -1
    18 2217 3 4 5 -1
    19 2218 3 6 7 -1
    20 2218 3 6 7 -1
    21 2218 5 6 7 -1
    22 2218 5 6 7 -1
    23 2218 5 6 7 -1
    24 2219 5 4 5 -1
    25 2219 4 4 5 -1
    end
    label values Unit v2_Num
    label def v2_Num 2215 "05-002", modify
    label def v2_Num 2216 "05-003", modify
    label def v2_Num 2217 "05-004", modify
    label def v2_Num 2218 "05-005", modify
    label def v2_Num 2219 "05-006", modify


    The dataset is much larger, but those are the relevant variables.
    It is based on seconds.
    'Unit' defines multiple units of varying length. mean_start is the mean for the first second of a unit, mean_end the mean for the last.
    'diff' is mean_end minus mean_start. I would like to run a dependent t-test to check whether this difference is significant, separately for each unit.

    What I did is:
    bysort Unit: ttest mean_end == mean_start

    It runs, but it leaves the field for the t-value empty and therefore also for the p-value.

    Does anybody know what went wrong?

    Thank you!

  • #2
    You have no variability: look at your data within "Unit". Without any variability there are no standard errors, and thus no test.
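
The point about missing variability can be checked directly. The following is a minimal sketch that re-enters only the first five rows of the example data from post #1 and computes the standard deviation of the paired difference within each unit; the variable names `d` and `sd_d` are introduced here for illustration only.

```stata
* First five rows of the example data from post #1 (two units)
clear
input double(Sec Unit) float(mean_end mean_start)
12 2215 3 4
13 2215 3 4
14 2216 5 6
15 2216 5 6
16 2216 5 6
end

* Paired difference, as in the original -diff- variable
generate d = mean_end - mean_start

* Standard deviation of the difference within each unit
egen sd_d = sd(d), by(Unit)

* Mean, standard deviation and count of d for each unit
tabulate Unit, summarize(d)
```

Because `d` equals -1 in every row, its within-unit standard deviation is 0. The paired t statistic divides the mean difference by its standard error, so with a standard error of zero it cannot be computed, which is why -ttest- leaves the t and p fields empty.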



    • #3
      Thank you for your reply. I have some trouble understanding it, though. Do you mean there is no variability in the mean? This is example data, as I can't post the original online, so it might look a bit strange.

      Edit: I also do have a variable that gives the standard deviation for the mean for each second if that helps.


      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double(Sec Unit mean_sec) float(mean_end mean_start diff)
      12 2215 2 3 4 -1
      13 2215 1 3 4 -1
      14 2216 4 5 6 -1
      15 2216 4 5 6 -1
      16 2216 6 5 6 -1
      17 2217 3 4 5 -1
      18 2217 4 4 5 -1
      19 2218 3 6 7 -1
      20 2218 4 6 7 -1
      21 2218 5 6 7 -1
      22 2218 5 6 7 -1
      23 2218 3 6 7 -1
      24 2219 4 4 5 -1
      25 2219 4 4 5 -1
      end
      label values Unit v2_Num
      label def v2_Num 2215 "05-002", modify
      label def v2_Num 2216 "05-003", modify
      label def v2_Num 2217 "05-004", modify
      label def v2_Num 2218 "05-005", modify
      label def v2_Num 2219 "05-006", modify
      This is more like the original set: the mean by second varies within the unit. mean_start and mean_end stay the same within a unit because they only contain the mean for the first/last second of the unit. I also have a dataset at the unit level, with only one observation per unit:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double(Sec Unit mean_sec) float(mean_end mean_start diff)
      12 2215 2 3 4 -1
      14 2216 4 5 6 -1
      17 2217 3 4 5 -1
      19 2218 3 6 7 -1
      24 2219 4 4 5 -1
      end
      label values Unit v2_Num
      label def v2_Num 2215 "05-002", modify
      label def v2_Num 2216 "05-003", modify
      label def v2_Num 2217 "05-004", modify
      label def v2_Num 2218 "05-005", modify
      label def v2_Num 2219 "05-006", modify
      But ttest doesn't work here either.
      Last edited by Kevin Wuensch; 12 Mar 2019, 06:20.



      • #4
        It also seems you have discrete data, and the range is quite short.

        Additionally, you have just a few observations per Unit (from 2 to 5), so I don't see the reason to perform this test.

        That being said, maybe a nonparametric test would be helpful:

        Code:
        bysort Unit: signrank mean_end=mean_start
        Best regards,

        Marcos



        • #5
          Kevin:
          as you do not have dispersion around the mean, -ttest- (just like any parametric inference on these data) is doomed to fail.
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Thank you for your suggestion, Marcos!

            I think I wasn't very clear about my problem earlier, so I will try to explain my data and what I'm trying to do in a bit more detail.

            My data set consists of approx. 6000 seconds, divided into units of different length (between 2 and 25 seconds).
            There is also one variable per participant (122 participants in total), which takes a value from 1 to 7 for each second. This is an evaluation measure.
            The variable 'mean_sec' indicates for each second the mean of all 122 participants.
            Within each unit the participants were exposed to certain stimuli, which caused them to change their evaluation. To measure the change in evaluation, the difference between the mean in the first and the mean in the last second of the unit was calculated and stored in the variable 'diff'.

            Now I would like to be able to say for each unit whether this difference is significant.
            For example, a ttest mean_end==mean_start does not help me, because then I only get the significance of all units together.

            The problem is, as mentioned earlier, that the two relevant values - mean_end and mean_start - do not vary within the unit.
            I wonder whether a Wilcoxon signed-rank test is reliable under these circumstances?

            Thanks again for your input!


            Edit: Thank you, Carlo! Do you have a suggestion for how to test the significance of the differences another way? Should I drop the means and go back to the participant level?
            Last edited by Kevin Wuensch; 12 Mar 2019, 09:09.



            • #7
              Kevin:
              I think the best approach is to go back to the original dataset (i.e., participant level) and think about inference on that basis.
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                Thank you, I will.
                The dataset on the participant level is a huge set with n=100 participants and 6000 obs (seconds). Plus there is a weight. What I know I can do in SPSS is run a paired t-test using the first and the last second of a unit as a pair, all with weights on (I need to weight the data; the means in the other dataset described before were also weighted).
                The trouble is that there are loooots of units and this is only one of many datasets.
                But since Stata doesn't allow weights for ttest, I guess this is my only option.
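
Building the first-second/last-second pairs described above does not have to be done unit by unit. The following is a minimal sketch on simulated data, assuming a long layout with one row per participant and second; the names `id`, `rating`, `wt`, `rating_start`, `rating_end` and `d`, as well as the unit boundaries, are hypothetical and not taken from the original dataset.

```stata
* Hypothetical long data: 5 participants, 6 seconds each, two units
clear
set seed 2019
set obs 5
generate id = _n
generate wt = runiform() + 0.5               // hypothetical participant weight
expand 6                                      // six seconds per participant
bysort id: generate Sec = 11 + _n             // seconds 12 to 17
generate Unit = cond(Sec <= 13, 2215, 2216)   // hypothetical unit boundaries
generate rating = ceil(7 * runiform())        // evaluation on a 1-7 scale

* Rating in the first and in the last second of each unit, per participant
bysort id Unit (Sec): generate rating_start = rating[1]
bysort id Unit (Sec): generate rating_end   = rating[_N]
generate d = rating_end - rating_start

* Keep one row per participant and unit: this is the paired layout
bysort id Unit (Sec): keep if _n == 1
list id Unit rating_start rating_end d wt in 1/4
```

Since the `bysort` prefix handles all participants and units in one pass, the number of units does not add manual work. The same lines could be wrapped in a loop over files if several datasets share this layout.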



                • #9
                  anything you can do with a t-test can be replicated with regression - and regression does allow weights
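
As an illustration of this point: a paired t-test is a one-sample test on the paired difference, and a regression of that difference on a constant alone gives the same t statistic for `_cons`. The sketch below uses simulated participant-level data with one row per participant and unit; the names `id`, `wt`, `rating_start`, `rating_end` and `d` are hypothetical, and the choice of `pweight` is an assumption, since the appropriate weight type depends on what the weight represents.

```stata
* Hypothetical paired data: 40 participants, 3 units each
clear
set seed 12345
set obs 40
generate id = _n
generate wt = runiform() + 0.5               // hypothetical weight
expand 3
bysort id: generate Unit = 2214 + _n          // units 2215, 2216, 2217
generate rating_start = ceil(7 * runiform())
generate rating_end   = ceil(7 * runiform())
generate d = rating_end - rating_start

* Unweighted check for one unit: the t statistic on _cons in the
* constant-only regression equals the t statistic of the paired test
ttest rating_end == rating_start if Unit == 2215
regress d if Unit == 2215

* Weighted version, run separately for each unit
bysort Unit: regress d [pweight = wt]
```

With `pweight`, -regress- reports robust standard errors, so the results need not match the weighted paired test in SPSS exactly; `aweight` or `fweight` may be closer, depending on how the weights were constructed.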

