
  • Gauß- / Z-Test

    Hello,

    Could someone please tell me which command performs a mean comparison of two samples using the Gauß test / z-test instead of a t-test?

    Thank you!

  • #2
    A z-test is more of a pedagogical tool, a stepping stone to the t-test. I cannot think of a single real application where it is appropriate. It would be interesting to find such an application; it would help with teaching. So, could you tell us a bit more about your problem?
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      I agree with Maarten. In essence, if the sample size is large enough, the z test will in practice give essentially the same result as the t test. If that's not true, then you shouldn't use a z test. I think these are good reasons why it is not separately available.
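      To see this numerically, here is a quick sketch (using the auto data purely for illustration) that compares the p-value reported by -ttest- with the p-value obtained by referring the same statistic to a standard normal, i.e. the z test:

      Code:
      * sketch: with n = 74, the t-based and z-based p-values nearly coincide
      sysuse auto, clear
      ttest price, by(foreign)
      * two-sided p-value as reported by -ttest- (t reference distribution)
      di as txt "t-test p = " as result r(p)
      * z-test p-value: same statistic, standard normal reference distribution
      di as txt "z-test p = " as result 2*normal(-abs(r(t)))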



      • #4
        Ah, OK, so it is not available in Stata. Well, I was just looking for a test that could back up my results from the t-test, and I thought the z-test could be an option.

        As an alternative I would run the Wilcoxon-Mann-Whitney test, which should deliver significant results in all cases where the t-tests did so.



        • #5
          As before, the z test cannot in any sense "back up" results from t tests. If the results are the same, the z test is not giving independent evidence as it is essentially the same test. If the results differ, your sample size is too small for the z test to apply.

          WMW answers a different question. If t tests don't give the result you expect, it's best to wonder why, not just to move on to another test.



          • #6
            The reason why I want to back up the t tests' results is that my data do not follow a normal distribution, which in theory violates one of the preconditions of the t test. The WMW test does not require a normal distribution and can tell me whether my samples differ significantly in their medians. So it should at least point in the same direction as my t tests. Am I wrong about this?
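            For reference, the Wilcoxon-Mann-Whitney test is available in Stata as -ranksum-. A minimal sketch, again using the auto data only as a stand-in for your own grouping variable:

            Code:
            * sketch: rank-sum (Wilcoxon-Mann-Whitney) test, two independent samples
            sysuse auto, clear
            ranksum price, by(foreign)
            * for paired samples the analogue would be -signrank-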



            • #7
              A t-test can easily be expressed in a simple linear regression framework, in which a normal distribution of the variables is not assumed. The assumption is that the error term in this model is normal, but even this assumption can be relaxed with a reasonably large sample size (some textbooks state N > 50).


              Edit:

              What might be more relevant than a normal distribution is the assumption of equal variances in the groups, which is called homoscedasticity in the regression framework. The ttest command has an unequal option to account for violation of this assumption. With regress you would specify the vce(robust) option.
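              A minimal sketch of the two equivalent approaches, with the auto data standing in for your own variables:

              Code:
              sysuse auto, clear
              * t test allowing unequal group variances (Satterthwaite d.f.)
              ttest price, by(foreign) unequal
              * the regression analogue, with heteroskedasticity-robust SEs
              regress price foreign, vce(robust)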

              Best
              Daniel
              Last edited by daniel klein; 06 Feb 2015, 07:11.



              • #8
                WMW is not really a test of different medians. Textbooks differ on what wording is suitable for the masses.

                But that aside, your objective is not clear here.

                If you want to compare means, then OK, and there are lots of ways to do it. Details can alter conclusions, but here is one counter-example to underline that what is assumed (or better, postulated) about the marginal distributions need not be that crucial. Note that 74 is not an especially large sample, but for this purpose it is large enough that "being a small sample" does not bite. We get P-values around 0.68 regardless of entertaining rather different models for the data. Of course, you have to try it for your case. If the reason for non-normality is a massive outlier, results will be sensitive to assumptions.

                Code:
                 
                . sysuse auto, clear 
                (1978 Automobile Data)
                
                . ttest price, by(foreign)
                
                Two-sample t test with equal variances
                ------------------------------------------------------------------------------
                   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
                ---------+--------------------------------------------------------------------
                Domestic |      52    6072.423    429.4911    3097.104    5210.184    6934.662
                 Foreign |      22    6384.682    558.9942    2621.915     5222.19    7547.174
                ---------+--------------------------------------------------------------------
                combined |      74    6165.257    342.8719    2949.496    5481.914      6848.6
                ---------+--------------------------------------------------------------------
                    diff |           -312.2587    754.4488               -1816.225    1191.708
                ------------------------------------------------------------------------------
                    diff = mean(Domestic) - mean(Foreign)                         t =  -0.4139
                Ho: diff = 0                                     degrees of freedom =       72
                
                    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
                 Pr(T < t) = 0.3401         Pr(|T| > |t|) = 0.6802          Pr(T > t) = 0.6599
                
                . glm price foreign
                
                Iteration 0:   log likelihood = -695.62494  
                
                Generalized linear models                          No. of obs      =        74
                Optimization     : ML                              Residual df     =        72
                                                                   Scale parameter =   8799417
                Deviance         =  633558013.5                    (1/df) Deviance =   8799417
                Pearson          =  633558013.5                    (1/df) Pearson  =   8799417
                
                Variance function: V(u) = 1                        [Gaussian]
                Link function    : g(u) = u                        [Identity]
                
                                                                   AIC             =  18.85473
                Log likelihood   = -695.6249418                    BIC             =  6.34e+08
                
                ------------------------------------------------------------------------------
                             |                 OIM
                       price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                     foreign |   312.2587   754.4488     0.41   0.679    -1166.434    1790.951
                       _cons |   6072.423    411.363    14.76   0.000     5266.166     6878.68
                ------------------------------------------------------------------------------
                
                . glm price foreign, link(log)
                
                Iteration 0:   log likelihood = -699.23223  
                Iteration 1:   log likelihood = -695.81557  
                Iteration 2:   log likelihood = -695.62496  
                Iteration 3:   log likelihood = -695.62494  
                
                Generalized linear models                          No. of obs      =        74
                Optimization     : ML                              Residual df     =        72
                                                                   Scale parameter =   8799417
                Deviance         =  633558013.5                    (1/df) Deviance =   8799417
                Pearson          =  633558013.5                    (1/df) Pearson  =   8799417
                
                Variance function: V(u) = 1                        [Gaussian]
                Link function    : g(u) = ln(u)                    [Log]
                
                                                                   AIC             =  18.85473
                Log likelihood   = -695.6249418                    BIC             =  6.34e+08
                
                ------------------------------------------------------------------------------
                             |                 OIM
                       price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                     foreign |   .0501438   .1200041     0.42   0.676    -.1850599    .2853475
                       _cons |   8.711513   .0677428   128.60   0.000      8.57874    8.844287
                ------------------------------------------------------------------------------
                
                . glm price foreign, f(gamma)
                
                Iteration 0:   log likelihood = -719.86823  
                Iteration 1:   log likelihood = -719.75548  
                Iteration 2:   log likelihood = -719.75513  
                
                Generalized linear models                          No. of obs      =        74
                Optimization     : ML                              Residual df     =        72
                                                                   Scale parameter =  .2334392
                Deviance         =  12.69664531                    (1/df) Deviance =  .1763423
                Pearson          =  16.80762473                    (1/df) Pearson  =  .2334392
                
                Variance function: V(u) = u^2                      [Gamma]
                Link function    : g(u) = 1/u                      [Reciprocal]
                
                                                                   AIC             =   19.5069
                Log likelihood   = -719.7551282                    BIC             =  -297.196
                
                ------------------------------------------------------------------------------
                             |                 OIM
                       price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                     foreign |  -8.05e-06   .0000195    -0.41   0.679    -.0000462    .0000301
                       _cons |   .0001647    .000011    14.97   0.000     .0001431    .0001862
                ------------------------------------------------------------------------------
                
                . glm price foreign, f(gamma) link(log)
                
                Iteration 0:   log likelihood = -719.92833  
                Iteration 1:   log likelihood = -719.75532  
                Iteration 2:   log likelihood = -719.75513  
                Iteration 3:   log likelihood = -719.75513  
                
                Generalized linear models                          No. of obs      =        74
                Optimization     : ML                              Residual df     =        72
                                                                   Scale parameter =   .233444
                Deviance         =   12.6966453                    (1/df) Deviance =  .1763423
                Pearson          =  16.80796837                    (1/df) Pearson  =   .233444
                
                Variance function: V(u) = u^2                      [Gamma]
                Link function    : g(u) = ln(u)                    [Log]
                
                                                                   AIC             =   19.5069
                Log likelihood   = -719.7551282                    BIC             =  -297.196
                
                ------------------------------------------------------------------------------
                             |                 OIM
                       price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                     foreign |   .0501439    .122884     0.41   0.683    -.1907043    .2909922
                       _cons |   8.711513   .0670025   130.02   0.000     8.580191    8.842835
                ------------------------------------------------------------------------------



                • #9
                  Christopher:
                  as an aside to the previous excellent replies, you may want to compare the results of -ttest- to the ones obtained via a bootstrapped t test. This procedure is covered under Example 3 of the -bootstrap- entry in the Stata 13.1 PDF manual.
                  Kind regards,
                  Carlo
                  (Stata 19.0)



                  • #10
                    Dear Mr. Lazzaro,
                    I am having the same problem as Christopher. I am currently reading the example you mentioned, but again it tests one variable per group (only two groups allowed). I was wondering how I could apply this bootstrap method to a t test of two variables?



                    • #11
                      That is possible; below is an example. As an estimator for the p-value I like to use (#(t > t_obs) + 1)/(B + 1) rather than #(t > t_obs)/B, where #(t > t_obs) is the number of replications in which t is larger than t_obs (or more extreme, if we think of t and t_obs as absolute values), and B is the number of replications. See Chapter 4 of A.C. Davison and D.V. Hinkley (1997), Bootstrap Methods and their Application, Cambridge: Cambridge University Press. This is a bit pedantic, as you typically need a large B to get a reliable estimate, and at large B the difference between the two quickly becomes very small. In addition, this is a bootstrap test, so there is some randomness in the estimate: if we were to run this example again (without setting the seed) we would get a (slightly) different estimate of the p-value. That uncertainty can be quantified by a Monte Carlo confidence interval; if that interval is too large for your taste, you need to increase the number of replications.

                      Code:
                      clear all
                      webuse fuel
                      ttest mpg1 = mpg2
                      tempname tobs m1 m2 m
                      scalar `tobs' = r(t)
                      scalar `m1' = r(mu_1)
                      scalar `m2' = r(mu_2)
                      scalar `m' = (`m1' + `m2')/2
                      
                      * impose the null hypothesis: shift both variables to a common mean
                      replace mpg1 = mpg1 - `m1' + `m'
                      replace mpg2 = mpg2 - `m2' + `m'
                      
                      * bootstrap the t statistic under the null
                      tempfile bs
                      bootstrap t = r(t) , reps(20000) saving(`bs') : ttest mpg1 = mpg2
                      
                      use `bs', clear
                      
                      * p-value estimate (#(t >= t_obs) + 1)/(B + 1), plus a Monte Carlo CI
                      qui count if abs(t) >= abs(`tobs')
                      local a = r(N) + 1
                      local b = _N + 1 - r(N)
                      local alph = (100 - c(level))/200
                      local lb = invibeta(`a', `b', `alph')
                      local ub = invibetatail(`a', `b', `alph')
                      di as txt "achieved significance level: " as result (r(N)+1)/(_N+1)
                      di as txt "MC CI                      : [" as result `lb' as txt ", " as result `ub' as txt "]"
                      ---------------------------------
                      Maarten L. Buis
                      University of Konstanz
                      Department of history and sociology
                      box 40
                      78457 Konstanz
                      Germany
                      http://www.maartenbuis.nl
                      ---------------------------------

