How to compare two quantitative variables using survey data

Camille Fritzell

Join Date: Jan 2017

Posts: 4
#1

How to compare two quantitative variables using survey data

18 Jan 2017, 13:11

Hello everyone,

This is my first post; I hope it will be clear.

I am working in Stata 12 with survey data, using the svy commands. I have two quantitative variables that I would like to compare, as I would with a t test such as ttest var1 == var2. I have read that there is no svy: ttest command in Stata. How can I compare these two variables? I am not looking to compare one mean across the levels of a qualitative variable, but two means from two different quantitative variables.

Thanks in advance,
Camille
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

18 Jan 2017, 14:09

Code:

gen delta = var1 - var2
svy: mean delta
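
A minimal sketch of the two commands above on a public dataset, for readers who want to try them. The dataset (nhanes2f) and the variables zinc and corpuscl are stand-ins chosen for illustration only; they are not the original poster's data.

```stata
* Illustration only: zinc and corpuscl stand in for var1 and var2
webuse nhanes2f, clear
svyset psuid [pweight=finalwgt], strata(stratid)

* Row-wise difference; delta is missing whenever either variable is missing
generate delta = zinc - corpuscl

* Design-based mean of the difference, with its 95% confidence interval
svy: mean delta
```

If the reported 95% confidence interval for delta excludes zero, the two means differ at the 5% level.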
Comment

Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#3

18 Jan 2017, 14:51

Hello Camille,

Welcome to the Stata Forum.

Clyde provided an insightful solution.

Unfortunately, you didn't provide an abridged version of your data or at least a short example to work with.

If I am not mistaken, you could also perform the t test you want this way:

Code:

. webuse nhanes2f

.  svyset psuid [pweight=finalwgt], strata(stratid)

      pweight: finalwgt
          VCE: linearized
  Single unit: missing
     Strata 1: stratid
         SU 1: psuid
        FPC 1: <zero>

. * Just to illustrate, let's compare zinc versus corpuscl (both continuous variables)

. * First: we add the variable "sex" just to provide a "classical" comparison (t test "by" group) as well.

. svy: regress zinc corpuscl sex
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =        31                Number of obs     =        9,101
Number of PSUs     =        62                Population size   =  103,158,974
                                              Design df         =           31
                                              F(   2,     30)   =       211.30
                                              Prob > F          =       0.0000
                                              R-squared         =       0.0559

------------------------------------------------------------------------------
             |             Linearized
        zinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    corpuscl |  -.0970515   .0368623    -2.63   0.013    -.1722327   -.0218703
         sex |  -6.841737   .3712049   -18.43   0.000    -7.598814   -6.084659
       _cons |   106.3211   3.056874    34.78   0.000     100.0865    112.5556
------------------------------------------------------------------------------

. * We get the t score. We can check how it works by using "sex" as an example

. test sex

Adjusted Wald test

 ( 1)  sex = 0

       F(  1,    31) =  339.71
            Prob > F =    0.0000

. display sqrt(339.71)
18.431224

. * Second: now I'll use only two continuous variables, as you wished

. svy: regress zinc corpuscl
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =        31                Number of obs     =        9,101
Number of PSUs     =        62                Population size   =  103,158,974
                                              Design df         =           31
                                              F(   1,     31)   =        12.48
                                              Prob > F          =       0.0013
                                              R-squared         =       0.0024

------------------------------------------------------------------------------
             |             Linearized
        zinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    corpuscl |  -.1308632   .0370503    -3.53   0.001    -.2064277   -.0552987
       _cons |   98.97223   3.242868    30.52   0.000     92.35835    105.5861
------------------------------------------------------------------------------

. disp sqrt(12.48)
3.5327043
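
The square-root checks above rest on the identity F = t^2 for a Wald test of a single coefficient. A short sketch, assuming the same nhanes2f setup as in the log, that reads both quantities from stored results instead of retyping the F value:

```stata
* Same setup as in the log above
webuse nhanes2f, clear
svyset psuid [pweight=finalwgt], strata(stratid)
svy: regress zinc corpuscl

* Adjusted Wald test of the single coefficient; leaves r(F) behind
test corpuscl

* Square root of F, to be compared with |t| from the coefficient table
display "sqrt(F) = " sqrt(r(F))

* |t| computed directly as coefficient over standard error
display "|b/se|  = " abs(_b[corpuscl] / _se[corpuscl])
```

Note that this t statistic tests whether the slope of zinc on corpuscl is zero.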

Hopefully that helps.

Last edited by Marcos Almeida; 18 Jan 2017, 14:55.

Best regards,

Marcos

Comment

Camille Fritzell

Join Date: Jan 2017

Posts: 4
#4

20 Jan 2017, 06:38

Hello Marcos,
Thank you for your help. However, I have two questions:

1) The continuous variables I want to compare are risk-perception scores for several diseases, such as score_diseaseX and score_diseaseY. If I code:
svy: regress score_diseaseX score_diseaseY
and I get Prob > F = 0.0000, does that mean these two scores are significantly different?

2) If I follow Clyde's advice and code:
gen delta = var1 - var2
svy: mean delta
this command doesn't provide any p-value, so how can I know whether these two means are statistically different?
Thanks in advance
Comment
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#5

20 Jan 2017, 07:05

Hello Camille,

Please provide some output, as requested in the FAQ.

For example, you may run "svy: mean" on both variables, as well as on the difference ("delta", as provided by Clyde).
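
A sketch of what that output request could look like, again using nhanes2f with zinc and corpuscl as illustrative stand-ins for the two scores (these names are assumptions, not the poster's variables). The -lincom- line is an addition to show that the difference of the two estimated means agrees with the mean of delta when all three are estimated on the same sample.

```stata
* Illustration only: zinc and corpuscl stand in for the two scores
webuse nhanes2f, clear
svyset psuid [pweight=finalwgt], strata(stratid)
generate delta = zinc - corpuscl

* Means of both variables and of their difference, on one estimation sample
svy: mean zinc corpuscl delta

* Difference of the two estimated means, with standard error and 95% CI
lincom zinc - corpuscl
```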

Best regards,

Marcos
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#6

21 Jan 2017, 09:38

2) If I follow Clyde's advice and code:
gen delta = var1 - var2
svy: mean delta
this command doesn't provide any p-value, so how can I know whether these two means are statistically different?

Well, it does return an r(table) matrix that contains the p-value. So immediately after the code in #2:

Code:

matrix M = r(table)
* rows of r(table) are b, se, t, pvalue, ...; the p-value is in row 4, column 1
local pvalue = M[4, 1]
display "p = `pvalue'"

That said, the spontaneously displayed output of the -mean- command includes a 95% confidence interval which, for most purposes, is more useful than a p-value.
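
A hedged sketch of the same extraction that looks the p-value up by row name rather than by position, and cross-checks it with -test-. It assumes r(table) is left behind by -svy: mean- as described above; nhanes2f, zinc and corpuscl are illustrative stand-ins only.

```stata
* Illustration only: zinc and corpuscl stand in for var1 and var2
webuse nhanes2f, clear
svyset psuid [pweight=finalwgt], strata(stratid)
generate delta = zinc - corpuscl
svy: mean delta

* Copy r(table) right away, then inspect its row names
matrix M = r(table)
matrix list M

* Find the row labelled "pvalue" instead of hard-coding its position
local pvalue = M[rownumb(M, "pvalue"), 1]
display "p from r(table) = `pvalue'"

* Cross-check: adjusted Wald test of H0: mean(delta) = 0
test delta = 0
display "p from test     = " r(p)
```

Looking the row up by name avoids relying on the row order of r(table).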
Comment
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#7

21 Jan 2017, 12:35

I know it is obvious, but I cannot resist commenting: with a very large dataset (as is usually the case with survey data), and all the more so here, where two different continuous variables are being compared, a t test will virtually always yield a "significant" p-value, whose purpose (theoretically and practically speaking) I fail to see. Full stop.

Best regards,

Marcos
1 like
Comment
