  • How to compare two quantitative variables using survey data

    Hello everyone,

    This is my first post; I hope it will be clear.

    I am working in Stata 12 with survey data, using the svy commands. I have two quantitative variables whose means I would like to compare, as I would with a t test such as ttest var1==var2. I have read that there is no svy: ttest command in Stata. How can I compare these two variables? I am not looking to compare the mean of one variable across the levels of a qualitative variable, but the means of two different quantitative variables.

    Thanks in advance,
    Camille

  • #2
    Code:
    gen delta = var1 - var2
    svy: mean delta



    • #3
      Hello Camille,

      Welcome to the Stata Forum.

      Clyde provided an insightful solution.

      Unfortunately, you didn't provide an abridged version of your data or at least a short example to work with.

      Unless I am mistaken, you could also perform the t test you want this way:

      Code:
      . webuse nhanes2f
      
      .  svyset psuid [pweight=finalwgt], strata(stratid)
      
            pweight: finalwgt
                VCE: linearized
        Single unit: missing
           Strata 1: stratid
               SU 1: psuid
              FPC 1: <zero>
      
      . * Just to exemplify, let's compare zinc versus corpuscl (both continuous variables)
      
      . * First: add the variable "sex" as well, just to provide a "classical" comparison (t test by group)
      
      . svy: regress zinc corpuscl sex
      (running regress on estimation sample)
      
      Survey: Linear regression
      
      Number of strata   =        31                Number of obs     =        9,101
      Number of PSUs     =        62                Population size   =  103,158,974
                                                    Design df         =           31
                                                    F(   2,     30)   =       211.30
                                                    Prob > F          =       0.0000
                                                    R-squared         =       0.0559
      
      ------------------------------------------------------------------------------
                   |             Linearized
              zinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
          corpuscl |  -.0970515   .0368623    -2.63   0.013    -.1722327   -.0218703
               sex |  -6.841737   .3712049   -18.43   0.000    -7.598814   -6.084659
             _cons |   106.3211   3.056874    34.78   0.000     100.0865    112.5556
      ------------------------------------------------------------------------------
      
      . * We get the t statistic. We can check this using "sex" as an example: the square root of the Wald F should equal |t|
      
      . test sex
      
      Adjusted Wald test
      
       ( 1)  sex = 0
      
             F(  1,    31) =  339.71
                  Prob > F =    0.0000
      
      . display sqrt(339.71)
      18.431224
      
      . * Second: now I'll use only the two continuous variables, as you wished
      
      . svy: regress zinc corpuscl
      (running regress on estimation sample)
      
      Survey: Linear regression
      
      Number of strata   =        31                Number of obs     =        9,101
      Number of PSUs     =        62                Population size   =  103,158,974
                                                    Design df         =           31
                                                    F(   1,     31)   =        12.48
                                                    Prob > F          =       0.0013
                                                    R-squared         =       0.0024
      
      ------------------------------------------------------------------------------
                   |             Linearized
              zinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
          corpuscl |  -.1308632   .0370503    -3.53   0.001    -.2064277   -.0552987
             _cons |   98.97223   3.242868    30.52   0.000     92.35835    105.5861
      ------------------------------------------------------------------------------
      
      . disp sqrt(12.48)
      3.5327043
      Hopefully that helps.
      Last edited by Marcos Almeida; 18 Jan 2017, 14:55.
      Best regards,

      Marcos



      • #4
        Hello Marcos,
        Thank you for your help. However, I do have two questions:

        1) The continuous variables I want to compare are risk-perception scores for several diseases, such as score_diseaseX and score_diseaseY. If I run
        svy: regress score_diseaseX score_diseaseY
        and get Prob > F = 0.0000, does that mean the two scores are significantly different?


        2) If I follow Clyde's advice and run
        gen delta = var1 - var2
        svy: mean delta
        the output doesn't include a p-value, so how can I tell whether the two means are statistically different?
        Thanks in advance



        • #5
          Hello Camille,

          Please provide some output, as requested in the FAQ.

          For example, you could run svy: mean on both variables, as well as on their difference ("delta", as provided by Clyde).
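          As a rough sketch of the kind of output meant here (the variable names are taken from #4 and are assumed to exist in your data, which must already be svyset):
          Code:
          * assumed names from #4; replace with your own variables
          generate delta = score_diseaseX - score_diseaseY
          * with several variables, mean drops incomplete cases, so all three means use the same observations
          svy: mean score_diseaseX score_diseaseY delta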
          Best regards,

          Marcos



          • #6
            2) If I follow Clyde's advice and run
            gen delta = var1 - var2
            svy: mean delta
            the output doesn't include a p-value, so how can I tell whether the two means are statistically different?
            Well, it does return an r(table) matrix that contains the p-value. So immediately after the code in #2:
            Code:
            matrix M = r(table)
            * r(table) has one column per estimate; its rows are b, se, t, pvalue, ...
            local pvalue = M[4, 1]
            display "p = `pvalue'"
            That said, the output that -mean- displays by default includes a 95% confidence interval which, for most purposes, is more useful than a p-value.
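            For what it is worth, a p-value can also be obtained with standard postestimation commands. A minimal sketch, assuming the nhanes2f example data from #3 (zinc and corpuscl merely stand in for var1 and var2):
            Code:
            webuse nhanes2f, clear
            svyset psuid [pweight=finalwgt], strata(stratid)
            generate delta = zinc - corpuscl
            svy: mean delta
            * Wald test of H0: population mean of delta = 0
            test delta
            * alternative route that does not require generating delta
            svy: mean zinc corpuscl
            lincom zinc - corpuscl
            Both routes should test the same hypothesis; lincom also reports a confidence interval for the difference.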



            • #7
              I know it is obvious, but I cannot resist saying it: with a very large dataset (as is usually the case with survey data), t tests will virtually always yield a "significant" p-value, all the more so here, where two different continuous variables are being compared. I fail to see what purpose, theoretical or practical, such a p-value serves. Full stop.
              Best regards,

              Marcos
