  • Alternatives to ttest using svy? Comparison of means between male and female respondents

    I am using Stata 14.1 with a dataset obtained from a survey. About 10 variables are 1-5 ratings (answers to questions such as "from 1 to 5, how much do you identify the following statement with Y party?"), about 5 are 0-10 ratings ("from 0 left to 10 right, where do you position Y party?"), and the rest are demographic variables (gender, age, self-positioning on the left-right scale...).
    The respondents were not representative in terms of proportions (too many males, too left-leaning...), but I had a good number of responses (about 6000), so I also created a weight variable that states how much weight each observation should have, based on known data on gender and left-right self-positioning.

    I want to perform a t test to find out whether there is a significant difference in means by gender and by party membership, and between male and female members of the party. I have come to understand that a t test cannot be performed with weights, but I am not aware of any alternatives.

    I have tried the following code:

    Code:
    svyset [iw=gndrlrweight]
    svy: ttest lrPP, by(gndr)
    svy: ttest liberalism if member==2, by(gndr)
    and

    Code:
    svyset [iw=gndrlrweight]
    svy: regttest lrPP, by(gndr)
    I have read of the possibility of using the following code, but I do not know if it is equivalent to a ttest or how to interpret it (how to find out significance).

    Code:
    svyset [iw=gndrlrweight]
    test [lrPP]Male = [lrPP]Female

    Any help? Thanks in advance!
    Last edited by Rocio Acebal; 07 Apr 2019, 14:46.

  • #2
    You can always do a regression. I'm not sure I completely understand the above, but the following should give you a t-test, via regression, to match your first ttest statement:
    Code:
    svy: regress lrPP i.gndr
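    A minimal sketch of how the regression doubles as the weighted two-group test, assuming the data have already been svyset and gndr has exactly two levels. testparm is used so that the example does not depend on how gndr is coded:
    Code:
    * assumes -svyset- has been run and gndr takes two values
    svy: regress lrPP i.gndr
    * adjusted Wald test that the gender coefficient is zero
    testparm i.gndr
    With a single coefficient, the F statistic from testparm should equal the square of the t statistic in the regression table, with the same p-value.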



    • #3
      Rich already pointed to the regression framework.

      I am not an expert on survey data methodology, but as far as I understand it, svy is supposed to account for the sampling design; it might not necessarily apply to the kind of post-stratification weighting that you propose. Moreover, since you seem to include all variables that were used to construct the weights in your regression framework, I wonder whether you need weighting adjustment at all. These are honest questions and I am happy to learn something here.

      Edit:

      I was a bit quick there. The pdf manual entry for svy has a section on poststratification. I did not get into this deeper (because I do not currently have the time), but I would check the approach using simple i-weights again.

      Best
      Daniel
      Last edited by daniel klein; 08 Apr 2019, 03:48.
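      A sketch of what the poststratification syntax could look like, assuming two hypothetical variables that are not in the original data: pstrat, which identifies each gender-by-ideology cell, and pstotal, which holds the known population count for that cell. Whether this suits the data described in #1 is not settled here:
      Code:
      * pstrat and pstotal are hypothetical names used for illustration
      svyset _n, poststrata(pstrat) postweight(pstotal)
      svy: regress lrPP i.gndr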



      • #4
        Thank you very much for your answers! And I apologize if I was not sufficiently clear, this is my first post.

        I would like to know what the "i." in the code stands for, since I have only seen regressions like

        Code:
         
         regress lrPP gndr
        I do not know if there is a better approach regarding the weights, but i-weights were the best I could find. Please let me know if you come up with a better solution; I would appreciate it.

        Best,

        Rocío



        • #5
          regarding the "i.", please see
          Code:
          help fvvarlist



          • #6
            Thank you for such a quick reply. I have read fvvarlist; however, I do not understand how defining gender as a factor variable changes the result. I have tried the following four commands, each with a different output (some showing significance and some not):

            Code:
            svy: regress lrPP i.gndr
            regress lrPP i.gndr [iw=gndrlrweight]
            svy: regress lrPP gndr
            regress lrPP gndr [iw=gndrlrweight]
            Code:
            . svy: regress lrPP i.gndr
            (running regress on estimation sample)

            Survey: Linear regression

            Number of strata   =         1                  Number of obs     =      5,102
            Number of PSUs     =     5,102                  Population size   =  5,114.961
                                                            Design df         =      5,101
                                                            F(   1,   5101)   =       5.07
                                                            Prob > F          =     0.0244
                                                            R-squared         =     0.0046

            ------------------------------------------------------------------------------
                         |             Linearized
                    lrPP |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                    gndr |
                 Female  |   .2048182   .0909694     2.25   0.024     .0264792    .3831572
                   _cons |   8.737209   .0340848   256.34   0.000     8.670388     8.80403
            ------------------------------------------------------------------------------

            . regress lrPP i.gndr [iw=gndrlrweight]

                  Source |       SS           df       MS      Number of obs   =     5,114
            -------------+----------------------------------   F(1, 5112)      =     23.83
                   Model |   53.623727         1   53.623727   Prob > F        =    0.0000
                Residual |  11506.8285     5,112  2.25094454   R-squared       =    0.0046
            -------------+----------------------------------   Adj R-squared   =    0.0046
                   Total |  11560.4522     5,113  2.26099203   Root MSE        =    1.5002

            ------------------------------------------------------------------------------
                    lrPP |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                    gndr |
                 Female  |   .2048182   .0419596     4.88   0.000     .1225594     .287077
                   _cons |   8.737209   .0293817   297.37   0.000     8.679609     8.79481
            ------------------------------------------------------------------------------
            Sorry if I am asking too much.

            Best,

            Rocío



            • #7
              1. if your indicator variable is coded 0/1 already then the regression results will not change (but you can use -margins-); if it is coded something else, then your results will change

              2. if there is a question above, other than that answered in my #1, I don't see it
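              A short sketch of the -margins- follow-up mentioned in point 1, assuming the data are svyset and gndr is entered as a factor variable. It reports the weighted predicted mean of lrPP for each gender:
              Code:
              svy: regress lrPP i.gndr
              * vce(unconditional) bases the standard errors on the survey design
              margins gndr, vce(unconditional)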



              • #8
                I just recoded it to 0/1, and the regression started omitting standard error. Is that normal?

                Code:
                svy: regress lrPP i.gndr
                (running regress on estimation sample)
                
                Survey: Linear regression
                
                Number of strata   =         1                  Number of obs     =      5,102
                Number of PSUs     =     5,102                  Population size   = 2,606.9289
                                                                Design df         =      5,101
                                                                F(   0,   5101)   =          .
                                                                Prob > F          =          .
                                                                R-squared         =     0.0000

                ------------------------------------------------------------------------------
                             |             Linearized
                        lrPP |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                        gndr |
                       Male  |          0  (omitted)
                       _cons |   8.737209   .0340848   256.34   0.000     8.670388     8.80403
                ------------------------------------------------------------------------------



                • #9
                  Originally posted by Rocio Acebal
                  I just recoded it to 0/1, and the regression started omitting standard error. Is that normal?

                  Code:
                  svy: regress lrPP i.gndr
                  (running regress on estimation sample)
                  
                  Survey: Linear regression
                  
                  Number of strata   =         1                  Number of obs     =      5,102
                  Number of PSUs     =     5,102                  Population size   = 2,606.9289
                                                                  Design df         =      5,101
                                                                  F(   0,   5101)   =          .
                                                                  Prob > F          =          .
                                                                  R-squared         =     0.0000

                  ------------------------------------------------------------------------------
                               |             Linearized
                          lrPP |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                          gndr |
                         Male  |          0  (omitted)
                         _cons |   8.737209   .0340848   256.34   0.000     8.670388     8.80403
                  ------------------------------------------------------------------------------
                  That's not normal. One other issue is that your population size changes markedly - it's 5,115 in other regressions. Can you show the Stata syntax you used to recode the gender variable?

                  I am not that familiar with survey weighting, but going back to your earlier posts, I see that you used importance weights in svysetting the data. In general, I believe that you're supposed to use probability weights unless you were specifically told to use importance weights by the survey authors. Some info available here. I believe the site says that importance weighting may produce wrong results when used in survey weighted data.

                  Going back to your post #6, you said you have different standard errors (and thus, t-statistics and p-values) using different versions of the command. As I stated, I suspect that if you used iweight wrongly, this could be the issue. Observe the differences in the standard errors under these 3 specifications of the regress command using a stock Stata dataset (this is the example dataset for the svyset command):

                  Code:
                  use http://www.stata-press.com/data/r15/stage5a
                  svyset _n [pweight = pw]
                  svy: reg yreg i.x3
                  Survey: Linear regression
                  
                  Number of strata   =         1                  Number of obs     =     11,039
                  Number of PSUs     =    11,039                  Population size   = 529,810.54
                                                                  Design df         =     11,038
                                                                  F(   1,  11038)   =       2.21
                                                                  Prob > F          =     0.1373
                                                                  R-squared         =     0.0002
                  
                  ------------------------------------------------------------------------------
                               |             Linearized
                          yreg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                          1.x3 |   .0341322    .022968     1.49   0.137    -.0108893    .0791537
                         _cons |    2.90668   .0162928   178.40   0.000     2.874743    2.938617
                  ------------------------------------------------------------------------------
                  
                  reg yreg i.x3 [pweight = pw]
                  (sum of wgt is 529,810.53717804)
                  
                  Linear regression                               Number of obs     =     11,039
                                                                  F(1, 11037)       =       2.21
                                                                  Prob > F          =     0.1373
                                                                  R-squared         =     0.0002
                                                                  Root MSE          =     1.1558
                  
                  ------------------------------------------------------------------------------
                               |               Robust
                          yreg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                          1.x3 |   .0341322   .0229691     1.49   0.137    -.0108913    .0791557
                         _cons |    2.90668   .0162936   178.39   0.000     2.874742    2.938619
                  ------------------------------------------------------------------------------
                  
                  reg yreg i.x3 [iweight = pw]
                  
                        Source |       SS           df       MS      Number of obs   =   529,810
                  -------------+----------------------------------   F(1, 529808)    =    115.53
                         Model |  154.293761         1  154.293761   Prob > F        =    0.0000
                      Residual |  707583.842   529,808  1.33554767   R-squared       =    0.0002
                  -------------+----------------------------------   Adj R-squared   =    0.0002
                         Total |  707738.136   529,809  1.33583638   Root MSE        =    1.1557
                  
                  ------------------------------------------------------------------------------
                          yreg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                          1.x3 |   .0341322   .0031756    10.75   0.000     .0279082    .0403562
                         _cons |    2.90668   .0022563  1288.23   0.000     2.902258    2.911103
                  ------------------------------------------------------------------------------
                  I svyset using probability weights, as I believe is normally done. Comparing results from the svy prefix and from regress with pweights, the standard errors are very similar. When I used regress with importance weights, the SE was much smaller. Hence, I suspect that, coding issues aside, the use of importance weights could be the problem (but please ignore this if the survey authors specifically asked you to use them).
                  Please use the code delimiters to show code and results - use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

                  Please use the command -dataex- to show a representative sample of data; it is installed already if you have Stata 14.2 or 15.1, else you can install it by typing

                  Code:
                  ssc install dataex



                  • #10
                    Sorry for the delay in the response. I was away from my computer.

                    I have changed to pweights and the issue has been solved.

                    The code I used for gender is the following.

                    Code:
                    encode genero, gen (gndr)
                    tab gndr, nol
                    recode gndr (1/5=.) (7/8=.) (10/99=.)
                    recode gndr (6=1) (9=0)
                    lab define etgndr 1"Male" 0"Female"
                    lab val gndr etgndr
                    tab gndr
                    drop if missing(gndr)
                    Thank you very much for your help!



                    • #11
                      Just to repeat a word of caution: I really would not be so sure that what you describe in #1 qualifies as probability weights (pweights). The latter are known in advance and come from the survey design; they are not created post hoc to make the sample match some known distribution. So while Weiwen is correct that svyset is usually used with pweights, and while the syntax might work, I am not sure that what you are doing is statistically sound. I am not saying it is unsound either. Unfortunately, I cannot tell for sure, and I cannot tell what else you would do.

                      Best
                      Daniel
                      Last edited by daniel klein; 14 Apr 2019, 09:54.



                      • #12
                        Also, I am under the impression that it is bad form to drop some respondents, as the code in #10 does. It would be better to use the svy, subpop(...) option.
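                        A minimal sketch of the subpop() approach, assuming the data are svyset and that member==2 marks party members, as in the code in #1:
                        Code:
                        * restrict estimation to party members without dropping any respondents
                        svy, subpop(if member==2): regress liberalism i.gndr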