
  • svy glm and robust standard errors

    I am using Stata SE 12.1 with cross-sectional complex survey design data that I have svyset. I am trying to obtain prevalence ratios using glm with family(poisson) and link(log). Based on this page (http://www.ats.ucla.edu/stat/stata/f...ative_risk.htm), I think I should use vce(robust) to obtain the correct standard errors. When I use vce(robust), however, I receive the error: option vce() of glm is not allowed with the svy prefix
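    Roughly, the command that produces the error looks like this (variable names here are placeholders, not my actual data):
    Code:
    svy: glm outcome exposure covar1, family(poisson) link(log) vce(robust)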
    Is there a different way to obtain the correct standard errors? Do I have to change how I account for the complex survey design?

  • #2
    The svy prefix gives you the equivalent of robust standard errors. If you try to specify both, in effect you are asking for the same thing twice. The svy results will differ from what you get just using the vce(robust) option if your svy design contains strata and/or finite sample corrections.
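    In other words (a sketch with placeholder variable names), just drop vce(robust) and keep the svy prefix; the eform option then reports the exponentiated coefficients as prevalence ratios:
    Code:
    svy: glm outcome exposure covar1, family(poisson) link(log) eform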
    Richard T. Campbell
    Emeritus Professor of Biostatistics and Sociology
    University of Illinois at Chicago



    • #3
      To account for the survey design, supply a svyset statement.
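      Something along these lines, with placeholder names for the PSU, stratum, and weight variables:
      Code:
      svyset psuid [pweight = samplewgt], strata(stratumid)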
      Steve Samuels
      Statistical Consulting
      [email protected]

      Stata 14.2



      • #4
        Thank you. I hadn't realized I was asking for the same thing twice! I will use the command with the svy prefix but without vce(robust), since the svy design does contain strata. I appreciate the help!
        Annie



        • #5
          Hi, I have a related question.

          I would like to test the difference between two group means. I understand this can be done via a weighted regression that allows for unequal variances. I tried three methods:

          A. Specify svyset using pweights and run
          Code:
          svy: regress y x
          This would give the same p-value as
          Code:
          svy: mean y, over(x) coeflegend
          test _b[[email protected]]= _b[[email protected]]
          The p-value here is based on the Adjusted Wald test, which uses an F-statistic.

          B. Do not specify svyset, and run
          Code:
          regress y x [pweight = pweight], vce(robust)
          C. Run a t-test
          Code:
          ttesti weighted_N1 weighted_mean1 weighted_SD1 weighted_N2 weighted_mean2 weighted_SD2, unequal
          Specifically, my command is:
          Code:
          ttesti 263 3.092498 .39497688 117 3.014948 .34671047, unequal
          where I obtained the weighted N, mean, and SD from
          Code:
          svy: mean y, over(x) coeflegend
          estat size
          estat sd, srssubpop
          I also obtained the same weighted SD values when I computed them from scratch using the formula from https://www.statology.org/weighted-s...viation-excel/.


          The p-values that I got from the three methods are slightly different: 0.0449, 0.0452, and 0.0552, respectively.

          I understand from various forum replies that the svy prefix already handles the robust part, hence the vce(robust) option is not available with svy. But I am curious to know:

          Q1. Why are the p-values from methods A and B different since both methods use the F-statistic? Which method gives a more accurate result? I'm asking because I'm wondering what to conclude if one p-value is slightly below 0.05 and the other p-value is slightly above 0.05.
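          (For Q1, I also compared the degrees of freedom that the two commands report; a sketch using the same y, x, and pweight as above:)
          Code:
          svy: regress y x
          display e(df_r)    // design df = number of PSUs minus number of strata
          regress y x [pweight = pweight], vce(robust)
          display e(df_r)    // residual df = N - 2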

          Q2. Why is the p-value from method C (t-test for unequal variances) different from the p-values in methods A and B? I understand that the F-distribution and t-distribution are equivalent, hence I was expecting the p-values to be more similar. For method C, I also did a t-test in Python using the
          Code:
          ttest_ind(gp_a, gp_b, usevar='unequal', weights=(weightsa, weightsb))
          function from the statsmodels.stats.weightstats module, and got a p-value of 0.0554, which is very close to the t-test result in Stata.
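          (To illustrate what I mean by equivalent, here is a quick check with made-up numbers, not my data: a squared t-statistic follows F(1, df), so the t and F p-values coincide only when the same statistic and the same degrees of freedom are used.)
          Code:
          scalar tval = 1.95                  // illustrative t statistic, not from my data
          scalar dfv  = 100                   // illustrative degrees of freedom
          display 2*ttail(dfv, abs(tval))     // two-sided t-test p-value
          display Ftail(1, dfv, tval^2)       // F(1, df) p-value of t^2, identical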

          Q3. Which method would be more appropriate? Do I conclude that the means are different or the same, given the different conclusions from the t-test (method C) and the F-test (methods A and B)?
          Last edited by Liting Cai; 02 Jun 2022, 20:13.

