  • svy: difference of proportions test vs. Excel

    Greetings,

    I'm using Stata to conduct a difference of proportions test on survey data with a complex sample design. For this kind of task, some of my colleagues prefer to use their preferred software package (Stata, SAS, R, etc.) to calculate group estimates and SEs, but then conduct the statistical test in Excel by plugging the group estimates and SEs into an Excel formula that uses the typical t-test equation. Wanting to conform to their process, I tried it out but noticed that Stata and Excel produce different pooled standard errors and therefore different test statistics. I'm wondering if someone can explain the reason for the difference: what is Stata doing to calculate the test statistic, and how would one adjust the t-test equation to replicate Stata's result "by hand" (i.e., using an Excel formula)?

    Code:
    /* example dataset (borrowed from another example on Statalist: https://www.statalist.org/forums/forum/general-stata-discussion/general/432450-survey-adjustment-for-t-tests-and-difference-in-proportions) */
    sysuse auto, clear
    gen mkr = substr(make,1,2) /*artificial PSU */
    svyset mkr [pw = turn]
    gen hiprice = price>8000  /* outcome */
    
    /* test difference of proportions low-price domestic vs. high-price domestic */
    svy: tab hiprice foreign, row se
    matrix list e(b)
    lincom _b[p11]-_b[p21]
    
    /* t-test formula to mimic Excel style formula, using estimates and SEs from 
    above tabulation */
    di (.7467 - .6855)/sqrt( (.1007)^2 + (.154)^2 )
    
    /* pooled standard error using above equation */
    di sqrt( (.1007)^2 + (.154)^2 )

  • #2
    The formula that Excel is using is appropriate only for independent estimates. But if you run -matrix list e(V)-, you will see that the two estimates are in fact correlated with each other. So the correct way to calculate the pooled standard error of the difference would be

    Code:
    matrix V = e(V)
    display sqrt(V["p11", "p11"] + V["p21", "p21"] - 2*V["p11", "p21"])
    which does, indeed, reproduce the standard error that Stata used in -lincom-.
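
    For anyone who wants to carry this back into the Excel workflow, here is a minimal sketch (assuming the -svy: tab- estimation results above are still in memory) of the same adjustment folded into the hand-calculated t statistic; the covariance term is the extra piece that would need to be copied into the spreadsheet alongside the two SEs.

    Code:
    /* sketch: rebuild the difference, its adjusted SE, and the t statistic by hand */
    matrix V = e(V)
    scalar sediff = sqrt(V["p11","p11"] + V["p21","p21"] - 2*V["p11","p21"])
    display "difference = " _b[p11] - _b[p21]
    display "t = " (_b[p11] - _b[p21]) / sediff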

    • #3
      Thank you very much; this is very helpful.

      • #4
        Hello, I have a follow-up question: can you point me to an online resource that discusses how to calculate the pooled standard error the way you've done here? I was hoping to read more about it, but a search using keywords like "correlated" or "dependent" mostly turns up results for paired t-tests, which seems like something different from what you're doing here and doesn't give specific guidance on subtracting the covariance of correlated estimates.

        • #5
          Try https://stats.libretexts.org/Bookshe...ated_Variables

          It is not exactly on point, as it deals with the pooled standard deviation of a pair of correlated variables, but the principle is the same. In your case you are dealing with standard errors, which are just the estimated standard deviations of the sampling distributions of the two coefficients, and their variances and covariance are given in e(V).
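
          Written with the correlation instead of the covariance, the same identity is SE(A - B) = sqrt(SE_A^2 + SE_B^2 - 2*r*SE_A*SE_B). A minimal sketch using the estimates from #1 (assuming they are still the active estimation results):

          Code:
          /* sketch: the same adjustment expressed via the correlation of the two estimates */
          matrix V = e(V)
          scalar seA = sqrt(V["p11","p11"])
          scalar seB = sqrt(V["p21","p21"])
          scalar r   = V["p11","p21"] / (seA*seB)
          display sqrt(seA^2 + seB^2 - 2*r*seA*seB)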

          • #6
            Once again, very helpful. Thanks for your response.
