  • Pearson chi2 test of independence

    Dear all,

    I'm facing some difficulties with my analysis of survey data.

    I'm using DHS data to analyze the association between stunting and region of residence. I declared the survey design with svyset, and the result is below:

    Code:
    svy: tab region stunted, row
    (running tabulate on estimation sample)
    
    Number of strata   =         9                  Number of obs     =      4,523
    Number of PSUs     =       354                  Population size   = 4,713.6234
                                                    Design df         =        345
    
    ----------------------------------------
    Region of |           Stunted           
    residence | not-stun   stunted     Total
    ----------+-----------------------------
     Dushanbe |    .8122     .1878         1
         GBAO |    .7579     .2421         1
        SUGHD |    .7328     .2672         1
          DRS |    .7378     .2622         1
      KHATLON |    .7285     .2715         1
              | 
        Total |    .7391     .2609         1
    ----------------------------------------
      Key:  row proportion
    
      Pearson:
        Uncorrected   chi2(4)         =   11.1518
        Design-based  F(2.89, 998.03) =    2.0425     P = 0.1087
    So, according to these results, the association is not statistically significant.

    I was asked to perform ANOVA to test for differences among the regions; however, I've read that ANOVA is not suitable for categorical variables.
    So I tried the Pearson chi2 test of independence using the following code:

    Code:
    . tab region stunted, row chi2
    
    +----------------+
    | Key            |
    |----------------|
    |   frequency    |
    | row percentage |
    +----------------+
    
     Region of |        Stunted
     residence | not-stunt    stunted |     Total
    -----------+----------------------+----------
      Dushanbe |       580        131 |       711 
               |     81.58      18.42 |    100.00 
    -----------+----------------------+----------
          GBAO |       298         96 |       394 
               |     75.63      24.37 |    100.00 
    -----------+----------------------+----------
         SUGHD |       667        242 |       909 
               |     73.38      26.62 |    100.00 
    -----------+----------------------+----------
           DRS |       931        330 |     1,261 
               |     73.83      26.17 |    100.00 
    -----------+----------------------+----------
       KHATLON |       918        330 |     1,248 
               |     73.56      26.44 |    100.00 
    -----------+----------------------+----------
         Total |     3,394      1,129 |     4,523 
               |     75.04      24.96 |    100.00 
    
              Pearson chi2(4) =  20.0773   Pr = 0.000
    My question is: how do I apply weights in the Pearson chi2 test? And which test is better?
    As I understand it, under svy the chi2 statistic is converted to an F statistic. Does this mean that the test of independence was performed?
    If so, why does the separate tab command with chi2 produce a significant result?

    Can ANOVA be performed for such data?

    I'm not good at statistics, so I apologize for such a nonconstructive question.

    Thanks in advance

  • #2
    With regard to the command - svy: tab -, the Stata Manual gives this information:

    pearson requests that the Pearson chi-squared statistic be computed. By
    default, this is the test of independence that is displayed. The
    Pearson chi-squared statistic is corrected for the survey design with
    the second-order correction of Rao and Scott (1984) and is converted
    into an F statistic.
    This is to say that I think the "survey-corrected" chi2 is, theoretically speaking (caveat: model misspecification), the best approach.
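
    A minimal sketch of that survey-corrected route, in case it is useful. The svyset line is not shown in #1, so the design variables below are assumptions based on the usual DHS recode names (v005 for the sample weight, v021 for the PSU, v022 for the strata) and should be checked against the actual file. Only region and stunted are taken from #1; wt is a hypothetical name.

    ```stata
    * Assumed DHS recode names: v005 = sample weight, v021 = PSU, v022 = strata.
    * DHS stores the weight multiplied by 1,000,000, so rescale it first.
    generate double wt = v005/1000000

    * Declare the design once; every later svy: command picks it up.
    svyset v021 [pweight = wt], strata(v022)

    * Weighted row proportions plus the Rao-Scott corrected Pearson test.
    svy: tabulate region stunted, row pearson
    ```

    With the design declared this way, the weights do not need to be passed to the test separately; the "Design-based" F line of the output is the weighted test of independence.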

    With regard to ANOVA, I gather it should not be used in this scenario, since the outcome (stunted or not) is categorical.
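
    If the aim behind the ANOVA request is an overall comparison of the regions, one possible alternative (offered only as a suggestion, it is not part of the manual excerpt above) is a design-based logistic regression followed by a joint test of the region indicators. The sketch assumes the data are already svyset and that stunted is coded 0/1.

    ```stata
    * Assumes svyset has already been run and stunted is coded 0/1.
    * Logistic regression of stunting on region, using the survey design.
    svy: logit stunted i.region

    * Joint adjusted Wald test that all region coefficients are zero.
    testparm i.region
    ```

    The joint test plays a role similar to the overall F test in a one-way ANOVA, but for a binary outcome and with the survey design taken into account.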

    To end, the regular chi2 test from - tab - ignores the sampling weights, strata, and clustering, and it may easily provide significant p-values when the sample size is big.

    Hopefully that helped.
    Last edited by Marcos Almeida; 17 Apr 2018, 05:12.
    Best regards,

    Marcos



    • #3
      Originally posted by Marcos Almeida
      With regards to the command - svy:tab -, there is this information in the Stata Manual:



      This is to say that I think the "survey-corrected" chi2 is, theoretically speaking (caveat: model misspecification), the best approach.

      With regards to ANOVA, I gather it should not be used in this scenario.

      To end, the regular chi2 test may easily provide significant p-values when the sample size is big.

      Hopefully that helped.
      Dear Marcos, thank you for your reply! It was helpful.
      One more question. You say that the svy-corrected chi2 is the best approach. Should I therefore trust the svy result? Or can I perform a separate Pearson chi2 test and use that result?



      • #4
        I gather the questions in #3 are basically the same as those in #1 and were already answered in #2.

        That said, for the sake of clarity, here it goes: Yes. Yes, provided the model is correctly specified. No.
        Best regards,

        Marcos
