Pearson chi2 test of independence

Kamola Babamuradova

Join Date: Jan 2018
Posts: 20

17 Apr 2018, 04:34

Dear all,

I'm facing some difficulties with my analysis of survey data.

I'm using DHS data to analyze the association between stunting and region of residence. I declared the survey design with svyset, and the result is below:

Code:

svy: tab region stunted, row
(running tabulate on estimation sample)

Number of strata   =         9                  Number of obs     =      4,523
Number of PSUs     =       354                  Population size   = 4,713.6234
                                                Design df         =        345

----------------------------------------
Region of |           Stunted           
residence | not-stun   stunted     Total
----------+-----------------------------
 Dushanbe |    .8122     .1878         1
     GBAO |    .7579     .2421         1
    SUGHD |    .7328     .2672         1
      DRS |    .7378     .2622         1
  KHATLON |    .7285     .2715         1
          | 
    Total |    .7391     .2609         1
----------------------------------------
  Key:  row proportion

  Pearson:
    Uncorrected   chi2(4)         =   11.1518
    Design-based  F(2.89, 998.03) =    2.0425     P = 0.1087

So, according to these results, the association is not statistically significant (P = 0.1087).

I was asked to perform ANOVA to test the relationship across regions; however, I've read that ANOVA is not suitable for categorical variables.
So I tried the Pearson chi2 test of independence without the svy prefix, using the following code:

Code:

. tab region stunted, row chi2

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

 Region of |        Stunted
 residence | not-stunt    stunted |     Total
-----------+----------------------+----------
  Dushanbe |       580        131 |       711 
           |     81.58      18.42 |    100.00 
-----------+----------------------+----------
      GBAO |       298         96 |       394 
           |     75.63      24.37 |    100.00 
-----------+----------------------+----------
     SUGHD |       667        242 |       909 
           |     73.38      26.62 |    100.00 
-----------+----------------------+----------
       DRS |       931        330 |     1,261 
           |     73.83      26.17 |    100.00 
-----------+----------------------+----------
   KHATLON |       918        330 |     1,248 
           |     73.56      26.44 |    100.00 
-----------+----------------------+----------
     Total |     3,394      1,129 |     4,523 
           |     75.04      24.96 |    100.00 

          Pearson chi2(4) =  20.0773   Pr = 0.000

My question is: how do I apply weights in the Pearson chi2 test? And which test is better?
As I understand it, under svy the chi2 statistic is converted to an F statistic. Does this mean that a test of independence was performed?
If so, why does the separate tab command with chi2 produce a significant result?

Can ANOVA be performed on such data?

I'm not good at statistics, so I apologize if this question is poorly framed.

Thanks in advance

Tags: categorical, chi2, Suggestion, svy, svy:tabulate

Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

17 Apr 2018, 05:10

With regard to the command -svy: tab-, the Stata manual says the following:

pearson requests that the Pearson chi-squared statistic be computed. By
default, this is the test of independence that is displayed. The
Pearson chi-squared statistic is corrected for the survey design with
the second-order correction of Rao and Scott (1984) and is converted
into an F statistic.

In other words, I think the "survey-corrected" chi2 is, theoretically speaking (caveat: model misspecification), the best approach.
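For anyone who wants to reproduce the design-based test, here is a minimal sketch. The variable names v005, v021 and v022 are assumptions based on the usual DHS recode layout, and wgt is a hypothetical name; none of them come from #1, so please check them against your own file and the survey's documentation.

```stata
* ASSUMPTION: v005 = sample weight, v021 = PSU, v022 = strata (usual DHS names).
* The strata variable in particular varies between surveys.
* DHS sample weights are stored multiplied by 1,000,000, so rescale first.
generate double wgt = v005/1000000

* Declare the survey design: PSU, probability weights and strata.
svyset v021 [pweight = wgt], strata(v022)

* Design-based test of independence. pearson is already the default test;
* it is written out here only to make the choice explicit.
svy: tabulate region stunted, row pearson
```

The "Design-based F" line in the output is the corrected Pearson test described in the manual excerpt, so no separate weighted chi2 command is needed.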

With regard to ANOVA, I gather it should not be used in this scenario.

Finally, the regular chi2 test ignores the sampling weights, clustering, and stratification, and it may easily yield significant p-values when the sample size is large.
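To see that the regular test uses nothing but the raw cell counts, here is a small sketch with the immediate command -tabi-. The counts are copied from the unweighted table in #1 (rows are regions; columns are not stunted and stunted).

```stata
* Unweighted Pearson chi2 from the raw counts in #1.
* No weights, PSUs or strata enter this calculation.
tabi 580 131 \ 298 96 \ 667 242 \ 931 330 \ 918 330, chi2 row
```

This should reproduce the Pearson chi2(4) of about 20.08 reported in #1, which treats all 4,523 children as independent, equally weighted observations.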

Hopefully that helped.

Last edited by Marcos Almeida; 17 Apr 2018, 05:12.

Best regards,

Marcos
Kamola Babamuradova

Join Date: Jan 2018

Posts: 20
#3

17 Apr 2018, 23:13

Dear Marcos, thank you for your reply! It was helpful.
One more question: you say that the svy-corrected chi2 is the best approach. Should I therefore trust the svy result? Or can I run a separate Pearson chi2 test and use that result?
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#4

20 Apr 2018, 02:56

I gather the questions in #3 are basically the same as those in #1 and were already answered in #2.

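That said, for the sake of clarity, here are the answers in order: Yes, the svy-corrected chi2 is the best approach. Yes, you should trust the svy result, provided the model is correctly specified. No, you should not use the separate Pearson chi2 test instead.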

Best regards,

Marcos
