  • "chitesti" with trend

    Hi there

    I have a straightforward problem and would be grateful for advice.
    I want to know whether my number of cases of heart disease differs significantly by social class (ordered 1 to 5, where 1 is the most deprived and 5 is the least deprived).
    The null hypothesis is that there is no difference, so I would expect the distribution to be equal (20%) for each category.
    I am using Nick Cox's "chitesti" command (chi sq distribution for goodness of fit test) to tell me whether to reject the null hypothesis:

    Code:
    chitesti 296 205 218 165 123
    Pearson chi2(4) = 82.9652 Pr = 0.000
    likelihood-ratio chi2(4) = 82.6660 Pr = 0.000

    But it also looks like there is a trend in my data (for example, cat.1 contains 296 cases, while cat.5 contains only 123 cases), so I would like to know whether there is a significant trend to this distribution.

    Is there an option in the chitesti command or should I use an alternative command?

    With thanks
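As a cross-check on the figures quoted above, the Pearson goodness-of-fit statistic can be reproduced by hand. This is only a sketch: it assumes equal expected counts in each quintile (1007/5 = 201.4), as the null hypothesis in the post states, and the names `expected`, `contrib` and `X2` are made up for the illustration.

```stata
* Hand computation of the Pearson goodness-of-fit chi-square
* (assumption: equal expected counts in all five quintiles)
clear
input byte class int count
1 296
2 205
3 218
4 165
5 123
end

quietly summarize count
scalar expected = r(sum) / r(N)      // total cases / number of categories
generate double contrib = (count - scalar(expected))^2 / scalar(expected)
quietly summarize contrib
scalar X2 = r(sum)                   // sum of (O - E)^2 / E
display "Pearson chi2(4) = " %8.4f scalar(X2) "  Pr = " %5.3f chi2tail(4, scalar(X2))
```

The displayed statistic should agree with the Pearson chi2(4) = 82.9652 reported by chitesti.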

  • #2
    There is a nonparametric test for trend, available as an official command:
    Code:
    help nptrend
    It's not an immediate command, and the data setup is long
    Code:
    input byte class int count
    1 296
    2 205
    3 218
    4 165
    5 123
    end
    
    nptrend count, by(class)
    
    exit
    Also, try
    Code:
    search trend nonparametric
    for some alternatives.


    • #3
      I confirm what the help implies: chitesti (tab_chi, SSC, as you are asked to explain) has no specific option for testing trend.

      A deeper problem for you is disentangling unequal group frequencies that arise in any case from a trend. Unless social classes occur with equal frequency -- which I don't believe unless your sampling design ensures it for your data -- I can't see that your chi-square test is valid at all. You need to work with a two-way table.


      • #4
        Thanks, Joseph, for your suggestion and, Nick, for your good point.
        I should have explained that social class 1 to 5 is actually a division into quintiles of population-based deprivation rank. By definition, therefore, deprivation levels 1 to 5 occur with equal frequency in the general population. I hope that addresses your point, Nick.
        Joseph, I ran your suggestion and the p-value was 0.072. Eyeballing the data makes me wonder whether this can be correct - would you disagree? It looks like there is a strong trend, but that may just be my own cognitive bias!


        • #5
          Yes; an explanation that you are using quintile bins does address my point. (I just worry about loss of information!)


          • #6
            Thanks, Nick.
            My colleague suggests using "ptrend", but I fear this would mean treating each proportion as coming from five separate samples of 1007 when, in fact, they all come from the same single sample.


            • #7
              Why not just use deprivation rank as a predictor?


              • #8
                Good question! Unfortunately my dataset only contains quintile of deprivation rank (I think rank itself may be too sensitive).


                • #9
                  Someone degraded your data in advance then!


                  • #10
                    Unfortunately so.


                    • #11
                      For a two-way table (as suggested by Nick in #3), you could compute a 1-df ordinal Chi-square as follows:

                      Code:
                      * Use David Howell's example from this page:
                      * https://www.uvm.edu/~dhowell/methods7/Supplements/OrdinalChiSq.html
                      
                      clear
                      input r c n
                      1 1 25    
                      1 2 13    
                      1 3  9    
                      1 4 10
                      1 5  6
                      2 1 31
                      2 2 21
                      2 3  6
                      2 4  2
                      2 5  3
                      end
                      
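                      * Pearson chi-square for the r x c table
                      tabulate r c [fweight = n], chi2
                      local dfPearson = (r(r)-1)*(r(c)-1)
                      local Pearson = r(chi2)
                      * Linear-by-linear (ordinal) chi-square on 1 df: (N-1)*rho^2
                      correlate r c [fweight = n]
                      local Linear = (r(N)-1)*r(rho)^2
                      * p-values: overall, linear trend, departure from linearity
                      local p1 = chi2tail(`dfPearson',`Pearson')
                      local p2 = chi2tail(1,`Linear')
                      local p3 = chi2tail(`dfPearson'-1,`Pearson'-`Linear')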
                      
                      display "              Pearson = " `Pearson' "  p = " `p1'
                      display "     Linear-by-linear = " `Linear' "  p = " `p2'
                      display "Deviation from linear = " `Pearson'-`Linear' "  p = " `p3'
                      HTH.
                      --
                      Bruce Weaver
                      Email: [email protected]
                      Version: Stata/MP 19.5 (Windows)
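For readers following the code in #11, the quantities it computes can be written out as below. This is only a restatement of the code, not an addition to it: $X^2$ is the Pearson statistic for the $R \times C$ table, $N$ the total count, and $r$ the frequency-weighted correlation between the row and column scores. The symbol $M^2$ for the linear-by-linear statistic is a notational assumption, not something used in the post.

```latex
\begin{align*}
M^2 &= (N - 1)\, r^2, & \text{df} &= 1, \\
X^2_{\text{dev}} &= X^2 - M^2, & \text{df} &= (R - 1)(C - 1) - 1 .
\end{align*}
```

Each statistic is referred to a chi-square distribution with the stated degrees of freedom, which is what the three `chi2tail()` calls do.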


                      • #12
                        Thanks, Bruce.
                        2-way tables are more familiar territory for me but, in this case, I'm not sure what I'd be plotting deprivation level against. I only have a single sample of 1007 cases, distributed as follows:

                        Code:
                        clear
                        input r n
                        1 296
                        2 205
                        3 218
                        4 165
                        5 123
                        end


                        • #13
                          Hi Raph. Posts 4-10 appeared after I started composing my reply, so I had not yet seen that you have only quintile of deprivation rank. What I posted (#11) won't help you in that case.
                          --
                          Bruce Weaver
                          Email: [email protected]
                          Version: Stata/MP 19.5 (Windows)


                          • #14
                            I have run the ptrend command on these data (installed with "ssc install ptrend"), with my data set up as follows:

                            Code:
                            clear
                            input depriv yes no
                            1 296 711
                            2 205 802
                            3 218 789
                            4 165 842
                            5 123 884
                            end
                            The ptrend command is as follows:

                            Code:
                            ptrend yes no depriv
                            This gives the following results:

                            Trend analysis for proportions
                            ------------------------------
                            Regression of p = yes/(yes+no) on depriv:

                            Slope = -.03833, std. error = .00399, Z = 9.616

                            Overall chi2(4) = 103.707, pr>chi2 = 0.0000
                            Chi2(1) for trend = 92.475, pr>chi2 = 0.0000
                            Chi2(3) for departure = 11.231, pr>chi2 = 0.0105


                            As I mentioned before, this means treating each proportion as coming from five separate samples of 1007 when, in fact, they all come from the same single sample.

                            It also gives a very different result from Joseph's suggestion above, which gave p-value of 0.072.

                            Does anyone have a view on using ptrend for testing distribution of categories in a single sample?

                            With thanks


                            • #15
                              Originally posted by raph crompton:
                              As I mentioned before, this means treating each proportion as coming from five separate samples of 1007 when, in fact, they all come from the same single sample.

                              Does anyone have a view on using ptrend for testing distribution of categories in a single sample?
                              I think that they need to be independent proportions, and yours aren't.

                              Originally posted by raph crompton:
                              It also gives a very different result from Joseph's suggestion above, which gave p-value of 0.072.
                              Nonparametric tests tend to have somewhat lower power than parametric tests. You can consider using the conventional Jonckheere-Terpstra test for trend with jonter (a user-written command from among those popping up from search trend nonparametric), which I believe comes in at a "statistically significant" p-value (albeit asymptotic) with your dataset, and so might be a bit more powerful than the official nptrend. Otherwise, if you need additional power, then you could consider using some kind of parametric model, such as
                              Code:
                              regress count c.class
                              graph twoway lfitci count class, level(50) || ///
                                  scatter count class, mcolor(red) msize(small) ///
                                  ylabel( , angle(horizontal) nogrid) legend(off)
                              If the distribution of the count residuals is of concern, then perhaps you can consider a permutation test
                              Code:
                              set seed 1396525
                              permute class t = (_b[class] / _se[class]), reps(1000) nodots: regress count c.class
                              which I believe comes in a bit more powerful than jonter. (With just the five observations, you should be able to enumerate the permutations of class, and get at it that way, too.)
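The enumeration mentioned at the end of #15 could be sketched in Mata roughly as follows. This is an illustration only, not the poster's code. It uses the absolute centred cross-product of class and count as the test statistic, which is an assumption made for simplicity: with the class scores and counts fixed, it ranks the orderings in the same way as the absolute t statistic from `regress`. The names `cnt`, `cls`, `obs`, `total` and `extreme` are made up for the example.

```stata
mata:
cnt = (296 \ 205 \ 218 \ 165 \ 123)          // cases per quintile, from #1
cls = (1 \ 2 \ 3 \ 4 \ 5)                    // class scores
obs = abs(sum((cls :- mean(cls)) :* cnt))    // observed statistic

info    = cvpermutesetup(cls)                // prepare to enumerate all orderings
total   = 0
extreme = 0
while ((p = cvpermute(info)) != J(0, 1, .)) {
    s = abs(sum((p :- mean(p)) :* cnt))      // statistic for this ordering
    total   = total + 1
    extreme = extreme + (s >= obs)           // at least as extreme as observed
}
printf("exact two-sided p = %6.4f (%g of %g orderings)\n", extreme/total, extreme, total)
end
```

With these five counts, 4 of the 120 orderings are at least as extreme as the observed one, giving an exact two-sided p-value of about 0.033.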
