"chitesti" with trend

raph crompton

Join Date: Jun 2017

Posts: 7
#1

06 Jun 2017, 07:58

Hi there

I have a straightforward problem and would be grateful for advice.
I want to know whether my number of cases of heart disease differs significantly by social class (ordered 1 to 5, where 1 is the most deprived and 5 is the least deprived).
The null hypothesis is that there is no difference, so I would expect the distribution to be equal (20%) for each category.
I am using Nick Cox's "chitesti" command (chi-square goodness-of-fit test) to tell me whether to reject the null hypothesis:

Code:

chitesti 296 205 218 165 123

Pearson chi2(4) = 82.9652 Pr = 0.000
likelihood-ratio chi2(4) = 82.6660 Pr = 0.000
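For reference, the Pearson statistic above can be reproduced by hand. This is a minimal sketch, assuming equal expected frequencies (1007/5 = 201.4 per class); the variable and scalar names (class, count, contrib, expected, df, X2) are illustrative only and not part of chitesti.

```stata
clear
input byte class int count
1 296
2 205
3 218
4 165
5 123
end

* expected count per class under the equal-frequency null
quietly summarize count
scalar expected = r(sum) / r(N)

* each class contributes (observed - expected)^2 / expected
generate double contrib = (count - scalar(expected))^2 / scalar(expected)
quietly summarize contrib
scalar df = r(N) - 1
scalar X2 = r(sum)

display "Pearson chi2(" df ") = " %8.4f X2 "   Pr = " %5.3f chi2tail(df, X2)
```

The sum of the five contributions is 16709.2/201.4, which agrees with the 82.9652 reported by chitesti.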

But it also looks like there is a trend in my data (for example, cat.1 contains 296 cases, while cat.5 contains only 123 cases), so I would like to know whether there is a significant trend to this distribution.

Is there an option in the chitesti command or should I use an alternative command?

With thanks
Joseph Coveney

Join Date: Apr 2014

Posts: 4433
#2

06 Jun 2017, 08:09

There is a nonparametric test for trend, available as an official command:

Code:

help nptrend

It's not an immediate command, and the data setup is long

Code:

input byte class int count
1 296
2 205
3 218
4 165
5 123
end

nptrend count, by(class)

exit

Also, try

Code:

search trend nonparametric

for some alternatives.
Nick Cox

Join Date: Mar 2014

Posts: 35754
#3

06 Jun 2017, 08:14

I confirm what the help implies: chitesti (tab_chi, SSC, as you are asked to explain) has no specific option for testing trend.

A deeper problem for you is disentangling unequal group frequencies that arise in any case from a trend. Unless social classes occur with equal frequency -- which I don't believe unless your sampling design ensures it for your data -- I can't see that your chi-square test is valid at all. You need to work with a two-way table.
raph crompton

Join Date: Jun 2017

Posts: 7
#4

06 Jun 2017, 08:27

Thanks, Joseph, for your suggestion and, Nick, for your good point.
I should have explained that social class 1 to 5 is actually a division into quintiles of population-based deprivation rank. By definition, therefore, deprivation levels 1 to 5 occur with equal frequency in the general population. I hope that addresses your point, Nick.
Joseph, I ran your suggestion and the p-value was 0.072. Eyeballing the data makes me wonder whether this can be correct - would you disagree? It looks like there is a strong trend, but that may just be my own cognitive bias!
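One way to see where 0.072 comes from is to compute the statistic by hand. The sketch below assumes the test reported was Cuzick's test for trend with the class values 1 to 5 as scores and one observation (the count) per class; the variable and scalar names are illustrative only.

```stata
clear
input byte class int count
1 296
2 205
3 218
4 165
5 123
end

egen double rnk = rank(count)            // ranks of the five counts
generate double lr  = class * rnk        // score times rank
generate double lsq = class^2            // squared score

quietly summarize lr
scalar Tstat = r(sum)                    // T = sum of score * rank
quietly summarize class
scalar Lsum = r(sum)                     // L = sum of scores
scalar Nobs = r(N)                       // N = 5 observations
quietly summarize lsq
scalar L2sum = r(sum)

scalar ET = (Nobs + 1) * Lsum / 2
scalar VT = (Nobs + 1) / 12 * (Nobs * L2sum - Lsum^2)
scalar z  = (Tstat - ET) / sqrt(VT)
display "z = " %6.3f z "   two-sided p = " %6.4f 2*normal(-abs(z))
```

Here T = 36, E(T) = 45 and Var(T) = 25, so z = -1.8 and the two-sided p-value is about 0.072. The point to note is that N = 5: set up this way, the test sees five data points rather than 1007 cases, so it has little power however clear the pattern looks.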
Nick Cox

Join Date: Mar 2014

Posts: 35754
#5

06 Jun 2017, 08:31

Yes; an explanation that you are using quintile bins does address my point. (I just worry about loss of information!)
raph crompton

Join Date: Jun 2017

Posts: 7
#6

06 Jun 2017, 08:36

Thanks, Nick.
My colleague suggests using "ptrend", but I fear this would mean treating each proportion as coming from five separate samples of 1007 when, in fact, they all come from the same single sample.
Nick Cox

Join Date: Mar 2014

Posts: 35754
#7

06 Jun 2017, 08:42

Why not just use deprivation rank as a predictor?
raph crompton

Join Date: Jun 2017

Posts: 7
#8

06 Jun 2017, 08:44

Good question! Unfortunately my dataset only contains quintile of deprivation rank (I think rank itself may be too sensitive).
Nick Cox

Join Date: Mar 2014

Posts: 35754
#9

06 Jun 2017, 08:48

Someone degraded your data in advance then!
raph crompton

Join Date: Jun 2017

Posts: 7
#10

06 Jun 2017, 08:51

unfortunately so

Bruce Weaver

Join Date: May 2014
Posts: 1139

#11

06 Jun 2017, 08:56

For a two-way table (as suggested by Nick in #3), you could compute a 1-df ordinal Chi-square as follows:

Code:

* Use David Howell's example from this page:
* https://www.uvm.edu/~dhowell/methods7/Supplements/OrdinalChiSq.html

clear
input r c n
1 1 25    
1 2 13    
1 3  9    
1 4 10
1 5  6
2 1 31
2 2 21
2 3  6
2 4  2
2 5  3
end

tabulate r c [fweight = n], chi2
local dfPearson = (r(r)-1)*(r(c)-1)
local Pearson = r(chi2)
correlate r c [fweight = n]
local Linear = (r(N)-1)*r(rho)^2
local p1 = chi2tail(`dfPearson',`Pearson')
local p2 = chi2tail(1,`Linear')
local p3 = chi2tail(`dfPearson'-1,`Pearson'-`Linear')

display "              Pearson = " `Pearson' "  p = " `p1'
display "     Linear-by-linear = " `Linear' "  p = " `p2'
display "Deviation from linear = " `Pearson'-`Linear' "  p = " `p3'

HTH.

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)


raph crompton

Join Date: Jun 2017

Posts: 7
#12

06 Jun 2017, 09:09

Thanks, Bruce.
2-way tables are more familiar territory for me but, in this case, I'm not sure what I'd be plotting deprivation level against. I only have a single sample of 1007 cases, distributed as follows:

Code:

clear
input r n
1 296
2 205
3 218
4 165
5 123
end
Bruce Weaver

Join Date: May 2014

Posts: 1139
#13

06 Jun 2017, 09:31

Hi Raph. Posts 4-10 appeared after I started composing my reply, so I had not yet seen that you have only quintile of deprivation rank. What I posted (#11) won't help you in that case.

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)
raph crompton

Join Date: Jun 2017

Posts: 7
#14

06 Jun 2017, 10:38

I have run the ptrend command on these data (installed with "ssc install ptrend"), with my data set up as follows:

Code:

clear
input depriv yes no
1 296 711
2 205 802
3 218 789
4 165 842
5 123 884
end

ptrend command as follows:

Code:

ptrend yes no depriv

Which gives the following results:

Trend analysis for proportions
------------------------------
Regression of p = yes/(yes+no) on depriv:

Slope = -.03833, std. error = .00399, Z = 9.616

Overall chi2(4) = 103.707, pr>chi2 = 0.0000
Chi2(1) for trend = 92.475, pr>chi2 = 0.0000
Chi2(3) for departure = 11.231, pr>chi2 = 0.0105

As I mentioned before, this means treating each proportion as coming from five separate samples of 1007 when, in fact, they all come from the same single sample.
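The effect of that setup can be checked directly. The overall chi2(4) from ptrend matches the ordinary Pearson statistic for the 5 x 2 table of yes/no by deprivation level; the sketch below assumes that interpretation, and the names n and disease are illustrative only.

```stata
clear
input depriv yes no
1 296 711
2 205 802
3 218 789
4 165 842
5 123 884
end

* one row per deprivation level and outcome (disease = 1 for yes, 0 for no)
rename (yes no) (n1 n0)
reshape long n, i(depriv) j(disease)

tabulate depriv disease [fweight = n], chi2

* compare with the one-sample goodness-of-fit statistic, 82.9652
display "ratio to one-sample chi2 = " %6.4f r(chi2) / 82.9652
```

The "no" column has the same deviations as the "yes" column but against expected counts of 805.6 rather than 201.4, so the two-way statistic is the one-sample statistic multiplied by 1 + 201.4/805.6 = 1.25, that is 82.9652 x 1.25 = 103.7065.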

It also gives a very different result from Joseph's suggestion above, which gave p-value of 0.072.

Does anyone have a view on using ptrend for testing distribution of categories in a single sample?

With thanks
Joseph Coveney

Join Date: Apr 2014

Posts: 4433
#15

06 Jun 2017, 22:48

Originally posted by raph crompton View Post

As I mentioned before, this means treating each proportion as coming from five separate samples of 1007 when, in fact, they all come from the same single sample.

Does anyone have a view on using ptrend for testing distribution of categories in a single sample?

I think that they need to be independent proportions, and yours aren't.

Originally posted by raph crompton View Post

It also gives a very different result from Joseph's suggestion above, which gave p-value of 0.072.

Nonparametric tests tend to have somewhat lower power than parametric tests. You can consider using the conventional Jonckheere-Terpstra test for trend with jonter (a user-written command from among those popping up from search trend nonparametric), which I believe comes in at a "statistically significant" p-value (albeit asymptotic) with your dataset, and so might be a bit more powerful than the official nptrend. Otherwise, if you need additional power, then you could consider using some kind of parametric model, such as

Code:

regress count c.class

graph twoway lfitci count class, level(50) || ///
    scatter count class, mcolor(red) msize(small) ///
    ylabel( , angle(horizontal) nogrid) legend(off)

If the distribution of the count residuals is of concern, then perhaps you can consider a permutation test

Code:

set seed 1396525
permute class t = (_b[class] / _se[class]), reps(1000) nodots: regress count c.class

which I believe comes in a bit more powerful than jonter. (With just the five observations, you should be able to enumerate the permutations of class, and get at it that way, too.)
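The enumeration mentioned in parentheses could be sketched in Mata as follows. This is only an illustration under stated assumptions: it uses the absolute correlation between class and count as the test statistic, which in a simple regression orders the permutations exactly as |t| does, and the names y, x, robs, hits and total are made up for the example.

```stata
mata:
y = (296 \ 205 \ 218 \ 165 \ 123)       // observed counts, classes 1 to 5
x = (1 \ 2 \ 3 \ 4 \ 5)                 // class scores
C = correlation((x, y))
robs = abs(C[1, 2])                     // observed |correlation|

info  = cvpermutesetup(x)               // set up all 5! = 120 orderings of class
hits  = 0
total = 0
while ((p = cvpermute(info)) != J(0, 1, .)) {
    C = correlation((p, y))
    total = total + 1
    // small tolerance so the observed ordering itself is always counted
    if (abs(C[1, 2]) >= robs - 1e-12) hits = hits + 1
}
printf("exact two-sided p = %6.4f (%g of %g permutations)\n", hits/total, hits, total)
end
```

Because all 120 orderings are visited, the resulting p-value does not depend on the seed or on the number of replications, unlike the permute run above.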