Trend Test for Slope Coefficients of a Set of Dummy Variables

Jun Xu

Join Date: Sep 2014

Posts: 33
#1

11 Jan 2018, 08:36

Dear Statalisters,

I have a statistical question, which I believe relates to trend tests, arising from a substantive problem. In the public health literature, the SES-health gradient is well known: as one's SES level increases, so does one's health status. Suppose I have an income variable with four levels, corresponding to the first, second, third, and fourth quartiles, and I use the fourth quartile as the reference category, with Q1, Q2, and Q3 as the dummy indicators for the first three levels. To simplify, I will use a linear regression model with only the dummy variables on the right-hand side, where y* is a latent variable with higher values denoting poorer health status:

y* = b0+ b1Q1 + b2Q2 + b3Q3 + e

One way to see if the gradient holds is to test, all simultaneously, whether 1) b1, b2, and b3 are all positive and significantly different from zero; 2) b1 is greater than both b2 and b3; and 3) b2 is greater than b3. There are different options here; an awkward one, for example, would be the Bonferroni correction.

I am wondering if there is any trend test I can use to show that the coefficients, from b1 to b3, decrease in magnitude in a roughly linear fashion, and how to do that in Stata. Thanks a lot!

Jun Xu, PhD
Professor and Graduate Director
Department of Sociology
Ball State University
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#2

11 Jan 2018, 09:28

This is very confusing. You say that y* is a latent variable. But the equation you show cannot be identified without some observables that indicate y*. Can you clarify?
Jun Xu

Join Date: Sep 2014

Posts: 33
#3

11 Jan 2018, 10:20

Sorry for the confusion. y* could be the latent variable in a binary or ordinal regression model, given that its probability distribution is known, such as normal or logistic. An example is self-rated health, either binary (good or poor) or ordinal (excellent 1, very good 2, good 3, fair 4, poor 5). Or I can replace y* with an observed y, and then we have a linear regression. Hope this clarifies.
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#4

11 Jan 2018, 10:47

OK. Well, I would take a slightly different approach and not base it on a regression. There is a command:

Code:

nptrend outcome, by(income_quartile)

which will work well whether the outcome variable you are examining for trend is continuous or discrete, and where income_quartile is a single variable coded 1 through 4.
Jun Xu

Join Date: Sep 2014

Posts: 33
#5

11 Jan 2018, 11:02

Clyde,

Thanks a lot! I wasn't very clear in the first post. The dependent variable is binary, with one denoting poor health and zero good health. My goal is to see if there is any linear trend in the coefficients of the set of dummy variables for income. I think I should be able to use nptrend for the example that I described above, but in my real analysis I have a set of covariates (other independent variables). In that case, how can I test whether there is a linear trend in the set of coefficients for income? Thanks.

Jun
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#6

11 Jan 2018, 11:18

Check out https://www.stata.com/support/faqs/s...est-for-trend/, particularly the approaches offered towards the end.
Jun Xu

Join Date: Sep 2014

Posts: 33
#7

11 Jan 2018, 11:44

Thanks a lot!
Jun Xu

Join Date: Sep 2014

Posts: 33
#8

12 Jan 2018, 13:12

Clyde or anyone else,

I read through the Stata post on trend tests. I am wondering if I could just recode the income variable into an ordinal income variable (call it incmLin), such that incmLin equals 1 if one's income is within the first quartile, 2 if in the second quartile, 3 if in the third, and 4 if in the fourth. Then I would simply run a binary logit regression of self-rated health on incmLin along with other controls. If the coefficient for this variable is negative (poor health is coded 1 and good health 0) and significant, can we then say there is a linear trend between income and the log odds of poor health? Any pointer or help would be greatly appreciated.

Jun
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#9

12 Jan 2018, 14:11

Yes and no. The problem with this approach is that the coding as 1, 2, 3, 4 is arbitrary, and it implicitly models the difference between, say, the 4th and 2nd quartiles as being the same as the difference between the 3rd and 1st. If that's true, then great. But if not, you could end up failing to find a relationship that would show up if you coded them as, say, 1, 2, 4, 8 or some other monotone transformation of 1, 2, 3, 4. You are converting ordinal information into interval information arbitrarily, and the results will be sensitive to the particular way you do it.
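
One hedged way to probe how much the equal-spacing assumption matters, sketched on simulated toy data (the variable names, the seed, and the age covariate are all illustrative assumptions): fit the model once with the quartile entered as a linear 1-4 score and once as a factor variable, then compare the two fits with a likelihood-ratio test. A small p-value would suggest that the linear score is too restrictive.

```stata
* Toy data for illustration only; names, seed, and covariate are assumptions
clear *
set seed 20180112
quietly drawnorm latent_health income, double corr(1 0.25 \ 0.25 1) n(500)
generate byte poor_health = latent_health < 0          // 1 = poor health
egen byte income_quartile = cut(income), group(4)      // coded 0 through 3
quietly replace income_quartile = income_quartile + 1  // recode to 1 through 4
generate double age = rnormal(50, 10)                  // hypothetical covariate

* Quartile as an unordered factor (3 df for income)
logit poor_health i.income_quartile age, nolog
estimates store factor

* Quartile as an equally spaced linear score (1 df for income)
logit poor_health c.income_quartile age, nolog
estimates store linear

* LR test of the linear-score model against the factor model (2 df)
lrtest factor linear
```

The linear-score model is nested in the factor model, so the test has two degrees of freedom; it asks only whether the three quartile coefficients depart from a straight line on the log-odds scale, not whether the slope is nonzero.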

Why don't you just model your outcome against the income variable itself instead of the quartiles? Taking a continuous variable and breaking it up into groups is occasionally useful for descriptive purposes, but when used for analysis it creates problems such as this, and it also discards information. Let's suppose the 75th percentile income is 1000 currency units. Using quartiles then says that a person whose income is 1000 units is the same as a person whose income is 100000 units, but is radically different from one whose income is 999 units. Grouping continuous variables discards information and introduces distortions. I think you should just not use this approach if the income variable itself is available to you.

That said, if you insist on proceeding with the quartiles, what about an approach where you carry out your regression and then follow it with:

Code:

contrast a.quartile

That will give you regression-adjusted contrasts of the outcome in each quartile with the next quartile up, as well as a joint test of all three of those comparisons.
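
A hedged, self-contained sketch of that sequence on simulated toy data (poor_health, age, and the seed are assumptions for illustration; income_quartile stands in for quartile above):

```stata
* Toy data for illustration only; names, seed, and covariate are assumptions
clear *
set seed 20180113
quietly drawnorm latent_health income, double corr(1 0.25 \ 0.25 1) n(500)
generate byte poor_health = latent_health < 0          // 1 = poor health
egen byte income_quartile = cut(income), group(4)      // coded 0 through 3
quietly replace income_quartile = income_quartile + 1  // recode to 1 through 4
generate double age = rnormal(50, 10)                  // hypothetical covariate

* The quartile must enter as a factor variable for contrast to work on it
logit poor_health i.income_quartile age, nolog

* Adjacent contrasts: (1 vs 2), (2 vs 3), (3 vs 4), plus a joint test
contrast a.income_quartile
```

The contrasts are computed on the linear-prediction (log-odds) scale of the fitted model, so they are adjusted for whatever covariates the logit command included.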
Jun Xu

Join Date: Sep 2014

Posts: 33
#10

13 Jan 2018, 09:06

Clyde,

Thank you very much for your help. I can understand the common concerns about my approach, but it again goes back to my original problem, whether there is a (linear) trend in the coefficients for Q1, Q2, and Q3 in the equation:

logit Pr(y = 1) = b0 + b1Q1 + b2Q2 + b3Q3 + other variables, where y = 1 denotes poor health and Q1-Q3 are dummy indicators for the income variable. I should add that in this case I probably have to use quartiles: because I am using an international database, a quartile measure might be more feasible than a continuous measure of income adjusted by currency exchange rates. My intention is to test the well-known health gradient for income. One strategy is to conduct simultaneous hypothesis tests of the following: 1) b1 > b2; 2) b1 > b3; 3) b2 > b3. Or I can simply test if there is a linear trend in the coefficients.

I also saw, in the post that you referred me to, that in the example the outcome variable a is also coded as 1 (good), 2 (better), and 3 (best), and William Sribney even suggested using a regression model. This made me think about recoding the income variable from binary indicators to an ordinal one, and then testing whether there is a trend across the ordinal income groups with regard to health. As for nptrend and ptrend, my understanding is that they are used for bivariate cases.

It actually surprises me that I don't see much literature on this topic online after an extensive Google search. Thanks a lot!

Jun
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#11

13 Jan 2018, 18:57

Originally posted by Jun Xu:

Or I can simply test if there is a linear trend in the coefficients.

Yes; this is the way. It is trivial in Stata, especially when you use factor variables to indicate quartiles of income and not separate, manually generated indicator variables. See below. (Start at the "Begin here" comment. The stuff at the top is just to create an artificial toy dataset for illustration.)

. version 15.1

. 
. clear *

. set seed `=strreverse("1425307")'

. 
. quietly drawnorm latent_health income, double corr(1 0.25 \ 0.25 1) n(500)

. 
. generate byte manifest_health = latent_health < 0 // Indicates poor health

. 
. egen byte income_quartile = cut(income), group(4) // Income at least ith quartile

. quietly replace income_quartile = income_quartile + 1

. 
. tabulate manifest_health income_quartile

manifest_h |               income_quartile
     ealth |         1          2          3          4 |     Total
-----------+--------------------------------------------+----------
         0 |        54         66         70         85 |       275 
         1 |        71         59         55         40 |       225 
-----------+--------------------------------------------+----------
     Total |       125        125        125        125 |       500 

. 
. *
. * Begin here
. *
. logit manifest_health i.income_quartile, nolog

Logistic regression                             Number of obs     =        500
                                                LR chi2(3)        =      16.08
                                                Prob > chi2       =     0.0011
Log likelihood = -336.03101                     Pseudo R2         =     0.0234

---------------------------------------------------------------------------------
manifest_health |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
income_quartile |
             2  |  -.3858131   .2543692    -1.52   0.129    -.8843676    .1127414
             3  |  -.5148579   .2550893    -2.02   0.044    -1.014824    -.014892
             4  |  -1.027468   .2633775    -3.90   0.000    -1.543678   -.5112571
                |
          _cons |   .2736958   .1805631     1.52   0.130    -.0802013     .627593
---------------------------------------------------------------------------------

. contrast pw.income_quartile, pveffects

Contrasts of marginal linear predictions

Margins      : asbalanced

---------------------------------------------------
                |         df        chi2     P>chi2
----------------+----------------------------------
income_quartile |
      (linear)  |          1       14.97     0.0001
   (quadratic)  |          1        0.12     0.7290
       (cubic)  |          1        0.63     0.4272
         Joint  |          3       15.51     0.0014
---------------------------------------------------

--------------------------------------------------------
                |   Contrast   Std. Err.      z    P>|z|
----------------+---------------------------------------
income_quartile |
      (linear)  |  -.3590508   .0927953    -3.87   0.000
   (quadratic)  |  -.0316992   .0914931    -0.35   0.729
       (cubic)  |  -.0715914   .0901722    -0.79   0.427
--------------------------------------------------------

. 
. exit

end of do-file

.
Jun Xu

Join Date: Sep 2014

Posts: 33
#12

15 Jan 2018, 08:02

Joseph,

Thanks a lot for your help! I am trying to figure out what the contrast command is doing. I looked up the pw. operator and the pveffects option, but couldn't figure out what the linear, quadratic, cubic, and joint rows are doing exactly. Any pointer or reference would be greatly appreciated. Thanks.

Jun
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#13

15 Jan 2018, 16:53

They're orthogonal polynomial contrasts. See here, as well. The w in pw. requests observation-weighted contrasts, for when you have unbalanced data.
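
As a hedged worked illustration of what the (linear), (quadratic), and (cubic) rows compute: with four equally spaced, balanced levels, each row is a weighted sum of the four group log odds, with weights that sum to zero. The scaling used below (squared weights summing to 1/4) is an assumption inferred from the output in #11, not quoted from the manual. Writing \eta_k for the log odds in quartile k relative to quartile 1, the logit output in #11 gives \eta = (0, -0.3858131, -0.5148579, -1.027468), and the constant cancels because the weights sum to zero:

```latex
\begin{align*}
\text{linear}    &= \frac{-3\eta_1 - \eta_2 + \eta_3 + 3\eta_4}{4\sqrt{5}}
                  = \frac{0.3858131 - 0.5148579 - 3(1.027468)}{4\sqrt{5}}
                  \approx -0.3591 \\
\text{quadratic} &= \frac{\eta_1 - \eta_2 - \eta_3 + \eta_4}{4}
                  = \frac{0.3858131 + 0.5148579 - 1.027468}{4}
                  \approx -0.0317 \\
\text{cubic}     &= \frac{-\eta_1 + 3\eta_2 - 3\eta_3 + \eta_4}{4\sqrt{5}}
                  = \frac{-3(0.3858131) + 3(0.5148579) - 1.027468}{4\sqrt{5}}
                  \approx -0.0716
\end{align*}
```

These agree with the Contrast column in #11 to the digits shown. The linear row asks whether the log odds change steadily across quartiles; the quadratic and cubic rows pick up curvature around that straight line, and the Joint row tests all three at once.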
Tim Lynns

Join Date: Jul 2020

Posts: 1
#14

02 Jul 2020, 11:30

Thanks Joseph and Clyde. Does it even make sense to adjust for covariates (e.g., age) in the logistic regression model, and then check which contrasts (linear, quadratic, cubic, etc.) contribute significantly to the differences (in what?) between the income quartile groups? Something like:

logit manifest_health i.income_quartile age, nolog
contrast pw.income_quartile, pveffects

In this scenario, what is the contrast command comparing -- the predicted probability based on the logistic regression with income quartile and age as covariates, or just with income quartile as the covariate?
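
A hedged sketch on simulated toy data may help frame the question (poor_health, age, the seed, and the simulated values are illustrative assumptions, not anyone's real analysis). In Stata, covariates are listed in the varlist separated by spaces. The contrast output is headed "Contrasts of marginal linear predictions", so the polynomial contrasts are taken on the log-odds scale from the quartile coefficients of whichever model was fit last; after the second logit below they are therefore adjusted for age.

```stata
* Toy data for illustration only; names, seed, and covariate are assumptions
clear *
set seed 20200702
quietly drawnorm latent_health income, double corr(1 0.25 \ 0.25 1) n(500)
generate byte poor_health = latent_health < 0          // 1 = poor health
egen byte income_quartile = cut(income), group(4)      // coded 0 through 3
quietly replace income_quartile = income_quartile + 1  // recode to 1 through 4
generate double age = rnormal(50, 10)                  // hypothetical covariate

* Unadjusted model: contrasts use the unadjusted quartile coefficients
logit poor_health i.income_quartile, nolog
contrast pw.income_quartile, pveffects

* Adjusted model: contrasts now use the age-adjusted quartile coefficients
logit poor_health i.income_quartile age, nolog
contrast pw.income_quartile, pveffects
```

Because this model has no quartile-by-age interaction, the differences in log odds between quartiles are the same at every age, so the contrasts do not depend on the age value at which the predictions are evaluated.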