  • Trend Test for Slope Coefficients of a Set of Dummy Variables

    Dear Statalisters,

    I have a statistical question, related I believe to trend tests, that arises from a substantive problem. In the public health literature, the SES-health gradient is well known: as one's SES level increases, so does one's health status. Suppose I have an income variable with four levels, corresponding to the first, second, third, and fourth quartiles, and I use the fourth quartile as the reference category, with Q1, Q2, and Q3 as dummy indicators for the first three levels. To simplify, I will use a linear regression model with only the dummy variables on the right-hand side, where y* is a latent variable with higher values denoting poorer health status:

    y* = b0 + b1Q1 + b2Q2 + b3Q3 + e

    One way to see if the gradient holds is to test, simultaneously, whether 1) b1, b2, and b3 are all positive and significantly different from zero; 2) b1 is greater than b2 and b3; and 3) b2 is greater than b3. There are different options for this; an awkward one, for example, would be the Bonferroni correction.

    I am wondering whether there is a trend test I can use to show that the coefficients, from b1 to b3, decrease in magnitude in a roughly linear trend, and how to do that in Stata. Thanks a lot!

    Jun Xu, PhD
    Professor and Graduate Director
    Department of Sociology
    Ball State University

  • #2
    This is very confusing. You say that y* is a latent variable. But the equation you show cannot be identified without some observables that indicate y*. Can you clarify?


    • #3
      Sorry for the confusion. y* could be the latent variable in a binary or ordinal regression model, given that the probability distribution is known, such as normal or logistic. For example, self-rated health could be binary (good or poor) or ordinal (excellent 1, very good 2, good 3, fair 4, poor 5). Or I can replace y* with an observed y, and then we have a linear regression. Hope this clarifies.


      • #4
        OK. Well, I would take a slightly different approach and not base it on a regression. There is a command:

        Code:
        nptrend outcome, by(income_quartile)
        which will work well whether the outcome variable you are examining for trend is continuous or discrete, and where income_quartile is a single variable coded 1 through 4.


        • #5
          Clyde,

          Thanks a lot! I wasn't very clear in the first post. The dependent variable is binary, with one denoting poor health and zero good health. My goal is to see whether there is a linear trend in the set of dummy variables for income. I think I should be able to use nptrend for the example that I described above, but in my real analysis I have a set of covariates (other independent variables). In that case, how can I test whether there is a linear trend in the set of coefficients for income? Thanks.

          Jun


          • #6
            Check out https://www.stata.com/support/faqs/s...est-for-trend/, particularly the approaches offered towards the end.


            • #7
              Thanks a lot!


              • #8
                Clyde or anyone else,

                I read through the Stata post on trend tests. I am wondering if I could just recode the income variable into an ordinal income variable (call it incmLin), such that incmLin equals 1 if one's income is within the first quartile, 2 if in the second quartile, 3 if in the third, and 4 if in the fourth. Then I would simply run a binary logit regression of self-rated health on incmLin along with other controls. If the coefficient for this variable is negative (poor health is coded 1 and good health 0) and significant, can we then say there is a linear trend between income and the log odds of poor health? Any pointer or help would be greatly appreciated.

                Jun


                • #9
                  Yes and no. The problem with this approach is that the coding as 1, 2, 3, 4 is arbitrary, and it implicitly models the difference between, say, the 4th and 2nd quartiles as being the same as the difference between the 3rd and 1st. If that's true, then great. But if not, you could end up failing to find a relationship that would show up if you coded them as, say, 1, 2, 4, 8 or some other monotone transformation of 1, 2, 3, 4. You are converting ordinal information into interval information arbitrarily, and the results will be sensitive to the particular way you do it.

                  Why don't you just model your outcome against the income variable itself instead of the quartiles? Taking a continuous variable and breaking it up into groups is occasionally useful for descriptive purposes, but when used for analysis it creates problems such as this, and it also discards information. Let's suppose the 75th percentile income is 1000 currency units. Using quartiles then says that a person whose income is 1000 units is the same as a person whose income is 100000 units, but is radically different from one whose income is 999 units. Grouping continuous variables discards information and introduces distortions. I think you should just not use this approach if the income variable itself is available to you.

                  That said, if you insist on proceeding with the quartiles, what about an approach where you carry out your regression and then follow it with:
                  Code:
                  contrast a.quartile
                  That will give you regression adjusted contrasts of the outcome in each quartile with the next quartile up, as well as a Joint test of all three of those comparisons.
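
The two approaches discussed in #8 and #9 (quartiles scored 1 to 4 versus quartiles left as categories) can be compared directly. The following is a minimal sketch, not anyone's actual analysis: the toy data and the variable names poor_health, quartile, and age are made up for illustration. Because the scored model is nested in the categorical one, a likelihood-ratio test with 2 degrees of freedom gives a rough check on whether the equal-spacing assumption costs much fit.

```stata
* Hypothetical toy data; all names and parameter values are assumptions
clear
set seed 12345
set obs 1000
generate double income = rnormal()
generate double age = runiform(20, 80)
xtile quartile = income, nquantiles(4)      // 1 = lowest, 4 = highest
generate byte poor_health = runiform() < invlogit(-0.2 - 0.4*income + 0.02*(age - 50))

* Quartiles as categories (3 parameters), adjusted for age
logit poor_health i.quartile c.age, nolog
estimates store categorical

* Adjacent contrasts: each quartile versus the next one up, plus a joint test
contrast a.quartile

* Quartiles scored 1-4 and entered linearly (1 parameter)
logit poor_health c.quartile c.age, nolog
estimates store linear

* Likelihood-ratio test of departure from linearity (2 df)
lrtest categorical linear
```

A small p-value from lrtest would suggest that the 1, 2, 3, 4 scoring is too restrictive; a large one only says the data do not contradict it.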


                  • #10
                    Clyde,

                    Thank you very much for your help. I can understand the common concerns about my approach, but it again goes back to my original problem, whether there is a (linear) trend in the coefficients for Q1, Q2, and Q3 in the equation:

                    Logit(y=1; poor health) = b0 + b1Q1 + b2Q2 + b3Q3 + other variables, where Q1-Q3 are dummy indicators for the income variable. I should add that in this case I probably have to use quartiles because I am using an international database, where a quartile measure might be more feasible than a continuous measure of income adjusted by currency exchange rates. My intention is to test the well-known health gradient for income. One strategy is to test the following hypotheses simultaneously: 1) b1 > b2; 2) b1 > b3; 3) b2 > b3. Or I can simply test if there is a linear trend in the coefficients.

                    In the post you referred me to, I also noticed that the outcome variable in the example is coded as 1 (good), 2 (better), and 3 (best), and William Sribney even suggested using a regression model. This made me think about recoding the income variable from binary indicators to an ordinal one, and then testing whether there is a trend across the ordinal income groups with regard to health. My understanding is that nptrend and ptrend are meant for bivariate cases.

                    It actually surprises me that I don't see much literature on this topic online after an extensive Google search. Thanks a lot!

                    Jun
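
One way to look at the pairwise hypotheses b1 > b2, b1 > b3, and b2 > b3 after an adjusted logit is sketched below. This is only an illustration on made-up data (poor_health, quartile, and age are hypothetical names). lincom and pwcompare report two-sided tests, so the ordering itself has to be read from the sign of each estimated difference and its confidence interval.

```stata
* Hypothetical toy data; all names and parameter values are assumptions
clear
set seed 4242
set obs 1000
generate double income = rnormal()
generate double age = runiform(20, 80)
xtile quartile = income, nquantiles(4)      // 1 = lowest, 4 = highest
generate byte poor_health = runiform() < invlogit(-0.2 - 0.4*income + 0.02*(age - 50))

* Fourth quartile as reference, so the coefficients on 1.quartile,
* 2.quartile, and 3.quartile play the roles of b1, b2, and b3
logit poor_health ib4.quartile c.age, nolog

* Pairwise differences between coefficients, with confidence intervals
lincom 1.quartile - 2.quartile              // b1 - b2
lincom 1.quartile - 3.quartile              // b1 - b3
lincom 2.quartile - 3.quartile              // b2 - b3

* All pairwise comparisons of the quartiles, Bonferroni-adjusted
pwcompare quartile, effects mcompare(bonferroni)
```

The Bonferroni adjustment here covers all six pairwise comparisons among the four quartiles, which is more conservative than adjusting for the three comparisons listed above.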


                    • #11
                      Originally posted by Jun Xu:
                      "Or I can simply test if there is a linear trend in the coefficients."
                      Yes; this is the way. It is trivial in Stata, especially when you use a factor variable to indicate quartiles of income and not separate manually generated indicator variables. See below. (Start at the "Begin here" comment. The stuff at the top is just to create an artificial toy dataset for illustration.)

                      . version 15.1

                      .
                      . clear *

                      . set seed `=strreverse("1425307")'

                      .
                      . quietly drawnorm latent_health income, double corr(1 0.25 \ 0.25 1) n(500)

                      .
                      . generate byte manifest_health = latent_health < 0 // Indicates poor health

                      .
                      . egen byte income_quartile = cut(income), group(4) // Income at least ith quartile

                      . quietly replace income_quartile = income_quartile + 1

                      .
                      . tabulate manifest_health income_quartile

                      manifest_h |               income_quartile
                           ealth |         1          2          3          4 |     Total
                      -----------+--------------------------------------------+----------
                               0 |        54         66         70         85 |       275
                               1 |        71         59         55         40 |       225
                      -----------+--------------------------------------------+----------
                           Total |       125        125        125        125 |       500


                      .
                      . *
                      . * Begin here
                      . *
                      . logit manifest_health i.income_quartile, nolog

                      Logistic regression                             Number of obs     =        500
                                                                      LR chi2(3)        =      16.08
                                                                      Prob > chi2       =     0.0011
                      Log likelihood = -336.03101                     Pseudo R2         =     0.0234

                      ---------------------------------------------------------------------------------
                      manifest_health |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                      ----------------+----------------------------------------------------------------
                      income_quartile |
                                   2  |  -.3858131   .2543692    -1.52   0.129    -.8843676    .1127414
                                   3  |  -.5148579   .2550893    -2.02   0.044    -1.014824    -.014892
                                   4  |  -1.027468   .2633775    -3.90   0.000    -1.543678   -.5112571
                                      |
                                _cons |   .2736958   .1805631     1.52   0.130    -.0802013     .627593
                      ---------------------------------------------------------------------------------

                      . contrast pw.income_quartile, pveffects

                      Contrasts of marginal linear predictions

                      Margins      : asbalanced

                      ---------------------------------------------------
                                      |         df        chi2     P>chi2
                      ----------------+----------------------------------
                      income_quartile |
                            (linear)  |          1       14.97     0.0001
                         (quadratic)  |          1        0.12     0.7290
                             (cubic)  |          1        0.63     0.4272
                               Joint  |          3       15.51     0.0014
                      ---------------------------------------------------

                      --------------------------------------------------------
                                      |   Contrast   Std. Err.      z    P>|z|
                      ----------------+---------------------------------------
                      income_quartile |
                            (linear)  |  -.3590508   .0927953    -3.87   0.000
                         (quadratic)  |  -.0316992   .0914931    -0.35   0.729
                             (cubic)  |  -.0715914   .0901722    -0.79   0.427
                      --------------------------------------------------------

                      .
                      . exit

                      end of do-file


                      .



                      • #12
                        Joseph,

                        Thanks a lot for your help! I am trying to figure out what the contrast command is doing. I checked the pw. operator and the pveffects option, but couldn't figure out what the linear, quadratic, cubic, and joint rows are exactly doing. Any pointer or reference would be greatly appreciated. Thanks.

                        Jun


                        • #13
                          They're orthogonal polynomial contrasts. See here, as well. The w is if you have unbalanced data.
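
To make the (linear), (quadratic), and (cubic) rows concrete: each is a weighted sum of the four quartiles' linear predictions (log odds), using the textbook orthogonal polynomial weights for four equally spaced levels. The arithmetic below uses the coefficients from the logit output in #11, with quartile 1 as the base level (coefficient 0). Because each set of weights sums to zero, the constant drops out. The scaling factors in the denominators are an assumption inferred from matching the posted output, not taken from the Stata documentation; the z statistics and p-values do not depend on them.

```latex
\begin{aligned}
\text{linear, } c=(-3,-1,1,3):\quad
  &\frac{-3(0) - 1(-0.3858131) + 1(-0.5148579) + 3(-1.027468)}{2\sqrt{20}}
   = \frac{-3.2114488}{8.9443} \approx -0.35905 \\[4pt]
\text{quadratic, } c=(1,-1,-1,1):\quad
  &\frac{1(0) - 1(-0.3858131) - 1(-0.5148579) + 1(-1.027468)}{2\sqrt{4}}
   = \frac{-0.1267970}{4} \approx -0.03170 \\[4pt]
\text{cubic, } c=(-1,3,-3,1):\quad
  &\frac{-1(0) + 3(-0.3858131) - 3(-0.5148579) + 1(-1.027468)}{2\sqrt{20}}
   = \frac{-0.6403336}{8.9443} \approx -0.07159
\end{aligned}
```

These agree with the reported contrasts of -.3590508, -.0316992, and -.0715914 up to rounding. A significant linear row together with non-significant quadratic and cubic rows, as in that output, is the pattern expected when the log odds change roughly linearly across quartiles.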


                          • #14
                            Thanks Joseph and Clyde. Does it even make sense to adjust for covariates (e.g., age) in the logistic regression model, and then check which contrasts (linear, quadratic, cubic, etc.) contribute significantly to the differences (in what?) between the income quartile groups, something like

                            logit manifest_health i.income_quartile age, nolog
                            contrast pw.income_quartile, pveffects


                            In this scenario, what is the contrast command comparing -- the predicted probability based on the logistic regression with income quartile and age as covariates, or just with income quartile as the covariate?
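
As a sketch of the adjusted case asked about here, on made-up data (the data-generating values and the age variable are assumptions): the header of the contrast output in #11 reads "Contrasts of marginal linear predictions", so the contrasts are formed on the log-odds scale from whatever model was fit last. In a model where age enters only as a main effect, the age term is the same in every quartile's linear prediction and cancels, so the contrasts compare age-adjusted log odds across quartiles.

```stata
* Hypothetical toy data; all names and parameter values are assumptions
clear
set seed 2718
set obs 1000
generate double income = rnormal()
generate double age = runiform(20, 80)
xtile income_quartile = income, nquantiles(4)   // 1 = lowest, 4 = highest
generate byte manifest_health = runiform() < invlogit(0.3 - 0.4*income + 0.02*(age - 50))

* Income quartile as a factor variable, adjusted for age
logit manifest_health i.income_quartile c.age, nolog

* Orthogonal polynomial contrasts of the age-adjusted log odds
contrast pw.income_quartile, pveffects

* Wald test that all three quartile coefficients are zero (3 df);
* this should agree with the Joint row of the contrast table
testparm i.income_quartile
```

If the model also included an interaction between quartile and age, the contrasts would depend on how age is averaged over, so the simple reading above applies only to the main-effects model.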
