trend test after linear and logistic regression: neonatology

Carlo Pietrasanta

Join Date: Dec 2015

Posts: 5
#1

trend test after linear and logistic regression: neonatology

29 Dec 2015, 09:03

Dear Stata users,

I am a fellow in neonatology.
My knowledge in statistics is not very deep and I am quite new to Stata, trying to learn upon a previous output generated by a statistician of my hospital, so please apologize if my words are not correct.
I am analyzing a database of neonatal and maternal data, with the goal of establishing if there is a correlation between the different grades and stages of intrauterine inflammation (chorioamnionitis and funisitis in placental pathology) and neonatal adverse outcomes.

I am running linear (using xtreg command) and logistic (using logit command) analysis on the variables, stratifying according to grade/stage of inflammation and adjusting for different variables as gestational age and birth weight (I know that gestational age in this case could be a collider, I had long and useful discussion with my statistician but definitely the world of neonatologists still adjusts for gestational age, and so I do by now).

This is an example of output for linear regression, adjusted for e.g. (gestational age):

Summary for variables: apgar1
by categories of: totstage (totstage)

totstage | N min p50 mean max sd
---------+------------------------------------------------------------
0 | 668.0 0.0 8.0 7.3 10.0 1.8
1 | 73.0 0.0 8.0 6.9 9.0 2.1
2 | 34.0 1.0 5.5 5.5 9.0 2.2
3 | 26.0 1.0 5.0 4.8 9.0 2.5
---------+------------------------------------------------------------
Total | 801.0 0.0 8.0 7.1 10.0 2.0
----------------------------------------------------------------------
i.totstage _Itotstage_0-3 (naturally coded; _Itotstage_0 omitted)

Random-effects ML regression Number of obs = 801
Group variable: bracciale1 Number of groups = 600

Random effects u_i ~ Gaussian Obs per group: min = 1
avg = 1.3
max = 3

LR chi2(4) = 257.10
Log likelihood = -1520.6991 Prob > chi2 = 0.0000

------------------------------------------------------------------------------
apgar1 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Itotstage_1 | -.0199738 .2049183 -0.10 0.922 -.4216063 .3816588
_Itotstage_2 | -.5320909 .2998786 -1.77 0.076 -1.119842 .0556604
_Itotstage_3 | -1.070005 .3376757 -3.17 0.002 -1.731837 -.4081724
eg | .3592327 .0230176 15.61 0.000 .3141191 .4043463
_cons | -4.319941 .7449413 -5.80 0.000 -5.779999 -2.859883
-------------+----------------------------------------------------------------
/sigma_u | .7995537 .1466053 .5581774 1.14531
/sigma_e | 1.418514 .0794971 1.270955 1.583205
rho | .2411064 .0848319 .1081827 .4328017
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)= 7.06 Prob>=chibar2 = 0.004

This is an example of output for logistic regression (this time adjusted for 4 confounders, gestational age, bw, SGA and steroids use):

i.totstage _Itotstage_0-3 (naturally coded; _Itotstage_0 omitted)

Logistic regression Number of obs = 805
Wald chi2(7) = 124.86
Prob > chi2 = 0.0000
Log pseudolikelihood = -135.21677 Pseudo R2 = 0.4014

(Std. Err. adjusted for 602 clusters in bracciale1)
--------------------------------------------------------------------------------
| Robust
ivh | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
---------------+----------------------------------------------------------------
_Itotstage_1 | .8225153 .376481 -0.43 0.669 .3353765 2.017229
_Itotstage_2 | 1.120536 .6718614 0.19 0.849 .345983 3.629085
_Itotstage_3 | 1.144978 .798461 0.19 0.846 .2918798 4.491489
eg | .662688 .0800413 -3.41 0.001 .5229962 .8396912
bw | .998587 .0008331 -1.69 0.090 .9969554 1.000221
agasgafenton03 | .5515735 .2948085 -1.11 0.266 .1934852 1.572385
steroidi | 1.290917 .6427349 0.51 0.608 .486512 3.425336
_cons | 103420 270176.6 4.42 0.000 617.8858 1.73e+07
--------------------------------------------------------------------------------

My question is: which test can I run to say "there is/there is not a linear trend in each outcome (e.g. apgar1 or ivh) from _Itotstage_1 to _itotstage_3 ? ".
In other words, it is enough to run the regression models and use the Wald chi2 as a trend indicator, or do I have to run other post-regression tests?
I am trying with _nptrend and _tabodds commands, but I am not sure if it is correct to use them after regression analysis. besides, using _tabodds for generating Mantel-Haenszel odds ratios I am having troubles when trying to adjust for more than 1 variable (in my case, 4 variables, with a nonsense output).

Thank you very much for the time you can dedicate me!

Carlo
Tags: None
Richard Williams

Join Date: Apr 2014

Posts: 5011
#2

29 Dec 2015, 09:17

Welcome to Statalist. A couple of suggestions for asking your question more effectively:
Use code tags when presenting your code and output (see pt 12 of the FAQ). Even though you use a monospaced font, your output is very hard to read because spaces get deleted. Code tags avoid that.

Include the actual commands used, not just the output.

It looks like you are using the xi: prefix. Unless you are using an ancient version of Stata you should use factor variable notation instead. See -help fvvarlist-. The output will look nicer and you will have additional options with regards to post-estimation commands.

I wonder why you are using xtreg but not xtlogit?

None of this answers your questions, but hopefully it will help you to ask the questions more effectively.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://academicweb.nd.edu/~rwilliam/
Comment

Carlo Pietrasanta

Join Date: Dec 2015
Posts: 5

29 Dec 2015, 11:43

Thank you for your time Richard, very much appreciated!

- I will convert the codes to code tags, Courier looked fine in the draft but not in the published post.
- My version is Stata 13.0
- yes, I am using xi:prefix as this was done in the output I am studying on, but I will try the factor variable notation, focusing on post-estimation commands, thanks!
- same for xtreg, i will study on xtlogit. As I told, all my experience comes from an old output generated months ago, and I only remember that in our case we had a long discussion on the effect of twin birth (recorded as variable "bracciale1", to indicate the number of bracelet which is identical for twins

) as a possible confunder, so the statistician decided to run a random effect analysis considering the effect of twin birth (twins are more likely to have "similar" outcomes compared to 2 independent newborns)

these are the command for linear regression ("foreach" option to analyze multiple outcomes, here I report just the outcome Apgar1):

Code:

foreach Y of var maternalage orelatenza eg bw apgar1 apgar5 duratadegenza ggcvc ggtet ggniv dosisurf {
tabstat `Y', by(totstage) s(N min median mean max sd) format(%8.1f)
capture noisily xi: xtreg `Y' i.totstage, i(bracciale1) mle nolog
capture noisily xi: xtreg `Y' i.totstage eg bw agasgafenton03 steroidi, i(bracciale1) mle nolog
}

Example output:

Code:

Summary for variables: apgar1
     by categories of: totgrade (totgrade)

totgrade |         N       min       p50      mean       max        sd
---------+------------------------------------------------------------
       0 |     668.0       0.0       8.0       7.3      10.0       1.8
       1 |      73.0       0.0       8.0       6.9       9.0       2.1
       2 |      40.0       1.0       5.0       5.3       9.0       2.4
       3 |      20.0       1.0       5.0       4.8       9.0       2.3
---------+------------------------------------------------------------
   Total |     801.0       0.0       8.0       7.1      10.0       2.0
----------------------------------------------------------------------
i.totgrade        _Itotgrade_0-3      (naturally coded; _Itotgrade_0 omitted)

Random-effects ML regression                    Number of obs      =       801
Group variable: bracciale1                      Number of groups   =       600

Random effects u_i ~ Gaussian                   Obs per group: min =         1
                                                               avg =       1.3
                                                               max =         3

                                                LR chi2(3)         =     53.68
Log likelihood  = -1622.4085                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      apgar1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Itotgrade_1 |  -.3171051   .2352666    -1.35   0.178    -.7782192     .144009
_Itotgrade_2 |  -1.725816   .3116504    -5.54   0.000    -2.336639   -1.114992
_Itotgrade_3 |  -2.324815   .4245396    -5.48   0.000    -3.156897   -1.492732
       _cons |   7.242662   .0805059    89.96   0.000     7.084873     7.40045
-------------+----------------------------------------------------------------
    /sigma_u |   1.305504   .0971979                      1.128247     1.51061
    /sigma_e |   1.375178   .0718416                       1.24134    1.523446
         rho |   .4740264    .057851                      .3631797    .5869352
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)=   40.71 Prob>=chibar2 = 0.000
i.totgrade        _Itotgrade_0-3      (naturally coded; _Itotgrade_0 omitted)

Random-effects ML regression                    Number of obs      =       799
Group variable: bracciale1                      Number of groups   =       599

Random effects u_i ~ Gaussian                   Obs per group: min =         1
                                                               avg =       1.3
                                                               max =         3

                                                LR chi2(7)         =    281.53
Log likelihood  = -1516.2946                    Prob > chi2        =    0.0000

--------------------------------------------------------------------------------
        apgar1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
  _Itotgrade_1 |    .002677   .2052016     0.01   0.990    -.3995107    .4048647
  _Itotgrade_2 |  -.7715936   .2797955    -2.76   0.006    -1.319983   -.2232044
  _Itotgrade_3 |  -.9345552   .3825459    -2.44   0.015    -1.684331   -.1847789
            eg |   .3279281   .0412002     7.96   0.000     .2471772    .4086789
            bw |   .0002193   .0002259     0.97   0.332    -.0002235    .0006621
agasgafenton03 |  -.1458929   .1996317    -0.73   0.465    -.5371637     .245378
      steroidi |   .0873188   .1554466     0.56   0.574     -.217351    .3919886
         _cons |  -3.730229   1.032828    -3.61   0.000    -5.754535   -1.705922
---------------+----------------------------------------------------------------
      /sigma_u |   .7826163   .1497877                      .5378169    1.138842
      /sigma_e |   1.425328   .0797434                      1.277298    1.590513
           rho |    .231648   .0853609                       .099883    .4267548
--------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)=    6.51 Prob>=chibar2 = 0.005

and logistic regression:

Code:

foreach Y of var etnia tipodigrav tipodiparto tiporottura caclinica antibio steroidi preeclampsiadefinitasibai2003 tv01 ore_lat2 sex agasgafenton03 gemellarita rianimsp o2maxinsp decessoneonatale sepsi* ggcvc_cat ggtet_cat ggniv_cat dosisurf_cat2 bpd01 ivh* rds rmn pda nec rop* {
capture noisily xi: logit `Y' i.totstage, or iter(20) nolog cluster(bracciale1)
capture noisily xi: logit `Y' i.totstage eg bw agasgafenton03 steroidi, or iter(20) nolog cluster(bracciale1)
}

Output

Code:

------------------------------------------------------------------------------
             |               Robust
         ivh | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Itotgrade_1 |   1.678409   .7950092     1.09   0.274      .663301    4.247027
_Itotgrade_2 |   6.002586   2.472652     4.35   0.000     2.677331    13.45782
_Itotgrade_3 |     7.9125   3.926773     4.17   0.000     2.991465    20.92876
       _cons |   .0631912   .0112414   -15.52   0.000     .0445894    .0895532
------------------------------------------------------------------------------
i.totgrade        _Itotgrade_0-3      (naturally coded; _Itotgrade_0 omitted)

Logistic regression                               Number of obs   =        805
                                                  Wald chi2(7)    =     124.20
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -135.16091                 Pseudo R2       =     0.4016

                             (Std. Err. adjusted for 602 clusters in bracciale1)
--------------------------------------------------------------------------------
               |               Robust
           ivh | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
  _Itotgrade_1 |   .8213548   .3761204    -0.43   0.667     .3347671    2.015203
  _Itotgrade_2 |   1.235818   .7301186     0.36   0.720     .3882101    3.934074
  _Itotgrade_3 |   .9554344    .702838    -0.06   0.951     .2259637    4.039829
            eg |   .6615841   .0796516    -3.43   0.001     .5225218     .837656
            bw |   .9985903   .0008306    -1.70   0.090     .9969637     1.00022
agasgafenton03 |    .552358   .2954489    -1.11   0.267     .1936085    1.575858
      steroidi |   1.272392   .6064567     0.51   0.613     .4999367    3.238372
         _cons |   109487.4     285781     4.45   0.000     657.0227    1.82e+07
--------------------------------------------------------------------------------
i.totgrade        _Itotgrade_0-3      (naturally coded; _Itotgrade_0 omitted)

Logistic regression                               Number of obs   =        807
                                                  Wald chi2(3)    =      22.64
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -94.986359                 Pseudo R2       =     0.0910

                           (Std. Err. adjusted for 603 clusters in bracciale1)

Best and thank you again,
Carlo

Last edited by Carlo Pietrasanta; 29 Dec 2015, 11:49.

Comment

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17724
#4

29 Dec 2015, 14:45

Carlo may be interested in the following thread:
http://stats.stackexchange.com/quest...tic-regression.

Kind regards,
Carlo
(Stata 19.0)
Comment
Carlo Pietrasanta

Join Date: Dec 2015

Posts: 5
#5

29 Dec 2015, 16:33

Thank you Carlo, I will study that.
Actually I work at Boston Children's Hospital but I also come from Milan, Unimi, Ospedale Policlinico.

Regards,
Carlo
Comment
Carlo Pietrasanta

Join Date: Dec 2015

Posts: 5
#6

29 Dec 2015, 16:37

Anyway the post you suggested is related to "R" software i think: is there a corresponding command in Stata?
Thanks
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17724
#7

30 Dec 2015, 01:53

Carlo (I suppose you're member of Prof. Mosca's team) may want to try:

Code:

logit neonatal_AE i.totgrade eg bw <other_indepvars>

.

The undelying issue I see from your previous code is that you have too many clusters vs a limited sample size: this may downsize the statistical significance of your regression coefficients because each cluster has, on average, a small nummer of observations (and this will increase the cluster-adjusted standard errors).
As a closing-out aside, I would think about using panel data commands (such as -xtlogit-) only with periodically repeated measures on the same patients (I do not thnk this is the case with your dataset).
Eventually, if your depvar is binary (neonatal_AE yes/no), there's no methodological reason for using -regress- or -xtreg-.

Kind regards,
Carlo
(Stata 19.0)
Comment
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#8

30 Dec 2015, 05:07

Hello Carlo Pietrasanta,

Many insightful comments have already been given by Richard and Carlo Lazzaro. Since your issue relates to health sciences, though, I just wish to make a few remarks. The Apgar score, as you know, is ordinal and includes only integer numbers, therefore you have a discrete variable that goes from zero to 10, On account of that, maybe a linear regression wouldn't represent reality, once you have a truly ordinal variable. Perhaps - ologit - could do the trick for you (maybe transformed in 4 or 5 categories), including giving some sort of information on "trends" by checking the cutpoints. On a slightly different note, I see that the median Apgar for totgrade 0 and 1 is the same, as well as the median for totgrade 2 and 3. Considering that most of your sample is totstage =0, maybe you could try to aggregate at least totstage 2 and 3, and see if the difference would reach statistical signficance. Finally, I don't know what - ivh - means. If it is binary and you want to check your model according to 4 levels of totstage, I remind you could use - lroc - for this, and estimate the AUC as well as the ROC curve for the levels of totstage. Last but not least, you may use - margins - and - marginsplot - to present a "trendy" view of totgrade under your model.

Hopefully that helps.

Best,

Marcos

Last edited by Marcos Almeida; 30 Dec 2015, 05:18.

Best regards,

Marcos
1 like
Comment
Carlo Pietrasanta

Join Date: Dec 2015

Posts: 5
#9

02 Jan 2016, 16:35

Hello to everybody,

many thanks to you all. Carlo, you are right about my italian team.
I tried what you and Richard suggest, which is to run all the analysis with factor variables instead of xi. I didn't see any significant change in my results, but it works and I agree that the output is smoother.
I also run -xtlogit as well as -xtreg analysis, as I am interested in random-effects for all my outcomes and I think it was a mistake.

Carlo and Marcos, your concern about sample size in the "high grade or high stage" groups perfectly fit my doubts about stratifying in too many groups: basically, what I am trying now is to perform different level of stratification and see where I can reasonably "push it" maintaining a meaningful sense of my work and, if not similar, acceptable sample sizes for all the group. The output I posted was just one of this tests, although what I am trying to prove is if there are (or there are not) different outcomes between low and high grades/stages of fetal perinatal inflammation. Aggregating too many groups would give a very "simpler" output under a clinical point of view, which would not be my goal. It's a balance, you know.
I am not sure to understand Carlo's suggestion about not using xtreg: yes, my -depvar are binary in logistic regression and not repeated on the same patient in linear regression. But as I got from my statistical confluent (yes, due to my bad statistical knowledge I need support!), the use of a random effect model allows to "weight" for the variable "twins", which is named in my dataset as "bracciale1": this way, our goal was to adjust for twin birth, as a possible confounder of random variability (two twins have some higher probably to have similar outcomes, thus are not completely independent). Please apologize my low-level explanation, I hope you can understand.
Marcos, thank you very much. I agree with -ologit suggestion for the Apgar score. about aggregating stages 2 and 3, unfortunately it is not a good idea as I said before.

I try to go deeper in the problem of "trend": I recently read a paper presenting a dataset and an analysis very similar to what I am trying to do: the Authors used "linear-by-linear association or Jonckheere-Terpstra test" to identify a trend: this, specifically, is the subject of my question. I surfed a lot on the web trying to study these tests, finding as always many different point of views and much confusion. Shall I run these (or other) trend's test as a post-estimation analysis or the following parts of my outputs (LR chi2 for linear and Wald chi2 for logistic) represent a sufficient indicator of trend?

Code:

LR chi2(3) = 53.68 Log likelihood = -1622.4085 Prob > chi2 = 0.0000

Code:

Logistic regression Number of obs = 805 Wald chi2(7) = 124.20 Prob > chi2 = 0.0000 Log pseudolikelihood = -135.16091 Pseudo R2 = 0.4016

Thank you to you all and have a nice sunday

Carlo
Comment
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#10

03 Jan 2016, 05:45

Carlo Pietrasanta The Jonckheere-Terpstra test has its background in the medians. As I remarked, you have 2 equal medians for the first 2 groups and 2 equal medians for the last two groups. I doubt if a trend can be "statistically" demonstrated. What is more, if so, that doesn't mean you truly have a trend, for the very same reason concerning the medians. With regards to the mean, we have hints there might well be a trend, but, again, the n for the groups differ much, you have 2 subgroups with a tiny size, the SDs increase on a par with a decrease in the mean in the last 2 groups and, on these grounds, you are bound to get a non significant p-value of trend. That said, even if you, as you said, "push it" and go for the Jonckheerre-Terpstrat test, I fear you'll need to face (at least) 2 important aspects: first, this test is indicated for independent samples, and you informed us that you have "twin" observations. Second, this test is just for the group variable and the dependent variable, i.e,, it is a totally "unadjusted" test and according to what you presented, I believe you need to adjust for an array of covariates. My concerns and suggestions about "adjusted" tests are already expressed in #8.

Hopefully that helps!

Best,

Marcos

Last edited by Marcos Almeida; 03 Jan 2016, 06:00.

Best regards,

Marcos
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17724
#11

03 Jan 2016, 12:01

Carlo:
as an aside to excellent Marcos' insights, I would only add thatyou can take into account the higher similarity between twins via clustering the standard errors of your regressione coefficients on "bracciale1" (meaning "wristband1" in Italian"):

Code:

logit neonatal_AE i.totgrade eg bw <other_indepvars>, vce(cluster bracciale1)

.
Eventually, it is advisable to post what you typed and what Stata gave you back (including the regression outcome), so that we can see the same big picture you describe in your message,
I reciprocate nice sunday everybody (in Milan is fading away, whereas you've got still time for relaxing yourself across the pond!).

Kind regards,
Carlo
(Stata 19.0)
Comment

Announcement