Stata: Statistical significance test within groups?

Daniel Kak

Join Date: Apr 2020

Posts: 15
#1

24 Apr 2020, 06:18

Hi all,
I am stuck on the following problem: I need to determine whether length of stay (LOS) differs significantly between two groups, decedents and survivors, within different age groups. Can somebody help me with this? A similar scientific article used the Kruskal-Wallis test for this. Does anybody have ideas on how to do this in Stata?

I attached a screenshot of my example data.

Thanks a lot!

Daniel

Last edited by Daniel Kak; 24 Apr 2020, 06:25.
Tags: data, significance, stata
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

24 Apr 2020, 06:32

You are not showing the data, only the aggregated results (median and IQR). Please also be aware that screenshots are not an ideal way to share data.

If you wish to compare two age groups, the Mann-Whitney test could serve as the post-hoc test after the Kruskal-Wallis.
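
A minimal sketch of such a pairwise follow-up, assuming patient-level data are available. The variable names (los, dead, agegroup) and the simulated values are hypothetical and only serve to make the example runnable:

```stata
* Simulated patient-level data (hypothetical names and values)
clear
set seed 2020
set obs 300
* Three age groups coded 1, 2, 3
generate byte agegroup = 1 + floor(3*runiform())
* 1 = decedent, 0 = survivor
generate byte dead = runiform() < 0.3
* Length of stay in days
generate los = rpoisson(cond(dead == 1, 8, 15))

* Overall Kruskal-Wallis test of LOS across the three age groups
kwallis los, by(agegroup)

* Pairwise Mann-Whitney (Wilcoxon rank-sum) follow-up: age group 1 versus 2
ranksum los if inlist(agegroup, 1, 2), by(agegroup)
```

When several pairwise tests are run, the p-values would normally need an adjustment for multiple comparisons (Bonferroni, for instance).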

Just wondering, is this homework?

Best regards,

Marcos
Daniel Kak

Join Date: Apr 2020

Posts: 15
#3

24 Apr 2020, 06:39

Dear Marcos,

You are right, this is not the actual data. The p-value given is the significance of the difference in median length of stay between decedents and survivors. I would like to see whether there is a significant difference between the LOS of decedents and survivors in different age groups. I attached a table from another article; maybe that helps! Thanks for your reply anyway!
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17678
#4

24 Apr 2020, 06:49

Daniel:
usually this kind of issue can be dealt with using -regress-.
Obviously, taking LOS as the regressand (dependent variable) and the dead-or-alive categorical variable as the single predictor means throwing away many pieces of information that the study you quoted conveys (by the way: in future, please do not post screenshots. See the FAQ).
That said, you may want to try something like:

Code:

regress los i.alive i.comorbidities i.age i.Charlson

The -i.- prefix identifies categorical variables (see -fvvarlist-).

Obviously, Marcos' insightful comment about grouped data still applies.

Kind regards,
Carlo
(Stata 19.0)
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#5

24 Apr 2020, 12:41

By taking a look at the full picture, I am a little bit surprised. Apparently, the authors did 3 Kruskal-Wallis tests. So far, so good.

But taking los as the DV and, say, "Age group" as the IV, there is no room for another IV in a Kruskal-Wallis test, yet the model should encompass (at least) the survival group (decedents versus survivors).

Therefore, maybe it is just the -kwallis- test of los by age group, run once "if" survival == decedents and once "if" survival == survivors (hence, two extra tests).
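
A sketch of what those two stratified tests might look like. This is only a guess at what the article's authors did, and the variable names and simulated data are hypothetical:

```stata
* Simulated patient-level data (hypothetical names and values)
clear
set seed 2020
set obs 300
generate byte agegroup = 1 + floor(3*runiform())
generate byte dead = runiform() < 0.3
generate los = rpoisson(cond(dead == 1, 8, 15))

* Kruskal-Wallis test of LOS across age groups, decedents only
kwallis los if dead == 1, by(agegroup)

* Same test, survivors only
kwallis los if dead == 0, by(agegroup)
```

Each call compares age groups within one survival stratum; neither compares decedents with survivors directly.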

If we take a look at the title in bold letters, the results are fundamentally related to the decedents.

Again, the table is not 100% clear. In short, it is puzzling, to say the least.

Little wonder that people who died before leaving the hospital will have on average, well, a shorter los if compared with the survivors.

But I fear this sort of comparison is too problematic, truncation being an issue.

Anyway, for los, a Poisson-like regression could be used instead.

Last edited by Marcos Almeida; 24 Apr 2020, 12:43.

Best regards,

Marcos

Daniel Kak

Join Date: Apr 2020
Posts: 15
#6

26 Apr 2020, 03:44

Dear Marcos and Carlo,

Thanks for taking the time to look at my problem. I conclude that a regression is necessary, as you said.

I ran regress LOSintdis i.Event i.agegroup (Event = dead/alive):

Code:

. regress LOSintdis i.Event i.agegroup

   LOSintdis |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     1.Event |  -12.44521   .7785079   -15.99   0.000    -13.97188   -10.91854
             |
    agegroup |
          2  |  -3.597032   1.342318    -2.68   0.007    -6.229339   -.9647255
          3  |  -3.893011   1.325733    -2.94   0.003    -6.492795   -1.293227
             |
       _cons |   25.15311   1.218395    20.64   0.000     22.76382     27.5424

I am now wondering which p-value to use and whether it is similar to the one in the example article (and whether this is the right way to do it). I hope you understand my question! I am trying to learn Stata but I am not an expert yet...
Thanks!

Last edited by Daniel Kak; 26 Apr 2020, 03:55.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17678
#7

26 Apr 2020, 04:41

Daniel:
actually, the -regress- output conveys far more than a simple -ttest- or median comparison, as you can see the contribution of each predictor (adjusted for the remaining ones) in explaining variation in LOS.
In addition, the p-values are in line with those obtained via the comparisons that you mentioned in your first post.
That said:
1) I would have added i.comorbidities and i.Charlson to the right-hand side of the OLS equation (and tested whether they are perfectly collinear with the other predictors; if that were the case, Stata would omit one of the collinear predictors automatically);
2) the results of a regression model should not be rolled out and disseminated without careful postestimation checks (see -estat hettest- and, more substantively, -estat ovtest- or, alternatively, -linktest-).
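
The postestimation checks in point 2 can be sketched as follows. The data are simulated, and the variable names, including a continuous age covariate, are assumptions made only so the example runs on its own:

```stata
* Simulated patient-level data (hypothetical names and values)
clear
set seed 2020
set obs 300
generate byte dead = runiform() < 0.3
generate age = 40 + 50*runiform()
generate los = rpoisson(cond(dead == 1, 8, 15))

* OLS with one categorical and one continuous predictor
regress los i.dead c.age

* Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
estat hettest

* Ramsey RESET test using powers of the fitted values
estat ovtest

* Link test: a significant _hatsq hints at misspecification
linktest
```

Note that -linktest- replaces the stored estimation results with its own auxiliary regression, so rerun -regress- before any further postestimation commands.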

Last edited by Carlo Lazzaro; 26 Apr 2020, 04:44.

Kind regards,
Carlo
(Stata 19.0)
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#8

26 Apr 2020, 11:31

This is just to add that, if you’re dealing with length of stay, a Poisson-like model (see -help poisson- and -help nbreg-) may fit well.
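
A minimal sketch of the two models, using simulated overdispersed counts. The variable names and the data-generating values are hypothetical:

```stata
* Simulated overdispersed count data (hypothetical names and values)
clear
set seed 2020
set obs 300
generate byte agegroup = 1 + floor(3*runiform())
generate byte dead = runiform() < 0.3
generate mu = cond(dead == 1, 8, 15)
* Negative binomial draws with mean mu
generate los = rnbinomial(2, 2/(2 + mu))

* Poisson regression; irr reports incidence-rate ratios
poisson los i.dead i.agegroup, irr

* Goodness-of-fit tests; small p-values suggest the Poisson model fits poorly
estat gof

* Negative binomial regression; the output includes an LR test of alpha = 0
nbreg los i.dead i.agegroup, irr
```

If the LR test of alpha = 0 reported by -nbreg- rejects, that points toward the negative binomial rather than the Poisson model.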

Best regards,

Marcos
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17678
#9

26 Apr 2020, 11:42

Marcos wisely highlights a long-lasting debate in applied health economics: is LOS a continuous variable (and this makes things easier) or does it (potentially) follow a -poisson- distribution, as LOS is a count variable that can only take on integer values >=0 (ie, days)?

Kind regards,
Carlo
(Stata 19.0)
Daniel Kak

Join Date: Apr 2020

Posts: 15
#10

27 Apr 2020, 07:12

Dear Carlo and Marcos,

I think it follows a Poisson-like distribution, but I am not sure. If I make a histogram, events seem to occur in the beginning. Also, the LOS variable cannot have a value like 1.3; it is a whole number like 0/1/2. Do I understand correctly that it needs to be a Poisson-like model, and so a Poisson regression?

Kind regards,
Daniel
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17678
#11

27 Apr 2020, 10:38

Daniel:
count variables can only take on integer values >=0. For instance, 1.3 can well be the mean of LOS but not the number of days spent in an inpatient setting by a given patient.
I'm not clear what you mean by "events seem to occur in the beginning". Do you mean that the LOS distribution shows a spike at 1 day? Or something else? Please clarify.
Usually, patients admitted to an inpatient hospital setting (ie, who are not in a day-hospital setting) total LOS=0 if they tragically pass away on the very day of hospitalization or if they are transferred to another health care facility just after being hospitalized.
Conversely, if your zeros are a mix of data concerning those who accessed the Emergency Department but were then discharged without hospitalization (who totalled 0 because they actually were not hospitalized) and those who totalled 0 for the reasons explained above, you have a more complex situation to handle, as your zeros actually belong to two different data generating processes.
If that were the case, -poisson- won't do a good job and you should switch to -zip-. The Stata .pdf manual entry for -zip- is clear and offers many interesting references on this topic; my favourite one (Specification and testing of some modified count data models. Journal of Econometrics 1986;33: 341–365.) is a pivotal article by a towering health econometrician, who is also a regular contributor to this list, John Mullahy.
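
A minimal sketch of a zero-inflated Poisson fit on simulated data with two sources of zeros. The variable names, including the notadmitted flag used only to generate the data, are hypothetical:

```stata
* Simulated data with two sources of zeros (hypothetical names and values)
clear
set seed 2020
set obs 500
generate byte dead = runiform() < 0.3
* Seen in the emergency department but never admitted: structural zeros
generate byte notadmitted = runiform() < 0.2
generate los = cond(notadmitted == 1, 0, rpoisson(cond(dead == 1, 8, 15)))

* Zero-inflated Poisson: count equation for LOS plus a
* constant-only logit equation for the excess zeros
zip los i.dead, inflate(_cons)
```

In real data the inflate() equation would usually contain predictors of never being admitted rather than just a constant.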

Kind regards,
Carlo
(Stata 19.0)
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#12

27 Apr 2020, 16:10

Just a side note after Carlo’s insightful reply. A good starting point is a Poisson model. Then, if you have count data with, say, ‘lots’ of zeroes, you may select a zero-inflated model; conversely, if you have, say, los >=1, you may select a zero-truncated model. Beware that overdispersion is usually an issue to deal with. All in all, please start by gaining a good knowledge of the matter before jumping on this (remarkable) bandwagon.
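
A sketch of the zero-truncated case, assuming only patients with at least one day of stay are recorded. The data are simulated and the variable names are hypothetical:

```stata
* Simulated data (hypothetical names and values)
clear
set seed 2020
set obs 500
generate byte dead = runiform() < 0.3
generate los = rpoisson(cond(dead == 1, 3, 5))
* Keep only stays of at least one day, so LOS is truncated at zero
drop if los == 0

* Zero-truncated Poisson; a truncation point of 0 is the default for ll()
tpoisson los i.dead, ll(0)

* Rough overdispersion check: a variance far above the mean is a warning sign
tabstat los, by(dead) statistics(mean variance)
```

The mean-variance comparison is only a crude first look; a formal comparison against a negative binomial model is the more usual route.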

Best regards,

Marcos