  • t-test to check if two means are statistically different

    Hi,

    I have a dataset like the one below. Each patient has multiple admissions in a particular year, each with its own cost, and each patient falls in either sample a or sample b. The variable 'months' gives the number of months that the patient was alive in that year. I calculate the average monthly cost as total cost / total months, so for sample a the average monthly cost is 200/6 and for sample b it is 210/14. How do I apply a t-test to check whether these two means are statistically different?
    PT_ID Cost subsample months
    1 20 a 4
    1 30 a 4
    1 40 a 4
    2 10 b 6
    2 20 b 6
    2 30 b 6
    3 50 a 2
    3 60 a 2
    4 70 b 8
    4 80 b 8
    Thanks a lot!

    Regards,
    Ishwarya

  • #2
    I'm not sure how hospital costs are typically modeled nowadays, but how about something like the following?
    Code:
    version 17.0
    
    clear *
    
    input byte(PT_ID Cost) str1 subsample byte months
    1     20     a     4
    1     30     a     4
    1     40     a     4
    2     10     b     6
    2     20     b     6
    2     30     b     6
    3     50     a     2
    3     60     a     2
    4     70     b     8
    4     80     b     8
    end
    
    bysort PT_ID: generate byte last = _n == _N
    by PT_ID: generate int sco = sum(Cost)
    by PT_ID: generate int tim = sum(months)
    encode subsample, generate(grp) label(Subsamples)
    
    glm sco i.grp if last, family(poisson) link(log) exposure(tim) vce(robust)
    
    exit

    . . . 'months' gives the number of months that the patient is alive in that year.
    Hmm. Might be important, too.

    You might want to look into some kind of multivariate regression, say, a so-called mixed-response model.



    • #3
      Sorry, too fast: given "in a particular year", the data shown cannot span more than one year.

      So, modify the line
      Code:
      by PT_ID: generate int tim = sum(months)
      to
      Code:
      by PT_ID: assert months == months[1]
      and modify the estimation command as follows:
      Code:
      glm sco i.grp if last, family(poisson) link(log) exposure(months) vce(robust)



      • #4
        Thank you! Can you pls tell me what this command is doing? Is this a t-test?



        • #5
          Ishana:
          in line with your inputs, Joseph provided you with a -glm- regression with family(poisson) (that is, the regressand is assumed to come from a Poisson distribution, since you shared a count variable with the forum) and a log link between the regressand and the index function (that is, the right-hand side of the regression equation).
          Obviously, this is not a t-test, because you did not provide continuous data.
          That said, I do not think that you modeled hospitalization costs the right way.
          Unlike days of hospitalization (which can be considered discrete or, in some cases, continuous, as they sum up hours of hospitalization, which are continuous), hospitalization costs are continuous.
          Therefore, you should have reported the cost per patient totaled at each episode of hospitalization (and not their mean).
          This way you will end up with a longitudinal dataset that can be handled with -xtreg-.
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Hi Carlo and Joseph,

            Thank you. But I think I may not have been clear in my question. 'Cost' is the cost per admission. Say I have 100 admissions, of which 50 fall in sample A and 50 fall in sample B. If I want the mean cost per admission in sample A vs sample B and a test of whether they are statistically different, I can simply use the command ttest Cost, by(subsample). However, the problem is that I want to calculate the average monthly cost of all patients in sample A vs sample B. Therefore, I use Cost as well as months to calculate the average monthly cost as follows.

            Sum of the costs of all admissions in sample A / total number of months that patients in sample A were alive

            In the example, patients 1 and 3 (who were alive for 4 and 2 months respectively in that year) fall in sample A. So, I calculate their average monthly cost as (20+30+40+50+60)/(4+2) = 200/6. Similarly, for sample B, the average monthly cost is 210/14. My question is 'Is the average monthly cost for sample A statistically different from that of sample B?'
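
            The calculation described in this post can be sketched in Stata as follows. This is only an illustration on the ten example rows; the variable names months_once and monthly_cost are made up for the sketch and are not part of the original dataset.

            ```stata
            clear
            input byte(PT_ID Cost) str1 subsample byte months
            1 20 a 4
            1 30 a 4
            1 40 a 4
            2 10 b 6
            2 20 b 6
            2 30 b 6
            3 50 a 2
            3 60 a 2
            4 70 b 8
            4 80 b 8
            end

            * months repeats on every admission row, so keep it once per patient
            * (months_once is a hypothetical helper variable)
            bysort PT_ID: generate byte months_once = months if _n == 1

            * total cost and total months alive within each subsample
            * (the sum ignores the missing values of months_once)
            collapse (sum) Cost months_once, by(subsample)

            * average monthly cost = total cost / total months
            generate double monthly_cost = Cost / months_once
            list, noobs
            ```

            This reproduces the two descriptive rates only; it does not test whether they differ.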



            • #7
              Originally posted by Ishana Balan
              Can you pls tell me what this command is doing? Is this a t-test?
              It does test for a difference between the two samples, yes. For an actual Student's t-test, you could use regress, but because of the nature of the distribution of hospital costs, I chose a generalized linear model, instead, as more suitable. To see more of what that is all about, read the blog post here.

              . . . the problem is I want to calculate the average monthly cost of all patients in sample A vs sample B. . . . My question is 'Is the average monthly cost for sample A statistically different from that of sample B?'
              Yes, that was assumed: you're interested in whether the monthly rate of hospital cost (spending) differs between the two samples. And that's what the exposure(months) is for. The generalized linear model that I show tests whether cumulative hospital cost, expressed as a rate per month, differs between the two samples.
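
              As a sketch in equation form (the notation is an assumption made for illustration and does not come from the posts), with one row per patient i, the model fitted with exposure(months) can be written as:

              ```latex
              \log \mathrm{E}[\mathit{sco}_i] = \log(\mathit{months}_i) + \beta_0 + \beta_1 \, \mathbf{1}[\mathit{grp}_i = b]
              \quad\Longrightarrow\quad
              \frac{\mathrm{E}[\mathit{sco}_i]}{\mathit{months}_i} = \exp\bigl(\beta_0 + \beta_1 \, \mathbf{1}[\mathit{grp}_i = b]\bigr)
              ```

              Under this parameterization, exp(β0) is the monthly cost rate in sample a, exp(β0 + β1) is the rate in sample b, and the test of β1 = 0 in the glm output is the test that the two monthly rates are equal.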

              Because of the logarithmic link function, the difference in rates between samples is expressed as a ratio. You can see the ratio of the two rates with the eform option of the glm estimation command. If you want to see the two rates, themselves, you can look into the margins postestimation command. See their help files for more about the estimation and postestimation commands.
              Code:
              help glm
              help glm_postestimation##margins
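
              A minimal sketch of how the ratio and the two rates might be displayed, using the example data and the corrected commands from #2 and #3. Instead of margins, it uses nlcom to back-transform the coefficients, which is a substitute chosen for the sketch; the labels rate_a and rate_b are made up for illustration.

              ```stata
              version 17.0
              clear *

              input byte(PT_ID Cost) str1 subsample byte months
              1 20 a 4
              1 30 a 4
              1 40 a 4
              2 10 b 6
              2 20 b 6
              2 30 b 6
              3 50 a 2
              3 60 a 2
              4 70 b 8
              4 80 b 8
              end

              * one analysis row per patient: total cost on the patient's last row
              bysort PT_ID: generate byte last = _n == _N
              by PT_ID: generate int sco = sum(Cost)
              by PT_ID: assert months == months[1]
              encode subsample, generate(grp) label(Subsamples)

              * eform reports exp(b); the entry for 2.grp is the rate ratio b / a
              glm sco i.grp if last, family(poisson) link(log) exposure(months) ///
                  vce(robust) eform

              * back-transform to the monthly rate in each sample
              * (rate_a and rate_b are hypothetical labels)
              nlcom (rate_a: exp(_b[_cons])) (rate_b: exp(_b[_cons] + _b[2.grp]))
              ```

              Because the model has one parameter per sample, the estimated rates should coincide with the raw ratios 200/6 and 210/14 from the original question; what the model adds is the standard error and the test of the difference.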



              • #8
                Ishana:
                you may also want to consider:
                Code:
                . generate byte subsample_num = 0 if subsample == "a"
                . replace subsample_num = 1 if subsample == "b"
                . * if the next command returns error r(451), use instead: xtset PT_ID
                . xtset PT_ID months
                . xtreg Cost i.months i.subsample_num, fe
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Thank you, Joseph and Carlo. Much appreciated! I will read more about this.
