  • linear regression query

    Hi all

    I'm trying to find out whether the difference in time to surgery between anticoagulated and non-anticoagulated patients varies across four different countries. My hypothesis is that there is substantial variation between countries in the time difference between the two patient groups.

    I have tried to approach this via linear regression, but I am not sure whether I have done this correctly...

    The command I am using is

    regress timetosurgeryinhours ib0.country ib2.patient_anticoagulant_med, base

    [Attached image: 1.PNG]

    My confusion is about whether this is actually giving me the difference in time (hours) between the anticoagulated and non-anticoagulated patient groups. If it is, is the coefficient for anticoagulation status specific to each country, or is it an overall estimate using the entire dataset? The latter would obviously be incorrect for my purposes.

    . dataex timetosurgeryinhours country patient_anticoagulant_med if n<=20

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(timetosurgeryinhours country) byte patient_anticoagulant_med
    36.434105 0 1
     36.23081 0 1
    36.049545 0 1
    36.334103 0 2
     36.03004 0 2
    111.29443 0 2
        36.35 0 2
     36.41846 0 2
    36.165154 0 2
     36.39701 0 2
     36.11144 0 2
    65.655876 0 2
    36.016186 3 2
     36.25143 0 2
    36.015553 1 2
     36.03304 1 2
    36.047688 0 2
     36.17926 0 2
     36.00271 1 2
     48.01098 0 2
    end
    label values country country
    label values patient_anticoagulant_med patient_anticoagulant_med_
    label def patient_anticoagulant_med_ 1 "Yes", modify
    label def patient_anticoagulant_med_ 2 "No", modify
    ------------------ copy up to and including the previous line ------------------

    Listed 20 out of 9837 observations


  • #2
    Originally posted by ahmed farhan
    . . . is the coefficient for whether patient is anticoagulated or not specific to each country or just overall using entire dataset?
    It's the latter. If you want the former, then you'll need to include a country × anticoagulation interaction in your regression model.
    Code:
    help fvvarlist
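As a rough sketch of the interaction model described above (assuming the dataset and variable names from the dataex example in #1 are in memory), the `##` operator adds both main effects and the country × anticoagulant interaction, and `margins` with `dydx()` then reports the difference between the two patient groups within each country:

```stata
* Sketch only: assumes the data from #1 are loaded.
* ## expands to both main effects plus their interaction.
regress timetosurgeryinhours ib0.country##ib2.patient_anticoagulant_med

* Difference in hours (Yes vs. the base level, No), separately for each country
margins country, dydx(patient_anticoagulant_med)
```

With a main-effects-only model, the single anticoagulant coefficient is forced to be the same in every country, which is why the interaction is needed for country-specific differences.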
    You have time-to-event data and so you also might want to consider alternatives to linear regression.
    Code:
    help glm
    help st
    You could give this blog post a look, too.


    • #3


      Thanks, that makes sense.
      So I have done the following and got the results below.
      Am I right in interpreting this to say that the difference in time to surgery between anticoagulated and non-anticoagulated patients in country 3 is +33.76 hours compared to country 0?




      [Attached image: 2.PNG]



      • #4
        I am also looking to make inferences; however, timetosurgeryinhours is unfortunately not normally distributed (see below). I have log-transformed the data, and the result appears more appropriate (see below). However, I am not sure how to interpret the coefficient values in the new regression model using the log-transformed data. If I exponentiate, say, the coefficient for country 3 (0.691688), the value is 1.997 (hours?), which does not seem correct.
        Thank you for your help.



        [Attached image: logg.PNG]
        [Attached image: log.png]
        [Attached image: standard.png]


        • #5
          Originally posted by ahmed farhan
          Am I right in interpreting this to say that the difference in time to surgery between anticoagulated and non-anticoagulated patients in country 3 is +33.76 hours compared to country 0?
          You can try testparm just to be sure, but eyeballing the interaction terms in the regression table output, it seems as if you don't have an interaction of country and anticoagulation status. So my interpretation would be that there's evidence of a difference in time to surgery between the countries, but not necessarily one that differs depending upon anticoagulation treatment.

          Originally posted by ahmed farhan
          I am also looking to make inferences; however, timetosurgeryinhours is unfortunately not normally distributed (see below). I have log-transformed the data, and the result appears more appropriate (see below). However, I am not sure how to interpret the coefficient values in the new regression model using the log-transformed data.
          glm (help file referred to above) with a log link function would help avoid the need for back-transformations. And do take a look at that blog post linked to above.
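The two suggestions above could look roughly like this, assuming the data and variable names from #1. The `family(gamma)` choice is an assumption made here for a positive, right-skewed outcome; it is not something specified in the thread.

```stata
* Sketch only: assumes the data from #1 are loaded.
regress timetosurgeryinhours ib0.country##ib2.patient_anticoagulant_med

* Joint Wald test that all country-by-anticoagulant interaction coefficients are zero
testparm i.country#i.patient_anticoagulant_med

* The log link models the log of the mean outcome, so no back-transformation of y is needed.
* family(gamma) is an assumed choice, not taken from the thread.
glm timetosurgeryinhours ib0.country##ib2.patient_anticoagulant_med, family(gamma) link(log) vce(robust)
```

Exponentiated coefficients from the log-link model are ratios of mean times, not differences in hours.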


          • #6
            Ahmed:
            two asides about your original post:
            1) with such a large sample, it is unlikely that default standard errors are the way to go (that is, have you already checked for heteroskedasticity and autocorrelation of the epsilon?);
            2) normality is only a (weak) requirement for the distribution of the residuals (epsilon).
            Kind regards,
            Carlo
            (Stata 19.0)


            • #7
              Originally posted by Carlo Lazzaro
              Ahmed:
              two asides about your original post:
              1) with such a large sample, it is unlikely that default standard errors are the way to go (that is, have you already checked for heteroskedasticity and autocorrelation of the epsilon?);
              2) normality is only a (weak) requirement for the distribution of the residuals (epsilon).
              Thanks. It turns out there is heteroskedasticity even when using the log-transformed data. I have tried a square-root transformation, which removes the heteroskedasticity, but the data then deviate from normality on visualisation ... By point two, did you mean that it would be okay to make inferences despite this?


              • #8
                Originally posted by Joseph Coveney
                You can try testparm just to be sure, but eyeballing the interaction terms in the regression table output, it seems as if you don't have an interaction of country and anticoagulation status. So my interpretation would be that there's evidence of a difference in time to surgery between the countries, but not necessarily one that differs depending upon anticoagulation treatment.

                glm (help file referred to above) with a log link function would help avoid the need for back-transformations. And do take a look at that blog post linked to above.
                Thanks, glm sounds useful and I would use it, but it turns out my data are heteroskedastic even when log-transformed. I'm not really sure what to do now.

                Also, I appreciate there's no significant interaction, but out of interest, it would be the interaction coefficients rather than the country coefficients to report, right?

                Unfortunately, I did try reading that blog post but couldn't quite understand it. Also, I'm not sure I can use Poisson, as it is more related to counts and rates, whereas in my study each patient only undergoes a single operation.
                Last edited by ahmed farhan; 10 Sep 2023, 08:43.


                • #9
                  Should I go with the weighted least squares method? I don't seem to be winning here :/

                  Last edited by ahmed farhan; 10 Sep 2023, 08:15. Reason: nevermind, tried this and turns out doesnt allow "factor-variable and time-series operators not allowed"


                  • #10
                    Here is the Poisson regression as per that blog post. I presume the coefficients are in units of hours too? Also, I can't seem to work out the residuals, as Stata reports "option res not allowed" when I use the command "predict res, res". I'm also not sure whether the data are homoskedastic or heteroskedastic, as when I run the command "hettest" I get the message "last estimates not found". Any help would be much appreciated.

                    [Attached image: poisson1.PNG]


                    • #11
                      ahmed, several points.

                      1. You don't need to test for normality. You have lots of observations, and it's not clear what you would do if you reject.
                      2. You don't need to test for heteroskedasticity (in the linear case) or for the variance equal to the mean in the Poisson case. The vce(robust) options allow for fully robust standard errors.
                      3. You should almost certainly use ltimetosurgery in the linear model or use the exponential model and Poisson quasi-MLE. But you need to use vce(robust) in the former case.
                      4. You can see the linear model for ltimetosurgery and the exponential Poisson model give pretty similar results. To first order, the effect is a 17% longer time to surgery if given the anticoagulant medicine, for the base country.
                      5. The effect varies by country but not in a statistically significant way. Use the test command to jointly test the three interaction terms. I suspect they are not significant at even a high significance level.
                      6. Given the outcome of the test in 5, argue for the simpler model and reestimate using Poisson regression or linear regression with ltimetosurgery as the dependent variable.
                      7. To get a more precise estimate of the percentage effect, use

                      Code:
                      nlcom 100*(exp(_b[patient_anticoagulent.Yes]) - 1)
                      You should read up on how to interpret coefficients in a linear model when the dependent variable is in logarithmic form. They have approximate percentage change interpretations when multiplied by 100.
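Steps 5 to 7 might be sketched as follows, assuming the data and variable names from #1 and assuming ltimetosurgery is the natural log of timetosurgeryinhours (its definition is not shown in the thread). Stata addresses a factor-variable coefficient inside `_b[]` by its level, so the "Yes" group (coded 1 in #1) is written `1.patient_anticoagulant_med`.

```stata
* Sketch only: assumes the data from #1 are loaded.
* Assumed definition of the log outcome; skip this line if the variable already exists.
generate double ltimetosurgery = ln(timetosurgeryinhours)

* Step 5: full model, then a joint test of the interaction terms only
regress ltimetosurgery ib0.country##ib2.patient_anticoagulant_med, vce(robust)
testparm i.country#i.patient_anticoagulant_med

* Step 6: simpler model without the interaction (linear in logs)
regress ltimetosurgery ib0.country ib2.patient_anticoagulant_med, vce(robust)
* Step 7: percentage effect of anticoagulation, with a delta-method standard error
nlcom 100*(exp(_b[1.patient_anticoagulant_med]) - 1)

* Step 6 again, as Poisson quasi-MLE on the untransformed outcome
poisson timetosurgeryinhours ib0.country ib2.patient_anticoagulant_med, vce(robust)
nlcom 100*(exp(_b[1.patient_anticoagulant_med]) - 1)
```

In the model with the interaction, the same coefficient would instead be the effect in the base country only.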


                      • #12
                        Thanks Jeff.

                        For point 5, here are the results for the interaction after running both the linear and Poisson models; it appears to be significant?

                        When I try to run the command you posted in point 7, Stata returns the following error. I also tried the command without the label (i.e. using the value 1), which didn't work either.

                        patient_anticoagulant_med: operator invalid
                        r(198);


                        Also, I realised it was a percentage change after some extensive googling shortly after that post. I presume the above command is meant to convert it back to hours?

                        [Attached image: linear.PNG]
                        [Attached image: poissn.PNG]


                        • #13
                          Ahmed:
                          follow Jeff's enlightening guidance.
                          As far as your questions about my previous reply are concerned:
                          1) once you use -vce(robust)- to take heteroskedasticity into account, you should not check for heteroskedasticity again;
                          2) once you have invoked -vce(robust)- as per 1), the regression can be run (and its results read) without further problems.
                          Kind regards,
                          Carlo
                          (Stata 19.0)


                          • #14
                            Ahmed: No, the transformation does not "convert back to hours." It's a more precise estimate of the percentage effect when the effect is somewhat large. So, in the linear case, exp(.196) - 1 = .21652691 and so about 21.7%.

                            You didn't use the correct test. You don't want to include the country dummies themselves, only the interaction terms. In other words, the first three coefficients should not be in the test; only coefficients (4), (5), and (6). You want to test the null that the effect doesn't depend on country. The time to surgery clearly depends on country, so those three dummies must be included in the model.

                            Please read the FAQ about how to show your results between code delimiters. Using screenshots makes it difficult for us to copy and paste your commands.

                            If you want to have the effect in hours, use Poisson regression and the margins command for the patient_anticoagulant_med variable.
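A minimal sketch of the last suggestion, assuming the data and variable names from #1 and the simpler model without the interaction. The first line only reproduces the arithmetic quoted above.

```stata
* Arithmetic check of the percentage effect quoted above
display exp(.196) - 1

* Sketch only: assumes the data from #1 are loaded.
poisson timetosurgeryinhours ib0.country ib2.patient_anticoagulant_med, vce(robust)

* Predicted mean time to surgery (hours) for each patient group
margins patient_anticoagulant_med

* Average difference in hours, Yes vs. the base level (No)
margins, dydx(patient_anticoagulant_med)
```

Because `margins` works on the predicted mean by default after `poisson`, both results are on the scale of the outcome (hours) rather than on the log scale.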


                            • #15
                              Thank you for getting back to me.
                              I'm more interested in seeing whether the difference in timetosurgery between the two groups (anticoagulated and non-anticoagulated) varies between countries. In other words, is the timetosurgery difference between the two patient groups in country 0 the same as in countries 1, 2 and 3? I'm not sure this is telling me that.

                              Also, here is what I get with the margins command. I'm not sure what this is telling me, though. The values look too large to be differences between patient groups, and how do I get it to compare against country 0 (as the base)?

                              . margins country

                              Predictive margins                              Number of obs     =      9,813
                              Model VCE    : Robust

                              Expression   : Predicted number of events, predict()

                              ------------------------------------------------------------------------------
                                           |            Delta-method
                                           |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
                              -------------+----------------------------------------------------------------
                                   country |
                                        0  |   42.42029    .409571   103.57   0.000     41.61754    43.22303
                                        1  |   41.60721    1.09005    38.17   0.000     39.47075    43.74367
                                        2  |    47.6196   1.926301    24.72   0.000     43.84412    51.39508
                                        3  |    77.8627   2.692047    28.92   0.000     72.58639    83.13902
