
  • Is the Tobit Regression appropriate for data ranging from 0-300? Censored at 300

    I have a dataset with bacteria colony counts. The study was conducted as a trial; we did not plan to dilute highly positive samples, and our countability limit was set at 300 colonies.

    So 0 = negative samples; 1 to 300 = countable positive samples; and 99999 = uncountable positive samples (more than 300 colonies).
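
    The coding described above can be made explicit before any censored model is fitted. The following is only a minimal sketch: the function name `recode` and the example values are hypothetical, and it assumes that 99999 is the only sentinel in the data and that a recorded 300 is an exact count rather than a censored one.

```python
SENTINEL = 99999    # code used in the post for "positive but uncountable"
UPPER_LIMIT = 300   # countability limit stated in the post


def recode(count):
    """Return (observed_value, right_censored) for one raw colony count.

    Assumption: a sentinel value means the true count exceeds UPPER_LIMIT,
    so it is stored at the limit and flagged as right-censored.
    """
    if count == SENTINEL:
        return UPPER_LIMIT, True
    if not 0 <= count <= UPPER_LIMIT:
        raise ValueError(f"unexpected count: {count}")
    return count, False


# Made-up illustrative values, not taken from the poster's data.
raw_counts = [0, 0, 12, 300, 99999, 45, 99999]
recoded = [recode(c) for c in raw_counts]
n_censored = sum(flag for _, flag in recoded)
```

    Keeping the censoring flag separate from the value avoids ever treating 99999 as a real count by accident.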

    I am interested in the impact of the intervention on the reduction of colony count. When I try to use Tobit regression, it seems that I have too many zeros (0) in the data.

    Any suggestions, please, given that I can't use linear regression because of the censoring?

    Thanks

  • #2
    Let me make sure I understand. The actual variable can take on negative values as well as values above 300, but the y variable you observe is censored at zero and 300? If that's true, and you want to know the effect on the underlying variable, then you can use a two-limit Tobit model. But that does assume that the underlying, uncensored variable follows a homoskedastic normal distribution.
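
    As a rough illustration of what a two-limit Tobit model assumes, here is a sketch of its log-likelihood under the homoskedastic normal assumption mentioned above. The function and argument names are hypothetical, `xb` stands for the fitted linear predictor of the latent variable, and observations sitting at either limit are treated as censored. It is not a substitute for a proper estimation routine.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_limit_tobit_loglik(y, xb, sigma, lower=0.0, upper=300.0):
    """Log-likelihood for a latent y* ~ N(xb, sigma^2) observed only within
    [lower, upper]; values at the limits are treated as censored."""
    total = 0.0
    for yi, mi in zip(y, xb):
        if yi <= lower:
            # Left-censored: P(y* <= lower)
            total += math.log(norm_cdf((lower - mi) / sigma))
        elif yi >= upper:
            # Right-censored: P(y* >= upper), written via symmetry
            total += math.log(norm_cdf((mi - upper) / sigma))
        else:
            # Uncensored: normal density of y* at yi
            total += math.log(norm_pdf((yi - mi) / sigma) / sigma)
    return total
```

    The three branches show where the normality and constant-variance assumptions enter: each censored observation contributes a tail probability that depends entirely on them.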



    • #3
      I guess I don't know more about microbiology than Jeff Wooldridge but it's hard for me to imagine negative counts of (colonies of) bacteria, unless in practice there is some subtraction

      observed amount MINUS background amount

      so that negative values arise from measurement error. Otherwise put, I am guessing that negative here means not observed rather than a negative value, and so has medical rather than mathematical meaning, as in testing negative for Covid or the influence of some malign software.

      That aside: do you expect linear relationships when all the censoring is dealt with as best you can?



      • #4
        Dear Jeff Wooldridge
        Thanks for your response, and sorry for the confusion. Please see the attached picture of how the data are organized. There are no negative values in the count variable. It ranges from 0 to 300, while 99999 means "positive" but uncountable because there were more than 300 colonies. Again, can I use Tobit regression, knowing that we have a lot of zeros? Thanks
        [Attached image: Screenshot 2024-07-09 at 1.00.34 PM.png]



        • #5
          Dear Nick Cox

          Thanks for your effort in understanding the issue here. And YES, there are no negative values at all. I was referring to "negative" as the final test result. Again, YES, the big issue here is how to deal with the censored values, especially those above 300. As for the 0s, they are all negative results.


          Also, could you outline the key assumptions of Tobit regression, in case I use it here?

          Thank you all



          • #6
            The word assumption is over-used in statistical science, and I try to avoid it. My point is in a sense more nearly biological: what is the scientific rationale for fitting straight lines or the equivalent in your problem? The thread is dominated by the real issue of how to deal with censoring, but why expect linearity, which is the remaining part?

            The details are different but there is still a similarity to logit regression where fitting sigmoids is not just a reaction to limits at 0 and 1; it is a recognition that linear fits don't match the science.



            • #7
              So you have a lot of inflation at 300 (all the 99999).

              Some type of selection correction may be useful, as you do not observe the outcome for 99999. But, you do know all the "missing" are large, so I'm not sure whether the standard selection methods would do.

              Interesting problem.
