  • How to account for standard error of outcome in a linear regression model?

    Year   N (number of hospitalizations)   Standard error of N
    1      1000                             500
    2      2000                             600
    3      3000                             700
    4      4000                             800
    5      5000                             900
    6      6000                             1000
    7      7000                             1100
    I have a dataset of hospitalizations for heart failure with the columns and rows shown above. I'm trying to fit this linear regression model to the data:
    E(N) = b0 + b1*Year
    where E(N) is the expected number of hospitalizations, b0 is the intercept, and b1 is the coefficient on Year. How should I account for the standard error of N? Would it be appropriate to use the inverse of the standard error as a sample weight in the regression?
    For example: regress N c.year [pweight=1/SE]

  • #2
    nilaykumar83:
    It is difficult to advise without further details about your sample; see -help weights- for more information.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      While I agree with Carlo that knowing more about your purposes might lead to more specific advice, as a general principle, using aweight rather than pweight, with the weight set to the inverse of the squared SE (1/SE^2), is more likely to be appropriate.
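
      A minimal sketch of that suggestion, assuming the table from post #1 is typed in under the variable names year, N and SE. Those names and the weight variable w are illustrative and do not come from the original data. The example numbers lie exactly on a line (N = 1000*year), so the fit is perfect and the output is useful only for checking the syntax.

      ```stata
      * Sketch only: data copied from the table in post #1; variable names are assumed
      clear
      input year N SE
      1 1000  500
      2 2000  600
      3 3000  700
      4 4000  800
      5 5000  900
      6 6000 1000
      7 7000 1100
      end

      * Inverse-variance weight: the reciprocal of the squared standard error
      generate double w = 1/SE^2

      * Weighted least squares with analytic weights.
      * regress rescales aweights to sum to the number of observations,
      * so only the relative sizes of the weights matter.
      regress N year [aweight = w]
      ```

      With aweights the overall error variance is still estimated from the residuals; the supplied SEs determine only how much each year counts relative to the others.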


      • #4
        Have you looked at the -vwls- command?
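
        A minimal sketch of what that could look like, assuming the table from post #1 is typed in under the illustrative variable names year, N and SE (not names from the original data). -vwls- takes each observation's standard deviation through its sd() option, so no separate weight variable is needed.

        ```stata
        * Sketch only: data copied from the table in post #1; variable names are assumed
        clear
        input year N SE
        1 1000  500
        2 2000  600
        3 3000  700
        4 4000  800
        5 5000  900
        6 6000 1000
        7 7000 1100
        end

        * Variance-weighted least squares: sd() names the variable holding
        * the standard deviation of each observation of the dependent variable
        vwls N year, sd(SE)
        ```

        The point estimates should match those from regress with aweights equal to 1/SE^2, but the reported standard errors differ: -vwls- treats the supplied SEs as known, whereas regress estimates an overall error variance and uses only the relative sizes of the weights.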


        • #5
          I don't see how you get a standard error for a total number of hospitalizations. How is this sum varying within an observation? Are you aggregating somehow? If you are aggregating, I would think about estimating a model on the original disaggregated data.

          I see two different ways to interpret your post. First, you actually have a variance on the measured total of hospitalizations and you use measured hospitalizations as the dv. Second, the error variance in the regression with hospitalizations as the dv varies.

          If the dv is measured with error, that error in theory appears in the regression error term (along with the error in explaining the "true" dv). If the measurement error is random, this normally does not make your parameter estimates biased or inconsistent. While you might improve the estimates by weighting the regression based on this varying measurement error, I don't see why you would do that rather than weight based on how the variance of the regression error term changes.

          I suspect my colleagues are assuming that what you've called the standard error of N is really the error variance from your regression. If your problem is that the error variance from the regression changes across observations, then the weighted least squares procedures noted above are the way to go. But the weighting would be based on the error variance from the regression and not just the measurement error in the dv.

          If you are not talking about the error variance from your regression, you might want to take another shot at explaining the problem.


          • #6
            "I suspect my colleagues are assuming that what you've called standard error of N is really the error variance from your regression."
            Actually, my assumption was that the original poster is dealing with aggregated data and that the original disaggregated results are not available. That kind of situation arises frequently in health care investigations.


            • #7
              Hi everyone,

              Thank you for your valuable suggestions. The purpose of my research is to analyze time trends in heart failure hospitalizations in the US, i.e., whether they are increasing, decreasing, seasonal, etc. I'm using a survey dataset to estimate the number of hospitalizations in a given year (each row in this dataset is one patient and is associated with a sample weight). Because I'm obtaining my total number of hospitalizations from a survey dataset, the estimates have a standard error (to answer Phil's question). I want to know how I can account for this standard error in a regression model where the number of hospitalizations is the outcome variable and time is the predictor.
              I greatly appreciate your help with my problem.

              Thanks,
              Nilay
              Last edited by nilaykumar83; 25 Oct 2014, 10:03.
