  • Re: type of weight for regression analysis

    Hi All,

    I'm new to using weights in Stata, and want to ensure I'm using the correct type.

    In my analysis, I'm assessing how variables are associated with HIV testing coverage (a proportion) at different health facilities. I'd like to give greater emphasis to larger facilities, as the dataset has everything from large hospitals with >10,000 patient visits per month to small remote health facilities with only a handful.

    From reviewing prior posts, I think I should use analytic weighting, with the weight set to the variable containing the attendance figure for each facility. For example, to regress testing coverage on the number of HIV counselors:

    regress hivtestpct counselors [aw=attendance]

    I originally chose frequency weighting, but it quickly became clear that this was falsely increasing the number of observations I actually have to work with.
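
    A minimal sketch of the comparison described above, using the variable names from the command in this post. It assumes attendance holds positive integer counts (fweights require integers) and simply displays the observation count Stata reports under each weight type:

    ```stata
    * Analytic weights: e(N) stays at the number of facilities
    regress hivtestpct counselors [aweight = attendance]
    display "N with aweights: " e(N)

    * Frequency weights: each attendee is treated as a separate observation,
    * so e(N) becomes the sum of attendance across facilities
    regress hivtestpct counselors [fweight = attendance]
    display "N with fweights: " e(N)
    ```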

    Thanks!

    Robert

  • #2
    Hello Robert,

    Welcome to the Stata Forum / Statalist!

    Putting aside the pros and cons of using weights (apart from survey data analysis and a few other situations), if I understood your question correctly, I believe a regression with clusters could do the trick for you.

    I mean it in rather general terms, because there is much information on the design still lacking.

    By the way, if your dependent variable is a proportion, as it seems to be, there are a number of interesting threads on this Forum which I recommend you take a look at. They discuss the most appropriate ways to deal with models whose outcome is a proportion.

    Hopefully that helps.

    Marcos
    Last edited by Marcos Almeida; 25 Jan 2016, 08:18.
    Best regards,

    Marcos


    • #3
      Hi Marcos,

      Thank you! Your response was very helpful. I've dug a bit deeper, and will instead use a GLM, given that the dependent variable is a proportion.

      Now for the missing details behind the design (apologies)...

      Basically this is a crude thought experiment to assess whether the number of HIV counselors is correlated with the proportion of facility attendees with known HIV status. Each observation in my small (~100 obs) dataset represents an individual facility. The proportion of facility attendees with known HIV status was calculated by dividing number with known status by total number of attendees.

      When I first tried to fit models to this data, it was obvious that several sites with extremely low attendance (<10 per month) and poor rates of coverage were heavily affecting the model. It occurred to me that large sites with more attendance should be given more weight in the model. I hope that makes sense.
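
      One way the GLM and the extra emphasis on large sites might fit together is sketched below. This is an illustration under assumptions, not a settled model: known is a hypothetical variable holding the number of attendees with known status, attendance is the denominator from my first post, and hivtestpct is assumed to be on a 0-1 scale rather than in percent.

      ```stata
      * Fractional-response GLM on the proportion itself;
      * every facility counts equally here
      glm hivtestpct counselors, family(binomial) link(logit) vce(robust)

      * Grouped-binomial GLM on the counts: attendance is the binomial
      * denominator, so larger facilities carry more information
      glm known counselors, family(binomial attendance) link(logit) vce(robust)
      ```

      In the second form the facility-level data stay as they are, and the denominator does the job I had in mind for weights.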

      I think clustering would make sense if each observation represented an individual within a facility with a binary variable describing status known vs unknown and a facility_id variable. However as it stands these are summary statistics for facilities.

      I suppose I could expand the dataset into a format where the smallest unit of observation is an individual, however I feel like there should be some way to do this in its current form, and assumed that weighting would provide the answer.
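
      For reference, a rough sketch of what that expansion could look like. The names known, facility_id and status_known are hypothetical, and it assumes attendance and known are non-missing integer counts with attendance of at least 1:

      ```stata
      * Give each facility an identifier before duplicating rows
      generate long facility_id = _n

      * One row per attendee
      expand attendance

      * Flag the first `known' rows within each facility as status known
      bysort facility_id: generate byte status_known = (_n <= known)

      * Individual-level model with standard errors clustered by facility
      logit status_known counselors, vce(cluster facility_id)
      ```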

      Best,

      Robert


      • #4
        Robert:
        Instead of weighting, you could include among the predictors a categorical variable representing the size of the healthcare facilities (e.g., number of beds, catchment area inhabitants).
        Besides, if your depvar is a time- or volume-related proportion (rate), you may want to consider -poisson-.
        Finally, you may want to consider -expand-ing your dataset, in order to create individual-level instead of aggregated observations.
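        A hedged sketch of the first two suggestions. The names sizecat and known are hypothetical, and attendance quartiles are used as the size measure only as an assumption (beds or catchment population would serve the same role if available):

        ```stata
        * Size category as a predictor: quartiles of attendance
        xtile sizecat = attendance, nquantiles(4)
        regress hivtestpct counselors i.sizecat

        * Poisson on the count of attendees with known status,
        * with attendance as the exposure (must be positive)
        poisson known counselors, exposure(attendance) vce(robust)
        ```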
        Kind regards,
        Carlo
        (Stata 19.0)
