
  • Statistical tests to compare a subpopulation to the overall population (inclusive of that subpopulation)

    While we typically compare groups to one another, there are instances where one wants to compare the mean or frequency in a subpopulation to that of the overall population (inclusive of the subpopulation of interest).

    Others have asked this question before, here and elsewhere, and the answer is often: it can't be done, or, why would you want to? Just choose a comparator group, or compare your subgroup of interest against the part of the population not in that group.

    However, I think there are times when one would want to take this approach. For instance, imagine you have a nationally representative dataset of individuals with an occupation variable containing 25 specific occupations, and you're looking at, for argument's sake, differences in heart disease rates. Choosing a single reference category would feel extremely arbitrary, particularly when what you really want to know is how the heart disease rate for each occupation differs from that of the nation as a whole. You have zero interest in comparing every occupation against farmers in particular, or lawyers, or whatever.

    And some seem to accomplish this. A recent study published in JAMA, for instance, specifically compared (and calculated p-values for) mean healthcare spending in each racial/ethnic group against the all-population mean: there was no reference group (I've quoted the relevant text below*). (https://jamanetwork.com/journals/jam...%20individuals.)
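
    In Stata, a rough sketch of that percentile-bootstrap idea, applied to the occupation example, might look like the following (hd and occ are hypothetical variable names, and the complex survey design the authors account for is ignored here):

        * Sketch: bootstrap the difference between one occupation's heart disease
        * rate and the all-population rate; hd and occ are hypothetical variables.
        capture program drop subvsall
        program define subvsall, rclass
            quietly summarize hd, meanonly
            local all = r(mean)
            quietly summarize hd if occ == 1, meanonly
            return scalar diff = r(mean) - `all'
        end

        bootstrap diff = r(diff), reps(1000) seed(12345): subvsall
        estat bootstrap, percentile

    The percentile interval plays the role of the 95% UIs in the quoted methods; a two-sided bootstrap p-value would additionally require saving the replicates (e.g., via bootstrap's saving() option) and working with them directly.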

    I also found a reference to something called "Analysis of Means" (ANOM) on a SAS bulletin board, as one way to compare a subpopulation mean to the overall population mean. (https://communities.sas.com/t5/Stati...at/td-p/210627)
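
    For what it's worth, a very rough sketch of ANOM-style decision limits for a binary outcome could be put together by hand in Stata; hd and occ are again hypothetical variable names, and a Bonferroni-adjusted normal quantile stands in for the exact ANOM critical value, so the limits are only approximate:

        * Approximate ANOM-style decision limits for each occupation's rate around
        * the overall rate; the (N - n_i)/(N * n_i) factor reflects that each
        * subgroup is itself part of the overall mean.
        quietly summarize hd
        local pbar = r(mean)
        local N    = r(N)
        local k    = 25
        local z    = invnormal(1 - 0.05/(2*`k'))
        forvalues i = 1/`k' {
            quietly summarize hd if occ == `i'
            local ni = r(N)
            local se = sqrt(`pbar'*(1-`pbar')*(`N'-`ni')/(`N'*`ni'))
            display "occ `i': rate = " %6.4f r(mean) ///
                "  limits = [" %6.4f `pbar'-`z'*`se' ", " %6.4f `pbar'+`z'*`se' "]"
        }

    Rates falling outside their limits would be flagged as differing from the overall rate, which is essentially the (usually graphical) ANOM display in tabular form.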

    I'm curious if others have thought about this, have used Analysis of Means approaches or the approach used in the JAMA article, or have suggestions about how one might approach this in Stata.

    I realize that one could just run 25 separate analyses (in the example above), with 25 dummy variables indicating the occupation of interest vs. all other occupations, as one potential way out (a sketch of that is below). However, the reference group would differ in each analysis, which perhaps makes the results less interpretable.
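
    As a minimal sketch of that workaround (again with hypothetical variable names hd and occ), each loop iteration compares one occupation against everyone else:

        * 25 separate each-vs-rest comparisons; the comparator changes each time.
        forvalues i = 1/25 {
            generate byte occ_`i' = (occ == `i')
            display as text _n "Occupation `i' vs. all other occupations"
            prtest hd, by(occ_`i')
            drop occ_`i'
        }

    Note that this tests each occupation against the rest of the population excluding it, which is not quite the same comparison as the subgroup vs. the all-inclusive overall mean.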



    * "The coefficient of variation was calculated to quantify variation in spending levels among race and ethnicity groups. Estimates could not be made past 2016 because the underlying Disease Expenditure data do not go past 2016. Reported uncertainty intervals (UIs) were from a percentile bootstrap. These were estimated by bootstrapping the underlying data 1000 times and completing each part of the analysis 1000 independent times. The survey data used to calculate the race and ethnicity–specific spending and volume of care fractions and the underlying Disease Expenditure project data were bootstrapped, using the same methods that incorporated the complex survey design associated with each data set. The estimates reported in this study are the mean of these 1000 estimates, with the 95% UIs estimated as the range from the 2.5th to the 97.5th percentiles of the 1000 estimates. A 2-sided bootstrap P value with an α of .05 was used to calculate which race and ethnicity spending and utilization estimates were different from the all-population mean" from: https://jamanetwork.com/journals/jam...%20individuals.

  • #2
    Thank you for the cite to the JAMA article; I will read it and may have comments later, but I do not find your example convincing.

    Re: ANOM - this is as you say, but note that this is, primarily at least, a graphical procedure used mostly by those who use control charts; many years ago I tried to get StataCorp to beef up the control chart and quality control part of the software, but was told no.



    • #3
      Thanks Rich — to be clear, I'm not trying to make a strong case for doing this sort of analysis. I'm open to the argument that it should be avoided! I saw it in the JAMA article and so am very curious what others think.

