We are trying to estimate the rate of event X per hour of exposure to certain activities (e.g. commuting) by combining statistics from two multi-year cross-sectional surveys, one for the numerator and the other for the denominator. However, we don’t really know how to do this and are reaching out for any ideas.
Our first question: is it possible to estimate MEAN(X/Y) and VAR(X/Y), where X indicates cases and Y indicates hours of exposure? The typical ways to estimate a ratio (e.g. the delta method) may not work here, because X and Y come from two different surveys. We noticed that an old thread (http://www.stata.com/statalist/archi.../msg00723.html) suggests that it is theoretically possible to estimate MEAN(X/Y) even though X and Y are from different surveys. Would doing so be exceedingly complicated (e.g. re-weighting X, re-weighting Y, etc.)?
So we came up with plan B: what about just estimating MEAN(X)/MEAN(Y), and hence VAR(MEAN(X)/MEAN(Y))? Of course, by doing so we are not calculating the average of individual rates, but simply the ratio of total events to total exposure hours, a method commonly used in epidemiological studies to estimate risk at the population level.
The main question, then, is: under plan B, how do we calculate VAR(MEAN(X)/MEAN(Y))? Also, is estimating MEAN(X)/MEAN(Y) as straightforward as we thought, i.e. just substituting the sample mean for MEAN(X), and doing the same for MEAN(Y)?
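To make plan B concrete, here is a minimal sketch of the calculation we have in mind. It rests on assumptions we are not sure hold: that the two surveys can be treated as independent samples (so the covariance between the two means is zero), and that a design-based mean and standard error have already been estimated within each survey. Under those assumptions the first-order delta-method approximation is VAR(R) ≈ SE(X)²/MEAN(Y)² + MEAN(X)²·SE(Y)²/MEAN(Y)⁴, where R = MEAN(X)/MEAN(Y). The function name and all the numbers below are hypothetical, chosen only for illustration.

```python
import math


def ratio_of_independent_means(mean_x, se_x, mean_y, se_y):
    """Ratio of two survey means and its approximate variance.

    Assumes the two means come from independent surveys, so the
    covariance term of the delta method is dropped. This is a
    first-order (Taylor) approximation, not an exact variance.
    """
    if mean_y == 0:
        raise ValueError("mean_y must be non-zero")
    ratio = mean_x / mean_y
    # Var(R) ~= Var(Xbar)/Ybar^2 + Xbar^2 * Var(Ybar)/Ybar^4
    variance = se_x**2 / mean_y**2 + (mean_x**2) * se_y**2 / mean_y**4
    return ratio, variance


# Hypothetical inputs: events per person (survey 1) and
# exposure hours per person (survey 2), each with its own SE.
rate, var_rate = ratio_of_independent_means(
    mean_x=0.02, se_x=0.002, mean_y=400.0, se_y=8.0
)
se_rate = math.sqrt(var_rate)
# Normal-approximation 95% interval for the rate per exposure hour.
ci_low, ci_high = rate - 1.96 * se_rate, rate + 1.96 * se_rate
print(rate, se_rate, ci_low, ci_high)
```

If the two surveys share sampling units or a common frame, the independence assumption fails and a covariance term would be needed, which is part of what we are unsure about.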
Thank you kindly
Tin-chi Lin
Research Scientist
Center for Injury Epidemiology
Liberty Mutual Research Institute for Safety
71 Frankland Road
Hopkinton, MA 01748