I want to produce predicted rates (with CIs) by exposure categories, adjusted for other covariates. I used -poisson- then -margins- (in Stata 13.1) but the results from my data are not what I expected. The reference manual (p1153) says that "the marginal mean . . for males is the predicted mean of the dependent variable where every observation is treated as if it represents a male" etc. I can reproduce my problem using the Doll & Hill dataset:
The crude death rates in smokers and non-smokers in the Doll and Hill data are 2.58 and 4.43 /1000 in non-smokers and smokers. The smokers are older, so I would their expect age-adjusted rates to be lower than their crude rates (and the opposite for non-smokers). This is what I see when calculating adjusted rates by hand, but not what margins produces. The rates from margins are double the crude rates and I'm not sure how to use them. I've obviously misunderstood margins - can someone help?
Code:
webuse dollhill3, clear poisson deaths i.smokes agecat, irr exp(pyears) margins smokes, predict(ir) // produces rates of 6.77/1000 in non-smokers, 10.16 in smokers * calculate predicted deaths and rates by hand from model coefficients plus data gen pred_smoker_deaths = exp(_b[1.smokes] + (_b[agecat]*agecat) + _b[_cons] + ln(1)) * pyears // no of deaths in cohort if all smoked gen pred_nsmoker_deaths = exp((_b[agecat]*agecat) + _b[_cons] + ln(1)) * pyears // no of deaths in the cohort if nobody smoked table smokes, c(sum deaths sum pred_smoker_deaths sum pred_nsmoker_deaths sum pyears) row // show observed and predicted deaths preserve // then calculate rates using the predicted deaths collapse (sum) pred_smoker_deaths (sum) pred_nsmoker_deaths (sum) pyears gen pred_rate_smo = pred_smoker_deaths/pyears gen pred_rate_nsmo = pred_nsmoker_deaths/pyears list // gives rates of 2.87/1000 in non-smokers, 4.31 in smokers, NOT same as margins restore
Comment