Dear all,
When an offset is included, why do the margins results not match the hand calculation and the first approach?
Any advice would be appreciated.
I aim to predict the annual CT scan counts from 2015 to 2022. Each row of my dataset is one presentation, with its year and CT scan count; many other variables are omitted for simplicity. Below is a summary of the total CT scans and the population size per year.
| Year | CT scan count | Population size |
|---|---|---|
| 2015 | 6010 | 59188 |
| 2016 | 8760 | 68875 |
| 2017 | 9036 | 71747 |
| 2018 | 10062 | 71373 |
| 2019 | 10614 | 71373 |
| 2020 | 12622 | 72725 |
| 2021 | 13350 | 68828 |
| 2022 | 12259 | 63612 |
Initially, I fitted the model on the per-presentation data, treating year as a factor variable:
Code:
nbreg total_CT_AT ib(first).pre_year_cat, dispersion(mean) irr allbaselevels
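Negative binomial regression                            Number of obs = 552,366
                                                        LR chi2(7)    = 2727.85
Dispersion: mean                                        Prob > chi2   = 0.0000
Log likelihood = -243592.72                             Pseudo R2     = 0.0056
------------------------------------------------------------------------------
 total_CT_AT |        IRR   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
pre_year_cat |
        2015 |          1  (base)
        2016 |   1.252569   .0229418    12.30   0.000     1.208402    1.298351
        2017 |   1.240314   .0225643    11.84   0.000     1.196868    1.285337
        2018 |   1.388384   .0248287    18.35   0.000     1.340563     1.43791
        2019 |   1.375061   .0243329    18.00   0.000     1.328187    1.423589
        2020 |   1.709242   .0295652    30.99   0.000     1.652267    1.768183
        2021 |   1.910185   .0328772    37.60   0.000     1.846821    1.975722
        2022 |   1.897908   .0331448    36.69   0.000     1.834045    1.963995
       _cons |   .1015409   .0014213  -163.41   0.000      .098793    .1043651
-------------+----------------------------------------------------------------
    /lnalpha |    .558674   .0143598                      .5305294    .5868187
-------------+----------------------------------------------------------------
       alpha |   1.748353    .025106                      1.699832    1.798259
------------------------------------------------------------------------------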
margins pre_year_cat, expression(1000*predict())
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
pre_year_cat |
2015 | 101.5409 1.421312 71.44 0.000 98.75513 104.3266
2016 | 127.1869 1.50242 84.65 0.000 124.2422 130.1316
2017 | 125.9426 1.463519 86.05 0.000 123.0741 128.811
2018 | 140.9777 1.5691 89.85 0.000 137.9023 144.0531
2019 | 139.6248 1.511656 92.37 0.000 136.662 142.5876
2020 | 173.5579 1.763705 98.41 0.000 170.1011 177.0147
2021 | 193.9618 1.942604 99.85 0.000 190.1543 197.7692
2022 | 192.7152 2.012535 95.76 0.000 188.7707 196.6597
------------------------------------------------------------------------------
I did a hand calculation and it matches the above results.
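The hand calculation can be reproduced as a quick arithmetic check in plain Python (a sketch, not Stata code; the dictionaries are just the summary table typed in). It divides each year's CT count by its population size and scales to 1,000, assuming "population size" means the number of presentations in that year. With the table's 2019 population (71,373, identical to 2018) the 2019 rate comes out at about 148.71 rather than the 139.62 from margins. Subtracting the other seven populations from the 552,366 observations gives 76,018 for 2019, and 1000 × 10614 / 76018 ≈ 139.62, so the 2019 entry in the table may be a typo.

```python
# Annual totals typed in from the summary table above.
ct_counts = {2015: 6010, 2016: 8760, 2017: 9036, 2018: 10062,
             2019: 10614, 2020: 12622, 2021: 13350, 2022: 12259}
population = {2015: 59188, 2016: 68875, 2017: 71747, 2018: 71373,
              2019: 71373, 2020: 72725, 2021: 68828, 2022: 63612}

# Hand calculation: CT scans per 1,000 population, the scale used by
# margins ..., expression(1000*predict()) in the first model.
rate_per_1000 = {y: 1000 * ct_counts[y] / population[y] for y in ct_counts}

# Assumption: population size = number of presentations, so the eight
# populations should sum to the 552,366 observations of the first model.
N_OBS = 552_366
implied_2019 = N_OBS - sum(p for y, p in population.items() if y != 2019)
rate_2019_implied = 1000 * ct_counts[2019] / implied_2019

for year, rate in rate_per_1000.items():
    print(year, f"{rate:.2f}")
print("implied 2019 population:", implied_2019, f"-> rate {rate_2019_implied:.2f}")
```

Every year except 2019 matches the margins output to two decimals.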
However, I have been asked to collapse the data by year instead of by presentation, so that each row holds the annual CT scan count and the population size, and then to run the negative binomial regression with an offset.
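A toy illustration of the collapsed layout, using hypothetical rows rather than the real data and plain Python rather than Stata's collapse. Each per-presentation row adds its CT count to its year's total and counts once towards that year's population, assuming population size means the number of presentations; the last column is the log of that population, the kind of variable used as an offset.

```python
import math
from collections import defaultdict

# Hypothetical per-presentation rows: (year, CT scans in that presentation).
rows = [(2015, 0), (2015, 1), (2015, 0), (2016, 2), (2016, 0)]

ct_total = defaultdict(int)       # annual CT scan count
presentations = defaultdict(int)  # annual population size (assumed = presentations)
for year, n_ct in rows:
    ct_total[year] += n_ct
    presentations[year] += 1

# One row per year: (year, CT count, population, log(population) offset).
collapsed = [(y, ct_total[y], presentations[y], math.log(presentations[y]))
             for y in sorted(ct_total)]
for row in collapsed:
    print(row)
```

The yearly ratio of CT count to population is the same before and after collapsing, which is why both layouts can in principle give the same rates.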
Code:
nbreg total_CT_AT ib(first).pre_year_cat, dispersion(mean) offset(log_pre1) irr
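Negative binomial regression                            Number of obs = 8
                                                        LR chi2(6)    = 56.99
Dispersion: mean                                        Prob > chi2   = 0.0000
Log likelihood = -44.217095                             Pseudo R2     = 0.3919
------------------------------------------------------------------------------
 total_CT_AT |        IRR   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
pre_year_cat |
        2016 |   1.252568   .0209798    13.44   0.000     1.212116     1.29437
        2017 |   1.240314   .0206451    12.94   0.000     1.200503    1.281444
        2018 |   1.388383   .0226342    20.13   0.000     1.344722    1.433461
        2019 |    1.37506   .0221979    19.73   0.000     1.332234    1.419263
        2020 |   1.709241   .0267875    34.20   0.000     1.657537    1.762559
        2021 |   1.910184   .0296722    41.66   0.000     1.852904    1.969234
        2022 |   1.897908    .029886    40.69   0.000     1.840227    1.957397
       _cons |   .1015409   .0013098  -177.32   0.000     .0990059    .1041408
    log_pre1 |          1  (offset)
-------------+----------------------------------------------------------------
    /lnalpha |  -20.24877          .                             .           .
-------------+----------------------------------------------------------------
       alpha |   1.61e-09          .                             .           .
------------------------------------------------------------------------------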
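As an arithmetic check on the coefficients themselves (plain Python, numbers copied from the output above): multiplying _cons by each year's IRR gives exp(xb) without the offset, i.e. the predicted CT scans per unit of population. Scaled to 1,000, these agree with the margins of the first model to within rounding, so the collapsed model's estimates appear to be the same as the per-presentation ones. With 8 rows and 8 mean parameters (the constant plus seven year indicators) the model is saturated, which is consistent with alpha being pushed towards zero (1.61e-09) and reported without a standard error.

```python
# Constant and IRRs copied from the offset-model output above.
CONS = 0.1015409
IRR = {2015: 1.0, 2016: 1.252568, 2017: 1.240314, 2018: 1.388383,
       2019: 1.37506, 2020: 1.709241, 2021: 1.910184, 2022: 1.897908}

# Margins from the first (per-presentation) model, for comparison.
FIRST_MODEL_MARGINS = {2015: 101.5409, 2016: 127.1869, 2017: 125.9426,
                       2018: 140.9777, 2019: 139.6248, 2020: 173.5579,
                       2021: 193.9618, 2022: 192.7152}

# exp(xb) = _cons * IRR is a rate per unit of exposure; x1000 gives per 1,000.
rate_per_1000 = {y: 1000 * CONS * r for y, r in IRR.items()}

for year, rate in rate_per_1000.items():
    diff = rate - FIRST_MODEL_MARGINS[year]
    print(year, f"{rate:.4f}", f"difference: {diff:+.4f}")
```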
margins pre_year_cat, expression(1000*predict())
Expression: 1000*predict()
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
pre_year_cat |
2015 | 7010967 90435.87 77.52 0.000 6833716 7188218
2016 | 8781715 93826.89 93.59 0.000 8597818 8965613
2017 | 8695798 91478.98 95.06 0.000 8516502 8875093
2018 | 9733907 97038.72 100.31 0.000 9543715 9924100
2019 | 9640501 93575.05 103.02 0.000 9457097 9823904
2020 | 1.20e+07 106663.8 112.35 0.000 1.18e+07 1.22e+07
2021 | 1.34e+07 115907.7 115.54 0.000 1.32e+07 1.36e+07
2022 | 1.33e+07 120178.2 110.72 0.000 1.31e+07 1.35e+07
------------------------------------------------------------------------------
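A sketch of where these much larger numbers may come from, based on how nbreg's predict and margins are documented to behave. After nbreg, predict() defaults to the predicted number of events, exp(xb + offset), which here is rate × population. margins pre_year_cat sets every one of the 8 rows to the given year but keeps each row's own offset, then averages, so the result is 1000 × rate × (mean population). Assuming population size is the number of presentations, the mean over the 8 rows is 552,366 / 8 = 69,045.75, and this reproduces the output above (for example 7,010,967 for 2015). If that reading is right, asking margins for the rate rather than the count, e.g. expression(1000*predict(ir)) or expression(1000*predict(n nooffset)), should return the per-1,000 figures of the first approach; this has not been checked against the data.

```python
# Constant and IRRs copied from the offset-model output.
CONS = 0.1015409
IRR = {2015: 1.0, 2016: 1.252568, 2017: 1.240314, 2018: 1.388383,
       2019: 1.37506, 2020: 1.709241, 2021: 1.910184, 2022: 1.897908}

# Assumption: population size = presentations, so the 8 annual
# populations sum to the 552,366 observations of the first model.
N_PRESENTATIONS = 552_366
N_ROWS = 8
mean_population = N_PRESENTATIONS / N_ROWS  # 69,045.75


def margin_with_offset(year):
    """1000 * average over the 8 rows of exp(xb_year + log(population_i)).

    exp(xb + log(pop)) = rate * pop, so the average over rows equals
    rate * mean(pop); the mean population is used directly here."""
    return 1000 * CONS * IRR[year] * mean_population


def margin_without_offset(year):
    """1000 * exp(xb_year): the per-1,000 rate, ignoring the offset."""
    return 1000 * CONS * IRR[year]


for year in IRR:
    print(year, round(margin_with_offset(year)), f"{margin_without_offset(year):.2f}")
```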
I also tried exposure() instead of offset(), as in the code below, but the issue remains the same.
Code:
nbreg total_CT_AT ib(first).pre_year_cat, dispersion(mean) exposure(pre1) irr
Negative binomial regression                            Number of obs = 8
                                                        LR chi2(6)    = 56.99
Dispersion: mean                                        Prob > chi2   = 0.0000
Log likelihood = -44.217095                             Pseudo R2     = 0.3919
------------------------------------------------------------------------------
 total_CT_AT |        IRR   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
pre_year_cat |
        2015 |          1  (base)
        2016 |   1.252569   .0209799    13.45   0.000     1.212117    1.294371
        2017 |   1.240314   .0206451    12.94   0.000     1.200503    1.281445
        2018 |   1.388384   .0226342    20.13   0.000     1.344723    1.433462
        2019 |   1.375061   .0221979    19.73   0.000     1.332235    1.419263
        2020 |   1.709242   .0267875    34.20   0.000     1.657538     1.76256
        2021 |   1.910184   .0296722    41.66   0.000     1.852904    1.969235
        2022 |   1.897908    .029886    40.69   0.000     1.840227    1.957397
       _cons |   .1015409   .0013098  -177.32   0.000     .0990059    .1041407
    ln(pre1) |          1  (exposure)
-------------+----------------------------------------------------------------
    /lnalpha |  -20.24547          .                             .           .
------------------------------------------------------------------------------
margins pre_year_cat, expression(1000*predict())
Expression: 1000*predict()
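------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
pre_year_cat |
        2015 |    7010964   90435.83    77.52   0.000      6833713     7188215
        2016 |    8781717   93826.91    93.59   0.000      8597820     8965615
        2017 |    8695798   91478.98    95.06   0.000      8516502     8875093
        2018 |    9733910   97038.74   100.31   0.000      9543717     9924102
        2019 |    9640501   93575.05   103.02   0.000      9457097     9823905
        2020 |   1.20e+07   106663.9   112.35   0.000     1.18e+07    1.22e+07
        2021 |   1.34e+07   115907.7   115.54   0.000     1.32e+07    1.36e+07
        2022 |   1.33e+07   120178.1   110.72   0.000     1.31e+07    1.35e+07
------------------------------------------------------------------------------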
