xtpoisson versus xtreg.

Sumedha Gupta

Join Date: May 2016
Posts: 289

22 Aug 2017, 09:44

Dear All,
I have data on poisoning incidents and want to estimate whether changes in state policy affected the incidence of poisoning. The data cover poisonings only, so I do not observe individuals who may be misusing drugs but did not experience a poisoning. I have data for 50 US states over 9 half-years, as follows:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str3 state float(halfyear policy) long(caseid misuse suicide illicit narc noeffect death)
"AK" 1 0   42  15  15  14   29   6  0
"AK" 2 0   69  28  22  21   52   7  0
"AK" 3 0   51  14  22   9   43  13  0
"AK" 4 0   53  20  19  19   35   7  1
"AK" 5 0   53  17  21  17   37   7  1
"AK" 6 0   50  18  20  13   38   7  0
"AK" 7 0   38  14  13   7   32   6  0
"AK" 8 0   61  21  20  17   45   9  0
"AK" 9 0   44  14  13  11   35   8  0
"AL" 1 0  423 123 174 149  297  76  4
"AL" 2 0  428 143 151 153  292  69  5
"AL" 3 0  427 126 172 154  295  68  2
"AL" 4 0  408 134 169 135  300  81  2
"AL" 5 0  385 121 139 122  288  76  3
"AL" 6 0  353 110 134  83  281  53  1
"AL" 7 0  327 137  90  77  257  26  0
"AL" 8 0  320 116  90  73  257  49  0
"AL" 9 0  251  92  75  68  190  35  3
"AR" 1 0  145  47  49  25  120  28  0
"AR" 2 0  180  65  38  40  144  30  2
"AR" 3 0  127  47  42  28  100  25  1
"AR" 4 0  159  49  46  38  125  30  0
"AR" 5 0  132  40  45  25  109  27  0
"AR" 6 0  141  36  45  28  119  37  0
"AR" 7 0  157  33  53  28  135  37  0
"AR" 8 0  173  46  62  27  148  42  1
"AR" 9 0  193  48  80  36  160  37  0
"AZ" 1 0  718 240 213 239  499  88 11
"AZ" 2 0  741 255 238 238  524  86  2
"AZ" 3 0  741 270 217 250  530  76 12
"AZ" 4 0  678 218 189 179  519 108  6
"AZ" 5 0  645 232 154 173  488  83 10
"AZ" 6 0  642 211 159 210  448  74  8
"AZ" 7 0  626 210 184 234  412  68 13
"AZ" 8 0  658 211 191 207  476  88  6
"AZ" 9 0  643 240 177 218  449  82  2
"CA" 1 0 1795 621 603 625 1211 197  8
"CA" 2 0 1902 654 684 669 1284 185  4
"CA" 3 0 1866 630 655 646 1272 192 13
"CA" 4 0 1735 563 624 529 1247 194 11
"CA" 5 0 1762 566 620 566 1245 203  9
"CA" 6 0 1824 532 661 584 1277 209 12
"CA" 7 0 1865 597 652 616 1286 186  9
"CA" 8 0 1825 556 681 596 1260 187 12
"CA" 9 0 1727 562 609 615 1149 178  9
"CO" 1 0  287  95  75  86  206  69  1
"CO" 2 0  307  90  79 112  200  69  0
"CO" 3 0  292 104  69 112  188  66  3
"CO" 4 0  302  84  85  86  223  69  2
"CO" 5 0  308  92  79 123  192  77  3
"CO" 6 0  323  91  84 119  208  74  3
"CO" 7 0  361  87  89 155  211  66  5
"CO" 8 0  338 102  85 141  205  77  4
"CO" 9 0  360 116  87 175  193  78  1
"CT" 1 0  182  65  72  71  114  31  3
"CT" 2 0  186  79  68  81  110  24  1
"CT" 3 0  198  75  74  92  116  25  2
"CT" 4 0  208  75  90  89  130  20  3
"CT" 5 0  196  73  78  88  117  20  2
"CT" 6 0  243  98  98 125  132  24  4
"CT" 7 0  238 108  84 124  130  26  7
"CT" 8 0  227 102  70 111  124  26  2
"CT" 9 0  244 100  98 122  142  24  3
"DC" 1 0   57  22  18  27   32   8  2
"DC" 2 0   63  25  16  33   31  10  2
"DC" 3 0   45  22  11  22   23  11  0
"DC" 4 0   62  19  31  33   29  12  3
"DC" 5 0   86  31  40  49   40  17  1
"DC" 6 0   76  26  39  49   31   9  2
"DC" 7 0   80  27  33  47   38  10  4
"DC" 8 0   80  36  31  50   31  17  0
"DC" 9 0   70  28  26  47   27   9  0
"DE" 1 1   48  14  14  13   37  13  0
"DE" 2 1   54  18  19  19   36  10  0
"DE" 3 1   54  14  18  13   42   8  0
"DE" 4 1   57  27  18  12   46   2  0
"DE" 5 1   66  25  16  19   49  15  2
"DE" 6 1   48  21  13  14   34   5  2
"DE" 7 1   62  20  21  24   39   9  1
"DE" 8 1   77  29  26  31   50  12  3
"DE" 9 1   60  18  28  23   38   8  0
"FL" 1 0 1077 360 360 292  809 175  9
"FL" 2 0 1160 410 464 398  798 167  7
"FL" 3 0  987 325 359 285  730 145  9
"FL" 4 0 1016 342 357 282  759 186  8
"FL" 5 0 1130 380 434 385  790 168  5
"FL" 6 0 1195 424 483 422  813 187  6
"FL" 7 0 1178 403 431 392  821 196  5
"FL" 8 0 1073 331 452 356  746 183  9
"FL" 9 0 1111 402 433 431  718 185 13
end

The policy variable equals 1 in a state that implemented the policy, in the half-years after implementation (basically an interaction term). I am debating whether to use xtreg or xtpoisson for this diff-in-diff type of analysis. The distributions of the outcome variables I am studying are shown in the attachment (apologies for the attachment, but I wasn't sure how to show the distributions using dataex). The outcomes are all non-negative, there are not too many excess 0s, and the counts can be pretty large (the exception being death). My measure of 'exposure' to poisoning from a specific drug type (e.g., narcotics) is the total number of individuals who experienced a poisoning.

Given the distributions of the outcomes and the nature of the data, would xtreg or xtpoisson be more appropriate? If xtreg would be better, how do I account for differences in the sizes of states, and hence in the number of individuals 'exposed' to poisoning? I am using the term 'exposure' in line with the exposure() option of xtpoisson.

Many thanks for any input.
Best,
Sumedha.

Attached Files

DistributionHistogramsOutcomes2.pdf (52.3 KB, 1 view)


Clyde Schechter

Join Date: Apr 2014

Posts: 30169
#2

22 Aug 2017, 11:43

I think that these distributions all exhibit a considerable degree of zero inflation. Moreover, given the large means for all of the variables other than death, the distributions, even ignoring the zero-inflation, are simply the wrong shape for Poisson. I would look into -xtnbreg- here, which is more flexible and usually works out OK even when there is zero-inflation.
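As a rough illustration of the count-model route, the commands below sketch how -xtnbreg- could be set up next to -xtpoisson- for one outcome. This is only a sketch under assumptions: it uses caseid as the exposure because that is the measure the original post describes (another denominator could be substituted), and state_id is a name introduced here for the numeric panel identifier. In the example extract the policy variable does not vary within state, so these commands are meaningful only on the full data set.

```stata
* Sketch only: state_id is a helper name introduced for illustration.
capture confirm variable state_id
if _rc {
    encode state, generate(state_id)   // numeric panel id from the string state code
}
xtset state_id halfyear

* Negative binomial with half-year dummies; exposure follows the original post
xtnbreg narc i.policy i.halfyear, fe exposure(caseid)

* Fixed-effects Poisson for comparison, with robust standard errors
xtpoisson narc i.policy i.halfyear, fe exposure(caseid) vce(robust)
```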

Now, if you want to use -xtreg-, I would use outcome/population as the dependent variable, rather than outcome. You may find that those distributions look rather different from what you have seen without normalizing by population denominator. In any case, that would give you a degree of adjustment for exposure.
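A minimal sketch of that -xtreg- approach follows. Note the assumptions: pop (state population in each half-year) is a hypothetical variable that is not in the example data and would have to be merged in from another source, and state_id and narc_rate are names made up for illustration. The scaling to cases per 100,000 is just a convenience for reading the coefficients.

```stata
* Sketch only: -pop- is a hypothetical variable, not part of the example data.
capture confirm variable state_id
if _rc {
    encode state, generate(state_id)   // numeric panel id from the string state code
}
xtset state_id halfyear

* Narcotic poisonings per 100,000 population
generate double narc_rate = 100000 * narc / pop

* State fixed effects via -fe-, half-year effects via i.halfyear,
* standard errors clustered on state
xtreg narc_rate i.policy i.halfyear, fe vce(cluster state_id)
```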

Finally, I want to point out that, at least in your example data, your policy variable is not as described: it is constant for all halfyears within any given state. It seems to describe the "treatment vs control group" part of a DID model, and it does not appear to be the interaction of anything with anything else. (In the context of your entire data set, this observation might not hold up.)
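One way to check this on the full data is to count how many states have a policy indicator that actually changes over time. The snippet below is a sketch; varies and one_per_state are helper names introduced here, and it assumes policy has no missing values (missing values sort last and would distort the comparison).

```stata
* Within each state, sort by policy so the first and last values are the min and max
bysort state (policy): generate byte varies = policy[1] != policy[_N]

* Keep one observation per state for the tabulation
egen byte one_per_state = tag(state)

* varies == 0: policy constant over time; varies == 1: policy switches
tabulate varies if one_per_state
```

On the 90 observations shown in the example, every state would fall in the varies == 0 row, which matches the observation above.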
Sumedha Gupta

Join Date: May 2016

Posts: 289
#3

22 Aug 2017, 14:18

Prof. Schechter,

Thank you for your response. My understanding of xtnbreg from earlier Statalist threads is that it is not really a 'fixed effects' estimator and tends to be inconsistent? So I am leaning towards the more robust xtreg...

Regarding exposure, as you said, I can use the population of the state in a given half-year as the denominator. Please find attached distribution graphs with population as the denominator ("...._bypop.pdf"). Indeed, the distributions look more bell-shaped now. My only concern with using the population as the denominator is that not everybody in the state is misusing drugs and therefore at risk of poisoning. Does that somehow mess up the interpretation?

Could the total number of poisoning incidents be a more appropriate normalizing denominator, that is, the number of narcotic poisonings divided by the total number of poisonings? In this case the denominator includes everybody who has experienced a poisoning, so definitely 'misusers' of some sort. Please find attached a distribution graph of this as well ("..._bycaseid.pdf"). My concern here is that a change in policy may change both the numerator and the denominator, making it hard to interpret the change in the ratio as an effect of the policy.

Your observation about the policy variable is BTW correct... it is indeed due to dataex picking the first 90 observations in which there just happens to be no time variation in the policy variable.
Attached Files

DistributionHistogramsOutcomes_bycaseid.pdf (52.2 KB, 1 view)

DistributionHistogramsOutcomes_bypop.pdf (52.0 KB, 1 view)
Sumedha Gupta

Join Date: May 2016

Posts: 289
#4

22 Aug 2017, 17:51

Not sure if I actually asked my question in the previous post, but based on the histograms above, would xtreg seem like a suitable choice for the diff-in-diff analysis? Also, would the population of the state be a suitable denominator, or would the total number of cases of poisoning be more suitable? As I mentioned in the earlier post, I am wondering which would be the better approach conceptually as well as based on the distributions presented in the histograms above.

Thank you again, Prof. Schechter, for your always valuable advice.
Sincerely,
Sumedha.
Clyde Schechter

Join Date: Apr 2014

Posts: 30169
#5

22 Aug 2017, 18:50

So from the point of view of the distribution of the outcome variables, both denominators give distributions that look reasonable for use with -xtreg-. The choice between them boils down to which is the most conceptually correct (or, one might say in this case, the least conceptually incorrect) variable. (This would be the dominant concern even if there were distributional problems--one would find other ways to deal with them. But we are fortunate that we don't have to contend with that problem anyway here.)

It is clear from what you write that the ideal denominator would be the number of people misusing drugs in the state, but that number is not available. Actually, it is really beyond not available: it is, for practical purposes, not observable. So some proxy must be used. If you use the total number of cases as the denominator, you are underestimating the denominator and, as you astutely noted yourself, the denominator itself may be altered by the policy change, so that interpreting the results becomes slippery. The ratio of particular types of cases to total cases is referred to in epidemiology as a proportional incidence rate. Such rates are little used because they are generally of little interest and, again, because the denominator is as likely to be influenced by interventions as the numerators.

If you use the state population, you are overestimating the denominator. But at least this denominator is exogenous (presumably). Moreover, the ratios are actually population incidence rates, which are usually considered one of the most important indicators of what is happening with a disease. So I would go with the state population as the denominator, unless you can come up with a strong case for doing otherwise.
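For concreteness, the proportional rate discussed above could be computed from the variables in the example data roughly as follows. This is a sketch that assumes caseid holds the total number of poisoning cases in each state and half-year, as the thread suggests; narc_share is a name introduced here for illustration.

```stata
* Assumes caseid is the total number of poisoning cases per state-half-year
generate double narc_share = narc / caseid   // narcotic cases as a share of all cases

* Inspect the distribution of the proportional rate
summarize narc_share, detail
histogram narc_share, fraction
```

Because both narc and caseid can respond to the policy, a change in narc_share cannot be attributed cleanly to a change in narcotic poisonings alone.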

In discussing your results, you should point out that what you were really most interested in were ratios with number of drug misusers as the denominator, but that it was not possible to obtain the necessary denominator. I doubt anyone will be particularly critical about that.
Sumedha Gupta

Join Date: May 2016

Posts: 289
#6

22 Aug 2017, 18:55

Thank you so much Prof. Schechter. This is very, very useful. Thank you again.