zip or xtpoisson?

ben earnhart

Join Date: May 2014
Posts: 1027

21 Mar 2016, 19:35

So I have a dilemma: a zip model with clustered standard errors (which takes into account my zeros), or use xtpoisson (which does my multiple levels correctly, but can't take into account the extra zeros).

I have data gathered from conference participants giving feedback on the sessions they attended. Fairly small conferences; with a 50% response rate, I get about 75-150 respondents per conference, with a given person probably attending four to six sessions. We are testing a new questionnaire format that is more smartphone-friendly and follows best practices in general -- currently I have about 500 respondents for each format, a little under 1000 respondents total. My dependent variables are missing data points on quantitative questions and length of an open-ended response. So my data is nested like this:

Conference
Person and Session crossed
response

Would it be reasonable to model it like below for a robustness check that I footnote? The effect of the new questionnaire format is strong enough that it shows up in just looking at the means (and t-tests), logit models, poisson, zip, and xtpoisson. I probably will report the simple differences in means (after all, the audience is more MBA-speak than Stats-geek), but want to be able to footnote that the effects were significant with more appropriate models. But there is no truly appropriate model that I can find. If I had a truly appropriate model, I might actually report its results along with the changes in means.

Any thoughts?

Code:

clear
set more off

input str2 conference byte(person session) str40 comment byte format
"FL" 1 10 ""                                         1
"FL" 1 11 "Dr. Keen was really Keen"                 1
"FL" 1 12 ""                                         1
"FL" 1 13 "fantastic"                                1
"FL" 1 14 ""                                         1
"FL" 2 10 "interesting"                              1
"FL" 2  9 "boring.  Nothing to see here, move along" 1
"FL" 2  6 "repetitive"                               1
"FL" 3 10 ""                                         1
"FL" 3  5 ""                                         1
"FL" 3  6 ""                                         1
"FL" 3  4 ""                                         1
"FL" 3  3 ""                                         1
"FL" 3  1 ""                                         1
"AL" 1 10 ""                                         0
"AL" 1 11 "guacamole!"                               0
"AL" 1 12 ""                                         0
"AL" 1 13 ""                                         0
"AL" 1 14 "food was good, presentation bad"          0
"AL" 2 10 ""                                         0
"AL" 2  9 ""                                         0
"AL" 2  6 ""                                         0
"AL" 2 10 "seen this one before"                     0
"AL" 3  5 ""                                         0
"AL" 3  6 ""                                         0
"AL" 4  4 ""                                         0
"AL" 4  3 ""                                         0
"AL" 4  1 ""                                         0
end

gen commentLength=length(trim(comment))
encode conference, gen(confNum)
gen confPerson=int(confNum*1000+person)
gen confSession=int(confNum*1000+session)
sum

zip commentLength format, vce(cluster confPerson) inflate(format)
zip commentLength format, vce(cluster confSession) inflate(format)

xtset confPerson
xtpoisson commentLength format

xtset confSession
xtpoisson commentLength format

Last edited by ben earnhart; 21 Mar 2016, 19:40.

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#2

21 Mar 2016, 21:01

Well, the fact that you are even considering the -xtpoisson- alternative suggests to me that you are not particularly interested in modeling the zero inflation process per se--it's just a nuisance you want to get around.

In that case, why not-xtnbreg-? Except in the most extreme zero-inflation (whjch your sample data do not suggest are a problem here), this will adequately accommodate the excessive zeroes and it will deal with the multilevel nature of the data.
Comment
ben earnhart

Join Date: May 2014

Posts: 1027
#3

21 Mar 2016, 21:34

Sure, I'll be happy to run xtnbreg on the real data. The real distribution of how often people make comments is probably a little bit different than the data I made up for the example, but not very different. So xtnbreg is more robust to zero inflation than xtpoisson?

And you're right -- the extra zeros are *mostly* a nuisance. If I had a model I was happy with saying was appropriate, it would be nice to say "here's the impact on the propensity to comment" and "here's the impact on the length, given they decided to comment," but since fancy models don't seem quite appropriate, explaining it with means and percentages might be the best I can do.

Last edited by ben earnhart; 21 Mar 2016, 21:38.
Comment

Announcement

zip or xtpoisson?

Comment

Comment