So I have a dilemma: use a zero-inflated Poisson (zip) model with clustered standard errors (which accounts for my excess zeros), or use xtpoisson (which handles my multiple levels correctly but can't account for the extra zeros).
I have data gathered from conference participants giving feedback on the sessions they attended. These are fairly small conferences; with a 50% response rate, I get about 75-150 respondents per conference, with a given person typically attending four to six sessions. We are testing a new questionnaire format that is more smartphone-friendly and follows best practices in general -- currently I have about 500 respondents for each format, a little under 1000 respondents total. My dependent variables are the number of missing data points on quantitative questions and the length of an open-ended response. So my data is nested like this:
Conference
   Person and Session (crossed)
      Response
Would it be reasonable to model it as below for a robustness check that I footnote? The effect of the new questionnaire format is strong enough that it shows up just from looking at the means (and t-tests), logit models, poisson, zip, and xtpoisson. I will probably report the simple differences in means (after all, the audience is more MBA-speak than Stats-geek), but I want to be able to footnote that the effects were significant under more appropriate models. The trouble is that I can't find a truly appropriate model. If I had one, I might actually report its results alongside the changes in means.
Any thoughts?
Code:
clear
set more off
input str2 conference byte(person session) str40 comment byte format
"FL" 1 10 "" 1
"FL" 1 11 "Dr. Keen was really Keen" 1
"FL" 1 12 "" 1
"FL" 1 13 "fantastic" 1
"FL" 1 14 "" 1
"FL" 2 10 "interesting" 1
"FL" 2 9 "boring. Nothing to see here, move along" 1
"FL" 2 6 "repetitive" 1
"FL" 3 10 "" 1
"FL" 3 5 "" 1
"FL" 3 6 "" 1
"FL" 3 4 "" 1
"FL" 3 3 "" 1
"FL" 3 1 "" 1
"AL" 1 10 "" 0
"AL" 1 11 "guacamole!" 0
"AL" 1 12 "" 0
"AL" 1 13 "" 0
"AL" 1 14 "food was good, presentation bad" 0
"AL" 2 10 "" 0
"AL" 2 9 "" 0
"AL" 2 6 "" 0
"AL" 2 10 "seen this one before" 0
"AL" 3 5 "" 0
"AL" 3 6 "" 0
"AL" 4 4 "" 0
"AL" 4 3 "" 0
"AL" 4 1 "" 0
end

* outcome: length of the open-ended comment
gen commentLength = length(trim(comment))

* build unique person and session IDs across conferences
encode conference, gen(confNum)
gen confPerson = int(confNum*1000 + person)
gen confSession = int(confNum*1000 + session)

sum

* zip with clustered SEs, clustering on person, then on session
zip commentLength format, vce(cluster confPerson) inflate(format)
zip commentLength format, vce(cluster confSession) inflate(format)

* panel poisson, treating person, then session, as the panel variable
xtset confPerson
xtpoisson commentLength format
xtset confSession
xtpoisson commentLength format
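For what it's worth, one route I have considered (but not shown above) is a mixed-effects Poisson with crossed random effects for person and session, which would respect both groupings at once, though it still does not model the excess zeros. A sketch of that idea, using the same variables built above:

Code:
* Sketch only: crossed random effects via mepoisson (Stata 13+).
* The _all: R.confPerson term crosses person effects with session effects.
mepoisson commentLength format || _all: R.confPerson || confSession:

Whether that counts as "truly appropriate" for zero-inflated counts is exactly my question.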