Problem using interaction terms with zinb to test for statistically significant difference

Neil Meredith

Join Date: Jul 2015

Posts: 12
#1


14 Jan 2019, 14:01

I am using Stata 15.1 with zinb. My dependent variable is the number of times a respondent used a tanning bed in the past year (tanning). I use zinb to regress tanning on my variable of interest, log income, to obtain an income elasticity estimate. Other regressors include age, educational attainment, race, ethnicity, health status, employment status, marital status, family size, time dummies, and region dummies. Specifically, my estimation commands are:

local xlist1 "log_income age doctoral professional masters bachelors associates somecollege highschool black white hispanic employed unemployed hourswrk married livingwpartner widowed divorcedseparated health famsize"

zinb tanning `xlist1' i.region i.year [pw=sampweight], vce(robust) inflate(`xlist1' i.region i.year)

Gender is another variable of interest. Specifically, I want to know whether there is a statistically significant difference between the income elasticity for men and the income elasticity for women. When I interact a gender dummy, female, with log_income and run zinb again, zinb does not converge. Specifically, I run the following:

gen log_income_female = log_income*female

local xlist2 "log_income log_income_female age doctoral professional masters bachelors associates somecollege highschool black white hispanic employed unemployed hourswrk married livingwpartner widowed divorcedseparated health famsize"

zinb tanning `xlist2' i.region i.year [pw=sampweight], vce(robust) inflate(`xlist2' i.region i.year)

I suspect that the non-convergence is due to perfect prediction in the logit estimation run in the inflate portion of zinb. My question is: is there anything I can do to get around the non-convergence and test whether log_income_female is statistically significant?

Please note that I evaluated zinb vs. nbreg in an earlier part of my analysis, and a comparison of AIC and BIC indicates that zinb is what I should use.

Last edited by Neil Meredith; 14 Jan 2019, 14:04.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#2

14 Jan 2019, 17:25

Neil,

I'm not sure if this is the cause, but you are manually creating your interaction term and dummy variables. I see this a lot, but Stata's factor-variable syntax automatically generates interaction terms and levels of categorical variables for you. This saves you from manually generating terms and possibly losing track of what you generated. I also see a number of dummies for categorical variables where I'm not sure whether you omitted a base category. For example, you could have typed

Code:

zinb tanning c.log_income##i.female age i.education i.race i.employed hourswrk i.marital_status i.region i.year [pw=sampweight], vce(robust) inflate(c.log_income##i.female age i.education i.race i.employed hourswrk i.marital_status i.region i.year)

When I first tried generating dummies manually and forgot to exclude a base category, my program immediately failed to run and gave me an error message about collinearity (I think). It could be that your program failed to detect the issue automatically because you have more variables. Why don't you use factor-variable syntax first and see if that solves it? Note that

Code:

c.log_income##i.female

will include main effects for both variables (treating log income as continuous and female as categorical), and their interaction term.
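A minimal sketch of how the interaction could then be tested directly, assuming `female` is coded 0/1 and keeping Neil's other regressors. The local name `controls` is illustrative (it is xlist1 without log_income, since the factor-variable term adds log_income itself), and the `///` line continuations only work inside a do-file. The `testparm` line is a Wald test of the interaction coefficient in the count equation, which zinb names after the dependent variable; the `margins` lines compare the income elasticity of the overall expected count by sex, which also reflects the inflate equation. Treat this as a starting point, not a verified analysis of these data.

```stata
* Controls from the original post, minus log_income (added below as a main effect).
* "controls" is an illustrative local name.
local controls "age doctoral professional masters bachelors associates somecollege highschool black white hispanic employed unemployed hourswrk married livingwpartner widowed divorcedseparated health famsize"

* Main effects of log income and female plus their interaction, in both equations.
zinb tanning c.log_income##i.female `controls' i.region i.year [pw=sampweight], ///
    vce(robust) inflate(c.log_income##i.female `controls' i.region i.year)

* Wald test of the log income x female interaction in the count equation.
testparm c.log_income#i.female, equation(tanning)

* eydx(log_income) is d ln E(tanning) / d log_income, i.e. an income elasticity
* of the overall expected count; r.female contrasts women with men.
margins female, eydx(log_income)
margins r.female, eydx(log_income)
```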

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Neil Meredith

Join Date: Jul 2015

Posts: 12
#3

15 Jan 2019, 10:16

Weiwen, thank you for your reply. I'm sure I omitted a base category, so that is not the problem. I'm aware that I could be doing more with factor variables and that doing so would make estimation more efficient and less prone to errors. However, to my knowledge, it is easier to automate tables after estimation with a command such as outreg and its varlabels option if you generate terms manually (please let me know if I'm wrong about this, as I haven't tried using factor variables fully with outreg and similar post-estimation commands).
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#4

15 Jan 2019, 11:19

Neil,

I'm not familiar with outreg, but I know that estout has no problem outputting factor variables.
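A minimal sketch of a factor-variable table with esttab (part of the user-written estout package from SSC). It assumes a local macro `controls` already holds the remaining regressors; the stored-model name `zinb_int` is hypothetical. The option names are as I recall them from `help esttab`, so check them against your installed version.

```stata
* estout (which provides esttab) is user-written; install it once from SSC.
ssc install estout, replace

* Variable labels that esttab can display instead of variable names.
label variable log_income "Log income"
label variable female "Female"

* ASSUMPTION: local macro `controls' holds the remaining regressors.
eststo clear
eststo zinb_int: zinb tanning c.log_income##i.female `controls' i.region i.year ///
    [pw=sampweight], vce(robust) inflate(c.log_income##i.female `controls' i.region i.year)

* label: use variable labels where available
* interaction(" x "): join interacted terms with " x " instead of "#"
* nobaselevels: suppress rows for omitted base levels
esttab zinb_int, label se interaction(" x ") nobaselevels
```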

Going back to your code, your first list of covariates (xlist1) doesn't appear to include gender. Perhaps that was a typo, but if you haven't done so already, I'd check whether your model converges with gender (but without the gender*log income interaction) in the list.

You say your model didn't converge. Chances are the log likelihood in your iteration log is creeping toward an asymptote but keeps failing to converge, with repeated "not concave" messages. You could use the iterate() option to stop Stata after a specified number of iterations and then inspect all your coefficients. Any coefficients that look far too large or too small, or that have missing standard errors, are suspect.
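A minimal sketch of that inspection step, assuming a local macro `controls` holds the remaining regressors. The cap of 50 iterations is arbitrary; `capture noisily` still prints the results but keeps a do-file running if Stata flags the non-convergence.

```stata
* ASSUMPTION: local macro `controls' holds the remaining regressors.
* Stop after 50 iterations (arbitrary) and display the current estimates.
capture noisily zinb tanning c.log_income##i.female `controls' i.region i.year ///
    [pw=sampweight], vce(robust) ///
    inflate(c.log_income##i.female `controls' i.region i.year) iterate(50)

* 1 if the model converged, 0 if it stopped at the iteration cap.
display e(converged)

* In the coefficient table, look for very large coefficients and for missing
* standard errors, especially in the inflate equation.
```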

You could check the variables behind those coefficients for data errors. Assuming there are none, you might consider whether to remove them from either part of the model (the count or the inflate equation).

I should mention that in latent class analysis, if the class proportion for one indicator is 1 or 0, it can cause convergence issues. People can and do constrain the corresponding logit intercept in that case. I'm not sure how acceptable that would be to your audience, but you suggested that this might be the issue in the excess-zero portion of your estimation. The exercise above should let you see whether that's happening. I'm not precisely sure what to do about it. If you find the constraint option acceptable, I can show you how to run your model in gsem. If not, the only alternative I can think of is Bayesian estimation, and I have no idea how to set up a zero-inflated model in Bayesian syntax.
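A rough sketch of how one might look for perfect prediction and, if a single cell is the culprit, fix its inflate coefficient. It assumes a local macro `controls` holds the remaining regressors. Modelling "any zero" with logit is only a crude proxy for zinb's inflate equation, which models structural zeros. The generated variable `anyzero`, the cell `3.region`, and the value 15 are hypothetical placeholders, not recommendations.

```stata
* ASSUMPTION: local macro `controls' holds the remaining regressors.
* Crude proxy for the inflate equation: model whether tanning is zero.
* logit drops a variable, with a note that it predicts success or failure
* perfectly, when it separates zeros from positive counts.
generate byte anyzero = (tanning == 0) if !missing(tanning)
logit anyzero c.log_income##i.female `controls' i.region i.year [pw=sampweight]

* Cell counts: are there groups with no positive counts at all?
tabulate female anyzero
tabulate region anyzero
tabulate year anyzero

* HYPOTHETICAL: if, say, region 3 had only zeros, its inflate coefficient would
* drift toward +infinity; fixing it at a large value is one option.
constraint define 1 [inflate]3.region = 15
zinb tanning c.log_income##i.female `controls' i.region i.year [pw=sampweight], ///
    vce(robust) inflate(c.log_income##i.female `controls' i.region i.year) ///
    constraints(1)
```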

Abolfazl Aemi

Join Date: Feb 2019

Posts: 1
#5

25 Feb 2019, 11:07

How can I generate this table?
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#6

25 Feb 2019, 14:22

Originally posted by Abolfazl Aemi

How can I generate this table?...

This is an unrelated question; it is better to post it in a different thread. But do note that you need to give us enough data to answer the question, because it's not clear what your data are. Also, if this is a homework question, forum policy discourages homework problems; it's better to ask your teaching assistant or professor.
