Hi All,
I am using Stata 15 to run a non-linear regression model (Poisson pseudo-maximum likelihood, via the ppmlhdfe command) to examine my research question.
My data and model are set up as follows.
- Panel dataset with 84,000 firm-day observations
- I estimate a Poisson pseudo-maximum likelihood (PPML) model with fixed effects using ppmlhdfe, to test whether Event influences my dependent variable, the number of disclosures.
- The model includes industry, year, and month fixed effects and 18 control variables.
As an equation, the model I estimate with ppmlhdfe is:
# of disclosures = b0 + b1*Event + b3*controls + industry f.e. + year f.e. + month f.e. + u (Here, Event is the main variable that I am focused on)
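Strictly speaking, PPML specifies the conditional mean in exponential form rather than additively, so the equation above is shorthand for something like the following. The subscripts i (firm) and t (day) and the fixed-effect symbols α, δ, and λ are notation assumed here for illustration, not taken from the model output:

```latex
\mathbb{E}\left[\text{disclosures}_{it} \mid \text{Event}_{it}, \text{controls}_{it}\right]
  = \exp\left(b_0 + b_1\,\text{Event}_{it} + b_3\,\text{controls}_{it}
      + \alpha_{\text{industry}} + \delta_{\text{year}} + \lambda_{\text{month}}\right)
```

Under this form, b1 is read as a proportional effect: Event changes the expected number of disclosures by 100*(exp(b1) - 1) percent, holding the other terms fixed.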
In code:
ppmlhdfe #_of_disclosure Event controls, absorb(industry year month) vce(cluster firm)
For one of my analyses, I want to test whether the main effect shows up in only one "gender" group.
I can think of three ways to test this.
(suggestion 1) Run the PPML regression separately for the male and female subgroups and test whether the two coefficients on Event are significantly different (gender = 1 if male, 2 if female).
ppmlhdfe #_of_disclosure Event controls if gender ==1, absorb(industry year month) vce(cluster firm)
est store m1
ppmlhdfe #_of_disclosure Event controls if gender ==2, absorb(industry year month) vce(cluster firm)
est store m2
suest m1 m2
test [m1_mean]Event = [m2_mean]Event
(suggestion 2) Run the PPML regression with an interaction between Event and gender (gender not interacted with the controls)
ppmlhdfe #_of_disclosure Event gender i.Event#i.gender controls, absorb(industry year month) vce(cluster firm)
(suggestion 3) Run the PPML regression with an interaction between Event and gender (gender fully interacted with all the controls)
ppmlhdfe #_of_disclosure Event gender i.Event#i.gender controls i.gender#c.controls, absorb(industry year month) vce(cluster firm)
My first question is whether suggestions 2 and 3 are still valid ways to test a group comparison in non-linear models such as PPML. I know that in linear regression, suggestions 1 and 3 are supposed to give the same coefficients. I am unsure whether the same applies to PPML. For some reason, my results do not give the same coefficients (maybe this has something to do with the fixed effects?).
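One way to check the fixed-effects guess is to let the absorbed fixed effects vary by gender as well. The following is a minimal sketch, not a verified specification: n_disclosure and x1-x18 are hypothetical names standing in for the outcome and the 18 controls, and the local macro controls is introduced only for illustration.

```stata
* Sketch only. Hypothetical names: n_disclosure (outcome), x1-x18 (the 18
* controls); Event, gender, industry, year, month, firm as in the post.
unab controls : x1-x18

* Slopes on Event and on every control vary by gender, and so do the
* absorbed industry, year, and month fixed effects.
ppmlhdfe n_disclosure i.gender#c.Event i.gender#c.(`controls'), ///
    absorb(i.gender#i.industry i.gender#i.year i.gender#i.month) ///
    vce(cluster firm)

* Wald test that the Event coefficient is the same for both groups.
test 1.gender#c.Event = 2.gender#c.Event
```

When every slope and every fixed effect is interacted with gender, the Poisson pseudo-likelihood separates by group, so the point estimates should match the two split-sample regressions in suggestion 1. With absorb(industry year month) alone, the fixed effects are forced to be common across genders, which is one candidate explanation for the differing coefficients.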
My second question is why one would prefer suggestion 3 over suggestion 2. I know that suggestion 2 restricts the coefficients on the controls to be the same across genders, so it is not equivalent to suggestion 1. But since my model includes many control variables and high-dimensional fixed effects, I am concerned that the fully interacted model (suggestion 3) has too many parameters to estimate, which may be problematic. Would there be a reason to favor suggestion 2?
My third question is whether there are reasons to favor suggestion 1 over suggestion 3.
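For reference, if the two subsamples could be treated as independent (an assumption that would fail if, for example, the same firm cluster contributes observations to both gender groups), the comparison in suggestion 1 would reduce to a simple z-test on the two estimated Event coefficients. The superscripts (m) and (f) are labels assumed here for the male and female regressions:

```latex
z = \frac{\hat{b}_1^{(m)} - \hat{b}_1^{(f)}}
         {\sqrt{\widehat{\mathrm{se}}\big(\hat{b}_1^{(m)}\big)^{2}
              + \widehat{\mathrm{se}}\big(\hat{b}_1^{(f)}\big)^{2}}}
```

A pooled model with interactions does not need this independence assumption, because the cluster-robust covariance between the two coefficients is estimated jointly.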
My final question is what the best way is to test for group differences in an effect in non-linear models such as PPML.
Thank you for reading a long question!
Your advice would be most appreciated!