replicating proc genmod in stata

Jacob Stowell

Join Date: Nov 2014

Posts: 11
#1

replicating proc genmod in stata

13 Nov 2014, 10:24

Dear Stata Community,

I am brand new to this list, and relatively new to Stata, but I was hoping someone would be able to help me with a problem I am having. Specifically, in an effor to become better acquainted with Stata, I have been trying to replicate results from a study I conducted using SAS. I have panel data for three decennial census years (1980-2000) and used Proc Genmod to estimate a fixed effects model. The code I used is below:

proc genmod data=two;
*ods output ParameterEstimates=sys.tvic0;
class tract tm;
model tvic = ac_ratio disadv stable prfemp p_for tm tm*p_for/d=nb dscale offset=lnpop;
run;

*The variable tm is a categorical variable indicating the census year.

Here is the syntax I submitted in Stata:
xtset tract year
xtnbreg tvic ac_ratio prfemp disadv stable p_for, fe exposure(tpop)

*The variable year is the decennial census year.

I did some reading and it appeared that xtnbreg would be the Stata command I should use for this purpose. However, I am unable to replicate the results, either in terms of the coefficient estimates or the standard errors. As such, I began trying different specification options, just in an attempt to see if I could duplicate the findings, which would help me to figure out some of the differences between programs. I tried both the "pa" (with different scale and correlation specifications) and "re" options, knowing the latter is incorrect.

I also came across a discussion of the xtgee command which I tried, specifying the link as log and the family as negative binomial. Still nothing matches. I was able to figure out a way to get xtnbreg and xtgee to produce the same results, which again are different from those produced by SAS. I am a little bit at a loss. If anyone could point out where I may be going wrong, or point me to some literature that could help clarify the differences, that would be much appreciated. Thank you for your time and I look forward to hearing from the community.

Sincerely,
Jake
-- Jacob I. Stowell, Ph.D. Associate Professor School of Criminology and Criminal Justice Northeastern University 360 Huntington Avenue Boston, Massachusetts 02115
Tags: None
Joe Canner

Join Date: Mar 2014

Posts: 580
#2

13 Nov 2014, 13:45

Jacob,

I am not terribly conversant in either PROC GENMOD or xtnbreg, but I can find my way around both Stata and SAS fairly well. Since I don't have your data set, I can't replicate your problem, but using the Stata airacc.dta set that is used as an example in the xtnbreg help gave me a few ideas. My first question actually has to do with your SAS program. Where does your PROC GENMOD statement account for the panel structure of the data? From what I can see, the CLASS statement does not accomplish that. Normally, you would use a REPEATED statement, or similar, to establish this. As it stands, it appears that your PROC GENMOD just analyzes the data set without any accounting for clustering/repeated measures, while the xtnbreg (or xtgee) commands (used, as required, in the context of an xtset), will in fact do that.

Sorry if this is a naive question, just trying to understand better what you are trying to do.

Regards,
Joe
Comment
Joe Canner

Join Date: Mar 2014

Posts: 580
#3

13 Nov 2014, 13:57

P.S. I took the airacc data set and did the following in SAS:

Code:

data airacc; set a.airacc; lnpmiles=log(pmiles); run; proc genmod; class airline time; model i_cnt=inprog / d=nb offset=lnpmiles; run;

and in Stata:

Code:

webuse airacc xtset, clear nbreg i_cnt inprog, exposure(pmiles)

and got the same answers. In other words, PROC GENMOD as you specified it is the same as nbreg (i.e., disregarding panel structure).

Incidentally, adding the DSCALE option in the PROC GENMOD MODEL statement results in the same coefficients and slightly different standard errors. I'm not sure yet what the equivalent Stata option is.
1 like
Comment
Joe Canner

Join Date: Mar 2014

Posts: 580
#4

13 Nov 2014, 14:19

P.P.S. This gets pretty close results to PROC GENMOD used as above with the DSCALE option:

Code:

glm i_cnt inprog, family(nb ml) exposure(pmiles) scale(dev) irls
Comment
Jacob Stowell

Join Date: Nov 2014

Posts: 11
#5

13 Nov 2014, 14:40

Hi Joe,

Thank you so much for your reply. I went back to a book I have written by Paul Allison Fixed Effects Regression Methods for Longitudinal Data using SAS (2005) and he does mention adding a "repeated=" command. However, he only mentions it in reference to invoking GEE estimation and random effects models. In one point in his discussion of prog genmod for negative binomial estimation for panel data he uses this model an example:

proc genmod data=patents2;
class t id;
model patent=rd0-rd5 t id / dist=nb scale=0;
run;

He does not include a repeated measure. In a comparison of results from the fixed effects and GEE estimation model (below), he argues that "parameter estimates are roughly similar [between] the fixed effects and GEE models. But unlike the fixed effects method, two of the [variables] approach statistical significance. Intersting the standared errors for the GEE coefficients are generally larger than those for the fixed effects method which is the opposite of what would ordinarily be expected."

proc genmod data=patents2;
class t id;
model patent=rd0-rd5 t id / dist=nb scale=0;
repeated subject=id /type=mdep(4) corrw;
run;

*NB: Allison argues that similar results are also generated using an exchangeable correlation structure. Indeed the results are pretty similar between both models.

I guess I am a bit unclear about the distinction being drawn...It appears that he is differentiating between fixed effects estimation and GEE? I'm sorry if this is a simple question...It is interesting to learn about the different programs, but I just don't quite get how the estimation procedures differ.

Many thanks,
Jake
Comment
Joe Canner

Join Date: Mar 2014

Posts: 580
#6

13 Nov 2014, 15:13

Jake,

The example from the book has both time and ID as variables in the model, which seems quite different from incorporating them using a panel structure. However, I don't have the book to know what he is trying to illustrate, and this is starting to get out of my league. It's also worth noting that your original example included time in the model but not tract, whereas your Stata model has both time and tract in the xtset but neither one as variables in the model. I suspect this may have something to do with your problem. Note, in addition, that xtset distinguishes between a panel variable (e.g., id or tract) and a time variable, and this distinction could be important in subsequent models. In contrast, the CLASS statement in SAS does not seem to make this distinction (unless REPEATED is used).

I downloaded the Allison example data and will try to replicate the SAS results in Stata (but not today).

Regards,
Joe
Comment
Jacob Stowell

Join Date: Nov 2014

Posts: 11
#7

13 Nov 2014, 16:14

Hi Joe,

Thank you so much...I appreciate your help very much. I will keep working on this as well. It is very kind of you to help out with this...I don't want to be a nuissance.

Much appreciated,
Jake
Comment
Joe Canner

Join Date: Mar 2014

Posts: 580
#8

17 Nov 2014, 13:22

Jake,

I don't know if this helps you or not, but I was able to reproduce (more or less) in Stata the negative binomial example from Allison's book that you referred to.

I did this in SAS:

Code:

proc genmod data=patents2; class t id; model patent=rd0-rd5 t id / dist=nb; run;

and this in Stata:

Code:

nbreg patent rd_0-rd_5 b5.t b346.id

and got the same answers. Note that the SAS version omits the "SCALE=0" option. I haven't yet figured out how to reproduce that in Stata. Stata's glm allows for setting the scale parameter, but it must be greater than 0.

Note also the "b5.t" and "b356.id" terms in the nbreg statement. With factor variables, SAS defaults to using the last category as the reference, while Stata defaults to using the first category. "b5.t" requests that the last category of t (t=5) be the reference category and "b356.id" requests that the last category of id (id=356) be the reference.

Going back to your original question, in order to reproduce the results from the SAS GENMOD code you presented, I would recommend (1) using nbreg instead of xtnbreg and (2) adding i.tm i.tm##p_for to your model so that you are asking for the same thing. I'm not sure how to reproduce the DSCALE option, but for that perhaps you will need to switch to glm and use the scale(dev) option.

Regards,
Joe
Comment

Announcement

replicating proc genmod in stata

Comment

Comment

Comment

Comment

Comment

Comment

Comment