Using OGLM to Determine Coefficient Inequality across Multiple Groups

Kevin Wolff

Join Date: Aug 2018
Posts: 10

21 Jan 2020, 07:47

I am using Stata 16.

I am trying to determine whether crime type (coded either as dummies or as a single categorical measure) has an effect on my dichotomous outcome, and whether this effect varies between locations (n=5 counties in New York). I have been reading on the subject, including Allison's work and Williams' work on heterogeneous choice models with OGLM, but I am wondering whether it is appropriate to compare coefficients when there are more than two groups. All of the published examples I have found compare two groups (males and females, for example), but I would like to compare across a total of 5 areas.

In an attempt to do this, I specified an equation with a large number of interaction terms between each crime type (leaving one out as a reference cat) and each of the boroughs (again leaving one out). Below is the code used. The OGLM model provides a variance parameter for each of the included boroughs and estimates for each interaction. I am just curious whether this is the best way to test for equality across more than 2 groups, when I also have a categorical (not ordinal) predictor variable and a dichotomous outcome.

Thanks for any input.

Code:

//Generate Dummies from categorical variable//
tab offtype2, gen(offtype_)

//Create Interaction Terms for each crimeXborough
gen weap_bx=offtype_1*Bronx
gen weap_bk=offtype_1*Brooklyn
gen weap_qn=offtype_1*Queens
gen weap_si=offtype_1*Staten

gen sexc_bx=offtype_2*Bronx
gen sexc_bk=offtype_2*Brooklyn
gen sexc_qn=offtype_2*Queens
gen sexc_si=offtype_2*Staten

gen drug_bx=offtype_3*Bronx
gen drug_bk=offtype_3*Brooklyn
gen drug_qn=offtype_3*Queens
gen drug_si=offtype_3*Staten

gen vio_bx=offtype_4*Bronx
gen vio_bk=offtype_4*Brooklyn
gen vio_qn=offtype_4*Queens
gen vio_si=offtype_4*Staten

//Property (offtype_5) is Baseline

gen dwi_bx=offtype_6*Bronx
gen dwi_bk=offtype_6*Brooklyn
gen dwi_qn=offtype_6*Queens
gen dwi_si=offtype_6*Staten

gen other_bx=offtype_7*Bronx
gen other_bk=offtype_7*Brooklyn
gen other_qn=offtype_7*Queens
gen other_si=offtype_7*Staten

//Heterogeneous Choice Models//
estimates clear
oglm detained2 offtype_1 offtype_2 offtype_3 offtype_4 offtype_6 offtype_7 Bronx Brooklyn Queens Staten ///
weap_bx weap_bk weap_qn weap_si sexc_bx sexc_bk sexc_qn sexc_si vio_bx vio_bk vio_qn vio_si drug_bx drug_bk ///
drug_qn drug_si dwi_bx dwi_bk dwi_qn dwi_si other_bx other_bk other_qn other_si  ///
sex age age2 black other priorfel priormisd offsever_2 offsever_3 offsever_5 offsever_6 offsever_7 offsever_8 offsever_9 offsever_10 ///
arrmonth_2-arrmonth_12 arryear_2-arryear_3, hetero(Bronx Brooklyn Queens Staten) store(oglm1) link(logit)


Richard Williams

Join Date: Apr 2014

Posts: 5025
#2

21 Jan 2020, 12:27

First off, this syntax seems way too complicated to me. oglm supports factor variable notation. So, assuming Bronx, Brooklyn, etc. are themselves mutually exclusive categories created from, for example, a variable called borough, you could just have something like

Code:

oglm detained2 i.offtype i.borough i.offtype#i.borough othervars, het(i.borough)

I suspect some of your other vars could use factor notation too, e.g. instead of age2 have c.age#c.age, and instead of the offsever_x vars have i.offsever.
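A fuller version of that command, carrying over the controls from the original post, might look like the sketch below. Several things here are assumptions rather than facts from the thread: borough is taken to be a single categorical variable with Manhattan as its lowest value; offsever, arrmonth, and arryear are taken to be the categorical variables behind the offsever_*, arrmonth_*, and arryear_* dummies; and Property is taken to be the fifth category of offtype2.

```stata
* Sketch only: borough, offsever, arrmonth, and arryear are assumed names.
* ib(#5).offtype2 sets the 5th category (assumed to be Property) as the base.
* The ## operator adds the main effects and the crime-type-by-borough
* interactions in one term, so no hand-built product variables are needed.
oglm detained2 ib(#5).offtype2##i.borough                ///
    sex c.age##c.age black other priorfel priormisd      ///
    i.offsever i.arrmonth i.arryear,                     ///
    hetero(i.borough) link(logit) store(oglm1)
```

Using ib(#5) rather than ib5 picks the fifth ordered category whatever its numeric code is, which matches how tab, gen() numbered the dummies.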

As far as your main Q, I know of no reason you can only have a binary variable in the hetero equation.

If this is your beginning model, I suspect you should start much more simply and build up, e.g. don't add all the interactions until a later step.
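One way to build up in stages is sketched below. It assumes a single categorical variable named borough (a hypothetical name, with Manhattan as the base) and that Property is the fifth category of offtype2; the control variables are left out for brevity and would need to be identical in both models.

```stata
* Sketch only: borough is an assumed variable name.

* Step 1: main effects only, with borough-specific residual variance.
oglm detained2 ib(#5).offtype2 i.borough, ///
    hetero(i.borough) link(logit) store(m_main)

* Step 2: add the crime-type-by-borough interactions.
oglm detained2 ib(#5).offtype2##i.borough, ///
    hetero(i.borough) link(logit) store(m_int)

* Joint likelihood-ratio test of the interaction terms.
* Valid only if both models were fitted on the same observations.
lrtest m_main m_int
```

If the second model fails to converge, the first one still gives a usable baseline for seeing whether the variance terms alone matter.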

I'm partial to oglm (which I wrote) but if you are ok with probit link you could use hetprob or (if you have Stata 16) hetoprobit.
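For the probit route, a minimal sketch with Stata's built-in hetprobit command (the current name for the heteroskedastic probit command) follows. As before, borough is an assumed variable name and controls are omitted for brevity.

```stata
* Sketch only: borough is an assumed variable name (Manhattan as base).
* hetprobit fits a binary probit in which the residual standard deviation
* depends on the variables listed in het().
hetprobit detained2 ib(#5).offtype2##i.borough, het(i.borough)
```

Because the link is probit rather than logit, the coefficients will be on a different scale from the oglm results, so compare signs, significance, and predicted probabilities rather than raw estimates.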

I would probably also use margins to help make sense of everything. If you aren't familiar with margins (or factor variable notation) see

https://www3.nd.edu/~rwilliam/stats/Margins01.pdf
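As a rough illustration of what margins could show here, the sketch below assumes that detained2 is coded 0/1, that borough is a single categorical variable (a hypothetical name), and that oglm's predict accepts the outcome() option in the same way ologit's does; check help oglm before relying on that last point.

```stata
* Sketch only: borough is an assumed name; controls omitted for brevity.
oglm detained2 ib(#5).offtype2##i.borough, hetero(i.borough) link(logit)

* Predicted probability of detention for each crime type in each borough.
margins offtype2#borough, predict(outcome(1))

* Average marginal effect of each crime type (versus the base category)
* on Pr(detained2 = 1), evaluated separately for each borough.
margins borough, dydx(offtype2) predict(outcome(1))
```

Predicted probabilities are often easier to compare across boroughs than the interaction coefficients themselves, since they do not depend on the residual variance scaling.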

Finally, I'll note that these models can be tough to estimate, especially when the response variable is binary. You'll have to see how it works with a complicated model like yours.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://academicweb.nd.edu/~rwilliam/
Kevin Wolff

Join Date: Aug 2018

Posts: 10
#3

21 Jan 2020, 13:47

Thank you, Professor Williams, for highlighting the inefficiencies in my code and providing an example of how to simplify it. My original version of OGLM would not allow factor notation, but I downloaded it again, and the code you provided works great. I will also look into margins to try to make sense of all the estimated effects.

I have one more question. When interpreting a significant estimate for LNSIGMA in this case, would it indicate a significant difference between the residuals in the denoted category (in my case, a particular borough) and those in the base category? For example, a significant positive lnsigma for Brooklyn would indicate that the standard deviation of the residuals for Brooklyn is significantly larger than for Manhattan (the base). Is that correct?

Thanks again for your time and service to the discipline.
Richard Williams

Join Date: Apr 2014

Posts: 5025
#4

21 Jan 2020, 15:39

Yes, values and significance levels are relative to the baseline category.
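In symbols, under the usual heterogeneous choice setup with a logit link (the notation here is an illustrative assumption, not taken from the thread), with x_i the predictors in the choice equation and z_i the borough dummies in the variance equation:

```latex
\Pr(y_i = 1) = \Lambda\!\left(\frac{x_i\beta}{\sigma_i}\right),
\qquad \sigma_i = \exp(z_i\gamma),
\qquad \Lambda(u) = \frac{e^{u}}{1 + e^{u}}

% Manhattan is the base, so all its borough dummies are zero:
\sigma_{\text{Manhattan}} = \exp(0) = 1,
\qquad
\sigma_{\text{Brooklyn}} = \exp(\gamma_{\text{Brooklyn}})

% Hence the lnsigma coefficient is a log ratio of standard deviations:
\gamma_{\text{Brooklyn}}
  = \ln\!\left(\frac{\sigma_{\text{Brooklyn}}}{\sigma_{\text{Manhattan}}}\right)
```

So a hypothetical lnsigma estimate of 0.2 for Brooklyn would correspond to a residual standard deviation about exp(0.2), or roughly 1.22 times, that of Manhattan, while a negative estimate would mean a smaller one.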
