  • Using OGLM to Determine Coefficient Inequality across Multiple Groups

    I am using Stata 16.

    I am trying to determine whether crime type (coded either as dummies or as a single categorical measure) has an effect on my dichotomous outcome, and whether this effect varies between locations (n = 5 counties in New York). I have been reading on the subject, including Allison's work and Williams' work on heterogeneous choice models with OGLM, but I am wondering whether it is appropriate to compare coefficients when there are more than two groups. All of the published examples use a two-group comparison, such as males versus females, but I would like to compare across a total of 5 areas.

    In an attempt to do this, I specified an equation with a large number of interaction terms between each crime type (leaving one out as the reference category) and each of the boroughs (again leaving one out). Below is the code used. The OGLM model provides a variance parameter for each of the included boroughs and an estimate for each interaction. I am just curious whether this is the best way to test for equality across more than 2 groups when I also have a categorical (not ordinal) predictor variable and a dichotomous outcome.

    Thanks for any input.

    Code:
    //Generate Dummies from categorical variable//
    tab offtype2, gen(offtype_)
    
    //Create Interaction Terms for each crimeXborough
    gen weap_bx=offtype_1*Bronx
    gen weap_bk=offtype_1*Brooklyn
    gen weap_qn=offtype_1*Queens
    gen weap_si=offtype_1*Staten
    
    gen sexc_bx=offtype_2*Bronx
    gen sexc_bk=offtype_2*Brooklyn
    gen sexc_qn=offtype_2*Queens
    gen sexc_si=offtype_2*Staten
    
    gen drug_bx=offtype_3*Bronx
    gen drug_bk=offtype_3*Brooklyn
    gen drug_qn=offtype_3*Queens
    gen drug_si=offtype_3*Staten
    
    gen vio_bx=offtype_4*Bronx
    gen vio_bk=offtype_4*Brooklyn
    gen vio_qn=offtype_4*Queens
    gen vio_si=offtype_4*Staten
    
    //Property (offtype_5) is Baseline
    
    gen dwi_bx=offtype_6*Bronx
    gen dwi_bk=offtype_6*Brooklyn
    gen dwi_qn=offtype_6*Queens
    gen dwi_si=offtype_6*Staten
    
    gen other_bx=offtype_7*Bronx
    gen other_bk=offtype_7*Brooklyn
    gen other_qn=offtype_7*Queens
    gen other_si=offtype_7*Staten
    
    //Heterogeneous Choice Models//
    estimates clear
    oglm detained2 offtype_1 offtype_2 offtype_3 offtype_4 offtype_6 offtype_7 Bronx Brooklyn Queens Staten ///
    weap_bx weap_bk weap_qn weap_si sexc_bx sexc_bk sexc_qn sexc_si vio_bx vio_bk vio_qn vio_si drug_bx drug_bk ///
    drug_qn drug_si dwi_bx dwi_bk dwi_qn dwi_si other_bx other_bk other_qn other_si  ///
    sex age age2 black other priorfel priormisd offsever_2 offsever_3 offsever_5 offsever_6 offsever_7 offsever_8 offsever_9 offsever_10 ///
    arrmonth_2-arrmonth_12 arryear_2-arryear_3, hetero(Bronx Brooklyn Queens Staten) store(oglm1) link(logit)

  • #2
    First off, this syntax seems way too complicated to me. oglm supports factor variable notation. So, assuming Bronx, Brooklyn, etc. are themselves mutually exclusive categories created from, for example, a variable called borough, you could just have something like

    Code:
    oglm detained2 i.offtype i.borough i.offtype#i.borough othervars, het(i.borough)
    I suspect some of your other vars could use factor notation too, e.g. instead of age2 have c.age#c.age, and instead of the offsever_x vars have i.offsever.

    As far as your main Q, I know of no reason you can only have a binary variable in the hetero equation.

    If this is your beginning model, I suspect you should start much more simply and build up, e.g. don't add all the interactions until a later step.
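
    A minimal sketch of what building up could look like, assuming the same hypothetical variable names as above (offtype, borough) and leaving out the control variables for brevity. It relies on oglm's store() option, which the original post already uses, and on the standard lrtest command; all three models must be fit on the same sample for the tests to be valid.

    Code:
    // Step 1: main effects only, no variance equation
    oglm detained2 i.offtype i.borough, link(logit) store(m1)
    // Step 2: let the residual variance differ by borough
    oglm detained2 i.offtype i.borough, het(i.borough) link(logit) store(m2)
    // Step 3: add the crime-type-by-borough interactions
    oglm detained2 i.offtype##i.borough, het(i.borough) link(logit) store(m3)
    // Likelihood-ratio test of each step against the previous one
    lrtest m1 m2
    lrtest m2 m3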

    I'm partial to oglm (which I wrote) but if you are ok with probit link you could use hetprob or (if you have Stata 16) hetoprobit.
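
    As a rough illustration of the probit alternative, again assuming the hypothetical offtype and borough variables and omitting the controls, the built-in command (hetprobit in current Stata) takes the variance equation in its het() option:

    Code:
    // Heteroskedastic probit; the variance equation varies by borough
    hetprobit detained2 i.offtype##i.borough, het(i.borough)

    The hetprobit output also includes a test that all coefficients in the variance equation are zero, which is a quick check on whether the heteroskedastic specification is needed at all.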

    I would probably also use margins to help make sense of everything. If you aren't familiar with margins (or factor variable notation) see

    https://www3.nd.edu/~rwilliam/stats/Margins01.pdf

    Finally, I'll note that these models can be tough to estimate, especially when the response variable is binary. You'll have to see how it works with a complicated model like yours.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    Stata Version: 17.0 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam


    • #3
      Thank you, Professor Williams, for highlighting the inefficiencies in my code and providing an example of how to simplify it. My original version of OGLM would not allow factor notation, but I downloaded it again, and the code you provided works great. I will also look into margins to try and make sense of all the estimated effects.

      I have one more question. When it comes to interpreting a significant estimate for LNSIGMA in this case, would a significant estimate indicate a significant difference between the residuals in the denoted category (in my case, a particular borough) and those in the base category? For example, a significant positive lnsigma for Brooklyn would indicate that the standard deviation of the residuals for Brooklyn is significantly larger than for Manhattan (the base).

      Thanks again for your time and service to the discipline.


      • #4
        Yes, values and significance levels are relative to the baseline category.
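
        Because each lnsigma coefficient is a contrast with the base borough only, comparisons among the other boroughs need separate tests. A hedged sketch, to be run after the oglm model with het(i.borough): it assumes the variance equation is labeled lnsigma in the output, and the level numbers 2 and 3 are hypothetical codes for two non-base boroughs.

        Code:
        // Joint test that no borough's lnsigma differs from the base borough
        testparm i.borough, equation(lnsigma)
        // Compare two non-base boroughs with each other
        test [lnsigma]2.borough = [lnsigma]3.borough
        // Implied ratio of borough 2's sigma to the base borough's sigma
        display exp([lnsigma]_b[2.borough])

        Since sigma is modeled as the exponential of the variance equation, a coefficient of 0 corresponds to a ratio of 1, i.e. the same residual standard deviation as the base category.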