xtmelogit variance-covariance structure for the random effects

Sylwia JPiatkowska

Join Date: Sep 2017

Posts: 23
#1


10 Oct 2017, 15:05

Hello,
I run a multilevel binary mixed-effects model:

xtmelogit RECIDIVATE FOREIGN HISPANIC_PERCENT HISPANIC_PERCENT*FOREIGN INCOME_CAPITA , || ZIP: FOREIGN, variance

I am trying to examine whether there is a cross-level interaction between the percentage of Hispanics (HISPANIC_PERCENT) in a given zip code area (ZIP) and foreign-born status (FOREIGN). For this I included the interaction HISPANIC_PERCENT*FOREIGN. That is, I want to see if the effects of FOREIGN randomly vary across the ZIP code areas that have different Percent of Hispanics (HISPANIC_PERCENT).
As you can see, FOREIGN is specified in this model as a random effect.

My question is: what is the most suitable type of variance-covariance structure for the random effects? Independent, unstructured, identity, or exchangeable? I have read a lot about it, but I still have a hard time deciding which one is best for the purpose of my study. The findings differ depending on the type I select.

Thank you in advance.
Best,
Sylwia
Clyde Schechter

Join Date: Apr 2014

Posts: 30084
#2

10 Oct 2017, 17:13

Before answering the question you asked, this caught my eye as a potential problem:

That is, I want to see if the effects of FOREIGN randomly vary across the ZIP code areas that have different Percent of Hispanics (HISPANIC_PERCENT).

Fair enough, but that isn't what your code does. First of all, the "code" you show will only produce a syntax error because the * in your "interaction term" is not legal syntax. Also, the comma before || will provoke a syntax error, too. For that matter, if you're using current Stata, or even recent Stata, the command is now called -melogit-. Also, in modern Stata, you do not need to specify the -variance- option as that is now the default. But putting those issues aside, your model is not structured to do what you say you want. I assume that HISPANIC_PERCENT is a zip-code level variable, and FOREIGN is a person-level variable. Then the term c.HISPANIC_PERCENT#i.FOREIGN will, by itself, establish the cross-level interaction between the zip code's percent Hispanic population and the person's foreign-born status. It is not necessary to also have FOREIGN listed in the || ZIP: part of the model.

While listing FOREIGN in the ZIP: part of the model is perfectly legal, it may or may not be appropriate for your purposes, and at least from what you describe in your post, it sounds like it is not appropriate. What ZIP: FOREIGN does is introduce random slopes for FOREIGN. That means that in your model there is a different baseline effect of FOREIGN in each zipcode. It is more or less equivalent to having a ZIPCODE#FOREIGN interaction added to your model. (No, it's not statistically equivalent, but you can think of it this way.) If that's what you want, it's fine, but nothing you have described about your problem suggests this is needed or wanted.

Remember this important principle about Stata multilevel models: the level within the command where a variable appears has nothing at all to do with the level at which that variable is defined. It appears at any level where its effect is supposed to vary across values of the variable that defines the level. (Corollary: all effects should appear in the fixed-effects level of the model, regardless of whether they appear in higher levels as well.)

So, anyway, I would revise your model, using full factor variable notation, as:

Code:

melogit RECIDIVATE i.FOREIGN##c.HISPANIC_PERCENT INCOME_CAPITA || ZIP:

Turning now to the proper specification of covariance, which is what you asked about. The unstructured covariance makes no assumptions about the relationships among the random effects at the level concerned. While this may sound like a great virtue, it comes at a heavy price: the number of parameters that must be estimated grows as the square of the number of random effects at that level. In addition to making estimation take a (potentially very) long time, it can actually overwhelm the information available in the data and leave you with no results at all.
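To make that cost concrete, here is a worked count. It assumes, as with the -covariance()- option of -melogit-, that the structure applies to the q random effects at a level (the random intercept plus any random slopes). The symbol q is introduced here only for illustration, and the counts for exchangeable assume q is at least 2.

```latex
\begin{align*}
\text{unstructured:} &\quad \tfrac{q(q+1)}{2}\ \text{parameters}
   \quad (q \text{ variances and } \tfrac{q(q-1)}{2} \text{ covariances}) \\
\text{independent:}  &\quad q\ \text{parameters}
   \quad (q \text{ variances, all covariances fixed at } 0) \\
\text{exchangeable:} &\quad 2\ \text{parameters}
   \quad (\text{one common variance, one common covariance}) \\
\text{identity:}     &\quad 1\ \text{parameter}
   \quad (\text{one common variance, all covariances fixed at } 0)
\end{align*}
```

With || ZIP: FOREIGN there are q = 2 random effects (intercept and slope), so unstructured estimates 3 parameters and independent estimates 2. With || ZIP: alone, q = 1 and every structure reduces to a single variance, so the choice makes no difference.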

The exchangeable covariance structure is ideal when the level variable is the unit of observation and the lowest level of the model consists of repeated observations on these levels, i.e. a repeated measures study. You don't actually say what the design for your study was, but my guess from just the names of the variables is that this is not a repeated measures study on zip codes, but rather a study of individuals independently sampled within zip codes.

If that is correct, then the independent covariance structure is appropriate.
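If the random slope on FOREIGN is kept in the model, one hedged way to check the choice empirically is to fit both structures and compare them with a likelihood-ratio test. This is only a sketch: it assumes the variable names used in this thread, and the stored-estimate names indep and unstr are made up for illustration.

```stata
* Independent: separate variances for the intercept and the FOREIGN slope,
* with their covariance fixed at 0
melogit RECIDIVATE i.FOREIGN##c.HISPANIC_PERCENT INCOME_CAPITA || ZIP: FOREIGN, covariance(independent)
estimates store indep

* Unstructured: additionally estimates the intercept-slope covariance
melogit RECIDIVATE i.FOREIGN##c.HISPANIC_PERCENT INCOME_CAPITA || ZIP: FOREIGN, covariance(unstructured)
estimates store unstr

* The two models are nested and differ by one covariance parameter
lrtest unstr indep
```

If the unstructured model fails to converge, that is itself a sign the data cannot support the extra parameter.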
Sylwia JPiatkowska

Join Date: Sep 2017

Posts: 23
#3

10 Oct 2017, 18:20

Dear Clyde,

Thank you so much for your very quick reply. Yes, you are perfectly right: there were errors in my command. I included "*" in the hope of making it easier to see that it is an interaction. I was wrong.

You are right that FOREIGN is a person-level variable and HISPANIC_PERCENT is a zip-code level variable. In the model that you provided, should I also include "HISPANIC_PERCENT" and "FOREIGN"?
Or is "i.FOREIGN##c.HISPANIC_PERCENT" enough?

I have also included several other person-level (RECIDIVATE) and zip-code level (INCOME_CAPITA) variables - they are all included before || ZIP: Am I correct in that? I am very new to this, so I would be very thankful for your clarification.

melogit RECIDIVATE i.FOREIGN##c.HISPANIC_PERCENT FOREIGN HISPANIC_PERCENT INCOME_CAPITA || ZIP:

As for the type of covariance, it is a study of foreign people held in jail that were identified as residents of different zip code areas. Thus, it is not a repeated measures study (not repeated measures of the same individuals), but many foreign people studied here may actually reside in the same zip-code area. Zip-code level variables such as HISPANIC_PERCENT or INCOME_CAPITA are thus repeated because they are the same for individuals that lived in that zip-code area. My understanding is that we are on the same page here.

Would you still recommend independent covariance structure over unstructured? The numbers of groups is quite high = 581. However, you are right that unstructured covariance leave nonsignificant results. Meanwhile, independent structure yields some sig. results. Would you please clarify when do we actually use unstructured type of covariance?

Mixed-effects logistic regression    Number of obs    = 36,350
Group variable: ZIP                  Number of groups = 581

Thank you so much for all your help!
Sylwia
Sylwia JPiatkowska

Join Date: Sep 2017

Posts: 23
#4

10 Oct 2017, 18:43

P.S. I forgot to ask:
If I do decide to have FOREIGN after || ZIP (see below), can I still use independent type of covariance? Or should I use unstructured in this case?
melogit RECIDIVATE i.FOREIGN##c.HISPANIC_PERCENT INCOME_CAPITA || ZIP: FOREIGN

Thank you.
Sylwia
Clyde Schechter

Join Date: Apr 2014

Posts: 30084
#5

10 Oct 2017, 18:50

In the model that you provided, should I also include "HISPANIC_PERCENT" and "FOREIGN"?
Or is "i.FOREIGN##c.HISPANIC_PERCENT" enough?

i.FOREIGN##c.HISPANIC_PERCENT is enough. Do read -help fvvarlist-. When Stata sees the ## operator, it automatically expands that to include i.FOREIGN and HISPANIC_PERCENT in the model without your having to write them out. When you run the code this way, you will see that these terms appear in the output.
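As a small illustration of that expansion, the two commands below should fit the same model, assuming the variable names used in this thread. The second simply writes out by hand what ## generates.

```stata
* Shorthand: ## adds both main effects and their interaction
melogit RECIDIVATE i.FOREIGN##c.HISPANIC_PERCENT INCOME_CAPITA || ZIP:

* Written out in full: main effects plus the # interaction term
melogit RECIDIVATE i.FOREIGN c.HISPANIC_PERCENT i.FOREIGN#c.HISPANIC_PERCENT INCOME_CAPITA || ZIP:
```

The shorthand is preferable in practice because it avoids listing a variable twice with conflicting prefixes.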

Would you still recommend independent covariance structure over unstructured?

Yes.

However, you are right that unstructured covariance leave nonsignificant results.

That's not what I meant, and that doesn't matter for deciding what to do. What I meant is that the model sometimes fails to converge when you ask for unstructured covariance because the number of covariance parameters is so large that there isn't enough data to identify them all.

Would you please clarify when do we actually use unstructured type of covariance?

It really depends on the number and type of random effects being estimated and it isn't easy to state a simple rule.
Sylwia JPiatkowska

Join Date: Sep 2017

Posts: 23
#6

11 Oct 2017, 17:50

Dear Mr. Schechter, Thank you so much for all your help. I really appreciate that.
Sylwia
Sylwia JPiatkowska

Join Date: Sep 2017

Posts: 23
#7

13 Oct 2017, 14:33

Dear Mr. Schechter,
Now that I have estimated my model correctly, I am trying to estimate and plot the marginal effects for the interaction i.FOREIGN##c.HISPANIC_PERCENT

Just to remind you, my model is:
melogit RECIDIVATE i.FOREIGN##c.HISPANIC_PERCENT INCOME_CAPITA || ZIP:

RECIDIVATE is coded 0/1
FOREIGN is coded 0/1
HISPANIC_PERCENT is a continuous variable

I found online a command:

margins, dydx(FOREIGN) at (HISPANIC_PERCENT=(1(1) 4)) vsquish post

However, Stata produces an error:
invalid dydx() option;
variable FOREIGN may not be present in model as factor and continuous predictor

I checked the Stata manual, but I am not quite sure what would work correctly after melogit. Would you please advise what the correct margins command would be after my melogit model? What are the typical meaningful values used in "at": the 25th, 50th (mean) and 75th percentiles? How can I plot these margins?

Thank you so much in advance,
Sylwia
Clyde Schechter

Join Date: Apr 2014

Posts: 30084
#8

13 Oct 2017, 15:50

Your code looks correct to me, and I don't see any reason you should be getting that error. I have one theory. Perhaps you ran some command between the -melogit- and this -margins- command that overwrote the -melogit- results in e(). For example if you ran another -margins- command with the -post- option, then the original -melogit- results would no longer be there and the -margins- command you show above would fail.
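One hedged way to guard against that situation is to store the -melogit- results before running anything with -post- and restore them afterwards. The sketch below assumes the model from this thread, and the name full_model is made up for illustration.

```stata
* Fit the model and keep a named copy of its results
melogit RECIDIVATE i.FOREIGN##c.HISPANIC_PERCENT INCOME_CAPITA || ZIP:
estimates store full_model

* With -post-, e() now holds the margins results, not the melogit results
margins, dydx(FOREIGN) at(HISPANIC_PERCENT = (1(1)4)) post

* Bring the melogit results back before the next -margins- call
estimates restore full_model
margins FOREIGN, at(HISPANIC_PERCENT = (1(1)4))
```

Dropping -post- altogether avoids the problem when the posted results are not needed for later commands.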

If it wasn't that, please re-post showing all the code and output you got from Stata starting with the -melogit- command down through the command that is giving you the error message. Do that by copy/pasting from your log file or Results window into the Forum editor (use code delimiters so it shows up neatly, see FAQ #12 for instructions) and do not edit it in any way.

As for typical values for the -at()- option, it's really a matter of what is of interest to you. If you want to see how things work over the whole range of a variable you would generally have a series of values that starts at or near the variable's minimum and runs up to or near its maximum. If you are focused on what's happening near the center of the distribution then the 25th, 50th (which is the median, not the mean) and 75th percentiles would be fine. You could specify that as -at((p25) HISPANIC_PERCENT) at((p50) HISPANIC_PERCENT) at((p75) HISPANIC_PERCENT)-. If there are certain particular values of HISPANIC_PERCENT that are of special theoretical interest, then those would be specified. It really depends on what question you are trying to answer.
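For the whole-range case, a rough sketch might look like the following. It assumes it is run right after -melogit- so that e(sample) is available. The step of 5 and the local macro names lo and hi are arbitrary choices for illustration, not values taken from this thread.

```stata
* Find the observed range of HISPANIC_PERCENT in the estimation sample
quietly summarize HISPANIC_PERCENT if e(sample)
local lo = floor(r(min))
local hi = ceil(r(max))

* Evaluate the effect of FOREIGN on a grid spanning that range
margins, dydx(FOREIGN) at(HISPANIC_PERCENT = (`lo'(5)`hi'))
marginsplot
```

A finer step gives a smoother curve at the cost of longer computation.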
Sylwia JPiatkowska

Join Date: Sep 2017

Posts: 23
#9

13 Oct 2017, 17:17

Dear Mr. Schechter,
I figured out the error. I had variable FOREIGN twice in my model.

Code:

melogit RECIDIVATE FOREIGN i.FOREIGN##c.HISPANIC_PERCENT INCOME_CAPITA_NOTSIG || ZIP: FOREIGN

It should be:

Code:

melogit RECIDIVATE i.FOREIGN##c.HISPANIC_PERCENT INCOME_CAPITA_NOTSIG || ZIP: FOREIGN

My margins command works well now for both percentiles and the exact values.

Code:

margins, dydx(FOREIGN) at((p25) HISPANIC_PERCENT) at((p50) HISPANIC_PERCENT) at((p75) HISPANIC_PERCENT)

Code:

margins, dydx(FOREIGN) at (HISPANIC_PERCENT=(10 12 13))

My only question is about plotting these marginal effects of the interaction i.FOREIGN##c.HISPANIC_PERCENT. Again, my dependent variable is RECIDIVATE, and I am thinking that my dependent variable should be on the Y-axis, HISPANIC_PERCENT on the X-axis, and that I should get two lines: one for FOREIGN = 1 and the other for FOREIGN = 0. After estimating the margins, I ran the following marginsplot command. However, Stata produces an error:

Code:

. marginsplot, at(HISPANIC_PERCENT) plot(FOREIGN)
FOREIGN not a dimension in margins results

Would you please advise how I can plot my margins?
Thank you in advance.
Sylwia
Clyde Schechter

Join Date: Apr 2014

Posts: 30084
#10

13 Oct 2017, 18:07

I think all you need is just

Code:

marginsplot, at(HISPANIC_PERCENT = (10 12 13))

But there is only one curve to plot, not two. That's because you specified -dydx(FOREIGN)- in your -margins- command. That means that -margins- calculates (and -marginsplot- graphs) the difference in probability of RECIDIVATE between the FOREIGN = 0 and the FOREIGN = 1 situations. As FOREIGN has two values, there is only one difference between them.

If you are thinking you want a plot of the RECIDIVATE probabilities themselves at various values of HISPANIC_PERCENT, one such curve when FOREIGN = 0 and another when FOREIGN = 1, then you have to specify the -margins- command accordingly, which would be:

Code:

margins FOREIGN, at(HISPANIC_PERCENT = (10 12 13))
marginsplot

Note that there is no -dydx()- option this time.
Sylwia JPiatkowska

Join Date: Sep 2017

Posts: 23
#11

13 Oct 2017, 22:39

Mr. Schechter,
Thank you so much for all your help. Everything works well now!
Best,
Sylwia