interpreting stata output for multi level xtmelogit command

Maliha Nazir

Join Date: Aug 2016

Posts: 13
#1

13 Sep 2016, 14:15

Hi,

I am implementing a multilevel model in Stata. I have some questions about interpreting the output, specifically the random effects at the individual and country levels.

The dependent variable (V46new) is binary, and the dataset has two levels: individual and country. I want to see how individual-level as well as country-level variables affect the dependent variable. The data were collected over 5 years.

I want to analyze whether a country's unemployment rate (Unemp2010) and other individual-level variables affect the dependent variable (V46new), using a multilevel approach to analyze contextual effects.

My questions are as follows:
What do _cons and var(_cons) represent, and how are these values interpreted?

How is var(Une~2010) interpreted?

What is the variance at individual and country level from the Stata output?

Thank you very much.

Regards,

Maliha Nazir

Last edited by Maliha Nazir; 13 Sep 2016, 14:48.
Tags: categorical, margins, multi level, random effects, survey data
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

13 Sep 2016, 14:26

I'm not sure what you did to try to show the output you want help interpreting, but all I see on my screen is some placeholder icons that seem to suggest an attached picture, but there is no actual picture visible. In any case, the best way to show Stata output is to copy/paste it directly from your Results window or log file into a code block in the forum editor. (If you are not familiar with code blocks, please read the instructions in FAQ #12.) When output is posted that way, it is always an accurate representation of what happened, and it shows up readably in everybody's browser. Please repost. While it may be possible to answer your questions without specifically referring to your output, the answers would be long-winded and generic, and possibly confusing. It would be better to see the actual output and talk about that.
Maliha Nazir

Join Date: Aug 2016

Posts: 13
#3

13 Sep 2016, 15:03

Dear Clyde,

Thank you for your reply.

I have pasted the Stata output into a Word document, as the command took about 7 hours to display results. I am using the xtmelogit command in Stata 13.1. The dependent variable is binary. The independent variables are at the individual and country (V2) levels.

The dataset has two levels: individuals nested within countries. The data were collected over 5 years. I want to analyze whether a country's unemployment rate (Unemp2010) and other individual-level variables affect the dependent variable (V46new), using a multilevel approach to analyze contextual effects.

Command:
xtmelogit V46new women V242new V245new Both_Parents_Imm V181new V39new V248new V107new V59new V147new Svy_Year1 Svy_Year2 Svy_Year3 Svy_Year4 ||V2:Unemp2010, var

My questions are as follows:
What do _cons and var(_cons) represent, and how are these values interpreted?

How is var(Une~2010) interpreted?

What is the variance at individual and country level from the Stata output?

What are the random intercept and the random slope in the output, and how are they interpreted?

How is the variable "women" interpreted from the output? Women is a binary variable equal to 1 if the respondent is a woman. How does the interpretation differ when considering the marginal effect of "women", given in another table below?

Would it be incorrect to include the variable "Unemp2010" in the fixed part as well as the random part?

Thank you.

Attached Files

Doc1.docx (180.8 KB, 1 view)

Clyde Schechter

Join Date: Apr 2014
Posts: 30117
#4

13 Sep 2016, 15:44

Maliha,

I hate to be a pain about this, but I did ask you to post this by pasting from your log file or Results window into a code block here in the editor. Attaching Microsoft Office documents is discouraged. Some of us don't use Microsoft Office. I do, but I'm among those who will not download an Office document from a stranger because they can contain active malicious content, a risk I can't afford to take. Please follow the recommended ways of posting data in the FAQ: those recommendations are not there to make life hard for you but to assure that everybody uses the Forum safely, effectively and efficiently.

Since I don't have your actual output to work with, here is the output from a simple application of -melogit- (which is the name of the command in the current version of Stata) with a random slopes regression:

Code:

. webuse bangladesh, clear
(Bangladesh Fertility Survey, 1989)

. 
. melogit c_use age urban || district: age

// ITERATION LOG OMITTED FOR BREVITY 

Mixed-effects logistic regression               Number of obs     =      1,934
Group variable:        district                 Number of groups  =         60

                                                Obs per group:
                                                              min =          2
                                                              avg =       32.2
                                                              max =        118

Integration method: mvaghermite                 Integration pts.  =          7

                                                Wald chi2(2)      =      33.94
Log likelihood = -1249.8973                     Prob > chi2       =     0.0000
------------------------------------------------------------------------------
       c_use |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    .008966   .0057597     1.56   0.120    -.0023228    .0202547
       urban |   .6549066   .1161018     5.64   0.000     .4273513     .882462
       _cons |  -.7046928   .0859711    -8.20   0.000    -.8731931   -.5361925
-------------+----------------------------------------------------------------
district     |
     var(age)|   .0001562   .0002996                      3.64e-06    .0067017
   var(_cons)|   .1965207   .0686312                      .0991156    .3896501
------------------------------------------------------------------------------
LR test vs. logistic model: chi2(2) = 39.05               Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

The underlying model here is log odds(c_use) = constant + (b_age + v_district)*age + b_urban*urban + u_district + e_person
e_person is the residual error, u_district is a random intercept at the district level, and the slope of age varies by district, with mean b_age. The usual assumptions are made about the distributions of the error terms u, v, and e (mean 0, normal distribution with variance to be estimated, and independence).

So how does this relate to the Stata output? _cons is the constant term of the model. var(_cons) is the variance of the distribution of the district-level intercepts u. var(age) is the variance of the distribution of the district-level slopes of the log odds of c_use vs age relationship.

The variance at the lowest level of the model is, in a multi-level logistic regression, always that of the standard logistic distribution: pi²/3 (about 3.29).
The variance at the district level is given by var(_cons).
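To put these two variances on a common footing, one can compute a district-level intraclass correlation by hand. The lines below are only a sketch: the scalar names are my own, the numbers are copied from the -melogit- output above, and because the model has a random slope on age the result applies only at age = 0. They are meant to be run from a do-file.

```stata
* Sketch only: values copied from the -melogit- output above; scalar names are illustrative
scalar v_cons = .1965207          // var(_cons): district-level intercept variance
scalar v_res  = _pi^2/3           // level-1 variance implied by the logistic distribution
display "level-1 variance:  " v_res
display "ICC at age = 0:    " v_cons/(v_cons + v_res)
```

By this calculation a little under 6 percent of the latent-scale variance lies at the district level when age = 0. I believe -estat icc- after -melogit- reports a comparable conditional quantity, but I have not checked it against this hand calculation.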

Here is some margins output from the above regression:

Code:

. margins, at(urban = (0 1))

Predictive margins                              Number of obs     =      1,934
Model VCE    : OIM

Expression   : Marginal predicted mean, predict()

1._at        : urban           =           0

2._at        : urban           =           1

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         _at |
          1  |   .3381885   .0182264    18.55   0.000     .3024655    .3739115
          2  |   .4881409   .0278009    17.56   0.000     .4336521    .5426297
------------------------------------------------------------------------------

In the -melogit- output, the coefficient of urban reflects the change in the log odds of the outcome variable (c_use) associated with a unit increase (from 0 to 1) in the value of the dichotomous predictor urban. In the -margins- output, we have the adjusted predicted probabilities of c_use when urban = 0, and when urban = 1.
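If an odds ratio is easier to think about than a change in log odds, the coefficient can simply be exponentiated. This is a sketch using the number copied from the output above, not new output:

```stata
* Sketch only: coefficient copied from the -melogit- output above
display exp(.6549066)             // odds ratio for urban = 1 versus urban = 0
```

The result is about 1.92: within a given district and at a given age, the odds of c_use for urban respondents are estimated to be a bit under twice those for rural respondents.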

Here is some marginal effects output:

Code:

. margins, dydx(urban) at(urban = (0))

Average marginal effects                        Number of obs     =      1,934
Model VCE    : OIM

Expression   : Marginal predicted mean, predict()
dy/dx w.r.t. : urban
at           : urban           =           0

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       urban |    .140042    .023944     5.85   0.000     .0931127    .1869714
------------------------------------------------------------------------------

The .140042 figure is the marginal effect of urban on the probability of c_use starting from the base value of urban = 0. Because this model was run without using factor-variable notation, this calculation was done treating urban as a continuous variable. So this value is the first derivative of the probability of c_use viewed as a function of urban, evaluated at urban = 0. It is close to, but not exactly the same as, 0.4881409-0.3381885 from the first -margins- output.
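To have -margins- treat urban as a dichotomy rather than a continuous variable, the model can be refit with factor-variable notation. The following is a sketch I have not run for this post, so no output is shown; the last line just computes the contrast between the two predictive margins reported earlier.

```stata
* Sketch only: same model as above, but urban is declared categorical with i.
webuse bangladesh, clear
melogit c_use age i.urban || district: age
margins, dydx(urban)              // discrete change in probability, urban 0 -> 1
display .4881409 - .3381885       // same contrast computed from the predictive margins above
```

The hand-computed contrast is about 0.150, somewhat larger than the derivative-based .140042.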

You will notice that in the model I showed you, the variable to be estimated with a random slope, here age (corresponding to your Unemp2010) is included in the fixed part. That should always be done. It is incorrect not to include it unless you wish to impose the constraint that the mean random slope for that variable is zero. Without including it in the fixed part the model is, in general, mis-specified.
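Applied to the command shown in #3, that would mean adding Unemp2010 to the list of fixed-effects predictors while keeping it after the colon. This is only a sketch of the specification; I have not run it on your data, and the /// line continuations assume it is run from a do-file.

```stata
* Sketch only: the command from #3 with Unemp2010 added to the fixed part
xtmelogit V46new women V242new V245new Both_Parents_Imm V181new V39new   ///
    V248new V107new V59new V147new Svy_Year1 Svy_Year2 Svy_Year3         ///
    Svy_Year4 Unemp2010 || V2: Unemp2010, var
```

The fixed-part coefficient of Unemp2010 then estimates the mean slope across countries, and var(Unemp2010) estimates how much the country-specific slopes vary around that mean.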

I think this answers your questions.


Maliha Nazir

Join Date: Aug 2016

Posts: 13
#5

20 Sep 2016, 12:20

Dear Clyde,

Thank you for your answer. In the example that you quoted above, how is the effect of context at the district level interpreted based on the coefficient of the variable "age", var(_cons), and var(age)?

Thank you.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#6

20 Sep 2016, 13:07

So, in the example posted in #4, the mean slope of the log odds of c_use vs age relationship (adjusted for urban) is 0.008966. That slope, however, varies from district to district. Some districts have higher slopes, some have lower, the average being, again, 0.008966. The variance of the distribution of slopes among districts is estimated by var(age), as 0.0001562.

As for var(_cons), it has no connection to the age effect in the model. Rather, the expected value of the log odds of c_use conditional on age and urban both being zero is, as specified by _cons, -0.7046928. But, again, districts differ, some having a higher value and some a lower value for this. The variance of this statistic among districts is estimated by var(_cons), as 0.1965207.
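To see what that spread in intercepts means on the probability scale, here is a rough sketch. It uses the numbers from the output in #4, the scalar names are my own, and it leans on the model's assumption that the district intercepts are normally distributed.

```stata
* Sketch only: values copied from the -melogit- output in #4; assumes normal random intercepts
scalar b0   = -.7046928           // _cons
scalar sd_u = sqrt(.1965207)      // SD of district intercepts
display invlogit(b0)              // probability in a typical district (age = 0, urban = 0)
display invlogit(b0 - 1.96*sd_u)  // lower end of the middle 95% of districts
display invlogit(b0 + 1.96*sd_u)  // upper end
```

These come out at roughly 0.33, 0.17 and 0.54, so for a rural respondent at age = 0 the predicted probability of c_use ranges quite widely across districts.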
Maliha Nazir

Join Date: Aug 2016

Posts: 13
#7

20 Sep 2016, 13:26

So, can we say that age is positively associated with the log odds of "c_use" (adjusted for urban) at the district level, since the coefficient is positive, although it is not significant?
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#8

20 Sep 2016, 13:50

We can say that our estimate of the mean value of that slope is positive. I'm not a fan of null hypothesis significance testing, particularly when dealing with behavioral outcomes, so I generally ignore p-values in this context. Looking at the confidence interval, we can say that the bulk of the confidence interval lies in positive territory, but we cannot altogether rule out the possibility that, on average, this slope is negative. That said, the standard deviation across districts (square root of the variance) is about 0.012, which is quite large compared to that mean value. So we can also infer that in numerous districts the association with age is negative, and that there is also an appreciable number of districts in which it is strongly positive. I would also be quick to point out that the confidence interval for the mean estimate, from -0.0023228 to 0.0202547, is pretty wide relative to that estimated mean, so I would state any conclusions about the relationship to age fairly tentatively--the data do not enable us to make sharp precise statements about it.
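The arithmetic behind those statements can be sketched as follows. The numbers are copied from the -melogit- output, the scalar names are my own, and the calculation takes the point estimates at face value and assumes the district slopes are normally distributed, as the model does.

```stata
* Sketch only: values copied from the -melogit- output; assumes normally distributed slopes
scalar b_age  = .008966           // mean slope of age
scalar sd_age = sqrt(.0001562)    // SD of district slopes
display sd_age
display b_age - 1.96*sd_age       // lower end of the middle 95% of district slopes
display b_age + 1.96*sd_age       // upper end
display normal(-b_age/sd_age)     // implied share of districts with a negative slope
```

This gives a standard deviation of about 0.0125, a middle 95% range of district slopes from about -0.016 to 0.033, and roughly a quarter of districts with a negative slope. Given how imprecisely var(age) is estimated, these figures are themselves very rough.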
Maliha Nazir

Join Date: Aug 2016

Posts: 13
#9

22 Sep 2016, 20:31

Hi Clyde,

There is a strange thing that I noticed in the output: the coefficients of all independent variables in the mixed model using the meqrlogit command are the same as the marginal effects from margins, dydx(*) after estimation. Does that mean something must be wrong, or what could cause this result?

Thank you.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#10

22 Sep 2016, 21:34

That doesn't sound right. But without seeing the exact commands and output it's hard to say anything more specific than that.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#11

22 Sep 2016, 21:53

I retract what I said in #10. I just checked -help meqrlogit postestimation- and clicked on the -margins- link. The default prediction for -margins- after -meqrlogit-, and the only prediction even available in that setting, is xb. Well, the coefficient is, indeed, the marginal effect on xb. The problem is that what you really want is the marginal effect on predicted probability, and you can't get that from -margins- after -meqrlogit-.

Are you able to get your model to converge with -melogit- instead of -meqrlogit-? If so, use that instead, and then, -margins, dydx()- will calculate the marginal effects on the predicted probabilities.

Code:

set more off
clear
webuse bangladesh
melogit c_use urban age child* || district:age
margins, dydx(age) at(age = (-15(5)20)) // EFFECTS ON PROBABILITIES
meqrlogit c_use urban age child* || district:age
margins, dydx(age) at(age = (-15(5)20)) // EFFECTS ON XB = COEFFICIENTS

I have no idea why -margins- is so restricted after -meqrlogit- when -melogit-, which estimates the same model using a different numerical method, is not.
Maliha Nazir

Join Date: Aug 2016

Posts: 13
#12

23 Sep 2016, 09:46

Thank you Clyde. Unfortunately, melogit does not converge. I get the following error:

numerical derivatives are approximate
flat or discontinuous region encountered
Iteration 24: log likelihood = -47822.72 (backed up)
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#13

23 Sep 2016, 11:18

That's unfortunate. I wonder if you might get convergence out of -melogit- if you gave it the results of -meqrlogit- to use as starting values.
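Here is an untested sketch of what I have in mind, written for the bangladesh example rather than your data. It relies on the -from()- maximization option with its -skip- suboption. As far as I know, the two commands parameterize the variance components differently (-meqrlogit- stores log standard deviations), so I would expect only the fixed-effects estimates to carry over; treat that as an assumption to check, not a guarantee.

```stata
* Untested sketch: pass -meqrlogit- estimates to -melogit- as starting values
webuse bangladesh, clear
meqrlogit c_use urban age child* || district: age
matrix b0 = e(b)
* skip asks from() to ignore entries of b0 whose names match no -melogit- parameter
melogit c_use urban age child* || district: age, from(b0, skip)
```

If that still fails, adding the -difficult- option to the -melogit- command would be another thing to try.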
Tom Weichle

Join Date: Apr 2014

Posts: 27
#14

23 Sep 2016, 15:06

Hi Clyde,

This is a really important observation that you discovered regarding the fact that -margins- after -meqrlogit- does not have an option to produce predicted probabilities. I've recently spent a good amount of time comparing the ability of the models I'm testing to converge using -melogit- and -meqrlogit-. And one of my end goals is to make comparisons on the predicted probabilities.

In general, it seems like the postestimation command -predict- has a default prediction statistic which is usually the same default as -margins-. However, the default for -meqrlogit- postestimation -predict- is "mu" (the predicted mean, that is, the probability of a positive outcome) as you mentioned which is not the same as the default for -margins- which is "xb". This is very surprising.

It's very discouraging that -meqrlogit- doesn't offer the option to produce predicted probabilities in the -margins- postestimation. Is there any reason why -meqrlogit- doesn't offer this option? This may also be a question that someone at Stata could answer.

Best,

Tom
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#15

23 Sep 2016, 15:31

Is there any reason why -meqrlogit- doesn't offer this option? This may also be a question that someone at Stata could answer.

I don't know. And, I, too, hope that someone from StataCorp will explain.
