logit and mixed regression

Lena Buettner

Join Date: Aug 2022

Posts: 13
#1

31 Aug 2022, 15:23

Hello everyone,

I am currently working on the Stata code for my bachelor thesis. I’m quite a beginner and would be very grateful for some help, ideally in words as simple as possible.

The data set I’m working with is a two-level data set, with the person (person id) as the first level and the household in which they live as the second level. Most of my variables differ only within the household. For example, I investigate the effect of birth order, so every child in a household shares the same household fixed effects (all children live in a household with one level of assets, one level of parental education, and so on). So I have a kind of panel structure, even though I don’t have observations over time.

Now I have two questions:

1. I want to estimate the effect of the child’s gender and the family size in one model. The child’s gender can have household fixed effects and only varies within the household. But the family size varies between households. I found out that I somehow have to use the mixed command, but I’m not sure how exactly to implement it.

2. I have two types of outcome variables: dummy variables and variables that can take more than two values. I found out that for the dummy variables I have to use the logit or logistic command. If I type "logit Dummyvaribale sex" or "logistic Dummyvaribale sex", I get the following result: outcome does not vary; remember: 0 = negative outcome, all other non-missing values = positive outcome, r(2000). I don’t understand what went wrong here.

Thank you so much for your help - I’m super grateful for any advice!!

Here is the relevant part of my code that is not working:
* options:
mixed highest_grade_compleated sex number_siblings || hh_idn:, mle
mixed highest_grade_compleated sex || hh_idn: number_siblings
* sex alone in a fixed effects model has a coefficient of 0.60

reg dummycur number_siblings
reg enrollment_age number_siblings
reg highest_grade_compleated number_siblings age
reg schooling_progression number_siblings

xtset hh_idn
xtlogit current_yn sex, fe
* OR
clogit current_yn sex, group(hh_idn)
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#2

31 Aug 2022, 18:06

I get the following result: outcome does not vary; remember: 0 = negative outcome, all other non-missing values = positive outcome, r(2000). I don’t understand what was wrong here.

As you do not show example data, it is impossible to be sure, but the most likely cause of this is that your outcome variable is miscoded in the data. I often see data sets in which yes is coded as 1, and no is simply not coded (that is, is coded as missing value). That works well in spreadsheets, but is lethal in Stata. In Stata, no must be coded as zero, and yes can be coded as anything other than zero, but 1 is simplest and best. So if your outcome variable current_yn is not coded as 0 = no/1 = yes, change it to that and your error message will probably go away.

If that is not the source of your problem, post back with example data using the -dataex- command and it will be possible to troubleshoot other sources of this kind of problem. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it.

-dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.
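
As a rough illustration of the coding check described above, here is a minimal sketch on a small toy data set. The data values and the new variable name current01 are invented for the example, and the recoding step assumes that every missing value really means "no" rather than "unknown"; that assumption needs to be verified against the questionnaire before doing the same thing in real data.

```stata
* Toy data, invented for illustration: "yes" is 1 and "no" was left blank
clear
input hh_idn sex current_yn
1 0 1
1 1 .
2 0 1
2 1 1
3 0 .
3 1 1
end

* Show every value the outcome takes, including missing values
tabulate current_yn, missing

* Assumption: a missing value really means "no", not "unknown"
generate byte current01 = cond(missing(current_yn), 0, 1)
tabulate current01 sex

* Produce a copy-and-paste data example for a forum post
dataex hh_idn sex current_yn current01
```

The -tabulate- with the missing option is the key step: if it shows only one non-missing value, -logit- has nothing to contrast and reports that the outcome does not vary.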

Concerning the correct use of -mixed-, let me point out that the level of the model at which a variable varies does not determine which level of the command it should appear in. In particular, even though number_siblings is a household-level variable (i.e. it is constant within households and varies across households) you do not necessarily place it in the hh_idn level of the command. What does get placed in those levels, you may wonder? Placing a variable, regardless of the level at which it varies, at one of the random levels of the command tells Stata that you want to estimate random slopes for that variable. So,

Code:

mixed highest_grade_compleated sex number_siblings || hh_idn:, mle

is a standard two-level model of highest grade completed, explained by sex and number of siblings, with a random effect at the household level.

Code:

mixed highest_grade_compleated sex || hh_idn: number_siblings

is a mis-specified model in any case. Whenever you have a random slope specified (here, number_siblings) you must also include it at the bottom (fixed) level. So this needs to be changed to:

Code:

mixed highest_grade_compleated sex number_siblings || hh_idn: number_siblings

With that done, you now have a model where sex and number_siblings are explanatory variables for highest grade completed, along with a household level random effect. But you also have specified that the slope of the highest_grade_compleated:number_siblings relationship is different for every household. That is, which household you are in matters for determining how strongly number of siblings affects highest grade completed. As this is not my discipline, I don't know if having random slopes for this variable at the household level makes real-world sense or not. That is a substantive question, not a statistical one, and the choice between these two models depends on that. If you are unsure, you should consult somebody with expertise in your field of study.
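
To make the first, random-intercept specification concrete, here is a minimal sketch on simulated data. Everything about the simulation is an assumption made for illustration (200 households, 0 to 3 siblings, the coefficient values, the variance components); only the variable names and the -mixed- command come from the thread. It shows that a household-level variable such as number_siblings can sit in the fixed part of the model while hh_idn: carries only the random intercept.

```stata
* Simulated data, invented for illustration only
clear
set seed 12345
set obs 200                                      // 200 hypothetical households
generate hh_idn = _n
generate number_siblings = floor(4*runiform())   // 0 to 3, constant within household
generate u = rnormal(0, 1)                       // household-level random intercept
expand number_siblings + 1                       // one row per child in the household
generate byte sex = runiform() < 0.5             // varies within household
generate highest_grade_compleated = 8 + 0.6*sex - 0.3*number_siblings + u + rnormal(0, 1.5)

* Random-intercept model: both predictors in the fixed part,
* only the intercept varies across households
mixed highest_grade_compleated sex number_siblings || hh_idn:, mle

* Share of residual variance attributable to households
estat icc
```

In the output, the coefficient on number_siblings is identified from differences between households, while the coefficient on sex also draws on differences between siblings in the same household.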
Lena Buettner

Join Date: Aug 2022

Posts: 13
#3

02 Sep 2022, 02:16

Hello Clyde Schechter,

Thank you so much. You were right: the outcome variables were not coded correctly (with 1 and 2 instead of 0 and 1). Thank you for that great idea!
Next time I'll provide example data using -dataex-. I didn't know about it because I'm quite new to this forum and to everything that has to do with code. But now I know!

And thank you for the interpretation of the mixed models. Thanks to your explanation, I now understand that I have to use the first of the mixed commands.
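
For anyone who runs into the same error with a 1/2 coding, a minimal sketch of the fix on toy data follows. The data values, the new variable name current01, and the label name yesno are invented for the example, and which of the two codes means "yes" is an assumption here (1 = yes, 2 = no); check the codebook before recoding real data.

```stata
* Toy data, invented for illustration: outcome coded 1/2 instead of 0/1
clear
input hh_idn sex current_yn
1 0 1
1 1 2
2 0 2
2 1 1
3 0 1
3 1 2
3 0 2
end

* Assumption: 1 = yes and 2 = no in the original coding
recode current_yn (1 = 1) (2 = 0), generate(current01)
label define yesno 0 "no" 1 "yes"
label values current01 yesno

* Cross-check the old coding against the new one
tabulate current_yn current01

* With a 0/1 outcome, -logit- no longer stops with r(2000)
logit current01 sex
```

With the original 1/2 coding, Stata treats both values as "positive" because both are non-zero, which is why it reported that the outcome does not vary.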
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#4

02 Sep 2022, 02:18

Lena:
May I recommend that you take a look at the FAQ, so that you can post more effectively in the future? Thanks.

Kind regards,
Carlo
(Stata 19.0)