How to run mlogit for panel data

Benjamin Wang

Join Date: Mar 2019

Posts: 1
#1

How to run mlogit for panel data

01 Mar 2019, 06:23

Dear guys:

I'm a graduate student in China, and my thesis examines the relationship between 1,000 people's salaries and their transportation choices over 10 years.

Obviously these are panel data, and there are 7 transportation choices, so the model is also a multinomial logit (mlogit).

But there is no xtmlogit command in Stata. I have checked the Stata Journal and read Klaus Pforr's "femlogit—Implementation of the multinomial logit model with fixed effects".

However, as a beginner in this field, I found it hard for me to understand the do files and the data- they are too complicated for me

All I need to do is examine how salary increases influence changes in transportation choice. Could someone kindly suggest an easier method? I'd be grateful!

Looking forward to your replies!

Benjamin Wang
Tags: logit, panel data, Time Series
Clyde Schechter

Join Date: Apr 2014

Posts: 29968
#2

01 Mar 2019, 09:45

It doesn't get any easier than using -femlogit-. I can think of a much more complicated way to do it, but not an easier one.

Look: just pretend that -femlogit- were called -xtmlogit-. So -xtset- your data the way you would if there were an -xtmlogit- command. And then run -femlogit- just as if it were -xtmlogit- and you'll have your results.
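As a rough sketch of what that looks like for the question in #1, assuming the data are in long format (one row per person per year): the variable names person_id, year, mode, and salary below are hypothetical, and the install line assumes -femlogit- is available from SSC. Stata 17 and later also ship an official -xtmlogit- command; the sketch sticks to -femlogit- as discussed here.

```stata
* Sketch only: person_id, year, mode, and salary are hypothetical names.
* femlogit is user-written; install it once (assumed available from SSC).
ssc install femlogit

* Declare the panel structure: one row per person per year.
xtset person_id year

* Fixed-effects multinomial logit of transport mode on salary.
* baseoutcome(1) makes mode 1 the reference category.
femlogit mode salary, baseoutcome(1)
```

Each coefficient is then read relative to the base outcome, as with an ordinary -mlogit-.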

Now, if by

However, as a beginner in this field, I found it hard for me to understand the do files and the data- they are too complicated for me

you mean that even if there were an -xtmlogit- command you don't know how to use it, then that's another story. In that case, post back, but also include an example of your data. Be sure to use the -dataex- command to show the example data. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
Gate Nucht

Join Date: Jan 2017

Posts: 20
#3

10 Mar 2021, 16:38

Hi everyone,

I am trying to use the -femlogit- command to run a multilevel multinomial regression analysis. However, most of my DVs are factor variables, so I get the following error message:

- factor variables and time series operators not allowed.

I followed River Huang's advice he offered in a different thread from 2017 ("femlogit with factor variables"), which was this:

tab x1, gen(dx)
femlogit y dx* x2 x3, baseoutcome(0)

I'm not sure what the first line does exactly, but I followed the advice. However, I still get the same error message when I run the -femlogit- line ("factor-variable and time-series operators not allowed").

Here's my code (for the first model that only includes one IV):

tab intrace, gen(intrace_dx)
xtset ID
femlogit race i.intrace_dx, base(1)

I tried adding the "*" after the variable, but it does not run with or without it. I also tried substituting "base(1)" with "baseoutcome(1)" but I still get the same error message.

Any insights would be greatly appreciated.

Thanks!
Hong Il Yoo

Join Date: Jan 2015

Posts: 292
#4

11 Mar 2021, 05:45

Gate Nucht Many user-written packages do not support Stata's factor variable operators such as -i-. Your -tab intrace, gen(intrace_dx)- command line generates a set of dummy indicators, one for each value of -intrace-. I am not sure how many dummies you end up with, but suppose that it's 7. You can try -femlogit race intrace_dx2-intrace_dx7, base(1)- and see if it works. Please note that you should not place -i.- at the beginning of your dummy regressor names.
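If the number of dummies is not known in advance, one way to build the regressor list is sketched below. It uses the variable names from the posts above, treats the first dummy as the reference group (an assumption), and is meant to be run from a do-file so that the local macros persist across lines.

```stata
* Variable names are from the posts above; the number of dummies is not assumed.
* One 0/1 indicator is created per category of intrace.
tabulate intrace, generate(intrace_dx)

* Collect all indicators, then remove the first so it serves as the reference.
unab dummies : intrace_dx*
local base intrace_dx1
local rhs : list dummies - base

* Plain variable names only: no i. prefix on the dummies.
xtset ID
femlogit race `rhs', baseoutcome(1)
```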
Gate Nucht

Join Date: Jan 2017

Posts: 20
#5

11 Mar 2021, 18:34

Thank you!

I tried your suggestions but I cannot make the model run. Stata tells me my dummy variables were "omitted because of no within-group variance". (There is nothing wrong with my variables and they run fine in a regular -mlogit- model.)

I also tried using the following line, which was suggested by Jeff Wooldridge in a different thread regarding a similar issue:

mlogit race i.intrace, cluster(intid) base(1)

However, when I compared the results generated by the line above with the results generated by a regular model -- mlogit race i.intrace1, base(1) -- the coefficients are exactly the same, while the standard errors and p-values are significantly larger when I used the cluster command. I would expect the coefficients to also be different, so I'm not sure if the cluster command is equivalent to using femlogit here.

To test my assumption, I compared the results of (1) a binary logistic regression model with (2) a clustering model with (3) a multilevel binary logistic regression model to see if the coefficients are the same there too. (I simply dichotomized my outcome variable for this.)

Regular binary logistic regression model code:
logit race_dich i.intrace

Clustering model code:
logit race_dich i.intrace, cluster(intid)

Multilevel model code:
xtset intid
xtlogit race_dich i.intrace

Here, too, the regular and the clustering models produce the same coefficients, with the clustering model producing larger standard errors and p-values. The multilevel model produces very different coefficients from the other two models, which supports my suspicion that using the cluster option is not an accurate choice for what I'm trying to do.

Finally, I tried to use the gsem command, as is recommended here. This is my code:

gsem (2.race <- i.intrace RI2[intid])
(3.race <- i.intrace RI3[intid])
(4.race <- i.intrace RI4[intid])
(5.race <- i.intrace RI5[intid])
(6.race <- i.intrace RI6[intid])
(7.race <- i.intrace RI7[intid])
(8.race <- i.intrace RI8[intid]), mlogit

However, Stata gives me the following error message, and I haven't been able to figure out what's wrong with my code: ( is not a valid command name

My last resort would be to run multiple binary logistic regression models instead of one multinomial regression model, or to give it a shot in R, but I was hoping to somehow make my multilevel multinomial regression model run in Stata.

Last edited by Gate Nucht; 11 Mar 2021, 18:58.
Hong Il Yoo

Join Date: Jan 2015

Posts: 292
#6

12 Mar 2021, 00:29

Gate Nucht The -femlogit- model message means what it says: Your regressors do not vary over time within an individual. Just as with fixed effects linear regression models, you can't identify coefficients on such variables. They are differenced away along with fixed effects.

You are wrong in expecting -vce(cluster varname)- to result in different coefficient estimates. You're estimating exactly the same model using exactly the same data and exactly the same non-linear estimator of those coefficients. All you are telling Stata to do is to adjust standard errors for clustering and that's it. As long as your -varname- has no missing value, you're meant to see exactly the same coefficient estimates.
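A small check of this point, sketched on Stata's shipped nlswork example data rather than on the data from this thread (the choice of dataset and regressors is only for illustration):

```stata
* Illustration on Stata's example data, not the poster's data.
webuse nlswork, clear

* Plain logit: keep the coefficient vector and variance matrix.
logit union age grade
matrix b_plain = e(b)
matrix V_plain = e(V)

* Same model, with standard errors adjusted for clustering on idcode.
logit union age grade, vce(cluster idcode)
matrix b_clust = e(b)
matrix V_clust = e(V)

* Relative difference in coefficients: expected to be essentially zero.
display mreldif(b_plain, b_clust)

* Relative difference in variance matrices: expected to be clearly nonzero.
display mreldif(V_plain, V_clust)
```

The point estimates should agree while the variance estimates differ, which is the pattern reported in #5.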

I've never used the -gsem- command before but my interpretation of that error message is that either your command line has a typo or you're using an older version of Stata that does not include the -gsem- command. In either case, the first step I suggest is to type -help gsem-. In case you're using a suitable version of Stata, you'll be able to learn the command's syntax diagram and find out potential errors in your command line. In case you're using an older version of Stata that does not support -gsem-, you'll get an error message which confirms this.
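One possible cause, assuming the command was run from a do-file spread over several lines exactly as posted: Stata treats each line as a separate command unless the lines are joined, so the second line, which begins with "(", would be read as a command of its own. A sketch of the same model with /// line continuation follows. Note that /// works in do-files, not when typed into the Command window, and that a model with seven random intercepts may be slow or may fail to converge.

```stata
* Same model as posted in #5, with /// joining the lines into one command.
* Variable names (race, intrace, intid) are taken from that post.
gsem (2.race <- i.intrace RI2[intid]) ///
     (3.race <- i.intrace RI3[intid]) ///
     (4.race <- i.intrace RI4[intid]) ///
     (5.race <- i.intrace RI5[intid]) ///
     (6.race <- i.intrace RI6[intid]) ///
     (7.race <- i.intrace RI7[intid]) ///
     (8.race <- i.intrace RI8[intid]), mlogit
```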
Gate Nucht

Join Date: Jan 2017

Posts: 20
#7

12 Mar 2021, 07:39

I don't understand why the model runs when I conduct a multilevel binary logistic regression

Code:

xtset ID
xtlogit race_dich intrace2 intrace3

but not when I conduct a multilevel multinomial logistic regression using the same IVs.

Code:

xtset ID
femlogit race intrace2 intrace3, base(1)

Wouldn't the issue that my regressors do not vary over time within an individual also prevent the binary logistic regression model from running?
Hong Il Yoo

Join Date: Jan 2015

Posts: 292
#8

12 Mar 2021, 08:13

Gate Nucht: -xtlogit-, by default, assumes -xtlogit, re- which is a random effects model. As with the linear regression case, the random effects model allows you to identify coefficients on time-invariant regressors.
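The contrast can be seen on Stata's shipped nlswork example data (an illustration only, not the data from this thread), where race does not change within a person:

```stata
* Illustration on Stata's example data, not the poster's data.
webuse nlswork, clear
xtset idcode year

* Random effects (the default for xtlogit): coefficients on the
* time-invariant race indicators are estimated.
xtlogit union age i.race, re

* Conditional fixed effects: race does not vary within idcode, so its
* indicators are expected to be dropped for lack of within-group variance.
xtlogit union age i.race, fe
```

The second command corresponds to what -femlogit- does in the multinomial case, which is why the time-invariant dummies are omitted there.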
Gate Nucht

Join Date: Jan 2017

Posts: 20
#9

12 Mar 2021, 11:06

Okay, so -xtlogit- is by default a random-effects model, but -femlogit- (which is supposed to do what the non-existent command -xtmlogit- would do) does not create random effects models, and that's why my model does not run when I use -femlogit-. Do I understand that correctly?

Basically, I'd like to analyze whether interviewer race (intrace) is predictive of how interviewers (ID) categorize respondents into racial groups (race). All interviewers conduct many interviews throughout the year under study, but the observable interviewer characteristics (e.g. race, gender, hire year) are of course the same across all interview records associated with the same interviewer.

So to examine how interviewer race predicts their likelihood of categorizing interviewees into different racial groups, I need to account for intid (since interviewers may have unobserved characteristics that also affect how they assign racial categories to respondents).

If there were only two racial categories, running a multilevel binary logistic regression would be the right choice, correct?

Code:

xtset ID
xtlogit race_dich intrace2 intrace3

However, since my outcome variable has more than 2 categories and -femlogit- doesn't allow me to run a random effects model, and since gsem also does not run, could I just conduct several multilevel binary logistic regressions using -xtlogit- instead? That is, the first model would compare race category 1 with race category 2, the second model would compare race category 1 with race category 3, etc.

If this question is no longer a fit for this thread I understand and I will seek advice elsewhere.

Thank you!