Creating a loop for estimating probabilities using mlogit

Sam McCaw

Join Date: Apr 2019

Posts: 30
#1


24 Apr 2019, 13:59

Hello,

Since mlogit is not a panel estimator, I want to estimate a cross-sectional model for each of the 10 years, i.e., estimate the model separately for each year.

For each outcome, I would like to estimate the predicted probabilities from each yearly model but store them in one variable.

Can this be done using a loop? I.e., estimate the model for year 1, use predict to get the probabilities (pr_outcome1 pr_outcome2 pr_outcome3), then do the same for year 2, and have the year-2 probabilities appear in the same variables pr_outcome1 pr_outcome2 pr_outcome3?

My model is: mlogit y x1 x2 i.x3, base(0)

Thanks.

SAM
Clyde Schechter

Join Date: Apr 2014

Posts: 30127
#2

24 Apr 2019, 22:14

The code you are asking for is complicated enough that it would not be prudent to throw some code up here without testing it. But you provided no example data to work with, so that isn't possible.

In a general way, yes, you can do this. You could do it with a loop, although using the -runby- command, by Robert Picard and me, available from SSC, is simpler. The -runby- help file contains some examples that are roughly similar to your problem.
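A minimal sketch of what the -runby- route could look like. It assumes -runby- has been installed from SSC (ssc install runby) and uses Stata's auto data as a stand-in: rep78 plays the role of the outcome and foreign the role of year. The program name one_group and the pr_rep* variable names are made up for illustration. -runby- runs the program on one by-group at a time and keeps whatever data the program leaves in memory, so the per-group predictions end up stacked in the same variables.

```stata
* Stand-in data: rep78 ~ outcome, foreign ~ year. Requires: ssc install runby
sysuse auto, clear
keep if !missing(rep78)

capture program drop one_group
program define one_group
    * runs on one by-group at a time; only that group's data are in memory
    mlogit rep78 price, baseoutcome(3)
    * one probability variable per outcome value observed in this group
    levelsof rep78 if e(sample), local(outs)
    foreach o of local outs {
        predict double pr_rep`o' if e(sample), pr outcome(`o')
    }
end

runby one_group, by(foreign)
```

Because the predictions are created per outcome value rather than by position, a group that lacks some outcome simply leaves those variables missing for its observations.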

If you want more concrete help, post back with example data, using the -dataex- command. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.
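When the real data cannot be shared, a fake example with the same layout can be built and passed to -dataex-. A minimal sketch: the variable names indid, year, x1 and outcome are the ones used later in this thread, while the seed, the value ranges and the uniform draws are arbitrary assumptions, so the values mean nothing.

```stata
* Hypothetical fake panel: 10 individuals observed once a year, 2000-2010.
clear
set seed 2019
set obs 10
gen byte indid = _n
expand 11                                  // 11 rows per individual
bysort indid: gen int year = 1999 + _n     // years 2000-2010
gen float x1 = round(20 + 15*runiform(), 0.1)
gen byte outcome = floor(5*runiform())     // random integer codes 0-4
* print a listing that can be pasted into a post
dataex indid year x1 outcome in 1/15
```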
Sam McCaw

Join Date: Apr 2019

Posts: 30
#3

25 Apr 2019, 07:11

Many thanks for your response and suggestions, and apologies for the lack of clarity. Please find an example below:

My current code for years 2000-2010 is as follows:

eststo: mlogit y x1 x2 x3 , base (0)

predict z1 if e(sample), xb outcome(1)
predict z2 if e(sample), xb outcome(2)
predict z3 if e(sample), xb outcome(3)
predict z4 if e(sample), xb outcome(4)

predict pr_outcome0 pr_outcome1 pr_outcome2 pr_outcome3 pr_outcome4

I initially ran this as pooled data and panel data but since mlogit is not a panel estimator I would like to run these by year (2000-2010).

I would like to use a loop to have Stata repeat the above regression for each year and replace the predicted values at each iteration, without having to create separate versions of pr_outcome0 pr_outcome1 pr_outcome2 pr_outcome3 pr_outcome4.

I remember doing something similar a few years ago: I ran a simple regression by industry and year and used a loop to have Stata replace the predicted values for each industry-year combination.

Thanks,

SAM
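The regression-by-cell pattern described in this post can be sketched as follows. This is a minimal sketch using Stata's auto data as a stand-in, with foreign as the grouping variable and regress price mpg as a placeholder model. An industry-year version would first build a single cell identifier, for example egen cell = group(industry year), where industry is hypothetical here.

```stata
* Stand-in data: foreign plays the role of the year/industry cell.
sysuse auto, clear
gen double yhat = .                        // one variable collects all cells
levelsof foreign, local(cells)
foreach c of local cells {
    quietly regress price mpg if foreign == `c'
    quietly predict double tmp if e(sample), xb
    quietly replace yhat = tmp if foreign == `c'   // fill this cell only
    drop tmp
}
```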
Sam McCaw

Join Date: Apr 2019

Posts: 30
#4

25 Apr 2019, 07:13

I cannot upload the data unfortunately as it is licensed and I am unable to share on a public forum. Thanks.
Nick Cox

Join Date: Mar 2014

Posts: 35736
#5

25 Apr 2019, 08:31

#4 OK, but we already address that difficulty in the FAQ Advice.

We can understand your dataset only to the extent that you explain it clearly.

The best way to explain it is to show an example.

[...]

If your dataset is confidential, then provide a fake example instead.

The second best way to explain your situation is to use one of Stata's own datasets and adapt it to your problem. Examples are the auto data and the Grunfeld data (a simple panel dataset). That may be more work for you and you may not find an analog of your problem with such a dataset.
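Along those lines, Stata's auto data already reproduce the problem discussed later in this thread. In this sketch rep78 stands in for the outcome and foreign for the year (an assumed analogy, not the poster's data): foreign cars never have rep78 equal to 1 or 2, so asking predict for outcome 2 after fitting mlogit on foreign cars only should fail.

```stata
* Stand-in data: rep78 ~ outcome, foreign ~ year.
sysuse auto, clear
tab rep78 foreign                 // rep78 = 1 and 2 never occur for foreign cars
quietly mlogit rep78 price if foreign == 1, baseoutcome(3)
* requesting an outcome that this subsample never contains fails
capture noisily predict double p2 if e(sample), pr outcome(2)
```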

Andrew Musau

Join Date: Oct 2014
Posts: 10231
#6

25 Apr 2019, 09:54

This is not tested and is based on #3. As others have remarked, a data example increases your chances of getting a helpful reply.

Code:

forval i=0/4{
gen pr_outcome`i'=.
}

forval y=2000/2010{
mlogit y x1 x2 x3 if year==`y', base(0)
predict z1 if e(sample), xb outcome(1) 
predict z2 if e(sample), xb outcome(2) 
predict z3 if e(sample), xb outcome(3) 
predict z4 if e(sample), xb outcome(4) 
* one probability per outcome (0-4), so five new variables are needed
predict p0 p1 p2 p3 p4
forval i=0/4{
replace pr_outcome`i'=p`i' if year==`y'
}
drop p0 p1 p2 p3 p4 z1 z2 z3 z4
}

Errors can arise if not all categories are observed in every year.
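One way to see in advance which years would cause trouble is to count the distinct outcome values in each group before looping. A minimal sketch, again with the auto data standing in (rep78 for the outcome, foreign for the year); with the real data one would use outcome and year instead. The variable names first and n_out are illustrative.

```stata
* Stand-in data: rep78 ~ outcome, foreign ~ year.
sysuse auto, clear
keep if !missing(rep78)
egen byte first = tag(foreign rep78)      // 1 for one obs per group-outcome pair
egen n_out = total(first), by(foreign)    // distinct outcomes in each group
levelsof rep78, local(all)
local K : word count `all'                 // distinct outcomes overall
tab foreign if n_out < `K'                 // groups missing at least one outcome
```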


Sam McCaw

Join Date: Apr 2019

Posts: 30
#7

25 Apr 2019, 10:11

Thanks, all, for the many helpful comments and suggestions. If the above code does not work with my dataset, I will upload an example dataset.

Best,
SAM

Sam McCaw

Join Date: Apr 2019
Posts: 30
#8

01 May 2019, 09:12

Hi Andrew,

I'm using the following code to estimate the mlogit model and calculate the outcome probabilities for 40 individuals by year. My variables are:

Individual ID: indid
Year: year
X-variable: x1
Y-variable: outcome, with five possible values (0, 1, 2, 3, 4). I am using value 0 as the base outcome.

When I run the following code, I get the error message below, but I am not sure where the mistake is.

Code:

outcome 2 not found

If I run this without a loop, it works just fine. Is it because outcome 2 does not appear in some years? I would be grateful for any suggestions on how to get around this.

Thanks.

SAM

Code:

  

forval i=0/4{
gen pr_outcome`i'=.
}

forval y=2000/2010{
mlogit outcome x1 if year==`y', base (0)
predict z1 if e(sample), xb outcome(1)
predict z2 if e(sample), xb outcome(2)
predict z3 if e(sample), xb outcome(3)
predict z4 if e(sample), xb outcome(4)
predict p1 p2 p3 p4
forval i=0/4{
replace pr_outcome`i'=p`i' if year==`y'
}
drop p1 p2 p3 p4 z1 z2 z3 z4
}

The data example is as follows:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte indid int year float x1 byte outcome
 1 2000   20 0
 1 2001 23.1 0
 1 2002 24.9 0
 1 2003   25 0
 1 2004 25.6 0
 1 2005 25.6 0
 1 2006 25.8 2
 1 2007 25.9 2
 1 2008   26 2
 1 2009 26.1 2
 1 2010 26.4 2
 2 2000 26.4 3
 2 2001 26.4 3
 2 2002 26.6 3
 2 2003 26.8 3
 2 2004   27 3
 2 2005   27 3
 2 2006 27.8 4
 2 2007   28 4
 2 2008   28 4
 2 2009 28.4 4
 2 2010 28.6 4
 3 2000 28.8 0
 3 2001 28.9 0
 3 2002   29 3
 3 2003 29.4 3
 3 2004 29.7 3
 3 2005 29.9 3
 3 2006   30 3
 3 2007   30 3
 3 2008   30 3
 3 2009 30.7 3
 3 2010 30.8 3
 4 2000   31 0
 4 2001   31 0
 4 2002   31 0
 4 2003 31.3 0
 4 2004 31.9 0
 4 2005 32.1 0
 4 2006 32.2 0
 4 2007 32.4 0
 4 2008 32.4 3
 4 2009 32.8 3
 4 2010   33 3
 5 2000 33.2 3
 5 2001 33.5 3
 5 2002 33.6 3
 5 2003 33.9 3
 5 2004 33.9 3
 5 2005   34 3
 5 2006   34 3
 5 2007   34 3
 5 2008   34 4
 5 2009 34.1 4
 5 2010 34.3 4
 6 2000 34.3 3
 6 2001 34.3 3
 6 2002 34.4 3
 6 2003 34.5 3
 6 2004 34.5 4
 6 2005 34.6 4
 6 2006 34.7 4
 6 2007 34.7 4
 6 2008 34.7 4
 6 2009 34.8 4
 6 2010 34.8 4
 7 2000 34.9 0
 7 2001   35 0
 7 2002   35 0
 7 2003   35 1
 7 2004   35 3
 7 2005   35 3
 7 2006   35 3
 7 2007   35 3
 7 2008   35 3
 7 2009   35 3
 7 2010 35.4 3
 8 2000 35.4 0
 8 2001 35.4 0
 8 2002 35.5 0
 8 2003 35.6 0
 8 2004 35.7 3
 8 2005 35.7 3
 8 2006 35.9 3
 8 2007   36 3
 8 2008   36 3
 8 2009   36 3
 8 2010   36 3
 9 2000 36.1 0
 9 2001 36.1 0
 9 2002 36.1 0
 9 2003 36.2 0
 9 2004 36.3 0
 9 2005 36.4 0
 9 2006 36.4 0
 9 2007 36.4 0
 9 2008 36.6 3
 9 2009 36.8 3
 9 2010 36.8 3
10 2000 36.8 0
end

Last edited by Sam McCaw; 01 May 2019, 09:20.


Andrew Musau

Join Date: Oct 2014
Posts: 10231
#9

02 May 2019, 06:30

As explained in #6, the code relies on all categories of the outcome being observed in every year. You should look at your data before analyzing it.

Code:

. 
. tab outcome year

           |                                                     year
   outcome |      2000       2001       2002       2003       2004       2005       2006       2007       2008       2009 |     Total
-----------+--------------------------------------------------------------------------------------------------------------+----------
         0 |         7          6          5          4          3          3          2          2          0          0 |        32 
         1 |         0          0          0          1          0          0          0          0          0          0 |         1 
         2 |         0          0          0          0          0          0          1          1          1          1 |         5 
         3 |         3          3          4          4          5          5          4          4          5          5 |        47 
         4 |         0          0          0          0          1          1          2          2          3          3 |        15 
-----------+--------------------------------------------------------------------------------------------------------------+----------
     Total |        10          9          9          9          9          9          9          9          9          9 |       100 


           |    year
   outcome |      2010 |     Total
-----------+-----------+----------
         0 |         0 |        32 
         1 |         0 |         1 
         2 |         1 |         5 
         3 |         5 |        47 
         4 |         3 |        15 
-----------+-----------+----------
     Total |         9 |       100

The category "outcome=1" is observed only once in the 11-year period. If this is also the case in your full data, it may make sense to merge it with a closely related category or drop it altogether. Second, your base category, "outcome=0", is not present in the last 3 years of the sample (2008-2010), so your mlogit command with base(0) cannot be executed for those years. I do not know why you need to create the z1-z4 variables, since you do not use them. So here is revised code that generates predictions when the set of observed categories differs across years; it loops over 2000-2007 only, the years in which the base category is observed.

Code:

qui sum outcome
forval i=`r(min)'/`r(max)'{
gen pr_outcome`i'=.
}

forval y=2000/2007{
mlogit outcome x1 if year==`y', base (0)
levelsof outcome if year==`y', local(outcomes)
foreach o in `outcomes'{
predict prob`o' if e(sample), pr outcome(`o') 
}
foreach o in `outcomes'{
replace pr_outcome`o'=prob`o' if year==`y'
}
drop prob*
}
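If 2008-2010 are also wanted, one option is to keep the preferred base where it is observed and otherwise let mlogit use its default base (the most frequent outcome). The fitted probabilities from mlogit do not depend on which outcome is the base; only the interpretation of the coefficients does, so this fallback leaves the predictions unchanged. A minimal sketch on the auto data, assuming rep78 for the outcome, foreign for the year, and base 2 in the role of base 0 (foreign cars never have rep78 = 2); the names pr_rep*, opts and tmp are illustrative.

```stata
* Stand-in data: rep78 ~ outcome, foreign ~ year, base 2 ~ base 0.
sysuse auto, clear
keep if !missing(rep78)
levelsof rep78, local(all_outs)
foreach o of local all_outs {
    gen double pr_rep`o' = .               // holders, filled group by group
}
levelsof foreign, local(groups)
foreach g of local groups {
    * keep the preferred base where it occurs; otherwise use mlogit's default
    local opts ""
    quietly count if rep78 == 2 & foreign == `g'
    if r(N) > 0 {
        local opts ", baseoutcome(2)"
    }
    quietly mlogit rep78 price if foreign == `g' `opts'
    quietly levelsof rep78 if e(sample), local(outs)
    foreach o of local outs {
        quietly predict double tmp if e(sample), pr outcome(`o')
        quietly replace pr_rep`o' = tmp if e(sample)
        drop tmp
    }
}
```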


Sam McCaw

Join Date: Apr 2019

Posts: 30
#10

02 May 2019, 08:32

Many thanks, Andrew, once again. I will try the above code.
