How to fit a multinomial logistic regression model with multilevel independent variables

Sorath shah

30 Oct 2023, 05:08

Hello everyone, I have travel data in which each person selects a specific mode of transport (e.g., Car) over the other options (e.g., Bus, Plane, and Train) and pays a certain price for the distance travelled. Each row represents a trip that a person made in their chosen mode.
This data in wide format looks like this:

Mode	Mode_id	Price	Distance
Car	1	4.5	109
Train	2	2.1	34
Bus	3	3.3	14
Plane	4	8	150
Car	1	5	20

Code:

* Define variables and add data
clear
input str15 Mode Mode_id Price Distance
"Car" 1 4.5 109
"Train" 2 2.1 34
"Bus" 3 3.3 14
"Plane" 4 8 150
"Car" 1 5 20
end

* Display the dataset
list

I have joined another data set of socio-economic variables at the district level, and the updated data looks like this:

District	District_id	Mode	Mode_id	Price	Distance	Income
Praha 1	1	Car	1	4.5	109	200
Praha 1	1	Train	2	2.1	34	200
Praha 2	2	Bus	3	3.3	14	300
Praha 1	1	Plane	4	8	150	200
Praha 2	2	Car	1	5	20	300

Code:

* Define the dataset (district names contain a space, so they must be quoted)
clear
input str15 District District_id str10 Mode Mode_id Price Distance Income
"Praha 1" 1 "Car" 1 4.5 109 200
"Praha 1" 1 "Train" 2 2.1 34 200
"Praha 2" 2 "Bus" 3 3.3 14 300
"Praha 1" 1 "Plane" 4 8 150 200
"Praha 2" 2 "Car" 1 5 20 300
end

* Save the dataset
save "Trips_Districts.dta", replace

Now I want to model people's choice of mode using the alternative-specific independent variables "price" and "distance", together with another set of independent variables, "income" and "education", that are measured at the district level, so trips are nested within districts.

The reshaped data in long format looks like this:

District	District_id	Person_id	Mode	Mode_choice	Choice_id	Price	Distance	Income
Praha 1	1	1	Car	Car	1	4.5	109	200
Praha 1	1	1	Train	Car	0			200
Praha 1	1	1	Bus	Car	0			200
Praha 1	1	1	Plane	Car	0			200
Praha 1	1	2	Train	Train	1	2.1	34	200
Praha 1	1	2	Car	Train	0			200
Praha 1	1	2	Bus	Train	0			200
Praha 1	1	2	Plane	Train	0			200
Praha 2	2	3	Bus	Bus	1	3.3	14	300
Praha 2	2	3	Plane	Bus	0			300
Praha 2	2	3	Car	Bus	0			300
Praha 2	2	3	Train	Bus	0			300
Praha 1	1	4	Plane	Plane	1	8	150	200
Praha 1	1	4	Car	Plane	0			200
Praha 1	1	4	Train	Plane	0			200
Praha 1	1	4	Bus	Plane	0			200
Praha 2	2	5	Car	Car	1	5	20	300
Praha 2	2	5	Train	Car	0			300
Praha 2	2	5	Bus	Car	0			300
Praha 2	2	5	Plane	Car	0			300

Code:

* Define variable names and data
* (Price and Distance are missing for the modes that were not chosen)
clear
input str10 District District_id Person_id str10 Mode Choice Price Distance Income
"Praha 1" 1 1 "Car" 1 4.5 109 200
"Praha 1" 1 1 "Train" 0 . . 200
"Praha 1" 1 1 "Bus" 0 . . 200
"Praha 1" 1 1 "Plane" 0 . . 200
"Praha 1" 1 2 "Train" 1 2.1 34 200
"Praha 1" 1 2 "Car" 0 . . 200
"Praha 1" 1 2 "Bus" 0 . . 200
"Praha 1" 1 2 "Plane" 0 . . 200
"Praha 2" 2 3 "Bus" 1 3.3 14 300
"Praha 2" 2 3 "Plane" 0 . . 300
"Praha 2" 2 3 "Car" 0 . . 300
"Praha 2" 2 3 "Train" 0 . . 300
"Praha 1" 1 4 "Plane" 1 8 150 200
"Praha 1" 1 4 "Car" 0 . . 200
"Praha 1" 1 4 "Train" 0 . . 200
"Praha 1" 1 4 "Bus" 0 . . 200
"Praha 2" 2 5 "Car" 1 5 20 300
"Praha 2" 2 5 "Train" 0 . . 300
"Praha 2" 2 5 "Bus" 0 . . 300
"Praha 2" 2 5 "Plane" 0 . . 300
end

* Save the dataset
save "Trips_Districts_Final.dta", replace
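
For reference, one possible way to build this long layout from the trip-level file, rather than typing it in by hand, might look like the sketch below. It assumes Trips_Districts.dta has one row per trip with Mode_id coded 1 to 4 (1 Car, 2 Train, 3 Bus, 4 Plane); the names Person_id, Alt, Choice and altlbl are introduced here only for illustration, and each trip is treated as a separate person because the data carry no person identifier.

```stata
* Sketch only: expand one row per trip into one row per trip-by-alternative
use "Trips_Districts.dta", clear

gen long Person_id = _n                 // assumption: one case per trip
expand 4                                // four alternatives per case
bysort Person_id: gen byte Alt = _n     // alternative index 1..4

label define altlbl 1 "Car" 2 "Train" 3 "Bus" 4 "Plane"
label values Alt altlbl

gen byte Choice = (Alt == Mode_id)      // 1 for the chosen mode, 0 otherwise

* Price and Distance were observed only for the chosen mode
replace Price    = . if Choice == 0
replace Distance = . if Choice == 0

sort Person_id Alt
list Person_id District Alt Choice Price Distance Income, sepby(Person_id)
```

In this sketch the original Mode variable is left unchanged and therefore plays the role of Mode_choice in the table above, while Alt identifies the alternative on each row. The missing Price and Distance values for the non-chosen modes remain, which is exactly the problem described next.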

Problem 1: I do not have price and distance information for modes other than the one chosen.

Any model recommendation, and Stata code to reshape my wide data into long format, would be highly appreciated.

Thank you in advance.

Tags: categorical, discrete choice modeling, fixed effects, logit, multinominal
