Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to fit Multinomial logistic regression model with multilevel independent variables

    Hello everyone, I have travel data consisting of a person selecting a specific mode of transport (e.g., Car) over other options (e.g., Bus, Plane, and Train) and paying a certain price for the distance he travelled. Each row represents a trip that person made in their chosen mode.
    This data in wide format looks like this:
    Mode Mode_id Price Distance
    Car 1 4.5 109
    Train 2 2.1 34
    Bus 3 3.3 14
    Plain 4 8 150
    Car 1 5 20
    Code:
    * Define variables and add data input str15 Mode Mode_id Price Distance Car 1 4.5 109 Train 2 2.1 34 Bus 3 3.3 14 Plain 4 8 150 Car 1 5 20 end * Display the dataset list
    I have joined another data set of socio-economic variables at the district level, and the updated data looks like this:
    District District_id Mode Mode_id Price Distance Income
    Praha 1 1 Car 1 4.5 109 200
    Praha 1 1 Train 2 2.1 34 200
    Praha 2 2 Bus 3 3.3 14 300
    Praha1 1 Plain 4 8 150 200
    Praha 2 2 Car 1 5 20 300
    Code:
    * Define the dataset input str15 District District_id str10 Mode Mode_id Price Distance Income Praha 1 1 "Car" 1 4.5 109 200 Praha 1 1 "Train" 2 2.1 34 200 Praha 2 2 "Bus" 3 3.3 14 300 Praha 1 1 "Plain" 4 8 150 200 Praha 2 2 "Car" 1 5 20 300 end * Save the dataset save "Trips_Districts.dta", replace
    Now I want to predict people's mode of choice preference with alternative specific independent variables "price" and "distance" and with another set of independent variables "income" and "education" that are nested at the first level, which is district.

    The reshaped data in long format looks like this:
    District District_id Person_id Mode Mode_choice Choice_id Price Distance Income
    Praha 1 1 1 Car Car 1 4.5 109 200
    Praha 1 1 1 Train Car 0 200
    Praha 1 1 1 Bus Car 0 200
    Praha 1 1 1 Plane Car 0 200
    Praha 1 1 2 Train Train 1 2.1 34 200
    Praha 1 1 2 Car Train 0 200
    Praha 1 1 2 Bus Train 0 200
    Praha 1 1 2 Plane Train 0 200
    Praha 2 2 3 Bus Bus 1 3.3 14 300
    Praha 2 2 3 Plane Bus 0 300
    Praha 2 2 3 Car Bus 0 300
    Praha 2 2 3 Train Bus 0 300
    Praha1 1 4 Plane Plane 1 8 150 200
    Praha1 1 4 Car Plane 0 200
    Praha1 1 4 Train Plane 0 200
    Praha1 1 4 Bus Plane 0 200
    Praha 2 1 5 Car Car 1 20 21 300
    Praha 2 1 5 Train Car 0 300
    Praha 2 1 5 Bus Car 0 300
    Praha 2 1 5 Plane Car 0 300
    Code:
    * Define variable names and data input str10 District str10 District_id Person_id str10 Mode Choice Price Distance Income "Praha 1" 1 1 "Car" 1 4.5 109 200 "Praha 1" 1 1 "Train" 0 . 200 "Praha 1" 1 1 "Bus" 0 . 200 "Praha 1" 1 1 "Plane" 0 . 200 "Praha 1" 1 2 "Train" 1 2.1 34 200 "Praha 1" 1 2 "Car" 0 . 200 "Praha 1" 1 2 "Bus" 0 . 200 "Praha 1" 1 2 "Plane" 0 . 200 "Praha 2" 2 3 "Bus" 1 3.3 14 300 "Praha 2" 2 3 "Plane" 0 . 300 "Praha 2" 2 3 "Car" 0 . 300 "Praha 2" 2 3 "Train" 0 . 300 "Praha1" 1 4 "Plane" 1 8 150 200 "Praha1" 1 4 "Car" 0 . 200 "Praha1" 1 4 "Train" 0 . 200 "Praha1" 1 4 "Bus" 0 . 200 "Praha 2" 1 5 "Car" 1 20 21 300 "Praha 2" 1 5 "Train" 0 . 300 "Praha 2" 1 5 "Bus" 0 . 300 "Praha 2" 1 5 "Plane" 0 . 300 end * Save the dataset save "Trips_Districts_Final.dta", replace
    Problem1 : I don't have price and distance information for modes other than the one chosen.

    Any model recommendation and data code to reshape my wide data into a long format in stata will be highly appreciated.

    ​​​​​​​Thank you in advance.
Working...
X