Hello everyone, I have travel data consisting of a person selecting a specific mode of transport (e.g., Car) over other options (e.g., Bus, Plane, and Train) and paying a certain price for the distance he travelled. Each row represents a trip that person made in their chosen mode.
This data in wide format looks like this:
I have joined another data set of socio-economic variables at the district level, and the updated data looks like this:
Now I want to predict people's mode of choice preference with alternative specific independent variables "price" and "distance" and with another set of independent variables "income" and "education" that are nested at the first level, which is district.
The reshaped data in long format looks like this:
Problem1 : I don't have price and distance information for modes other than the one chosen.
Any model recommendation and data code to reshape my wide data into a long format in stata will be highly appreciated.
Thank you in advance.
This data in wide format looks like this:
Mode | Mode_id | Price | Distance |
Car | 1 | 4.5 | 109 |
Train | 2 | 2.1 | 34 |
Bus | 3 | 3.3 | 14 |
Plain | 4 | 8 | 150 |
Car | 1 | 5 | 20 |
Code:
* Define variables and add data input str15 Mode Mode_id Price Distance Car 1 4.5 109 Train 2 2.1 34 Bus 3 3.3 14 Plain 4 8 150 Car 1 5 20 end * Display the dataset list
District | District_id | Mode | Mode_id | Price | Distance | Income |
Praha 1 | 1 | Car | 1 | 4.5 | 109 | 200 |
Praha 1 | 1 | Train | 2 | 2.1 | 34 | 200 |
Praha 2 | 2 | Bus | 3 | 3.3 | 14 | 300 |
Praha1 | 1 | Plain | 4 | 8 | 150 | 200 |
Praha 2 | 2 | Car | 1 | 5 | 20 | 300 |
Code:
* Define the dataset input str15 District District_id str10 Mode Mode_id Price Distance Income Praha 1 1 "Car" 1 4.5 109 200 Praha 1 1 "Train" 2 2.1 34 200 Praha 2 2 "Bus" 3 3.3 14 300 Praha 1 1 "Plain" 4 8 150 200 Praha 2 2 "Car" 1 5 20 300 end * Save the dataset save "Trips_Districts.dta", replace
The reshaped data in long format looks like this:
District | District_id | Person_id | Mode | Mode_choice | Choice_id | Price | Distance | Income |
Praha 1 | 1 | 1 | Car | Car | 1 | 4.5 | 109 | 200 |
Praha 1 | 1 | 1 | Train | Car | 0 | 200 | ||
Praha 1 | 1 | 1 | Bus | Car | 0 | 200 | ||
Praha 1 | 1 | 1 | Plane | Car | 0 | 200 | ||
Praha 1 | 1 | 2 | Train | Train | 1 | 2.1 | 34 | 200 |
Praha 1 | 1 | 2 | Car | Train | 0 | 200 | ||
Praha 1 | 1 | 2 | Bus | Train | 0 | 200 | ||
Praha 1 | 1 | 2 | Plane | Train | 0 | 200 | ||
Praha 2 | 2 | 3 | Bus | Bus | 1 | 3.3 | 14 | 300 |
Praha 2 | 2 | 3 | Plane | Bus | 0 | 300 | ||
Praha 2 | 2 | 3 | Car | Bus | 0 | 300 | ||
Praha 2 | 2 | 3 | Train | Bus | 0 | 300 | ||
Praha1 | 1 | 4 | Plane | Plane | 1 | 8 | 150 | 200 |
Praha1 | 1 | 4 | Car | Plane | 0 | 200 | ||
Praha1 | 1 | 4 | Train | Plane | 0 | 200 | ||
Praha1 | 1 | 4 | Bus | Plane | 0 | 200 | ||
Praha 2 | 1 | 5 | Car | Car | 1 | 20 | 21 | 300 |
Praha 2 | 1 | 5 | Train | Car | 0 | 300 | ||
Praha 2 | 1 | 5 | Bus | Car | 0 | 300 | ||
Praha 2 | 1 | 5 | Plane | Car | 0 | 300 |
Code:
* Define variable names and data input str10 District str10 District_id Person_id str10 Mode Choice Price Distance Income "Praha 1" 1 1 "Car" 1 4.5 109 200 "Praha 1" 1 1 "Train" 0 . 200 "Praha 1" 1 1 "Bus" 0 . 200 "Praha 1" 1 1 "Plane" 0 . 200 "Praha 1" 1 2 "Train" 1 2.1 34 200 "Praha 1" 1 2 "Car" 0 . 200 "Praha 1" 1 2 "Bus" 0 . 200 "Praha 1" 1 2 "Plane" 0 . 200 "Praha 2" 2 3 "Bus" 1 3.3 14 300 "Praha 2" 2 3 "Plane" 0 . 300 "Praha 2" 2 3 "Car" 0 . 300 "Praha 2" 2 3 "Train" 0 . 300 "Praha1" 1 4 "Plane" 1 8 150 200 "Praha1" 1 4 "Car" 0 . 200 "Praha1" 1 4 "Train" 0 . 200 "Praha1" 1 4 "Bus" 0 . 200 "Praha 2" 1 5 "Car" 1 20 21 300 "Praha 2" 1 5 "Train" 0 . 300 "Praha 2" 1 5 "Bus" 0 . 300 "Praha 2" 1 5 "Plane" 0 . 300 end * Save the dataset save "Trips_Districts_Final.dta", replace
Any model recommendation and data code to reshape my wide data into a long format in stata will be highly appreciated.
Thank you in advance.