Fixed effects model, problem when merging datasets

Jo Lidman

Join Date: Mar 2023

Posts: 13
#1

12 Apr 2023, 04:08

Hi,

I have two datasets that I have merged. Dataset 1 contains personal data, including birth city and the year and month of birth. Dataset 2 contains air pollution levels by month, year, and city. I merged the two datasets on month and year.

The issue is that dataset 2 has month and year as columns and then one separate column per city (the values in each city column are the air pollution levels for that city in that year and month).
In dataset 1, however, birthcity is a single column with the city names as values.

I tried to create another variable with the code below.
I created one variable for the pollution per city by grouping the cities, which essentially reproduces the birthcity column. Then I replaced each city's value with the air pollution level for that month and year.

Code:

egen pollution_city_value = group(birthcity)
replace pollution_city_value = city1 if birthcity == 1
replace pollution_city_value = city2 if birthcity == 2
replace pollution_city_value = city3 if birthcity == 3
replace pollution_city_value = city4 if birthcity == 4

However, the code seems wrong. Does anyone know a simpler way to approach this problem? In the end, I would like to create a dummy for each city to explore the within-city variation in pollution.

Kind regards.
Andrew Musau

Join Date: Oct 2014

Posts: 10188
#2

12 Apr 2023, 06:00

Originally posted by Jo Lidman

The issue is that dataset 2 has month and year as columns and then one separate column per city (the values in each city column are the air pollution levels for that city in that year and month).

Provide a data example of this using the dataex command.

Jo Lidman

Join Date: Mar 2023
Posts: 13
#3

13 Apr 2023, 04:14

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(city_value1 city_value2) float pollution_city_value long pid
0 0 0.45 1
1 0 0.43 2
0 1 0.55 3
end


Jo Lidman

Join Date: Mar 2023

Posts: 13
#4

13 Apr 2023, 04:16

The code in my first post should look like this:

Code:

egen pollution_city_value = group(birthcity)
tab pollution_city_value, gen(city_value)
replace pollution_city_value = city1 if birthcity == 1
replace pollution_city_value = city2 if birthcity == 2
replace pollution_city_value = city3 if birthcity == 3
replace pollution_city_value = city4 if birthcity == 4

Andrew Musau

Join Date: Oct 2014
Posts: 10188
#5

13 Apr 2023, 05:53

It appears that you have indicators city_value1, city_value2, and so on. Because you do not show the time variables, you will need to adapt what is shown here. Remember, the solutions you get are only as good (or as limited) as your data example.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(city_value1 city_value2) float pollution_city_value long pid
0 0 0.45 1
1 0 0.43 2
0 1 0.55 3
end

reshape long city_value, i(pid) j(which)
keep if city_value
drop pid city_value

Res.:

Code:

. l

     +------------------+
     | which   pollut~e |
     |------------------|
  1. |     1        .43 |
  2. |     2        .55 |
     +------------------+
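
For the problem as first described (pollution stored with one column per city, birthcity stored as a single column), the same reshape idea could be applied to the pollution file before merging. This is only a sketch under assumptions: the variable names year, month, and pollution, the numeric coding of birthcity as 1, 2, ... matching the suffixes of city1, city2, ..., and all data values are invented for illustration, since the thread never shows the time variables or the real data.

```stata
* Hypothetical wide pollution data: one column per city (city1, city2)
clear
input int year byte month float(city1 city2)
2000 1 0.45 0.52
2000 2 0.43 0.55
end

* One row per year-month-city; the numeric suffix of city# goes into birthcity
reshape long city, i(year month) j(birthcity)
rename city pollution
tempfile pollution_long
save `pollution_long'

* Hypothetical personal data; birthcity is assumed to be coded 1, 2, ...
clear
input long pid int year byte month byte birthcity
1 2000 1 1
2 2000 1 2
3 2000 2 2
end

* Several persons can share a birth city and month, hence m:1
merge m:1 year month birthcity using `pollution_long', keep(master match) nogenerate

list, noobs
```

Merging on year, month, and birthcity together attaches each person's own city's pollution level directly, so the chain of replace statements is not needed. If city dummies are wanted in a later regression, factor-variable notation (i.birthcity) supplies them without generating the indicators by hand.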
