
  • Fixed effects model, problem when merging datasets

    Hi,

    I have two datasets that I have merged. Dataset 1 contains personal data, including each person's birth city and birth year and month. Dataset 2 contains air pollution levels by city, year and month. I merged the two datasets on month and year.

    The issue is that in dataset 2, month and year are columns and each city has its own column (the values in each city column are that city's air pollution levels for the given year and month).
    In dataset 1, by contrast, birthcity is a single column whose values are the city names.
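A minimal sketch of one common way to line up these two layouts, assuming hypothetical variable names (year, month, city1 to city3 in dataset 2; pid, cityid, year, month in dataset 1) and made-up values: reshape dataset 2 long so that each row is one city-year-month, then merge m:1 on city, year and month rather than on year and month alone. It also assumes the birth city has already been coded 1, 2, 3 in the same order as the city columns; with string city names a mapping to those numbers is needed first.

```stata
* Hypothetical dataset 2: one row per year-month, one pollution column per city
clear
input int year byte month float(city1 city2 city3)
2000 1 0.45 0.43 0.55
2000 2 0.40 0.47 0.52
end

* Reshape to long: one row per city-year-month; j() stores the city number
reshape long city, i(year month) j(cityid)
rename city pollution
tempfile pollution
save `pollution'

* Hypothetical dataset 1: one row per person, birth city already coded 1-3
clear
input long pid byte cityid int year byte month
1 1 2000 1
2 3 2000 2
3 2 2000 1
end

* Each person matches exactly one city-year-month row of the long pollution data
merge m:1 cityid year month using `pollution', keep(match master) nogenerate
sort pid
list
```

With the data in this long form, city dummies need not be built by hand: factor-variable notation such as i.cityid adds them in a regression.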

    I tried the following code to create a new variable. I first grouped the cities into one variable, essentially a numeric copy of birthcity, and then replaced each city code with that city's air pollution level for the relevant month and year:

    Code:
    egen pollution_city_value = group(birthcity)
    
    replace pollution_city_value = city1 if birthcity== 1
    replace pollution_city_value = city2 if birthcity == 2
    replace pollution_city_value = city3 if birthcity == 3
    replace pollution_city_value = city4 if birthcity == 4

    However, the code does not seem to work. Does anyone know a simpler way to approach this? In the end I would like to create a dummy for each city to explore the within-city variation in pollution.

    Kind regards.
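If birthcity holds city names as strings, a condition such as birthcity == 1 is a type mismatch in Stata, so the replace lines above should test a numeric city code instead. A minimal loop-based sketch, assuming hypothetical names and values, and assuming that column cityk holds the pollution of the k-th city in alphabetical order (the order in which egen group() numbers string values):

```stata
* Hypothetical merged data: birth city (string) plus one pollution column per city
clear
input long pid str10 birthcity float(city1 city2 city3)
1 "Alpha" 0.45 0.43 0.55
2 "Gamma" 0.40 0.47 0.52
3 "Beta"  0.45 0.43 0.55
end

* Numeric city code; egen group() numbers the cities in sorted (alphabetical) order
egen cityid = group(birthcity)

* For each person, copy the pollution column that belongs to their own city
generate float pollution = .
forvalues k = 1/3 {
    replace pollution = city`k' if cityid == `k'
}

* One 0/1 indicator per city (citydum1, citydum2, citydum3)
tabulate cityid, generate(citydum)
```

The loop bound 1/3 matches the three hypothetical city columns and would need to match the actual number of cities.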

  • #2
    Originally posted by Jo Lidman
    The issue is that the dataset nr 2 includes the month and year variables as columns and then each city as a separate column (where the values for each city-column are the air pollution levels for the city that year and month).

    Provide a data example of this using the dataex command.



    • #3


      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte(city_value1 city_value2) float pollution_city_value long pid
      0 0 0.45 1
      1 0 0.43 2
      0 1 0.55 3
      end



      • #4
        The code should look like this:



        Code:
        egen pollution_city_value = group(birthcity)
        tab pollution_city_value, gen(city_value)
        
        
        replace pollution_city_value = city1 if birthcity== 1
        replace pollution_city_value = city2 if birthcity == 2
        replace pollution_city_value = city3 if birthcity == 3
        replace pollution_city_value = city4 if birthcity == 4



        • #5
          It appears that you have indicators city_value1, city_value2, and so on. As you do not present the time variables, you will need to modify what is shown here. Remember, the solutions you get are only as good as your data example.

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input byte(city_value1 city_value2) float pollution_city_value long pid
          0 0 0.45 1
          1 0 0.43 2
          0 1 0.55 3
          end
          
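          * one row per person and indicator; j() stores the indicator number in which
          reshape long city_value, i(pid) j(which)
          * keep only the row whose indicator is 1 (persons with all-zero indicators drop out)
          keep if city_value
          drop pid city_value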
          Result:

          Code:
          . l
          
               +------------------+
               | which   pollut~e |
               |------------------|
            1. |     1        .43 |
            2. |     2        .55 |
               +------------------+
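If only the city number is needed, the indicators can also be combined into one variable without reshaping. A sketch on the data example from #3; the loop bound 1/2 matches its two indicators and is an assumption to adjust for the real number of cities. Unlike the reshape above, this keeps persons whose indicators are all 0 (pid 1 here), with a missing city number, instead of dropping them.

```stata
* Data example from #3
clear
input byte(city_value1 city_value2) float pollution_city_value long pid
0 0 0.45 1
1 0 0.43 2
0 1 0.55 3
end

* City number = index of the indicator that equals 1; missing if none does
generate byte which = .
forvalues k = 1/2 {
    replace which = `k' if city_value`k' == 1
}
list pid which pollution_city_value
```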
