Dear members of the Statalist forum,
I have the following problem. Among other things, I am working with temperature data from weather stations that I have matched to the corresponding electoral districts (German federal election 2021) via their geo-coordinates. The weather data are daily (simplified here to two days: temp_a22312 and temp_a22313). For districts that contain no station (station==.) or whose stations have missing values (districts_to_check==1), I would like to fill in the average temperature of the three nearest weather stations.
However, when I run the code below, exactly the same temperature values are computed and filled in for every district with missing data. I suspect the condition is formulated incorrectly, but after several attempts I have not found a solution. I would be very grateful for any help. The data look like this. For example, no weather station is located in districts 18, 19, 20, ...
use weather_data.dta, clear
tab district if station==. // identify districts without station
egen group_district = group(district)
egen n_miss_tmk = rowmiss(temp_a*) // number of missing temp* values in each observation
egen n_data_tmk = rownonmiss(temp_a*) // number of non-missing temp* values in each observation
egen has_missing_tmk = max(n_miss_tmk > 0), by(group_district) // Identify districts with stations having missing values on temp* variables
egen has_data_tmk = max(n_data_tmk > 0), by(group_district) // Identify districts with at least one station having data on temp* variables
gen districts_to_check = has_missing_tmk & !has_data_tmk & station !=. //Identify districts that meet the criteria
list district if districts_to_check
local date_list 22312 22313
levelsof district, local(district_list)
foreach district of local district_list {
    * copy this district's three nearest station IDs (missing in all other districts)
    gen closest_station1_`district' = closest_station1 if district == `district'
    gen closest_station2_`district' = closest_station2 if district == `district'
    gen closest_station3_`district' = closest_station3 if district == `district'
}
foreach district in `district_list' {
    foreach date in `date_list' {
        * average over observations whose station is one of this district's nearest stations
        egen temp_avg`district'`date' = mean(temp_a`date') if (station == closest_station1_`district' | station == closest_station2_`district' | station == closest_station3_`district')
        * fill gaps in districts without a station or flagged by districts_to_check
        replace temp_a`date' = temp_avg`district'`date' if missing(temp_a`date') & (station == . | districts_to_check == 1)
    }
}
I suspect the error lies in the last two commands (the egen and replace lines): written this way, they cannot compute averages for districts that have no station assigned. As I said, though, I have not found a solution.
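One possible way around this, offered as a minimal sketch rather than a tested solution: the temperatures of the three nearest stations sit on other rows of the dataset, so a within-row comparison such as station == closest_station1_`district' (a variable that is missing outside district `district') cannot reach them. The sketch instead builds a station-level lookup table and merges it in once per neighbour rank. It assumes that each observation is a station within a district, that station uniquely identifies the observations that have a station (isid stops with an error otherwise), and that closest_station1 to closest_station3 hold IDs coded the same way as station. The variable names n1_*, n2_*, n3_* and temp_near* are hypothetical. It is meant to run in place of the two loops, after districts_to_check has been created.

```stata
* Sketch (assumptions as stated above): look up the nearest stations' temps by ID.
* Hypothetical new variables: n1_*, n2_*, n3_* (neighbour temps), temp_near* (mean).

* Build a station-level lookup table: one row per station with its daily temps
preserve
keep station temp_a*
drop if missing(station)
isid station                        // stops with an error if a station repeats
tempfile stationtemps
save `stationtemps'

* One renamed copy per neighbour rank, so each merge adds distinct variables
forvalues k = 1/3 {
    use `stationtemps', clear
    rename station closest_station`k'
    rename temp_a* n`k'_*            // e.g. temp_a22312 -> n1_22312
    tempfile near`k'
    save `near`k''
}
restore

* Attach the temperatures of nearest stations 1, 2 and 3 to every observation
forvalues k = 1/3 {
    merge m:1 closest_station`k' using `near`k'', keep(master match) nogenerate
}

* Average the neighbours (rowmean ignores missing) and fill only the targeted gaps
foreach date in 22312 22313 {
    egen temp_near`date' = rowmean(n1_`date' n2_`date' n3_`date')
    replace temp_a`date' = temp_near`date' if missing(temp_a`date') & (missing(station) | districts_to_check == 1)
}
```

Because rowmean() skips missing values, a neighbour station without data on a given day is simply left out of that day's average; if all three neighbours are missing, the gap stays missing. The lookup table is saved before any replace, so filled-in values are never themselves averaged into another district.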
Best regards and thanks in advance,
Jessica