How to loop using date

Meng JI

Join Date: May 2021

Posts: 77
#1

24 Sep 2021, 17:35

Hi everyone,

I have a dataset like this,

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str1 district str9 dates float(district_lat district_lon) byte population
"a" "2021/10/1" 22.248306 114.15244 10
"a" "2021/10/2" 22.248306 114.15244 20
"a" "2021/10/3" 22.248306 114.15244 30
"a" "2021/10/4" 22.248306 114.15244 40
"b" "2021/10/1" 22.279636  114.1655 10
"b" "2021/10/2" 22.279636  114.1655 20
"b" "2021/10/3" 22.279636  114.1655 30
"b" "2021/10/4" 22.279636  114.1655 40
"b" "2021/10/5" 22.279636  114.1655 50
"c" "2021/10/1"  22.24186 114.15285 10
"c" "2021/10/2"  22.24186 114.15285 20
"d" "2021/10/1"  22.28582  114.1998 30
"e" "2021/10/1"  22.28598  114.1915 20
"e" "2021/10/2"  22.28598  114.1915 30
"f" "2021/10/1"  22.27999  114.1588 60
"g" "2021/10/2"  22.26781 114.23608 50
"h" "2021/10/2" 22.335806  114.1495 40
"i" "2021/10/4"  22.33608 114.20561 30
end

I have several locations, and I want to calculate a daily distance-weighted population density for each location i.
First, I calculate the distance between i and each location j and store it as d_ij.
Second, I check whether d_ij is smaller than 1 km.
Third, if d_ij is smaller than 1 km, I multiply the inverse of the distance by the population of location j: density = (1/d_ij) * population_j.
Fourth, for location i, I sum all the densities generated.
Fifth, the population of each location changes every day, so I need to repeat the calculation for location i on each day.
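
Written as a formula, the five steps above amount to roughly the following. The symbols are shorthand introduced here, not taken from the post: p_{j,t} is the population of location j on day t, d_{ij} is the distance in km between locations i and j, and D_{i,t} is the resulting weighted density. The post does not say whether location i itself counts among the j, so the sum leaves that open.

```latex
D_{i,t} \;=\; \sum_{j \,:\, d_{ij} < 1} \frac{p_{j,t}}{d_{ij}}
```

The sum runs only over locations that have an observation on day t.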

My current code looks like this:

forval i = 1/`=_N' {
    local olat = latitude[`i']
    local olong = longitude[`i']
    local pop = population[`i']
    geodist latitude longitude `olat' `olong', gen(dist`i')
    replace dist`i' = . if dist`i' > 1
    replace dist`i' = 0.1 if dist`i' < 0.1
    replace dist`i' = 1/dist`i'*`pop' if dist`i' < 1
}

But the loop doesn't give me exactly what I want, and I can't work out how to bring the date into it. Do you know how to add the date element to the loop?

Thank you very much for your time!
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#2

24 Sep 2021, 18:50

I would take a different approach altogether. Looping over observation number is rarely a good idea in Stata. There are usually, but not always, better approaches. In this case, it becomes much simpler with a different data structure: create a new data set that contains all relevant pairs of observations. Then you can use simple Stata constructs to get what you want.

Code:

isid district dates

// CREATE A DATASET OF DISTRICT PAIRS MATCHED ON DATE
preserve
rename (district* population) =_2
tempfile holding
save `holding'
restore
joinby dates using `holding'

// CALCULATE ALL DISTANCES
geodist district_lat district_lon district_lat_2 district_lon_2, gen(distance)

// PERFORM THE REQUESTED DENSITY CALCULATIONS
drop if distance > 1
replace distance = max(distance, 0.1)
gen density = population_2/distance
collapse (sum) density, by(district dates)
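
To see what the pairing step produces, here is a minimal sketch on a made-up three-row dataset (the values are hypothetical and the coordinates are left out, so geodist is not needed). It reuses only the preserve/rename/joinby pattern from the code above: within each date, every observation is matched with every observation on that same date, itself included.

```stata
* Toy data (hypothetical values): two districts on day 1, one on day 2
clear
input str1 district str9 dates byte population
"a" "2021/10/1" 10
"b" "2021/10/1" 20
"a" "2021/10/2" 30
end

* Save a copy with suffixed names, then pair observations within each date
preserve
rename (district population) =_2
tempfile holding
save `holding'
restore
joinby dates using `holding'

* Day 1 yields the pairs a-a, a-b, b-a, b-b; day 2 yields only a-a
sort dates district district_2
list, sepby(dates)
```

Because each district is also paired with itself, the self-pair has distance 0 and is handled in the full code by the max(distance, 0.1) line.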

That said, I don't think this approach to estimate population density is what would normally be used by demographers or geographers. You might want to consult somebody in one of those disciplines about whether this approach is useful.

Added: While it is not necessary for this particular task, you probably should convert that string date variable to a Stata internal format date variable if you plan to make any real use of it.
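
A minimal sketch of that conversion, assuming the strings are always in year/month/day order as in the example data. The new variable name date is a placeholder chosen here, not something from the thread.

```stata
* Toy strings in the same layout as the dates variable in the example data
clear
input str10 dates
"2021/10/1"
"2021/10/2"
"2021/12/31"
end

* daily() turns the string into a Stata daily date (days since 01jan1960)
gen date = daily(dates, "YMD")
format date %td

* Any string that failed to parse would show up as missing here
assert !missing(date)
list
```

With a numeric date, sorting and date arithmetic behave as expected, which the string version does not guarantee.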

Last edited by Clyde Schechter; 24 Sep 2021, 18:52.
Meng JI

Join Date: May 2021

Posts: 77
#3

24 Sep 2021, 19:00

Hi Clyde,

Thank you so much for your help. The code is very neat and solves my problem perfectly.

I'll think more about the suggestion that you gave.

Thanks a lot!
Meng JI

Join Date: May 2021

Posts: 77
#4

24 Sep 2021, 20:37

Hi Clyde,

I have a follow-up question on this code.

The code works very well on my test sample. However, when I run it on the entire dataset, the isid command gives an error:

r(459) variables estate_name_en order_date should never be missing

Here estate_name_en corresponds to district in the earlier sample, and order_date corresponds to dates.

I could not figure out why this happened, given that the data structure is the same. It would be great if you have any idea about this error.

Thank you!
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#5

25 Sep 2021, 11:16

The message means exactly what it says. The key variables organizing this analysis are district and date (estate_name_en and order_date). If either one is missing, you have an observation that cannot be used. Apparently there are such observations in your data--and they cannot undergo the analysis you are looking for. You can find them with:

Code:

browse if missing(estate_name_en, order_date)

Then you should figure out why they are there. The problem might be fixable by supplying the correct values for the missing variables. Or their presence might be indicative of an error in the data management that led up to the creation of your file--you should review how this file was created and fix any errors you find. Or perhaps these are records that are actually correct in so far as they go, but the missing values of those variables cannot be found. In that case, they need to be -drop-ped from the data set before doing these calculations because they cannot be included in them.
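
For the last case, a minimal sketch of counting and then dropping such records. The four rows are made up for illustration; only the variable names estate_name_en and order_date come from the thread, and order_date is assumed here to be a string, as dates was in the example data.

```stata
* Toy data (hypothetical values) with one missing name and one missing date
clear
input str5 estate_name_en str9 order_date
"north" "2021/10/1"
""      "2021/10/2"
"south" ""
"south" "2021/10/2"
end

* How many observations lack a key variable?
count if missing(estate_name_en, order_date)

* Drop them only after confirming the values really cannot be recovered
drop if missing(estate_name_en, order_date)

* isid now runs without the r(459) error
isid estate_name_en order_date
```

Note that missing() accepts both string and numeric arguments, so the same line works if order_date has already been converted to a numeric date.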
Meng JI

Join Date: May 2021

Posts: 77
#6

27 Sep 2021, 09:43

Hi Clyde,

I see. I'll look at the dataset to see whether I missed anything and how to fix it.

Thank you so much for your reply.