  • How to loop using date

    Hi everyone,

    I have a dataset like this,

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str1 district str9 dates float(district_lat district_lon) byte population
    "a" "2021/10/1" 22.248306 114.15244 10
    "a" "2021/10/2" 22.248306 114.15244 20
    "a" "2021/10/3" 22.248306 114.15244 30
    "a" "2021/10/4" 22.248306 114.15244 40
    "b" "2021/10/1" 22.279636  114.1655 10
    "b" "2021/10/2" 22.279636  114.1655 20
    "b" "2021/10/3" 22.279636  114.1655 30
    "b" "2021/10/4" 22.279636  114.1655 40
    "b" "2021/10/5" 22.279636  114.1655 50
    "c" "2021/10/1"  22.24186 114.15285 10
    "c" "2021/10/2"  22.24186 114.15285 20
    "d" "2021/10/1"  22.28582  114.1998 30
    "e" "2021/10/1"  22.28598  114.1915 20
    "e" "2021/10/2"  22.28598  114.1915 30
    "f" "2021/10/1"  22.27999  114.1588 60
    "g" "2021/10/2"  22.26781 114.23608 50
    "h" "2021/10/2" 22.335806  114.1495 40
    "i" "2021/10/4"  22.33608 114.20561 30
    end
    I have several locations, and I want to calculate the daily weighted population density for each location i.
    First, I calculate the distance between i and each other location j, and generate d_ij.
    Second, I check whether d_ij is smaller than 1 km.
    Third, if d_ij is smaller than 1 km, I multiply the inverse of the distance by the population in location j: density = (1/d_ij) * population_j.
    Fourth, for location i, I sum all the densities generated.
    Fifth, the population of a location changes each day, so I need to repeat the calculation for location i on each day.

    I have code that looks like this:
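    * Loop over observations; observation i is the origin point
    forval i = 1/`=_N' {
    * Coordinates and population of the origin (names as in the example data)
    local olat = district_lat[`i']
    local olong = district_lon[`i']
    local pop = population[`i']
    * Distance (km) from every observation to the origin
    geodist district_lat district_lon `olat' `olong', gen(dist`i')
    * Ignore anything farther than 1 km; floor short distances at 0.1 km
    replace dist`i'=. if dist`i'>1
    replace dist`i'=0.1 if dist`i'<0.1
    * Inverse distance times the origin's population
    replace dist`i'=1/dist`i'*`pop' if dist`i'<1
    }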
    However, the loop doesn't give me exactly what I want, and I can't work the date element into it. Do you know how to add the date element to the loop?

    Thank you very much for your time!

  • #2
    I would take a different approach altogether. Looping over observation number is rarely a good idea in Stata. There are usually, but not always, better approaches. In this case, it becomes much simpler with a different data structure: create a new data set that contains all relevant pairs of observations. Then you can use simple Stata constructs to get what you want.

    Code:
    isid district dates
    
    // CREATE A DATASET OF DISTRICT PAIRS MATCHED ON DATE
    preserve
    rename (district* population) =_2
    tempfile holding
    save `holding'
    restore
    joinby dates using `holding'
    
    // CALCULATE ALL DISTANCES
    geodist district_lat district_lon district_lat_2 district_lon_2, gen(distance)
    
    // PERFORM THE REQUESTED DENSITY CALCULATIONS
    drop if distance > 1
    replace distance = max(distance, 0.1)
    gen density = population_2/distance
    collapse (sum) density, by(district dates)
    That said, I don't think this approach to estimating population density is what demographers or geographers would normally use. You might want to consult somebody in one of those disciplines about whether this approach is useful.
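
    One detail worth checking in the code above: joinby also pairs each district with itself, so a district's own population enters its density at the 0.1 km floor (the original loop does the same, since the distance from the origin to itself is 0). Whether that is wanted is not stated in the thread. The sketch below is only an illustration, assuming self-pairs should contribute nothing; the variable self is a hypothetical name, and the two rows are taken from the example data (districts a and c, which are well under 1 km apart).

    ```stata
    * Sketch only: two nearby districts on one date, taken from the example data
    clear
    input str1 district str9 dates float(district_lat district_lon) byte population
    "a" "2021/10/1" 22.248306 114.15244 10
    "c" "2021/10/1" 22.24186 114.15285 10
    end

    * Pair every district with every district observed on the same date
    preserve
    rename (district* population) =_2
    tempfile holding
    save `holding'
    restore
    joinby dates using `holding'

    * geodist is community-contributed: ssc install geodist
    geodist district_lat district_lon district_lat_2 district_lon_2, gen(distance)

    * Flag self-pairs (a district matched with itself, distance 0)
    * "self" is a hypothetical variable name, not part of the original code
    gen byte self = district == district_2
    list district district_2 distance self, sepby(district)

    * Assumption: neighbours only, so self-pairs add 0 instead of population/0.1
    drop if distance > 1
    replace distance = max(distance, 0.1)
    gen density = cond(self, 0, population_2/distance)
    collapse (sum) density, by(district dates)
    ```

    Setting the self-pair contribution to 0, rather than dropping those rows, keeps districts with no neighbours in the collapsed result with a density of 0.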

    Added: While it is not necessary for this particular task, you probably should convert that string date variable to a Stata internal format date variable if you plan to make any real use of it.
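
    A minimal sketch of that conversion, assuming the strings are always in year/month/day order as in the example data. The name date_td is hypothetical.

    ```stata
    * Sketch only: two rows in the thread's "2021/10/1" layout
    clear
    input str1 district str9 dates byte population
    "a" "2021/10/1" 10
    "a" "2021/10/2" 20
    end

    * daily() parses the string; "YMD" gives the order of year, month, day
    * date_td is a hypothetical name for the new numeric date variable
    gen long date_td = daily(dates, "YMD")
    format date_td %td

    * Stop if any non-empty string failed to convert
    assert !missing(date_td) if !missing(dates)
    list district dates date_td
    ```

    With a numeric date, sorting and date arithmetic behave as expected, and date_td could replace dates as the key in joinby and collapse.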
    Last edited by Clyde Schechter; 24 Sep 2021, 18:52.

    • #3
      Hi Clyde,

      Thank you so much for your help. The code is very neat and solves my problem perfectly.

      I'll think more about the suggestion that you gave.

      Thanks a lot!



      • #4
        Hi Clyde,

        I have a follow-up question on this code.

        The code works very well on my test sample. However, when I ran it on the entire dataset, the isid command returned an error:

        r(459) variables estate_name_en order_date should never be missing

        where estate_name_en corresponds to district in the earlier sample and order_date corresponds to dates.

        I couldn't figure out why this happened, given that the data structure is the same. It would be great if you had any idea about this error.

        Thank you!

        • #5
          The message means exactly what it says. The key variables organizing this analysis are district and date (estate_name_en and order_date). If either one is missing, you have an observation that cannot be used. Apparently there are such observations in your data--and they cannot undergo the analysis you are looking for. You can find them with:

          Code:
          browse if missing(estate_name_en, order_date)
          Then you should figure out why they are there. The problem might be fixable by supplying the correct values for the missing variables. Or their presence might be indicative of an error in the data management that led up to the creation of your file--you should review how this file was created and fix any errors you find. Or perhaps these are records that are actually correct in so far as they go, but the missing values of those variables cannot be found. In that case, they need to be -drop-ped from the data set before doing these calculations because they cannot be included in them.
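
          As a rough sketch of that workflow, the block below counts and lists the offending observations, then drops them and re-checks the keys. The toy rows are invented for illustration (only the variable names come from the error message), order_date is assumed to be a string here, and the drop step is appropriate only after you have decided the missing values cannot be repaired.

          ```stata
          * Sketch only: invented rows, two of them with a missing key
          clear
          input str10 estate_name_en str10 order_date byte population
          "Estate A" "2021/10/1" 10
          "" "2021/10/1" 20
          "Estate B" "" 30
          "Estate B" "2021/10/2" 40
          end

          * How many observations lack a district or a date?
          count if missing(estate_name_en, order_date)
          list if missing(estate_name_en, order_date)

          * isid also fails on duplicate keys, so check for those too
          duplicates report estate_name_en order_date

          * Only once the missing values are known to be unrecoverable:
          drop if missing(estate_name_en, order_date)

          * Confirm the keys now uniquely identify the observations
          isid estate_name_en order_date
          ```

          missing() accepts both string and numeric arguments, so the same condition works if order_date has already been converted to a numeric date.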

          • #6
            Hi Clyde,

            I see. I'll look through the dataset to see whether I missed anything and how to fix it.

            Thank you so much for your reply.
