Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Inverse distance spatial matrix using Stata

    Dear All,

    I am using Stata 15 to carry out my spatial analysis. I generate an inverse distance function using spmatrix idistance.

    Specifically I type:


    spmatrix create idistance dW, replace

    spmatrix export dW using dW.txt, replace

    Then, I import the file in Stata:

    insheet using "dW.txt", delim(" ") clear
    drop in 1
    drop v1
    save "dW.dta", replace


    For some specific reasons, I need to use mata to create my own Matrix. Nonetheless, to be sure that I proceed correctly, I firstly try to replicate the inverse distance I have done previously. I am using an example that I found into the sp manual (p. 128). So i type:


    mata:
    id = st_data(., "_ID")
    location = st_data(., ("_CX", "_CY"))
    N = st_nobs()
    W = J(N, N, 0)
    for (i=1; i<=N; i++) {
    for (j=1; j<i; j++) {
    delta = location[i,.] - location[j,.]
    W[i,j] = W[j,i] = 1/sqrt(delta*delta')
    }
    }
    end


    spmatrix spfrommata dW2= W id,

    spmatrix export dW2 using dW2.txt, replace

    insheet using "dW2.txt", delim(" ") clear
    drop in 1
    drop v1
    save "dW2.dta", replace

    Now if everything works, I should observe the same entries in each ij position. I have been struggling all the day, but in fact I obtain different results. Why this is the case?

    If you may want to have a look at the files, this is the link to the dataset I am using (already spset): https://dl.dropboxusercontent.com/u/27015126/data.dta
    Here, instead is the linked shapefile: https://dl.dropboxusercontent.com/u/...M_2013_shp.dta

    Thanks in advance.

    Dario
    Last edited by Dario Maimone Ansaldo Patti; 19 Jun 2017, 15:34.

  • #2
    I downloaded the two files and ran the following:

    Code:
    clear all
    
    use "data.dta"
    spmatrix create idistance dW, replace
    spmatrix export dW using dW.txt, replace
    
    insheet using "dW.txt", delim(" ") clear
    save "dW.dta", replace
    
    use "data.dta"
    mata:
    id = st_data(., "_ID")
    location = st_data(., ("_CX", "_CY"))
    N = st_nobs()
    W = J(N, N, 0)
    for (i=1; i<=N; i++) {
    for (j=1; j<i; j++) {
    delta = location[i,.] - location[j,.]
    W[i,j] = W[j,i] = 1/sqrt(delta*delta')
    }
    }
    end
    spmatrix spfrommata dW2= W id,
    spmatrix export dW2 using dW2.txt, replace
    
    insheet using "dW2.txt", delim(" ") clear
    save "dW2.dta", replace
    
    * compare both 
    use "dW.dta"
    cf _all using "dW2.dta", all
    and the results show no difference. If you look at the text files themselves, there are a few differences at the least significant digit due to precision errors. So I would say that you are generating the same thing.

    Are you sure that these coordinates are planar? They look like geographic coordinates to me.

    Comment


    • #3
      Hi,

      Thanks. ​That it is strange. When i run it I get that the entries are different. Something like.0.37 and 0.41 the two coordinate are longitudinal and latitude. Actually i used to change them in kilometers.

      I will try again and in case i will post my results.

      Comment


      • #4
        I had a closer look and you indeed have lat/lon coordinates so you should use
        Code:
        spset, modify coordsys(latlong, kilometers)
        to indicate that distances should be calculated using great-circle distances (Haversine formula). See the recent announcement for geo2xy (from SSC) for visual illustrations of the difference between planar and geographic coordinates and why you should not use Euclidean distances with lat/lon coordinates.

        You can use geodist (from SSC) to calculate the distances but Stata's sp* command suite uses a slightly different radius of the earth when calculating distances using the Haversine formula. I could not find a description of the radius used but you can infer it using:

        Code:
        use "data.dta"
        sort _ID
        spset, modify coordsys(latlong, kilometers)
        
        * calculate distance using spdistance 
        spdistance 1 2
        scalar d_sp = r(distance)
        
        * redo using geodist (from SSC) using haversine formula
        scalar lat1 = _CY[1]
        scalar lon1 = _CX[1]
        scalar lat2 = _CY[2]
        scalar lon2 = _CX[2]
        geodist lat1 lon1 lat2 lon2, sphere
        scalar d_gd = r(distance)
        
        * spdistance uses a different earth radius
        dis %21.0g d_sp / d_gd * 6371
        
        * redo with the radius used by spdistance
        geodist lat1 lon1 lat2 lon2, sphere radius(6374.611584000008406)
        dis r(distance) - d_sp
        and the results:
        Code:
        . use "data.dta"
        
        . sort _ID
        
        . spset, modify coordsys(latlong, kilometers)
          Sp dataset data.dta
                        data:  cross sectional
             spatial-unit id:  _ID
                 coordinates:  _CY, _CX (latitude-and-longitude, kilometers)
            linked shapefile:  NUTS_RG_01M_2013_shp.dta
        
        
        . 
        . * calculate distance using spdistance 
        . spdistance 1 2
          (data currently use latitude and longitude)
        
                 _ID       (longitude, latitude)
          ---------------------------------------
                   1       (4.238881, 51.03818)
                   2       (4.370558, 50.836)
          ---------------------------------------
            distance       24.314504 kilometers
        
        . scalar d_sp = r(distance)
        
        . 
        . * redo using geodist (from SSC) using haversine formula
        . scalar lat1 = _CY[1]
        
        . scalar lon1 = _CX[1]
        
        . scalar lat2 = _CY[2]
        
        . scalar lon2 = _CX[2]
        
        . geodist lat1 lon1 lat2 lon2, sphere
        Great-circle distance (haversine formula, radius of 6371km) = 24.300728 km
        
        . scalar d_gd = r(distance)
        
        . 
        . * spdistance uses a different earth radius
        . dis %21.0g d_sp / d_gd * 6371
         6374.611584000008406
        
        . 
        . * redo with the radius used by spdistance
        . geodist lat1 lon1 lat2 lon2, sphere radius(6374.611584000008406)
        Great-circle distance (haversine formula, radius of 6374.611584000008406km) = 24.314504 km
        
        . dis r(distance) - d_sp 
        -3.553e-15
        
        .
        Now that you have a way to replicate the distances calculated by Stata's sp* commands, you can repeat what you were trying to do in #1 using great-circle distances. You could implement the Haversine formula in Mata but you can also do this in Stata using cross to form all pairwise combinations. In the following example, I specify that no normalization be done so that you can compare raw inverse distances.
        Code:
        * export the distance matrix without normalization
        clear all
        use "data.dta"
        sort _ID
        spset, modify coordsys(latlong, kilometers)
        spmatrix create idistance dW, replace normalize(none)
        spmatrix export dW using dW.txt, replace
        insheet using "dW.txt", delim(" ") clear
        save "dW.dta", replace
        
        * create a distance matrix using -geodist-
        use "data.dta"
        sort _ID
        keep _ID _CX _CY
        tempfile hold
        save "`hold'"
        rename _* _*0
        cross using "`hold'"
        geodist _CY0 _CX0 _CY _CX, sphere gen(v) radius(6374.611584000008406)
        drop _CY* _CX*
        replace v = 1/v if v != 0
        reshape wide v, i(_ID) j(_ID0)
        mata: id = st_data(., "_ID")
        drop _ID
        mata: W = st_data(.,.)
        
        * import the Mata matrix without normalization
        spmatrix spfrommata dW2=W id, normalize(none)
        
        * export and compare to the one generated by spmatrix create
        spmatrix export dW2 using dW2.txt, replace
        insheet using "dW2.txt", delim(" ") clear
        save "dW2.dta", replace
        
        * compare both 
        cf _all using "dW.dta", all
        Both datasets are identical. You can also remove both normalize(none) options and the results will also match.

        Comment


        • #5
          Dear Robert,

          thank you very much. Now it works with both normlize none and spectral. So, if I am correct the problem is with the way in which sp* calculates the radius. I have another question.

          I tried to generate an inverse of the squared distance - normalize (none). So basically, I modified your code writing

          replace v = 1/(v^2) if v != 0

          Then I compared the two matrices and in fact if I square the entries in dW, I obtain those in dW2.

          Instead, I cannot obtain the same if I impose normalize(spectral), the default. The numbers are pretty different. For instance v3 second row in dW.dta is 0.37601581, therefore I would expect that the same entry in dW2 should be 0.14139, while on the contrary it is 0.96809894. why there is such a large difference? I might guess that this is a consequence of the normalization process?

          Thanks again

          Dario

          Comment


          • #6
            I think you are just making an error somewhere when generating the weight matrices using the two approaches.

            In the following example, I create a custom weight matrix using sp commands. I first create a weight matrix without normalization so that I can recover the calculated distances (they are stored as the reciprocal of the distance). I export the sp matrix to Mata, generate new weights based on the actual distances, and copy back the Mata matrix to sp. Then I do the same thing from scratch using geodist. As you can see, both approaches generate exactly the same weight matrix (with spectral normalization).

            Code:
            * create a weight matrix based on geographic distances without normalization
            clear all
            use "data.dta"
            sort _ID
            spset, modify coordsys(latlong, kilometers)
            spmatrix create idistance dW, replace normalize(none)
            
            * export the sp matrix to Mata
            spmatrix matafromsp W id = dW
            
            * use reciprocal to recover distances and generate new weights
            mata: 
                D = 1 :/ W
                W2 =  editmissing(1 :/ (D:^2), 0)
            end
            
            * copy the W2 Mata matrix to sp, default is spectral normalization
            spmatrix spfrommata dW2 = W2 id, replace
            
            * export weight matrix and read it back to compare later
            spmatrix export dW2 using "dW2.txt", replace
            insheet using "dW2.txt", delim(" ") clear
            save "dW2.dta", replace
            
            
            
            * create custom distance matrix using -geodist-
            clear all
            use "data.dta"
            sort _ID
            keep _ID _CX _CY
            tempfile hold
            save "`hold'"
            rename _* _*0
            cross using "`hold'"
            geodist _CY0 _CX0 _CY _CX, sphere gen(v) radius(6374.611584000008406)
            drop _CY* _CX*
            replace v = 1/(v^2) if v != 0
            reshape wide v, i(_ID) j(_ID0)
            mata: id = st_data(., "_ID")
            drop _ID
            mata: W2_geo = st_data(.,.)
            
            * copy the W2_geo Mata matrix to sp, default is spectral normalization
            spmatrix spfrommata dW2_geo=W2_geo id
            
            * export weight matrix and read it back to compare with above
            spmatrix export dW2_geo using "dW2_geo.txt", replace
            insheet using "dW2_geo.txt", delim(" ") clear
            save "dW2_geo.dta", replace
            
            * compare both 
            cf _all using "dW2.dta", all

            Comment


            • #7
              Dear Robert,

              thank you very much for all your kind help. I tried to replicate everything and I was successful. Now all is much clearer.

              Best regards,

              Dario

              P.S. Your geodist is really interesting. I did not know it (I am a bit new to spatial econometrics). But I think, I will use it extensively.

              Comment


              • #8
                Dear Robert,

                Good afternoon, I started learning STATA this year. Could anyone help me to create a procedure for creating a space weight matrix for the following attached shapefile?

                There are 5570 lines, I do not know if the STATA 15 IC supports so many lines.

                Stata does not work after giving the command:
                Spmatrix create

                Does anyone know why this happens? Is it because 5570 lines are and the version is IC? How can I solve this problem and generate the matrix for so many lines?

                Thank you

                Daniela Souza

                Comment

                Working...
                X