  • Finding the minimum of difference between all observations by group

    Hi, everybody,

    For each observation, I am trying to calculate the minimum difference between it and all the other observations in the same group.

    For example, I have this data,
    obs var group
    1 1 1
    2 3 1
    3 6 1
    4 2 2
    5 1 2
    6 2 2
    7 3 2
    and I want,
    obs var group Want
    1 1 1 2
    2 3 1 2
    3 6 1 3
    4 2 2 0
    5 1 2 1
    6 2 2 0
    7 3 2 1
    Take observation 1, which belongs to group 1. The difference between obs 1 and obs 2 is 2, and between obs 1 and obs 3 it is 5, so the minimum difference for obs 1 is 2.

    I am thinking of using a loop: within each group, for observation i, calculate the differences between i and every other observation, and return the minimum of those differences; then move on to the next i until the last observation in the group. But I am not sure how to put this into Stata code. Please help. Thank you so much!

    Kevin

  • #2
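    No loop needed. After sorting on var within each group, all you need to do is compare each value with the previous and the following values. Note that for the first observation in each group var[_n-1] is var[0], which is returned as missing, and likewise var[_n+1] is missing for the last observation. min() ignores missing arguments here, which is fine.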

    Code:
    clear 
    input obs    var    group Want 
    1    1    1 2 
    2    3    1 2
    3    6    1 3
    4    2    2 0 
    5    1    2 1
    6    2    2 0 
    7    3    2 1 
    end 
    
    bysort group (var) : gen wanted = min(var - var[_n-1], var[_n+1] - var)  
    
    sort obs 
    
    assert Want == wanted 
    
    list, sepby(group) 
    
         +-----------------------------------+
         | obs   var   group   Want   wanted |
         |-----------------------------------|
      1. |   1     1       1      2        2 |
      2. |   2     3       1      2        2 |
      3. |   3     6       1      3        3 |
         |-----------------------------------|
      4. |   4     2       2      0        0 |
      5. |   5     1       2      1        1 |
      6. |   6     2       2      0        0 |
      7. |   7     3       2      1        1 |
         +-----------------------------------+
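
    The loop described in the question can also serve as a cross-check on the sorted-neighbor shortcut. What follows is only a sketch, assuming it is run right after the code above so that wanted already exists; brute is a hypothetical variable name introduced here. It makes one pass over the data per observation, so it is only sensible for small datasets.

    ```stata
    * Brute-force cross-check (assumes the example data and wanted from above)
    gen double brute = .          // hypothetical name for the looped result
    tempvar d
    gen double `d' = .
    forvalues i = 1/`=_N' {
        * absolute difference to observation i, other members of i's group only
        quietly replace `d' = cond(group == group[`i'] & _n != `i', abs(var - var[`i']), .)
        quietly summarize `d', meanonly
        * r(min) is missing when observation i is alone in its group
        quietly replace brute = r(min) in `i'
    }
    assert brute == wanted
    ```

    If the assert passes silently, the looped result agrees with the one-line bysort solution for every observation.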

    • #3
      In reply to Nick Cox (#2):
      Thanks, Nick! That helped me out.

      • #4
        In reply to Nick Cox (#2):
        Hi, Nick.

        I looked into your code and it works. The trick is to sort on var first; comparing each value with its nearest neighbors then gives the result. But what if the quantity of interest is based on two or more variables? Say I have the longitude and latitude of each observation, and I want to find the minimum distance between that observation and all the other observations in the same group. A sample would look like this:

        input obs group lon lat
        1 1 -121.9672 37.37035
        2 1 -78.63793 35.77534
        3 1 -122.0404 37.39246
        4 1 -84.36759 33.85174
        5 1 -97.08716 32.90313
        6 2 -93.44968 44.85718
        7 2 -97.05071 32.82806
        8 2 -122.3253 37.56183
        9 2 -112.0678 33.45248
        10 2 -87.89692 42.29396
        end

        In this case I can't sort on the distance first, because it is not in the data yet. One way to calculate the distance would be geodist. Could you please help me with that? Thanks!

        • #5
          geodist is a command from SSC. Surely you should study its help, experiment, and come back with a more precise question.

          • #6
            Also, if the objective is to find the nearest neighbor(s), see geonear (from SSC). Further, if this is to be done within groups, the help file for runby (from SSC) shows how to do that.
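
            For a small dataset such as the example in #4, the within-group minimum distance can also be sketched by brute force: pair each observation with every other observation in its group using joinby, compute the distance with geodist, and keep the closest pair. This is a sketch under stated assumptions, not the geonear/runby approach documented in those help files. The names obs2, lon2, lat2, km, nearest_obs and min_km are made up here, and the number of pairs grows with the square of the group size, so geonear is likely the better choice for large data.

            ```stata
            * Requires geodist from SSC: ssc install geodist
            clear
            input obs group lon lat
            1 1 -121.9672 37.37035
            2 1 -78.63793 35.77534
            3 1 -122.0404 37.39246
            4 1 -84.36759 33.85174
            5 1 -97.08716 32.90313
            6 2 -93.44968 44.85718
            7 2 -97.05071 32.82806
            8 2 -122.3253 37.56183
            9 2 -112.0678 33.45248
            10 2 -87.89692 42.29396
            end

            * Save a renamed copy to act as the "other" observation in each pair
            preserve
            rename (obs lon lat) (obs2 lon2 lat2)
            tempfile others
            save "`others'"
            restore

            * All within-group pairs, excluding each observation paired with itself
            joinby group using "`others'"
            drop if obs == obs2

            * Distance in kilometers (geodist's default unit); keep the closest pair
            geodist lat lon lat2 lon2, gen(km)
            bysort obs (km): keep if _n == 1

            rename (obs2 km) (nearest_obs min_km)
            drop lon2 lat2
            sort obs
            list, sepby(group)
            ```

            An observation that is alone in its group has no partner after the self-pair is dropped, so it disappears from the result; merge back onto the original data if such observations must be kept.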

            • #7
              In reply to Robert Picard (#6):
              Thanks, Robert. I find the programs you wrote very helpful, especially in my case.
