Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Subtracting values between groups in panel data

    Hi everyone,

    My data looks something like this:

    group id | time | value | nearest neighbor group id

    1 | year1 | 10 | 3
    1 | year2 | 20 | 3
    1 | year3 | 30 | 3
    1 | year4 | 40 | 3
    2 | year1 | 100 | 1
    2 | year2 | 200 | 1
    2 | year3 | 300 | 1
    2 | year4 | 400 | 1
    3 | year1 | 1000 | 2
    3 | year2 | 2000 | 2
    3 | year3 | 3000 | 2
    3 | year4 | 4000 | 2

    What I'd like to do is generate differences of value between each group and its nearest neighbor. For the first row, I'd take group 1's year1 value and subtract that from the year1 value of its neighbor, group 3 (100-1000). What I'm stuck on is how to create a "neighbor value" column, which would match the neighbor id with a group id and pull the group's value for the correct year.
    Also, different groups may have the same neighbor.

    Want:

    group id | time | value | nearest neighbor group id | neighbor value | difference

    1 | year1 | 10 | 3 | 1000
    1 | year2 | 20 | 3 | 2000
    1 | year3 | 30 | 3 | 3000
    1 | year4 | 40 | 3 | 4000
    2 | year1 | 100 | 1 | 10
    2 | year2 | 200 | 1 | 20
    2 | year3 | 300 | 1 | 30
    2 | year4 | 400 | 1 | 40
    3 | year1 | 1000 | 2 | 100
    3 | year2 | 2000 | 2 | 200
    3 | year3 | 3000 | 2 | 300
    3 | year4 | 4000 | 2 | 400

    Using Stata 12

    Thanks,
    Brian

  • #2
    Code:
    clear
    input id time value n_id
    1  1  10  3
     1  2  20  3
     1  3  30  3
     1  4  40  3
     2  1  100  1
     2  2  200  1
     2  3  300  1
     2  4  400  1
     3  1  1000  2
     3  2  2000  2
     3  3  3000  2
     3  4  4000  2
    end
    save data1.dta , replace
    keep n_id time value
    rename n_id id
    rename value n_value
    merge m:1 id time using data1.dta
    drop _merge
    gen diff = n_value-value
    order id time value n_id . list , clean
           id   time   value   n_id   n_value    diff 
      1.    1      1      10      3       100      90 
      2.    1      2      20      3       200     180 
      3.    1      3      30      3       300     270 
      4.    1      4      40      3       400     360 
      5.    2      1     100      1      1000     900 
      6.    2      2     200      1      2000    1800 
      7.    2      3     300      1      3000    2700 
      8.    2      4     400      1      4000    3600 
      9.    3      1    1000      2        10    -990 
     10.    3      2    2000      2        20   -1980 
     11.    3      3    3000      2        30   -2970 
     12.    3      4    4000      2        40   -3960

    Comment


    • #3
      Thanks for the response. I think this would work if my neighbors were unique (it was just the groups in a different order), but there are some repeats. I'm getting the error message "variables do not uniquely identify observations in the using data".

      Comment


      • #4
        If you mean that one person (id) can have multiple neighbors, then you can create a key file dropping duplicate information and -merge- that into the original data:

        Code:
        clear
        input ///
        id time value n_id
         1  1  10  3
         1  2  20  3
         1  3  30  3
         1  4  40  3
         2  1  100  1
         2  2  200  1
         2  3  300  1
         2  4  400  1
         1  1  10  2
         1  2  20  2
         1  3  30  2
         1  4  40  2
         3  1  1000  2
         3  2  2000  2
         3  3  3000  2
         3  4  4000  2
        end
        
        tempfile data1
        save "`data1'" , replace
        
        // drop duplicate panels
        bysort id (n_id): gen num = sum(n_id[_n-1] != n_id)
        by id: keep if num == 1
        
        keep id time value
        rename (id value) (n_id n_value)
        
        tempfile key
        save "`key'", replace
        
        *-----
        
        use "`data1'", clear
        
        merge m:1 n_id time using "`key'", nogenerate
        
        sort id n_id time
        list, sepby(id n_id)
        
        <do subtractions>
        You should:

        1. Read the FAQ carefully.

        2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

        3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

        4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.

        Comment


        • #5
          No, each person has one neighbor, but two people can have the same neighbor. Sorry for not being clear. The exact message is "variables id time do not uniquely identify observations in the using data".

          Comment


          • #6
            Originally posted by Brian Zhang View Post
            No, each person has one neighbor, but two people can have the same neighbor. Sorry for not being clear. The exact message is "variables id time do not uniquely identify observations in the using data".
            Svend's code does not issue that error with the setup you mention:

            Code:
            clear
            input id time value n_id
            1  1  10  3
             1  2  20  3
             1  3  30  3
             1  4  40  3
             2  1  100  3
             2  2  200  3
             2  3  300  3
             2  4  400  3
             3  1  1000  2
             3  2  2000  2
             3  3  3000  2
             3  4  4000  2
            end
            
            tempfile data1
            save "`data1'" , replace
            
            keep n_id time value
            rename n_id id
            rename value n_value
            
            merge m:1 id time using "`data1'"
            drop _merge
            
            list, sepby(id)
            (Although another problem arises.)
            You should:

            1. Read the FAQ carefully.

            2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

            3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

            4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.

            Comment


            • #7
              Brian:
              In my merge setup, the original dataset is the using dataset, and the derived dataset with the neighbor id is the master dataset, hence the m:1 merge. Please show exactly what you typed, leading to the error message.

              To diagnose the problem, try this for both datasets:
              Code:
              duplicates report id time
              According to the information provided by you, there should be no duplicates in the original (using) dataset, but there may be duplicates in the derived (master) dataset.

              Comment


              • #8
                Thanks for the help Svend and Roberto. My data had some other problems that had to be cleaned up, but now I have Svend's solution working on my dataset. Sorry about the trouble.

                Comment

                Working...
                X