Subtracting values between groups in panel data

Brian Zhang

Join Date: Dec 2014

Posts: 5
#1

Subtracting values between groups in panel data

21 Dec 2014, 02:54

Hi everyone,

My data looks something like this:

group id | time | value | nearest neighbor group id

1 | year1 | 10 | 3
1 | year2 | 20 | 3
1 | year3 | 30 | 3
1 | year4 | 40 | 3
2 | year1 | 100 | 1
2 | year2 | 200 | 1
2 | year3 | 300 | 1
2 | year4 | 400 | 1
3 | year1 | 1000 | 2
3 | year2 | 2000 | 2
3 | year3 | 3000 | 2
3 | year4 | 4000 | 2

What I'd like to do is generate differences of value between each group and its nearest neighbor. For the first row, I'd take group 1's year1 value and subtract that from the year1 value of its neighbor, group 3 (100-1000). What I'm stuck on is how to create a "neighbor value" column, which would match the neighbor id with a group id and pull the group's value for the correct year.
Also, different groups may have the same neighbor.

Want:

group id | time | value | nearest neighbor group id | neighbor value | difference

1 | year1 | 10 | 3 | 1000
1 | year2 | 20 | 3 | 2000
1 | year3 | 30 | 3 | 3000
1 | year4 | 40 | 3 | 4000
2 | year1 | 100 | 1 | 10
2 | year2 | 200 | 1 | 20
2 | year3 | 300 | 1 | 30
2 | year4 | 400 | 1 | 40
3 | year1 | 1000 | 2 | 100
3 | year2 | 2000 | 2 | 200
3 | year3 | 3000 | 2 | 300
3 | year4 | 4000 | 2 | 400

Using Stata 12

Thanks,
Brian
Tags: None

Svend Juul

Join Date: Apr 2014
Posts: 515

21 Dec 2014, 03:40

Code:

clear
input id time value n_id
1  1  10  3
 1  2  20  3
 1  3  30  3
 1  4  40  3
 2  1  100  1
 2  2  200  1
 2  3  300  1
 2  4  400  1
 3  1  1000  2
 3  2  2000  2
 3  3  3000  2
 3  4  4000  2
end
save data1.dta , replace
keep n_id time value
rename n_id id
rename value n_value
merge m:1 id time using data1.dta
drop _merge
gen diff = n_value-value
order id time value n_id . list , clean
       id   time   value   n_id   n_value    diff 
  1.    1      1      10      3       100      90 
  2.    1      2      20      3       200     180 
  3.    1      3      30      3       300     270 
  4.    1      4      40      3       400     360 
  5.    2      1     100      1      1000     900 
  6.    2      2     200      1      2000    1800 
  7.    2      3     300      1      3000    2700 
  8.    2      4     400      1      4000    3600 
  9.    3      1    1000      2        10    -990 
 10.    3      2    2000      2        20   -1980 
 11.    3      3    3000      2        30   -2970 
 12.    3      4    4000      2        40   -3960

Comment

Brian Zhang

Join Date: Dec 2014

Posts: 5
#3

21 Dec 2014, 10:34

Thanks for the response. I think this would work if my neighbors were unique (it was just the groups in a different order), but there are some repeats. I'm getting the error message "variables do not uniquely identify observations in the using data".
Comment
Roberto Ferrer

Join Date: Apr 2014

Posts: 449
#4

21 Dec 2014, 11:09

If you mean that one person (id) can have multiple neighbors, then you can create a key file dropping duplicate information and -merge- that into the original data:

Code:

clear input /// id time value n_id 1 1 10 3 1 2 20 3 1 3 30 3 1 4 40 3 2 1 100 1 2 2 200 1 2 3 300 1 2 4 400 1 1 1 10 2 1 2 20 2 1 3 30 2 1 4 40 2 3 1 1000 2 3 2 2000 2 3 3 3000 2 3 4 4000 2 end tempfile data1 save "`data1'" , replace // drop duplicate panels bysort id (n_id): gen num = sum(n_id[_n-1] != n_id) by id: keep if num == 1 keep id time value rename (id value) (n_id n_value) tempfile key save "`key'", replace *----- use "`data1'", clear merge m:1 n_id time using "`key'", nogenerate sort id n_id time list, sepby(id n_id) <do subtractions>

You should:

1. Read the FAQ carefully.

2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.
Comment
Brian Zhang

Join Date: Dec 2014

Posts: 5
#5

21 Dec 2014, 11:19

No, each person has one neighbor, but two people can have the same neighbor. Sorry for not being clear. The exact message is "variables id time do not uniquely identify observations in the using data".
Comment
Roberto Ferrer

Join Date: Apr 2014

Posts: 449
#6

21 Dec 2014, 11:28

Originally posted by Brian Zhang View Post

No, each person has one neighbor, but two people can have the same neighbor. Sorry for not being clear. The exact message is "variables id time do not uniquely identify observations in the using data".

Svend's code does not issue that error with the setup you mention:

Code:

clear input id time value n_id 1 1 10 3 1 2 20 3 1 3 30 3 1 4 40 3 2 1 100 3 2 2 200 3 2 3 300 3 2 4 400 3 3 1 1000 2 3 2 2000 2 3 3 3000 2 3 4 4000 2 end tempfile data1 save "`data1'" , replace keep n_id time value rename n_id id rename value n_value merge m:1 id time using "`data1'" drop _merge list, sepby(id)

(Although another problem arises.)

You should:

1. Read the FAQ carefully.

2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.
1 like
Comment
Svend Juul

Join Date: Apr 2014

Posts: 515
#7

21 Dec 2014, 12:58

Brian:
In my merge setup, the original dataset is the using dataset, and the derived dataset with the neighbor id is the master dataset, hence the m:1 merge. Please show exactly what you typed, leading to the error message.

To diagnose the problem, try this for both datasets:

Code:

duplicates report id time

According to the information provided by you, there should be no duplicates in the original (using) dataset, but there may be duplicates in the derived (master) dataset.
Comment
Brian Zhang

Join Date: Dec 2014

Posts: 5
#8

21 Dec 2014, 17:11

Thanks for the help Svend and Roberto. My data had some other problems that had to be cleaned up, but now I have Svend's solution working on my dataset. Sorry about the trouble.
Comment

Announcement

Subtracting values between groups in panel data

Comment

Comment

Comment

Comment

Comment

Comment

Comment