Generating neighbors' values for panel data

Jeff Wooldridge

Join Date: Apr 2014

Posts: 2129
#1


01 May 2016, 10:00

Hi All:

I have a problem where I have a (balanced) panel data set, and each cross-sectional unit in the data set comes with two neighbors, contained in the variables nb1 and nb2 (although some neighbor values are missing). I'd like to put the values of some of the neighbors' covariates on the proper line so I can include them in regression analysis. I know how to do this without a panel structure, and I found a recent helpful entry by Nick Cox that confirms what I thought. But I'm stuck for the panel data case.

The identities of the neighbors do not change over time.

The first block below gives a simple example of the structure, and the second block gives what I would like to have. I'd really appreciate any hints. Thanks, Jeff.

Code:

id  year  nb1  nb2    x
 1  2000    2    3  100
 1  2001    2    3  110
 1  2002    2    3  120
 2  2000    1    4  200
 2  2001    1    4  210
 2  2002    1    4  220
 3  2000    4    .  300
 3  2001    4    .  310
 3  2002    4    .  320
 4  2000    3    2  400
 4  2001    3    2  410
 4  2002    3    2  420

id  year  nb1  nb2    x  x_nb1  x_nb2
 1  2000    2    3  100    200    300
 1  2001    2    3  110    210    310
 1  2002    2    3  120    220    320
 2  2000    1    4  200    100    400
 2  2001    1    4  210    110    410
 2  2002    1    4  220    120    420
 3  2000    4    .  300    400      .
 3  2001    4    .  310    410      .
 3  2002    4    .  320    420      .
 4  2000    3    2  400    300    200
 4  2001    3    2  410    310    210
 4  2002    3    2  420    320    220

Sebastian Kripfganz

Join Date: May 2014

Posts: 2577
#2

01 May 2016, 10:23

Hi Jeff, the following may not be the most elegant way of doing it but it gives you the desired results:

Code:

sort id year
by id: gen n = _n
local N = n[_N]
gen x_nb1 = x[(nb1 - 1) * `N' + n]
gen x_nb2 = x[(nb2 - 1) * `N' + n]
drop n
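
A note on the subscript arithmetic: x[(nb1 - 1) * `N' + n] points at the right row only if the data are sorted by id and year, the panel is balanced, and the ids are the consecutive integers 1, 2, ..., K. A minimal sketch that checks those assumptions before doing the lookup, using the example data from #1 (the helper variables T and seq are made up for this illustration):

```stata
clear
input id year nb1 nb2 x
1 2000 2 3 100
1 2001 2 3 110
1 2002 2 3 120
2 2000 1 4 200
2 2001 1 4 210
2 2002 1 4 220
3 2000 4 . 300
3 2001 4 . 310
3 2002 4 . 320
4 2000 3 2 400
4 2001 3 2 410
4 2002 3 2 420
end

* sort by id and year, and confirm each id-year pair is unique
isid id year, sort

* balanced panel: every id has the same number of observations
by id: gen long T = _N
assert T == T[1]

* ids are 1, 2, ..., K: group() numbers the sorted ids consecutively from 1
egen long seq = group(id)
assert seq == id

* explicit-subscript lookup, as in the code above
by id: gen long n = _n
local N = T[1]
gen x_nb1 = x[(nb1 - 1) * `N' + n]
gen x_nb2 = x[(nb2 - 1) * `N' + n]
drop T seq n
```

A missing neighbor gives a missing subscript, and Stata returns missing for a subscript outside 1 to _N, so x_nb2 is missing for id 3.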

https://www.kripfganz.de/stata/

Robert Picard

Join Date: Mar 2014
Posts: 1536

01 May 2016, 10:32

The typical way to match values from other observations is to use merge.

Code:

clear
input id year nb1 nb2 x
1 2000 2 3 100
1 2001 2 3 110
1 2002 2 3 120
2 2000 1 4 200
2 2001 1 4 210
2 2002 1 4 220
3 2000 4 . 300
3 2001 4 . 310
3 2002 4 . 320
4 2000 3 2 400
4 2001 3 2 410
4 2002 3 2 420
end
save "main.dta", replace

* save neighbor data for nb1
keep id year x
rename (id x) (nb1 x_nb1)
save "nb1.dta", replace

* save neighbor data for nb2
use "main.dta", clear
keep id year x
rename (id x) (nb2 x_nb2)
save "nb2.dta", replace

* go back to main data and merge with neighbor data
use "main.dta", clear
merge m:1 nb1 year using "nb1.dta", keep(master match) nogen
merge m:1 nb2 year using "nb2.dta", keep(master match) nogen

isid id year, sort
list, sepby(id)
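
The two merges above follow the same pattern, so with more neighbor variables the steps could be wrapped in a loop. A sketch of that idea, assuming the neighbor variables are named nb1 and nb2 as in the example; the temporary file name nbr is made up here:

```stata
clear
input id year nb1 nb2 x
1 2000 2 3 100
1 2001 2 3 110
1 2002 2 3 120
2 2000 1 4 200
2 2001 1 4 210
2 2002 1 4 220
3 2000 4 . 300
3 2001 4 . 310
3 2002 4 . 320
4 2000 3 2 400
4 2001 3 2 410
4 2002 3 2 420
end

tempfile nbr
foreach v in nb1 nb2 {
    * build the lookup file: neighbor id, year, and the neighbor's x
    preserve
    keep id year x
    rename (id x) (`v' x_`v')
    save "`nbr'", replace
    restore

    * attach the neighbor's x; rows with a missing neighbor stay unmatched
    merge m:1 `v' year using "`nbr'", keep(master match) nogenerate
}

isid id year, sort
list, sepby(id)
```

Using a tempfile with preserve and restore avoids leaving nb1.dta and nb2.dta on disk, at the cost of rebuilding the lookup file on each pass.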


Jeff Wooldridge

Join Date: Apr 2014

Posts: 2129
#4

01 May 2016, 11:20

Thanks Sebastian and Robert. Your suggestions are better than my brute force attempts. Much appreciated.

It does seem like Stata should allow double indexing for panel data sets so I could just reference x[nb1, year]. At least that's what the computer programmer still lurking somewhere in me thinks.
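
For readers on Stata 16 or later, frames come somewhat closer to this kind of lookup, although they are still not a true double index. A sketch, assuming a Stata version with frames (they did not exist when this thread was written); the frame name nbr and the link variables l1 and l2 are made up for the example:

```stata
clear
input id year nb1 nb2 x
1 2000 2 3 100
1 2001 2 3 110
1 2002 2 3 120
2 2000 1 4 200
2 2001 1 4 210
2 2002 1 4 220
3 2000 4 . 300
3 2001 4 . 310
3 2002 4 . 320
4 2000 3 2 400
4 2001 3 2 410
4 2002 3 2 420
end

* copy the lookup columns into a second frame (requires Stata 16+)
frame put id year x, into(nbr)

* link each row to its first neighbor's row in the same year, then fetch x
frlink m:1 nb1 year, frame(nbr id year) generate(l1)
frget x_nb1 = x, from(l1)

* same for the second neighbor; rows with missing nb2 are left unlinked
frlink m:1 nb2 year, frame(nbr id year) generate(l2)
frget x_nb2 = x, from(l2)

drop l1 l2
list, sepby(id)
```

Here frlink plays the role of the (neighbor, year) index and frget copies the value across, so no intermediate files are needed.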
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

01 May 2016, 12:05

The programmer lurking in me since the late 1960s remembers having access to no more than a single record from each of several input files ("tapes") at a time and thus finds Robert's solution comfortingly familiar. The only index I knew was in the back of the FORTRAN manual.

Robert Picard

Join Date: Mar 2014
Posts: 1536

02 May 2016, 12:11

If you conflate the id with the year, you can use rangestat (from SSC) to locate the observation that falls in the degenerate interval [nb1_year,nb1_year]. To install rangestat, type in Stata's Command window:

Code:

ssc install rangestat

A solution would look something like:

Code:

clear
input id year nb1 nb2 x
1 2000 2 3 100
1 2001 2 3 110
1 2002 2 3 120
2 2000 1 4 200
2 2001 1 4 210
2 2002 1 4 220
3 2000 4 . 300
3 2001 4 . 310
3 2002 4 . 320
4 2000 3 2 400
4 2001 3 2 410
4 2002 3 2 420
end

* create an observation id that conflates id and year
gen double bigid = id * 10000 + year

* get the value of x from the observation that falls in the interval
gen double idyear = nb1 * 10000 + year
rangestat (min) x_nb1 = x, interval(bigid idyear idyear)

* with -rangestat-, missing values exclude observations so use a value
* that will never match
replace idyear = nb2 * 10000 + year
replace idyear = 0 if mi(idyear)
rangestat (min) x_nb2 = x, interval(bigid idyear idyear)

list, sepby(id)


Meng Zhang

Join Date: May 2016

Posts: 15
#7

11 May 2016, 06:41

Hi everyone,

My panel data cover 66 countries, 20 industries, and the years 2006-2014. My dependent variable varies by country, industry, and year (OFDI_ikt). There is only one origin country, China, and there are 66 host countries. I want to find the determinants of China's outward foreign direct investment.

First, I tell Stata that this is a panel:

egen id=group(country industry)
xtset id year
xtreg lnofdi_ikt lngdp_i lngdp_china countryskill_i industryskill_k countryskill×industryskill_ik fta bits ruleoflaw i.year, cluster(industry)

Is it right to cluster on industry rather than on the panel id?

Best Regards,

Meng

Last edited by Meng Zhang; 11 May 2016, 06:43.
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#8

11 May 2016, 09:41

Dear Meng: your question has nothing to do with the topic of this thread. Start a new topic on the main forum page: http://www.statalist.org/forums/foru...ussion/general

Last edited by Steve Samuels; 11 May 2016, 09:44.

Steve Samuels
Statistical Consulting

Stata 14.2