county adjacency

Vats Prith

Join Date: Jul 2017
Posts: 30

31 Jul 2018, 08:26

Hello Statalist,
I needed some help on how to go about achieving the following:
I have a county year panel data that looks like the following:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double state_county float(activity_year county_xyz_score)
1001 2002 .3209993
1001 2002 .3209993
1001 2002 .3209993
1001 2002 .3209993
1001 2002 .3209993
1001 2002 .3209993
1001 2002 .3209993
1003 2002 .2289018
1003 2002 .2289018
1003 2002 .2289018
1003 2002 .2289018

Then I have the county adjacency file that list the neighbors (contiguous) of a given county: (the number of neighbors may obviously be different for different counties) in the following manner:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long county str37 n_name long n_county
1001 "Autauga County, AL" 1001
. "Chilton County, AL" 1021
. "Dallas County, AL" 1047
. "Elmore County, AL" 1051
. "Lowndes County, AL" 1085
. "Montgomery County, AL" 1101
1003 "Baldwin County, AL" 1003
. "Clarke County, AL" 1025
. "Escambia County, AL" 1053
. "Mobile County, AL" 1097
. "Monroe County, AL" 1099
. "Washington County, AL" 1129
. "Escambia County, FL" 12033

What I wanted is to create one more variable in my panel say "County Z score" that standardizes ( i.e.it contains the output of the following calculation: county_xyz_score (c,t) - average of county_xyz_score across all contiguous counties (t) / std deviation county_xyz_score across all contiguous counties (t)) "County XYZ score" among contiguous counties.

Last edited by Vats Prith; 31 Jul 2018, 09:00.

Tags: None

Mike Lacy

Join Date: Apr 2014

Posts: 2411
#2

31 Jul 2018, 09:03

Your example data doesn't seem quite right to me, and it complicates helping you:

1) Only one year is shown. Do you actually have a panel with multiple years? If so, please repost an example with multiple years.
2) You have multiple entries with identical data for each year. Is this intended?

(Small item: Note that the "end" command and the \CODE markers are necessary features of a -dataex- example.)

I believe that a combination of merge and expand will put your data into a useful so-called long format, but showing you that is difficult with the example you provided.
Comment

Vats Prith

Join Date: Jul 2017
Posts: 30

31 Jul 2018, 09:15

My bad.

a) Oh well yes the years going from 2002 through 2015 for each county.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double state_county float(activity_year county_xyz_score)
1001 2002   .3209993
1001 2003   .3126251
1001 2004   .3124779
1001 2005   .2987726
1001 2006   .2328533
1001 2007    .189686
1001 2008  .18497716
1001 2009   .1963434
1001 2010   .2033112
1001 2011   .1923633
1001 2012   .2170411
1001 2013   .1837994
1001 2014   .2182138
1001 2015   .2158565
1003 2002   .2289018
1003 2003   .2252383
1003 2004  .21015207
1003 2005  .20284607
1003 2006   .1633894
1003 2007  .18195954
1003 2008  .22226396
1003 2009  .17913736
1003 2010   .1822641
1003 2011   .2191114
1003 2012  .25743288
1003 2013  .26408195
1003 2014  .26883194
1003 2015  .27880597
1005 2002  .06851458
1005 2003  .08236194
1005 2004  .07272892
1005 2005  .06147835
1005 2006  .05597391
1005 2007  .05928341
1005 2008   .0797313
1005 2009  .11142473
1005 2010  .11336334
1005 2011  .11576447
1005 2012  .13491791
1005 2013  .15256736
1005 2014  .15857905
1005 2015    .162516
1007 2002  .18359493
1007 2003  .15015703
1007 2004  .18698786
1007 2005  .07901558
1007 2006  .07738449
1007 2009  .06204833
1007 2010  .06331446
1007 2011 .063444346
1007 2012  .06391565
1007 2013   .1426799
1007 2014   .1696554
1007 2015  .14589266
1009 2002  .20827615
1009 2003    .206558
1009 2004  .19695267
1009 2005   .1891255
1009 2006  .10935654
1009 2007  .11959002
1009 2008  .10967658
1009 2009  .08359557
1009 2010  .09795253
1009 2011 .069807164
1009 2012  .15509714
1009 2013   .1605351
1009 2014  .16562954
1009 2015   .1554921
1013 2002   .3163028
1013 2003  .30743375
1013 2004   .2968839
1013 2005   .2447439
1013 2006   .2294982
1013 2007   .1303356
1013 2008  .10278104
1013 2009  .17033716
1013 2010  .18520455
1013 2011  .18560126
1013 2012  .17257658
1013 2013   .1510959
1013 2014    .237792
1013 2015   .2351737
1015 2002   .2482731
1015 2003  .25525665
1015 2004   .2520648
1015 2005   .2493305
1015 2006   .1861142
1015 2007   .2215804
1015 2008  .21639498
1015 2009   .1660421
1015 2010  .14949854
1015 2011    .197847
1015 2012  .21987846
1015 2013  .21824928
1015 2014   .2125866
1015 2015  .20582367
1017 2002  .06637359
1017 2003  .07248068
1017 2004   .0962803
1017 2005  .07816491
end

[/CODE]

The second dataset remains the same

Comment

Jay Westreich

Join Date: Jun 2018

Posts: 12
#4

31 Jul 2018, 09:19

You'll need to create an indicator variable (0 or 1) for every county that determines if an observation is adjacent to that county.
gen adj_county_xyz=1 if county_abc is adjacent to county_xyz (repeat this for every county)
You'll then need to create a variable with the score of each county, but only fill it when adj_county_xyz is 1
gen adj_county_score_xyz=county_xyz_score if adj_county_xyz==1 (repeat this for every county as well)
You can then use egen to create the standard deviation and mean variables
egen sd_county_abc=rowsd(adj_county_score_*)
egen mean_county_abc=rowmean(adj_county_score_*)
Then use a simple generate to do the math
gen zscore_county_abc=(county_abc_score-mean_county_abc)/sd_county_abc

Last edited by Jay Westreich; 31 Jul 2018, 09:26.
Comment
Vats Prith

Join Date: Jul 2017

Posts: 30
#5

31 Jul 2018, 11:33

joinby works better for me thanks
Comment
Mike Lacy

Join Date: Apr 2014

Posts: 2411
#6

31 Jul 2018, 13:48

Perhaps you have solved this, but I would note that there's still a problem with your example data, namely that xyz score are lacking for the neighbors. For example, counties 1021, 1047, 1051 are all neighbors, but you don't provide xyz scores for them. To my discredit, it took me quite a while of running into strange problems before I figured out that your example data appears not workable, unless I'm making some kind of mistake. I wish you and I had noticed that early on. Anyway, per the FAQ, it would be nice if you would post your solution to bring closure to the thread and provide an example for future users. I'd be interested to see how you chose to use -joinby- here.

My approach would be this:
1) construct a file with county-neighbor-year as the observations
2) construct a file with county-year-xyz as the observations
3) use 2 to merge neighbor xyz data onto 1
4) use the egen mean and standard deviation functions, by county and year, to get the relevant neighbor summaries you want.
Comment
Vats Prith

Join Date: Jul 2017

Posts: 30
#7

10 Aug 2018, 07:54

I am not sure if the data I provided was "that inadequate" , or that the "want of an exact numerical value next to the neighbor counties could be that much of show stopper" as we really are talking about the concept on how to go about solving this and any xyz score can be assumed for 1021 1047 1051 that are neighbors of 1001. .(the data is sorted by state_county activityyear).may be It would have been better had I not posted the data, but then someone or the other would have asked me to post the data. However, as always am I grateful to senior folks who decide to voluntarily help and spend their value time in helping folks like me in need. As for how I got joinby to work, A) rename state_county to n_county in the county year panel. ( one that contains xyz score) and B.) rename county to state_county in the adjacency file. . C) join by using n_county the panel file and the adjacency file.D) Duplicates drop. E) Sort by state_county the joined file. (which now comes from the adjacency file) and activity_year F)bysort state_county activity_year: the usual statistical steps using egen to get to z by state_county activity_year.
Comment

Announcement