Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • county adjacency

    Hello Statalist,
    I needed some help on how to go about achieving the following:
    I have a county year panel data that looks like the following:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double state_county float(activity_year county_xyz_score)
    1001 2002 .3209993
    1001 2002 .3209993
    1001 2002 .3209993
    1001 2002 .3209993
    1001 2002 .3209993
    1001 2002 .3209993
    1001 2002 .3209993
    1003 2002 .2289018
    1003 2002 .2289018
    1003 2002 .2289018
    1003 2002 .2289018
    Then I have the county adjacency file that list the neighbors (contiguous) of a given county: (the number of neighbors may obviously be different for different counties) in the following manner:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long county str37 n_name long n_county
    1001 "Autauga County, AL" 1001
    . "Chilton County, AL" 1021
    . "Dallas County, AL" 1047
    . "Elmore County, AL" 1051
    . "Lowndes County, AL" 1085
    . "Montgomery County, AL" 1101
    1003 "Baldwin County, AL" 1003
    . "Clarke County, AL" 1025
    . "Escambia County, AL" 1053
    . "Mobile County, AL" 1097
    . "Monroe County, AL" 1099
    . "Washington County, AL" 1129
    . "Escambia County, FL" 12033
    What I wanted is to create one more variable in my panel say "County Z score" that standardizes ( i.e.it contains the output of the following calculation: county_xyz_score (c,t) - average of county_xyz_score across all contiguous counties (t) / std deviation county_xyz_score across all contiguous counties (t)) "County XYZ score" among contiguous counties.
    Last edited by Vats Prith; 31 Jul 2018, 09:00.

  • #2
    Your example data doesn't seem quite right to me, and it complicates helping you:

    1) Only one year is shown. Do you actually have a panel with multiple years? If so, please repost an example with multiple years.
    2) You have multiple entries with identical data for each year. Is this intended?

    (Small item: Note that the "end" command and the \CODE markers are necessary features of a -dataex- example.)

    I believe that a combination of merge and expand will put your data into a useful so-called long format, but showing you that is difficult with the example you provided.

    Comment


    • #3
      My bad.

      a) Oh well yes the years going from 2002 through 2015 for each county.
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double state_county float(activity_year county_xyz_score)
      1001 2002   .3209993
      1001 2003   .3126251
      1001 2004   .3124779
      1001 2005   .2987726
      1001 2006   .2328533
      1001 2007    .189686
      1001 2008  .18497716
      1001 2009   .1963434
      1001 2010   .2033112
      1001 2011   .1923633
      1001 2012   .2170411
      1001 2013   .1837994
      1001 2014   .2182138
      1001 2015   .2158565
      1003 2002   .2289018
      1003 2003   .2252383
      1003 2004  .21015207
      1003 2005  .20284607
      1003 2006   .1633894
      1003 2007  .18195954
      1003 2008  .22226396
      1003 2009  .17913736
      1003 2010   .1822641
      1003 2011   .2191114
      1003 2012  .25743288
      1003 2013  .26408195
      1003 2014  .26883194
      1003 2015  .27880597
      1005 2002  .06851458
      1005 2003  .08236194
      1005 2004  .07272892
      1005 2005  .06147835
      1005 2006  .05597391
      1005 2007  .05928341
      1005 2008   .0797313
      1005 2009  .11142473
      1005 2010  .11336334
      1005 2011  .11576447
      1005 2012  .13491791
      1005 2013  .15256736
      1005 2014  .15857905
      1005 2015    .162516
      1007 2002  .18359493
      1007 2003  .15015703
      1007 2004  .18698786
      1007 2005  .07901558
      1007 2006  .07738449
      1007 2009  .06204833
      1007 2010  .06331446
      1007 2011 .063444346
      1007 2012  .06391565
      1007 2013   .1426799
      1007 2014   .1696554
      1007 2015  .14589266
      1009 2002  .20827615
      1009 2003    .206558
      1009 2004  .19695267
      1009 2005   .1891255
      1009 2006  .10935654
      1009 2007  .11959002
      1009 2008  .10967658
      1009 2009  .08359557
      1009 2010  .09795253
      1009 2011 .069807164
      1009 2012  .15509714
      1009 2013   .1605351
      1009 2014  .16562954
      1009 2015   .1554921
      1013 2002   .3163028
      1013 2003  .30743375
      1013 2004   .2968839
      1013 2005   .2447439
      1013 2006   .2294982
      1013 2007   .1303356
      1013 2008  .10278104
      1013 2009  .17033716
      1013 2010  .18520455
      1013 2011  .18560126
      1013 2012  .17257658
      1013 2013   .1510959
      1013 2014    .237792
      1013 2015   .2351737
      1015 2002   .2482731
      1015 2003  .25525665
      1015 2004   .2520648
      1015 2005   .2493305
      1015 2006   .1861142
      1015 2007   .2215804
      1015 2008  .21639498
      1015 2009   .1660421
      1015 2010  .14949854
      1015 2011    .197847
      1015 2012  .21987846
      1015 2013  .21824928
      1015 2014   .2125866
      1015 2015  .20582367
      1017 2002  .06637359
      1017 2003  .07248068
      1017 2004   .0962803
      1017 2005  .07816491
      end
      [/CODE]

      The second dataset remains the same

      Comment


      • #4
        You'll need to create an indicator variable (0 or 1) for every county that determines if an observation is adjacent to that county.
        gen adj_county_xyz=1 if county_abc is adjacent to county_xyz (repeat this for every county)
        You'll then need to create a variable with the score of each county, but only fill it when adj_county_xyz is 1
        gen adj_county_score_xyz=county_xyz_score if adj_county_xyz==1 (repeat this for every county as well)
        You can then use egen to create the standard deviation and mean variables
        egen sd_county_abc=rowsd(adj_county_score_*)
        egen mean_county_abc=rowmean(adj_county_score_*)
        Then use a simple generate to do the math
        gen zscore_county_abc=(county_abc_score-mean_county_abc)/sd_county_abc
        Last edited by Jay Westreich; 31 Jul 2018, 09:26.

        Comment


        • #5
          joinby works better for me thanks

          Comment


          • #6
            Perhaps you have solved this, but I would note that there's still a problem with your example data, namely that xyz score are lacking for the neighbors. For example, counties 1021, 1047, 1051 are all neighbors, but you don't provide xyz scores for them. To my discredit, it took me quite a while of running into strange problems before I figured out that your example data appears not workable, unless I'm making some kind of mistake. I wish you and I had noticed that early on. Anyway, per the FAQ, it would be nice if you would post your solution to bring closure to the thread and provide an example for future users. I'd be interested to see how you chose to use -joinby- here.

            My approach would be this:
            1) construct a file with county-neighbor-year as the observations
            2) construct a file with county-year-xyz as the observations
            3) use 2 to merge neighbor xyz data onto 1
            4) use the egen mean and standard deviation functions, by county and year, to get the relevant neighbor summaries you want.

            Comment


            • #7
              I am not sure if the data I provided was "that inadequate" , or that the "want of an exact numerical value next to the neighbor counties could be that much of show stopper" as we really are talking about the concept on how to go about solving this and any xyz score can be assumed for 1021 1047 1051 that are neighbors of 1001. .(the data is sorted by state_county activityyear).may be It would have been better had I not posted the data, but then someone or the other would have asked me to post the data. However, as always am I grateful to senior folks who decide to voluntarily help and spend their value time in helping folks like me in need. As for how I got joinby to work, A) rename state_county to n_county in the county year panel. ( one that contains xyz score) and B.) rename county to state_county in the adjacency file. . C) join by using n_county the panel file and the adjacency file.D) Duplicates drop. E) Sort by state_county the joined file. (which now comes from the adjacency file) and activity_year F)bysort state_county activity_year: the usual statistical steps using egen to get to z by state_county activity_year.

              Comment

              Working...
              X