Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshaping 2017 US Census of Governments data from long format to a wide format

    Hi all,

    I am new to manipulating data in Stata, so I am hoping that someone here might be able to offer some advice. I am trying to reshape 2017 US Census of Governments data from long format to a wide format. I have been reading about the reshape command during the last few days but have yet to wrap my mind around a way to make it work with these data. My hope is to have one row for each government unit (denoted by the "name" column) that retains the variation in "census_id," "county," "fips_state," "fips_county" and "fips_place." I am aware that this will create a large number of columns in the data, though I am unsure if that will be a complicating factor.

    I have attached a subset of the data that I believe are illustrative of the issue. Washington Township in Ohio (included in the example data) has the most rows in the dataset by far.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double census_id str30 name str13 unit_type str2 state_ab str12 county byte fips_state int fips_county long fips_place
    17301801400000 "TOWNSHIP OF PLEASANT VALLEY"    "3 - TOWNSHIP"  "KS" "COWLEY"       20  35 56500
    17302002000000 "TOWNSHIP OF PLEASANT VALLEY"    "3 - TOWNSHIP"  "KS" "DECATUR"      20  39 56525
    17302800500000 "TOWNSHIP OF PLEASANT VALLEY"    "3 - TOWNSHIP"  "KS" "FINNEY"       20  55 56550
    17307301500000 "TOWNSHIP OF PLEASANT VALLEY"    "3 - TOWNSHIP"  "KS" "PAWNEE"       20 145 56600
    17308501100000 "TOWNSHIP OF PLEASANT VALLEY"    "3 - TOWNSHIP"  "KS" "SALINE"       20 169 56625
    17310301100000 "TOWNSHIP OF PLEASANT VALLEY"    "3 - TOWNSHIP"  "KS" "WILSON"       20 205 56650
    17301100900000 "TOWNSHIP OF PLEASANT VIEW"      "3 - TOWNSHIP"  "KS" "CHEROKEE"     20  21 56675
    17207800900000 "CITY OF PLEVNA"                 "2 - MUNICIPAL" "KS" "RENO"         20 155 56725
    17307802200000 "TOWNSHIP OF PLEVNA"             "3 - TOWNSHIP"  "KS" "RENO"         20 155 56750
    17307401800000 "TOWNSHIP OF PLUM"               "3 - TOWNSHIP"  "KS" "PHILLIPS"     20 147 56775
    17309900900000 "TOWNSHIP OF PLUMB"              "3 - TOWNSHIP"  "KS" "WABAUNSEE"    20 197 56800
    17306201500000 "TOWNSHIP OF PLUM CREEK"         "3 - TOWNSHIP"  "KS" "MITCHELL"     20 123 56825
    17300802000000 "TOWNSHIP OF PLUM GROVE"         "3 - TOWNSHIP"  "KS" "BUTLER"       20  15 56850
    17308400900000 "TOWNSHIP OF PLYMOUTH"           "3 - TOWNSHIP"  "KS" "RUSSELL"      20 167 56925
    17203000300000 "CITY OF POMONA"                 "2 - MUNICIPAL" "KS" "FRANKLIN"     20  59 57000
    17303001300000 "TOWNSHIP OF POMONA"             "3 - TOWNSHIP"  "KS" "FRANKLIN"     20  59 57025
    17207100500000 "CITY OF PORTIS"                 "2 - MUNICIPAL" "KS" "OSBORNE"      20 141 57100
    17305400700000 "TOWNSHIP OF POTOSI"             "3 - TOWNSHIP"  "KS" "LINN"         20 107 57175
    17107507500000 "COUNTY OF POTTAWATOMIE"         "1 - COUNTY"    "KS" "POTTAWATOMIE" 20 149 99149
    17301601200000 "TOWNSHIP OF POTTAWATOMIE"       "3 - TOWNSHIP"  "KS" "COFFEY"       20  31 57200
    17303001400000 "TOWNSHIP OF POTTAWATOMIE"       "3 - TOWNSHIP"  "KS" "FRANKLIN"     20  59 57225
    17307501300000 "TOWNSHIP OF POTTAWATOMIE"       "3 - TOWNSHIP"  "KS" "POTTAWATOMIE" 20 149 57250
    36307901900000 "TOWNSHIP OF WARWICK"            "3 - TOWNSHIP"  "OH" "TUSCARAWAS"   39 157 81060
    36108408400000 "COUNTY OF WASHINGTON"           "1 - COUNTY"    "OH" "WASHINGTON"   39 167 99167
    36300601300000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "AUGLAIZE"     39  11 81074
    36300701300000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "BELMONT"      39  13 81088
    36300801600000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "BROWN"        39  15 81102
    36301001400000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "CARROLL"      39  19 81116
    36301301200000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "CLERMONT"     39  25 81130
    36301401100000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "CLINTON"      39  27 81144
    36301501500000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "COLUMBIANA"   39  29 81158
    36301602100000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "COSHOCTON"    39  31 81172
    36301901800000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "DARKE"        39  37 81186
    36302001200000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "DEFIANCE"     39  39 81200
    36302501800000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "FRANKLIN"     39  49 81242
    36303001600000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "GUERNSEY"     39  59 81256
    36303201700000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "HANCOCK"      39  63 81284
    36303301500000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "HARDIN"       39  65 81298
    36303401500000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "HARRISON"     39  67 81312
    36303501300000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "HENRY"        39  69 81326
    36303601600000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "HIGHLAND"     39  71 81340
    36303701100000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "HOCKING"      39  73 81354
    36303801400000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "HOLMES"       39  75 81368
    36304001200000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "JACKSON"      39  79 81382
    36304401300000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "LAWRENCE"     39  87 81396
    36304502600000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "LICKING"      39  89 81410
    36304601600000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "LOGAN"        39  91 81424
    36304801200000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "LUCAS"        39  95 81438
    36305401400000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "MERCER"       39 107 81452
    36305501200000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "MIAMI"        39 109 81466
    36305601700000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "MONROE"       39 111 81480
    36305701300000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "MONTGOMERY"   39 113 81494
    36305901500000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "MORROW"       39 117 81508
    36306002400000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "MUSKINGUM"    39 119 81522
    36306301200000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "PAULDING"     39 125 81536
    36306501400000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "PICKAWAY"     39 129 81550
    36306801200000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "PREBLE"       39 135 81564
    36307001600000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "RICHLAND"     39 139 81578
    36307201000000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "SANDUSKY"     39 143 81592
    36307301600000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "SCIOTO"       39 145 81606
    36307501400000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "SHELBY"       39 149 81620
    36307601700000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "STARK"        39 151 81634
    36307902000000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "TUSCARAWAS"   39 157 81648
    36308001300000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "UNION"        39 159 81662
    36308101000000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "VAN WERT"     39 161 81676
    36308301000000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "WARREN"       39 165 81690
    36308701800000 "TOWNSHIP OF WASHINGTON"         "3 - TOWNSHIP"  "OH" "WOOD"         39 173 81704
    36202400500000 "CITY OF WASHINGTON COURT HOUSE" "2 - MUNICIPAL" "OH" "FAYETTE"      39  47 81718
    36201501200000 "VILLAGE OF WASHINGTONVILLE"     "2 - MUNICIPAL" "OH" "COLUMBIANA"   39  29 81732
    36308402000000 "TOWNSHIP OF WATERFORD"          "3 - TOWNSHIP"  "OH" "WASHINGTON"   39 167 81774
    end
    I hope I have provided enough information here - if not, please let me know and I will do my best to provide more information or clarification. Thanks much for reading, and for any advice that you may be able to provide.

    Best wishes,
    Matt

  • #2
    I am not entirely sure what you mean by retaining "the variation in "census_id," "county," "fips_state," "fips_county" and "fips_place.", but this will create one row for each government unit denoted by the "name" column:

    Code:
    sort name
    by name: gen namecnt=sum(1)
    reshape wide c* u s f*, i(name) j(namecnt)

    Comment


    • #3
      Hi Emil,

      Thanks so much for your response! Unfortunately that doesn't quite work...though it does seem it is heading in the right direction.

      What I meant by retaining "the variation in "census_id," "county," "fips_state," "fips_county" and "fips_place" " was that I don't want to lose this information when reshaping into wide; I want to create columns for each separate value of them.

      One potential snag that I failed to mention regarding creating a j variable (in your example, "namecnt") is that there are repeated place names across states. So, for example, states other than Ohio may also have a "Township of Washington" as well.

      I hope that helps to clarify the things that were unclear in my original post.

      Thanks again,
      Matt
      Last edited by Matt Barnett; 13 Mar 2023, 10:47.

      Comment


      • #4
        To my understanding, the code Emil Alnor offered *does* produce multiple different variables, e.g. census_id1, census_id2 ....., county1, county2, ... . So, I'd suggest you explain what you want that's different than what his code does for you.

        Regarding the presence of names sometimes being the same across state, that should be fixable by including "state_ab" in the i() specification and changing the sort and reshape accordingly:

        Code:
        sort name state_ab
        by name state_ab: gen namecnt = _n // Emil's numbering of observations was ok, too.
        reshape wide c* u f*, i(name) j(namecnt)
        Two side notes:

        1) Your census_ids are close to the limit of precision of a double variable, and seem to have a lot of extraneous zeros. I'd store those census_ids as string variables. In fact, I think there are some U.S. census identifiers that contain up to 17 distinctive digits, and if your census_id variable includes such, you may already have some ids where meaningful information has been lost. Checking whether the data was brought into Stata correctly in this respect would be worthwhile I have never worked with those long census_ids, so I could be wrong in my thinking here. Further, because in the immediate instance you're relying on place name and state and not actually using census_id, that doesn't matter right now, but might matter elsewhere.

        2) All this being said, there's a good chance that the wide layout isn't desirable for the analyses and manipulations you want to do with this data, so knowing *why* you want to "go wide" might help people offer you better advice, advice that goes beyond your immediate goal. My guess is that you want to construct variables that summarize information at the place name level, encompassing several observations in some instances, but contrary to what you may have experienced in other data management programs, that's likely to be easier in Stata if you stay with the long layout.

        Comment


        • #5
          Thanks both of you!

          The problems that I was running into stemmed from some of the state abbreviations being mislabeled in the Census of Governments data, which I am now in the process of correcting.

          I am converting to wide for the purposes of "fuzzy matching" with another dataset (otherwise I would wind up having to do a m:m merge, which I understand is practically always a bad idea) - but I didn't want to lose any of the FIPS codes or county names in the process.

          Thanks again!

          Comment


          • #6
            The fact of a desire to do a fuzzy match doesn't make me think of a wide layout being better; I'd actually guess it to be worse. I'd suggest explain which variable(s) you want to match on, and I'd suggest framing your question in terms of the goal rather than the means of getting there. Merging m:m will not be useful. -joinby- *might* be, but I'd withhold judgment until we know substantively how you want things to match up. That really takes your question in a new direction, but I think that might be more useful to you in the end than the current focus.

            Comment

            Working...
            X