Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging two datasets by same county name (missing counties in one set, different IDs

    Hi all, this is my first post here and also new to Stata so apologies if not clear!

    I'm trying to merge the two datasets below - one lists just two variables (county name "mun" and county ID "cvegeo"), and the other is agricultural production data by county.

    I want to assign the county ID variable "cvegeo" in the first dataset to the county name in the agricultural data set using the common county name variable, "mun".

    At first, I tried encoding "mun" in both datasets and merging on that new numeric ID, but I am missing 21 counties from the agriculture dataset that are included in the cvegeo one (2305 vs. 2326), so the numeric values assigned to "mun" are different between sets.

    Is it possible for me to merge based strictly on the actual words of the county name (i.e. "aguascalientes", "asientos", etc. - and probably "force" if there are any spelling discrepancies)? Or a way I can identify the 21 counties that are in the cvegeo dataset and drop them so when I encode "mun" the numeric values are the same between sets (apart from manually)? Different government agencies use different ID systems for counties, but the cvegeo code is one of the most commonly used identifiers across datasets I'm looking at, so I expect similar issues to keep popping up.

    Thank you for your help!

    Best,
    Katie

    CVEGEO:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str77 mun long cvegeo
    "AGUASCALIENTES" 1001
    "ASIENTOS"       1002
    "CALVILLO"       1003
    "COSÍO"         1004
    "JESÚS MARÍA"  1005
    end
    AGRICULTURAL PRODUCTION:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year byte idstate str22 state int idmun str51 mun byte idcycle str16 agseason byte idwater str8 water long idmeasure str14 measure long idcrop str33 crop float(sown harvest wrecked prod yield rurprice) double mexprodval
    2018 1 "Aguascalientes" 1 "Aguascalientes" 1 "Otoño-Invierno"  1 "Riego" 200201 "Tonelada"  5490000 "Avena forrajera en verde"     1258 1258 0   28861 22.94 647.73    18694136
    2018 1 "Aguascalientes" 1 "Aguascalientes" 1 "Otoño-Invierno"  1 "Riego" 200201 "Tonelada"  5900000 "Cebada forrajera en verde"      50   50 0    1389 27.78    600      833400
    2018 1 "Aguascalientes" 1 "Aguascalientes" 1 "Otoño-Invierno"  1 "Riego" 200201 "Tonelada"  9090000 "Triticale forrajero en verde"  235  235 0 7735.73 32.92 598.39   4628983.5
    2018 1 "Aguascalientes" 1 "Aguascalientes" 1 "Otoño-Invierno"  1 "Riego" 200201 "Tonelada" 15050000 "Pastos y praderas"             595  595 0 29809.5  50.1 594.48    17721152
    2018 1 "Aguascalientes" 1 "Aguascalientes" 2 "Primavera-Verano" 1 "Riego" 200201 "Tonelada"  5490000 "Avena forrajera en verde"       36   36 0   965.7 26.83 580.84 560917.1875
    end
    Last edited by Katie Campbell; 08 Apr 2020, 16:37.
Working...
X