  • Unique Geographic Identifiers from Pooled Data Samples

    Hello,

    I am using pooled data from selected DHS samples. Each sample (country) has its own set of regions, anywhere from 2 to 10 of them. Because the region codes differ from country to country, I want to create a single variable that harmonizes all the region codes across the pooled sample.

    Below is the do-file I am using to generate the harmonized region ID.



    Code:
    set more off
    
    * Create variable ct that is a string of the country value (e.g., Nigeria, Senegal)
    decode(country), gen(ct)
    levelsof ct, local(ctstring)
    
    
    * Create a temporary variable that combines all the country-specific region codes.
    egen region_temp = rowmax(geo_*)
    
    * Use the temporary variable you created above, plus the country variable, to
    * create a unique number for each region in your pooled dataset.  
    gen subnational = country*100 + region_temp
    
    * The next command creates string variables for each geographic code.
    * Find the geo_ variables in your variable list. Substitute your first and last
    * geo variables for geoalt_ke2014 and geo_ug2016, respectively:
    foreach var of varlist geo_ao2015_2015 - geo_ls2004_2014 {
        decode `var', gen(`var'str)
    }
    
    * The following code creates idregion, a single variable with values for every sample.
    egen region_label_t_gen = concat(geo*str)
    gen region_label_gen = ct + " " + substr(region_label_t_gen,1,100)
    egen idregion = group(region_label_gen)
    
    * This loop attaches the proper region labels to idregion.
    sort subnational
    lab def subnational_gen 1 "temp", replace
    levelsof(idregion), local(levels)
    
    foreach 1 of local levels {
        gen temp = ""
        replace temp = region_label_gen if idregion *== `1'
        levelsof(temp), local(templabel)
        lab def reg_label_gen `1' `templabel', modify
        drop temp
    }
    
    label values idregion reg_label_gen
    label variable idregion "Subnational regions"
    
    * Get rid of the temporary variables
    drop subnational geo*str region_temp region_label_t_gen region_label_gen
    
    fre idregion



    When I run it, I get the following error:



    Code:
    . foreach 1 of local levels {
      2.         gen temp = ""
      3.         replace temp = region_label_gen if idregion *== `1'
      4.         levelsof(temp), local(templabel)
      5.         lab def reg_label_gen `1' `templabel', modify
      6.         drop temp
      7. }
    (55 missing values generated)
    idregion* invalid name
    r(198);
    
    end of do-file
    
    r(198);



    Thanks for your attention - CY


    Below is a sample of my data from dataex. Note that the number of regions in the extract is limited to 5, though there are more in the full dataset.



    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double sample float country byte(geo_ao2015_2015 geo_bi2010_2016 geo_ch2014_2014 geo_cm2004_2011 geo_cg2014_2014 geo_et2000_2016 geo_gb2012_2012 geo_gh1988_2014 geo_gu2018_2018 geo_ke1989_2014 geo_ls2004_2014) long v001 int(v002 v003)
     2401  24  2 .  . . .  . . . . .  .  57    8  5
     2401  24  9 .  . . .  . . . . .  . 542    7  2
     2401  24 14 .  . . .  . . . . .  . 174   18  6
     2401  24 14 .  . . .  . . . . .  .  61   14  2
     2401  24 18 .  . . .  . . . . .  . 608    7  1
    10803 108  . 1  . . .  . . . . .  . 140 8259  3
    10803 108  . 3  . . .  . . . . .  . 488 4492  2
    10803 108  . 3  . . .  . . . . .  . 549 5502  2
    10803 108  . 4  . . .  . . . . .  . 153 5157  2
    10803 108  . 4  . . .  . . . . .  . 358 5234  2
    12004 120  . .  . 5 .  . . . . .  . 550   17  1
    12004 120  . .  . 5 .  . . . . .  .  64   24  1
    12004 120  . .  . 6 .  . . . . .  . 105   15  1
    12004 120  . .  . 8 .  . . . . .  . 165   10  4
    12004 120  . .  . 8 .  . . . . .  . 467   22  1
    14803 148  . . 19 . .  . . . . .  . 523    6  9
    23104 231  . .  . . .  3 . . . .  . 327  212  2
    23104 231  . .  . . .  4 . . . .  . 518  264  2
    23104 231  . .  . . .  7 . . . .  . 223  451  3
    23104 231  . .  . . .  7 . . . .  . 466  166  5
    23104 231  . .  . . . 10 . . . .  . 211  385  2
    28806 288  . .  . . .  . . 3 . .  . 148   20  2
    28806 288  . .  . . .  . . 4 . .  . 410   19  1
    28806 288  . .  . . .  . . 4 . .  . 176    6  2
    28806 288  . .  . . .  . . 6 . .  . 236   10  1
    28806 288  . .  . . .  . . 7 . .  .  12   30  2
    32404 324  . .  . . .  . . . 1 .  . 362   70  3
    32404 324  . .  . . .  . . . 1 .  . 158   58  3
    32404 324  . .  . . .  . . . 3 .  . 244   30  1
    32404 324  . .  . . .  . . . 5 .  .  21   49  1
    32404 324  . .  . . .  . . . 6 .  .  61   23  2
    42603 426  . .  . . .  . . . . .  2 149  181  2
    42603 426  . .  . . .  . . . . .  3  72  185  1
    42603 426  . .  . . .  . . . . .  3 240  208  2
    42603 426  . .  . . .  . . . . .  5 350  208  3
    42603 426  . .  . . .  . . . . . 10 170  186  3
    99901 999  . .  . . .  . 3 . . .  . 146    9  4
    18002 180  . .  . . .  . . . . .  . 149    1  1
    99901 999  . .  . . .  . . . . .  .  17   13  2
    14803 148  . .  . . .  . . . . .  . 365    9 10
    18002 180  . .  . . .  . . . . .  .  18   27  4
    38403 384  . .  . . .  . . . . .  . 273    4  1
    38403 384  . .  . . .  . . . . .  .  54    3  4
    14803 148  . .  . . .  . . . . .  . 277   14  4
    18002 180  . .  . . .  . . . . .  . 253    3  1
    38403 384  . .  . . .  . . . . .  . 248   47  7
    99901 999  . .  . . .  . . . . .  . 304   12  3
    18002 180  . .  . . .  . . . . .  . 211   22  2
    18002 180  . .  . . .  . . . . .  . 137   11  2
    99901 999  . .  . . .  . . . . .  . 113   15  2
    14803 148  . .  . . .  . . . . .  . 361    3  3
    14803 148  . .  . . .  . . . . .  . 617   19  2
    99901 999  . .  . . .  . . . . .  . 332    8  5
    38403 384  . .  . . .  . . . . .  . 238   47  2
    38403 384  . .  . . .  . . . . .  . 338    5  1
    end
    label values sample sample_lbl
    label def sample_lbl 2401 "Angola 2015", modify
    label def sample_lbl 10803 "Burundi 2016", modify
    label def sample_lbl 12004 "Cameroon 2011", modify
    label def sample_lbl 14803 "Chad 2014", modify
    label def sample_lbl 18002 "Congo Democratic Republic 2013-14", modify
    label def sample_lbl 23104 "Ethiopia 2016", modify
    label def sample_lbl 28806 "Ghana 2014", modify
    label def sample_lbl 32404 "Guinea 2018", modify
    label def sample_lbl 38403 "Cote d'Ivoire 2011", modify
    label def sample_lbl 42603 "Lesotho 2014", modify
    label def sample_lbl 99901 "Gabon 2012", modify
    label values country COUNTRY
    label def COUNTRY 24 "angola", modify
    label def COUNTRY 108 "burundi", modify
    label def COUNTRY 120 "cameroon", modify
    label def COUNTRY 148 "chad", modify
    label def COUNTRY 180 "congo democratic republic", modify
    label def COUNTRY 231 "ethiopia", modify
    label def COUNTRY 288 "ghana", modify
    label def COUNTRY 324 "guinea", modify
    label def COUNTRY 384 "cote d'ivoire", modify
    label def COUNTRY 426 "lesotho", modify
    label def COUNTRY 999 "gabon", modify
    label values geo_ao2015_2015 GEO_AO2015
    label def GEO_AO2015 2 "zaire", modify
    label def GEO_AO2015 9 "benguela", modify
    label def GEO_AO2015 14 "namibe", modify
    label def GEO_AO2015 18 "bengo", modify
    label values geo_bi2010_2016 GEO_BI2010_2016
    label def GEO_BI2010_2016 1 "bujumbura mairie", modify
    label def GEO_BI2010_2016 3 "cankuzo, gitega, karusi, muramvya, ruyigi", modify
    label def GEO_BI2010_2016 4 "bubanza, bujumbura rural,  bururi, cibitoke, makamba, mwaro, rutana, rumonge", modify
    label values geo_ch2014_2014 MV024
    label def MV024 19 "barh el gazal", modify
    label values geo_cm2004_2011 GEO_CM2004_2011
    label def GEO_CM2004_2011 5 "extrême-nord", modify
    label def GEO_CM2004_2011 6 "littoral", modify
    label def GEO_CM2004_2011 8 "nord-ouest", modify
    label values geo_cg2014_2014 mv024
    label values geo_gb2012_2012 mv024
    label def mv024 3 "haut-ogooué", modify
    label values geo_et2000_2016 GEO_ET2000_2016
    label def GEO_ET2000_2016 3 "amhara", modify
    label def GEO_ET2000_2016 4 "oromia", modify
    label def GEO_ET2000_2016 7 "southern nations, nationalities and peoples", modify
    label def GEO_ET2000_2016 10 "addis ababa", modify
    label values geo_gh1988_2014 GEO_GH1988_2014
    label def GEO_GH1988_2014 3 "greater accra", modify
    label def GEO_GH1988_2014 4 "volta", modify
    label def GEO_GH1988_2014 6 "ashanti", modify
    label def GEO_GH1988_2014 7 "brong-ahafo", modify
    label values geo_gu2018_2018 MV101
    label def MV101 1 "Boke", modify
    label def MV101 3 "faranah", modify
    label def MV101 5 "kindia", modify
    label def MV101 6 "labe", modify
    label values geo_ke1989_2014 GEO_KE1989_2014
    label values geo_ls2004_2014 GEO_LS2004_2014
    label def GEO_LS2004_2014 2 "leribe", modify
    label def GEO_LS2004_2014 3 "berea", modify
    label def GEO_LS2004_2014 5 "mafeteng", modify
    label def GEO_LS2004_2014 10 "thaba tseka", modify
    label values v003 LINENO
    label def LINENO 1 "1", modify
    label def LINENO 2 "2", modify
    label def LINENO 3 "3", modify
    label def LINENO 4 "4", modify
    label def LINENO 5 "5", modify
    label def LINENO 6 "6", modify
    label def LINENO 7 "7", modify
    label def LINENO 9 "9", modify
    label def LINENO 10 "10", modify
    Last edited by Yawo Kokuvi; 31 May 2020, 00:08. Reason: Further explanation about regional codes

  • #2
    Please ignore this post.

    I caught the error: there was an errant * in front of == in the if condition inside the loop.

    Thanks for any attention. cY
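
    For anyone who lands here with the same r(198), the fix can be sketched on a toy dataset as below. This is a minimal sketch rather than the exact do-file: the four example rows and the loop macro name lev are made up for illustration, and the only substantive change to the loop is writing == with no * in front of it.

```stata
* Toy data standing in for region_label_gen (rows are hypothetical)
clear
input str20 region_label_gen
"angola zaire"
"angola bengo"
"ghana volta"
"ghana volta"
end

* One integer code per distinct country-region string
egen idregion = group(region_label_gen)

* Attach the matching string to each code as its value label
levelsof idregion, local(levels)
foreach lev of local levels {
    gen temp = ""
    * corrected condition: plain ==, no stray * before it
    replace temp = region_label_gen if idregion == `lev'
    levelsof temp, local(templabel)
    label define reg_label_gen `lev' `templabel', modify
    drop temp
}

label values idregion reg_label_gen
tabulate idregion
```

    It may also be worth checking whether the label option of egen's group() function does the same job in one line, since it labels the new codes with the values of the grouping variable; that is a suggestion to test, not something tried in this thread.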
