Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging statefip and county codes in combined form

    I nned to merge my statefip code and county code in a specific form.

    Like if the statefip = 1 and county code = 9, then the combined state_county code should look like = 1009
    If the statefip = 1 and county code = 21, then the combined state_county code should look like = 1021
    If the statefip = 1 and county code = 132, then the combined state_county code should look like = 1132.

    May I kindly have some idea how I can do that? I have almost 60,000 observations, and doing this manually will take up a lot of time. I'm giving a sample of my data here

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte statefip int county
    1   1
    1   3
    1   5
    1   7
    1   9
    1  11
    1  13
    1  15
    1  17
    1  19
    1  21
    1  23
    1  125
    1  127
    1  129
    end

  • #2
    Code:
    gen state_county = real(string(statefip) + string(county,"%03.0f"))
    Last edited by Øyvind Snilsberg; 06 Jul 2022, 02:02.

    Comment


    • #3
      Why would you want to do this? I've seen people do this before and I don't understand why. For me, whenever I'm working with sub national data, I always have the state, region, county, whatever set of identifiers as separate variables.

      Comment


      • #4
        Code:
        gen state_county = statefip*1000 + county
        Or if you are merging to Census data with a 5-digit FIPS state-county code stored as a string with leading zero for state codes 1-9 you might prefer
        Code:
        gen state_county = string(statefip*1000 + county,"%05.0f")
        Last edited by William Lisowski; 06 Jul 2022, 07:41.

        Comment


        • #5
          In my master data ( IPUMS CPS data ) county looks like this. In numeric format
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input long county
          0
          0
          1
          23
          23
          123
          end

          But, with the data I'm going to merge with my master data ( IPUMS CPS) , I have county as string ( str5 ) following the formula below:

          Code:
          gen county = string(statefip*1000 + countyfip,"%05.0f")
          And, after using the snippet of cpde above my county data looks like in this format

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str5 county
          "01001"
          "01003"
          "01005"
          "01007"
          "01009"
          "01011"
          "01013"
          "01015"
          "01017"
          For merging, I need to keep consistent with master data ( numeric format like above ), but despite encoding and destring I couldn't convert the str5 into the format I was looking for like 1005, 1023, 1123 , and so on. After applying the encode and destring command it turns into numeric but it appears as 1 , 2 , 3 , 4 , and so on.

          After applying destring or encode my data looks like following. It appears in chronlogical order , and doesn't appear as the true code of county anymore. how can I merge it keeping it consistent with my master data ?

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input long county
            3
            4
            5
            6
            7
            8
            9
           10
           11
           12

          Comment


          • #6
            In my master data statefip and county appear as following.

            dataex statefip county if county==4013


            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input byte statefip long county
            4 4013
            4 4013
            4 4013
            4 4013
            4 4013
            4 4013
            4 4013
            4 4013
            4 4013
            4 4013
            Last edited by Tariq Abdullah; 06 Jul 2022, 13:38.

            Comment


            • #7
              Your "county" above is already a combination of county and state. You can convert it into a 5-digit string with
              Code:
              gen state_county = string(county,"%05.0f")
              You do not show us what your state and county data look like in the IPUMS CPS data.

              Please use dataex to present examples of the geographic identifiers that are in the IPUMS CPS data, and the geographic identifiers that are in your data - in both cases as they were before you started trying to create matching codes so you could merge - and we can then tell you what to create in your data that will match the identifiers in the IPUMS CPS data.

              And, an important lesson here for you to take to heart.

              After applying ... encode my data looks like following. It appears in chronlogical order , and doesn't appear as the true code of county anymore
              The encode command is designed for assigning numerical codes to non-numeric strings like "France", "Germany", "United States". The output of help encode instructs us

              Do not use encode if varname contains numbers that merely happen to be stored as strings; instead, use generate newvar = real(varname) or destring; see real() or [D] destring.
              That's why I ask that you prepare examples of the data you had before you started trying to solve the problem.
              Last edited by William Lisowski; 06 Jul 2022, 13:32.

              Comment


              • #8
                In my master data ( IPUMS CPS ) statefip and county appear as following.

                dataex statefip county if county==4013


                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input byte statefip long county
                4 4013
                4 4013
                4 4013
                4 4013
                4 4013
                4 4013
                4 4013
                4 4013
                4 4013
                4 4013
                I apolgize for the unintentional mistake. Since, with my master data I did already a lot of analysis, therefore it would benefit me if I can keep the format of county in master file as it has been so far. And, to serve the purpose of merging with other data, if I can know how I can change the format of county in other data keeping consistency with the master data , that'll really help me.
                Last edited by Tariq Abdullah; 06 Jul 2022, 13:43.

                Comment


                • #9
                  This is the geographic identifiers in my other data. In the command above, I gave the geographic identifiers of my master data ( IPUMS CPS )
                  Code:
                  * Example generated by -dataex-. For more info, type help dataex
                  clear
                  input byte statefip int county
                  1   1
                  1   3
                  1   5
                  1   7
                  1   9
                  1  11
                  1  13
                  1  15
                  1  17
                  1  19
                  1  21
                  1  23
                  1  25
                  2   1
                  2   2
                  2   3
                  2 999
                  4   1
                  4   3
                  4   5
                  4   7
                  4   9
                  4  11
                  4  12
                  4  13
                  end

                  Comment


                  • #10
                    Here is an example based on your data in posts 8 and 9.
                    Code:
                    // set up invented example datasets
                    
                    // master dataset from IPUMS CPS
                    * Example generated by -dataex-. For more info, type help dataex
                    clear
                    input byte statefip long county float age
                    1 1001 10
                    1 1003  5
                    1 1003 15
                    1 1007 12
                    1 1007  3
                    1 1009 42
                    end
                    save master, replace
                    
                    // other data to merge
                    * Example generated by -dataex-. For more info, type help dataex
                    clear
                    input byte statefip int county float whatever
                    1   1  10000
                    1   3  50000
                    1   5 100000
                    1   7  30000
                    1   9 200000
                    1 999     42
                    end
                    save other, replace
                    
                    // the work starts here
                    
                    use master, clear
                    // save the current county variable
                    rename county state_county
                    // create a county variable that matches to the other dataset
                    generate county = mod(state_county,1000)
                    // merge
                    merge m:1 statefip county using other
                    list, clean abbreviate(16)
                    Code:
                    . list, clean abbreviate(16)
                    
                           statefip   state_county   age   county   whatever           _merge  
                      1.          1           1001    10        1      10000      Matched (3)  
                      2.          1           1003     5        3      50000      Matched (3)  
                      3.          1           1003    15        3      50000      Matched (3)  
                      4.          1           1007    12        7      30000      Matched (3)  
                      5.          1           1007     3        7      30000      Matched (3)  
                      6.          1           1009    42        9     200000      Matched (3)  
                      7.          1              .     .        5     100000   Using only (2)  
                      8.          1              .     .      999         42   Using only (2)
                    Last edited by William Lisowski; 06 Jul 2022, 14:34.

                    Comment


                    • #11
                      Thanks so much for this very helpful code. It did the trick smoothly, but I've faced a new problem where it says :

                      variables year statefip county do not uniquely identify observations in the using data. Since this is panel data I need to merge on basis of year state and county. So far my using data is not showing any sign of that in the same year a particular county from a specific state has repeated observation. Then, I don't understand why it's saying that variables year statefip county do not uniquely identify observations in the using data.

                      Comment


                      • #12
                        Just for clarification my master data is repeated cross-section, and as far as my knowledge my other data ( using data ) has been created by me as panel data ( though I tried using the xtset year statefip county to check if its panel or not but that's too many variables)

                        Comment


                        • #13
                          To prove it, do
                          Code:
                          isid year statefip county
                          in your using data

                          Comment


                          • #14
                            Because Stata is never wrong and there are duplicated observations in your using data.

                            I suggest you use your using data so it is in memory and then
                            Code:
                            sort year statefip county
                            duplicates tag year statefip county, gen(flag)
                            browse if flag
                            That will show you the offending observations and then you can figure out what needs to be done.

                            Comment


                            • #15
                              thanks everyone for this such kind and helpful feedback ! From here, I'll be able to deal with it. Again, much obliged for such patience !

                              Comment

                              Working...
                              X