Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • merging zero observation while matching master with using data based on two variables

    My master dataset has one variable county which I'm generating ( using - gen county1 = county- command. Changed the number in master data so that you can merge with using. But, that's the command I used ) in a different name as county1 to match with my using data. My master data looks like the following with county and the same variable in a different name county1

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long county float county1
    10001 24003
    10003 42101
    20045 20045
    20045 20045
    11001 51107
    21045 21045
    20047 20047
    20047 20045
    12009 12095
    12019 12031
    end
    My using data has three columns. Column 1 is a fixed amount of counties ( 100 ) out of all the census counties. Column 2 is the minimum distance of each county from any of those fixed numbers of counties in column 1. Column 3 is all the census counties available. Therefore, using data is identified uniquely based on county1(from column1 of using data ) and county ( column 3 of using data). Therefore, I regenerated the county1 variable in my master data to match with using data uniquely.

    My using data looks like following

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float county1 double mi_to_county long county
    24003  57.19865840354873 10001
    42101  40.40149007443556 10003
    24003 25.296911561215097 11001
    51107  35.55806556932951 11001
    12031 39.019664192183974 12003
    12031  40.99176960439295 12007
    12095  40.69641745688267 12009
    12086 40.307247788423894 12011
    12071 22.085989388972475 12015
    12031  27.12930958102275 12019
    13063  32.12455074883614 13217
    13063  59.35042862727376 13219
    13063  36.45950620886324 13223
    end
    After merging I'm getting the following result. Is this happening because of long and float format of county and county1? I tried changing both of them to float using recast but eventually one of them automatically changes to long format.

    Code:
     Result                      Number of obs
        -----------------------------------------
        Not matched                    25,098,392
            from master                25,098,392  
            from using                          0  
    
        Matched                                 0  
        -----------------------------------------
    Last edited by Tariq Abdullah; 12 Jul 2022, 23:57.

  • #2
    Tariq,
    You did not show the code used in merging the two files. Try the this,
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long county float county1
    10001 24003
    10003 42101
    20045 20045
    20045 20045
    11001 51107
    21045 21045
    20047 20047
    20047 20045
    12009 12095
    12019 12031
    end
    duplicates drop
    isid county1 county
    save A, replace
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float county1 double mi_to_county long county
    24003 57.19865840354873 10001
    42101 40.40149007443556 10003
    24003 25.296911561215097 11001
    51107 35.55806556932951 11001
    12031 39.019664192183974 12003
    12031 40.99176960439295 12007
    12095 40.69641745688267 12009
    12086 40.307247788423894 12011
    12071 22.085989388972475 12015
    12031 27.12930958102275 12019
    13063 32.12455074883614 13217
    13063 59.35042862727376 13219
    13063 36.45950620886324 13223
    end
    isid county1 county
    merge 1:1 county1 county using A
    keep if _merge==3

    Comment


    • #3
      Dropping the duplicates is not an option since there is no duplicate. In my using data county county1 - both together identify a unique observation. When I'm using the command - duplicates drop this is what I'm getting

      Code:
      duplicates drop
      
      Duplicates in terms of all variables
      
      (0 observations are duplicates)
      That's why uniquely identifying the using observation I'm doing the following merging command:

      Code:
      merge m:1 county1 county using "distance.dta", keep(master match) nogen

      Comment


      • #4
        Can you explain why you chose to present county and county1 in blue in your example of your master dataset shown in post #1?

        Added in edit: It appears that the blue was provided by my web browser, which interpreted the two numbers as some sort of telephone number.

        Perhaps it would help if you were to present the output of
        Code:
        describe county county1
        from your master dataset and also from your using dataset.
        Last edited by William Lisowski; 13 Jul 2022, 06:09.

        Comment


        • #5
          The following is my master. I generated the county1 variable in my master data ( using - gen county1 = county- command.)to match with using data uniquely.

          Code:
          describe county county1
          
          Variable      Storage   Display    Value
              name         type    format    label      Variable label
          ------------------------------------------------------------------
          county          long    %12.0g              * FIPS county code
          county1         float   %9.0g               *
          
           notes list county1
          
          county1:
            1.  County7, 8 COUNTY 3 30 A/N
            2.  001-507 County or equivalent area code
            3.  510-840 Independent city code
          The following is from my using data

          Code:
           describe county county1
          
          Variable      Storage   Display    Value
              name         type    format    label      Variable label
          --------------------------------------------------------------------------
          county          long    %10.0g              * County7, 8
          county1         float   %10.0g              * County7, 8
          Last edited by Tariq Abdullah; 13 Jul 2022, 10:22.

          Comment


          • #6
            This is again the sample of master data for your convenience.

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input long county float county1
            10001 24003
            10003 42101
            20045 20045
            20045 20045
            20045 20045
            20045 20045
            20045 20045
            20045 20045
            20045 20045
            20045 20045
            20045 20045
            20045 20045
            20045 20045
            12019 12031
            end

            Comment


            • #7
              The merge command shown in post #3 is not the merge command that produced the output in post #1.

              Running your merge command in post #3 using as input the master data in post #6 and the using data in post #1 produces three matches, as shown in the first merge command below.

              However, if I change the keep() option on the merge command from keep(master match) to keep(master) I get results similar to what you show in post #1. And that is no surprise, because keep(master) means to retain only those observations that appeared on the master dataset and were not matched on the using dataset.
              Code:
              clear all
              cls
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input long county float county1
              10001 24003
              10003 42101
              20045 20045
              20045 20045
              20045 20045
              20045 20045
              20045 20045
              20045 20045
              20045 20045
              20045 20045
              20045 20045
              20045 20045
              20045 20045
              12019 12031
              end
              save "~/Downloads/master", replace
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input float county1 double mi_to_county long county
              24003  57.19865840354873 10001
              42101  40.40149007443556 10003
              24003 25.296911561215097 11001
              51107  35.55806556932951 11001
              12031 39.019664192183974 12003
              12031  40.99176960439295 12007
              12095  40.69641745688267 12009
              12086 40.307247788423894 12011
              12071 22.085989388972475 12015
              12031  27.12930958102275 12019
              13063  32.12455074883614 13217
              13063  59.35042862727376 13219
              13063  36.45950620886324 13223
              end
              save "~/Downloads/distance", replace
              
              use "~/Downloads/master", clear
              merge m:1 county1 county using "~/Downloads/distance.dta", keep(master match) nogen
              
              use "~/Downloads/master", clear
              merge m:1 county1 county using "~/Downloads/distance.dta", keep(master) nogen
              Code:
              . use "~/Downloads/master", clear
              
              . merge m:1 county1 county using "~/Downloads/distance.dta", keep(master match) nogen
              
                  Result                      Number of obs
                  -----------------------------------------
                  Not matched                            11
                      from master                        11  
                      from using                          0  
              
                  Matched                                 3  
                  -----------------------------------------
              
              . 
              . use "~/Downloads/master", clear
              
              . merge m:1 county1 county using "~/Downloads/distance.dta", keep(master) nogen
              
                  Result                      Number of obs
                  -----------------------------------------
                  Not matched                            11
                      from master                        11  
                      from using                          0  
              
                  Matched                                 0  
                  -----------------------------------------
              
              .

              Comment


              • #8
                You are totally right. It's just I was looking at the data in a wrong way. Much appreciate this valuable feedback !

                Comment

                Working...
                X