Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging with missing identifying variables

    Hi!

    I am trying to merge two datasets. In the master dataset, there are a number of missing observations in the identifying variable. When I try to merge, the error specifying that the identifying variable does not uniquely identify observations in master data came up. In the past, I have manually changed these missing values to unique missing value like .a and .b. But, this solution is not ideal when there are many (i.e. hundreds) of missing values for the identifying variable. Any ideas?

    Thank you,
    Vanessa

  • #2
    If they are missing on the merge variable nothing will be merged so why do you want to keep them?

    Comment


    • #3
      Vanessa:
      welcome to the list.
      I would fix the missingness, first (if feasible).
      Kind regards,
      Carlo
      (Stata 19.0)

      Comment


      • #4
        I guess if you really want to keep them and you have numerical variables for the identifying variables, you can generate arbitrary values for the missing values that are outside the values in your good merging observations. Then recode them to missing after the merge.

        Comment


        • #5
          This does not seem to have been resolved (the suggestion of arbitrarily assigning unique incremental values to missing identifying variables may work but becomes cumbersome with large datasets). Has anyone figured out how to ignore missing identifying variables when merging?

          Currently if identifying_variable = . for many rows in master and using datasets, then 1:1 merge fails because "variables identifying_variable does not uniquely identify observations"

          Comment


          • #6
            Originally posted by Jonathan Afilalo View Post
            Currently if identifying_variable = . for many rows in master and using datasets, then 1:1 merge fails because "variables identifying_variable does not uniquely identify observations"
            Probably then you may consider a combination of merge and append. frames makes this easier.

            Code:
            tempfile using append
            clear
            input float var1 byte var3
            1 10
            2 20
            3 30
            . 40
            . 50
            . 60
            end
            frame put * if missing(var1), into(append1)
            drop if missing(var1)
            save `using'
            
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input float var1 str1 var2
            1 "a"
            2 "b"
            3 "c"
            . "d"
            4 "e"
            . "f"
            end
            frame put * if missing(var1), into(append2)
            drop if missing(var1)
            merge 1:1 var1 using `using', nogen
            frame append1: save `append', replace
            append using `append'
            frame append2: save `append', replace
            append using `append'
            frame drop append1
            frame drop append2
            Res.:

            Code:
            . l, sep(0)
            
                 +--------------------+
                 | var1   var2   var3 |
                 |--------------------|
              1. |    1      a     10 |
              2. |    2      b     20 |
              3. |    3      c     30 |
              4. |    4      e      . |
              5. |    .            40 |
              6. |    .            50 |
              7. |    .            60 |
              8. |    .      d      . |
              9. |    .      f      . |
                 +--------------------+

            Comment


            • #7
              Hi,

              I am trying to merge two datasets. household contains one line per each hhid. ETHConsum contains 73 repetitions of the same hhid. I used the code:

              ************************************************** *****************************
              *Merging two datasets via household_Id**********************************
              ************************************************** *****************************

              use "C:\Users\household.dta", clear
              sort household_id
              encode household_id, generate (hhid)
              save"C:\User\household.dta", replace


              use "C:\Users\ETHConsum.dta", clear
              sort household_id
              encode household_id, generate (hhid)
              sort hhid
              collapse (sum) quantityKG, by (household_id item_cd saq14)
              save"C:\Users\\ETHConsum.dta", replace

              tab hhid
              merge 1:1 hhid using "C:\Users\household.dta"

              If somebody could help me identifying what the error is, I would be grateful
              it is my first time posting here

              Juliana

              Comment


              • #8
                Hi,

                I am trying to merge two datasets. household contains one line per each hhid. ETHConsum contains 73 repetitions of the same hhid. I used the code:

                Code:
                *******************************************************************************
                *Merging two datasets via household_Id**********************************
                *******************************************************************************
                
                use "C:\Users\household.dta", clear
                sort household_id
                encode household_id, generate (hhid)
                save"C:\User\household.dta", replace
                
                
                use "C:\Users\ETHConsum.dta", clear
                sort household_id
                encode household_id, generate (hhid)
                sort hhid
                collapse (sum) quantityKG, by (household_id item_cd saq14)
                save"C:\Users\\ETHConsum.dta", replace
                
                tab hhid
                merge 1:1 hhid using "C:\Users\household.dta"
                If somebody could help me identifying what the error is, I would be grateful
                it is my first time posting here

                Juliana

                Comment


                • #9
                  Originally posted by Juliana Minetto View Post
                  Hi,

                  I am trying to merge two datasets. household contains one line per each hhid. ETHConsum contains 73 repetitions of the same hhid. I used the code:

                  Code:
                  *******************************************************************************
                  *Merging two datasets via household_Id**********************************
                  *******************************************************************************
                  
                  use "C:\Users\household.dta", clear
                  sort household_id
                  encode household_id, generate (hhid)
                  save"C:\User\household.dta", replace
                  
                  
                  use "C:\Users\ETHConsum.dta", clear
                  sort household_id
                  encode household_id, generate (hhid)
                  sort hhid
                  collapse (sum) quantityKG, by (household_id item_cd saq14)
                  save"C:\Users\\ETHConsum.dta", replace
                  
                  tab hhid
                  merge 1:1 hhid using "C:\Users\household.dta"
                  Never encode variables in separate datasets and then merge on the encoded variable. Unless the variables in both datasets have exactly the same levels, encode will assign different values to the same levels in each of the datasets. There is no problem merging using a string variable, then encode the variable after merging. On your error, you need either


                  Code:
                  merge m:1 household_id ...
                  or


                  Code:
                  merge 1:m household_id ...
                  depending on the order of your master and using datasets. If this is not useful, read FAQ Advice #12 on how to present a data example using the dataex command.

                  Comment

                  Working...
                  X